The src_reg constructor that received the glsl_type was using it
only to build the swizzle, but not to fill this->type as dst_reg
is doing.
This caused some type mismatch between movs and alu operations
on the NIR path, so copy propagation optimization was not applied
to remove unneeded movs if negate modifier was involved. This was
first detected on minus (negate+add) operations.
Shader DB results (taking into account only vec4):
total instructions in shared programs: 20019 -> 19934 (-0.42%)
instructions in affected programs: 2918 -> 2833 (-2.91%)
helped: 79
HURT: 0
GAINED: 0
LOST: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 4de86e1371)
Nominated-by: Christoph Brill <egore911@egore911.de>
Various pieces of code to create compressed textures will first
generate an uncompressed RGBA texture into a temporary buffer,
and then read from that buffer while creating the final compressed
texture in the requested format.
The code reading from the temporary buffer assumes the buffer is
formatted as an array of bytes in RGBA order. However, the buffer
is filled using a _mesa_texstore call with MESA_FORMAT_R8G8B8A8_UNORM
format -- this is defined as an array of *integers* holding the
RGBA values in packed format (least-significant to most-significant).
This means incorrect bytes are accessed on big-endian systems.
This patch fixes this by using the MESA_FORMAT_A8B8G8R8_UNORM format
instead on big-endian systems when filling the buffer. This fixes
about 100 piglit test case failures on s390x for me.
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
(cherry picked from commit bd016a2601)
At the moment if a gbm buffer is imported and the gbm buffer
has an old-style GBM_BO_FORMAT format, the import will crash,
since it's passed directly to DRI functions that expect
a fourcc format (as provided by the newer GBM_FORMAT
definitions)
This commit addresses the problem in two ways:
1) it prevents invalid formats from leading to a crash by
returning EINVAL if the image couldn't be created
2) it translates GBM_BO_FORMAT formats into the comparable
GBM_FORMAT formats.
Reference: https://bugzilla.gnome.org/show_bug.cgi?id=753531
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
(cherry picked from commit 4bf151e662)
If the register types do not match and the instruction
that contains the final destination is saturated, register
coalescing generated non-equivalent code.
This did not happen when using IR because types usually
matched, but it is visible in nir-vec4.
For example,
mov vgrf7:D vgrf2:D
mov.sat m4:F vgrf7:F
is coalesced to:
mov.sat m4:D vgrf2:D
The patch prevents coalescing in such scenario, unless the
instruction we want to coalesce into is a MOV (without type
conversion implied). In that case, the patch sets the register
types to the type of the final destination.
Shader-db results in HSW (only vec4 instructions shown):
total instructions in shared programs: 1754415 -> 1754416 (0.00%)
instructions in affected programs: 74 -> 75 (1.35%)
helped: 0
HURT: 1
GAINED: 0
LOST: 0
Only one extra instruction in one of the shaders, that comes from
eliminating a saturation error by preventing register coalesce.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(cherry picked from commit 79f1a7ae28)
Some modern apps try to use msaa without keeping in mind the
restrictions on videomem of older cards. Resulting in dmesg saying:
[ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
[ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
[ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12
Because we are running out of video memory, after which the program
using the msaa visual freezes, and eventually the entire system freezes.
To work around this we do not allow msaa visauls by default and allow
the user to override this via NV30_MAX_MSAA.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
[imirkin: move env var lookup to screen so that it's only done once]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3e9df0e3af)
Signed-off-by: Emil Velikov <emil.velikov@collabora.co.uk>
Conflicts:
src/gallium/drivers/nouveau/nv30/nv30_screen.c
Indications are that if the colormask indicates a single bit set on
fermi, that value will always be read from $r0 instead of a potentially
higher register (if e.g. green is set). Not to upset the counting logic,
always set the header up with a full color mask for each RT. Such a
situation can basically only ever happen with generated blit shaders.
Fixes the following piglit on Fermi (Kepler is unaffected):
fbo-stencil blit GL_DEPTH32F_STENCIL8
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 39df725f73)
The sifm object has a limit of 1024x1024 for its input size and 2048x2048
for its output. The code checking this was trying to be clever resulting
in it seeing a surface of e.g 1024x256 being outside of the input size
limit.
This commit fixes this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 87073c69f3)
Signed-off-by: Emil Velikov <emil.velikov@collabora.co.uk>
Conflicts:
src/gallium/drivers/nouveau/nv30/nv30_transfer.c
Nothing in the spec allows for the reduced precision, and this also
fixes st_QuerySamplesForFormat for nv50, which does not allow MS8 on
RGBA32F. Now this will be respected instead of reporting MS8 as
supported with an assumption that the format used will be RGBA16F.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit e40f32d562)
round(val*dscale) produces a double result, as val and dscale are double.
However, LLVMConstInt receives unsigned long long, so there is an
implicit conversion from double to unsigned long long.
This is an undefined behavior. Therefore, we need to first explicitly
convert the round result to long long, and then let the compiler handle
conversion from that to unsigned long long.
This bug manifests itself in POWER, where all IMM values of -1 are being
converted to 0 implicitly, causing a wrong LLVM IR output.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 4f2290d161)
Note this is not ideal. Since the sifm can only do source sizes upto
1024x1024 we end up using the blitter on nv4x, which is not that fast.
And on nv3x we end up using the cpu which is really slow.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 3c6c4d4f29)
Scanout buffers on nv30 must always be non-swizzled and have special
width alignment constraints.
These constrains have been taken from the xf86-video-nouveau
src/nv_accel_common.c: nouveau_allocate_surface() function.
nouveau_allocate_surface() applies these width constraints only when a
tiled attribute is set, which it sets for all surfaces allocated via
dri, and this "tiling" is not the same as swizzling, scanout surfaces
must be linear / have a uniform_pitch or only complete garbage is shown.
This commit fixes dri3 on nv30 showing a garbled display, with dri3 the
scanout buffers are allocated by mesa, rather then by the ddx, and the
wrong stride of these buffers was causing the garbled display.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 3329703eb1)
Previously we would allow glUniformMatrix4fv on a dmat4 and
glUniformMatrix4dv on a mat4. Both are illegal. That later also
overwrites the storage for the mat4 and causes bad things to happen.
Should fix the (new) arb_gpu_shader_fp64-wrong-type-setter piglit test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7237c937af)
TargetOptions::NoFramePointerElim was removed in llvm-3.7.0svn r238244
"Remove NoFramePointerElim and NoFramePointerElimOverride from
TargetOptions and remove ExecutionEngine's dependence on CodeGen. NFC."
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
(cherry picked from commit 147ffd4816)
Nominated-by: Sedat Dilek <sedat.dilek@gmail.com>
Broadwell's stencil blitting code attempts to bind a renderbuffer as a
texture, using dd->BindRenderbufferTexImage().
This calls _mesa_init_teximage_fields(), which then attempts to set
img->_BaseFormat = _mesa_base_tex_format(ctx, internalFormat), which
assert fails if internalFormat is GL_STENCIL_INDEX8 but
ARB_texture_stencil8 is unsupported.
To work around this, just pretend to support the extension momentarily,
during the blit. Meta has already munged a variety of other things in
the context (including the API!), so it's not that much worse than what
we're already doing.
Fixes regressions since commit f7aad9da20
(mesa/teximage: use correct extension for accept stencil texture.).
v2: Add an XXX comment explaining the situation (requested by Jason
Ekstrand and Martin Peres), and an assert that we don't support
the extension so we remember to remove this hack (requested by
Neil Roberts).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(cherry picked from commit f83b9e58f6)
Nominated-by: Emil Velikov <emil.velikov@collabora.co.uk>
In various versions of OpenGL and GLSL, it's possible to declare
multiple VS input variables with aliasing attribute locations.
So, when computing the storage requirements for vertex attributes,
we can't simply add up the sizes. Instead, we need to look at the
enabled slots.
This patch begins tracking which attributes are double types that
are larger than 128-bits (i.e. take up two vec4 slots). We then
count normal attributes once, and count the double-size attributes
a second time.
Fixes deQP functional.attribute_location.bind_aliasing.max_cond_* tests
on i965, which regressed with commit ad208d975a.
No Piglit changes on llvmpipe (which actually supports dvecs).
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit c3294ca5a1)
Broadwell's stencil blitting code attempts to bind a renderbuffer as a
texture, using dd->BindRenderbufferTexImage().
This calls _mesa_init_teximage_fields(), which then attempts to set
img->_BaseFormat = _mesa_base_tex_format(ctx, internalFormat), which
assert fails if internalFormat is GL_STENCIL_INDEX8 but
ARB_texture_stencil8 is unsupported.
To work around this, just pretend to support the extension momentarily,
during the blit. Meta has already munged a variety of other things in
the context (including the API!), so it's not that much worse than what
we're already doing.
Fixes regressions since commit f7aad9da20
(mesa/teximage: use correct extension for accept stencil texture.).
v2: Add an XXX comment explaining the situation (requested by Jason
Ekstrand and Martin Peres), and an assert that we don't support
the extension so we remember to remove this hack (requested by
Neil Roberts).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(cherry picked from commit f83b9e58f6)
Nominated-by: Mark Janes <mark.a.janes@intel.com>
Mesa supports EXT_texture_rg and OES_texture_float. This patch adds
support for using unsized enums GL_RED and GL_RG for floating point
targets and writes proper checks for internalformat when format is
GL_RED or GL_RG and type is of GL_FLOAT or GL_HALF_FLOAT.
Later, internalformat will get adjusted by adjust_for_oes_float_texture
after these checks.
v2: simplify to check vs supported enums
v3: follow the style and break out if internalFormat ok (Kenneth)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90748
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 5b0d6f5c1b)
Nominated-by: Mark Janes <mark.a.janes@intel.com>
This reverts commit f3b709c0ac.
The "dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_4.
interpolation.lines_wide" test appears to be broken on Cherryview when
we expose line widths greater than 12.0. I'm not sure why.
For now, just go back to the limits we used on older platforms.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90902
Acked-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 16658f426d)
Nominated-by: Mark Janes <mark.a.janes@intel.com>
commit 472ef9a02f introduced code to
change the types of SEL and MOV instructions for moves that simply
"copy bits around". It didn't account for type conversion moves,
however. So it would happily turn this:
mov(8) vgrf6:D, -vgrf5:D
mov(8) vgrf7:F, vgrf6:UD
into this:
mov(8) vgrf6:D, -vgrf5:D
mov(8) vgrf7:D, -vgrf5:D
which erroneously drops the conversion to float.
Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 2ace64fd59)
The lowered code reads from the destination, which isn't possible from
message registers.
Fixes the following dEQP tests on SNB:
dEQP-GLES3.functional.shaders.precision.int.highp_mul_fragment
dEQP-GLES3.functional.shaders.precision.int.mediump_mul_fragment
dEQP-GLES3.functional.shaders.precision.int.lowp_mul_fragment
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(cherry picked from commit 9390cb8459)
I've been chasing a geom shader hang on rv635 since I wrote
r600 geom code, and finally I hacked some values from fglrx
in and I could run texelfetch without failures.
This is totally my fault as well, maths fail 101.
This makes geom shaders on r600 not fail heavily.
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 0de53ccc8c)
As Glenn did for finalize_loop we need to update_cf when we
add a POP at the end of a shader.
I think this fixes one of the earlier shader going off end
of memory problems we've stopped.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 3063913f77)
This adds index queries (glGet*i_v) for GL_TEXTURE_BINDING_* and
GL_SAMPLER_BINDING, as well as textue queries
(glGetTex{,ture}Parameter*) for GL_TEXTURE_TARGET.
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
(cherry picked from commit 5aaaaebf22)
Conflicts:
src/mesa/main/texparam.c
Cube maps are special in that they have separate teximages for each
face. We handled that by copying the data to them separately, but in
case zoffset != 0 or depth != 6 we would read off the end of the client
array or modify the wrong images.
zoffset/depth have already been verified by the time the code gets to
this stage, so no need to double-check.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2259b11100)
The split_virtual_grfs code doesn't properly rewrite reladdr so we need to
make sure that any uniform indirects are lowered away first.
This fixes the glsl-fs-uniform-indexed-by-swizzled-vec4.shader_test in piglit
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit fee0c5af11)
This works if drivers upsample on upload (like all radeon ones do).
The alternative is an unexpected GL error from anything calling
_mesa_update_state and possibly other issues.
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f432ae899f)
On the older platforms where we don't have logical contexts preserving
state across batches, we emit the invariant state setup on every batch
using the brw_invariant_state atom. This includes the pipeline selection
which is cached with the introduction of
commit 0e0e23ef53
Author: Jordan Justen <jordan.l.justen@intel.com>
Date: Wed Apr 22 11:43:50 2015 -0700
i965/state: Emit pipeline select when changing pipelines
However, we do not reset the cache between batches on context-less
platforms resulting in us not setting the pipeline selection and can
cause GPU hangs if a media pipelined was loaded in the meantime (e.g.
mixing mplayer/gstreamer using libva and gnome-shell). A simple solution
is to just forcibly re-emit the pipeline select along with the invariant
state and reset the cache at that point.
Reported-and-tested-by: Tomasz C. <tomaszc@o2.pl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91254
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4e5752e2b7)
Instead of using symbol table, build mask by inspecting IR. This
change is required by further patches to move resource list creation
to happen later when symbol table does not exist anymore.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
(cherry picked from commit ccaf37f449)
The CTS packed_pixels test checks that readpixels doesn't write
into the space between rows, however we fail that here unless
we check the format and stride match.
This fixes all the core mesa problems with CTS packed_pixels
tests.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 32769ac016)
The fastpath currently checks the RowLength != width, but
if you have a RowLength of 7, and Alignment of 4, then
that shouldn't match.
align the rowlength to the pack alignment before comparing.
This fixes compressed cases in CTS packed_pixels_pixelstore
test when SKIP_PIXELS is enabled, which causes row length
to get set.
v1.1: add fxt1 fix (Iago)
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b4a70401f5)
We don't need to use the 3d image address here as that will
include SKIP_IMAGES, and we are only blitting a single
2D anyways, so just use the 2D path.
This fixes some memory overruns under CTS
packed_pixels.packed_pixels_pixelstore when PACK_SKIP_IMAGES
is used.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6a3e1fb958)
Fixes regression from
commit 8c17d53823
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Wed Apr 15 03:04:33 2015 -0700
i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions.
which adjusted the coordinates to be relative to the nearest cacheline.
However, this then offsets the coordinates by up to 63 and this may then
cause them to overflow the BLT limits. For the well aligned large
transfer case, we can use 32bpp pixels and so reduce the coordinates by
4 (versus the current 8bpp pixels). We also have to be more careful
doing the last line just in case it may exceed the coordinate limit.
Reported-and-tested-by: kaillasse91@hotmail.fr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90734
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit d38a560106)
[Emil Velikov: drop the extra INTEL_MIPTREE_TRMODE_NONE arguments]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/mesa/drivers/dri/i965/intel_blit.c
Shaders that contain instruction data after an instruction with EOP could end
up parsing that as an instruction, leading to various crashes and asserts in
SB as it gets very confused if it sees for instance a loop start instruction
jumping off to some random point.
Add a couple of asserts, and print EOP bit if set in old asm printer.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a830225adb)
GetTexImage can read to stencil8 but only from
a stencil or depthstencil textures.
This fixes a bunch of failures in CTS
GL33-CTS.gtf32.GL3Tests.packed_pixels
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit c1452983b4)
[Emil Velikov: use glGetTex%sImage + suffix, instead of caller]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
When the edge flag element is enabled then the elements are slightly
reordered so that the edge flag is always the last one. This was
confusing the code to upload the 3DSTATE_VF_INSTANCING state because
that is uploaded with a separate loop which has an instruction for
each element. The indices used in these instructions weren't taking
into account the reordering so the state would be incorrect.
v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope
when gl_VertexID is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit 3a1ab23480)
The edge flag data on Gen6+ is passed through the fixed function hardware as
an extra attribute. According to the PRM it must be the last valid
VERTEX_ELEMENT structure. However if the vertex ID is also used then another
extra element is added to source the VID. This made it so the vertex ID is in
the wrong register in the vertex shader and the edge attribute is no longer in
the last element.
v2: Also implement for BDW+
v3 [by Ben]: Remove 10.5 tag. Too late.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit fb02b4ec48)
Nine code uses some C11 features, and this
leads to compile error on gcc <= 4.5
Another way would have been to use the
-fms-extensions CFLAG
Signed-off-by: David Heidelberg <david@ixit.cz>
Cc: "10.4 10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 56717c0b06)
A nomination unadorned with a specific version is now interpreted as
being aimed at the 11,0 branch, which was recently opened.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This is done by returning an rvalue of type void in the
ast_function_expression::hir function instead of a void expression.
This produces (in the case of the ternary) an hir with a call
to the void returning function and an assignment of a void variable
which will be optimized out (the assignment) during the optimization
pass.
This fix results in having a valid subexpression in the many
different cases where the subexpressions are functions whose
return values are void.
Thus preventing to dereference NULL in the following cases:
* binary operator
* unary operators
* ternary operator
* comparison operators (except equal and nequal operator)
Equal and nequal had to be handled as a special case because
instead of segfaulting on a forbidden syntax it was now accepting
expressions with a void return value on either (or both) side of
the expression.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85252
Signed-off-by: Renaud Gaubert <renaud@lse.epita.fr>
Reviewed-by: Gabriel Laskar <gabriel@lse.epita.fr>
Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
(cherry picked from commit 7b9ebf879b)
Nominated-by: Mark Janes <mark.a.janes@intel.com>
When gl_VertexID or gl_InstanceID is used a 3DSTATE_VF_SGVS
instruction is sent to create a sort of element to store the generated
values. The last instruction in this chunk of code looks like it was
trying to set the instancing state for the element using the
3DSTATE_VF_INSTANCING instruction. However it was sending
brw->vb.nr_buffers instead of the element index. This instruction is
supposed to take an element index and that is how it is used further
down in the function so the previous code looks wrong. Perhaps
previously the number of buffers coincidentally matched the number of
enabled elements so the value was generally correct anyway. In a
subsequent patch I want to change a bit how it chooses the SGVS
element index so this needs to be fixed.
v2 [by Ben]
Remove stable 10.5 stable tag (it's too late now)
Commit update as follows:
The number of vertex buffers emitted is always <= the number of vertex elements.
To maximize reuse (actually, to minimize relocations - according to the code
comments), a vertex buffer is only emitted once, even when we setup multiple
components (3DSTATE_VERTEX_ELEMENT) from that buffer. This meant that the
previous code would use the wrong indexed element for these reuse cases. This
patch by itself prevents hangs on BSW in the linked bug. It doesn't make the
test pass, the remaining patches are needed for that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91610
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c03247bae0)
In the DRI2 path this event is magically synthesized from the
corresponding DRI2 event, but with Present, the server sends us the
event itself. The DRI2 path fills in the serial number, send_event, and
display fields of the XEvent struct that the app sees, but the Present
path did not.
This is likely related to a class of crashes seen in gtk/clutter apps:
https://bugzilla.redhat.com/attachment.cgi?id=1032631
Note that the crashing instruction is looking up the lock_fns slot in
the Display *, and %rdi (holding the Display *) is 0x1.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 8f7ebcb6fa)
This fixes arb_get_texture_sub_image-get, and any situation where the 2d
engine was being used for multi-layer blits to a non-0 level.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2514c78fba)
When calling either eglCreateWindowSurface or eglCreatePixmapSurface it
was possible for an application to be aborted as a result of it failing
to create a DRI2 drawable on the server. This could happen due to an
application passing in an invalid native drawable handle, for example.
v2: Handle the case where an error has been set on the connection
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 9a4eae61c2)
The value was copied from r300g, which uses 1/12 subpixels, but this hw
uses 1/16 subpixels.
Should fix piglit: gl-1.4-polygon-offset (formerly a glean test)
(untested, ported from radeonsi)
Reviewed-by: Edward O'Callaghan <eocallaghan at alterapraxis.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit d335aad11b)
The value was copied from r300g, which uses 1/12 subpixels, but this hw
uses 1/16 subpixels.
Fixes piglit: gl-1.4-polygon-offset (formerly a glean test)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit bfac8ba9d3)
Because we build here an array format, we don't need to swap the
bytes for big endian.
If it isn't an array format, the bytes will be swapped in
_mesa_format_convert.
v2: remove temp variable
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5f1d5b1c78)
Before, if we encountered an array format of 0 on a BE system, we would
flip all the channels even though it's an invalid format. This would
result in a mostly invalid format with a swizzle of yyyy or wwww. Instead,
we should just return 0 if the array format stashed in the format info is
invalid.
Cc: "10.6 10.5" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit e3eb91af80)
The swizzle defines where in the format you should look for any given
channel. When we flip the format around for BE targets, we need to change
the destinations of the swizzles, not the sources. For example, say the
format is an RGBX format with a swizzle of xyz1 on LE. Then it should be
wzy1 on BE; however, the code as it was before, would have made it 1zyx on
BE which is clearly wrong.
Reviewed-by: Iago Toral <itoral@igalia.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "10.6 10.5" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 28d1a506c8)
This patch fixes a bug in big-endian treatment, where the previous
swizzle info wasn't cleared before a new swizzle info was inserted into
the format field using a bitwise-OR operation.
v2: use MESA_ARRAY_FORMAT_SWIZZLE_*_MASK instead of numeric constants
v3: align according to coding style
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(cherry picked from commit 2ac171a7db)
The meta CopyImageSubData path uses BlitFramebuffers to do the actual copy.
The only thing that can affect BlitFramebuffers other than the currently
bound framebuffers is the scissor so we need to save that off and reset it.
If we don't do this, applications that use a scissor together with
CopyImageSubData will get accidentally scissored copies.
Tested-by: Markus Wick <markus at selfnet.de>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 736c6f3cfc)
Otherwise it would crash on Gen8 with scalar VS. The issue can easily
be reproduced with the following patch, but I don't see any reason why
it wouldn't be possible to end up with an ATTR argument here even
without it.
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 42a18ca760)
Page 161 of the OpenGL-ES 3.1 (PDF) spec, and page 207 of the OpenGL 4.5 (PDF),
both on section '8.6. ALTERNATE TEXTURE IMAGE SPECIFICATION COMMANDS', states:
"An INVALID_ENUM error is generated if an invalid value is specified for
internalformat".
It is currently returning INVALID_OPERATION error because
_mesa_get_read_renderbuffer_for_format() is called before the internalformat
argument has been validated. To fix this, we move this call down the validation
process, after _mesa_base_tex_format() has been called. _mesa_base_tex_format()
effectively serves as a validator for the internal format.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.texture.copyteximage2d_invalid_format
Fixes 1 piglit test:
* spec@oes_compressed_etc1_rgb8_texture@basic
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4b07e9a033)
Currently, glTexSubImageXD attempt to resolve the texture object
(by calling _mesa_get_current_tex_object()) before validating the given
target. However, that method explicitly states that target must have been
validated before calling it, so it never returns a user error.
The target validation occurs later when texsubimage_error_check() is called.
This patch reorganizes target validation, taking it out from the error check
function and into a point before the texture object is resolved.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5d64cae842)
[Emil Velikov: s/_mesa_enum_to_string/_mesa_lookup_enum_by_nr/]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/mesa/main/teximage.c
Page 68, section 7.2 'Shader Binaries" of the of the OpenGL ES 3.1,
and page 88 of the OpenGL 4.5 specs state:
"An INVALID_VALUE error is generated if count or length is negative.
An INVALID_ENUM error is generated if binaryformat is not a supported
format returned in SHADER_BINARY_FORMATS."
Currently, an INVALID_OPERATION error is returned for all cases.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.shader.shader_binary
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b38a50f1e3)
Calling eglQuerySurface on a window or pixmap with the EGL_LARGEST_PBUFFER
attribute resulted in the contents of the 'value' parameter being modified.
This is the wrong behaviour according to the EGL spec, which states:
"Querying EGL_LARGEST_PBUFFER for a pbuffer surface returns the
same attribute value specified when the surface was created with
eglCreatePbufferSurface. For a window or pixmap surface, the
contents of value are not modified."
Avoid this from happening by checking that the surface type is EGL_PBUFFER_BIT
before modifying the contents of the parameter.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit b2c5986ea1)
Update the DRI image interface error codes to reflect the needs of the
EGL_EXT_image_dma_buf_import extension. This means updating the existing error
code documentation and adding a new __DRI_IMAGE_ERROR_BAD_ACCESS error code
so that drivers can correctly reject unsupported pitches and offsets. Hook
the new error code up in EGL to return EGL_BAD_ACCESS.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit cfc3200a35)
For 10.6: This fixes graphical corruption occuring on most Southern Islands
Radeon GPUs. This will allow closing a lot of bugs in the bugzilla.
The patch has a better explanation. Just a summary here:
- The CPU always uploads a whole descriptor array to previously-unused memory.
- CP DMA isn't used.
- No caches need to be flushed.
- All descriptors are always up-to-date in memory even after a hang, because
CP DMA doesn't serve as a middle man to update them.
This should bring:
- better hang recovery (descriptors are always up-to-date)
- better GPU performance (no KCACHE and TC flushes)
- worse CPU performance for partial updates (only whole arrays are uploaded)
- less used IB space (no CP_DMA and WRITE_DATA packets)
- simpler code
- corruption issues are fixed on SI cards
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit b0528118df)
For 10.6: This is a prerequisite for the next fix. The below comment is from
the original commit.
This is mainly needed for tessellation where a VS can be bound as VS, ES,
or LS, and TES (tess. evaluationshader) can be bound as VS or ES or neither.
Therefore we need the ability to move pointers to descriptors between
shaders arbitrarily.
The idea is that the context has a mapping from PIPE_SHADER_x to
SPI_SHADER_USER_DATA_x. After a shader is enabled or disabled,
si_shader_change_notify should be called to update this mapping accordingly.
There is a dirty flag for each shader pointer, but only one emit function
for all pointers in the whole context, whose code and logic is separated
from descriptors.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 3ce91c727f)
Before validating vertex arrays we need to check if a VBO is present.
Checking if vb->buffer is not NULL fixes the issue.
Fixes the following piglit test:
gl-3.1-vao-broken-attrib
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit adc816a1e4)
Earlier commit added an extra dup(fd) to fix a ZaphodHeads issue.
Although it did not consider the (very unlikely) case where we might end
up with the valid fd == 0.
Fixes: 28dda47ae4d(winsys/radeon: Use dup fd as key in drm-winsys hash
table to fix ZaphodHeads.)
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
(cherry picked from commit 1307be519b)
This patch adjusts the SKL values to the best known values we have.
v2: Remove HS/DS/CS fields. Adding this makes most sense to add to the
GEN9_FEATURES macro, however, doing that would require updating BXT values, and
Jordan requested I not do that. Conveniently, this request makes a lot of sense
wrt to stable backport as HS, and DS do not even exist there.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit 7eaacc1678)
[Emil Velikov: .supports_simd16_3src is missing in 10.6]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_device_info.c
A simple shader such as
vec4 color;
color.xy.x = 1.0;
would cause ir_assignment::set_lhs() to generate bogus IR:
(swiz xy (swiz x (constant float (1.0))))
We were setting the number of components of each new RHS swizzle based
on the highest channel used in the LHS swizzle. So, .xy.y would
generate (swiz xy (swiz xx ...)), while .xy.x would break.
Our existing Piglit test happened to use .xzy.z, which worked, since
'z' is the third component, resulting in an xxx swizzle.
This patch sets the number of swizzle components based on the size of
the LHS swizzle's inner value, so we always have the correct number
at each step.
Fixes new Piglit tests glsl-vs-swizzle-swizzle-lhs-[23].
Fixes ir_validate assertions in in Metro 2033 Redux.
v2: Move num_components updating completely out of update_rhs_swizzle
(suggested by Timothy Arceri). Simplify.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
(cherry picked from commit e235ca159f)
After recent addition of pbo testing in piglit test getteximage-luminance,
it fails on i965. This patch makes a sub test pass.
This patch adds a clear color operation to meta pbo path, which I think is
better than falling back to software path.
V2: Fix color mask for GL_LUMINANCE_ALPHA
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit aa40546b2d)
_mesa_meta_pbo_GetTexSubImage() uses _mesa_meta_BlitFrameBuffer(),
which will do fragment clamping if enabled. But fragment clamping
doesn't affect ReadPixels and GetTexImage.
Without this patch, piglit test arb_color_buffer_float-clear fails,
when forced to use the meta pbo path.
v2: Apply this fix to both glReadPixels and glGetTexImage.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit ca4e17e03e)
Meta pbo path for ReadPixels rely on BlitFramebuffer which doesn't support
signed to unsigned integer conversions and vice versa.
Without this patch, piglit test fbo_integer_readpixels_sint_uint fails, when
forced to use the meta pbo path.
v2: Make need_signed_unsigned_int_conversion() a static function. (Iago)
Bump up the comment and the commit message. (Jason)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral <itoral@igalia.com>
(cherry picked from commit 0d207905e6)
Currently used ctx->_ImageTransferState check is not sufficient
because it doesn't include the read color clamping enabled with
GL_CLAMP_READ_COLOR. So, use the helper function
_mesa_get_readpixels_transfer_ops().
Also, transfer operations don't affect glGetTexImage(). So, do
the check only for glReadPixles.
Without this patch, arb_color_buffer_float-readpixels test fails, when
forced to use meta pbo path.
V2: Add a comment and bump up the commit message.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 1252d53c19)
None of the draw states are used here.
This fixes a crash in piglit: ext_framebuffer_blit/blit-early
Calling st_manager_validate_framebuffers is the minimum requirement here.
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit d082c53249)
These conditionals are used to guard both dri modules and loader(s).
Currently if we try to build the gallium swrast dri module (without glx)
on a system that's missing libdrm the build will fail.
v2: Make sure we assign prior to checking the have_libdrm variable.
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 16f6d432de)
Conflicts:
configure.ac
It appears that the G80 did not have support for the sampler view
first/last clamping. Put the view's last level in the place of the
texture's so that it doesn't go past what the sampler view allows.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 801d41fa43)
There's no need to deal with samplers for texture size queries. That
code also was accidentally setting an invalid sIndirectSrc position, but
it can now just be removed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 346ce0b988)
A fragment program from "Pixel Piracy" contains redundant OPTION
directives:
!!ARBfp1.0
OPTION ARB_precision_hint_fastest;
OPTION ARB_fog_exp2;
OPTION ARB_precision_hint_fastest;
OPTION ARB_fog_exp2;
...
We already allow redundant ARB_precision_hint_fastest directives, but
disallow the redundant (yet consistent) ARB_fog_exp2 directives, failing
to compile the program.
The specification seems to contradict itself - the main text says that
only one fog application option may be specified, but then backpedals,
indicating the intent is to disallow /contradictory/ flags. One of the
issues suggests that specifying contradictory ones is stupid, but
allowed, and only the last one should take effect.
Accepting multiple redundant (but consistent) directives seems harmless,
and like a reasonable interpretation of the specification. It also
fixes a fragment program found in the wild.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 4b17f0d9f5)
Instead of relying on hardware defaults the i915 kernel driver is
going program custom MOCS tables system-wide on Gen9 hardware. The
"WT" entry previously used for renderbuffers had a number of problems:
It disabled caching on eLLC, it used a reserved L3 cacheability
setting, and it used to override the PTE controls making renderbuffers
always WT on LLC regardless of the kernel's setting. Instead use an
entry from the new MOCS tables with parameters: TC=LLC/eLLC, LeCC=PTE,
L3CC=WB.
The "WB" entry previously used for anything other than renderbuffers
has moved to a different index in the new MOCS tables but it should
have the same caching semantics as the old entry.
Even though the corresponding kernel change ("drm/i915: Added
Programming of the MOCS") is in a way an ABI break it doesn't seem
necessary to check that the kernel is recent enough because the change
should only affect Gen9 which is still unreleased hardware.
v2: Update MOCS values for the new Android-incompatible tables
introduced in v7 of the kernel patch.
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-July/071080.html
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
(cherry picked from commit af768922ca)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_defines.h
Don't assume that $(top_srcdir)/.git is a directory. It may be a
gitlink file [1] if $(top_srcdir) is a submodule checkout or a linked
worktree [2].
[1] A "gitlink" is a text file that specifies the real location of
the gitdir.
[2] Linked worktrees are a new feature in Git 2.5.
Cc: "10.6, 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 75784243df)
XA was never unref'ing last_fence in the various call paths to
pipe->flush(). Add this to xa_context_flush() and update the other
open-coded calls to pipe->flush() to use xa_context_flush() instead.
This fixes a memory leak reported with xf86-video-freedreno.
Reported-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
(cherry picked from commit 0a8af6361e)
We need to check what the 3D pipe is able to handle for the mixer, not what
the decoder is able to decode. This fixes output of resolutions like 720x1280.
Signed-off-by: Christian König <christian.koenig@amd.com>
CC: mesa-stable@lists.freedesktop.org
(cherry picked from commit 2cfa64e159)
Since commit 104c8fc2c2 the GLSL IR will be freed if NIR is
being used. This was causing it to segfault if INTEL_DEBUG=wm is set.
This patch just makes it avoid dumping the GLSL IR in that case.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit c0ca6c30ea)
This reverts commit c2ff3485b3.
Ilia and I noticed a memory leak caused by this patch: at least with
fixed-function programs, we clone things using ProgramResourceList as
the context before reralloc makes it non-NULL.
I believe Tapani found other bugs with these patches, so I'm just going
to revert them for now and let him pursue them further.
(cherry picked from commit 6218c68bec)
When there are no color buffer render targets, gen6 and gen7 still
use the first BLEND_STATE element to determine alpha test.
gen6_upload_blend_state was allocating zero elements when
ctx->Color.AlphaEnabled was false.
That left _3DSTATE_CC_STATE_POINTERS or _3DSTATE_BLEND_STATE_POINTERS
pointing to random data from some previous brw_state_batch().
That sometimes suppressed depth rendering when those bits
happened to mean COMPAREFUNC_NEVER.
This produced flickering shadows for dota2 reborn.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80500
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit fe2b748a39)
Nominated-by: Kenneth Graunke <kenneth@whitecape.org>
The shader-cache isn't finished, so the configure checks are a bit
premature and will only stand to confuse users of Mesa 10.6.
This is a squash of the follow four reverts:
Revert "Rename sha1.c and sha1.h to mesa-sha1.c and mesa-sha1.h"
Revert "configure: Add machinery for --enable-shader-cache (and --disable-shader-cache)"
Revert "sha1: Fix gcry_md_hd_t typo."
Revert "mesa: Add mesa SHA-1 functions"
Reviewed-by: Carl Worth <cworth@cworth.org>
Since there was an ABI break and linking twice against libudev.so.0 and
libudev.so.1 causes the application to quickly crash, we first check if
the application is currently linked against libudev before dlopening a
local handle. However for backwards/forwards compatability, we need to
inspect the application for current linkage against all known versions
first. Not doing so causes a crash when both libraries are present and
so mesa chooses libudev.so.1 but the application was linked against
libudev.so.0.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Emil Velikov:
I'm ever so slightly conserned that RTLD_NOLOAD is not part of the POSIX
standard, thus it's missing on some platforms (*BSD seems ok, while
Solaris, MacOS are not).
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit f241345793)
Matrix vertex attributes have their columns padded out to vec4s, which
I was failing to account for. Scalar NIR expects them to be packed,
however.
Fixes 1256 dEQP tests on Broadwell.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
(cherry picked from commit 73d0e7f345)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_fs_nir.cpp
In this bit of code point_five can be NULL if the expression is not a
constant. This fixes it to match the pattern of the rest of the chunk
of code so that it checks for NULLs.
Cc: Matt Turner <mattst88@gmail.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 86a3557d7c)
There is a piece of code that is trying to match expressions of the
form (mul (floor (add (abs x) 0.5) (sign x))). However the check for
the add expression wasn't checking whether it had the expected
operation. It looks like this was just an oversight because it doesn't
match the pattern for the rest of the code snippet. The existing line
to check whether add_expr!=NULL was added as part of a coverity fix in
3384179f.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91226
Cc: Matt Turner <mattst88@gmail.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 18039078e0)
Ben noticed that I said each PIPE_CONTROL was 4 DWords, but it's
actually 5 DWords on Gen6-7. We've been reserving insufficient space
for performance monitoring on Sandybridge, which means it would likely
break if you used that functionality. (Thankfully, no one does...)
Also, the existing number of 146 was the result of me flubbing up the
arithmetic: it should have actually been 140.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
(cherry picked from commit d9ab95b365)
On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if
the shader sends a message to the pixel interpolator. This fixes the
interpolateAt* tests on SKL, apart from interpolateatsample-nonconst
but that is not implemented anywhere so it's not a regression.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.6 10.5" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 493af150fb)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/mesa/drivers/dri/i965/brw_fs_nir.cpp
src/mesa/drivers/dri/i965/gen8_ps_state.c
The first argument to UCMP needs to be compared against 0, but the
latter arguments are treated as float and need to be able to properly
apply neg/abs arguments. Adjust the inferSrcType function accordingly.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f70719cc4b)
Same problem and fix as for nouveau's ZaphodHeads trouble.
See patch ...
"nouveau: Use dup fd as key in drm-winsys hash table to fix ZaphodHeads."
... for reference.
Cc: "10.3 10.4 10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 28dda47ae4)
In the immediate form, src2 == dst, so it does not need to be emitted.
Otherwise it overlaps with the immediate value's low bits.
Fixes: 09ee907266 (nv50/ir: Fold IMM into MAD)
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit c3215ef204)
The current implementation only moves the joinAt when splitting after
the given instruction, not before it. So if you have a BB with
foo
instr
bar
joinat
and thus with joinAt set, we end up first splitting before instr, at
which point the instr's bb is updated to the new bb. Since that bb
doesn't have a joinAt set (despite containing one), when splitting after
the instr, there is nothing to copy over. Since the joinat will be in
the "split" bb irrespective of whether we're splitting before or after
the instruction, move it over in either case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91124
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5dcb28c3d2)
Desktop GLSL < 130 and GLSL ES < 300 allow sampler array indexing where
index can contain a loop induction variable. This extra check will warn
during linking if some of the indexes could not be turned in to constant
expressions.
v2: warning instead of error for backends that did not enable
EmitNoIndirectSampler option (have dynamic indexing)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9350ea6979)
Patch provides new compiler option for backend to force unroll loops
that have non-constant expression indexing on sampler arrays.
This makes sure that we can never end up with a shader that uses loop
induction variable as sampler array index but does not unroll because
of having too much instructions. This would not work without dynamic
indexing support.
v2: change option name as EmitNoIndirectSampler
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit e4512e1581)
Dynamic indexing of sampler arrays is prohibited by GLSL ES 3.00.
Earlier versions allow 'constant-index-expression' indexing, where
index can contain a loop induction variable.
Patch allows dynamic indexing for sampler arrays when GLSL ES < 3.00.
This change makes 'sampler-array-index.frag' parser test in Piglit
pass + fishgl.com works when running Chrome on OpenGL ES 2.0 backend
v2: small change and some more commit message (Tapani)
v3: refactor checks to make it more readable (Ian Romanick)
v4: change warning comment in GLSL ES case (Curro)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84225
(cherry picked from commit edb8383c98)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/glsl/ast_array_index.cpp
The dup'ed fd owned by the nouveau_screen for a device node
must also be used as key for the winsys hash table, instead
of using the original fd passed in for a screen, to make
multi-x-screen ZaphodHeads configurations work on nouveau.
The original fd's lifetime differs from that of the nouveau_screen stored
in the hash. The hash key is the fd, and in order to compare hash entries
we fstat them, so the fd must be around for as long as the screen is.
This is an extension of the fix in commit a59f2bb1 (nouveau: dup fd
before passing it to device).
Cc: "10.3 10.4 10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit a98600b0eb)
The meta code was setting a default depth range for all viewports
and 'restoring' all viewports to depth range values saved from viewport 0.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 2a210b797e)
This is based on Kenneth's patch to delete 'most of the IR'. Due to
linker changes to clone variables, we can now free all of IR.
Saves 58MB of memory when replaying a Dota 2 trace on Broadwell.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 104c8fc2c2)
This increases memory pressure during linking but makes it easier
for backend to free IR after it is not needed anymore.
v2: use resource list as ralloc context in case of relink (Kenneth)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit c2ff3485b3)
Without first running the bo through pushbuf_refn, the nouveau drm
library will have uninitialized structures regarding this bo, and will
insert incorrect data.
This fixes supertuxkart 0.9 crash on start (where it ends up doing a lot
of indirect draws).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 78d58e6425)
Since we clear the TFB bufctx binding point above, we need to put all of
the active tfb's back in, even if they haven't changed since last time.
Otherwise the tfb may get moved into sysmem and the underlying mapping
will generate write errors.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9fcbf515b4)
The whole of GBM does not rely on even a single symbol from the GL
dispatch library, unsuprisingly. The only need for it comes from the
unresolved symbols in the DRI modules, which are now correctly handled
with Frank's commit.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit a0dc6b7824)
Dri driver libs are not linked to pull in libglapi so gbm_create_device()
fails when it tries to dlopen them (unless the application is linked
with something that does pull in libglapi, like libGL).
Until dri drivers can be fixed properly, dlopen libglapi before trying
to dlopen them.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Frank Henigman <fjhenigman@google.com>
[Emil Velikov: Drop misleading bugzilla link, mention that libname differs]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 828f13330c)
This implements a workaround (exact excerpt as a comment in the code). The docs
specify [clearly, after you struggle for a while] that the offset isn't relative
to state base. This actually makes sense. This fixes hangs on SKL.
Buffer #0 is meant to be used for normal uniforms.
Buffer #1 is typically used for gather constants when using RS.
Buffer #1-#3 could be used to push a bunch of UBO data which would just be
somewhere in memory, and not relative to the dynamic state.
NOTE: I've moved away from the ternary operator for the new gen9 conditions.
Admittedly it's probably not great to do this, but I really want to fix this all
up in the subsequent patch and doing it here makes that diff a lot nicer. I want
to split out the gen8/9 code to make the function a bit more readable, but to
keep this easily cherry-pickable I am doing this fix first. If we decide not to
merge the cleanup patch then I can revisit this.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Valtteri Rantala <Valtteri.rantala@intel.com>
(cherry picked from commit 90754d2df0)
This was apparently missed when ARB_sso support was added.
Add label support to pipeline objects just like all the other
debug-related objects.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 770f141866)
Triggers an INVALID_OPCODE warning on GK208. Seems rare enough to not
warrant verification on other chips. Fixes the new piglits:
ubo_array_indexing/fs-nonuniform-control-flow.shader_test
ubo_array_indexing/vs-nonuniform-control-flow.shader_test
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 36e3eb6a95)
The state tracker will pass through requests from buggy applications
which will have the buffer size larger than the max allowed (64k). Clamp
the size to 64k so that we don't get errors when uploading the constbuf
data.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8b24388647)
One of the places we have to insert texbars is in situations where the
result of the tex gets overwritten by a different instruction (e.g. in a
conditional statement). However in some situations it can actually
appear as though the original tex itself is an overwriting instruction.
This can naturally never really happen, so just ignore the tex
instruction when it comes up.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90347
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a2af42c1d2)
Rather than forcing everyone to provide their own definition of the symbol
provide a common (dummy) one.
This helps us resolve the build of the standalone pipe-drivers (amongst
others), which are missing the symbol.
Cc: Rob Clark <robclark@freedesktop.org>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 1df5a6c71e)
From ARB_program_interface_query:
"Note that if an interface enumerates a single active resource list
entry for an array variable (e.g., "a[0]"), a <name> identifying
any array element other than the first (e.g., "a[1]") is not
considered to match."
It doesn't apply to arrays of interface blocks but just to array
variables.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 4ee69a97bb)
we don't check the validity of pscreen until dri_init_screen_helper
hit this trying to init glamor on a device with no driver (udl).
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 563706c146)
Some hardware reads only the low 16-bits even if the type is UD, but
other hardware like Cherryview can't handle this.
Fixes spec@arb_gpu_shader5@execution@sampler_array_indexing@fs-simple on
Cherryview.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90830
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
(cherry picked from commit d46d04529b)
When calculating the binding table index for non-constant sampler
array indexing it needs to add the base binding table index which is a
constant within the generated code. Often this base is zero so we can
avoid a redundant instruction in that case.
It looks like nothing in shader-db is doing non-constant sampler array
indexing so this patch doesn't make any difference but it might be
worth having anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
(cherry picked from commit 7f62fdae16)
Previously when generating the send instruction for a sample
instruction with an indirect sampler it would use the destination
register as a temporary store. This breaks when used in combination
with the opt_sampler_eot optimisation because that forces the
destination to be null. This patch fixes that by avoiding the temp
register altogether.
The reason the temporary register was needed was because it was trying
to ensure the binding table index doesn't overflow a byte by and'ing
it with 0xff. The result is then or'd with samper_index<<8. This patch
instead just and's the whole thing by 0xfff. This will ensure that a
bogus sampler index won't overflow into the rest of the message
descriptor but unlike the previous code it won't ensure that the
binding table index doesn't overflow into the sampler index. It
doesn't seem like that should matter very much though because if the
shader is generating a bogus sampler index then it's going to just get
garbage out either way.
Instead of doing sampler_index<<8|(sampler_index+base_table_index) the
new code avoids one operation by doing
sampler_index*0x101+base_table_index which should be equivalent.
However if we wanted to avoid the multiply for some reason we could do
this by adding an extra or instruction still without needing the
temporary register.
This fixes a number of Piglit tests on Skylake that were using
indirect samplers such as:
spec@arb_gpu_shader5@execution@sampler_array_indexing@fs-simple
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 6c846dc57b)
AFAICT, there is no real way to make sure a send message with EOT is properly
ignored from compact, nor can I see a way to actually encode EOT while
compacting. Before the single send optimization we'd always bail because we hit
the is_immediate && !is_compactable_immediate case. However, with single send,
is_immediate is not true, and so we end up trying to compact the un-compactible.
Without this, any compacting single send instruction will hang because the EOT
isn't there. I am not sure how I didn't hit this when I originally enabled the
optimization. I didn't check if some surrounding code changed.
I know Neil and Matt were both looking into this. I did a quick search and
didn't see any patches out there to handle this. Please ignore if this has
already been sent by someone. (Direct me to it and I will review it).
Reported-by: Neil Roberts <neil@linux.intel.com>
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit b307921c3f)
In commit fe74fee8fa we rounded the line width to the nearest integer to
match the GLES3 spec requirements stated in section 13.4.2.1, but that seems
to break a dEQP test that renders wide lines in some multisampling scenarios.
Ian noted that the Open 4.4 spec has the following similar text:
"The actual width of non-antialiased lines is determined by rounding the
supplied width to the nearest integer, then clamping it to the
implementation-dependent maximum non-antialiased line width."
and suggested that when ES removed antialiased lines, they removed
"non-antialised" from that paragraph but probably should not have.
Going by that note, this patch restricts the quantization implemented in
fe74fee8fa only to regular aliased lines. This seems to keep the
tests fixed with that commit passing while fixing the broken test.
v2:
- Drop one of the clamps (Ken, Marius)
- Add a rule to prevent advertising line widths that when rounded go beyond
the limits allowed by the hardware (Ken)
- Update comments in the code accordingly (Ian)
- Put the code in a utility function (Ian)
Fixes:
dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90749
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f9a18acb56)
When we import a dma-buf fd from another driver the kernel
gives us the right info, and this trashes it.
Convert the kernel bo flags into the domain flags.
This helps getting reverse prime and glamor working.
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit c6877c9e59)
Previously, we just put the message for the EOT send as high in the file as
it would go. This is because the register pre-filling hardware will stop
all over the early registers in the file in preparation for the next thread
while you're still sending the last message. However, if something happens
to spill, then the MRF hack interferes with the EOT send message and, if
things aren't scheduled nicely, will stomp on it.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90520
Reviewed-by: Neil Roberts <neil@linux.intel.com>
(cherry picked from commit 86e5afbfee)
Since the introduction of
commit 536003c11e
Author: Boyan Ding <boyan.j.ding@gmail.com>
Date: Wed Mar 25 19:36:54 2015 +0800
i965: Add XRGB8888 format to intel_screen_make_configs
winsys buffers no longer have an alpha channel. This causes
_mesa_format_matches_format_and_type() to reject previously working BGRA
uploads from using the BLT fast path. Instead of using the generic
routine for matching formats exactly, export the slightly more relaxed
check from intel_miptree_blit() which importantly allows the blitter
routine to apply a small number of format conversions.
References: https://bugs.freedesktop.org/show_bug.cgi?id=90839
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Alexander Monakov <amonakov@gmail.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 922c0c9fd5)
The blitter already has code to accommodate filling in the alpha channel
for BGRX destination formats, so expand this to also allow filling the
alpha channgel in RGBX formats.
More importantly for the next patch is moving the test into its own
function for the purpose of exporting the check to the callers.
v2: Fix alpha expansion as spotted by Alexander with the fix suggested by
Kenneth
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Alexander Monakov <amonakov@gmail.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit c2d0606827)
The BLT pitch is specified in bytes for linear surfaces and in dwords
for tiled surfaces. In both cases the programmable limit is 32,767, so
adjust the check to compensate for the effect of tiling.
v2: Tweak whitespace for functions (Kenneth)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 8da79b8378)
In the ARB_fragment_program specification, the result.depth output
variable is treated as a vec4, where the fragment depth is stored in the
.z component, and the other three components are undefined.
This is different than GLSL, which uses a scalar value (gl_FragDepth).
To make this consistent for driver backends, this patch makes
prog_to_nir use a scalar output variable for FRAG_RESULT_DEPTH,
moving result.depth.z into the first component.
Fixes Glean's fragProg1 "Z-write test" subtest.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90000
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 7b8f20ec55)
This probably got broken when the samplers were converted to be indexed
by shader type.
Seen when looking at bug 89819 though I'm not sure if that really was what
the bug was about...
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 6e5970ffee)
I just botched this when writing the original code.
From the ARB_vertex_program specification:
"The RSQ instruction approximates the reciprocal of the square root of
the absolute value of the scalar operand and replicates it to all four
components of the result vector."
Fixes a Glean vertProg1 subtest:
RSQ test 2 (reciprocal square root of negative value)
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90547
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit 15a12795c6)
Clearing can happen at a time when various state objects are incoherent
and not ready for a draw. Some of the validation functions don't handle
this well, so only flush the framebuffer state. This has the advantage
of also not doing extra work.
This works around some crashes that can happen when clearing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
(cherry picked from commit aba3392541)
The next patch will add a test for compatibility profile dispatch, and
it seems to make more sense to share the lists.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 49ab670f52)
Conflicts:
src/mesa/main/tests/dispatch_sanity.cpp
Currently on the functions that are exclusive to core-profile are
implemented. The remainder continue to live in the XML. Additional
functions can be moved later.
The functions for GL_ARB_draw_indirect and GL_ARB_multi_draw_indirect
are put in the dispatch table inside the VBO module, so they do not need
to be moved over.
The diff of src/mesa/main/api_exec.c before and after this patch is as
expected. All of the functions listed in apiexec.py moved out of a 'if
(_mesa_is_desktop(ctx))' block into a new 'if (ctx->API ==
API_OPENGL_CORE)' block.
v2: Remove stray shebang line in apiexec.py. Suggested by Ilia.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dylan Baker <baker.dylan.c@gmail.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f20899b727)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Conflicts:
src/mapi/glapi/gen/gl_genexec.py
Starting with GEN8, there is documentation that the multisample state command
must be emitted before the 3DSTATE_WM_HZ_OP command any time the multisample
count changes. The 3DSTATE_WM_HZ_OP packet gets emitted as a result of a
intel_hix_exec(), which is called upon a fast clear and/or a resolve. This can
happen before the state atoms are checked, and so the multisample state must be
put directly in the function.
v1:
- In v0, I was always emitting the command, but Ken came up with the condition to
determine whether or not the sample count actually changed.
- Ken's recommendation was to set brw->num_multisamples after emitting
3DSTATE_MULTISAMPLE. This doesn't work. I put my best guess as to why in the XXX
(it was causing 7 regressions on BDW).
v2:
Flag NEW_MULTISAMPLE state. As Ken found, in state upload we check for the
multisample change to determine whether or not to emit certain packets. Since
the hiz code doesn't actually care about the number of multisamples, set the
flag and let the later code take care of it.
Jenkins results:
http://otc-mesa-ci.jf.intel.com/view/dev/job/bwidawsk/136/
Fixes around 200 piglit tests on SKL. I'm somewhat surprised that it seems to
have no impact on BDW as the restriction is needed there as well.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com> (v0)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
(cherry picked from commit e2d84d99f5)
If the multiplication's result is unused, except by a conditional_mod,
the destination will be null. Since the final instruction in the lowered
sequence is a partial-write, we can't put the conditional mod on it and
we have to store the full result to a register and do a MOV with a
conditional mod.
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90580
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0596134410)
Commit 4bdbb588a9 introduced new _glapi_new_nop_table() and
_glapi_set_nop_handler() functions in the glapi dispatcher (which
live in libGL.so). The calls to those functions from context.c
would be undefined (i.e. an ABI break) if the libGL used at runtime
was older.
For the time being, use the old single generic_nop() function for
non-Windows builds to avoid this problem. At some point in the future
it should be safe to remove this work-around. See comments for more
details.
v2: Incorporate feedback from Emil. Use _WIN32 instead of
GLX_DIRECT_RENDERING to control behavior, move comments.
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit be71bbfaa2)
Squashed with commit
glapi: Encapsulate nop table knowledge in new _mesa_new_nop_table function
Encapsulate the knowledge about how to build the nop table in a new
_mesa_new_nop_table function. This makes it easier for dispatch_sanity
to keep working now and in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2b8c51834b)
When using SIMD4x2 on Skylake, the sampler instructions need a message
header to select the correct mode. This was added for most sample
instructions in 0ac4c2727 but the TXF_MCS instruction is emitted
separately and it was missed.
This fixes a bunch of Piglit tests which test texelFetch in a geometry
shader, for example:
spec/arb_texture_multisample/texelfetch/2-gs-sampler2dms
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 5ae6c7bfce)
The problem is that the EDGEFLAG has to be toggled at vertex submission
time. This can be done from either the draw or the regular paths. Avoid
falling back to draw just because there's an edgeflag.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3ec1815285)
Commit 8acaf862df switched things over to use TEXCOORD instead of
GENERIC, but did not update the nv30 swtnl draw paths. This teaches the
draw logic about TEXCOORD.
Among other things, this fixes a crash in demos/arbocclude when using
swtnl. Curiously enough, the point-sprite piglit works without this.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 25be70462d)
These are only used once per draw, so it makes sense to keep them in
GART. Also take this opportunity to modernize the buffer mapping API
usage.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c3d36a2e1a)
Apparently some compilers think we probably wanted to do !(x == y) instead
and issue a warning, so just shut it up... No functional change, obviously.
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6a111e54d7)
This fixes glxgears with NV30_SWTNL=1 forced on. Probably fixes a bunch
of other situations where we fall back to the swtnl path.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 147816375d)
nv30_validate_clip depends on the rasterizer state. Also we should
upload all the new clip planes on change since next time the plane data
won't have changed, but the enables might.
This fixes fixed-clip-enables and vs-clip-vertex-enables shader tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7518fc3c66)
There can be scenarios where the "indirect" arg of a PFETCH becomes
known, and so the code will attempt to propagate it. Use this
opportunity to just fold it into the first argument, and prevent the
load propagation pass from touching PFETCH further.
This fixes gs-input-array-vec4-index-rd.shader_test and
vs-output-array-vec4-index-wr-before-gs.shader_test on nvc0 at least.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit fa7f9f123b)
We forgot to convert to VFETCH in case of indirect access. Fix that.
This avoids crashes on the new gs-input-array-vec4-index-rd and
vs-output-array-vec4-index-wr-before-gs but they still fail.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 217301843a)
When we get something like IN[ADDR[0].x+5], we will now guess that we
should look at IN[5] for the "base" information.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0bab3962f5)
get_immediate will return a const reference, the requested immediate
isn't necessarily in the x slot. Make sure to use the swizzle.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 3e7bc67285)
Passing -module to glibtool causes the resulting library to be called
libSomething.so rather than libSomething.dylib on darwin.
Regardless if libOSMesa is a library or a module, it has been used as
the former for quite some time. Update the build to reflect that and
resolve the naming issue.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
[Emil Velikov: Tweak the commit message.]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 06ff751f97)
Add the file to the API_XML list, otherwise there will be no knowledge
by the build that it should be included in the tarball.
Thus the (scons) build will fail.
Fixes: b297fc27aa9(glapi: add GL_ARB_program_interface_query skeleton)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This just created extra upkeep and the push to move extern
C's into mesa code would mean a large number of extern's
in core Mesa driver interfaces. The Haiku Gallium renderers
are mostly insulated via the C-based Haiku state tracker.
As any future hardware support in Haiku will be gallium
based, lets just drop swrast.
Haiku has a Mesa 7.12 fork for gcc2 that uses swrast.
This commit fixes the last of the Haiku build issues.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:
total instructions in shared programs: 2742062 -> 2681339 (-2.21%)
instructions in affected programs: 1514770 -> 1454047 (-4.01%)
helped: 5813
HURT: 1120
The gained programs are ARB vertext programs that were previously going
through the vec4 backend. Now that we have prog_to_nir, ARB vertex
programs can go through the scalar backend so they show up as "gained" in
the shader-db results.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
A fence can outlive the ctx, so we shouldn't deref the ctx to get at the
screen. We need some updates in libdrm_freedreno API to completely
handle fences properly, but this is at least an improvement.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Gen9 surface state is very similar to the previous generation. The important
changes here are aux mode, and the way clear colors work.
NOTE: There are some things intentionally left out of this decoding.
v2: Redo the string for the aux buffer type to address compressed variants.
v3: Use the shift for compression enable (instead of compression mode) (Topi)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
0x00007da0: 0xc1da740e: SF_CLIP VP: guardband xmin = -27.306667
0x00007da4: 0x41da740e: SF_CLIP VP: guardband xmax = 27.306667
0x00007da4: 0x41da740e: SF_CLIP VP: guardband ymin = -23.405714
0x00007da8: 0xc1bb3ee7: SF_CLIP VP: guardband ymax = 23.405714
0x00007db0: 0x00000000: SF_CLIP VP: Min extents: 0.00x0.00
0x00007db8: 0x00000000: SF_CLIP VP: Max extents: 299.00x349.00
While here, fix the wrong offsets for the guardband (I didn't check if it used
to be valid on GEN4).
v2: Remove leftover GET_BITS which belongs later in the series. (Topi)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's true that not all surfaces apply for every gen, but for the most part this
is what we want. (The unfortunate case is when we use a valid surface, but not
for the specific GEN).
This was automated with a vim macro.
v2: Shortened common forms such as R8G8B8A8->RGBA8. Note that this makes some of
the sample output in subsequent commits slightly incorrect.
v3: Use the name from the table (Ken). This requires declaring the surface
format array as extern, and declaring the struct in the .h file.
v4: Move the struct back and create a helper function to obtain the name (Ken)
Get rid of the now useless helper in the state_dump.c
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v3)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ivybridge and Baytrail can't use mach with 2Q quarter control, so just
do it without the accumulator. Stupid accumulator.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The next commit uses an add(16) with a UW destination with a stride of
2, which needs compression control since it's writing two registers. The
old code would have failed to set compression control correctly.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Gen8+'s MUL instruction doesn't ignore the high 16-bits of one source
like on earlier platforms, so we can constant propagate into it without
worry. Integer multiplies (not into the accumulator, which is done for
imul_high) are lowered in lower_integer_multiplication(), so it's safe
there as well.
On Broadwell, fragment shaders only:
total instructions in shared programs: 4377769 -> 4377451 (-0.01%)
instructions in affected programs: 48064 -> 47746 (-0.66%)
helped: 156
On Broadwell, vertex shaders only:
total instructions in shared programs: 2858885 -> 2856313 (-0.09%)
instructions in affected programs: 26380 -> 23808 (-9.75%)
helped: 134
On Broadwell, vertex shaders only (with INTEL_USE_NIR=1):
total instructions in shared programs: 2911688 -> 2865984 (-1.57%)
instructions in affected programs: 1421715 -> 1376011 (-3.21%)
helped: 6186
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
32-bit x 32-bit integer multiplication requires multiple instructions
until Broadwell. This patch just lets us treat the MUL instruction in
the FS backend like it operates on Broadwell, and after optimizations
we lower it into a sequence of instructions on older platforms.
Doing this will allow us to some extra optimization on integer
multiplies.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Currently, when the MinFilter is GL_LINEAR or GL_NEAREST we hide the
actual miplevel count from the hardware (and we avoid re-creating
the miptree structure with all the levels), since we don't expect
levels other than the base level to be needed. Unfortunately,
GLSL's textureSize() function is an exception to this rule. This
function takes a lod parameter that we need to use to return the
size of the appropriate miplevel (if it exists). The spec only
requires that the miplevel exists, so even if the sampler is
configured with a linear or nearest MinFilter, as far as the user
has uploaded miplevels for the texture, textureSize() should return
the appropriate sizes.
This patch fixes this by exposing the actual miplevel count for all
sampling engine textures while keeping the original implementation
for render targets (for render targets textures we do not provide
the miplevel count but the actual LOD we are wrting to, so we
want to make sure that we make this the base level).
Fixes 28 dEQP tests in the following category:
dEQP-GLES3.functional.shaders.texture_functions.texturesize.*
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Changes generated by:
cd src/mapi/glapi/gen
for i in *.xml; do
cat $i |\
sed 's/[[:space:]]*offset="[^"]*">/>/' |\
sed 's/[[:space:]]*offset="[^"]*"[[:space:]]*$//' |\
sed 's/[[:space:]]*offset="[^"]*"[[:space:]]*/ /' > x
mv x $i
done
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Comparing the output of
nm -D libGL.so.349.16 | grep ' T gl[^X]' | sed 's/.* T //'
between Catalyst NVIDIA 349.16 and this commit, the only change is a bunch
of functions that NVIDIA exports that Mesa does not.
If a function is not statically exported by either of the major binary
drivers on Linux, there is almost zero chance that any application
statically links with it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Comparing the output of
nm -D arch/x86_64/usr/X11R6/lib64/fglrx/fglrx-libGL.so.1.2 |\
grep ' T gl[^X]' | sed 's/.* T //'
between Catalyst 14.6 Beta and this commit, the only change is a bunch
of functions that AMD exports that Mesa does not and some OpenGL ES
1.1 functions that Mesa exported but AMD does not.
The OpenGL ES 1.1 functions (e.g., glAlphaFuncx) are added by extensions
in desktop. Our infrastructure doesn't allow us to statically export a
function in one lib and not in another. The GLES1 conformance tests
expect to be able to link with these functions, so we have to export
them.
If a function is not statically exported by either of the major binary
drivers on Linux, there is almost zero chance that any application
statically links with it.
As a side note... I find it odd that AMD exports glTextureBarrierNV but
not glTextureBarrier.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Comparing the output of
nm libGL.so | grep ' T gl[^X]' | sed 's/.* T //'
between 10.3.7 and this commit, the only change is the removal of
glFramebufferTextureFaceARB. This function was removed a couple commits
previously.
glClipControl was, at the time 10.3 shipped, a very new function. It
was added by GL_ARB_clip_control. That extension was ratified by the
Khronos Board of Promoters on August 7, 2014. It's less than a year
old, and I don't think it's is likely that there are many applications
using that extension... much less statically linking with the function.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Comparing the output of
nm libGL.so | grep ' T gl[^X]' | sed 's/.* T //'
between 10.4.7 and this commit, the only change is the removal of
glFramebufferTextureFaceARB. This function was removed a couple commits
previously.
None of these functions are particuarly new. If applications were not
statically linking them with 10.4.7, there's approximately zero chance
they will for 10.6.
Almost all of these functions are for GL_ARB_direct_state_access.
Since the whole DSA API wasn't statically exported (and the extension
wasn't enabled!), I think there's exactly zero chance anyone linked
against these symbols.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Comparing the output of
nm libGL.so | grep ' T gl[^X]' | sed 's/.* T //'
between 10.5.5 and this commit, the only change is the removal of
glFramebufferTextureFaceARB. This function was removed a couple commits
previously.
None of these functions are particuarly new. If applications were not
statically linking them with 10.5.5, there's approximately zero chance
they will for 10.6.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Changes generated by:
cd src/mapi/glapi/gen
for i in *.xml; do
cat $i |\
sed 's/[[:space:]]*static_dispatch="[^"]*">/>/' |\
sed 's/[[:space:]]*static_dispatch="[^"]*"[[:space:]]*$//' |\
sed 's/[[:space:]]*static_dispatch="[^"]*"[[:space:]]*/ /' > x
mv x $i
done
Comparing the output of
nm libGL.so | grep ' T gl[^X]' | sed 's/.* T //'
before and after this commit showed no differences.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dylan Baker <baker.dylan.c@gmail.com>
The set of functions with static dispatch is (supposed to be) defined by
the Linux OpenGL ABI. We export quite a few more functions than that
for historical reasons. However, this list should never grow.
This table is used instead of the static_dispatch tag in the XML to
generate the static dispatch functions. I used
nm libGL.so | grep ' T gl[^X]' | sed 's/.* T //'
before and after the change. diff showed no differences.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dylan Baker <baker.dylan.c@gmail.com>
Since the set of functions with static will never change, there is no
reason to store it in the XML. It's just one of those fields that
confuses people adding new functions.
This is split out from the rest of the series so that in-code assertions
can be used to verify that the data in the Python code matches the XML.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dylan Baker <baker.dylan.c@gmail.com>
Mesa does not (and probably never will) support GL_ARB_geometry_shader4,
so this function will never exist. Having a function that is
exec="skip" and offset="assign" is just weird.
There are still a couple 'exec="skip" offset="assign"' functions
remaining. These remain because we either support GLX protocol for them
(glSampleMaskSGIS and glSamplePatternSGIS) or older DRI drivers still
need them in the dispatch table (glResizeBuffersMESA). The SGIS
functions can be removed later.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Without this the next patch will try to put these functions in the
dispatch table in indirect_init.c.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
With DSA we can no longer rely on this being done in st_validate_state
in response to the framebuffer bindings having changed.
This fixes the ext_framebuffer_multisample-bitmap piglit test.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit 3687d75 changed the fs_visitor constructors, but it didn't update
all the users. As a result, 'make check' fails.
I added the explicit cast to the gl_program* parameter to make it more
clear which NULL was which.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@Whitecape.org>
We also reduce the amount of need-to-know information about st_api
to require one less extern "C" in st_manager.h
Reviewed-by: Brian Paul <brianp@vmware.com>
For scalar GS support, we either need to add a fourth constructor which
takes the GS structures, or combine the existing two and pass the shader
stage.
Given that they're not significantly different, I opted for the latter.
v2: Remove more stuff from the .h file (Jason and Jordan).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
With earlier commit (7a58262e58 egl: Remove skeleton implementation of
EGL_MESA_screen_surface) we've removed the skeleton implementation of
eglCopyContextMESA(). Just like EGL_MESA_screen_surface this extension
was never implemented in mesa.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
The extension requires that the address of the core functions should be
available via eglGetProcAddress. Currently the list is guarded by
_EGL_GET_CORE_ADDRESSES, which was only set for the scons (windows)
build.
Unconditionally enable it for all the builds (automake, android and
haiku) considering that the extension is not platform specific and is
always enabled.
v2: Drop the _EGL_GET_CORE_ADDRESSES macro altogether.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
s/EGL_MESA_dma_buf_image_export/EGL_MESA_image_dma_buf_export as defined by the spec
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The EGL 1.3, 1.4 and 1.5 spec (as quoted below) explicitly mentions that
providing static symbols for functions provided by EGL extensions is not
portable. Considering that relatively recently we've seen a non-mesa
desktop EGL implementation, the fact that we opt for such behaviour has
gone unnoticed.
From the EGL 1.5 specification:
For functions that are queryable with eglGetProcAddress,
implementations may choose to also export those functions
statically from the object libraries implementing those
functions. However, portable clients cannot rely on this
behavior.
To encourage devs against writing such non-portable code, let's hide the
symbols similar to the official binary driver from NVIDIA.
v2: Quote the EGL 1.5 spec, as suggested by Chad.
Cc: Brian Paul <brianp@vmware.com>
Cc: Chad Versace <chad.versace@intel.com>
Cc: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Similar to other EGL extensions - guard the function prototypes by
EGL_EGLEXT_PROTOTYPES as the libEGL library does (should) not provide
the symbols statically.
Instead users should call eglGetProcAddress, which returns the function
pointer. The latter of which was missing the type declaration (typedef).
Cc: Dave Airlie <airlied@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The driver search/load is not done at eglGetDisplay (or eglOpenDisplay
as the readme called it) time, but during eglInitialize().
Drop _eglMain (available only for external drivers) reference. Mention
we use function(s), specific to the built-in driver(s).
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This was causing corruption with hw binning on a306. Unlikely that it
is a306 specific, but rather the smaller gmem size resulted in different
tile configuration which was triggering the bug at certain resolutions.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4" and "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
Whitelist adreno 306 (as found in msm8916/apq8016). Works pretty much
out of the box, although the smaller GMEM size requires more tiles to
fit 1920x1080, so bump up the max # of tiles as well.
Since it is just whitelist + trivial change, it makes sense to land on
all the active release branches.
Note that a305c ends up with gpu-id "306", hence a306 ends up with
gpu-id of "307". Apparently that is what happens when you let the
marketing dept name things.
Cc: "10.4" and "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
These warnings have been detected by Clang 3.6.
codegen/nv50_ir_from_tgsi.cpp:1319:10: warning: struct 'Source' was
previously declared as a class [-Wmismatched-tags] const struct tgsi::Source *code;
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The nv40_fp_bra() function in the same file is also unused but this is
the only place where the nv30/nv40 isa is documented.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes a crash when trying to monitor MP counters because compute
support is not implemented for nvf0.
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Make it clear that ARB_direct_state_access is only available on
drivers that support GL 2.0+
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Assume that all drivers that advertise support for NPOT textures
are able to support GL 2.0.
v2: Add a comment.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This extension requires OpenGL 2.0, so enable it on gen3 and later.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This could have added a new DD table entry for DrawBuffers that takes an
arbitrary draw buffer, but, after looking at the existing DD functions,
Kenneth Graunke recommended that we just skip calling the DD functions in the
case of ARB_direct_state_access. The DD implementations for DrawBuffer(s)
have limited functionality, especially with respect to
ARB_direct_state_access.
[Fredrik: Call the driver function when fb is the bound draw buffer]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
[Fredrik: Fix the name of the buf parameter in the XML file]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
This could have added a new DD table entry for ReadBuffer that takes an
arbitrary read buffer, but, after looking at the existing DD functions,
Kenneth Graunke recommended that we just skip calling the DD functions in the
case of ARB_direct_state_access. The DD implementations for ReadBuffer
have limited functionality, especially with respect to
ARB_direct_state_access.
[Fredrik: Call the driver function when fb is the bound read buffer]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
[Fredrik: Fix the name of the buf parameter in the XML file]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
This could have added a new DD table entry for DrawBuffer that takes an
arbitrary draw buffer, but, after looking at the existing DD functions,
Kenneth Graunke recommended that we just skip calling the DD functions in the
case of ARB_direct_state_access. The DD implementations for DrawBuffer(s)
have limited functionality, especially with respect to
ARB_direct_state_access.
[Fredrik: Call the driver function when fb is the bound draw buffer]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
The ARB_direct_state_access specification says (as of 2015.02.05):
"Interactions with OpenGL 4.3 or ARB_framebuffer_no_attachments
If neither OpenGL 4.3 nor ARB_framebuffer_no_attachments are supported,
ignore the support for NamedFramebufferParameteri and
GetNamedFramebufferParameteriv."
This commit adds stubs for these entry points.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Mesa's ClearBuffer framework is very complicated and thoroughly married to the
object binding model. Moreover, the OpenGL spec for ClearBuffer is also very
complicated. At some point, we should implement buffer clearing for arbitrary
framebuffer objects, but for now, we will just wrap ClearBuffer.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Mesa's ClearBuffer framework is very complicated and thoroughly married to the
object binding model. Moreover, the OpenGL spec for ClearBuffer is also very
complicated. At some point, we should implement buffer clearing for arbitrary
framebuffer objects, but for now, we will just wrap ClearBuffer.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Mesa's ClearBuffer framework is very complicated and thoroughly married to the
object binding model. Moreover, the OpenGL spec for ClearBuffer is also very
complicated. At some point, we should implement buffer clearing for arbitrary
framebuffer objects, but for now, we will just wrap ClearBuffer.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Mesa's ClearBuffer framework is very complicated and thoroughly married to the
object binding model. Moreover, the OpenGL spec for ClearBuffer is also very
complicated. At some point, we should implement buffer clearing for arbitrary
framebuffer objects, but for now, we will just wrap ClearBuffer.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Previously, we used _mesa_update_state to update the currently bound
framebuffers prior to performing a blit. Now that _mesa_blit_framebuffer
uses arbitrary framebuffers, _mesa_update_state is not specific enough.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
This wasn't neccessary for ARB_direct_state_access, but felt like a good idea
for the sake of completeness.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
_mesa_update_framebuffer now operates on arbitrary read and draw framebuffers.
This allows BlitNamedFramebuffer to update the state of its arbitrary read and
draw framebuffers.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
[Fredrik: - Update one of the error messages to reflect that the
framebuffer might not be the bound framebuffer.
- Whitespace fixes.]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
[Fredrik: - Retain the debugging code in CheckFramebufferStatus.
- Whitespace fixes.]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
This splits off the (still) rather large chunk that is
get_texture_for_framebuffer into lots of smaller functions specialized to
service the wide variety of unique needs of *FramebufferTexture* entry points.
The result is much cleaner because, rather than having a pile of branches and
confusing conditions (like the boolean layered), the uniqueness is baked into
the entry points. The entry points know whether or not they are layered or use
a textarget.
[Fredrik: - Mention the value of <textarget> in the error message.
- Rename check_zoffset to check_layer, and zoffset to layer.
The zoffset parameter was renamed to layer in
ARB_framebuffer_object.
- Make layered a GLboolean since the value is visible to the API.
- Remove EXT suffixes in refactored code.
- Whitespace fixes.]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
This moves a few blocks around so that the control flow is more obvious. If
the texture is 0, just return true at the beginning of the function.
Likewise, if the texObj is NULL, return true at the beginning of the function
as well.
[Fredrik: Fix the texObj NULL check]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Split apart utility function framebuffer_texture to better prepare for
implementing NamedFramebufferTexture and NamedFramebufferTextureLayer. This
should also pave the way for some future cleanup work.
[Fredrik: - Mention which limit was exceeded when <layer> is out of range.
- Update a comment to reflect that <fb> might not be the bound
framebuffer.
- Make it clear that the error message in glFramebufferTexture*D
refers to the <textarget> parameter.
- Remove EXT suffixes.]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
gl*FramebufferTexture should generate GL_INVALID_VALUE when the
texture doesn't exist.
[Fredrik: Split this change out from the next commit]
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
[Fredrik: - Remove the DummyRenderbuffer checks now that they are
done in _mesa_lookup_renderbuffer_err.
- Fix the <renderbuffertarget> name in error messages.
- Make the error message in _mesa_framebuffer_renderbuffer
reflect that <fb> might not be the bound framebuffer.
- Remove EXT suffixes from GL tokens.]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Rename _mesa_framebuffer_renderbuffer to _mesa_FramebufferRenderbuffer_sw in
preparation for adding the ARB_direct_state_access backend function for
FramebufferRenderbuffer and NamedFramebufferRenderbuffer to share.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
[Fredrik: Generate an error for non-existent renderbuffers]
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Some bits were already there for texture views but some were missing.
In particular for cube map views things needed to change a bit.
For simplicity I ended up removing the separate face addr bit (just use
the z bit) - cube arrays didn't use it already, so just follow the same
logic there. (In theory using separate bits could allow for better hash
function but I don't think anyone ever did some measurements of that so
probably not worth the trouble, if we'd reintroduce it we'd certainly
wanted to use the same logic for cube arrays and cube maps.)
Also extend the seamless cube sampling to cube arrays - as there were no
piglit failures before this is apparently untested, but things now generally
work quite the same for cube textures and cube array textures so there
hopefully shouldn't be any trouble...
49 new piglits, 47 pass, 2 fail (both due to fake multisampling).
v2: incorporate Brian's feedback, add sampler view validation,
function rename, formatting fixes.
Reviewed-by: Brian Paul <brianp@vmware.com>
All the functionality was pretty much there, just not tested.
Trivially fix up the missing pieces (take target info from view not
resource), and add some missing bits for cubes.
Also add some minimal debug validation to detect uninitialized target values
in the view...
49 new piglits, 47 pass, 2 fail (both related to fake multisampling,
not texture_view itself). No other piglit changes.
v2: move sampler view validation to sampler view creation, update docs.
Reviewed-by: Brian Paul <brianp@vmware.com>
This was missing, and drivers relying on the target in the view could get
into quite some trouble.
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* No impact risk to any other platforms
* Tracing printf needs stdio.h now due to child header change
* Add missing #/src include directory for util/macros.h
This problem can easily be reproduced with a number of
ARB_shader_image_load_store piglit tests, which use a buffer object as
PBO for a pixel transfer operation and later on bind the same buffer
to the pipeline as shader image -- The problem is not exclusive to
images though, and is likely to affect other kinds of buffer objects
that can be bound to the 3D pipeline, including vertex, index,
uniform, atomic counter buffers, etc.
CC: 10.5 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
gcc 4.4.7 really doesn't like them, and they aren't standard
C++, they seem to be a gcc extension.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now that ARB_texture_stencil8 is supported, this might happen.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Refactor ::trigger and ::abort to split out the operations that access
concurrently modified data members and require locking from the
recursive and possibly re-entrant part of these methods. This will
avoid some deadlock situations when locking is implemented.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
CC: 10.5 <mesa-stable@lists.freedesktop.org>
This fixes bugs with special cases where we have arrays of
structures containing samplers or arrays of samplers.
I've verified that patch results in calculating same index value as
returned by _mesa_get_sampler_uniform_value for IR. Patch makes
following ES3 conformance test pass:
ES3-CTS.shaders.struct.uniform.sampler_array_fragment
v2: remove unnecessary comment (Topi)
simplify changes and the overall code (Jason)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114
Previously whenever a primitive is drawn the driver would call
_mesa_check_conditional_render which blocks waiting for the result of
the query to determine whether to render. On Gen7+ there is a bit in
the 3DPRIMITIVE command which can be used to disable the primitive
based on the value of a state bit. This state bit can be set based on
whether two registers have different values using the MI_PREDICATE
command. We can load these two registers with the pixel count values
stored in the query begin and end to implement conditional rendering
without stalling.
Unfortunately these two source registers were not in the whitelist of
available registers in the kernel driver until v3.19. This patch uses
the command parser version from intel_screen to detect whether to
attempt to set the predicate data registers.
The predicate enable bit is currently only used for drawing 3D
primitives. For blits, clears, bitmaps, copypixels and drawpixels it
still causes a stall. For most of these it would probably just work to
call the new brw_check_conditional_render function instead of
_mesa_check_conditional_render because they already work in terms of
rendering primitives. However it's a bit trickier for blits because it
can use the BLT ring or the blorp codepath. I think these operations
are less useful for conditional rendering than rendering primitives so
it might be best to leave it for a later patch.
v2: Use the command parser version to detect whether we can write to
the predicate data registers instead of trying to execute a
register load command.
v3: Simple rebase
v4: Changes suggested by Kenneth Graunke: Split the
load_64bit_register function out to a separate patch so it can be
a shared public function. Avoid calling
_mesa_check_conditional_render if we've already determined that
there's no query object. Some styling fixes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Adds brw_load_register_mem64 which is similar to brw_load_register_mem
except that it queues two GEN7_MI_LOAD_REGISTER_MEM commands in order
to load both halves of a 64-bit register. The function is implemented
by splitting the 32-bit version into an internal helper function which
takes a size.
This will later be used to set the 64-bit predicate source registers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to detect whether the predicate source registers can be used
in a later patch we will need to know the version number for the
command parser. This patch just adds a member to intel_screen and does
an ioctl to get the version.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It was possible for some events never to get triggered if one thread
was creating events and another threads was waiting for them.
This patch consolidates soft_event::wait() and hard_event::wait()
into event::wait() so that hard_event objects will now wait for
all their dependencies to be submitted before flushing the command
queue.
v2:
- Rename variables
- Use mutable varibales so we can keep event::wait() const
- Open code signalled() call so mutex can be atted to signalled
without deadlocking.
CC: 10.5 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This fixes a potential crash where on a sequence like this:
Thread 0: Check if queue is not empty.
Thread 1: Remove item from queue, making it empty.
Thread 0: Do something assuming queue is not empty.
CC: 10.5 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
No backend wires this up to anything, and the extension spec has been
marked obsolete for 4+ years.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
This function is always used with EGL_WINDOW_BIT. Pixmaps are forbidden
for Wayland, and PBuffers are unimplemented.
Reviewed-by: Daniel Stone <daniels@collabora.com>.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When the server gpu and requested gpu are different:
. They likely don't support the same tiling modes
. They likely do not have fast access to the same locations
Thus we do:
. render to a tiled buffer we do not share with the server
. Copy the content at every swap to a buffer with no tiling
that we share with the server.
This is similar to the glx dri3 DRI_PRIME implementation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
It is possible the server advertises a render-node.
In that case no authentication is needed,
and Gem names are forbidden.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
v2: do not check for __DRI_IMAGE_DRIVER, but instead
do not advertise __DRI_DRI2_LOADER when on a render-node.
Checks blitImage is implemented.
Initially having the __DRIimageExtension extension
at version 9 at least meant blitImage was supported.
However some implementation do advertise version >= 9
without implementing it.
CC: 10.5 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
EGL_SOFTWARE is not supported anywhere in the code,
whereas LIBGL_ALWAYS_SOFTWARE is.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The opt_sampler_eot optimisation seems to break when the last
instruction is SHADER_OPCODE_TG4. A bunch of Piglit tests end up doing
this so it causes a lot of regressions. I can't find any documentation
or known workarounds to indicate that this is expected behaviour, but
considering that this is probably a pretty unlikely situation in a
real use case we might as well disable it in order to avoid the
regressions. In total this fixes 451 tests.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This was really the original purpose, for enabling the path for
ES3.1 tests without the extension being set. Set also fallthrough
comment for Coverity (caught by Matt).
v2: .. and test the right way, not wrong one (Ilia Mirkin)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This avoids future addition to PIPE_PRIM_ from causing regressions
on r600g.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Our lower if/else pass was missed when converting NIR to use linked
lists rather than hashsets to track use/def sets.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since we update num_vtxelts here, we could otherwise end up with stale
instancing information in the upper bits which wouldn't otherwise get
reset. (Also we run the risk of the previous draw having set the first
element as instanced.)
This appears as one of the causes for the test pointed out in fdo#90363
to fail on nvc0.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90363
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
See identical commit for nv50. Destroying the current context and then
creating a new one or switching to another existing context would cause
the "current" state to not be properly initialized, so we save it off in
the screen.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Previously, this case was being handled in match_expression prior to
calling match_value. However, there is really no good reason for this
given that match_value has all of the information it needs. Also, they
weren't being handled properly in the commutative case and putting it in
match_value gives us that for free.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit switches us from the current setup of using hash sets for
use/def sets to using linked lists. Doing so should save us quite a bit of
memory because we aren't carrying around 3 hash sets per register and 2 per
SSA value. It should also save us CPU time because adding/removing things
from use/def sets is 4 pointer manipulations instead of a hash lookup.
Running shader-db 50 times with USE_NIR=0, NIR, and NIR + use/def lists:
GLSL IR Only: 586.4 +/- 1.653833
NIR with hash sets: 675.4 +/- 2.502108
NIR + use/def lists: 641.2 +/- 1.557043
I also ran a memory usage experiment with Ken's patch to delete GLSL IR and
keep NIR. This patch cuts an aditional 42.9 MiB of ralloc'd memory over
and above what we gained by deleting the GLSL IR on the same dota trace.
On the code complexity side of things, some things are now much easier and
others are a bit harder. One of the operations we perform constantly in
optimization passes is to replace one source with another. Due to the fact
that an instruction can use the same SSA value multiple times, we had to
iterate through the sources of the instruction and determine if the use we
were replacing was the only one before removing it from the set of uses.
With this patch, uses are per-source not per-instruction so we can just
remove it safely. On the other hand, trying to iterate over all of the
instructions that use a given value is more difficult. Fortunately, the
two places we do that are the ffma peephole where it doesn't matter and GCM
where we already gracefully handle duplicates visits to an instruction.
Another aspect here is that using linked lists in this way can be tricky to
get right. With sets, things were quite forgiving and the worst that
happened if you didn't properly remove a use was that it would get caught
in the validator. With linked lists, it can lead to linked list corruption
which can be harder to track. However, we do just as much validation of
the linked lists as we did of the sets so the validator should still catch
these problems. While working on this series, the vast majority of the
bugs I had to fix were caught by assertions. I don't think the lists are
going to be that much worse than the sets.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The linked list in gallium is pretty much the kernel list and we would like
to have a C-based linked list for all of mesa. Let's not duplicate and
just steal the gallium one.
Acked-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
We were rolling our own rewrite_src variant in copy-propagation. Let's
stop doing that and use the ones in core NIR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The out-of-SSA pass was one of the first passes written when getting SSA
up-and-going (for obvious reasons). As such, it came before a lot of the
nifty SSA-based helpers were introduced. This commit modernizes it so that
we're no longer doing nearly as much manual banging on use/def sets.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The former logic would copy the saturate up to any mul with an immediate
if there was a subsequent mul with a saturate. However we only want to
do that if we collapsed 2 muls by multiplying their immediates (or were
able to put the immediate in as a post-multiplier).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
All paths that produce GLSL IR for NIR lower ir_unop_log. All paths
that consume NIR will explode if they geta nir_op_flog.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
All paths that produce GLSL IR for NIR lower ir_unop_exp. All paths
that consume NIR will explode if they geta nir_op_fexp.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
It's a weird thing that provides some values related to 2**x. It's also
already handled by a case in the switch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Originally I wrote that removing the first parameter doesn't work but
I didn't know why. I now found a mention of this in the PRM so it's
probably worthing adding it to the comment.
This is needed to implement glGetVertexArrayIndexediv and
glGetVertexArrayIndexed64iv.
v2: Make the vao parameter const.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
The only difference between these functions is the legal types and
sizes, so consolidate the code into a single vertex_attrib_format()
function and call it from all three entry points.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
This saves the cost of repeated hash table lookups when the same
vertex array object is referenced in a sequence of calls such as:
glVertexArrayAttribFormat(vao, ...);
glVertexArrayAttribBinding(vao, ...);
glEnableVertexArrayAttrib(vao, ...);
...
Note that VAO's are container objects that are not shared between
contexts.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
This is a convenience function that generates GL_INVALID_OPERATION
when the array object doesn't exist.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
opt_sampler_eot enables a direct write to framebuffer from a sample.
In order to do this the sample message needs to have a message header
so if there wasn't one already then the function adds one. In addition
the function sets the destination register to null because it's no
longer used. However it was only doing this in cases where it was
adding a message header. This patch just moves setting the destination
so that it happens even if there's a messge header. In practice this
doesn't seem to make any difference but it's a bit cleaner.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Commit 94ee908448 added a header size parameter to the function to
create the LOAD_PAYLOAD instruction. However this broke
opt_sampler_eot which manually constructs the instruction and so
wasn't setting the header_size. This ends up making the parameters for
the send message all have the wrong location and it all falls apart.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This takes a different approach to previously, we cannot index into the
inputMapping with anything but the mesa attribute index, so we can't use
the just add one to index trick, we need more info to add one to it
after we've mapped the input.
(Fixed copy propgation and cleaned up a little)
v2: drop float64 format check, just attr->Doubles.
merge enable patch.
v3: cleanup code a bit.
v3.1: minor review fixups (comment, newline) (Ilia)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support in the vbo and array code to handle
double vertex attributes.
v0.2: merge code to handle doubles in vbo layer.
v1: don't use v0, merge api_array elt code.
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The spec is vague all over the place about this, but this seems
to be the intent, we can probably make this optional later if
someone makes hw that cares and writes a driver.
Basically we need to double count some of the d types but
only for totalling not for slot number assignment.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
instead of doing the attempts at dual slot handling here,
let the backend do it.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Just more boilerplate stuff.
v2:
bad fallthrough on versioning,
this is my ugly but self contained solution (Ian)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This hack for fixing gl_FragDepth apparantly caused a GLSL shader
outputting a single double to try and output a dvec4, but we hadn't
assigned outputs for the secondary bit.
This avoids going into the hack code for scalar doubles.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Normally this is always needed but for internal blits and clears
we need to be able to disable it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reduces the number of conditions tested in if to one in case of
non-integer formats. Makes no functional changes.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
... and (a >= c) || (b >= c) as max(a, b) >= c.
Similar to commit 97e6c1b9.
total instructions in shared programs: 6182276 -> 6182180 (-0.00%)
instructions in affected programs: 6400 -> 6304 (-1.50%)
helped: 68
HURT: 4
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Helps the same set of programs as the previous commit.
instructions in affected programs: 4490 -> 4346 (-3.21%)
helped: 8
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Four shaders in Unreal 4's Sun Temple are helped, and gain SIMD16
because we avoid an integer multiplication.
instructions in affected programs: 2353 -> 2245 (-4.59%)
helped: 4
GAINED: 4
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
This enables EGL_KHR_fence_sync and EGL_KHR_wait_sync.
Below is the difference in piglit results, before and after this patch.
No regressions and several tests improve from 'skip' to 'pass'. Out of
EGL_KHR_fence_sync tests, two of the multithreaded tests skip; all other
tests pass.
cmdline: piglit run -p gbm -t sync tests/quick.py
mesa: master@1ac7db0
piglit: 4069bec
hw: Ivybridge
| before after
------+-------------
pass | 32 46
fail | 0 0
crash | 0 0
skip | 35 21
total | 67 67
v2:
- Set fence->signalled = true in brw_fence_has_completed() too.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm about to implement DRI2_Fenc in intel_syncobj.c. To prevent
madness, we need to prefix functions for GL_ARB_sync with 'gl' and
functions for DRI2_Fence with 'dri'. Otherwise, the file will become
a jumble of similiarly named functions.
For example:
old-name: intel_client_wait_sync()
new-name: intel_gl_client_wait_sync()
soon-to-come: intel_dri_client_wait_sync()
I wrote this renaming commit separately from the commit that implements
DRI2_Fence because I wanted the latter diff to be reviewable.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Don't pass NULL to drm_intel_bo_unreference(). It doesn't like that.
Bug found by code inspection.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Don't pass NULL to drm_intel_bo_unreference(). It doesn't like that.
Bug found by code inspection.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should make it more obvious in bug reports while also removing
any sort of guesswork for developers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Fix build error introduced with commit 1c5a57a "glapi/es3.1: Add support
for GLES versions > 3.0" with Python < 2.7.
File "src/mapi/glapi/gen/gl_genexec.py", line 230, in <module>
printer.Print(api)
File "src/mapi/glapi/gen/gl_XML.py", line 120, in Print
self.printBody(api)
File "src/mapi/glapi/gen/gl_genexec.py", line 187, in printBody
condition_parts.append('(ctx->API == API_OPENGLES2 && ctx->Version >= {})'.format(int(f.api_map['es2'] * 10)))
ValueError: zero length field name in format
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Having the wrong inferred type prevents a number of optimizations,
including constant propagation (since float immediates work differently
than integer immediates).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fix Clang return-type error introduced with commit
96f164f6f0 "gallium: make
pipe_context::begin_query return a boolean".
CC r600_query.lo
r600_query.c:443:3: error: non-void function 'r600_begin_query' should return a value [-Wreturn-type]
return;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This especially helps with NIR because we currently emit MOVs at the top
of the shader to copy from various ATTR registers to a giant VGRF array
of all inputs. (This could potentially be done better, but since
there's only ever one write to each register, it should be trivial to
copy propagate away...)
With NIR - only vertex shaders:
total instructions in shared programs: 3129373 -> 2889581 (-7.66%)
instructions in affected programs: 3119717 -> 2879925 (-7.69%)
helped: 20833
Without NIR - only vertex shaders:
total instructions in shared programs: 2745901 -> 2724483 (-0.78%)
instructions in affected programs: 693426 -> 672008 (-3.09%)
helped: 3516
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The effective_width field was an ill-concieved hack to get around issues in
the LOAD_PAYLOAD instruction. Now that the LOAD_PAYLOAD instruction is far
more sane, this field can die.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The newly reworked instruction is far more straightforward than the
original. Before, the LOAD_PAYLOAD instruction was lowered by a the
complicated and broken-by-design pile of heuristics to try and guess
force_writemask_all, exec_size, and a number of other factors on the
sources.
Instead, we use the header_size on the instruction to denote which sources
are "header sources". Header sources are required to be a single physical
hardware register that is copied verbatim. The registers that follow are
considered the actual payload registers and have a width that correspond's
to the LOAD_PAYLOAD's exec_size and are treated as being per-channel. This
gives us a fairly straightforward lowering:
1) All header sources are copied directly using force_writemask_all and,
since they are guaranteed to be a single register, there are no
force_sechalf issues.
2) All non-header sources are copied using the exact same force_sechalf
and force_writemask_all modifiers as the LOAD_PAYLOAD operation itself.
3) In order to accommodate older gens that need interleaved colors,
lower_load_payload detects when the destination is a COMPR4 register
and automatically interleaves the non-header sources. The
lower_load_payload pass does the right thing here regardless of whether
or not the hardware actually supports COMPR4.
This patch commit itself is made up of a bunch of smaller changes squashed
together. Individual change descriptions follow:
i965/fs: Rework fs_visitor::LOAD_PAYLOAD
We rework LOAD_PAYLOAD to verify that all of the sources that count as
headers are, indeed, exactly one register and that all of the non-header
sources match the destination width. We then take the exec_size for
LOAD_PAYLOAD directly from the destination width.
i965/fs: Make destinations of load_payload have the appropreate width
i965/fs: Rework fs_visitor::lower_load_payload
v2: Don't allow the saturate flag on LOAD_PAYLOAD instructions
i965/fs_cse: Support the new-style LOAD_PAYLOAD
i965/fs_inst::is_copy_payload: Support the new-style LOAD_PAYLOAD
i965/fs: Simplify setup_color_payload
Previously, setup_color_payload was a a big helper function that did a
lot of gen-specific special casing for setting up the color sources of
the LOAD_PAYLOAD instruction. Now that lower_load_payload is much more
sane, most of that complexity isn't needed anymore. Instead, we can do
a simple fixup pass for color clamps and then just stash sources
directly in the LOAD_PAYLOAD. We can trust lower_load_payload to do the
right thing with respect to COMPR4.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a new is_copy_payload helper to fs_inst that takes the
place of the similarly named functions in cse and register coalesce. The
two is_copy_payload functions in CSE and register coalesce were subtly
different and potentially subtly broken. The new version unifies the two
and should be more correct.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we had a special case for uniforms and immediates and then a
bunch of asserts for various other pessimal things. This commit changes it
so that it explicitly does something on each register file. Some of them
are disallowed and others are treated properly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Was off-by-one. llvm says inserting an element with an index higher than the
number of elements yields undefined results. Previously such inserts were
ignored but as of llvm revision 235854 the vector gets replaced with undef,
causing failures.
This fixes piglit gl-3.2-layered-rendering-gl-layer, as mentioned in
https://llvm.org/bugs/show_bug.cgi?id=23424.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
This was missing from my patchset to support the query-related entry
points of Direct State Access.
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Increases pass rate of ES31-CTS.*program_interface_query* tests
when run with MESA_EXTENSION_OVERRIDE='GL_ARB_compute_shader'. Many
of the negative tests that happen to use compute stage in queries
start passing.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Other drivers which want to enable this extension must expose groups of
GPU hardware performance counters.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
This patch defines "Driver statistics" and "MP counters" groups, but
only the latter will be exposed through GL_AMD_performance_monitor.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
This is based on the original patch of Christoph Bumiller.
v2 (Samuel Pitoiset):
- improve Gallium interface for this extension
- rewrite some parts of the original code
- fix compilation errors and piglit tests
v3:
- only enable this extension when the underlying driver expose GPU counters
- get rid of the ring buffer of queries
v4:
- add a debug message when the maximum number of counters has been
reached
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
This allows queries to return different numeric types.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
According to the spec of GL_AMD_performance_monitor, valid type values
returned are UNSIGNED_INT, UNSIGNED_INT64_AMD, PERCENTAGE_AMD, FLOAT.
This also introduces the new field group_id in order to categorize
queries into groups.
v2: add PIPE_DRIVER_QUERY_TYPE_BYTES
v3: fix incorrect query type for radeon and svga drivers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Driver queries are organized as a single hierarchy where queries are
categorized into groups. Each group has a list of queries and a maximum
number of queries that can be sampled. The list of available groups can
be obtained using pipe_screen::get_driver_query_group_info.
This will be used by GL_AMD_performance monitor.
v2: add group type (CPU/GPU)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Trivial. Fixes the following compiler warning (from GCC 5.1.0):
brw_context.c:629:10: warning: type defaults to ‘int’ in declaration
of ‘simd_size’ [-Wimplicit-int]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This was present in Eric's initial implementation of the compaction code
for Sandybridge (commit 077d01b6). There is no documentation saying this
is necessary, and removing it causes no regressions in piglit on any
platform.
Some application, such as drm backend of weston, uses XRGB8888 config as
default. i965 doesn't provide this format, but before commit 65c8965d,
the drm platform of EGL takes ARGB8888 as XRGB8888. Now that commit
65c8965d makes EGL recognize format correctly so weston won't start
because it can't find XRGB8888. Add XRGB8888 format to i965 just as
other drivers do.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89689
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Commit 7e414b5864 broke the gl_FragData array
into separate gl_FragData[i] variables, so drivers can eliminate useless
writes to gl_FragData improving their performance.
The problem occurs when GLSL IR code is linked in the following case:
* The FS output variable base data type does not match gl_FragData one (float
vector)
* The FS output variable is replaced by gl_out_FragDataX because of commit
7e414b5864 with X from 0 to GL_MAX_DRAW_BUFFERS.
Then the FS output variable base data type is lost in the resulting GLSL IR,
making that the driver does a wrong assignment to gl_out_FragData components
because of unmatching data types.
This patch reverts the fragdata array lowering when the output var base data type
doesn't match gl_out_FragData, i.e., when output variable base data type is
not a float or a float vector.
This patch fixes 250 dEQP tests (tested in an Intel Haswell machine)
dEQP-GLES3.functional.fragment_out.random.* (22 failed tests)
dEQP-GLES3.functional.fragment_out.array.uint.* (120 failed tests)
dEQP-GLES3.functional.fragment_out.array.int.* (108 failed tests)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Skylake it is possible to choose your own alignment values for
compressed textures but they are expressed as a multiple of the block
size. The minimum alignment value we can use is 4 so we effectively
have to align to 4 times the block size. This patch makes it initially
set mt->align_[wh] to the large alignment value and then later divides
it by the block size so that it can be uploaded as part of the surface
state.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
After talking to Jon Leech he suggested this should be fine.
update spec to the version in the registry.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Make the checks in the Python script and the generated code more generic
to support arbitrary GLES versions >= 2.0.
The updated dispatch_sanity.cpp test discovered this problem. Without
this, the next patch would erroneously enable GLES 3.1 functions in GLES
2.0 and GLES 3.0.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Currently no 3.10 ES features (beyond 3.00 ES) are enabled. That will
come later.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Along with a couple secondary goals, the dispatch sanity test had two
major, primary goals.
1. Ensure that all functions part of an API version are set in the
dispatch table.
2. Ensure that functions that cannot be part of an API version are not
set in the dispatch table.
Commit 4bdbb58 removed the tests ability to fulfill either of its
primary goals by removing anything that used _mesa_generic_nop(). It
seems like the problem on Windows could have been resolved by adding the
NULL context pointer check from nop_handler to _mesa_generic_nop().
There is, however, some debugging benefit to actually getting the
(supposed) function name logged in the "unsupported function called"
message.
The preceding commit added a function, _glapi_new_nop_table, that
allocates a table of per-entry point no-op functions. Restore the
ability to actually validate the sanity of the dispatch table by using
_glapi_new_nop_table.
Previous to this commit removing a function from one of the
*_functions_possible lists would not cause the test to fail. With this
commit removing such a function will result in failure, as is expected.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
ARB_gpu_shader5 requires sampler array indexing expressions to be
dynamically uniform, this however doesn't have any implications on the
control flow that leads to the evaluation of that expression being
uniform. Use emit_uniformize() to obtain an arbitrary live value from
the binding table index calculation instead of assuming that the first
channel is always live.
Fixes the following Piglit test cases:
arb_gpu_shader5/execution/sampler_array_indexing/fs-nonuniform-control-flow.shader_test
arb_gpu_shader5/execution/sampler_array_indexing/vs-nonuniform-control-flow.shader_test
part of the series:
http://lists.freedesktop.org/archives/piglit/2015-February/014615.html
Reviewed-by: Matt Turner <mattst88@gmail.com>
ARB_gpu_shader5 requires UBO array indexing expressions to be
dynamically uniform, this however doesn't have any implications on the
control flow that leads to the evaluation of that expression being
uniform. Use emit_uniformize() to obtain an arbitrary live value from
the binding table index calculation instead of assuming that the first
channel is always live.
Fixes the following Piglit tests:
arb_gpu_shader5/execution/ubo_array_indexing/fs-nonuniform-control-flow.shader_test
arb_gpu_shader5/execution/ubo_array_indexing/vs-nonuniform-control-flow.shader_test
part of the series:
http://lists.freedesktop.org/archives/piglit/2015-February/014616.html
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Save some CPU cycles by doing 'return progress' rather than
'depth++' in the discard jump special case.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This instruction calculates the index of an arbitrary channel enabled
in the current execution mask. It's expected to be used as input for
the BROADCAST opcode, but it's implemented as a separate instruction
rather than being baked into BROADCAST because FIND_LIVE_CHANNEL has
no dependencies so it can always be CSE'ed with other instances of the
same instruction within a basic block.
v2: Whitespace fixes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The BROADCAST instruction picks the channel from its first source
given by an index passed in as second source. This will be used in
situations where all channels from the same SIMD thread have to agree
on the value of something, e.g. a surface binding table index.
This is in particular the case for UBO, sampler and image arrays,
which can be indexed dynamically with the restriction that all active
SIMD channels access the same index, provided to the shared unit as
part of a single scalar field of the message descriptor. Simply
taking the index value from the first channel as we were doing until
now is incorrect, because it might contain an uninitialized value if
the channel had previously been disabled by non-uniform control flow.
v2: Minor style fixes. Improve commit message.
Reviewed-by: Matt Turner <mattst88@gmail.com>
And rename _mesa_glsl_parse_state::early_fragment_tests to
fs_early_fragment_tests for consistency with other FS-specific flags in the
same struct.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Image memory qualifiers (coherent, volatile, restrict, readonly and writeonly)
follow slightly different rules from storage qualifiers, e.g. the uniqueness
rule doesn't apply. Make them a separate non-terminal.
Reviewed-by: Matt Turner <mattst88@gmail.com>
There's no indication in the spec that the image unit state other than the
bound texture object shouldn't be updated when glBindImageTexture() is called
passing the zero texture as argument. It's very unlikely that any application
would ever have relied on this, but it's easy to get right, and it fixes the
"state" ARB_shader_image_load_store piglit test.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is the required initial image unit state according to "Table 23.45. Image
State (state per image unit)" of the OpenGL 4.3 specification.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This matches what _mesa_BindImageTextures() does. The derived image format
(gl_texture_image::TexFormat) isn't necessarily equivalent to the internal
format of the texture image. If a forbidden internal format has been
specified we need to mark the image unit as invalid as required by the spec,
regardless of the derived format. Fixes the "invalid"
ARB_shader_image_load_store piglit test.
Reviewed-by: Matt Turner <mattst88@gmail.com>
gl_texture_object::_MaxLevel doesn't have any meaningful value until
_mesa_test_texobj_completeness() has been run. Fixes the "level"
ARB_shader_image_load_store piglit test.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This function will be useful for back-ends to translate an image internal
format as specified in GLSL code into a mesa format.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is consistent with the untyped surface read opcode. From now on
all typed and untyped surface access opcodes will follow the same
pattern: src[0] will be the message payload, src[1] will be the
surface index and src[2] will be a control immediate (atomic operation
for atomic opcodes and number of vector components for surface read
and write opcodes).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The generate_untyped_*() methods do nothing useful other than calling
the corresponding function from brw_eu_emit.c. The calls to
brw_mark_surface_used() will go away too in a future commit.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Change brw_untyped_atomic() and brw_untyped_surface_read() to take the
surface index as a register instead of a constant and to use
brw_send_indirect_message() to emit the indirect variant of send with
a dynamically calculated message descriptor. This will be required to
support variable indexing of image arrays for
ARB_shader_image_load_store.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Tested on Ivybridge, Haswell and Broadwell.
v2:
* Use SET_FIELD. (Ken)
* Use simd_size / 16 to support SIMD8/16/32. Ken suggested
that we might be able to do it arithmetically rather than just
supporting SIMD8 and SIMD16 with a conditional.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2:
* Don't bother checking for 'gen > 5' (krh)
* Populate sampler data in key (krh)
v3:
* Drop no8 support, and simplify code in several places (Ken)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For ES, we set the max counts based on SIMD8, which is currently
accurate.
For desktop GL, we set the max counts based on SIMD16, which can fail
in some cases where a SIMD16 program is not currently supported.
Therefore, this value is not currently accurate, but will work fine in
many cases, and lets us run more test cases. Eventually we want to
always be able to generate a SIMD16 program.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2:
* Clean out some unneeded code copied from run_fs (krh)
* Always use NIR
* Split shader time out into a separate commit
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2:
* Don't rely on brw_eu* to generate the send instruction. We now
generate the send here, and drop the "i965/cs: Add support for the
SEND message that terminates a CS thread" brw_eu* patch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2:
* Do more work at the visitor level. g0 is loaded and sent to the
generator now.
v3:
* Use Ken's comment explaining g0 usage
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At the moment it's not wired up to anything. Later patches will hook
it up to the compute shader back-end.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If a send message is emitted with a message length that is less than
required for the message then the remaining parameters default to
zero. We can take advantage of this to save a register when a shader
passes constant zeroes as the final coordinates to the sample
function.
I think this might be useful for GLES applications that are using 2D
textures to simulate 1D textures.
On Skylake it will be useful for shaders that do
texelFetch(tex,something,0) which I think is fairly common. This helps
more on Skylake because in that case the order of the instruction
operands are u,v,lod,r which is good for 2D textures whereas before
they were u,lod,v,r which is only good for 1D textures.
On Haswell:
total instructions in shared programs: 8535730 -> 8533261 (-0.03%)
instructions in affected programs: 236968 -> 234499 (-1.04%)
helped: 1174
On Skylake:
total instructions in shared programs: 10345646 -> 10341237 (-0.04%)
instructions in affected programs: 293011 -> 288602 (-1.50%)
helped: 1218
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Applied suggestions by Kenneth Graunke:
- Only apply on Gen5+
- Apply to all texture opcodes, not just TEX and TXF.
Moved the optimisation into the loop as suggested by Matt Turner.
Fix the array index when there is a header.
On Gen9+ there needs to be a header when sampling using SIMD4x2. The
header is set up by copying from the g0 register. Commit 07c571a39f
tried to fix this mov instruction to always use an exec size of 8
because previously it was incorrectly using 4. It did this by casting
the type of the destination register to vec8. This was done because
there is code in brw_set_dest to guess the exec size based on the
width of the dest register. However I misunderstood how this works
because it is actually only used when the width is less than 8. That
means the patch actually changed it to use the default exec size which
on SIMD16 would be 16 and the MOV would clobber over the first
register in the send message. This patch makes it additionally set the
default exec size to 8. This is similar to how the message is set up
in fs_generator::generate_tex.
I think this wasn't picked up by any Piglit tests because we don't
have any fragment shaders that hit this code path so nothing was using
SIMD16. However the patch caused failures in deqp tests.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90153
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
The stage_abbrev and stage_name fields in backend_visitor provide what
we need without any additional effort. It also means we'll get the
right names for compute shaders, SIMD8 geometry shaders, and both kinds
of tessellation shaders.
This does unfortunately change the capitalization of the stage
abbreviation in the INTEL_DEBUG=optimizer output filenames. It doesn't
seem worth adding code to handle, though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
v2: - move interop.cpp to clover/api
- change intptr_t to void* in the interface
- add a virtual function fence() to simplify some code
v3: - use bool in the interface
v4: - enclose the last two interop functions in try..catch
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
src1 would contain the predicate, which would get emitted as a register
source by an undiscerning srcId helper. Work around this in the same way
as in emitTEX.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Original blorp writes only one buffer per shader invocation. Once
the launch mechanism is shared with glsl-based programs there will
be need for supporting multiple render targets.
Also drop the always constant color write disable settings.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Note that the magic number of one in gen7 logic is replaced by
BRW_SF_URB_ENTRY_READ_OFFSET ( == 1 also) for clarity.
On gen6 the change from zero to one (BRW_SF_URB_ENTRY_READ_OFFSET)
has no effect for native blorp as blorp doesn't use any
additional attributes. In fact, regular pipeline setup always
uses BRW_SF_URB_ENTRY_READ_OFFSET even when there are no additional
attributes. Hence the change makes the two (blorp and regular)
consistent.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This was still needed when we had support for blorp clears but now
this is fixed to nop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now the uploading depends only on the input parameters instead
of consulting the current gl-state.
v2: Rebased on top of sampler count clamping
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Read and write parts of the state stage are also split into
explicit arguments allowing future patches to use constant
program data.
v2 (Ken): s/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Note that brw_update_renderbuffer_surfaces() already had a helper
variable which was used in parallel to direct access of the current
draw buffer of the context.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Notice that in gen7_wm_surface_state.c there is also indentation
change in the surrounding code removing tabs.
v2 (Matt): Fixed whitespace: tabs -> spaces
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The value is actually clamped to 0-16 as sample state pointer
can be used to support more than 16 samplers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In response to another patch, Emil asked for some clarification how this
stuff works. Rather than just reply to the e-mail, I decided to update
the exlanation in the code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
The opt_sampler_eot optimisation of fs_visitor effectively assumes
that it is running on a fragment shader because it casts the program
key to a brw_wm_prog_key. However on Skylake fs_visitor can also be
used for vertex shaders. It looks like this usually works anyway
because the optimisation is skipped if key->nr_color_regions != 1.
However for a vertex shader the key is actually a brw_vs_prog_key so
the space for nr_color_regions is probably taken up by
key->base.program_string_id. This can end up making nr_color_regions
be 1 in which case the function will later assert when the last
instruction is not FS_OPCODE_FB_WRITE. This was making the DEQP test
suite assert. Presumably this only happens there because that compiles
a lot of shaders so it would end up with a high value for
program_string_id.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously binding an unitialized managed texture
was causing a crash, and a workaround was added to
prevent the crash.
This patch removes this workaround and instead set the initial
state of managed textures as dirty, so that when the texture is bound
for the first time, it is always initialized.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
For D3DUSAGE_AUTOGENMIPMAP textures, applications can only
lock/copy from/get surface descriptor for/etc the first level.
Thus it makes sense to restrict the LOD to 0, and use only the first
level to generate the sublevels.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
That part of the code was quite obscure.
This new implementation tries to make it clearer
by separating the differents parts, and commenting more.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Remove the Surface9 code for dirty rects, used only for Managed
resources. Instead convey the information to the parent texture.
According to documentation, this seems to be the expected behaviour,
and if documentation is wrong there, that's not a problem since it can
only leads to more texture updates in corner cases.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Some applications assume the memory for multilevel
textures is allocated per continuous blocks.
This patch implements that behaviour.
v2: cache offsets
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This code was supposed to be removed, but a rebase seems to have
made it stay.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Previous code was trying to optimise to call set_vertex_buffers on
big packets, and thus avoids as many calls as possible.
However in practice doing so won't be faster (drivers implement
set_vertex_buffers by a loop over the buffers we want to bind)
When we want to unbind a buffer, we were calling set_vertex_buffers
on a buffer with vtxbuf->buffer = NULL. It works on some drivers,
but not on all of them, because it isn't in Gallium spec.
This patch fixes that.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Ignore D3DUSAGE_QUERY_POSTPIXELSHADER_BLENDING when
D3DUSAGE_RENDERTARGET is not specified.
This behaviour matches windows drivers.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Avoid blocking when retrieving D3DQUERYTYPE_TIMESTAMP result with
NineQuery9_GetData(), when D3DGETDATA_FLUSH is not specified.
This mimics Win behaviour and gives slightly better performance
for some games.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
D3DQUERYTYPE_TIMESTAMPFREQ is supposed to give the frequency
at which the clock of D3DQUERYTYPE_TIMESTAMP runs.
PIPE_QUERY_TIMESTAMP returns a value in ns, thus the corresponding
frequency is 1000000000.
PIPE_QUERY_TIMESTAMP_DISJOINT returns the frequency at which
PIPE_QUERY_TIMESTAMP value is updated. It isn't always
1000000000.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
As on wined3d and windows, when D3DCREATE_FPU_PRESERVE is not
specified, change the fpu control word to all exceptions masked,
single precision, round to nearest.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
No major vendor advertises it, and we weren't supporting it.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This adds an additional check to make sure the bound depth-buffer doesn't
exceed the rendertarget size when clearing depth and color buffer at once.
D3D9 clears only a rectangle with the same dimensions as the viewport, leaving
other parts of the depth-buffer intact.
This fixes failing WINE test visual.c:depth_buffer_test()
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
It's returning random values, because RESOURCE_VAR() is casting
different objects into ir_variable pointers.
This updates _mesa_count_active_attribs to filter the resources with the
same logic used in _mesa_longest_attribute_name_length.
https://bugs.freedesktop.org/show_bug.cgi?id=90207
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
v2: add commments for limitation of max references numbers,
and what the caculation is based
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
When there is a colormask active that does not cover all the channels,
enable reading in the destination like with a combining blend
operation. This fixes fbo-blending-formats on a3xx.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Enables ARB_depth_buffer_float. There is no sampling support for
interleaved Z32F_S8, so we store the two textures separately, one as
Z32F, the other as S8. As a result, we need a lot of additional logic
for restores and transfers.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
32-bit depth buffers are stored as unorm, and thus need special handling
when moving to and from gmem. They are copied into gmem by writing
depth, and resolved from gmem using a special resolve bit which
apparently float-ifies the data.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Code generation is not allowed to fail for any reason - in fact,
fs_generator has no mechanism for failing. The visitor is responsible
for that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have to use W/UW type for src1 of the multiply in the MUL/MACH macro,
but in order to read the low 16-bits of each 32-bit integer, we need to
set the appropriate stride.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit 9f5e5bd34d.
I have no idea what made me believe these didn't apply to Gen > 7. They
do, and without them we generate bad code that causes failures on Gen 8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In some situations it is convenient for a driver to expose a higher GLSL
version while some extensions are still incomplete. However in that
situation, it would report a GLSL version that was higher than the GL
version. Avoid that situation by limiting the GLSL version to the GL
version.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently we're producing errors like
User error: GL_INVALID_OPERATION in glglDeleteProgramsARB(invalid call)
And noop_warn appears to be called with the full function name. Don't
prepend a gl prefix.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Extends the syntax of GALLIUM_HUD environment variable to:
- Add options to set the size and exact location of each pane.
- Add an option to limit the maximum allowed value of the X axis on a
pane, clamping the graph down to not go above this value.
- Add an option to auto-adjust the value of the Y axis down to the
highest value still visible on the graph.
v2:
- Make the patch simpler and smaller.
- With dynamic auto-adjusting on, adjust the Y axis once per pane
update instead of updating once every several seconds.
- No longer mishandle pane height when having more than one graph per
pane.
This makes INTEL_DEBUG=perf report shader recompiles due to CMS vs.
UMS/IMS differences and Sandybridge textureGather workarounds.
Previously, we just flagged them as "Something else".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Previously, sampler messages were decoded as
sampler (1, 0, 2, 2) mlen 6 rlen 8 { align1 1H };
I don't know how much time we've collectly wasted trying to read this
format. I can never recall which number is the surface index, sampler
index, message type, or...whatever that other number is. Figuring out
the message name from the numerical code is also painful.
Now they decode as:
sampler sample_l SIMD16 Surface = 1 Sampler = 0 mlen 6 rlen 8 { align1 1H };
This is easy to read at a glance, and matches the format I used for
render target formats.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Add the 4.0/4.1/4.2 extensions lists to compute_version. A couple of
extensions aren't in mesa yet, so those are marked with 0 until they
become supported.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Main motivation here is to get rid of iterating IR and
encapsulate queries within program resources.
No functional changes.
Piglit tests calling the modified functionality:
- gl-get-active-attrib-returns-all-inputs
- glsl-1.50-get-active-attrib-array
- getactiveattrib
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
The INTEL_DEBUG variable is a uint64_t and if we want a enum value higer
than 32 bits, you need to use ull. We might as well use it for all of them.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The BLT engine on Gen8+ requires linear surfaces to be cacheline
aligned. This restriction was added as part of converting the BLT to
use 48-bit addressing.
intel_emit_linear_blit needs to handle blits that are not cacheline
aligned, as we use it for arbitrary glBufferSubData calls and subrange
mappings.
Since intel_emit_linear_blit uses 1 byte per pixel, we can use the src/dst
pixel X offset field to represent the unaligned portion, and subtract
that from the address so it's cacheline aligned.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88521
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
File glapi_entrypoint.c calls memcpy() function, but does not include
string.h header. So compilation can fail at error: implicit declaration
of function 'memcpy'.
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
This code is only used when our memory debugging wrappers are enabled,
as we use the C runtime functions directly elsewhere.
Tested llvmpipe on Windows w/ memory debugging enabled.
VMware PR894263.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We were resetting the prim id count for each run of the prim assembler,
hence this only worked when the draw calls were very small (the exact limit
depending on the vertex size), since larger draw calls get split up.
So, do the same as we do already if there's a gs, reset it to zero explicitly
for every new instance (this possibly could use the same variable but that
isn't doable without some heavy refactoring and I'm not sure it makes sense).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90130.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
The nir_lower_source_mods pass does a weak form of copy propagation to
clean up all of the mov-with-negate's that get generated. However, we
weren't properly checking that the sources were SSA and so we could end up
moving a register read which is not, in general, valid.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The old code wasn't correctly handling the case where the new value of the
source contains an indirect.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, this function returned the number of elements for structures
and arrays and 0 for everything else. In NIR, this is almost never what
you want because we also treat matricies as arrays so you have to
special-case constantly. This commit glsl_get_length treat matrices
as an array of columns by returning the number of columns instead of 0
This also fixes a bug in locals_to_regs caused by not checking for the
matrix case in one place.
v2: Only special-case for matrices and return a length of 0 for vectors as
we did before. This was needed to not break the TGSI-based drivers and
doesn't really affect NIR at the moment.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
copy drivers from the stencil_texturing list,
softpipe is definitely broken for stencil texturing
since it uses float, but I'll look at that later.
v2.1: update relnotes
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
if we support stencil texturing, enable texture_stencil8
there is no requirement to support native S8 for this,
the texture can be converted to x24s8 fine.
v2: fold fixes from Marek in:
a) put S8 last in the list
b) fix renderable to always test for d/s renderable
fixup the texture case to use a stencil only format
for picking the format for the texture view.
v3: hit fallback for getteximage
v4: put s8 back in front, it shouldn't get picked now (Ilia)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Parts of this were implemented previously, so finish it off.
v2: fix getteximage falling into the integer check
add fixes for the FBO paths, (fbo-stencil8 test).
v3: fix getteximage path harder.
v4: remove swapbytes from getteximage path (Ilia)
v5: brown paper bag the swapbytes removal. (Ilia)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This code was added by Brian Paul in 2009 but, as far as Matt and I can
tell, it's been dead ever since the new GLSL compiler was added.
Reviewed-by: Brian Paul <brianp@vmware.com>
This name better matches what it's actually used for. The patch was
generated with the following command:
for file in *; do
sed -i -e s/brw_compile/brw_codegen/g $file
done
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Since commit 2881b123, we have used 0/~0 for representing booleans on all
gens. However, we still had a bunch of places in the visitor code where we
were still referring to ctx->Const.UniformBooleanTrue. Since this is
always ~0, we can just remove them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This also involves moving revision checking to screen creation time and
passing that into brw_get_device_info so that we can get the right
device_info for early versions of SKL. Since the only place we used
revision was to check for SIMD16 3-src instruction support, it's safe to
remove the revision field from brw_context.
Reviewed-by: Matt Turner <mattst88@gmail.com>
In future tests, we will start relying on devinfo and not just brw in the
compiler. Changing this now keeps these tests from failing in the future.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It wasn't really being used anyway. We used it to assert that gpu_shader5
is supported in the back-end but that should be caught by the front-end.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The bit to enable .z is still commented out, as it is triggering gpu
hangs in 0ad. But at least gl_FragCoord.w works now, and we know what
bits we are *supposed* to set for .z (with that uncommented all piglit
fragcoord tests are passing).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Update formats table with new formats that Ilia has figured out, and fix
sampling from srgb texture and integer vbo's.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This should be more efficient than the previous snprintf() solution.
But more importantly, it avoids a buffer overflow bug that could result
in crashes or unpredictable results when processing very large interface
blocks.
For the app in question, key->length = 103 for some interfaces. The check
if size >= sizeof(hash_key) was insufficient to prevent overflows of the
hash_key[128] array because it didn't account for the terminating zero.
In this case, this caused the call to hash_table_string_hash() to return
different results for identical inputs, and then shader linking failed.
This new solution also takes all structure fields into account instead
of just the first 15 when sizeof(pointer)==8.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If an app called glTextureBarrierNV() without checking if the
extension was available, we'd crash with some gallium drivers
in st_TextureBarrier() because the pipe_context::texture_barrier()
pointer was NULL.
Generate GL_INVALID_OPERATION instead.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
- Do not attempt to create the save folder twice - both dir $@ and
PRIVATE_LOCALEDIR point to the same place.
- Use @ and $(hide), for mkdir and python, to avoid spamming the
output.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
v2 [Emil Velikov]
- Keep the PRIVATE_LOCALEDIR variable.
- Do not use $(@D) but the more widespead $(dir $@)
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Android 5.0 allows modules to generate source into $OUT/gen, which will
then be copied into $OUT/obj and $OUT/obj_$(TARGET_2ND_ARCH) as necessary.
Modules will need to change calls to local-intermediates-dir into
local-generated-sources-dir.
The patch changes local-intermediates-dir into local-generated-sources-dir.
If the Android version is less than 5.0, fallback to local-intermediates-dir.
The patch also fixes the 64-bit building issue of Android 5.0.
v2 [Emil Velikov]
- Keep the LOCAL_UNSTRIPPED_PATH variable.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Define _GNU_SOURCE to enable features (__USE_XOPEN2K and __USE_UNIX98)
required to build the host binaries.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Commit dd6f641303c(mesa: Build with subdir-objects.) removed the SRCDIR
variable, but forgot to update all references of it.
v2: Fix path - must be relative to LOCAL_PATH. (Chih-Wei)
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Required by the i965 driver.
v2:
- Split out the nir_builder_opcodes.h rules.
- Do not unconditionally hide the python command - use $(hide)
- Use LOCAL_EXPORT_C_INCLUDE_DIRS to manage includes for the generated
sources.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
[Emil Velikov: Split from a larger commit, v2]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Similar to e8c5cbfd921(mesa: Add gallium include dirs to more parts of
the tree.)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
... via local_shared_libraries. Otherwise the sync/sync.h header won't
be found.
Note: 10.5 and earlier will need similar change in st/egl.
v2: Append the library to the local_shared_libraries list. (Chih-Wei)
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
... to manage the LIBDRM*_CFLAGS. The former is the recommended approach
by the Android build system developers while the latter has been
depreciated for quite some time.
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
I missed the fact that the ARB_fragment_program SWZ instruction allows
per-component negation. To fix this, move Abs/Negate handling into both
the simple case and the SWZ case's per-component loop.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90000
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The X and Y values come interleaved in g1 (.4-.11 inclusive), so we can
calculate them together with a single add(32) instruction on some
platforms like Broadwell and newer or in SIMD8 elsewhere.
Note that I also moved the PIXEL_X/PIXEL_Y virtual opcodes from before
LINTERP to after it. That's because the writes_accumulator_implicitly()
function in backend_instruction tests for <= LINTERP for determining
whether the instruction indeed writes the accumulator implicitly. The
old FS_OPCODE_PIXEL_X/Y emitted ADD instructions, which did, but the new
opcodes just emit MOVs, which don't. It doesn't matter, since we don't
use these opcodes on Gen4/5 anymore, but in the case that we do...
On Broadwell:
total instructions in shared programs: 7192355 -> 7186224 (-0.09%)
instructions in affected programs: 1190700 -> 1184569 (-0.51%)
helped: 6131
On Haswell:
total instructions in shared programs: 6155979 -> 6152800 (-0.05%)
instructions in affected programs: 652362 -> 649183 (-0.49%)
helped: 3179
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This lets SIMD16 programs on G45 and Gen5 use the PLN instruction.
On Ironlake:
total instructions in shared programs: 5634757 -> 5518055 (-2.07%)
instructions in affected programs: 1745837 -> 1629135 (-6.68%)
helped: 11439
HURT: 4
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These were used only on Gen4 and 5. emit_interpolation_setup_gen6() emits
ADDs directly. The virtual opcodes weren't providing anything useful.
I'm going to repurpose these opcodes, so deleting and readding them makes
it simpler to see what's going on.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Like LINE (commit 92346db0), src0 must have a scalar region. Setting
src1's region to <8,8,1> lets us pass a properly sized combined delta_xy
argument in a few commits without getting a bogus <16,16,1> region.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
LINTERP's src0 is PLN's src1, and PLN's src1 reads exec_size / 4
registers.
Having that information lets us drop the delta_x/y special case code in
split_virtual_grfs().
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
guess_execution_size() does two things:
1. Cope with small destination registers.
2. Cope with SIMD8 vs SIMD16 mode.
This patch replaces the first with a simple if block in brw_set_dest: if
the destination register width is less than 8, you probably want the
execution size to match. (I didn't put this in the 3src block because
it doesn't seem to matter.)
Since only the FS compiler cares about SIMD16 mode, it's easy to just
set the default execution size there.
This pattern was already been proven in the Gen8+ generator, but we
didn't port it back to the existing generator when we combined the two.
This is based on a patch from Ken from about a year ago. I've rebased it
and and fixed a few bugs.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Commit 34df5eb introduced regression to GetActiveUniformBlockiv
when querying one of the following properties:
GL_UNIFORM_BLOCK_ACTIVE_UNIFORMS
GL_UNIFORM_BLOCK_ACTIVE_UNIFORM_INDICES
Implementation counted all uniforms in ubo directly while query should
check first if the uniform in question is _active_.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90109
Reviewed-By: Martin Peres <martin.peres@linux.intel.com>
On Skylake the qpitch value is uploaded as part of the surface state
so we don't need to add the extra rows that are done for other
generations. However for 3D textures it needs to be aligned to the
tile height and for depth/stencil textures it needs to be a multiple
of 8. Unlike previous generations the qpitch is measured as a multiple
of the block size for compressed surfaces. When the horizontal mipmap
layout is used for 1D textures then the qpitch is measured in pixels
instead of rows.
v2: Align the depth/stencil textures to a multiple of 8
v3: Add an assert that ALL_SLICES_AT_EACH_LOD is not used. Ignore the
vertical alignment when picking the qpitch for 1D_ARRAY textures.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The render surface state command for Skylake doesn't have the surface
array spacing bit so it's not possible to select this layout. I think
it was only used in order to make it pick a tightly-packed qpitch
value that doesn't include space for the mipmaps. However this won't
be necessary after the next patch because it will automatically pick a
packed qpitch value whenever first_level==last_level. It is better to
remove this layout entirely on Gen8+ because although it can
effectively be implemented with a small qpitch value when there are no
mipmaps it isn't possible to support the case where there are mipmaps
because in that case the layout is very different.
It could be good to make a similar change for Gen8 if we also change
the layouting code to pick the qpitch value in a similar way.
v2: Make the commit message and comments more convincing
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Clover not longer compile with llvm <= 3.5.0 since e1d363b3.
e1d363b3 implies c++11 and llvm 3.5.0 CXXFLAGS provided it.
No one seems to have noticed it, it's now official.
Acked-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
LLVM removed JITEmitDebugInfo from TargetOptions since they weren't used
v2: Be consistent with the LLVM version check (Aaron Watry)
Signed-off-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
This allows drivers to provide consistent flat shading for quads.
Otherwise a driver that only supported tris would have to force last
provoking vertex when drawing quads (and would have to say that quads
don't follow the provoking vertex convention).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
This should match to how drivers program hardware. flatshade relates to
whether color inputs are interpolated, not the provoking vertex
convention.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
We haven't implemented proper unsynchronized map support on !LLC systems
(pre-SNB, Atom). MapBufferRange with GL_MAP_UNSYNCHRONIZE_BIT will
actually do a synchronized map, probably killing performance.
Also warn on BufferSubData, when we should be doing an unsynchronized
upload, but instead have to do a synchronous map.
v2: Only complain if the buffer is actually busy - we use unsynchronized
maps internally for vertex upload and such, but expect those to not
be busy.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Jason noticed that shader_time was bumping the reference count on the
gl_shader_program and gl_program structures, in code called during
compilation.
Not only were these never unreferenced, but it meant fragment shaders
might be referenced twice (SIMD8 and SIMD16)...or only once.
We don't actually need the programs. We just need their numeric ID and
their language (GLSL/ARB/FF) or KHR_debug label. If there's a label, we
have to strdup it since the underlying program could be deleted.
To be fair, we're not exactly cleaning that up either, but we at least
ralloc it out of the shader_time arrays, so if we ever bother cleaning
those up, they'll go away properly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
It is true that a gl_shader_program with ID 0 will be a fixed-function
fragment program; a gl_program with ID 0 but NULL gl_shader_program
means that it's a fixed-function vertex shader.
But that's not terribly interesting or relevant to what we're doing.
We just need to know that ID 0 means "fixed function".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
0 is not a valid GLSL shader or ARB program ID. For some reason,
shader_time used -1 instead...so we had code to detect 0, then override
it to -1.
We can just delete that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This enables using _mesa_meta_pbo_TexSubImage() to upload data
to R16G16B16X16 texture. Earlier it fell back to slower paths.
Jenkins run shows no piglit regressions.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
SEL and MOV instructions, as long as they don't have source modifiers, are
just copying bits around. This commit adds support to copy propagation to
switch the type of a SEL or MOV instruction as needed so that it can
propagate source modifiers. This is needed because NIR generates integer
SEL and MOV instructions whenver it doesn't know what else to generate.
shader-db results with NIR:
total FS instructions in shared programs: 4360910 -> 4360186 (-0.02%)
FS instructions in affected programs: 59094 -> 58370 (-1.23%)
helped: 341
HURT: 0
GAINED: 2
LOST: 0
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There can be problems with floats and conditional modifiers when
copy-propagating a negated UD source. The problem arises when a source
modifier is applied to a UD value. In this case, a 33-bit representation
is internally used. If you do the following:
1: mov foo:UD 7U
2: mov bar:UD -foo:UD
3: mov out:F bar:UD
the out register will have the value (float)(unt32_t)-7 which is some very
large floating-point number. However, if we allow copy-propagation of the
second mov, we get
1: mov foo:UD 7U
3: mov out:f -bar:UD
and, since the negation is computed in 33-bits, we get a value of -7.0f
which is clearly not the same. This is a similar problem if the
instruction has a conditional modifier where the 33-bit value is used in
the comparison and not the 32-bit version.
Previously, we checked the source to be copied for the negate and then
checked the source being propagated to for the type. This isn't quite what
we want because we are really just looking for negated UD sources. A check
later in the file ensures that both ends of the propagate have the right
type so it works. However, if we relax the restriction that both ends of
the propagation have the same type, it ends up causing us to bail early in
cases we don't want.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
isaml needs to scale up coords based on LoD. Also fix bogus bary.f
varying # when there are non-bary frag shader inputs. And use sub.s of
a positive immediate rather than add.s of negative (since CP is better
about figuring out that those can be collapsed into the cat2 instr).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For now, completely flatten if/else blocks. That will almost certainly
change once we have flow control.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Basically just sync up the cmdstream emit parts to match the changes
already done on a3xx.
Also, fix scheduling for mem instructions. This is needed on a4xx, and
I am a bit surprised it isn't needed for a3xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
There is a level param stashed away in the .w component of the first
src.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: move ishl into ttn (instead of driver backend) to keep the units
consistent between immediate and indirect offsets
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: also use ttn_src_for_indirect() everywhere for addr access, rather
than open-coding it for INPUT/CONST srcs
v3: move ralloc out of ttn_src_for_indirect() into the one call site
that needs a ptr
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
To match CALLOC_STRUCT macro.
Fixes memory corruption on Windows when u_memory's memory debugging is
enabled.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When running piglit w/ llvmpipe on Windows several tests terminate
abnormally just when the test exits.
The problem was that LLVMContextDispose was being called
after LLVM global destructors.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This fixes piglit shaders@glsl-fs-uniform-array-loop-unroll with immediate
shader compilation - it's a compiler test, so it has never been translated
to TGSI before.
Cc: 10.4 10.5 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
As mentioned by Michel Dänzer for LLVM >= 3.6 we create the
LLVMTargetMachine (with triple amdgcn--), as we setup the radeonsi
context. For older LLVM or hardware (r600) the triple is always r600--
and is created at a later stage - radeon_llvm_compile()
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Commit 5a06ee738 added a step to the generator to set up the message
header when generating the VS_OPCODE_PULL_CONSTANT_LOAD_GEN7
instruction. That pseudo opcode is implemented in terms of multiple
actual opcodes, one of which writes to one of the source registers in
order to set up the message header. This causes problems because the
scheduler isn't aware that the source register is written to and it
can end up reorganising the instructions incorrectly such that the
write to the source register overwrites a needed value from a previous
instruction. This problem was presenting itself as a rendering error
in the weapon in Enemy Territory: Quake Wars.
Since commit 588859e1 there is an additional problem that the double
register allocated to include the message header would end up being
split into two. This wasn't happening previously because the code to
split registers was explicitly avoided for instructions that are
sending from the GRF.
This patch fixes both problems by splitting the code to set up the
message header into a new pseudo opcode so that it will be done
outside of the generator. This new opcode has the header register as a
destination so the scheduler can recognise that the register is
written to. This has the additional benefit that the scheduler can
optimise the message header slightly better by moving the mov
instructions further away from the send instructions.
On Skylake it appears to fix the following three Piglit tests without
causing any regressions:
gs-float-array-variable-index
gs-mat3x4-row-major
gs-mat4x3-row-major
I think we actually may need to do something similar for the fs
backend and possibly for message headers from regular texture sampling
but I'm not entirely sure.
v2: Make sure the exec-size is retained as 8 for the mov instruction
to initialise the header from g0. This was accidentally lost
during a rebase on top of 07c571a39f.
Split the patch into two so that the helper function is a separate
change.
Fix emitting the MOV instruction on Gen7.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89058
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
There were three places in the visitor that had a similar chunk of
code to emit the VS_OPCODE_PULL_CONSTANT_LOAD opcode using a register
for the offset. This patch combines the chunks into a helper function
to reduce the code duplication. It will also be useful in the next
patch to expand what happens on Gen9+. This shouldn't introduce any
functional changes.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
`interface` is a define on Windows -- an alias for `struct` keyword,
used when declaring COM interfaces in C or C++.
So use instead `programInterface`, therefore matching the name used
in GL_ARB_program_interface_query spec/headers, which was renamed exactly
for the same reason:
"Revision 10, May 10, 2012 (pbrown)
- Rename the formal parameter <interface> used by the functions in this
extension to <programInterface>. Certain versions of the Microsoft
C/C++ compiler and/or its headers cause "interface" to be treated as a
reserved keyword."
Trivial.
Patch adds new function 'mesa_bufferiv' and refactors existing
GetActiveUniformBlockiv and GetActiveAtomicCounterBufferiv to
use it.
corresponding Piglit tests:
arb_uniform_buffer_object*
arb_shader_atomic_counters*
(Many tests hit the corresponding queries.)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Instead of iterating IR, retrieve required information through
the new program resource functions.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Patch adds required helper functions to shaderapi.h and
the actual implementation.
The property query functionality can be tested with tests for
following functions that are refactored by later patches:
GetActiveAtomicCounterBufferiv
GetActiveUniformBlockiv
GetActiveUniformsiv
v2: code cleanup (Ilia Mirkin)
add bufSize < 0 check and error out
fix is_resource_referenced to return bool
check for propCount and bufSize, fixes in buffer_prop
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Patch adds required helper functions to shaderapi.h and
the actual implementation.
The added functionality can be tested by tests for following
functions that are refactored by later patches:
GetFragDataIndex
v2: return -1 if output not referenced by fragment stage
(Ilia Mirkin)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Patch adds required helper functions to shaderapi.h and
the actual implementation.
corresponding Piglit test:
arb_program_interface_query-resource-location
The added functionality can be tested by tests for following
functions that are refactored by later patches:
GetAttribLocation
GetUniformLocation
GetFragDataLocation
v2: code cleanup, changes to array element
syntax checking (Ilia Mirkin)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Patch adds required helper functions to shaderapi.h and
the actual implementation.
Name generation copied from '_mesa_get_uniform_name' which can
be removed later by refactoring functions to use resource list.
The added functionality can be tested by tests for following
functions that are refactored by later patches:
GetActiveUniformName
GetActiveUniformBlockName
v2: no index for geometry shader inputs (Ilia Mirkin)
add bufSize < 0 check and error out
validate enum
corresponding Piglit test:
arb_program_interface_query-getprogramresourcename
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.org>
Patch adds required helper functions to shaderapi.h and
the actual implementation.
v2: code cleanup (Ilia Mirkin)
corresponding Piglit test:
arb_program_interface_query-getprogramresourceindex
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.org>
Patch adds required helper functions to shaderapi.h and
the actual implementation.
v2: code cleanup (Ilia Mirkin)
fix array size fo xfb varyings
validate programInterface and throw error
v3: put GL_MAX_NUM_COMPATIBLE_SUBROUTINES where
it belongs
corresponding Piglit test:
arb_program_interface_query-getprograminterfaceiv
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Patch adds ProgramResourceList to gl_shader_program structure.
List contains references to active program resources and is
constructed during linking phase.
This list will be used by follow-up patches to implement hooks
for GL_ARB_program_interface_query. It can be also used to
implement any of the older shader program query APIs.
v2: code cleanups + note for SSBO and subroutines (Ilia Mirkin)
v3: code cleanups + assert(MESA_SHADER_STAGES < 8) (Martin Peres)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
v2: update dispatch_sanity test (Jason Ekstrand)
+ small code cleanups
v3: xml and Makefile fixes (Ilia Mirkin, Matt Turner)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Previously linker did not take in to account case where one would
have only gs and fs (with SSO), patch adds the case by refactoring
code around assign_varying_locations. This makes sure locations for
gs get populated correctly.
This was found with some of the SSO subtests of Martin's upcoming
GetProgramInterfaceiv Piglit test which passes with the patch, no
Piglit regressions.
v2: code cleanups (Martin Peres)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Previously GLX_EXT_create_context_es2_profile was marked as "direct
only" so that it would not depend on server support. Since the
extension required functions that are part of
GLX_ARB_create_context_profile, support for the EXT was disabled if the
ARB was not supported.
This was complete rubbish. If the server supported the ARB but not the
EXT, sending a request with GLX_CONTEXT_ES2_PROFILE_BIT_EXT would result
in GLXBadProfileARB.
Instead of the misguided hack, make GLX_EXT_create_context_es2_profile
properly depend on server support by not marking it as "direct only."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
We could potentially support the right combination of 8888 to 565, but the
important thing for now is to not mix up our orderings of 8888. Fixes
fbo-copyteximage regressions.
Now, if we set MESA_LOG_FILE and MESA_GLSL=dump, all the shader info
will get logged to the named file instead of stderr.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
_mesa_log() simply writes log information to stderr or MESA_LOG_FILE.
_mesa_get_log_file() returns the file handle to use for logging.
This will be used for shader dumping/logging instead of always printing
to stderr.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
- Use GetModuleHandle instead of LoadLibrary to avoid incrementing the
opengl32.dll reference count (otherwise the opengl32.dll will linger
in memory forever.)
- Ensure we use our fake wglCreateContext/wglDeleteContext when using
Mesa as a drop-in replacement for opengl32.dll
Untested. Just noticed by accident.
Reviewed-by: Brian Paul <brianp@vmware.com>
When a vec has more elements than row components in a matrix, the
code could end up failing an assert inside assign_to_matrix_column().
This patch makes sure that when there is still room in the matrix for
more elements (but in other columns of the matrix), the data is actually
assigned.
This patch fixes the following dEQP test:
dEQP-GLES3.functional.shaders.conversions.matrix_combine.float_bvec4_ivec2_bool_to_mat4x2_vertex
dEQP-GLES3.functional.shaders.conversions.matrix_combine.float_bvec4_ivec2_bool_to_mat4x2_fragment
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
We weren't comparing the right number of components when checking
swizzles. Use nir_ssa_alu_instr_num_src_components() to do the right
thing.
No piglit regressions, and no fixes either.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Certain platforms support the ability to sample from a texture, and write it out
to the file RT - thus saving a costly send instructions (note that this is a
potnential win if one wanted to backport to a tag that didn't have the patch
from Topi which removed excess MOVs from LOAD_PAYLOAD - 97caf5fa04),
v2: Modify the algorithm. Instead of iterating in reverse through blocks and
insts, since the last block/inst is the only thing which can benefit. Rebased
on top of Ken's patching modifying is_last_send
v3: Rebased over almost 2 months, and Incorporated feedback from Matt:
Some comment typo fixes and rewordings.
Whitespace
Move the optimization pass outside of the optimize loop
v4: Some cosmetic changes requested from Ken. These changes ensured that the
optimization function always returned true when an optimization occurred, and
false when one did not. This behavior did not exist with the original patch. As
a result, having the separate helper function which Matt did not like no longer
made sense, and so now I believe everyone should be happy.
Benchmark (n=20) %diff
*OglBatch5 -1.4
*OglBatch7 -1.79
OglFillTexMulti 5.57
OglFillTexSingle 1.16
OglShMapPcf 0.05
OglTexFilterAniso 3.01
OglTexFilterTri 1.94
No piglit regressions:
(http://otc-gfxtest-01.jf.intel.com:8080/view/dev/job/bwidawsk/112/)
[*] I believe my measurements are incorrect for Batch5-7. If I add this new
optimization, but never emit the new instruction I see similar results.
v5: Remove declaration of combine_tex_header since v4 dropped that function
(Ben)
Remove check for impossible case of an empty block (Matt)
Set dest earlier to avoid extra special-casing in generate_tex (Matt)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Based originally on a patch from Ken in May 2014 of the same title. Things
changed enough that I didn't feel comfortable leaving his authorship.
v2: Replace fp->UsesKill with wm_prog_data->uses_kill. Since Ken took the time
to also explain the difference to me, here is his explanation for posterity:
"fp->UsesKill indicates that a ARB_fragment_program shader uses the KIL
instruction, or that a GLSL shader uses the "discard" insntruction
(which are analogous).
On Gen4-5, we sometimes have to simulate OpenGL's "Alpha Test" feature
by emitting shader code that implicitly does a "discard" instruction.
In the key setup, we do:
/* key->alpha_test_func means simulating alpha testing via discards,
* so the shader definitely kills pixels.
*/
prog_data.uses_kill = fp->program.UsesKill || key->alpha_test_func;
Even though the shader may not technically contain a "discard", we need
to act as if it does.
I've also been trying to move the i965 state setup code to use
brw_wm_prog_key for everything, rather than poking at core Mesa's
gl_program/gl_fragment_program/gl_shader/gl_shader_program structures.
--Ken"
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When an instruction has a side effect, it impacts the available options when
reordering an instruction. As the EOT flag is an implied write to the render
target in the FS, it can be considered a side effect.
This patch shouldn't actually have any impact on the current code since the EOT
flag implies that the opcode is already one with side effects,
FS_OPCODE_FB_WRITE. The next patch however will introduce an optimization
whereby the EOT flag can occur with an opcode SHADER_OPCODE_TEX, and as that
instruction will perform the same implied write to the render target, it cannot
be reordered.
v2: Remove extra whitespace (Matt)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Consistently just use C99's __func__ everywhere.
No functional changes.
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Marius Predut <marius.predut@intel.com>
Consistently just use C99's __func__ everywhere.
No functional changes.
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Marius Predut <marius.predut@intel.com>
Consistently just use C99's __func__ everywhere.
No functional changes.
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Marius Predut <marius.predut@intel.com>
Consistently just use C99's __func__ everywhere.
The patch was verified with Microsoft Visual studio 2013
redistributable package(RTM version number: 18.0.21005.1)
Next MSVC versions intends to support __func__.
No functional changes.
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Marius Predut <marius.predut@intel.com>
Consistently just use C99's __func__ everywhere.
The patch was verified with Microsoft Visual studio 2013
redistributable package(RTM version number: 18.0.21005.1)
Next MSVC versions intends to support __func__.
No functional changes.
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Marius Predut <marius.predut@intel.com>
Consistently just use C99's __func__ everywhere.
The patch was verified with Microsoft Visual studio 2013
redistributable package(RTM version number: 18.0.21005.1)
Next MSVC versions intends to support __func__.
No functional changes.
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Marius Predut <marius.predut@intel.com>
Consistently just use C99's __func__ everywhere.
The patch was verified with Microsoft Visual studio 2013
redistributable package(RTM version number: 18.0.21005.1)
Next MSVC versions intends to support __func__.
No functional changes.
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Marius Predut <marius.predut@intel.com>
Consistently just use C99's __func__ everywhere.
The patch was verified with Microsoft Visual studio 2013
redistributable package(RTM version number: 18.0.21005.1)
Next MSVC versions intends to support __func__.
No functional changes.
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Marius Predut <marius.predut@intel.com>
Commit e93566a15c changed the message header code needed to
make Skylake use SIMD4x2 so that it uses a register with width 4
instead of 8 as the source register in the send message. However it
also changed the width for the dest in the MOV instruction which is
used to initialise the header register with the values from g0. The
width of the destination is used to determine the exec size in
brw_set_dest so this would end up making the MOV have an exec size of
4. I think this would end up leaving the top half of the register
uninitialised. The top half of the header has meaningful values so
this probably isn't a good idea.
This patch just casts the dest register for the MOV instruction back
to a vec8 to fix it. It doesn't cause any changes to a Piglit run.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Commit b616164 added an optimization of b2f generation of a comparison.
It also included an extra optimization of one of the comparison values
is a constant of zero. The trick was that some value was known to be
zero, so that value could be used in the SEL instruction instead of
potentially loading 0.0 into a register.
This change switched the order of the arguments to the SEL, and, for
some unknown reason, I thought that the predicate should therefore
only be inverted for the == case. Clearly, it should always be
inverted.
Fixes piglit fs-notEqual-of-expression.shader_test and
fs-equal-of-expression.shader_test.
v2: Don't do the "register already has zero" optimization for the '== 0'
case. In that case, the register does not have zero when we want to
produce a zero result.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89722
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Tested-by: Lu Hua <huax.lu@intel.com>
new_prim was declared as a stack variable within a nested scope; we
tried to retain a pointer to that data beyond the scope, which is bogus.
GCC with -O1 eliminated most of the code that set new_prim's fields.
Move the declaration to fix the bug.
v2: Also fix new_ib (thanks to Matt Turner and Ben Widawsky).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81025
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Cc: mesa-stable@lists.freedesktop.org
I finally managed to dig up some information on our mysterious GPU hangs.
A wiki page from the Crestline validation team mentions that they found
a GPU hang in "Serious Sam 2" (on Windows) with remarkably similar
conditions to the ones we've seen in Google Chrome and glmark2.
Apparently, if WM_STATE has "PS Use Source Depth" enabled, CC_STATE has
most depth state disabled, and you issue a CONSTANT_BUFFER command and
immediately draw, the depth interpolator makes a small mistake that
leads to hangs.
Most of the traces I looked at contained a CONSTANT_BUFFER packet
immediately followed by 3DPRIMITIVE, or at least very few packets.
It appears they also have "PS Use Source Depth" enabled - either at the
hang, or a little before it. So I think this is our bug.
The workaround is to emit a non-pipelined state packet after issuing a
CONSTANT_BUFFER packet. This is really similar to the workaround I
developed in commit c4fd0c9052.
v2: Fix word-wrapping issues.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In commit 4ebeb71573, I deleted the
emit_shader_time_end() call in emit_urb_writes(). But I failed to add
it to run_vs(), as I intended. So no data was recorded at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For blitting, we want to fire off an RCL-only job. This takes a bit of
tweaking in our validation and the simulator support (and corresponding
new code in the kernel).
I want to be able to have multiple jobs being set up at the same time (for
example, a render job to do a little fixup blit in the course of doing a
render to the main FBO).
So, it turns out my simulator doesn't *quite* match the hardware. And the
errata about raster textures tells you most of what's wrong, but there's
still stuff wrong after that. Instead, if we're asked to sample from
raster, we'll just blit it to a tiled temporary.
Raster textures should only be screen scanout, and word is that it's
faster to copy to tiled using the tiling engine first than to texture from
an entire raster texture, anyway.
These are required to get piglit's idiv tests working. The
unsigned<->float conversions are wrong, but are good enough to get
piglit's small ranges of values working.
We create textures internally for texsubimage, and we use
the values from sub image to create a new texture, however
we don't align these to valid sizes, and cube map arrays
must have an array size aligned to 6.
This fixes texsubimage cube_map_array on CAYMAN at least,
(it was causing GPU hang and bad values), it probably
also fixes it on radeonsi and evergreen.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89957
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since we can subimage upload a number of cube map array layers,
that aren't a complete cube map array, we should specify things
as a 2D array and blit from that.
Suggested by Ilia Mirkin as an alternate fix for texsubimage
cube map array issues.
seems to work just as well.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This change fixes a regression with timer queries introduced with
commit 3eb6258. There the pending batchbuffer is flushed
only if glEndQuery is executed. This present change adds such
a flush to glQueryCounter which also schedules a value query
just like glEndQuery does. The patch fixes GPU timer queries
going mad from within osgviewer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Cc: mesa-stable@lists.freedesktop.org
We're over-allocating our BCL in vc4_draw.c, so this never mattered.
However, new RCL-only blit support might end up here without having set up
any BCL contents.
Coverity is confused by the "float < int / 2" expression and suggests
casting MAX_GLUINT to unsigned, which I believe it was supposed to have
been already.
Reviewed-by: Brian Paul <brianp@vmware.com>
Allow glEGLImageTargetRenderbufferStorageOES and
glEGLImageTargetTexture2DOES for dma_buf EGLImages if the image is
a single RGBA8 unorm plane. This is safe, despite fast color clears,
because i965 disables allocation of auxiliary buffers for EGLImages.
Chrome OS needs this, because its compositor uses dma_buf EGLImages for
its scanout buffers.
Testing:
- Tested on Ivybridge Chromebook Pixel with WebGL Aquarium and
YouTube.
- No Piglit regressions on Broadwell with `piglit run -p gbm
tests/quick.py`, with my Piglit patches that update the
EGL_EXT_image_dma_buf_import tests.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
EGL does not yet have extensions to manage the flushing and invalidating
of driver-internal aux buffers. So we must disable aux buffers of
dma_buf-backed EGLImages in order to safely render into them.
This patch is obviously needed for renderbufers. It's also needed for
textures because the user can attach the texture to a framebuffer and
because the driver sometimes renders to textures for internal reasons.
Testing:
- Tested on Ivybridge Chromebook Pixel with WebGL Aquarium and
YouTube.
- No Piglit regressions on Broadwell with `piglit run -p gbm
tests/quick.py`.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Add parameter 'bool disable_aux_buffers'.
This is a refactor patch. The patch changes no behavior because the new
parameter is false in every call.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The new field disables allocation of auxiliary buffers, such as the HiZ
buffer and MCS buffer. This is useful for sharing the miptree bo with an
external client that doesn't understand auxiliary buffers.
We need this field to safely render to a buffer that was imported with
EGL_EXT_image_dma_buf_import, because EGL does not yet have extensions
to manage flushing and invalidating auxiliary buffers.
Nothing yet enables this field. That's left to follow-up patches.
Testing:
- Tested on Ivybridge Chromebook Pixel with WebGL Aquarium and
YouTube.
- No Piglit regressions on Broadwell with `piglit run -p gbm
tests/quick.py`.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Every caller of this function uses it to determine if the current
miptree needs a hiz buffer to be allocated. Strangely, the function
doesn't take a miptree argument. So, this function effectively decides
if and when a miptree's hiz buffer gets allocated without inspecting the
miptree itself. Luckily, the driver behaves correctly despite the
brw_is_hiz_depth_format's quirk.
I will soon make some changes to the miptree that will require
inspecting the miptree to determine if it needs a hiz buffer. So this
patch renames
brw_is_hiz_depth_format -> intel_miptree_wants_hiz_buffer
and gives it a miptree parameter.
This patch shouldn't change any behavior.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
I'm not sure what was the original intention, but currently
USE_EXTERNAL_DXTN_LIB always ends up defined, one way or another.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Missed out with commit d99135b2e9b(configure: nuke
--with-max-{width,height})
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The option was deprecated with commit 959e83d6507(clover: Adapt libclc's
INCLUDEDIR and LIBEXECDIR to make use of the new introduced libclc.pc.)
back in 2012 with mesa 9.2.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Previously, we translated into NIR and did all the optimizations and
lowering as part of running fs_visitor. This meant that we did all of
that work twice for fragment shaders - once for SIMD8, and again for
SIMD16. We also had to redo it every time we hit a state based
recompile.
We now generate NIR once at link time. ARB programs don't have linking,
so we instead generate it at ProgramStringNotify time.
Mesa's fixed function vertex program handling doesn't bother to inform
the driver about new programs at all (which is rather mean), so we
generate NIR at the last minute, if it hasn't happened already.
shader-db runs ~9.4% faster on my i7-5600U, with a release build.
v2: Check NirOptions != NULL in ProgramStringNotify(). Don't bother
using _mesa_program_enum_to_shader_stage as we already know it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Storing this here is pretty sketchy - I don't know if any driver other
than i965 will want to use it. But this will make it a lot easier to
generate NIR code at link time. We'll probably rework it anyway.
(Ian suggested making nir_assign_var_locations_scalar_direct_first
simply modify the nir_shader's fields, rather than passing pointers
to them. If this stays long term, we should do that. But Jason and
I suspect we'll be reworking this area again in the near future.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Just build up arrays for src0/src1, and use create_collect()..
Also add back missing .3d flag for 3d/cube textures.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
I noticed some cases where we where trying to copy-propagate indirect
src's into places they cannot go, like 2nd src for cat3 (mad, etc).
Expand out valid_flags() to be aware of relativ flag, and fix up a few
related spots.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When we get in a scenario where we cannot schedule any more instructions
due to address register conflict, clone the instruction that writes the
address register, and switch the remaining unscheduled users for the
current address register over to the new clone.
This is simpler and more robust than the previous attempt (which tried
and sometimes failed to ensure all other dependencies of users of the
address register were scheduled first).. hint it would try to schedule
instructions that were not actually needed for any output value.
We probably need to do the same with predicate register, although so far
it isn't so heavily used so we aren't running into problems with it
(yet).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A bit fugly.. try and make this cleaner.. note if we hoist all the
get_addr() out of the loop we can drop the hashtable and just use
create_addr()..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It probably *should* be an assert, but for now TGSI f/e isn't very good
about dealing w/ CONST vs ABS/NEG. So for debug builds, print a warning
instead of crashing with an assert for now.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Without this, a3xx breaks.. a4xx would too if it had already implemented
support for passing driver params.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For a normal MAD (ie. not MADSH), if first source is gpr and second
source is const, we can swap the first two sources to avoid needing a
mov instruction.
This gives back the biggest advantage TGSI f/e had over NIR f/e for
common shaders, since TGSI f/e had this logic in the f/e. Note that
doing this in copy-prop step has the advantage that it will also work
for cases like:
MOV TEMP[b], CONST[x]
MAD TEMP[d], TEMP[a], TEMP[b], TEMP[c]
Signed-off-by: Rob Clark <robclark@freedesktop.org>
I guess I was looking too much at how lower_system_values worked when
writing lower_idiv.
Since ttn wasn't emitting load_var for sysvals and the only drivers
using lower_idiv were using ttn, I think nothing was broken as a result.
But might as well fix this before it becomes a problem.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Originally you had to have one or the other. But actually I don't want
either. (Or rather I want whatever is the minimum # of instructions.)
TODO: not sure where the best place to insert a check that driver hasn't
set *both* lower_negate and lower_sub?
Signed-off-by: Rob Clark <robclark@freedesktop.org>
So far just the system values that freedreno supports, so we may add
more later.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
With TXD we also have the ddx/ddy sources (before the sampler).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Split out from ttn_tex() since it is kind of a weird instruction that
maps to two NIR opcodes, and it was cleaner this way.
v2: query_levels doesn't take any args
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We'll need this as well for TXQ. Split this out first to reduce noise
in the next patch.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since the rest of NIR really would rather have these as variables rather
than registers, create a nir_variable per array. But rather than
completely re-arrange ttn to be variable based rather than register
based, keep the registers. In the cases where there is a matching var
for the reg, ttn_emit_instruction will append the appropriate intrinsic
to get things back from the shadow reg into the variable.
NOTE: this doesn't quite handle TEMP[ADDR[]] when the DCL doesn't give
an array id. But those just kinda suck, and should really go away.
AFAICT we don't get those from glsl. Might be an issue for some other
state tracker.
v2: rework to use load_var/store_var with deref chains
v3: create new "burner" reg for temporarily holding the (potentially
writemask'd) dest after each instruction; add load_var to initialize
temporary dest in case not all components are overwritten
v4: review comments: asserts and use ttn_src_for_indirect() in
ttn_array_deref() so we can drop later patch converting to use vec1 for
addr reg (since ttn_src_for_indirect() handles the imov to vec1 from
tgsi addr component that we want)
v5: rebase: new requirements about parent mem ctx for derefs
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Extract tgsi_dst->Index into a local.. split out from 'gallium/ttn: add
support for temp arrays' for noise reduction..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
AC_MSG_ERROR([--enable-gallium-llvm is required when building $1])
fi
LLVM_REQUIRED_VERSION_MAJOR="3"
LLVM_REQUIRED_VERSION_MINOR="4"
LLVM_REQUIRED_VERSION_PATCH="2"
if test "${LLVM_VERSION_INT}${LLVM_VERSION_PATCH}" -lt "${LLVM_REQUIRED_VERSION_MAJOR}0${LLVM_REQUIRED_VERSION_MINOR}${LLVM_REQUIRED_VERSION_PATCH}"; then
AC_MSG_ERROR([LLVM $LLVM_REQUIRED_VERSION_MAJOR.$LLVM_REQUIRED_VERSION_MINOR.$LLVM_REQUIRED_VERSION_PATCH or newer is required for $1])
fi
llvm_check_version_for "3" "4" "2" $1
if test true && $LLVM_CONFIG --targets-built | grep -qvw 'R600' ; then
AC_MSG_ERROR([LLVM R600 Target not enabled. You can enable it when building the LLVM
sources with the --enable-experimental-targets=R600
@@ -2392,6 +2247,7 @@ AC_CONFIG_FILES([Makefile
src/gallium/targets/libgl-xlib/Makefile
src/gallium/targets/omx/Makefile
src/gallium/targets/opencl/Makefile
src/gallium/targets/opencl/mesa.icd
src/gallium/targets/osmesa/Makefile
src/gallium/targets/osmesa/osmesa.pc
src/gallium/targets/pipe-loader/Makefile
@@ -2527,12 +2383,6 @@ else
echo " Gallium: no"
fi
dnl Shader cache
echo ""
echo " Shader cache: $enable_shader_cache"
if test "x$enable_shader_cache" = "xyes"; then
echo " With SHA1 from: $with_sha1"
fi
dnl Libraries
echo ""
@@ -2556,6 +2406,7 @@ if test "x$MESA_LLVM" = x1; then
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=69226">Bug 69226</a> - Cannot enable basic shaders with Second Life aborts attempt</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71591">Bug 71591</a> - Second Life shaders fail to compile (extension declared in middle of shader)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88521">Bug 88521</a> - GLBenchmark 2.7 TRex renders with artifacts on Gen8 with !UXA</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89455">Bug 89455</a> - [NVC0/Gallium] Unigine Heaven black and white boxes</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89689">Bug 89689</a> - [Regression] Weston on DRM backend won't start with new version of mesa</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90130">Bug 90130</a> - gl_PrimitiveId seems to reset at 340</li>
</ul>
<h2>Changes</h2>
<p>Boyan Ding (1):</p>
<ul>
<li>i965: Add XRGB8888 format to intel_screen_make_configs</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>docs: Add sha256 sums for the 10.5.4 release</li>
<li>r300: do not link against libdrm_intel</li>
<li>Update version to 10.5.5</li>
</ul>
<p>Ilia Mirkin (4):</p>
<ul>
<li>nvc0/ir: flush denorms to zero in non-compute shaders</li>
<li>gk110/ir: fix set with a register dest to not auto-set the abs flag</li>
<li>nvc0/ir: fix predicated PFETCH emission</li>
<li>nv50/ir: fix asFlow() const helper for OP_JOIN</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions.</li>
<li>i965: Disallow linear blits that are not cacheline aligned.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=60797">Bug 60797</a> - 1px lines in octave plot aliased to 0</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=67564">Bug 67564</a> - HiZ buffers are much larger than necessary</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=69226">Bug 69226</a> - Cannot enable basic shaders with Second Life aborts attempt</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71591">Bug 71591</a> - Second Life shaders fail to compile (extension declared in middle of shader)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79202">Bug 79202</a> - valgrind errors in glsl-fs-uniform-array-loop-unroll.shader_test; random code generation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86837">Bug 86837</a> - kodi segfault since auxiliary/vl: rework the build of the VL code</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86944">Bug 86944</a> - glsl_parser_extras.cpp", line 1455: Error: Badly formed expression. (Oracle Studio)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86974">Bug 86974</a> - INTEL_DEBUG=shader_time always asserts in fs_generator::generate_code() when Mesa is built with --enable-debug (= with asserts)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88246">Bug 88246</a> - Commit 2881b12 causes 43 DrawElements test regressions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88248">Bug 88248</a> - Calling glClear while there is an occlusion query in progress messes up the results</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88521">Bug 88521</a> - GLBenchmark 2.7 TRex renders with artifacts on Gen8 with !UXA</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88534">Bug 88534</a> - include/c11/threads_posix.h PTHREAD_MUTEX_RECURSIVE_NP not defined</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88561">Bug 88561</a> - [radeonsi][regression,bisected] Depth test/buffer issues in Portal</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88815">Bug 88815</a> - Incorrect handling of GLSL #line directive</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88883">Bug 88883</a> - ir-a2xx.c: variable changed in assert statement</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88885">Bug 88885</a> - Transform feedback uses incorrect interleaving if a previous draw did not write gl_Position</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89039">Bug 89039</a> - [SKL]etqw system hang</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89058">Bug 89058</a> - [SKL]Render error in some games (etqw-demo, nexuiz, portal)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89068">Bug 89068</a> - glTexImage2D regression by texstore_rgba switch to _mesa_format_convert</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89069">Bug 89069</a> - Lack of grass in The Talos Principle on radeonsi (native\wine\nine)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89131">Bug 89131</a> - [Bisected] Graphical corruption in Weston, shows old framebuffer pieces</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89156">Bug 89156</a> - r300g: GL_COMPRESSED_RED_RGTC1 / ATI1N support broken</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89180">Bug 89180</a> - [IVB regression] Rendering issues in Mass Effect through VMware Workstation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89210">Bug 89210</a> - GS statistics fail on SNB</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89224">Bug 89224</a> - Incorrect rendering of Unigine Valley running in VM on VMware Workstation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89260">Bug 89260</a> - macros.h:34:25: fatal error: util/u_math.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89292">Bug 89292</a> - [regression,bisected] incomplete screenshots in some cases</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89328">Bug 89328</a> - python required to build Mesa release tarballs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89342">Bug 89342</a> - main/light.c:159:62: error: 'M_PI' undeclared (first use in this function)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89343">Bug 89343</a> - compiler/tests/radeon_compiler_optimize_tests.c:43:3: error: implicit declaration of function ‘fprintf’ [-Werror=implicit-function-declaration]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89345">Bug 89345</a> - imports.h:452:58: error: expected declaration specifiers or '...' before 'va_list'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89364">Bug 89364</a> - c99_alloca.h:40:22: fatal error: alloca.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89679">Bug 89679</a> - [NV50] Portal/Half-Life 2 will not start (native Steam)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89689">Bug 89689</a> - [Regression] Weston on DRM backend won't start with new version of mesa</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89726">Bug 89726</a> - [Bisected] dEQP-GLES3: uniform linking logic in the presence of structs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89746">Bug 89746</a> - Mesa and LLVM 3.6+ break opengl for genymotion</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89754">Bug 89754</a> - vertexAttrib fails WebGL Conformance test with mesa drivers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89758">Bug 89758</a> - pow WebGL Conformance test with mesa drivers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89759">Bug 89759</a> - WebGL OGL ES GLSL conformance test with mesa drivers fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89963">Bug 89963</a> - lp_bld_debug.cpp:100:31: error: no matching function for call to ‘llvm::raw_ostream::raw_ostream()’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90310">Bug 90310</a> - Fails to build gallium_dri.so at linking stage with clang because of multiple redefinitions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90350">Bug 90350</a> - [G96] Portal's portal are incorrectly rendered</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90363">Bug 90363</a> - [nv50] HW state is not reset correctly when using a new GL context</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90397">Bug 90397</a> - ARB_program_interface_query: glGetProgramResourceiv() returns wrong value for GL_REFERENCED_BY_*_SHADER prop for GL_UNIFORM for members of an interface block with an instance name</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90466">Bug 90466</a> - arm: linker error ndefined reference to `nir_metadata_preserve'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90520">Bug 90520</a> - Register spilling clobbers registers used elsewhere in the shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90830">Bug 90830</a> - [bsw bisected regression] GPU hang for spec.arb_gpu_shader5.execution.sampler_array_indexing.vs-nonzero-base</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90839">Bug 90839</a> - [10.5.5/10.6 regression, bisected] PBO glDrawPixels no longer using blit fastpath</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90347">Bug 90347</a> - [NVE0+] Failure to insert texbar under some circumstances (causing bad colors in Terasology)</li>
</ul>
<h2>Changes</h2>
<p>Anuj Phogat (4):</p>
<ul>
<li>mesa: Handle integer formats in need_rgb_to_luminance_conversion()</li>
<li>mesa: Use helper function need_rgb_to_luminance_conversion()</li>
<li>mesa: Turn need_rgb_to_luminance_conversion() in to a global function</li>
<li>meta: Abort meta path if ReadPixels need rgb to luminance conversion</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=73528">Bug 73528</a> - Deferred lighting in Second Life causes system hiccups and screen flickering</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=80500">Bug 80500</a> - Flickering shadows in unreleased title trace</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82186">Bug 82186</a> - [r600g] BARTS GPU lockup with minecraft shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91047">Bug 91047</a> - [SNB Bisected] Messed up Fog in Super Smash Bros. Melee in Dolphin</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91056">Bug 91056</a> - The Bard's Tale (2005, native) has rendering issues</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91117">Bug 91117</a> - Nimbus (running in wine) has rendering issues, objects are semi-transparent</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91124">Bug 91124</a> - Civilization V (in Wine) has rendering issues: text missing, menu bar corrupted</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85252">Bug 85252</a> - Segfault in compiler while processing ternary operator with void arguments</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91570">Bug 91570</a> - Upgrading mesa to 10.6 causes segfault in OpenGL applications with GeForce4 MX 440 / AGP 8X</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91610">Bug 91610</a> - [BSW] GPU hang for spec.shaders.point-vertex-id gl_instanceid divisor</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (1):</p>
<ul>
<li>glx: Fix __glXWireToEvent for BufferSwapComplete</li>
</ul>
<p>Alex Deucher (2):</p>
<ul>
<li>radeonsi: add new OLAND pci id</li>
<li>radeonsi: properly set the raster_config for KV</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.4</li>
<li>vc4: add missing nir include, to fix the build</li>
<li>Revert "radeonsi: properly set the raster_config for KV"</li>
<li>Update version to 10.6.5</li>
</ul>
<p>Frank Binns (1):</p>
<ul>
<li>egl/x11: don't abort when creating a DRI2 drawable fails</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>nouveau: no need to do tnl wakeup, state updates are always hooked up</li>
<li>gm107/ir: indirect handle goes first on maxwell also</li>
<li>nv50,nvc0: take level into account when doing eng2d multi-layer blits</li>
</ul>
<p>Jason Ekstrand (4):</p>
<ul>
<li>meta/copy_image: Stash off the scissor</li>
<li>mesa/formats: Only do byteswapping for packed formats</li>
<li>mesa/formats: Fix swizzle flipping for big-endian targets</li>
<li>mesa/formats: Don't flip channels of null array formats</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90925">Bug 90925</a> - "high fidelity": Segfault in _mesa_program_resource_find_name</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91673">Bug 91673</a> - Segfault when calling glTexSubImage2D on storage texture to bound FBO</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
</ul>
<h2>Changes</h2>
<p>Chris Wilson (2):</p>
<ul>
<li>i965: Prevent coordinate overflow in intel_emit_linear_blit</li>
<li>i965: Always re-emit the pipeline select during invariant state emission</li>
</ul>
<p>Daniel Scharrer (1):</p>
<ul>
<li>mesa: add missing queries for ARB_direct_state_access</li>
</ul>
<p>Dave Airlie (8):</p>
<ul>
<li>mesa/arb_gpu_shader_fp64: add support for glGetUniformdv</li>
<h1>Mesa 10.6.8 Release Notes / September 20, 2015</h1>
<p>
Mesa 10.6.8 is a bug fix release which fixes bugs found since the 10.6.7 release.
</p>
<p>
Mesa 10.6.8 implements the OpenGL 3.3 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 3.3. OpenGL
3.3 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90621">Bug 90621</a> - Mesa fail to build from git</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
</ul>
<h2>Changes</h2>
<p>Alejandro Piñeiro (1):</p>
<ul>
<li>i965/vec4: fill src_reg type using the constructor type parameter</li>
</ul>
<p>Antia Puentes (1):</p>
<ul>
<li>i965/vec4: Fix saturation errors when coalescing registers</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.7</li>
<li>cherry-ignore: add commit non applicable for 10.6</li>
</ul>
<p>Hans de Goede (4):</p>
<ul>
<li>nv30: Fix creation of scanout buffers</li>
<li>nv30: Implement color resolve for msaa</li>
<li>nv30: Fix max width / height checks in nv30 sifm code</li>
<li>nv30: Disable msaa unless requested from the env by NV30_MAX_MSAA</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>mesa: Pass the type to _mesa_uniform_matrix as a glsl_base_type</li>
<li>mesa: Don't allow wrong type setters for matrix uniforms</li>
</ul>
<p>Ilia Mirkin (5):</p>
<ul>
<li>st/mesa: don't fall back to 16F when 32F is requested</li>
<li>nvc0: always emit a full shader colormask</li>
<li>nvc0: remove BGRA4 format support</li>
<li>st/mesa: avoid integer overflows with buffers >= 512MB</li>
<li>nv50, nvc0: fix max texture buffer size to 128M elements</li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.