Uniform names (even for hidden uniforms) are required to be unique; some
parts of the compiler assume they can be looked up by name.
Fixes the piglit test: tests/spec/glsl-1.20/linker/array-initializers-1
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When converting a uniform array reference to a pull constant load, the
`reladdr` expression itself may have its own `reladdr`, arbitrarily
deeply. This arises from expressions like:
a[b[x]] where a, b are uniform arrays (or lowered const arrays),
and x is not a constant.
Just iterate the lowering to pull constants until we stop seeing these
nested. For most shaders, there will be only one pass through this loop.
Fixes the piglit test:
tests/spec/glsl-1.20/linker/double-indirect-1.shader_test
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves all the CUBE section above the gradients section,
so that the gradient emission happens on one block which
is what sb/hardware expect.
v2: avoid changes to bytecode by using spare temps
v2.1: shame gcc, oh the shame. (uninit var warnings)
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The piglit tests were failing, and it appeared to be SB
optimising out things, but Glenn pointed out the gradients
are meant to be clause local, so we should emit the texture
instructions in the same clause. This moves things around
to always copy to a temp and then emit the texture clauses
for H/V.
v2: Glenn pointed out we could get another ALU fetch in
the wrong place, so load the src gpr earlier as well.
Fixes at least:
./bin/tex-miplevel-selection textureGrad 2D
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
res->bind is not an indicator of how the resource is currently bound.
buffers can be rebound across different binding points without changing
underlying storage.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
This creates a usable layout for all NPOT textures. Of course these
still have lots of limitations, but at least we can render to a
level.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
For NPOT texture layouts, we want to be able to access texture levels
other than 0 directly. Since the hw doesn't support that, We do it by
adding the offset directly.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
This happens with glsl-convolution-1, where we have 64 constants. This
doesn't make the test pass (we don't have 64 constants anyway, only
32) but this prevents it from crashing.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
This patch remove workaround related to LLVM < 3.2 bug.
Original bug has been closed as fixed in 2011.
At this moment gallium requires LLVM 3.3 (2013).
LLVM has been tested without SSE2 support in commit
ca70de9bd2 and removed after requiring
LLVM 3.3 in commit 013ff2fae1
Original LLVM bug: http://llvm.org/bugs/show_bug.cgi?id=6960
Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In commit 5e37a2a4a8, I made the pull constant code stop calling
_mesa_load_state_parameters() when there were no pull parameters.
This worked fine on Gen6+ because the push constant code also called
it if there were any push constants. However, the Gen4-5 push constant
code wasn't doing this. This patch makes it do so, like the Gen6+ code.
A better long term solution would be to make core Mesa just handle this
for us when necessary.
Fixes around 8766 Piglit tests on Ironlake, and probably Gen4 as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Fix one of the few cases where we can't reliable touch the destination hazard
bits. I am explicitly doing this patch individually so it is easy to backport. I
was tempted to do this patch before the previous patch which reorganized the
code, but I believe even doing that first, this is still easy to backport.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Move this to a separate function so that we can begin to add other little
caveats without making too big a mess.
NOTE: There is some desire to improve this function eventually, but we need to
fix a bug first.
v2:
Use const for the inst for the hazard check (Matt)
Invert safe logic to get rid of the double negative (Matt)
Add PRM reference for predicates (Matt)
Add note about empirical evidence for math (Matt)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The visitor emits MOVs to temporary registers for immediates, so these
never trigger. For further proof, check case ir_triop_fma.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
texture_offset was only used by some texturing operations, and offset
was only used by spill/unspill and some URB operations. These fields are
never used at the same time.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Check that the target is GL_TEXTURE_CUBE_MAP before emitting
TEXCOORDTYPE_VECTOR texture coordinates.
I'm not sure if the hardware would like CARTESIAN coordinates
with cube maps, and as I'm too lazy to find out just emit the
VECTOR coordinates for cube maps always. For other targets use
CARTESIAN or HOMOGENOUS depending on the number of texture
coordinates provided.
Fixes rendering of the "electric" background texture in chromium-bsu
main menu. We appear to be provided with three texture coordinates
there (I'm guessing due to the funky texture matrix rotation it does).
So the code would decide to use TEXCOORDTYPE_VECTOR instead of
TEXCOORDTYPE_CARTESIAN even though we're dealing with a 2D texure.
The results weren't what one might expect.
demos/cubemap still works, which hopefully indicates that this doesn't
break things.
Also tested with:
bin/glean -o -v -v -v -t +texCube --quick
bin/cubemap -auto
from piglit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Add support for decoding the new branch control bit. I saw two things wrong with
the existing code.
1. It didn't bother trying to decode the bit.
- While we do not *intentionally* emit this bit today, I think it's interesting
to see if we somehow ended up with the bit set. It may also be useful in the
future.
2. It seemed to be the wrong bit.
- The docs are pretty poor wrt which bit this actually occupies. To me, it
/looks/ like it should be bit 28. I am not sure where Ken got 30 from. I
verified it should be 28 by looking at the simulator code.
I also added the most basic support for GOTO simply so we don't need to remember
to change the function in the future.
v2:
Move the branch_ctrl check out of the if gen >= 6 check to make it more
readable. (Matt)
ENDIF doesn't have branch_ctrl (Matt + Ken)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This reverts f4dd099171.
The src/gallium/tests/unit/translate_test.c gives the same results on
MinGW 64-bits as on Linux 64-bits. And since MinGW is often used for
development/testing due to its convenience, it's better not to have this
sort of differences relative to MSVC.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
No changes required in the driver itself, all handled by draw.
piglit results in a quick run:
skip->pass 7
skip->fail 2
(The new failures in the ARB_fragment_layer_viewport group are expected,
we fail the same if gs doesn't write these outputs regardless of the vs.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Mostly add a couple cases so we don't just check gs for this.
There's only one gotcha, the built-in vp transform in the llvm vs can't
handle it (this would be fixable though non-trivial due to vp index being
non-constant for the SoA outputs, but we don't use it if there's a gs
neither - the whole clip/vp transform integration there is suboptimal).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When calling vaCreateImage() an internal copy of VAImage is maintained
since the allocation of "image" may not be guaranteed to live long enough.
Signed-off-by: Michael Varga <Michael.Varga@amd.com>
For 1D and 2D arrays we don't want the other coordinates being
offset and affecting where we sample. I wrote this patch 6 months
ago but lost it.
Fixes:
./bin/tex-miplevel-selection textureLodOffset 1DArray
./bin/tex-miplevel-selection textureLodOffset 2DArray
./bin/tex-miplevel-selection textureOffset 1DArray
./bin/tex-miplevel-selection textureOffset 1DArrayShadow
./bin/tex-miplevel-selection textureOffset 2DArray
./bin/tex-miplevel-selection textureOffset(bias) 1DArray
./bin/tex-miplevel-selection textureOffset(bias) 2DArray
v2: rewrite to handle more cases and be consistent with code
above.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously, the kernel would dispatch thread 0, wait, then dispatch thread
1. By insisting that the thread contents use semaphores in the right
place, the kernel can sleep for longer by dispatching both threads at
once.
When using the stand alone compiler, if we try to link a shader with vertex
attributes it will segfault on linking as the binding hash tables are not
included in the shader program. Obviously, we cannot make the linking stage
succeed without the bound attributes but we can prevent the crash and just
let the linker spit its own error.
Reviewed-by: Brian Paul <brianp@vmware.com>
We cannot guarantee that vertex buffers have the necessary alignment for
fetching all AoS members at once (for instance 4x32bit XYZW data). We can
however guarantee that for textures. This did not cause errors for older
llvm versions but it now matters and will cause segfaults if the data
happens to not be aligned. Thus we need to set alignment manually.
(Note that we can't actually really guarantee data to be even element aligned
due to offsets in vertex buffers being bytes and OpenGL allowing this, but
it does not matter for x86 as alignment is only required for sse vectors -
not sure what happens on other archs, however.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=85467.
Not all drivers can set gl_Layer from VS. Add a fallback that passes the
instance id from VS to GS, and then uses the GS to set the layer.
Tested by adding
quad_buffers |= clear_buffers;
clear_buffers = 0;
to the st_Clear logic, and forcing set_vertex_shader_layered in all
cases. No piglit regressions (on piglits with 'clear' in the name).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
DRI_PRIME setups have different issues due the lack of dma-buf fences
support in the drivers. For DRI3 DRI_PRIME, a race can appear, making
tearings visible, or worse showing older content than expected. Until
dma-buf fences are well supported (and by all drivers), an alternative
is to send the buffers to the server only when rendering has finished.
Since waiting the rendering has finished in the main thread has a
performance impact, this patch uses an additional thread to offload the
wait and the sending of the buffers to the server.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Implements vblank_mode and throttling, which allows us change default ratio
between framerate and input lag.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Work of Joakim Sindholt (zhasha) and Christoph Bumiller (chrisbmr).
DRI3 port done by Axel Davy (mannerov).
v2: - nine_debug.c: klass extended from 32 chars to 96 (for sure) by glennk
- Nine improvements by Axel Davy (which also fixed some wine tests)
- by Emil Velikov:
- convert to static/shared drivers
- Sort and cleanup the includes
- Use AM_CPPFLAGS for the defines
- Add the linker garbage collector
- Restrict the exported symbols (think llvm)
v3: - small nine fixes
- build system improvements by Emil Velikov
v4: [Emil Velikov]
- Do no link against libudev. No longer needed.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: David Heidelberg <david@ixit.cz>
v3: thanks to Brian, improved coding style, also glennk helped spot few
things (unsigned -> int, two constify)
v4: thanks Ilia improved function, dropped u_box_clip_3d
v5: incorporated rest of Gregor proposed changes,clean ups
v6: u_box_clip_2d simplify proposed by Ilia Mirkin
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
At this moment we use only zero or positive values.
v2: Implement it for also for Solaris, MSVC assembly
and enable for other combinations.
v3: Replace MSVC assembly by assert + warning during compilation
v4: remove inc and dec with return for MSVC assembly
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: David Heidelberg <david@ixit.cz>
Implement pipe_loader_sw_probe_wrapped which allows to use the wrapped
software renderer backend when using the pipe loader.
v2: - remove unneeded ifdef
- use GALLIUM_PIPE_LOADER_WINSYS_LIBS
- check for CALLOC_STRUCT
thanks to Emil Velikov
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Some of the geom shader tests produce an empty vertex shader,
on cayman we'd crash in the finaliser because last_cf was NULL.
cayman doesn't need the NOP workaround, so if the code arrives
here with no last_cf, just emit an END.
fixes crashes in a bunch of piglit geom shader tests.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The sampler_array_size field was added by "mesa/st: add support for
dynamic sampler offsets". But the field wasn't getting copied in
the get_pixel_transfer_visitor() or get_bitmap_visitor() functions.
The count_resources() function then didn't properly compute the
glsl_to_tgsi_visitor::samplers_used bitmask. Then, we didn't declare
all the sampler registers in st_translate_program(). Finally, we
asserted when we tried to emit a tgsi ureg src register with File =
TGSI_FILE_UNDEFINED.
Add the missing assignments and some new assertions to catch the
invalid register sooner.
Cc: "10.3, 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Using the asynchronous DMA engine for multi-dimensional operations seems
to cause random GPU lockups for various people. While the root cause for
this might need to be fixed in the kernel, let's disable it for now.
Before re-enabling this, please make sure you can hit all newly enabled
paths in your testing, preferably with both piglit and real world apps,
and get in touch with people on the bug reports below for stability
testing.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85647
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83500
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Unfortunately no LLVM type was generated for pipe_viewport_state -- it
was being treated as a single floating point array --, so llvmpipe (and
any driver that relies on draw/llvm) got totally busted.
We were missing a few files
- The version scripts
- Android & scons build scripts
- A few headers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Add all headers into Makefile.sources
- Don't forget the target-helpers
- Add the python scripts & the formats table/list (csv)
- Temporary add vl/vl_winsys_dri.c to EXTRA_DIST until we rework the
way VL is build.
- Add the following to EXTRA_DIST - they are included via the
generated u_indices_gen.c thus we should not add them to *SOURCES.
indices/u_indices.c
indices/u_unfilled_indices.c
XXX: Should we nuke gallivm/f.cpp ? It seems that no-one is using it.
v2: Rebase
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The DRM_IOCTL_MODE_CREATE_DUMB (and others) IOCTL isn't very rigorously
specified, which has the effect that some kernel drivers do not consider
the .pitch and .size fields of struct drm_mode_create_dumb outputs only.
Instead they will use these as lower bounds and overwrite them only if
the values that they compute are larger than what userspace provided.
This works if and only if userspace initializes the fields explicitly to
either 0 or some meaningful value. However, if userspace just leaves the
values uninitialized and the struct drm_mode_create_dumb is allocated on
the stack for example, the driver may try to overallocate buffers.
Fortunately most userspace does zero out the structure before passing it
to the IOCTL, but there are rare exceptions. Mesa is one of them. In an
attempt to rectify this situation, kernel drivers are being updated to
not use the .pitch and .size fields as inputs. However in order to fix
the issue with older kernels, make sure that Mesa always zeros out the
structure as well.
Future IOCTLs should be more rigorously defined so that structures can
be validated and IOCTLs rejected if output fields aren't set to zero.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Addition of color fmt bitfield to this register (compared to a3xx) means
we need to re-emit if either prog or framebuffer state is dirty.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This reverts commit 8d3f739383.
In the last commit we've updated our check to determine if the actual
code is buildable, rather than if the compiler acknowledges the option.
I.e. did anyone provide -mno-sse4.1 vs is my compiler too old.
Now this code will never be attemped to be build, in both cases.
Confirmed by building mesa with
export CFLAGS='-march=native -mno-sse4.1'
./configure && make
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
So when checking/building sse code we have three possibilities:
1 Old compiler, throws an error when using -msse*
2 New compiler, user disables sse* (-mno-sse*)
3 New compiler, user doesn't disable sse
The original code, added code for #1 but not #2. Later on we patched
around the lack of handling #2 by wrapping the code in __SSE4_1__.
Yet it lead to a missing/undefined symbol in case of #1 or #2, which
might cause an issue for #2 when using the i965 driver.
A bit later we "fixed" the undefined symbol by using #1, rather than
updating it to handle #2. With this commit we set things straight :)
To top it all up, conventions state that in case of conflicting
(-enable-foo -disable-foo) options, the latter one takes precedence.
Thus we need to make sure to prepend -msse4.1 to CFLAGS in our test.
v2: Clean the #includes. Suggested by Ilia, Matt & Siavash.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Tested-by: Siavash Eliasi <siavashserver@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Very initial support. Basic stuff working (es2gears, es2tri, and maybe
about half of glmark2). Expect broken stuff. Still missing: mem->gmem
(restore), queries, mipmaps (blob segfaults!), hw binning, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This will be reused for the scalar VS pass.
v2 (Ken): Rebase on master.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'll reuse this toplevel optimization driver for the scalar VS.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These last few operations all only apply when we've actually generated
code, optimized and allocated registers. The dummy and the repclear
shaders don't need the gen4 send workaround, and don't spill. This
means we can move these lines into the else-branch, which will make
the following refactoring easier.
v2 (Ken): Rebase on master, which removed the uncompressed stack.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We split out SIMD8 and SIMD16 generation into seperate calls to
new method generate_code(), which returns the start offset for the
generated code. A new get_assembly() method returns the generated code.
This avoids asserting MESA_SHADER_FRAGMENT and accessing wm_prog_data
in the generator.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Derived from st/glx's GLX_EXT_create_context_es/es2_profile implementation.
Tested with an OpenGL ES 2.0 ApiTrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
The latest version of the specs explicitly allow it, and given that Mesa
universally supports KHR_debug we should definitely support it.
Totally untested. (Just happened to noticed this while implementing
GLX_EXT_create_context_es2_profile for st/xlib.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
17 insertions(+), 102 deletions(-). Works just as well.
v2: Make emit_math take const references (suggested by Matt),
drop redundant WRITEMASK_XYZW setting (Matt and Curro).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Every other unit in the geometry pipeline automatically enables
statistics gathering. This part of the pipe has been controlled by the
DEBUG_STATS variable, but this is asymmetric. This dates back to the
original implementation, and I am not sure if there is a reason for it.
I need access to these stats to implement ARB_pipeline_statistics_query.
Eric wrote it, and Ken touched it last. Do you have any opposition?
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86145
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
According to gen2 BSpec the pipeline must be flushed at least up to the
windower before changing the scissor rect enable field. Emitting the
3DSTATE_SCISSOR_RECTANGLE_0 before 3DSTATE_SCISSOR_ENABLE is sufficient
to do that.
gen3 BSpec no longer has that piece of text, but let's make the same
change there too for symmetry. The spec does still say that the scissor
rectangle must be defined before enabling it, so the new order does seem
more in line with the spec.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Gen2 doesn't have fragment shaders so we shouldn't be calling
_mesa_meta_glsl_Clear() on gen2. Restore the appropriate
ARB_fragment_shader check to the clear path which was lost in:
commit 94f22fbe78
Author: Tapani Pälli <tapani.palli@intel.com>
Date: Wed Aug 8 20:46:45 2012 +0300
intel: use _mesa_meta_Clear with OpenGL ES 1.1 v2
v2: Fix spelling in commit message
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
TEXTURE_SET() is the only register macro that forgets to wrap the
argument evaluation in parens. Only simple integers are passed to this
macro so there's no bug but sitll it seems prudent to add the
parens.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Gen2 doesn't support depth/stencil textures, and since
commit c1d4d49993
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Thu Apr 24 14:11:43 2014 +0300
i915: Don't advertise Z formats in TextureFormatSupported on gen2
depth/stencil formats are no longer accepted as texture formats.
However we still want depth/stencil renderbuffers, so add explicit
format checks to intel_alloc_renderbuffer_storage() to allow such
things.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
gen2 doesn't supporte linear mip filter with anisotropic min/mag
filtering. The hardware would automagically downgrade the min/mag
filters to linear in such cases, which IMO looks worse than forcing
the mip filter to nearest.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
The spec says using DOT4 for alpha is undefined unless DOT4 is also used
for color. It seems to do the right thing anyway, but better safe than sorry.
Also override numAlphaArgs to 2 for DOT4 since that's what it wants.
This migth fix something in case the specified alpha mode has only one
argument. Also avoids emitting a needless 3DSTATE_MAP_BLEND_ARG if
the specified alpha mode has three arguments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
All the shaders we've received so far had this be the case, but with
nir-to-tgsi that changed.
I might decide to make nir-to-tgsi keep the outputs in the same order, for
debugging sanity, but I'm not sure.
apitrace now supports it, and it makes it much easier to test
tracing/replaying on OpenGL ES contexts since
GLX_EXT_create_context_{es2,es}_profile are widely available.
Reviewed-by: Brian Paul <brianp@vmware.com>
I used these in the SEL peephole, but they require extra tracking and
fix ups. The SEL peephole can pretty easily find the blocks it needs
without these.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Make the helpers fill out valid Gen7 3DSTATE_SF and 3STATE_SBE. This
prevents the helpers from having to do
dw[0] = GEN7_SBE_DW1_x; // setting DW1 value to dw[0]!?
and simplifies gen7_3DSTATE_{SF,SBE}().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Render stream and render enable are independent from so enable. Having a
single return point makes it easier to see that.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Started to make pipe_stream_output_info mandatory, but ended up adding support
for stream id and making a workaround Gen7-specific.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It replaces gen6_fill_3dstate_constant(). gen6_3DSTATE_CONSTANT_{VS,GS,PS}
are made wrappers of the new function.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Add gen6_so_3DSTATE_GS(), gen6_disable_3DSTATE_GS(), and
gen7_disable_3DSTATE_GS() to do SO on GEN6 or to disable GS.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
On x86-64 this saves 8 bytes of padding in the structure, and this
reduces the size of the structure to 32 bytes.
v2: Fix constructor so that GCC won't warn about the order of
initialization.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Due to the total number of bits used in the bitfield, this does not
increase the size of the structure.
It does, however, reduce the number of instructions required each time
one of these fields is accessed. To access ::matrix_columns with the
bitfield, three instructions were required:
movzbl 0x9(%rdx),%eax
shr %al
and $0x7,%eax
As a uint8_t, only one instruction is required.
movzbl 0xa(%rdx),%eax
These fields are accessed *a lot*.
Valgrind callgrind results for a trace of Tesseract:
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (64-bit): 48,103,497 16,556,096 676,447
After (64-bit): 45,722,616 15,737,964 670,607
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (32-bit): 61,472,611 21,051,222 821,361
After (32-bit): 57,987,421 19,872,226 811,609
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
There are no uniforms in OpenGL ES 1.x, so we can't even get to this
code in that API.
Also, reorder the checks. First check that transpose is true, then
check whether or not that is legal in the current API. transpose should
never be true in an ES2 context, so this gets one check (the more
expensive one) out of the main path.
Valgrind callgrind results for a trace of Tesseract:
_mesa_UniformMatrix4fv _mesa_UniformMatrix3fv
Before (64-bit): 96,119,025 24,240,510
After (64-bit): 90,726,569 22,926,662
_mesa_UniformMatrix4fv _mesa_UniformMatrix3fv
Before (32-bit): 132,434,452 29,051,808
After (32-bit): 126,658,112 27,989,316
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Before ARB_explicit_uniform_location, Mesa's location encoding allowed
locations for non-array types that had non-zero array indices.
Basically, part of the location was the uniform and part was the array
index. This meant that some checks had to occur for arrays and
non-arrays. This is no longer possible, we the checks can be split up.
Valgrind callgrind results for a trace of Tesseract:
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (64-bit): 50,499,557 17,487,316 686,227
After (64-bit): 50,023,791 17,274,432 684,293
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (32-bit): 62,968,039 21,732,380 828,147
After (32-bit): 62,373,967 21,490,756 826,223
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
I really wanted to remove 'shProg != NULL' as well, but that would have
required adding a dummy program as the default program. That seemed
like more churn than removing one test was worth.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Only one caller wanted to generate an error when location == -1, so move
the error generation to that caller. There will be more callers in the
future that do not want to generate errors.
Move the location == -1 check later in validate_uniform_parameters. As
currently implemented, glUniform1iv(-1, -1, data) would not generate an
error, but it should due to count being < 0.
The location that I have moved it to will make more sense with the next
commit.
Valgrind callgrind results for a trace of Tesseract:
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (64-bit): 51,241,217 17,740,162 689,181
After (64-bit): 50,499,557 17,487,316 686,227
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (32-bit): 63,940,605 21,987,918 831,065
After (32-bit): 62,968,039 21,732,380 828,147
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The GL_ enums were previously used because glsl_types.h couldn't be used
in C code. That was fixed some time ago (and uniforms.c already
includes glsl_types.h), so this is no longer necessary.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Derive whether a RT supports blending, logicop, and the like when
set_framebuffer_state() is called. This enables us to simplify
gen6_BLEND_STATE().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
According to the documentation, line widths higher than 40.0 may have
quality problems. That's already 20 times larger than we've been
exposing, so it seems totally sufficient.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We've artificially been limiting this to 5 for no particular reason.
On Gen4-5, the limit is [0, 7.5] with a granularity of 0.5 (U3.1).
On Gen6+, the limit is [0, 7.9921875]. Since it's a U3.7, the
granularity should be 0.125 (1/8).
This patch conservatively advertises one granularity smaller than the
hardware's maximum value, just in case there's a problem using the
largest possible value. On Gen4-5, this is 7.5 - 0.5 = 7.0. On Gen6+,
this is 8.0 - 0.125 = 7.875.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Rather than hardcoding platform values in every code path, just use the
maximum value we set.
Currently, ctx->Const.LineWidth == 5, which is smaller than the hardware
limit. But applications shouldn't be using a value larger than we
support anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Line Width moved to DW1 bits 29:12. It's actually now a U11.7.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Use clamping constants that guarantee no integer overflows.
As spotted by Chris Forbes.
This causes the code to change as:
- value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967295.0f);
+ value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967040.0f);
- value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f));
+ value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483520.0f));
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
On Windows, DllMain calls and thread creation/destruction are
serialized, so when llvmpipe is destroyed from DllMain waiting for the
rasterizer threads to finish will deadlock.
So, instead of waiting for rasterizer threads to have finished, simply wait for the
rasterizer threads to notify they are just about to finish.
Verified with this very simple program:
#include <windows.h>
int main() {
HMODULE hModule = LoadLibraryA("opengl32.dll");
FreeLibrary(hModule);
}
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=76252
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
No longer needed as the file was removed with
commit 8c229d306b
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Mon Aug 11 10:07:07 2014 -0700
i965: Delete the Gen8 code generators.
We now use the brw_eu_emit.c code instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
During teardown we free the driver_configs list pointer, but we forget
to deallocate each config in that list.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
The function is not called by platform_drm. As such one needs to
pay special attention at teardown.
v2: Fix the comment block. Spotted by Ken.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Earlier commit failed to attribure that for drm platforms one does not
call dri2_create_screen, thus it does not create the screen and
driver_configs but inherits them from the "display" - gbm.
As such wrap cleanup in Platform != _EGL_PLATFORM_DRM to prevent
the issue and still cleanup correctly for non-drm platforms.
v2:
- Drop the ifdef HAVE_DRM_PLATFORM, reindent the code and fix the
comment block. Suggested by Ken.
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Move opcode to string mappings to functions of their own. Have for consistent
outputs for similar opcodes.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
When the earlier block ended with control flow, we'd mistakenly remove
some of its links to its children. The same happened with the later
block.
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
A pattern in certain shaders is:
uniform vec4 colors[NUM_LIGHTS];
for (int i = 0; i < NUM_LIGHTS; i++) {
...use colors[i]...
}
In this case, the application author expects the shader compiler to
unroll the loop. By doing so, it replaces variable indexing of the
array with constant indexing, which is more efficient.
This patch extends the heuristic to see if arrays accessed within the
loop are indexed by an induction variable, and if the array size exactly
matches the number of loop iterations. If so, the application author
probably intended us to unroll it. If not, we rely on the existing
loop-too-large heuristic.
Improves performance in a phong shading microbenchmark by 2.88x, and a
shadow mapping microbenchmark by 1.63x. Without variable indexing, we
can upload the small uniform arrays as push constants instead of pull
constants, avoiding shader memory access. Affects several games, but
doesn't appear to impact their performance.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Consider GLSL code such as:
const ivec2 offsets[] =
ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1),
ivec2(0, -1), ivec2(0, 0), ivec2(0, 1),
ivec2(1, -1), ivec2(1, 0), ivec2(1, 1));
ivec2 offset = offsets[<non-constant expression>];
Both i965 and nv50 currently handle this very poorly. On i965, this
becomes a pile of MOVs to load the immediate constants into registers,
a pile of scratch writes to move the whole array to memory, and one
scratch read to actually access the value - effectively the same as if
it were a non-constant array.
We'd much rather upload large blocks of constant data as uniform data,
so drivers can simply upload the data via constbufs, and not have to
populate it via shader instructions.
This is currently non-optional because both i965 and nouveau benefit
from it, and according to Marek radeonsi would benefit today as well.
(According to Tom, radeonsi may want to handle this itself in the long
term, but we can always add a flag when it becomes useful.)
Improves performance in a terrain rendering microbenchmark by about 2x,
and cuts the number of instructions in about half. Helps a lot of
"Natural Selection 2" shaders, as well as one "HOARD" shader.
total instructions in shared programs: 5473459 -> 5471765 (-0.03%)
instructions in affected programs: 5880 -> 4186 (-28.81%)
v2: Use ir_var_hidden to avoid exposing the new uniform via the GL
uniform introspection API.
v3: Alphabetize Makefile.sources properly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In the compiler, we'd like to generate implicit uniforms for internal
use. These should not be visible via the GL uniform introspection API.
To support that, we add a new ir_variable::how_declared value of
ir_var_hidden, and plumb that through to gl_uniform_storage.
v2 (idr): Fix some memory management issues in
move_hidden_uniforms_to_end. The comment block on the function has more
details.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Makes use of SSE 4.1 to speed up compute of min and max elements.
Callgrind cpu usage results from pts benchmarks:
Openarena 0.8.8: 3.67% -> 1.03%
UrbanTerror: 2.36% -> 0.81%
V5:
- actually make use of the optimisation in android (Emil Velikov)
- set a better array size limit for using SSE and added TODO
V4:
- fixed bugs with incrementing pointer and updating counters
V3:
- Removed sse_minmax.c from Makefile.sources
- handle the first few values without SSE until the pointer is aligned
and use _mm_load_si128 rather than _mm_loadu_si128
- guard the call to the SSE code better at build time
V2:
- removed GL* types
- use _mm_store_si128() rather than _mm_store_ps()
- add runtime check for SSE
- use aligned attribute for local mix/max
- bunch of tidyups
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Walk through the list and free each config, and finally free the list
itself. Freeing approx 20KiB of memory, according to valgrind.
Inspired by a similar patch by enpeng xu.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
driUnbindContext() checks for valid drawables before calling the driver
unbind function. In case of Surfaceless contexts, the drawables are always
Null and we end up not releasing the underlying DRI context. Moving the
call to the driver function before the drawable validity checks fixes things.
Steps to trigger this bug are following:
- create surfaceless context and make it current
- make some other context current
- {another thread} destroy surfaceless context
- make another context current
Signed-off-by: Alexandros Frantzis <Alexandros.Frantzis@canonical.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74563
The dummy shader sends an EOT message to end itself. There are many more
works need to be done on the compiler side before we can advertise
PIPE_CAP_COMPUTE.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Use util_dynarray in ilo_set_global_binding() to allow for unlimited number of
global bindings. Add a comment for global bindings.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
We need to know the local/input/private sizes and others. This is not
complete. We need many others for CURBE setup.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
They will be used to report compute params or program compute states.
thread_count can also be used for 3DSTATE_VS.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This started hitting an assertion recently. Only affects Haswell
(Ivybridge doesn't support this meddling with the sampler state pointer,
and ARB_gpu_shader5 is not enabled yet on Broadwell)
14 Piglits crash->pass.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves performance in GLBenchmark 2.7 TRex by 3.88889% +/- 0.336383%
(n=80) at 1280x720 on Broadwell GT3. Together with the previous patch,
it improves performance by 5.42738% +/- 0.541971% (n=10) at 1920x1080.
Note that without the PMA stall fix, this would instead decrease
performance by 22%.
v2: Update comment (noticed by Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Certain non-promoted depth cases typically incur stalls. In very
specific cases, we can enable a workaround which improves performance.
Improves performance in GLBenchmark 2.7 TRex by 1.17762% +/- 0.448765%
(n=75) at 1280x720 on Broadwell GT3.
Haswell has this feature as well, but we can't currently write registers
from userspace batches (and we'd incur additional software batch
scanning overhead as well), so we haven't enabled it. Broadwell allows
us to write CACHE_MODE_1. Backporters beware: the formula and flushing
incantation differs between Haswell and Broadwell.
v2: Move pma_stall_bits from brw->state to brw itself (requested by
Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Matt requested this in review feedback on the original patch, which I
completely missed when pushing this series. Kristian also made this
change, but I grabbed the wrong version of the patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Otherwise, calling glPopAttrib on drivers that don't support
ARB_clip_control gives you a GL error, which is surprising at best.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We're not programming the clear values yet, so this won't work.
This patch should be (effectively) reverted eventually.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
On Skylake, the MOCS bits are an index into a table of 63 different,
configurable cache configurations. As for previous GENs, we only care about
WB and WT, which are available in the documented default set. Define
SKL_MOCS_WB and SKL_MOCS_WT to the indices for those configucations and use
those for the Skylake MOCS values.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake has separate controls for enabling the Z Clip Test for the near
and far planes. For now, maintain the legacy behavior by setting both.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake's 3DSTATE_DS packet has a few more fields; we don't support
domain shaders yet though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
They are the same as for BDW, so just add a case for SKL to the init switch.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On SKL, 3DSTATE_CONSTANT_* command is not committed until we give
the corresponding 3DSTATE_BINDING_TABLE_POINTERS_* command. If we
fail to do so, the constant buffers wont be read and push constants
will be wrong.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Otherwise they overlap and horrible things happen. All the new DWords
are for fast color clear values, which we don't do yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We will need to allocate more DWords on Skylake.
v2: Don't mark brw_context parameter const. It's modified.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake introduces a new base address for a feature we don't yet expose.
Setting these to 0 should be safe.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake uploads the stencil reference values in DW3 of the
3DSTATE_WM_DEPTH_STENCIL packet, rather than in COLOR_CALC_STATE.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake has some extra bits in PIPELINE_SELECT, none of which are
interesting for a 3D driver. In order to selectively change them, it
also introduces new "mask bits" in 15:8. We care about the "Pipeline
Selection" bits (1:0), so set the mask to 0x3.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This commands has seen the addition of 2 dwords that allow to specify
which channels of which attributes need to be forwarded to the fragment
shader.
v2: Rebase forward a year (done by Ken).
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The CSE pass now prints out why it thinks a value is not a candidate for
adding to the AE set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The optimization in commit d056863b covers these cases, which were the
first optimizations I added to the GLSL compiler.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Should trigger CL_INVALID_VALUE if device_list is NULL and num_devices
is greater than zero.
Introduced by e5468dfa52
Reported by: EdB
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Between release 3.2 and 3.3 LLVM stopped aligning properly when certain
conditions (no allocas, but large number of vectors causing spills to
the stack, and frame pointer omission enabled).
We were already disabling frame-pointer-omission on several build types,
but we now disable it on all build types.
It's not clear whether this affects 32-bits x86 processes only, or if it
can also affect 64-bits x86_64 processes when AVX registers are
available and used. So disable frame-pointer-omission on both
x86/x86_64 to be on the safe side.
See also:
- http://llvm.org/PR21435
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
AFAICT the number of threads is 80, not 70. I am not sure if Ken knows
something I do not.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Need to do a sqrt().
FWIW, the html that Sphinx 1.1.3 generates for the math expressions
looks completely broken.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Pass and return tgsi_token buffers instead of pipe_shader_state.
And update softpipe driver (the only user of this function).
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Fixes polygon stipple if both DO_PSTIPPLE_IN_DRAW_MODULE and
DO_PSTIPPLE_IN_HELPER_MODULE are zero/off.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This was a regression introduced by
611d66fe45
Passing a binary program to clBuildProgram() is legal, but passing one
to clCompileProgram() is not.
v2:
- Code cleanups.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This adds a query which allows drivers to access the config
information of a specific function within the LLVM generated ELF
binary. This makes it possible for the driver to handle ELF
binaries with multiple kernels / global functions.
We are about to change mesa to spawn threads for deferred glCompileShader and
glLinkProgram, and we need to make sure those threads can send compiler
warnings/errors to the debug output safely.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There may be two contexts compiling shaders at the same time, and we want the
anonymous struct id to be globally unique.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
_mesa_strtod and _mesa_strtof may be called from multiple threads. They need
to be thread-safe.
v2: platform checks are now done in configure.ac
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the assumptions that xlocale.h implies newlocale and strtof_l. SCons is
updated to define HAVE_XLOCALE_H on linux and darwin.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Both core mesa and glsl have their own wrappers for strtof_l. Merge
and move them to util/. They are compiled with a C++ compiler so that
we can make them thread-safe in a following commit.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whiteacpe.org>
This reverts commit cabc93c5ad.
Mark thinks the failures on the SNB GT2 in the lab are actually because
of faulty hardware, not instruction compaction. The GT1 didn't see any
problems after changes to the compaction code.
Helps a small number of vertex shaders in the games Dungeon Defenders
and Shank, as well as an internal benchmark.
instructions in affected programs: 2801 -> 2719 (-2.93%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use the UST value provided in the PRESENT_COMPLETE_NOTIFY event
rather than gettimeofday(), which gives us the presentation time
instead of the time when SwapBuffers was called. Suggested by
Keith Packard. This relies on the fact that the X DRI3/Present
implementations use microseconds for UST.
v3: Properly ignore PresentCompleteKindMSCNotify; multiply in 64 bits
(caught by Keith Packard).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Keith Packard <keithp@keithp.com> [v3]
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
These source files support actual geometry shaders, so using "gs" for
the name makes a lot of sense. We're going to be adding SIMD8 geometry
shader support as well, at which point "vec4_gs" will be a misnomer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
The brw_gs.[ch] and brw_gs_emit.c source files contain code for
emulating fixed-function unit functionality (VF primitive decomposition
or SOL) using the GS unit. They do not contain code to support proper
geometry shaders.
We've taken to calling that code "ff_gs" (see brw_ff_gs_prog_key,
brw_ff_gs_prog_data, brw_context::ff_gs, brw_ff_gs_compile,
brw_ff_gs_prog). So it makes sense to make the filenames match.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
The GL functions and driver hooks use corresponding names---for example,
glMapBufferRange and Driver.MapBufferRange. But our implementation was
called "intel_bufferobj_map_range," which has the words "map" and
"buffer" swapped, as well as randomly adding "obj."
FlushMappedBufferRange was even trickier: it ordered the words
3, "obj", 1, 2, 4: intel_bufferobj_flush_mapped_range.
Even though the old names were consistent, I always had trouble
rearranging the jumble of words when searching for a function,
and it took a few tries to eventually land there.
The new names match the word order of GL and the driver hooks;
FlushMappedBufferRange is simply brw_flush_mapped_buffer_range.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The OpenGL 4.0 core profile specification, section 2.17.3
Transform Feedback Draw Operations says:
"The error INVALID_VALUE is generated if <stream> is greater
than or equal to the value of MAX_VERTEX_STREAMS.
...
The error INVALID_OPERATION
is generated if EndTransformFeedback has never been called
while the object named by id was bound."
Fixes the piglit test:
ARB_transform_feedback3/arb_transform_feedback3-draw_using_invalid_stream_index
(with the test itself fixed to eliminate an unrelated failure)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we're checking if the framebuffer is sRGB capable, call
is_format_supported() with the PIPE_BIND_DISPLAY_TARGET flag.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This reverts commit 20836c8185.
255 is a huge number. If you have a loop with 255 iterations, unrolling it
will exceed the SM3 instruction limit. Let's use the default again.
The comment about a SM3 limit doesn't make sense. For SM3, we generally
want 32 (default) or a lower number due to the SM3 instruction limit, which
is 512 instructions. For SM4, we can try higher numbers if needed, but
some shaders can end up being pretty huge and shader compilation can take
more time.
This fixes a shader compile failure on R500/SM3. Reported on IRC.
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Caveat: Shaders using UBO/sampler indexing will
not be optimized by SB, due to SB not currently
supporting the necessary CF_INDEX_[01] index
registers.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
The GL side of this extension just provides an accessor via glGetIntegerv for
the value of GL_CONTEXT_RELEASE_BEHAVIOR so it is trivial to implement. There
is a constant on the context for the value of the enum which is initialised to
GL_CONTEXT_RELEASE_BEHAVIOR_FLUSH. The extension is always enabled because it
doesn't need any driver interaction to retrieve the value.
If the value of the enum is anything but FLUSH then _mesa_make_current will
now refrain from calling _mesa_flush. This should only affect drivers that
explicitly change the enum to a non-default value.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The main incentive to do this is to get the defines for the
GL_KHR_context_flush_control extension.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Different platforms require the offset to be in different units. However,
the generator fixes all of this up for us and only requires an offset in
bytes. Previously, we were getting this wrong all over the place. Some
computed/used it correctly as bytes while others treated the offset as
whole registers or computed it as bytes or bytes*2 in SIMD16 mode. This
commit cleans all this up and makes us properly treat it as bytes
everywhere.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
I thought this would be a clever way to make spilling less expensive.
However, it appears that the oword read/write messages we are using for
spilling ignore the execution size and assume SIMD16 whenever working with
more than one register.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Fanin (merge) nodes require it's srcs to be "adjacent" in consecutive
scalar registers. Keep track of instruction neighbors in copy-
propagation step and avoid eliminating mov's which would cause an
instruction to need multiple distinct left and/or right neighbors.
This lets us not fall on our face when we encounter things like:
1: MOV TEMP[2], IN[0].xyzw
2: TEX OUT[0].xy, TEMP[2], SAMP[0], SHADOW2D
3: MOV TEMP[2].xy, IN[0].yxzz
4: TEX OUT[0].zw, TEMP[2], SAMP[0], SHADOW2D
5: END
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Always insert extra mov's for the tex coord into the fanin. This
simplifies things a bit, and avoids a scenario where multiple sam
instructions can have mutually exclusive input's to it's fanin, for
example:
1: TEX OUT[0].xy, IN[0].xyxx, SAMP[0], 2D
2: TEX OUT[0].zw, IN[0].yxxx, SAMP[0], 2D
The CP pass can always remove the mov's that are not actually needed,
so better to start out with too many mov's in the front end, than not
enough.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
dscis -> noscis
dbypass -> nobypass
a bit more consistant w/ nobin, etc. And IMO a bit more sensible names.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Kills get added to the outputs list, to ensure they get scheduled. But
they aren't *really* outputs so skip them in the header comment block.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Adding this makes 'make check' catch failures introduced from
within ARB_clip_control.xml earlier.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
In order to test compiler changes more easily, spit out the assembled
shader with some header information so that we can know about
inputs/outputs more easily.
See: git://people.freedesktop.org/~robclark/ir3test
In ir3test we have a big collection of tgsi shaders and reference
ir3_compiler outputs. When making compiler changes, regenerate the
compiler outputs and feed to ir3test to compare the new vs reference
shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The last few dwords were skipped if the total number of dwords was not a
multiple of 4. Change the formatting for better readability.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
We only get here if the VS/GS compiled programs change, but we can even
skip it if the VS/GS size didn't change.
Affects cairo runtime on glamor by -1.26471% +/- 0.674335% (n=234)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just remove the parameter. Silences:
brw_program.c: In function 'brw_dump_ir':
brw_program.c:566:33: warning: unused parameter 'brw' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Originally I just fixed some unused parameter warnings in this
function. However, Ken pointed out:
"You could instead remove this driver hook. If the dd pointer is
NULL, arbprogram.c will return true. I think I'd prefer that."
Way, way back in time, I think _mesa_GetProgramivARB had the opposite
behavior. Given that it works the way it now works, I also prefer
removing the driver hook.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just remove the parameter. Silences:
../../src/mesa/main/uniform_query.cpp:1062:1: warning: unused parameter 'ctx' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Silences:
../../src/mesa/main/shaderobj.c: In function '_mesa_init_shader_program':
../../src/mesa/main/shaderobj.c:239:46: warning: unused parameter 'ctx' [-Wunused-parameter]
For now, this adds a couple other unused parameter warnings, but future
patches will clean those up.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_link_shader_program already calls _mesa_clear_shader_program_data
before calling link_shaders, so this is already done.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
_mesa_UseShaderProgramEXT, _mesa_ActiveProgramEXT, and
_mesa_CreateShaderProgramEXT were all removed when support for
GL_EXT_separate_shader_objects was removed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just remove the parameter. Silences:
../../src/mesa/main/ff_fragment_shader.cpp:668:1: warning: unused parameter 'p' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we were allowing the register allocation code to do the
computation for us in ra_set_finalize. However, the runtime for this
computation is O(c^4 * g) where c is the number of classes and g is the
number of GRF registers. However, these q-values are directly computable
based on the way we lay out our register classes so there is no need for
the aweful runtime algorithm.
We were doing ok until commit 7210583eb where we bumped the number of
register classes from 11 to 16. While startup times don't normally matter,
this caused piglit to take 4 times as long to run on Bay Trail. This patch
should make generating the ra_set much faster and melt the piglit run
times.
v2: Fixed a couple of bugs. I have now verified that the same q-values are
generated both ways.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On older GENs in SIMD16 mode, we were accidentally building too much
interference into our register classes. Since everything is divided by 2,
the reigster allocator thinks we have 64 base registers instead of 128.
The actual GRF mapping still needs to be doubled, but as far as the ra_set
is concerned, we only have 64. We were accidentally adding way too much
interference.
Signed-off-by: Jason Ekstrand <jason.ekstrand@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For GEN6 SIMD16 mode, we have to 2-align all the registers, so we only have
the even-numbered ones. This means that we have to divide the register
number by 2 when we precolor. This wasn't a problem before because we were
setting up the interference between ra_node registers wrong. This will be
fixed in the next commit.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This shouldn't be a functional change since reg_belongs_to_class is just a
wrapper around BITSET_TEST. It just makes the code a little easier to
read.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gallium should be prepared fine for ARB_clip_control.
So enable this and mention it in the release notes.
v2:
Only enable for drivers announcing the freshly introduced
PIPE_CAP_CLIP_HALFZ capability.
v3:
Use extension enable infrastructure to connect PIPE_CAP_CLIP_HALFZ
with ARB_clip_control.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
In preparation of ARB_clip_control. Let the driver decide if
it supports pipe_rasterizer_state::clip_halfz being set to true.
v3:
Initially enable on ilo.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de
Restore clip control to the default state if MESA_META_VIEWPORT
or MESA_META_DEPTH_TEST is requested.
v3:
Handle clip control state with MESA_META_TRANSFORM.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Implement the mesa parts of ARB_clip_control.
So far no driver enables this.
v3:
Restrict getting clip control state to the availability
of ARB_clip_control.
Move to transformation state.
Handle clip control state with the GL_TRANSFORM_BIT.
Move _FrontBit update into state.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This is for preparation of ARB_clip_control.
v3:
Add comments.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This allows vc4_opt_cse.c to CSE-away operations involving the same
uniform values.
total instructions in shared programs: 37341 -> 36906 (-1.16%)
instructions in affected programs: 10233 -> 9798 (-4.25%)
total uniforms in shared programs: 10523 -> 10320 (-1.93%)
uniforms in affected programs: 2467 -> 2264 (-8.23%)
This saves a bunch of extra flushes when texsubimaging a whole texture
that's been used for rendering, or subdataing a whole BO. In particular,
this massively reduces the runtime of piglit texture-packed-formats (when
the probes have been moved out of the inner loop).
I'm going to be using VC4_DEBUG=shaderdb,norast to do shaderdb stats, but
when debugging regressions, I want to match shaderdb output to shader
disassembly.
fd_bo_cpu_prep() doesn't realize the bo is already referenced in
unflushed cmdstream. It could be made to do so (but would have to be
implemented twice, ie. both for msm and kgsl). But we still can't do
the expected thing if the caller isn't using _NOSYNC. Because of the
way the tiling works, we need to build quite a bit of cmdstream at flush
time, which is not possible to do at the libdrm level.
So rather than trying to make fd_bo_cpu_prep() smarter than it can
possibly be, just *always* discard and reallocate if the
PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag is set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
- Use macro for munmap under Android - the STATIC_ASSERT uses
a off_t which is not used under Android for mmap. As loff_t size
does not vary as does off_t just ignore the assert.
- Wrap the long lines to improve readability.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The setMCJITMemoryManager method doesn't exist in LLVM 3.3.
I thought I had tested the latest version of my earlier change with LLVM
3.3, but it looks I missed it.
Trivial.
JITMemoryManager was removed in LLVM 3.6, and replaced by its base class
RTDyldMemoryManager.
This change fixes our JIT memory managers specializations to derive from
RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager.
This enables llvmpipe to run with LLVM 3.6.
However, lp_free_generated_code is basically a no-op because there are
not enough hook points in RTDyldMemoryManager to track and free the code
of a module. In other words, with MCJIT, code once created, stays
forever allocated until process destruction. This is not speicfic to
LLVM 3.6 -- it will happen whenever MCJIT is used regardless of version.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We'll need to update gallivm for the interface changes in LLVM 3.6, and
the fewer the number of older LLVM versions we support the less hairy that
will be.
As consequence HAVE_AVX define can disappear. (Note HAVE_AVX meant
whether LLVM version supports AVX or not. Runtime support for AVX is
always checked and enforced independently.)
Verified llvmpipe builds and runs with with LLVM 3.3, 3.4, and 3.5.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The functionality exposed by those extensions does not appear in ES3
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the user calls util_blitter_cache_all_shaders() set a flag and assert
that we never try to create any new fragment shaders after that point.
If the assertions fails, it means we missed generating some shader in
util_blitter_cache_all_shaders().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, we generated an extra CMP instruction:
cmp.ge.f0(8) g6<1>D g1<0,4,1>F 0F
cmp.nz.f0(8) null g6<4,4,1>D 0D
(+f0) sel(8) g5<1>F g1.4<0,4,1>F g2<0,4,1>F
The first operand is always a boolean, and we want to predicate the SEL
on that. Rather than producing a boolean value and comparing it against
zero, we can just produce a condition code in the flag register.
Now we generate:
cmp.ge.f0(8) null g1<0,4,1>F 0F
(+f0) sel(8) g5<1>F g1.4<0,4,1>F g2<0,4,1>F
No difference in shader-db.
v2: Remember to delete the old code (thanks Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using dst_reg(this, ir->type) automatically sets the writemask to the
proper size for the type; src_reg(dst_reg) preserves that. This should
be equivalent, but less code.
Note that src_reg(dst_reg) either uses SWIZZLE_XXXX or SWIZZLE_XYZW, so
the old code did need the manual writemask adjustment, since it
constructed the registers the other way around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Nothing uses the vector_elements temporary variable.
Setting this->result.file is dead because we overwrite this->result a
few lines later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we generated an extra CMP instruction:
cmp.ge.f0(8) g4<1>D g2<0,1,0>F 0F
cmp.nz.f0(8) null g4<8,8,1>D 0D
(+f0) sel(8) g120<1>F g2.4<0,1,0>F g3<0,1,0>F
The first operand is always a boolean, and we want to predicate the SEL
on that. Rather than producing a boolean value and comparing it against
zero, we can just produce a condition code in the flag register.
Now we generate:
cmp.ge.f0(8) null g2<0,1,0>F 0F
(+f0) sel(8) g124<1>F g2.4<0,1,0>F g3<0,1,0>F
total instructions in shared programs: 5473459 -> 5473253 (-0.00%)
instructions in affected programs: 6219 -> 6013 (-3.31%)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A while back, Matt made the uniform upload functions simply upload
ctx->Const.UniformBooleanTrue for boolean values instead of 0/1, which
removed the need to convert it later. We also set UniformBooleanTrue to
1.0f for drivers which want to treat booleans as 0.0/1.0f.
Nothing ever sets these, so they are dead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't have a scissor enable bit in hw, so when a raster state change
results in scissor enable bit changing, we need to also mark scissor
state as dirty.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The optimization of avoiding restore (mem2gmem) if there was a clear
falls down a bit if you don't have a fullscreen scissor. We need to
make the decision logic a bit more clever to keep track of *what* was
cleared, so that we can (a) completely skip mem2gmem if entire buffer
was cleared, or (b) skip mem2gmem on a per-tile basis for tiles that
were completely cleared.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
DataLayoutPass was added in LLVM 3.5 r202168, commit
57edc9d4ff1648568a5dd7e9958649065b260dca "Make DataLayout a plain
object, not a pass.".
This patch fixes this build error with LLVM 3.4.
CXX llvm/libclllvm_la-invocation.lo
llvm/invocation.cpp: In function 'void {anonymous}::optimize(llvm::Module*, unsigned int, const std::vector<llvm::Function*>&)':
llvm/invocation.cpp:324:18: error: expected type-specifier
PM.add(new llvm::DataLayoutPass(mod));
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85189
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The values are hardcoded in the LLVM backend, but the TGSI definitions are
going to be changed with tessellation, e.g. TGSI_PROCESSOR_COMPUTE will be
increased by 2.
We'll use VS for LS and HS, because there's nothing special about them
from the LLVM backend point of view, even though the hardware side is
different. We do the same for ES.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I'll need indexed loads without the meta data flag for tessellation later.
Also rename load_const to buffer_load_const to distinguish it from indexed
const loads.
v2: add comments
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
We must convert it to boolean from the DX9 float encoding that Gallium
specifies.
Later, we should probably define that FACE should be 0 or ~0 if native
integers are supported.
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
With 5 shader stages and various combinations of enabled and disabled shaders,
the maximum number of outputs in one shader doesn't have to be equal to
the maximum number of inputs in the following shader.
v2: return 32 for softpipe and llvmpipe
If the writemask doesn't compress, then we want to put in the uncompressed
writemask, not the compressed writemask failure value (all-on).
Fixes glean's stencil2 and fbo-clear-formats on stencil.
It seems like the hardware is unhappy if we execute a kill instruction
prior to last input (ei). Probably the shader thread stops executing
and the end-input flag is never set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If app only updates (for example) vertex uniforms, it would be nice to
only re-emit those and not also frag uniforms. Means we need to mark
the first frag shader const buffer dirty after a clear.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The get_variable_being_redeclared() function can free the 'var' argument.
Thereafter, we cannot assume that 'var' is a valid pointer. This patch
replaces 'var->name' with 'earlier->name' in two places and calls
is_gl_identifier(var->name) before 'var' might get freed.
This fixes several piglit GLSL crashes, including:
spec/glsl-1.50/execution/geometry/clip-distance-in-param
spec/glsl-1.50/execution/geometry/clip-distance-bulk-copy
spec/glsl-1.50/compiler/gs-redeclares-pervertex-out-before-global-redeclaration.geom
I'm not sure why these were not spotted sooner.
A similar bug was previously fixed by f9cecca7a.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Patch fixes 'glsl-2types-of-textures-on-same-unit' in WebGL conformance
test suite. No Piglit regressions, fixes gl-2.0-active-sampler-conflict.
To avoid adding potentially heavy check during draw (valid_to_render),
check is done during uniform updates by inspecting TexturesUsed mask.
A new boolean variable is introduced to cache validation state.
v2: take into account case where 2 uniforms use same unit (curro)
also do the check only when SSO is not in use, SSO has own
path for sampler validation.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Patch removes old variable based logic for handling a break inside
switch. Switch is put inside a loop so that existing infrastructure
for loop flow control can be used for the switch, now also dead code
elimination works properly.
Possible 'continue' call inside a switch needs now special handling
which is taken care of by detecting continue, breaking out and calling
continue for the outside loop.
v2: remove one unnecessary ir_expression (Curro)
Fixes following Piglit tests:
fs-exec-after-break.shader_test
fs-conditional-break.shader_test
No Piglit or es3conform regressions.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
GLES2 doesn't have GL_TEXTURE_BASE_LEVEL, so the hardware doesn't. Fixes
piglit levelclamp, tex-miplevel-selection, and texture-storage/2D mipmap
rendering.
Suppresses a bunch of warning noise about sample_map possibly being used
uninitialized.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Before, we used the a signed d-word for booleans and the immedates we
emitted varried between signed and unsigned. This commit changes the type
to unsigned (I think that makes more sense) and makes immediates more
consistent. This allows copy propagation to work better cleans up some
instructions.
total instructions in shared programs: 5473519 -> 5465864 (-0.14%)
instructions in affected programs: 432849 -> 425194 (-1.77%)
GAINED: 27
LOST: 0
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
gl_SampleID is a built-in variable that always is of type "int".
Suggested by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Notably this included the EOF flag (the other bits are the full buffer
dump selection, but we don't do full dumps), which caused the kernel
checking for frame completion to trigger.
The other driver does this manually before calling into each tile, but we
can just let it get binned into the tiles (saving repeated kernel
validation on the packet).
Fixes simulator assertion failures on polygon-mode and non-auto texwrap.
We don't need to emit all of our current state at the end of each bin
list. We're going to be smashing it all at the start of the next tile's
bin list, anyway.
The trick is to generate a unique buffer usage value for each possible
combination of domains and flags, with only one bit set each for the
domains and flags. This ensures pb_check_usage() only returns TRUE when
the domains and flags the cached buffer was created for exactly match
the requested ones.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are two debug variables:
CLOVER_DEBUG which you can set to any combination of llvm,clc,asm
(separated by commas) to dump llvm IR, OpenCL C, and native assembly.
CLOVER_DEBUG_FILE which you can set to a file name for dumping output
instead of stderr. If you set this variable, the output will be split
into three separate files with different suffixes: .cl for OpenCL C,
.ll for LLVM IR, and .asm for native assembly. Note that when data
is written, it is always appended to the files.
v2:
- Code cleanups
- Add CLOVER_DEBUG_FILE environment variable for dumping to a file.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2:
- Split build_module_native() into three separate functions.
- Code cleanups.
v3:
- More cleanups.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Drivers can return this value for PIPE_COMPUTE_CAP_IR_TARGET
if they want clover to give them native object code.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
NewBufferObject took a "target" parameter, which it blindly passed to
_mesa_initialize_buffer_object(), which ignored it.
Not much point in passing it around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is used to implement GLSL's atomicCounter() intrinsic. Previously
it *worked*, but the disassembly was bogus.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This would have *almost never* actually been an issue, since other state
tends to get flagged at the same time as new ABOs -- but still bogus.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This didn't make any sense, but papered over the missing TexBO flagging
we've just fixed, in a bunch of cases.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When a buffer object is bound to one of the indexed uniform buffer
binding points, assume that from that point on it may be used as
a uniform buffer.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In the drivers, we occasionally want to reallocate the backing
store for a buffer object; often to avoid waiting for the GPU
to be finished with the previous contents.
At the point that happens, we don't have a good way of determining
where else the buffer object may be bound, and so no good way of
determining which dirty flags need to be raised -- it's fairly
expensive to go looking at all the possible binding points.
Until now, we've considered any BO to be possibly bound as a UBO or
TexBO, and flagged all that state to be reemitted.
Instead, remember what kinds of binding point this buffer has ever
been used with, so that the drivers can flag only what they need.
I don't expect these bits to ever be reset, but that doesn't matter
for reasonable apps.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The kernel files are built into a separate static library and
all the functions that require it are already wrapped in ifdef
USE_VC4_SIMULATOR. Don't forget the header file :)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we've made all the texture emit code mostly independent of GLSL
IR, this isn't necessary any more.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Before, we had 3 different emit functions for various different gen's,
as well as some ancilliary work that was the same across all gen's which
was either contained in functions or duplicated across the GLSL IR and
Mesa IR backends. Now, we have a single method, emit_texture(), that
takes all the information needed to make a texture instruction and
handles all the setup, and all we have to do to emit a texture
instruction while converting from GLSL IR, Mesa IR, or any new backend
is to extract the information emit_texture() needs and then call it.
v2: Significant rebasing (by Ken).
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This happened to work before, but it would convert the output to a float
and then back to an integer which seems bad.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This drops a dependency on ir_texture objects.
v2 (Ken): Rename lod_components to grad_components, as it only has a
meaningful value for ir_txd. We could set it to 1 for TXL,
but there's no real need.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Our new IR won't have ir_texture objects, but using glsl_type is fine.
v2 (Ken): Drop redundant ir->coordinate NULL check; rebase.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This is slightly clearer. Based on a patch by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This moves the handling of non-constant texel offset subexpression trees
to the place where we visit other such subtrees. It also removes some
uses of ir->offset in emit_texture_gen7, which will be useful when we
write the backend for our new upcoming IR.
Based on a patch by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
brw_lower_unnormalized_offset sets ir->offset to NULL if it applies the
texelFetchOffset workarounds, so there's no need to special case it
here---there won't be an offset for ir_txf.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Eric's original code to work around TXF offset bugs contained a comment
explaining the problem, which was lost when Chris generalized it to an
IR transformation (in commit 598ca510b8).
This commit adds the original comment to the newer code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Because we reuse various bits of emit code (for state/vertex/prog/etc)
for both regular draws and internal draws (gmem<->mem, clear, etc), the
number of parameters getting passed around has been growing. Refactor
to group these into fd3_emit. This simplifies fxn signatures, avoids
passing around shader key on the stack, etc. It also gives us a nice
place to cache shader-variant lookup to avoid looking up shader variants
multiple times per draw (without having to *also* pass them around as
fxn args everywhere).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Get rid of fd3_vertex_buf and use fd_vertex_state directly for all
draws. Removes a tiny bit of CPU overhead for munging around the vertex
state every time it is emitted, but more importantly it cleans things up
for later optimizations, so the emit paths don't have to special case
internal draws (gmem<->mem, clears, etc) with regular draws.
Instead of constructing fd3_vertex_buf array each time for internal
draws, and context init time pre-create solid_vbuf_state and
blit_vbuf_state.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Due to the implicit move-from-GRF, unary math looks a lot like the Gen6+
math instruction: it's a single instruction (SEND) with a GRF source.
The difference is that it also implicitly clobbers a message register.
The only visible effect is that CSE will remove the MRF-clobbering from
later math operations. This should be fine; compute_to_mrf and
remove_redundant_mrf_writes don't look at the values populated by
implied writes, so they can't rely on those values being present.
Less interference may actually help those passes make more progress.
Binary math is still problematic, since it involves a separate MOV
instruction to load the second operand. We continue disallowing CSE for
binary math operations.
total instructions in shared programs: 3340303 -> 3340100 (-0.01%)
instructions in affected programs: 26927 -> 26724 (-0.75%)
Nothing hurt, gained, or lost. ~6% reduction on a few shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Otherwise the caching buffer manager may return a buffer which was created
with a different set of flags, which can cause trouble.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, when no pkg-config was available for
libexpat we would just add the needed linking
flags without any extra check.
Now, we check that the library and the headers are
also installed in the building environment.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Now that the freedreno_lowering code is moved to tgsi_lowering, remove
our private copy and switch over to using the common version.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Originally the variables were set only once via the ?= operator but
that causes issues when doing incremental builds. They appear to be
undefined and missing from the dependency list despite their addition
to LIBADD.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84807
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The texturing hardware takes the POT level 0 width/height and minifies
those. This is different from what we were doing, for example, for
273-wide's level 5: POT(273>>5) == 8, while POT(273)>>5 == 16.
Fixes piglit-depthstencil-render-miplevels 273.
This patch fixes this build error on DragonFly BSD.
CC os/os_misc.lo
os/os_misc.c: In function 'os_get_total_physical_memory':
os/os_misc.c:132:2: error: #error Unsupported *BSD
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The current implementation of glxUseXFont requires creating
a temporary pixmap and graphics context, which requires a real
old-school X11 Window, not a glxDrawable. This patch changes
things so that glxUseXFont will also accept a glxWindow or
glxPixmap, and lookup the underlying X11 Drawable. Without
this patch glxUseXFont generates a giant stream of Xerrors
about bad drawables and bad graphics contexts.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54372
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
It does not look like an issue now but it is good to be future proof. Spotted
by Courtney Goeltzenleuchter.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This gets them to pass glsl-sin/cos. There was an obvious problem that I
was using the FRC code on the scaled input value, which means that we had
a range in [0, 1], while our taylor is most accurate across [-0.5, 0.5].
We can just slide things over, but that means flipping the sign of the
coefficients. After that, it was just a matter of stuffing more
coefficients in.
There's no reason to stall on pwrite - the CPU always appends to the
buffer and never modifies existing contents, and the GPU never writes
it. Further, the CPU always appends new data before submitting a batch
that requires it.
This code predates the unsynchronized mapping feature, so we simply
didn't have the option when it was written.
Ideally, we would do this for non-LLC platforms too, but unsynchronized
mapping support only exists for LLC systems.
Saves a bunch of stall avoidance copies when uploading shaders.
v2: Rebase on changes to previous patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
We don't really want unnecessary buffer copying, so it'd be nice to know
when it's happening.
v2: Drop stall warnings when doing a read-only CPU mapping of the cache
BO. The GPU also uses it in a read-only fashion, so there won't be
any stalls, even though the buffer is busy. (Thanks to Chris Wilson
for catching this mistake.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
This is easy: we just need to use brw_map_bo instead of mapping it
directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
If the VS doesn't output a value that the FS needs, we still need to read
the right contents for the remaining FS inputs, by emitting padding. And
if the VS outputs something the FS doesn't need, we shouldn't put it in
the VPM at all (so the code producing it can get DCEed).
Fixes 77 piglit tests.
Also fixes two sided lighting which was broken at least
on pre-evergreen by commit b1eb00.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This is the last use tgsi_parse_token in radeonsi.
It looks ugly because the code was re-indented, but there is really no change
in behavior.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
They were reinventing tgsi_shader_info. They are unused now.
radeon_llvm_context::load_input can be NULL if input fetching is implemented
in some other way.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
No code in Mesa sets the usage mask to any other value.
The final mask is AND'ed with enable bits from the rasterizer state anyway.
If somebody implements setting usage masks in st/mesa, we can use
tgsi_shader_info to get it more easily.
This is a prerequisite for the following commit.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
[ Francisco Jerez: Split off from a larger patch, and take a slightly
different approach for passing the implicit arguments around. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Our current atan()-approximation is pretty inaccurate at 1.0, so
let's try to improve the situation by doing a direct approximation
without going through atan.
This new implementation uses an 11th degree polynomial to approximate
atan in the [-1..1] range, and the following identitiy to reduce the
entire range to [-1..1]:
atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)
This range-reduction idea is taken from the paper "Fast computation
of Arctangent Functions for Embedded Applications: A Comparative
Analysis" (Ukil et al. 2011).
The polynomial that approximates atan(x) is:
x * 0.9999793128310355 - x^3 * 0.3326756418091246 +
x^5 * 0.1938924977115610 - x^7 * 0.1173503194786851 +
x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444
This polynomial was found with the following GNU Octave script:
x = linspace(0, 1);
y = atan(x);
n = [1, 3, 5, 7, 9, 11];
format long;
polyfitc(x, y, n)
The polyfitc function is not built-in, but too long to include here.
It can be downloaded from the following URL:
http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m
This fixes the following piglit test:
shaders/glsl-const-folding-01
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When mapping the buffer a second time, we need to use the new pointer,
not the one from the previous mapping. Otherwise, we will most likely
crash.
Apparently, we've just been getting lucky and getting the same
bo->virtual pointer in both cases. libdrm probably has a hand in that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Copy propagating these might result in reading the r4 after some other
instruction has written r4. Just prevent all copy propagation of this for
now.
Fixes bad rendering with upcoming indirect register access support, where
the copy propagation was consistently happening across another read.
Merging VS and CS into the same struct wasn't winning us anything except
for not allocating a separate BO (but if we want to pack programs into
BOs, we should pack not just those 2 programs together). What it was
getting us was a bunch of code duplication about hash table lookups and
propagating vc4_compile contents into a vc4_compiled_shader.
I was about to make the situation worse with indirect uniform buffer
access.
I wanted to make another set of texture uploads for handling reladdr
constants, and duplicating all the bitshifting looked like a terrible
idea. In the process, this fixes a swap of the s/t texture wrap modes.
Under the simulator, reading registers before writing them triggers an
assertion failure. c->undef gets treated as r0, which will usually be
written, but not if it's used in the first instruction. We should
definitely not be aborting in this case, and return some sort of undefined
value instead.
Fixes glsl-user-varying-ff.
The border color is only needed when using the GL_CLAMP_TO_BORDER or
(deprecated) GL_CLAMP wrap modes; all others ignore it, including the
common GL_CLAMP_TO_EDGE and GL_REPEAT wrap modes.
In those cases, we can skip uploading it entirely, saving a bit of space
in the batchbuffer. Instead, we just point it at the start of the
batch (offset 0); we have to program something, and that address is safe
to read.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Write-back caching cannot be used for buffers being scanned out by the
display engine; surfaces used for scan-out must be write-through or
uncached. I originally chose WT for render targets because it works in
all cases. However, we really want to use write-back caching where
possible, as it is more efficient.
Most renderbuffers are not used for scanout - off-screen FBOs certainly
are fine, and non-pageflipped backbuffers should be fine as well. So
in most cases WB will work. However, we don't know what will be used
for scan-out, so we instead simply use the PTE value specified by the
kernel, as it knows these things.
This matches our MOCS choice on Haswell.
Fixes performance regressions since commit ee4484be3d
in a microbenchmark (spotted by Eero Tamminen). Improves performance
in GLBenchmark 2.7/EgyptHD by 7.44362% +/- 0.496939% (n=55) on a
Broadwell GT2. Improves performance in a bunch of other microbenchmarks
by ~15% or so.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
Like BDW_MOCS_WB and BDW_MOCS_WT, this specifies that we want to use all
three caches (L3, LLC, and eLLC where available), but leaves the LLC
caching mode up to the kernel's page table entry.
This allows the kernel to pick WB/WT/UC based on whether it's using a
buffer for scanout.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
These days, most driver debug output happens via stderr, not stdout.
Some applications (such as Xephyr) also appear to close stdout which
makes these messages go nowhere.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The non-base NPOT levels are stored as POT-aligned images. We get that
POT alignment by minifying the POT-aligned base level.
This means that level strides are also POT aligned, so we have to tell the
rendering mode config that our resource is larger than the actual
requested area.
Fixes the fbo-generatemipmap-formats NPOT cases. Regresses
depthstencil-render-miplevels 273 * -- the texture presentation now works
(where it was completely broken before), it looks like there's some
overflow of image bounds happening at the lower miplevels.
It's fairly easy, thanks to Rob Clark's lowering code. Fixes
two-sided-lighting and 4 vertex-program-two-side testcases, while
regressing 8 testcases that involve enabling two-sided color while only
initializing one of the two colors in the VS. If you're enabling two
sided color, it's of course expected that you really do set up both
colors, so this is still an improvement (and when we set up a linker for
TGSI, we'll hopefully fix those 8 fails).
Lots of drivers need to transform the weird instructions in TGSI into
reasonable scalar ops, and this code can make those translations
canonical.
Acked-by: Rob Clark <robclark@freedesktop.org>
The simulator assertion fails if you have a write to a reg and then a read
(for example, in the NOP side of an instruction), even if the read isn't
used for anything. By setting unused raddrs to NOP, we avoid the problem
(since only the phsyical registers are tracked).
Original patch by Petri Latvala <petri.latvala@intel.com>:
Add an optimization pass that drops min/max expression operands that
can be proven to not contribute to the final result. The algorithm is
similar to alpha-beta pruning on a minmax search, from the field of
AI.
This optimization pass can optimize min/max expressions where operands
are min/max expressions. Such code can appear in shaders by itself, or
as the result of clamp() or AMD_shader_trinary_minmax functions.
This optimization pass improves the generated code for piglit's
AMD_shader_trinary_minmax tests as follows:
total instructions in shared programs: 75 -> 67 (-10.67%)
instructions in affected programs: 60 -> 52 (-13.33%)
GAINED: 0
LOST: 0
All tests (max3, min3, mid3) improved.
A full shader-db run:
total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
instructions in affected programs: 1188 -> 1160 (-2.36%)
GAINED: 0
LOST: 0
Improvements happen in Guacamelee and Serious Sam 3. One shader from
Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of
dropping of a (constant float (0.00000)) operand, which was
compiled to a saturate modifier.
Version 2 by Iago Toral Quiroga <itoral@igalia.com>:
Changes from review feedback:
- Squashed various cosmetic changes sent by Matt Turner.
- Make less_all_components return an enum rather than setting a class member.
(Suggested by Mat Turner). Also, renamed it to compare_components.
- Make less_all_components, smaller_constant and larger_constant static.
(Suggested by Mat Turner)
- Change mixmax_range to call its limits "low" and "high" instead of
"range[0]" and "range[1]". (Suggested by Connor Abbot).
- Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by
Connor Abbot).
- Make the logic more clearer by rearrenging the code and commenting.
(Suggested by Connor Abbot).
- Added comment to explain why we need to recurse twice. (Suggested by
Connor Abbot).
- If we cannot prune an expression, do not return early. Instead, attempt
to prune its children. (Suggested by Connor Abbot).
Other changes:
- Instead of having a global "valid" visitor member, let the various functions
that can determine this status return a boolean and check for its value
to decide what to do in each case. This is more flexible and allows to
recurse into children of parents that could not be prunned due to invalid
ranges (so related to the last bullet in the review feedback).
- Make sure we always check if a range is valid before working with it. Since
any use of get_range, combine_range or range_intersection can invalidate
a range we should check for this situation every time we use any of these
functions.
Version 3 by Iago Toral Quiroga <itoral@igalia.com>:
Changes from review feedback:
- Now we can make get_range, combine_range and range_intersection static too
(suggested by Connor Abbot).
- Do not return NULL when looking for the larger or greater constant into
mixed vector constants. Instead, produce a new constant by doing a
component-wise minmax. With this we can also remove of the validations when
we call into these functions (suggested by Connor Abbot).
- Add a comment explaining the meaning of the baserange argument in
prune_expression (suggested by Connor Abbot).
Other changes:
- Eliminate minmax expressions operating on constant vectors with mixed values
by resolving them.
No piglit regressions observed with Version 3.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Patch fixes following test case from 'shaders-with-varyings' WebGL
conformance suite: "vertex shader with unused varying and fragment
shader with used varying must succeed"
v2: emit still a warning if the condition happens (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When a shader needs N surfaces, we should upload N surfaces and not depend on
how many are bound. This commit is larger than it should be because we did
not export how many surfaces a surface uses before.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It only needs the constant buffer with clip planes and read-write resources
for the GS->VS ring and streamout. That's 2 pointers.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is a wrong place to flush caches to say the least.
I don't think we need to flush the instruction caches if we don't patch
shaders with DMA.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
st/mesa has the same flag in its shader key, we don't need to do it
in the driver anymore.
Instead, use TGSI_INTERPOLATE_LOC_SAMPLE, which is what st/mesa sets.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The first compiled shader is sometimes useless, because the key doesn't match
the key for the draw call where it's used.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes a few issues, including a potential empty-IB (which triggers gpu
hangs in piglit occlusion_query_meta_no_fragments)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Possibly we should map the front color to black (zeroes). But not sure
there is a way to do that without generating a shader variant.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Shaders like:
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL SAMP[0]
DCL TEMP[0], LOCAL
IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000}
0: TEX TEMP[0], IN[0].xyyy, SAMP[0], 2D
1: MOV OUT[0], IMM[0].xyxx
2: END
cause unhappyness. They have an IN[], but once this is compiled the
useless TEX instruction goes away. Leaving a varying that is never
fetched, which makes the hw unhappy.
In the process fix a signed vs unsigned compare. If the vertex shader
has max_reg=-1, MAX2() vs an unsigned would not give the desired result.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
On Windows, the Piglit primitive-restart test was failing a
glGetError()==0 assertion when it was run w/out any command line
arguments. Piglit's all.py script only runs primitive-restart
with arguments so this case isn't normally hit during a full
piglit run.
The basic problem is Microsoft's opengl32.dll calls glFlush
from wglGetProcAddress() and Piglit uses wglGetProcAddress() to
resolve glPrimitiveRestartNV() which is called inside glBegin/End.
See comments in the code for more info.
Plus, improve the comments for _mesa_alloc_dispatch_table().
Cc: <mesa-stable@lists.freedesktop.org>
Acked-by: Sinclair Yeh <syeh@vmware.com>
This isn't perfect -- the filtering is happening on the srgb values, and
we're decoding afterwards, which is not what you want. I think that's the
cause of some additional texwrap(GL_CLAMP, LINEAR) failures, though many
other texwrap tests on srgb start to pass since unfiltered values come out
correct.
Since the RA has to be done s.t. each one gets its own (adjacent)
register, it would complicate matters if instructions were allowed to be
repeated. This enables copy-propagation use in situations where
previously that might have happened.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
While running piglit in virgl, I hit an assert in intel driver.
"qemu-system-x86_64: intel_tex.c:219: intel_map_texture_image: Assertion `tex_image->TexObject->Target != 0x8C18 || h == 1' failed."
Thanks to Eric and Ken for pointing me in the right direction,
Fix the get_tex_depth to do the same fixup as get_tex_rgba does
for 1D array textures.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
With current makefiles the build fails because source and build paths
are generated incorrectly. With Android build system the top_srcdir and
top_builddir variables are undefined and all paths are relative to where
Android.mk is located. This ends up with path likes
external/mesa/src/mesa/src/mesa/ for both source and build paths, which
are obviously wrong.
This patch fixes this by overriding resulting SRCDIR and BUILDDIR
variables with empty string, so that paths end up being relative to
Android.mk file again. Appending correct build path to generated files
is already done in Android.gen.mk.
Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This patch fixes Android build failures by including src/util directory
in compilation. Files inside of this directory are compiled into
libmesa_util static library and linked with resulting libGLES_mesa.
Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Before, we were hard-coding the base_mrf based on dispatch width not number
of registers spilled at a time. This caused us to emit instructions with a
base_mrf or 14 and a mlen of 3 so we used the magical non-existant m16
register. This fixes the problem.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we had a MAX_SAMPLER_MESSAGE_SIZE which we used instead.
However, some FB write messages can validly be longer than this so we need
something different. Since MAX_SAMPLER_MESSAGE_SIZE is validly useful on
its own, we leave it alone and add a new MAX_GRF_SIZE that's big enough for
FB writes.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84539
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes a bug where 1-wide operations don't properly translate down to
1-wide instructions.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Above a certain limit use CACHE mode instead of BUFFER mode. This
should solve gpu hangs with large shader programs.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The existing code was not setting several fields, most importantly the
target, which is required on nv50/nvc0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Patch fixes failing test in WebGL conformance test
'point-no-attributes' when running Chrome on OpenGL ES.
(Shader program may draw points using constant data in shader.)
No Piglit regressions.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
They're actually as documented in the HW specs and the GL mipmapping enums
order. Fixes fbo-generatemipmap-filtering , and some other tests where we
were off by a few bits due to unexpected linear filtering.
Extension enables doing a multisample buffer resolve and buffer
scaling using a single glBlitFrameBuffer() call. Currently, we
have this extension implemented in BLORP which is only used by
SNB and IVB. This patch implements the extension in meta path
which makes it available to Broadwell.
Implementation features:
- Supports scaled resolves of 2X, 4X and 8X multisample buffers.
- Avoids unnecessary shader compilations by storing the pre compiled
shaders for each supported sample count.
- Uses bilinear filtering for both GL_SCALED_RESOLVE_FASTEST_EXT and
GL_SCALED_RESOLVE_NICEST_EXT filter options. This is an allowed
behavior in the extension's spec.
- I tried doing bicubic filtering for GL_SCALED_RESOLVE_NICEST_EXT
filter. It made the edges in the image look little smoother but
the image gets blurred causing no overall quality improvement.
For now I have dropped the idea of doing different filtering for
nicest filter.
V2:
- Minor changes to simplify the fragment shader.
- Refactor the code to move i965 specific sample_map computation out
of Meta. We now use ctx->Const.SampleMap{2,4,8}x variables initialized
by the driver.
- Use a simple msaa resolve shader for scaled resolves with scaling
factor = 1.0.
V3:
- Make changes to create a string out of ctx->Const.SampleMap{2,4,8}x
variables and use it in fragment shader.
V4:
- Make changes to use uint8_t type ctx->Const.SampleMap{2,4,8}x
variables.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
with values specific to Intel hardware.
V2: Define and use gen6_get_sample_map() function to initialize
the variables.
V3: Change the function name to gen6_set_sample_maps() and use
memcpy() to fill in the data.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
SampleMap{2,4,8}x variables are used in later patches to implement
EXT_framebuffer_multisample_blit_scaled extension.
V2: Use integer array instead of a string.
Bump up the comment.
V3: Use uint8_t type array.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch implements functions for images support,
which basically supports copy data between video
surface and user buffers, in this case supports
SW decode, and other video output
v2: fix buffer size for odd-sized image case
expose I420 format as well
v3: fix YUV 4:2:2 format data buffer size
cleanup I420 format exposure
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch implements codec for mpeg2 h264 and vc1,
populates codec parameters and pass them to HW driver.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch implements context managements, relate it HW driver,
functions for video surface managements, and functions for
application data memory buffer managements.
implemented functions:
vlVa(Create|Destroy)Context
vlVa(Create|Destroy|Put)Surfaces
vlVa(Create|Destroy)Buffer
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch is for application to query configuration,
such as profiles, entrypoints, and attributes
v2: fix missing profile with query
Signed-off-by: Michael Varga <michael.varga@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch adds a skeleton VA-API state tracker,
which is filled with live in the subsequent patches.
v2: fixes in configure.ac and va state_tracker Makefile.am
v3: do not link against libva.
detect libva version, and correctly set driver entrypoint name.
rebase(cleanup) targets/va/Makefile.am
v4: cleanup va version auto detection
add back targets/va/va.sym
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Break out these functions so that they can be shared with a other
state trackers. They will be used in subsequent patches for the new
VA-API state tracker.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cuts the number of i965 color calculator viewport uploads by 100x
(11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG,
which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT. It's not
needed here anyway - only SBE needs it. Just a copy and paste mistake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function flagged BRW_NEW_*_PROGRAM
When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa
calls the BindProgram driver hook, which flagged BRW_NEW_*_PROGRAM.
However, brw_upload_state also checks for that changing, sets the same
flags, and also updates brw->fragment_program and so on. So, this looks
to be entirely redundant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Now that the bitfield is a uint64_t, we should use 1ull. Currently, we
only have 32 entries, so 1 works fine, but it's not future-proof.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits
beyond 1 << 31. We missed doing this when widening the driver flags
from uint32_t to uint64_t.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Fix build error introduced with commit
eedbce9c63.
lp_test_format.c: In function ‘test_format_unorm8’:
lp_test_format.c:226:4: error: too few arguments to function ‘gallivm_create’
gallivm = gallivm_create("test_module_unorm8");
^
In file included from ../../../../src/gallium/auxiliary/gallivm/lp_bld_format.h:38:0,
from lp_test_format.c:42:
../../../../src/gallium/auxiliary/gallivm/lp_bld_init.h:58:1: note: declared here
gallivm_create(const char *name, LLVMContextRef context);
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84538
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Instead of just segfaulting in the driver when a buffer allocation fails,
report error messages indicating what went wrong so that we can debug things.
As a simple example, chromium wraps Mesa in a sandbox which doesn't allow
access to most syscalls, including the ability to create shared memory
segments for fences. Before, you'd get a simple segfault in mesa and your 3D
acceleration would fail. Now you get:
$ chromium --disable-gpu-blacklist
[10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
libGL: pci id for fd 12: 8086:0a16, driver i965
libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL error: DRI3 Fence object allocation failure Operation not permitted
[10618:10618:0930/200525:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
This made it pretty easy to diagnose the problem in the referenced bug report.
Bugzilla: https://code.google.com/p/chromium/issues/detail?id=415681
Signed-off-by: Keith Packard <keithp@keithp.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
A driver which doesn't have async flip support will queue up flips without any
way to replace them afterwards. This means we've got a scanout buffer pinned
as soon as we schedule a flip and so we need another buffer to keep from
stalling.
When vblank_mode=0, if there are only three buffers we do:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting down in the kernel waiting for vblank to
become the next scanout buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
This cannot be displayed at MSC 1 because the
kernel doesn't have any way to replace buffer 1 as the pending
scanout buffer. So, best case this will get displayed at MSC 2.
Now we block after this, waiting for one of the three buffers to become idle.
We can't use buffer 0 because it is the scanout buffer. We can't use buffer 1
because it's sitting in the kernel waiting to become the next scanout buffer
and we can't use buffer 2 because that's the most recent frame which will
become the next scanout buffer if the application doesn't manage to generate
another complete frame by MSC 2.
With four buffers, we get:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting down in the kernel waiting for vblank to
become the next scanout buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
This cannot be displayed at MSC 1 because the
kernel doesn't have any way to replace buffer 1 as the pending
scanout buffer. So, best case this will get displayed at MSC
2. The X server will queue this swap until buffer 1 becomes
the scanout buffer.
Render frame 3 to buffer 3
PresentPixmap for buffer 3 at MSC 1
As soon as the X server sees this, it will replace the pending
buffer 2 swap with this swap and release buffer 2 back to the
application
Render frame 4 to buffer 2
PresentPixmap for buffer 2 at MSC 1
Now we're in a steady state, flipping between buffer 2 and 3
waiting for one of them to be queued to the kernel.
...
current scanout buffer = 1 at MSC 1
Now buffer 0 is free and (e.g.) buffer 2 is queued in
the kernel to be the scanout buffer at MSC 2
Render frames, flipping between buffer 0 and 3
When the system can replace a queued buffer, and we update Present to take
advantage of that, we can use three buffers and get:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting waiting for vblank to become the next scanout
buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
Queue this for display at MSC 1
1. There are three possible results:
1) We're still before MSC 1. Buffer 1 is released,
buffer 2 is queued waiting for MSC 1.
2) We're now after MSC 1. Buffer 0 was released at MSC 1.
Buffer 1 is the current scanout buffer.
a) If the user asked for a tearing update, we swap
scanout from buffer 1 to buffer 2 and release buffer
1.
b) If the user asked for non-tearing update, we
queue buffer 2 for the MSC 2.
In all three cases, we have a buffer released (call it 'n'),
ready to receive the next frame.
Render frame 3 to buffer n
PresentPixmap for buffer n
If we're still before MSC 1, then we'll ask to present at MSC
1. Otherwise, we'll ask to present at MSC 2.
Present already does this if the driver offers async flips, however it does
this by waiting for the right vblank event and sending an async flip right at
that point.
I've hacked the intel driver to offer this, but I get tearing at the top of
the screen. I think this is because flips are always done from within the
ring, and so the latency between the vblank event and the async flip happening
can cause tearing at the top of the screen.
That's why I'm keying the need for the extra buffer on the lack of 2D
driver support for async flips.
Signed-off-by: Keith Packard <keithp@keithp.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Need to unwrap the indirect resource otherwise bad things will happen.
Fixes random crashes and timeouts with piglit's arb_indirect_draw tests.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
IVB had a restriction that prevented us from emitting compressed
three-source instructions, and although that was lifted on Haswell,
Haswell had a new restriction that said BFI instructions specifically
couldn't be compressed.
Transform
sqrt a, b
rcp c, a
into
sqrt a, b
rsq c, b
The improvement here is that we've broken a dependency between these
instructions. Leads to 330 fewer INV instructions and 330 more RSQ.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Transform
sqrt a, b
rcp c, a
into
sqrt a, b
rsq c, b
In most cases the sqrt's result is still used, so the improvement here
is that we've broken a dependency between these instructions. Leads to
80 fewer INV instructions and 80 more RSQ.
Occasionally the sqrt's result is no longer used, leading to:
instructions in affected programs: 5005 -> 4949 (-1.12%)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The next patch adds an algebraic optimization for the pattern
sqrt a, b
rcp c, a
and turns it into
sqrt a, b
rsq c, b
but many vertex shaders do
a = sqrt(b);
var1 /= a;
var2 /= a;
which generates
sqrt a, b
rcp c, a
rcp d, a
If we apply the algebraic optimization before CSE, we'll end up with
sqrt a, b
rsq c, b
rcp d, a
Applying CSE combines the RCP instructions, preventing this from
happening.
No shader-db changes.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Helps a handful of programs in Serious Sam 3 that use do-while loops.
instructions in affected programs: 16114 -> 16075 (-0.24%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Do not rely on LLVMMCJITMemoryManagerRef being available.
The c binding to the memory manager objects only appeared
on llvm-3.4.
The change is based on an initial patch of Brian Paul.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Blitter can still have transfers hanging around which it frees in
util_blitter_destroy(). So let it clean up before we yank the
transfer_pool from under it.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Indirect registers consume an additional token. Try to clean up the
token calculation math a bit, and fix it at the same time.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If the name is just going to get dropped, don't bother making it. If
the name is made, release it sooner (rather than later).
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the name is just going to get dropped, don't bother making it. If
the name is made, release it sooner (rather than later).
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 74 40,578,719,715 67,762,208 62,263,404 5,498,804 0
After (32-bit): 52 40,565,579,466 66,359,800 61,187,818 5,171,982 0
Before (64-bit): 74 37,129,541,061 95,195,160 87,369,671 7,825,489 0
After (64-bit): 76 37,134,691,404 93,271,352 85,900,223 7,371,129 0
A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These few places were using ir_var_auto for seemingly no reason. The
names were not added to the symbol table.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
No change Valgrind massif results for a trimmed apitrace of dota2.
v2: Minor rebase on _mesa_init_constants changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Later patches will give every ir_var_temporary the same name in release
builds. Adding a bunch of variables named "compiler_temp" to the symbol
table can only cause problems.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Specifically, ir_var_temporary variables constructed with a NULL name
will all have the name "compiler_temp" in static storage.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0
After (32-bit): 71 40,583,408,411 67,761,528 62,263,519 5,498,009 0
Before (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0
After (64-bit): 67 37,123,303,706 95,150,544 87,333,600 7,816,944 0
A real savings of 173KiB on 32-bit and no change on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
At least one of these pointers must be NULL, and we can determine which
will be NULL by looking at other fields. Use this information to store
both pointers in the same location.
If anyone can think of a better name for the union than "u", I'm all
ears.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 63 40,574,239,515 68,117,280 62,618,607 5,498,673 0
After (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0
Before (64-bit): 53 37,126,451,468 95,150,256 87,711,304 7,438,952 0
After (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Also move num_state_slots inside ir_variable_data for better packing.
The payoff for this will come in a few more patches.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The payoff for this will come in a few more patches.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
warn_extension_index was moved to improve packing.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,249 0
After (32-bit): 73 40,575,751,558 68,116,528 62,618,607 5,497,921 0
Before (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,576 0
After (64-bit): 62 37,123,578,526 95,150,784 87,711,304 7,439,480 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
v2: Use the enum name with the bit-field and remove the extra casts.
Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v1]
Also move the new warn_extension_index into ir_variable::data. This
enables slightly better packing.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 82 40,580,040,531 68,488,992 62,973,695 5,515,297 0
After (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,249 0
Before (64-bit): 65 37,124,013,542 95,892,768 88,466,712 7,426,056 0
After (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,576 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The payoff for this will come in the next patch.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
After compilation (and before linking) we can eliminate quite a few
built-in variables. Basically, any uniform or constant (e.g.,
gl_MaxVertexTextureImageUnits) that isn't used (with one exception) can
be eliminated. System values, vertex shader inputs (with one
exception), and fragment shader outputs that are not used and not
re-declared in the shader text can also be removed.
gl_ModelViewProjectMatrix and gl_Vertex are used by the built-in
function ftransform. There are some complications with eliminating
these variables (see the comment in the patch), so they are not
eliminated.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 46 40,661,487,174 75,116,800 68,854,065 6,262,735 0
After (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0
Before (64-bit): 64 37,200,329,700 104,872,672 96,514,546 8,358,126 0
After (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0
A real savings of 4.9MiB on 32-bit and 7.0MiB on 64-bit.
v2: Don't remove any built-in with Transpose in the name.
v3: Fix comment typo noticed by Anuj.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
All built-in uniforms are supposed to be backed by some GL state. The
state_slots field describes this backing state.
This helped me track down a bug in a later patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reuse the LLVMContext already allocated in llvmpipe_context
for draw_llvm if ppossible. This should decrease the memory
footprint of an llvmpipe context.
v2: Fix compile with llvm disabled.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This fixes the remaining problem with the recently introduced
global jit memory manager. This change again uses a memory manager
that is local to gallivm_state. This implementation still frees
the majority of the memory immediately after compilation.
Only the generated code is deferred until this code is no longer used.
This change and the previous one using private LLVMContext instances
I can now safely run several independent OpenGL contexts driven
by llvmpipe from different threads.
v3: Rebase on llvm-3.6 compile fixes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This is one step to make llvmpipe thread safe as mandated by the OpenGL
standard. Using the global LLVMContext is obviously a problem for
that kind of use pattern. The patch introduces two LLVMContext
instances that are private to an OpenGL context and used for all
compiles. One is put into struct draw_llvm and the other
one into struct llvmpipe_context.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
The big pile of patches I just pushed regresses about 25 piglit tests on
SNB. This fixes the regressions.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
The screen argument isn't actually used by lp_jit_screen_init() at this
time, but let's move the call so that we pass a valid pointer.
v2: don't leak screen if lp_jit_screen_init() fails.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
For cube resources, the array_size value should be 6. So handle
that case as we do for array texture resources. But assert that
array_size==6 just to be safe.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
layers (aka array_size) should be 6 for cube textures so we don't need
to special-case it. But add an assertion just to be safe.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Earlier in the function we assert layers==6 for PIPE_TEXTURE_CUBE so
there's no reason to special-case the pt.array_size = layers assignment.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The core sw primitive restart code is still around, because i965 uses it
in some cases, but there are no drivers that want it on all the time.
Reviewed-by: Rob Clark <robdclark@gmail.com>
The drivers not flagging primitive restart support are r300 swtcl, svga,
nv30, and vc4.
The point of primitive restart is to slightly reduce draw call overhead
for apps by batching multiple draws. If we do an extra pass to read the
index buffer and split back into multiple draws, we've entirely missed the
point. This is particularly bad for drivers that otherwise have hardware
IB reads, where the readback is probably uncached.
Reviewed-by: Rob Clark <robdclark@gmail.com>
On gen 7, the MRF was removed and we gained the ability to do send
instructions directly from the GRF. This commit enables that
functinoality for FB writes.
v2: Make handling of components more sane.
i965/fs: Force a high register for the final FB write
v2: Renamed the array for the range mappings and added a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If we are going to use LOAD_PAYLOAD operations to fill MRF registers, then
we will need this.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we were use the base_mrf parameter of fs_inst to store the MRF
location. In preparation for doing FB writes from the GRF, we now also
allow you to set inst->base_mrf to -1 and provide a source register.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we have execution sizes, we can use that instead of the
dispatch width. This way it also works for 8-wide instructions in
SIMD16.
i965/fs: Make effective_width a variable instead of a function
i965/fs: Preserve effective width in constant propagation
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will, eventually, allow us to manage execution sizes of
instructions in a much more natural way from the fs_visitor level.
i965/fs: Explicitly set instruction execute size a couple of places
i965/blorp: Explicitly set instruction execute sizes
Since blorp is all 16-wide and nothing isn't, in general, very careful
about register width, we'll just set it all explicitly.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we track both halves of a 16-wide vgrf, we no longer need to worry
about force_sechalf or force_uncompressed. The only real issue is if the
destination is too small.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This commit fixes a bug in register coalesce that happens when one register
is moved to another the proper number of times but the channels are
re-arranged. When this happens, the previous code would happily coalesce
the registers regardless of the fact that the channel mappins were wrong.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that offset() can properly handle MRF registers, we can use an MRF
fs_reg and let offset() handle incrementing it correctly for different
dispatch widths. While this doesn't have any noticeable effect currently,
it does ensure that the destination register is 16-wide which will be
necessary later when we start detecting execution sizes based on source and
destination registers.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is actually the squash of a bunch of different changes. Individual
commit titles follow:
i965/fs: Always 2-align registers SIMD16 for gen <= 5
i965/fs: Use the register width when applying offsets
This reworks both byte_offset() and offset() to be more intelligent.
The byte_offset() function now supports offsets bigger than 32. The
offset() function uses the byte_offset() function together with the
register width and the type size to offset the register by the correct
amount.
i965/fs: Change regs_read to be in hardware registers
i965/fs: Change regs_written to be actual hardware registers
i965/fs: Properly handle register widths in LOAD_PAYLOAD
The LOAD_PAYLOAD instruction is a bit special because it collects a
bunch of registers (with possibly different widths) into a single
payload block. Once the payload is constructed, it's treated as a
single block of data and most of the information such as register widths
doesn't matter anymore. In particular, the offset of any particular
source register is the accumulation of the sizes of the previous source
registers.
i965/fs: Properly set writemasks in LOAD_PAYLOAD
i965/fs: Handle register widths in demote_pull_constants
i965/fs: Get rid of implicit register doubling in the allocator
i965/fs: Reserve enough registers for PLN instructions
i965/fs: Make sources and destinations interfere in 16-wide
i965/fs: Properly handle register widths in CSE
i965/fs: Properly handle register widths in register_coalesce
i965/fs: Properly handle widths in copy propagation
i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD
i965/fs: Properly handle register widths and odd register sizes in spilling
i965/fs: Don't waste a register on texture lookups for gen >= 7
Previously, we were waisting a register in SIMD16 mode because we could
only allocate registers in pairs. Now that we can allocate and address
odd-sized registers, let's get rid of this special-case.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every register in i965 assembly implicitly has a concept of a "width".
Usually, this is derived from the execution size of the instruction.
However, when writing a compiler it turns out that it is frequently a
useful to have the width explicitly in the register and derive the
execution size of the instruction from the widths of the registers used in
it.
This commit adds a width field to fs_reg along with an effective_width()
helper function. The effective_width() function tells you how wide the
register effectively is when used in an instruction. For example, uniform
values have width 1 since the data is not actually repeated, but when used
in an instruction they take on the width of the instruction. However, for
some instructions (LOAD_PAYLOAD being the notable exception), the width is
not the same.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Just pass the visitor into is_copy_payload() and is_coalesce_candidate()
instead of a register size and the virtual_grf_sizes array. Among other
things, this makes the code more obvious because you don't have to figure
out where src_size came from.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Right now, this function is a no-op but it indicates that we intend to only
use the first half of the 16-wide register.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This commit reworks copy propagation a bit to support propagating the
copying of partial registers. This comes up every time we have pull
constants because we do a pull constant read immediately followed by a move
to splat the one component of the out to 8 or 16-wide. This allows us to
eliminate the copy and simply use the one component of the register.
Shader DB results:
total instructions in shared programs: 5044937 -> 5044428 (-0.01%)
instructions in affected programs: 66112 -> 65603 (-0.77%)
GAINED: 0
LOST: 0
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A switch statement is much easier to read/edit than a big giant or
statement.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This splits emit_fb_writes into two functions: emit_fb_writes and
emit_single_fb_write. This reduces the amount of duplicated code in
emit_fb_writes and makes the register number fiddling less arcane.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Sometimes these show up in LOAD_PAYLOAD instructions and it's nice to be
able to see them.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously we disabled compact_virtual_grfs when dumping optimizations.
The idea here was to make it easier to diff the dumped shader because you
didn't have a sudden renaming. However, sometimes a bug is affected by
compact_virtual_grfs and, when this happens, you want to keep dumping
instructions with compact_virtual_grfs enabled. By turning it into an
optimization pass and dumping it along with the others, we retain the
ability to diff because you can just diff against the compact_virtual_grf
output.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We also set the register width equal to the dispatch width. Right now,
this is effectively a no-op since we don't do anything with it. However,
it will be important once we add an actual width field to fs_reg.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, if an instruction wrote to more than one register, we
implicitly assumed that it filled the entire register. We never hit this
before because the only time we did multi-register writes was things like
texturing which always wrote to all of the registers. However, with the
upcoming ability to do 16-wide instructions in SIMD8 and things of that
nature, we can have multi-register writes at offsets and we'll hit this.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using a floating-point type doesn't usually cause hangs on my HSW, but the
simulator complains about it quite a bit.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have this wonderful offset() function for advancing registers, but we're
not using it. Using offset() allows us to do some sanity checking and
avoid manually touching fs_reg::reg_offset. In a few commits, we will make
offset do even more nifty things for us.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The original vgrf splitting code was written with the assumption that vgrfs
came in two types: those that can be split into single registers and those
that can't be split at all It was very conservative and bailed as soon as
more than one element of a register was read or written. This won't work
once we start allowing a regular MOV or ADD operation to operate on
multiple registers. This rewrite allows for the case where a vgrf of size
5 may appropriately be split in to one register of size 1 and two registers
of size 2.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Previously, we were generating the fast-clear shader from GLSL. The
problem is that fast clears require that we use a replicated write rather
than a regular write instruction. In order to get this we had a
complicated and somewhat fragile optimization pass that looked for places
where we can use a replicated write and used it. Since replicated writes
have a lot of restrictions, we only ever use them for fast-clear
operations.
This commit replaces the optimization pass with a function that just
generates the shader we want. This is a) less code, b) less fragile than
the optimization pass, and c) generates a more efficient shader.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes GPUVM faults when running the piglit test "getteximage-formats
init-by-rendering" with R600_DEBUG=forcedma on SI.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We are currently only dealing with depth-only or stencil-only resources
here, not with resources having both depth and stencil[0]. In both cases,
the tiling mode index is in the tile_mode field, not in the
stencil_tile_mode field.
[0] Add an assertion for that.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add finalize_vertex_elements() to finalize ilo_ve_state. This fixes a
potential issue with URB entry allocation for VS and move the complexity of
gen6_3DSTATE_VERTEX_ELEMENTS() to the new function.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
The code was kind of mixed up what buffers were getting stored in the case
that a resolve bit was unset (which are set based on the GL state at draw
time) and the buffer wasn't actually bound. In particular, depth-only
rendering would store the color buffer contents, which happen to be
pointing at the depth buffer.
Thanks to clearing out the resolve bits for things we really can't
resolve, now I can drop the safety checks for buffer presence around the
actual stores.
Fixes 42 piglit tests.
My attempts to clarify the code with _compacted/_uncompacted prefixed
variables apparently failed. Hopefully this is clearer.
In any case, the previous code wasn't clear enough to gcc to let it
optimize division by a power of two into a shift. No problems now.
Also, the previous code (in the ADD case) didn't work on 32-bit x86, due
to complicated set of interactions best summed up as unsigned division
and compiler optimizations.
Tested-by: Mark Janes <mark.a.janes@intel.com>
We need to keep track if a state change other than frag/vert shader
state will trigger us to need a different shader variant, and if
necessary mark the appropriate shader state as dirty. Otherwise we will
forget to re-emit the shader state.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This is for hw that needs to emulate some texture wrap modes (like
CLAMP) with some help from the shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Keep the existing function as a common helper. But this lets us move an
a2xx specific hack out of common code. And the PIPE_TEX_WRAP_CLAMP
emulation will require an a3xx specific hack. So rather than piling on
hacks, split this out.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We just clamp the incoming texture coordinates. This breaks the lambda
calculation, but it gets the piglit tests to pass. This is the same
behavior as in i965.
One spot in the docs says that it's stored at a miplevel just beyond the
last miplevel, which was scary. But really, you just load it as the R
coordinate (which conflicts with cubemaps, but you don't do border
clamping on cubes).
We have to expose them for GL 2.0, but we just always return a value of 0.
We should be advertising 0 query bits instead of 64, but gallium doesn't
have plumbing for that yet. At least this stops the segfaults.
This may reduce register pressure and uniform counts. Drops a bunch of 0
uniform loads on vs-temp-array-mat4-index-col-row-wr.shader_test, which is
failing to register allocate.
It's not passing some of the piglit tests, because it looks like at small
miplevels some contents from surrounding faces are getting filtered in at
the corners. It does get 7 new tests passing.
In the other related fields, "0" refers to the size of the first miplevel,
while this is a field in a slice. The other implicit slices we have
(cubemap layers) don't vary in size compared to the first one.
Almost always, the MOV will get copy propagated out. Even if it doesn't,
it's probably better to just reload the uniform at next use (to reduce
register pressure) rather than try to save instruction count.
I was looking at this because in the presence of texturing (which calls
add_uniform() directly to get the uniform load forced into the
instruction) the c->uniform_contents indices don't match 1:1 with the
temporary qregs.
commit 4ed23fd broke creation of pbuffer surfaces, patch fixes
the failure, noticed when running chrome with '--use-gl=egl'.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
When an instruction's result was consumed by multiple mov.sat
instructions, we would decide that we couldn't move the saturate
modifier because something else was using the result, even though it was
just another mov.sat!
total instructions in shared programs: 4275598 -> 4274842 (-0.02%)
instructions in affected programs: 75634 -> 74878 (-1.00%)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
When we find a mov.sat, we search backwards. We might as well search
everything else backwards as well and potentially look at fewer
instructions.
This change enables the next patch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Get rid of the 'default' case (as suggestied by imirkin) so compiler
warns us about missing caps. Also add some caps that were missing until
now.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
4155d1c7 'st/mesa: drop dependence on API profile in st_init_extensions'
broke freedreno because somehow 'PIPE_CAP_MAX_VIEWPORTS' fell through
the cracks. Resulting that we reported zero viewports. So the state
tracker never bothered to give us any valid viewport!
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This is the only guaranteed way get the patch level for llvm,
since the define cannot always be found in config.h depending
on the version of llvm or the build system used.
CC: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jonathan Gray <jsg@jsg.id.au>
Added back in 2009, with osmesa/GLU in mind. Unlikely to be working
any more since the removal of the static makefiles.
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rather than having double negatives -> disable-opencl, default=no
simply use enabled/disabled. It makes things a bit easier for the
reader and consistent throughout the file.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The location of the egl driver(s) is matter that we should have
never exposed to the user. Currently the dri2 driver is built
into the libEGL loader, with the gallium based one soon to follow.
v2: Fold EGL_DRIVER_INSTALL_DIR within the makefiles. Suggested by Matt.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80615
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It is not used outside the render code. There are also too many details in it
that we do not want other components to access directly.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Remove emit_draw() and ILO_RENDER_DRAW indirections. With all emit functions
being direct now, ilo_render_estimate_size() and more can also be removed.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Add these new high-level functions
ilo_render_get_draw_dynamic_states_len()
ilo_render_emit_draw_dynamic_states()
ilo_render_get_rectlist_dynamic_states_len()
ilo_render_emit_rectlist_dynamic_states()
ilo_render_get_draw_surface_states_len()
ilo_render_emit_draw_surface_states()
for draw and rectlist state emission. They are implemented in the new
ilo_render_dynamic.c and ilo_render_surface.c.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
For all supported query types, we always emit a PIPE_CONTROL. Call
ilo_render_get_flush_len() for simplicity and clarity.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
ilo_render is based on ilo_builder. We should only care if the builder
buffers are invalidated, or if the hardware context is invalidated. Replace
ilo_render_invalidate() with flags by ilo_render_invalidate_builder() and
ilo_render_invalidate_hw().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Both dynamic buffer and surface buffer are state buffers. We should not use
state buffer to refer to the former.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Move members of ilo_3d that still make sense to ilo_context. With ilo_3d
gone, rename functions whose names begin with ilo_3d to something more
appropriate.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
According to GLSL(4.2) and GLSL-ES (1.0, 3.0) spec, Structures must
have the same name to be considered same type. We currently ignore
the name check while checking if two records are same. This patch
fixes this.
Patch fixes failing tests in WebGL conformance test
'shaders-with-uniform-structs' when running Chrome on OpenGL ES.
v2: Do not force name comparison with unnamed types (Tapani)
v3: Cleanups (Matt)
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83934
The draw module would still try to use gallivm, causing many piglit tests
to fail with an assertion failure. llvmpipe might have been similarly
affected.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
mesa_texstore expects pixel data, not compressed data. For compressed
textures, we want to just copy the bits in without any conversion.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Francisco Jerez <currojerez@riseup.net>
What happens is that a SPLIT operation is part of the spill node, and as
a pseudo op, the instruction gets erased after processing its first def.
However the later defs still need to refer to it, so instead delay
deleting until after that whole RA node is done processing.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79462
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
These are pretty catastrophic, "should never happen" failure paths (though
4 tests in piglit hit them currently, due to a single bug). An abort()
that you can gdb on easily is probably more useful than a clean exit,
particularly since a bug in piglit framework right now is causing early
exit(1)s to simply not be recorded in the results at all.
I only made IS_NEGATIVE(x) use signbit in commit 0f3ba405 in an attempt
to fix 54805, but it didn't help. We didn't use signbit on some
platforms and instead defined it to x < 0.0f.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Was only tracked to be printed at the end of configure, but configure
quits if it can't build something we requested, rather than silently
dropping it, so printing these directories has little use.
Note that I had to add support for testing the packed attribute to
m4/ax_gcc_func_attribute.m4.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [C bits]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Presumbly this will let clang and other compilers use the built-ins as
well.
Notice two changes specifically:
- in _mesa_next_pow_two_64(), always use __builtin_clzll and add a
static assertion that this is safe.
- in macros.h, remove the clang-specific definition since it should
be able to detect __builtin_unreachable in configure.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [C bits]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The spec says the type must be W (JIP is 16-bits after all), but we've
been emitting it with a UD type all along and have experienced no
adverse effects. Changing the type to D allows ELSE and ENDIF
instructions to be compacted.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We're currently emitting compactable control flow instruction the wrong
types, preventing their compaction. The next patch will fix this and
actually enable compaction.
On chips that cannot compact control flow instructions, attempts to find
a match in the datatype table will fail.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The array was previously indexed in units of brw_compact_inst (8-bytes),
but before compaction all instructions are uncompacted, so every odd
element was unused.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Used to pass over previously compacted instructions in this loop, but no
longer. No point in checking.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It may be possible to create a contrived example in which a 3-src
instruction would have been compacted on Gen < 8. I'd rather not
discover it in the wild.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently a no-op, since instruction compaction isn't implemented for the
generations that have a programmable strips-and-fans unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Despite what the Sandybridge PRM says, ENDIF has Jump Count in <dst>,
not JIP in <src1>. (The same mistake appears about WHILE as well).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Both sizes are VERT_ATTRIB_MAX, so this has no effect. But it drops a
few trivial uses of the derived state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm not familiar with this code, but this sure appears to be a typo.
It looks like the intent is to set each array element, not arrays[0]
each time. Notably, the loop just below uses "array", not "arrays".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
The code in get.c that handles this uses ctx->Array.VAO->VertexAttrib,
which is a gl_vertex_attrib_array structure, not a gl_client_array.
The offsets of all fields happened to be the same in both structures, at
least on x86_64. "Size," "Type," and "Stride" are obviously the same:
both structures start with the same fields, in the same order.
"Enabled" is dicier: there are different fields before it in both
structures, including pointer sized values which might need special
alignment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
Dead since the _MaxElement removal, but these functions seemed generally
applicable, so I decided to remove them in a separate patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
max_index was coming from either the user telling us as part of
glDrawRangeElements, or from an incidental calculation as part of some
sort of primitive conversion fallback. Sometimes, it was just set to the
default "I don't know" ~0 value.
If it wasn't set to the actual max index, then the kernel would reject the
draw call for allowing out-of-bounds VBO reads. So, compute the max index
from the sizes of the VBOs, which isn't too expensive (unlike mapping and
reading the index buffer) and is reliable.
Fixes piglit vao-element-array-buffer.
Still some open questions.. and at any rate, no additional piglit passes
due to various wrap modes that we need to emulate in at least some
cases :-(
But it does fix some mystery page-faults.. So add some comments in the
code where there are things that we need to emulate or do more r/e, and
push as-is.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Instead of separate color and Z/S writemasks, just have one writemask
parameter that takes a mask of the PIPE_MASK_[RGBAZS] flags.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
PIPE_TEX_MIPFILTER_x is not legal for the pipe_sampler_state::
min/mag_img_filter fields. But PIPE_TEX_MIPFILTER_x == PIPE_TEX_FILTER_x
so we were getting lucky.
This also makes the code consistent with u_blitter.c.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The register coalescing portion of this patch hurts three shaders in
Guacamelee by one instruction each, but examining the diff makes me
believe that what we were generating was (perhaps harmlessly) incorrect.
When the instructions aren't in a flat list, this wouldn't have worked.
Also, this should be faster.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The only trick is changing a break into a return true in register
coalescing, since the macro is actually a double loop, and break will do
something different than you expect. (Wish I'd realized that earlier!)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that nothing invalidates the CFG, we can calculate_cfg() immediately
after emit_fb_writes()/emit_thread_end() and never again.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This cleans up the debug flags to be consistently indented, use bit
shifting instead of hex-values and fixes a bug where the new DEBUG_NO8 flag
used the same value as the DEBUG_VUE flag. This was hidden by the numbers not
being aligned. Also removes gaps in the range where DEBUG_IOCTL (0x4) and
DEBUG_REGION (0x400) used to be.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
LLVM commit r218316 removes the JITMemoryManager class, which is
the parent for a seemingly important class in gallivm. In order to
fix the build, I've wrapped most of lp_bld_misc.cpp in
if HAVE_LLVM < 0x0306 and modifyed the
lp_build_create_jit_compiler_for_module() function to return false
for 3.6 and newer which effectively disables the gallivm functionality.
I realize this is overkill, but I could not come up with a simple
solution to fix the build. Also, since 3.6 will be the first release
without the old JIT, it would be really great if we could
move gallivm to use the C API only for accessing MCJIT. There
is still time before the 3.6 release to extend the C API in
case it is missing some functionality that is required by gallivm.
This fixes heap corruption. The sampler view can be bound in the context,
so we cannot call destroy directly.
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead, pass the layout of GS inputs in memory to the ES using the shader
key. Only 64 bits are needed to represent the layout in the key.
Mixing and matching different VS and GS shaders should now always work.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I will need this for fixing sample shading with 1 sample.
The good news is that all shader pm4 states no longer use the current context
state, so we can generate the pm4 states outside of draw_vbo if needed.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Generic varyings in TGSI were based on the value of VARYING_SLOT_TEX0, so VAR0
was always GENERIC[22] (with tessellation patches). Some drivers might not
be able to cope with that.
This commit defines a proper mapping, so that PNTC is GENERIC[8] and VAR0 is
GENERIC[9].
Reviewed-by: Brian Paul <brianp@vmware.com>
The extensions and limits being set in the conditional block are core-only
anyway and don't have any effect on other profiles.
Reviewed-by: Brian Paul <brianp@vmware.com>
E.g. the 4.0 compatibility profile can be forced with:
MESA_GL_VERSION_OVERRIDE=4.0COMPAT
Some tests that I have require 4.0 compatibility.
Reviewed-by: Brian Paul <brianp@vmware.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
The respective HAVE_{SOFT,LLVM}PIPE are already descriptive
enough. Additionally the svga modules does not really use either
one, but the auxiliary draw & gallivm modules.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
The dri, vdpau, omx, xvmc and gbm targets don't need any authentication
even the VL ones never used it. Either the respective loader or the
library itself (vl) is doing its auth prior to calling create_screen()
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Make sure that MEGADRIVERS is set in order to create the hardlinks.
The variable name is not the most appropriate and will be sorted
out in upcoming commits.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
There are only 32 bits in the flatshade flags (which are 1 bit per
component), the simulator crashes when you use more than about this many
varyings, and the original Broadcom code drop only exposed 8 as well.
Fixes 26 piglit tests in the varying-packing group, and makes many others
go from crash to fail (due to not checking their varying counts and
treating link failures as failures). Regresses ARB_fp/minmax (due to 8
varyings instead of 10).
This is just the GL 1.1 flat shading of colors -- we don't need to support
TGSI constant interpolation bits, because we don't do GLSL 1.30.
Fixes 7 piglit tests.
They still provide register pressure since I haven't made a special class
for them, but since they're only live for one instruction it probably
doesn't matter.
This improves the readability of QPU assembly.
The A-file unpack is just like R4 unpack, except that if you don't do a
floating-point operation it won't do float conversion (so int16 gets
scaled up to int32).
This will let me more reliably allocate a-file registers, which are going
to be even more in demand when I start using a-file unpacks.
Also fixes a bug where the reservation of payload registers (FRAG_Z/W) was
off by one but just caused failure to register allocate at all if the
off-by-one was fixed.
The r300 gallium driver is using it outside of the Mesa tree, and I wanted
to do so for vc4 as well. Rather than make the multiple-definitions
problem even more complicated, just move it to more-shared code.
v2: Don't forget to delete the symlink in r300 (review by Matt).
Delete more r300-helper references (review by Emil)
Don't prefix util/ header inclusion with "util/" (review by Emil)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v1)
ffeb77c7b0 had a typo which turned all signed
integer divisions into unsigned ones. Oops.
This gets us back the 51 little piglits
(all from glsl built-in-functions, fs/vs/gs-op-div-int-ivec2 and similar).
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If _mesa_get_tex_image() return NULL there is already error
set in context. Other error pats free allocated texture.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Check if _mesa_meta_bind_rb_as_tex_image() did give the texture.
If no texture was given there is already either
GL_INVALID_VALUE or GL_OUT_OF_MEMORY error set in context.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Running fast clear glClear with SNB caused Valgrind to
complain about this.
v2: line 237 fixed glClear from leaking memory, other
strdups are also now changed to ralloc_strdups but I
don't know what effect those have. At least no changes in
my Piglit quick run.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add current_pipe_control_dw1 and deferred_pipe_control_dw1 to track what have
been done since lsat 3DPRIMITIVE and what need to be done before next
3DPRIMITIVE. Based on them, we can emit WAs more smartly.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It was used to set has_gen6_wa_pipe_control to false when the batch buffer
changed. When called from emit_flush() and others, it also unset
ILO_3D_PIPELINE_INVALIDATE_BATCH_BO so that the following emit_draw() will not
set has_gen6_wa_pipe_control to false again. It sounded error-prone and was
just ugly.
We should be able to achieve the same goal by reset has_gen6_wa_pipe_control
in ilo_3d_pipeline_invalidate(). With handle_invalid_batch_bo() gone, the
emit functions can also be inlined.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
We do not need to call it from GEN7 pipeline anymore since software
PIPE_QUERY_PRIMITIVES_EMITTED is gone.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Previously we would get a potentially computed post-swizzle coord based
on the texture target info, which would not include the bias/lod in the
last argument.
The second argument does not have to be adjacent, so adjusting the order
array did not make sense.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This will make life a lot easier as we add support for additional
instructions.
v2: shadow reference value is always .z or .w
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The only place the enum pipe_type was used is for the TGSI sampler
view return type. So make it a TGSI type. Note: it appears this
part of TGSI isn't used by anyone so it may be removed in the future.
v2: the new name is tgsi_return_type, not tgsi_type. This means we
can drop the previously posted tgsi_type -> tgsi_opcode_type patch.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Called when the user can insert new decls, instructions.
This could be used in a few places in the 'draw' module.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Instead we store a void pointer to the key, and cast it to
brw_wm_prog_key for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This helps:
1. Reduce the need to have fs_visitor::key's type be brw_wm_prog_key*
2. Align the code to allow brw_sampler_prog_key_data to be pulled out of other
prog_key types for different stages.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Instead we store a brw_stage_prog_data pointer, and cast it to
brw_wm_prog_data for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Conditional rendering is not limited to draw_vbo(). Move the support to
ilo_context, and replace ilo_3d_pass_render_condition() by
ilo_skip_rendering().
SOL_RESET happens before bo execution. It should not be observed by the
commands that are already in the bo.
Move the code out of the pipeline now that it submits.
And config query and DRM_CONF_SHARE_FD to both mega-driver and
traditional build configs, so that EGL_EXT_image_dma_buf_import
works.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We used different lists for different types of queries because we wanted to
update software queries quickly. Now that there is no software queries, we
are fine with a single list.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Read PIPE_QUERY_PRIMITIVES_GENERATED and PIPE_QUERY_PRIMITIVES_EMITTED from
hardware registers. Because all queries now have a bo, remove unnecessary
checks for q->bo.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Add support for PIPE_QUERY_PRIMITIVES_GENERATED and
PIPE_QUERY_PRIMITIVES_EMITTED in ilo_3d_pipeline_emit_query().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It replaces
ilo_3d_pipeline_emit_write_timestamp(),
ilo_3d_pipeline_emit_write_depth_count(), and
ilo_3d_pipeline_emit_write_statistics().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Make it own()'s responsibility to make room for release() and itself. To be
able to do that, allow ilo_cp_submit() in own().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Move pipe states in ilo_context to the new ilo_state_vector. The motivation
is that ilo_context consists of several loosely related things. When we need
an ilo_context somewhere, we usually need only one or two of the things in it.
This change makes ilo_state_vector one such thing.
An immediate result is that we no longer need ilo_context in 3D pipelines,
something we have planned for since early days.
Tested on my snb-gt2:
4 tests skip->pass in spec/EXT_texture_array
51 tests skip->pass in spec.glsl-3.30
4 tests skip->pass in spec/!OpenGL 3.3
No regressions; no skip->fail changes.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Some users don't understand that these variables can break OpenGL.
The general is rule is that if an app supports MSAA, you mustn't use
GALLIUM_MSAA.
For example, if an app has an 8xMSAA FBO and GALLIUM_MSAA=4
is set, resolving the FBO to the back buffer will be rejected which will look
like this on all gallium drivers:
http://www.phoronix.com/scan.php?page=article&item=amd_radeonsi_msaa
The environment variables also have no effect on modern apps like TF2, but
there is still a performance hit due to wasted bandwidth and VRAM.
In a nutshell, it does more harm than good.
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
These instructions only have vex encodings, thus they can't be used without
avx. (Technically, one can still use avx-128 if avx isn't available because
the environment doesn't store the ymm registers, however I don't think llvm
can.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Geometry shaders was the only thing we needed to enable GLSL 1.50 and
OpenGL 3.2 in gen6.
v2: Layered clears do not work properly in gen6 with OpenGL 3.2. Kenneth
and Jordan realized that for this to work we also need
GL_AMD_vertex_shader_layered (which requires OpenGL 3.2, so it could not be
enabled before this patch), so we agreed to enable this together with
OpenGL 3.2 in this patch.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
In gen6 we will use the geometry shader implementation from gen6_gs_visitor.cpp
and keep the implementation in brw_vec4_gs_visitor.cpp for gen7+. Notice that
gen6_gs_visitor inherits from brw_vec4_gs_visitor so it is not a completely
seprate implementation of geometry shaders.
Also, gen6 does not support multiple dispatch modes, its default operation mode
is equivalent to gen7's SINGLE mode, so select that in gen6 for consistency.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Uniforms declared as uniform blocks are stored in ubo surfaces and need to
be pulled from the geometry shader program so make sure we upload them first
and do the same for pull constants.
This fixes all piglit tests that use uniform blocks:
bin/shader_runner tests/spec/glsl-1.50/uniform_buffer/gs-*
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For gen6 geometry shaders we use the first BRW_MAX_SOL_BINDINGS entries of the
binding table for transform feedback surfaces. However, vec4_visitor will
setup the binding table so that textures use the same space in the binding
table. This is done when calling assign_common_binding_table_offsets(0) as
part if its run() method.
To fix this clash we add a virtual method to the vec4_visitor hierarchy to
assign the binding table offsets, so that we can change this behavior
specifically for gen6 geometry shaders by mapping textures right after the
first BRW_MAX_SOL_BINDINGS entries.
Also, when there is no user-provided geometry shader, we only need to upload
the binding table if we have transform feedback, however, in the case of a
user-provided geometry shader, we can't only look into transform feedback
to make that decision.
This fixes multiple piglit tests for textureSize() and texelFetch() when these
functions are called from a geometry shader in gen6, like these:
bin/textureSize gs sampler2D -fbo -auto
bin/texelFetch gs usampler2D -fbo -auto
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Currently we buffer transform feedack varyings separately. This patch makes
it so that we reuse the values we have already buffered for all the output
varyings of the geometry shader instead.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since geometry shaders can alter the value of varyings packed in the first
output VUE slot (PSIZ), we need to buffer it together with all the other
vertex data so we can emit the right value for each vertex when we do the
URB writes.
This fixes the following piglit test in gen6:
tests/spec/glsl-1.50/execution/redeclare-pervertex-out-subset-gs.shader_test
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Update gen6_gs_binding_table and gen6_sol_surface to use user-provided
geometry program information when present. This is necessary to implement
transform feedback support.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This takes care of generating code required to handle transform feedback.
Notice that transform feedback isn't enabled yet, since that requires
additional setups in other parts of the code that will come in later patches.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We will use this parameter in later patches to provide information relevant
to transform feedback that needs to be set as part of the FF_SYNC message.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This opcode generates code to copy the specified destination index
into subregister 5 of the MRF message header.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This opcode will be used when sending SVB WRITE messages to save
transform feedback outputs into Streamed Vertex Buffers.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
So far in gen6 we only used geometry shaders to implement transform feedback
in vertex shaders, so we assumed that the VUE map for the geometry shader
stage was always the same as for the vertex shader stage. This is no longer
true now that we support user provided geometry shaders in gen6 too.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For this we will need to move PrimitiveID information, delivered in the thread
payload in r0.1, to a separate register (we use GS_OPCODE_SET_PRIMITIVE_ID
for this), then map the corresponding varying slot to that register in the
setup_payload() method.
Notice that we cannot use a virtual register as the destination for the
PrimitiveID because we need to map all input attributes to hardware registers
in setup_payload(), which happens before virtual registers are mapped to
hardware registers. We could work around that issue if we were able to compute
the first non-payload register in emit_prolog() and move the PrimitiveID
information to that register, but we can't because at that point we still
don't know the final number uniforms that will be included in the payload.
So, what we do is to place PrimitiveID information in r1, which is always
delivered as part of the payload but its only populated with data
relevant for transform feedback when we set GEN6_GS_SVBI_PAYLOAD_ENABLE
in the 3DSTATE_GS state packet.
When we implement transform feedback, we wil make sure to move the value of r1
to another register before we overwrite it with the PrimitiveID.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In gen6 the geometry shader payload includes the PrimitiveID information in
r0.1. When the shader code uses glPimitiveIdIn we will have to move this to
a separate hardware register where we can map this attribute. This opcode
takes the selected destination register and moves r0.1 there.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In gen6 we need to end the thread differently depending on whether we have
emitted at least one vertex or not. In case we did, the EOT message must
always include the COMPLETE flag or else the GPU hangs. If we have not
produced any output, however, we can't use the COMPLETE flag.
This would lead us to end the program with an ENDIF opcode, which we want
to avoid (and actually is not permitted since it hits an assertion), so
instead what we do is that we always request a new VUE handle every time we do
an URB WRITE, even for the last vertex we emit. With this we make sure that
whether we have emitted at least one vertex or none at all we have to finish the
thread without writing to the URB, which works for both cases by setting the
COMPLETE and UNUSED flags in the EOT message.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Just in case the GS algorithm does not call EndPrimitive() for the last
primitive produced. This is relevant only for non point outputs, since for
this we are already setting the PrimEnd flag on each vertex we emit.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Geometry shaders in gen6 are significantly different from gen7+ so it is better
to have them implemented in a different file rather than adding gen6 branching
paths all over brw_vec4_gs_visitor.cpp.
This commit adds an initial implementation that only handles point output, which
is the simplest case.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In gen7+ we emit vertices as they come, however in gen6 geometry shaders we
have to buffer vertex data for all vertices and then emit it all in one go
at the end. To achieve this we need to generalize emit_urb_slot() to store
vertex data in general purpose registers and not only MRF registers.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We had GS_OPCODE_SET_DWORD_2_IMMED but this required its source argument to be
an immediate. In gen6 we need to set dword 2 of the URB write message header
from values stored in separate register, so we need something more flexible.
This change replaces GS_OPCODE_SET_DWORD_2_IMMED with GS_OPCODE_SET_DWORD_2.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Gen6 seems to require that EOT messages include the complete flag too or else
the GPU hangs. We add will this flag to the instruction when we emit the
thread end opcode.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Gen6 geometry shaders need to allocate URB handles for each new vertex they
emit after the first (the URB handle for the first vertex is obtained via the
FF_SYNC message).
This opcode adds the URB allocation mechanism to regular URB writes.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This implements the FF_SYNC message required in gen6 geometry shaders to
get the initial URB handle.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is needed to support user-provided geometry shaders, since the
brw_ff_gs_prog atom in gen6 only takes care of implementing transform feedback
for vertex shaders.
If there is no user-provided geometry shader the implementation falls back to
the original code.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, gen6 only uses geometry shaders for transform feedback so the state
we emit is not suitable to accomodate general purpose, user-provided geometry
shaders. This patch paves the way to add these support and the needed
3DSTATE_GS packet modifications for it.
Previous code that emitted state to implement transform feedback in gen6 goes
to upload_gs_state_adhoc_tf().
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, when a geometry shader can't use dual object mode we fall back to
dual instance mode, however, when invocations == 1, single dispatch mode is
more performant and equally efficient in terms of register pressure.
Single dispatch mode requires that the driver can handle interleaving of
input registers, but this is already supported (dual instance mode has
the same requirement). However, to take full advantage of single dispatch mode
to reduce register pressure we would also need the ability to store two
separate vec4 output values into vec8 registers, which would approximately
double our capacity to store temporary values, but currently the vec4 visitor
and generator classes do not support this, so at the moment register pressure
in single and dual instance modes is the same.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It has been a bad name since we added the builder. Rename it to
ILO_DEBUG=batch to match i965, and call ilo_builder_decode() from
ilo_cp_submit_internal().
"Flush" is used for too many things already: pipe resource flush, pipe context
flush, pipe transfer region flush, and hardware pipeline flush. Rename it to
ilo_cp_submit(). As such, ILO_DEBUG=flush is renamed to ILO_DEBUG=submit.
Fredrik's implementation of ARB_vertex_attrib_binding introduced new
gl_vertex_attrib_array and gl_vertex_buffer_binding structures, and
converted Mesa's older gl_client_array to be derived state. Ultimately,
we'd like to drop gl_client_array and use those structures directly.
One hitch is that gl_client_array::_MaxElement doesn't correspond to
either structure (unlike every other field), so we'd have to figure out
where to store it. The _MaxElement computation uses values from both
structures, so it doesn't really belong in either place. We could put
it in the VAO, but we'd have to pass it around everywhere.
It turns out that it's only used when ctx->Const.CheckArrayBounds is
set, which is only set by the (rarely used) classic swrast driver.
It appears that drivers/x11 used to set it as well, which was intended
to avoid segmentation faults on out-of-bounds memory access in the X
server (probably for indirect GLX clients). However, ajax deleted that
code in 2010 (commit 1ccef926be).
The bounds checking apparently doesn't actually work, either. Non-VBO
attributes arbitrarily set _MaxElement to 2 * 1000 * 1000 * 1000.
vbo_save_draw and vbo_exec_draw remark /* ??? */ when setting it, and
the i965 code contains a comment noting that _MaxElement is often bogus.
Given that the code is complex, rarely used, and dubiously functional,
it doesn't seem worth maintaining going forward. This patch drops it.
This will probably mean the classic swrast driver may begin crashing on
out of bounds vertex buffer access in some cases, but I believe that is
allowed by OpenGL (and probably happened for non-VBO accesses anyway).
There do not appear to be any Piglit regressions, either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Roland Scheidegger <sroland@vmware.com>
While depth test state is passed through the fragment shader as sideband,
data, the stencil test state has to be set by the fragment shader itself.
Many tests are still failing, but this gets most of hiz/ passing.
The SWZ instruction can have swizzle terms >4 (SWIZZLE_ZERO, SWIZZLE_ONE).
These swizzle terms caused a few assertions to fail.
This started happening after the commit "mesa: Actually use the Mesa IR
optimizer for ARB programs." when replaying some apitrace files.
A new piglit test (tests/asmparsertest/shaders/ARBfp1.0/swz-08.txt)
exercises this.
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This allows for introducing dead code eliminating of uniforms, copy
propagation of uniforms, and instruction rescheduling between instructions
that both read uniforms.
This is particularly important for outputs, where we try to MOV the whole
vec4 to the VPM, even if only 1-3 components had been set up. It might
also be important for temporaries, if the shader reads components before
writing them.
While the result of signed integer division by zero is undefined by glsl
(and doesn't exist with d3d10), we must not crash, so need to make sure we
don't get sigfpe much like udiv already does.
Unlike udiv where we return 0xffffffff (as required by d3d10) there is
no requirement right now to return anything specific so we use zero.
sample opcodes don't have valid texture target information (and I don't think
this should be changed), however it would be nice if we had that information
ready elsewhere, so stuff that information into the tgsi info when analyzing
a shader.
v2: Ilja Mirkin spotted some bugs wrt not handling msaa resources. So add them
and while there also add them to the tex opcode analysis this was cloned from
as well (plus get rid of some bug not detecting indirect textures there in some
cases too).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
MESA_FORMAT_x8y8z8w8 puts the x channel in the least significant part of
the containing 32-bit integer, which is equivalent to PIPE_FORMAT_xyzw8888.
PIPE_FORMAT_x8y8z8w8 puts the x channel first in memory.
This patch fixes up the mesa<->gallium mapping accordingly.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
MESA_FORMAT_LnAn puts the luminance in the least significant part of
the containing integer, which is equivalent to PIPE_FORMAT_LAnn.
PIPE_FORMAT_LnAn puts the luminance first in memory.
This patch fixes up the mesa<->gallium mapping accordingly.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This means that each 8888 SRGB format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
v2: fix missing i965 additions. (Jason)
fix 127->255 max alpha for SRGB formats. (Jason)
v1: Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The associated UNORM format already existed.
This means that each LnAn format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
[airlied: rebased onto current master]
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
...i.e. formats in which the first listed component is in the least
significant byte of the integer. The corresponding UNORM aliases already exist.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This means that each RnGnBnxn format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
The associated UNORM and SRGB formats already exist.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
...i.e. formats in which the alpha or green channel is first in memory.
This means that each LnAn and RnGn format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason pointed out the bug on review adding new formats,
but the existing format also appears to have the bug, so
use 255 as the max, these are SRGB no SNORM.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Luminance is the least-significant byte of the uint16, rather than the
lowest byte in memory. Other parts of mesa already handle this correctly
for big-endian, and swrast already handles other MESA_FORMAT_x8y8 formats
correctly. This case was just an odd-one-out.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
MESA_FORMAT_R8G8B8X8_SNORM used a function called unpack_X8B8G8R8_SNORM
while MESA_FORMAT_R8G8B8X8_SRGB used a function called unpack_R8G8B8X8_SRGB.
This patch renames the SNORM function to have the same order as the
MESA_FORMAT name, like the SRGB function does.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was being shared using a ../../ get out of gallium into
mesa, and I swore when I did it I'd fix things when we got a util
dir, we did, so I have.
v2: move RGTC_DEBUG define
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This gets a ton of piglit working that crashes in waffle context
management stuff otherwise. Actually supporting mismatched FB sizes is at
best going to require some more load/store generals for color buffers, but
if I can't manage to do that I'll want to just have state_tracker reject
those FBOs as unsupported, rather than deny GL 2.1.
Previously, we would get a trailing ', ' which looked strange.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The goal here is to have an argument for the depth write opcode so that I
can do computed depth. In the process, this makes the calculations that
will be emitted more obvious in the QIR.
Commit afe3d1556f (i965: Stop doing
remapping of "special" regs.) stopped remapping delta_x/delta_y, and
additionally stopped considering them always-live. We later realized
delta_x was used in register allocaiton, so we actually needed to remap
it, which was fixed in commit 23d782067a
(i965/fs: Keep track of the register that hold delta_x/delta_y.).
However, that commit didn't restore the "always consider it live" part.
If all the code using delta_x was eliminated, fs_visitor::delta_x would
be left pointing at its old register number. Later code in register
allocation would handle that register number specially...even though it
wasn't actually delta_x.
To combat this, set delta_x/y to BAD_FILE if they're eliminated, and
check for that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83127
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
The triangle_32_ rast functions never made it into the debug output,
confused me for a few seconds.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch builds on 6c8f547f66 and
previous patches by allowing u_format.csv to specify separate big-endian
and little-endian layouts. It then uses this to specify the correct layouts
for various depth/stencil formats. Later patches handle other formats.
To recap, the idea is that u_format.csv lists the channels for an N-byte
value as though it were an N-byte integer. For little-endian targets
the channels are listed starting at the least-significant bit of the
integer while for big-endian targets the channels are listed starting
at the most-significant bit. This means that for something like
PIPE_FORMAT_B8G8R8A8_UNORM (blue in first byte of memory, alpha in last
byte of memory) the orders are the same for both endiannesses. But for
something like PIPE_FORMAT_S8_UINT_Z24_UNORM, where the stencil is in
the least significant byte of a 32-bit integer, there need to be separate
channel definitions for each endianness.
The effect of this patch is to make the affected PIPE_FORMAT_*s have
the same layout as the associated MESA_FORMAT_*s for big-endian.
The MESA_FORMAT_*s are already handled correctly.
Fixes various piglit tests on z. No regressions on x86_64.
[airlied: squash subsequent patches]
util: Add big-endian layout for 5551 and 565 formats
util: Add big-endian layout for 10/10/10/2 formats
util: Add big-endian layout for 4444 formats
util: Add big-endian layout for 233 format
util: Add big-endian layout for 44 formats
Signed-off-by: Dave Airlie <airlied@redhat.com>
llvmpipe treats PIPE_FORMAT_Z32_FLOAT_S8X24_UINT as a bit of a special case,
handling it as two 32-bit pieces rather than a single 64-bit block:
/* 64bit d/s format is special already extracted 32 bits */
total_bits = format_desc->block.bits > 32 ? 32 : format_desc->block.bits;
The format_desc describes the whole 64-bit block, so the z shift
will be 32 for big-endian. But since we're accessing the z channel
as a 32-bit value rather than a 64-bit value, we need to mask the shift
with 31.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just a very limited version, in particular sampler and sampler view
index must be the same. It cannot handle any modifiers neither.
Works much the same as soa version otherwise, to figure out the target we
need to store the sampler view dcls.
While here, also handle (no-op) RET and get rid of a couple bogus deprecated
comments.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
sample opcodes are a little oddly represented in the opcode_info, since
they don't count as texture instructions - they don't have valid target
information, but they may have offsets (unlike "ordinary" texture
instructions, the texture token may be optional for them).
So just make sure with these opcodes the optional offsets are accepted.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
sample opcodes don't encode a texture target, it would thus always
print UNKNOWN, which is not helpful (and wouldn't parse when giving
back the shader text to tgsi).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We don't have any specific limits in the hardware, just like the other
GPUs, so match their behavior. Fixes minmax_gles2 and several other
piglit tests relying on the specced uniform minmax values.
Put macro code in do {} while loop and put semicolons on macro calls
so auto indentation works properly.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This reduces gcc -O3 compile time to 1/4 of what it was on my system.
Reduces MSVC release build time too.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
With GLES we don't give any kind of warning in case we don't
write to gl_position. This patch makes changes so that we
generate a warning in case of GLES (VER < 300) and an error
in case of GL.
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Remap table for uniforms may contain empty entries when using explicit
uniform locations. If no active/inactive variable exists with given
location, remap table contains NULL.
v2: move remap table bounds check before existence check (Ian Romanick)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Erik Faye-Lund <kusmabite@gmail.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83574
We might run into ve->count == 0 and last_velement_edgeflag == true in
gen6_3DSTATE_VERTEX_ELEMENTS() when the state tracker sets an invalid
combination of VS and VE (does not seem to happen with st/mesa). Do not
assume ve->count is positive when last_velement_edgeflag is true.
Reported by Coverity.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Rename ilo_builder_batch_state_base_address() to gen6_state_base_address() for
consistency and remove unused gen6_STATE_BASE_ADDRESS(). Reorder the code in
gen6_PIPE_CONTROL() a bit. Finally, some mostly cosmetic changes.
Move functions for the 3D pipeline to the new headers. We artificially split
the functions into top (vertex processing) and bottom (pixel processing), to
keep the headers at reasonable sizes.
ir_rvalue::constant_expression_value() recursively walks down an IR
tree, attempting to reduce it to a single constant value. This is
useful when you want to know whether a variable has a constant
expression value at all, and if so, what it is.
The constant folding optimization pass attempts to replace rvalues with
their constant expression value from the bottom up. That way, we can
optimize subexpressions, and ideally stop as soon as we find a
non-constant subexpression.
In order to obtain the actual value of an expression, the optimization
pass calls constant_expression_value(). But it should only do so if it
knows the value can be combined into a constant. Otherwise, at each
step of walking back up the tree, it will walk down the tree again, only
to discover what it already knew: it isn't constant.
We properly avoided this call for ir_expression nodes, but not for
ir_swizzle nodes. This patch fixes that, drastically reducing compile
times on certain shaders where tree grafting has given us huge
expression trees. It also fixes SuperTuxKart.
Thanks to Iago and Mike for help in tracking this down.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78468
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
The FS backend has always used 0, and the VS backend has always used 1.
I think 1 is just working around other problems, and is incorrect.
Samplers are baked in; nothing uses the UNIFORM register we would
create, and we shouldn't upload any constant values for them.
Fixes ES3-CTS.shaders.struct.uniform.sampler_array_vertex.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Samplers take up zero slots and therefore don't exist in the params
array, nor are they included in stage_prog_data->nr_params. There's no
need to store their size in param_size, as it's only used for dealing
with arrays of "real" uniforms (ones uploaded as shader constants).
We run into all kinds of problems trying to refer to the uniform storage
for variables that don't have uniform storage. For one, we may use some
other variable's index, or access out of bounds in arrays. In the FS
backend, our extra 2 * MaxSamplerImageUnits params for texture rectangle
rescaling paper over a lot of problems. In the VS backend, we claim
samplers take up a slot, which also papers over problems.
Instead, just skip allocating storage for variables that don't have any.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
We always uploaded them together, mostly out of laziness - both required
an additional vertex element. However, gl_VertexID now also requires an
additional vertex buffer for storing gl_BaseVertex; for non-indirect
draws this also means uploading (a small amount of) data. This is extra
overhead we don't need if the shader only uses gl_InstanceID.
In particular, our clear shaders currently use gl_InstanceID for doing
layered clears, but don't need gl_VertexID.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
In the non-indirect draw case, we call intel_upload_data to upload
gl_BaseVertex. It makes brw->draw.draw_params_bo point to the upload
buffer, and increments the upload BO reference count.
So, we need to unreference it when making brw->draw.draw_params_bo point
at something else, or else we'll retain a reference to stale upload
buffers and hold on to them forever.
This also means that the indirect case should increment the reference
count on the indirect draw buffer when making brw->draw.draw_params_bo
point at it. That way, both paths increment the reference count, so
we can safely unreference it every time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
4f338c9b introduced logic to trigger a flush rather than overflowing
cmdstream buffer. But the threshold was too low, triggering flushes
where they were not needed. This caused problems with games like
xonotic.
Part of the problem is that we need to mark all state dirty between
cmdstream submit ioctls, because we cannot rely on state being
preserved across ioctls. But even with that, there are still some
problems that are still being debugged. For now:
1) correctly mark all state dirty
2) introduce FD_MESA_DEBUG flush flag to force rendering to be flushed
between each draw, to trigger problems (so that I can debug)
3) use a more reasonable threshold so for normal usecases we don't
trigger the problems
This at least corrects the regression, but there is still more debugging
to do.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need the .w component to end up in .x, since the hw appears to fetch
gl_FragColor starting with the .x coordinate regardless of MRT format.
As long as we are doing this, we might as well throw out the remaining
unneeded components.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Because of render-to-alpha (000x) shenanigans, freedreno needs to do
some special handling when rendering to alpha-only formats. And I
noticed that while we had _is_luminance(), _is_intensity(), etc, an
_is_alpha() helper was missing. So fix that.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Unreference the ctx->_Shader object before we delete all the pipeline
objects in the hash table. Before, ctx->_Shader could point to freed
memory when _mesa_reference_pipeline_object(ctx, &ctx->_Shader, NULL)
was called.
Fixes crash when exiting the piglit rendezvous_by_location test on
Windows.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
q_total should never go below 0 (which is why it's defined as unsigned),
and if it does, then something is seriously wrong.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As noted in the previous commit, this was introduced in
567e2769b8 ("ra: make the p, q test more
efficient"), but I forgot to mention it.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In commit 567e2769b8 ("ra: make the p, q
test more efficient") I unknowingly introduced a new requirement to the
register allocator API: the user must set the register class of all
nodes before setting up their interferences, because
ra_add_conflict_list() now uses the classes of the two interfering
nodes. i965 already did this, but r300g was setting up register classes
interleaved with setting up the interference graph. This led to us
calculating the wrong q total, and in certain cases
e78a01d5e6 (" ra: optimistically color
only one node at a time") made it so that this bug caused a segfault. In
particular, the error occurred if the q total was decremented to 1 below
0 for the last node to be pushed onto the stack. Since q_total is an
unsigned integer, it overflowed to 0xffffffff, which is what
lowest_q_total happens to be initialzed to. This means that we would
fail the "new_q_total < lowest_q_total" check on line 476 of
register_allocate.c, and so the node would never be pushed onto the
stack, which led to segfaults in ra_select() when we failed to ever give
it a register.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82828
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Pavel Ondračka <pavel.ondracka@email.cz>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Needed for assert.
Fixes build on BE archs with -Werror=implicit-function-declaration.
In file included from
../../../../../src/gallium/auxiliary/draw/draw_fs.c:30:0:
../../../../../src/gallium/auxiliary/util/u_math.h: In function
'util_memcpy_cpu_to_le32':
../../../../../src/gallium/auxiliary/util/u_math.h:810:4: error:
implicit declaration of function 'assert'
[-Werror=implicit-function-declaration]
assert(n % 4 == 0);
^
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In buf_clear_region() and buf_copy_region(), max_cmd_size was set to 0. If
either of the functions is called and there is not enough space in the
builder, the next ilo_cp_flush() will fail silently in a release build.
Replace magic numbers by size defines in tex_clear_region()/tex_copy_region()
for consistency and readability.
Intruduce gen6_blt_bo and gen6_blt_xy_bo to describe BOs. In the extreme case
of gen6_XY_SRC_COPY_BLT(), the number of parameters goes down from 18 to 8.
This allows a sampler view to have a different texture target than the
underlying resource. This will be used to implement the type casting
between 2d arrays and cube maps as specified in ARB_texture_view.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes a regression from "nouveau/vdec: small fixes to h264 handling"
New picking order for frames:
1. Vidbuf pointer matches.
2. Take the first kicked ref.
3. If that fails, take a ref that has a different last_used.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
The BSP bo might be too small to contain all of the bsp data,
bump its size on overflow. Also bump inter_bo when this happens,
it might be too small otherwise.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
If the source is not a GRF, it could have a register >= virtual_grf_count.
Accessing virtual_grf_end with such a register would lead to
out-of-bounds access. Make sure the source is a GRF before accessing
virtual_grf_end.
Fixes Valgrind complaints while compiling some shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
If the glx/wgl state tracker requested a core profile but the gallium
driver did not support some feature of GL 3.1 or later, we were setting
ctx->Version=0 and then failing the assertion in
_mesa_initialize_exec_table().
With this change we check for ctx->Version=0 and tear down the context
and return NULL from st_create_context().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
So far we have been using CL_INVOCATION_COUNT to resolve this query but this
is no good with streams, as only stream 0 reaches the clipping stage.
From ARB_transform_feedback3:
"When a generated primitive query for a vertex stream is active, the
primitives-generated count is incremented every time a primitive emitted to
that stream reaches the Discarding Rasterization stage (see Section 3.x)
right before rasterization. This counter is incremented whether or not
transform feedback is active."
Unfortunately, we don't have any registers that provide the number of primitives
written to a specific stream other than the ones that track the number of
primitives written to transform feedback in the SOL stage, so we can't
implement this exactly as specified.
In the past we implemented this feature by activating the SOL unit even if
transform feeback was disabled, but making it so that all buffers were
disabled and it only recorded statistics, which gave us the right semantics
(see 3178d2474a). Unfortunately, this came with
a significant performance impact and had to be reverted.
This new take does not intend to implement the exact semantics required by
the spec, but improves what we have now, since now we return the primitive
count for stream 0 in all cases. With this patch we use
GEN7_SO_PRIM_STORAGE_NEEDED to resolve GL_PRIMITIVES_GENERATED queries
for non-zero streams. This would return the number of primitives written
to transform feedback for each stream instead. Since non-zero streams are
only useful in combination with transform feedback this should not be too
bad, and the only case that I think we would not be supporting would be
the one in which we want to use both GL_PRIMITIVES_GENERATED and
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN on the same non-zero stream to
detect buffer overflow.
This patch also fixes the following piglit test:
arb_gpu_shader5-xfb-streams-without-invocations
This test uses both GL_PRIMITIVES_GENERATED and
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries on non-zero streams, but it
does never hit the overflow case, so both queries are always expected to return
the same value.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
That better matches the actual userspace use case, the
kernel will force it to VRAM if the hardware requires it.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Less CPU overhead and avoids contention over CPU accessible memory on startup.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
In preparation to using buffers clears with the hw engine(s).
v2: split out flipping to using hw buffer clears.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This builds the opengl DLLs with address layout space randomization
(ASLR) and data execution prevention (DEP) for better security.
Reviewed-by: Kurt Daverman <krd@vmware.com>
The old disassembler was modified from i965's. It is as much work as doing a
new one to keep it up-to-date, which also requires copying more headers over.
The outputs of this new disassembler should match i965's as closely as
possible.
Add some new registers and some tweaks. The changes that affect ilo are
GEN6_REG_HS_INVOCATION_COUNT -> GEN7_REG_HS_INVOCATION_COUNT
GEN6_REG_DS_INVOCATION_COUNT -> GEN7_REG_DS_INVOCATION_COUNT
GEN6_COND_NORMAL -> GEN6_COND_NONE
If a precision qualifer is allowed on type T, it should be allowed
on an array of T. Refactor the check to ensure this is the case.
(Fixes failures in WebGL conformance test 'gl-min-textures')
Signed-off-by: Frank Henigman <fjhenigman@google.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
MSVC replaces the "F" in "255.0F" with the macro argument which leads
to an error. s/F/FLT/ to avoid that.
It turns out we weren't using this macro at all on MSVC until the
recent "mesa: Drop USE_IEEE define." change.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch fixes a build error on DragonFly.
CC libpipe_loader_la-pipe_loader_drm.lo
pipe_loader_drm.c: In function 'pipe_loader_drm_probe':
pipe_loader_drm.c:207:10: error: implicit declaration of function 'close' [-Werror=implicit-function-declaration]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Apparently guardband clipping doesn't work like we thought: objects
entirely outside fthe guardband are trivially rejected, regardless of
their relation to the viewport. Normally, the guardband is larger than
the viewport, so this is not a problem. However, when the viewport is
larger than the guardband, this means that we would discard primitives
which were wholly outside of the guardband, but still visible.
We always program the guardband to 8K x 8K to enforce the restriction
that the screenspace bounding box of a single triangle must be no more
than 8K x 8K. So, if the viewport is larger than that, we need to
disable guardband clipping.
Fixes ES3 conformance tests:
- framebuffer_blit_functionality_negative_height_blit
- framebuffer_blit_functionality_negative_width_blit
- framebuffer_blit_functionality_negative_dimensions_blit
- framebuffer_blit_functionality_magnifying_blit
- framebuffer_blit_functionality_multisampled_to_singlesampled_blit
v2: Mention the acronym expansion for TA/TR/MC in the comments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that we have the data available, we need to expose it to the
shaders. We can reuse the same vertex element that we use for
gl_VertexID, but we need to back it by an actual vertex buffer.
A hardware restriction requires that vertex attributes coming from a
buffer (STORE_SRC) must come before any other types (i.e. STORE_0).
So, we have to make gl_BaseVertex be the .x component of the vertex
attribute. This means moving gl_VertexID to a different component.
I chose to move gl_VertexID and gl_InstanceID to the .z and .w
components, respectively, to make room for gl_BaseInstance in the .y
component (which would also come from a buffer, and therefore be
STORE_SRC).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We'll need to emit another VERTEX_BUFFER_STATE for gl_BaseVertex;
pulling this into a helper function will save us from having to deal
with cross-generation differences in that code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used for GL_ARB_shader_draw_parameters, as well as fixing
gl_VertexID, which is supposed to include gl_BaseVertex's value.
For indirect draws, we simply point at the indirect buffer; for normal
draws, we upload the value via the upload buffer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The lower_vertex_id pass converts uses of the gl_VertexID system value
to the gl_BaseVertex and gl_VertexIDMESA system values. Since
gl_VertexID is no longer accessed, it would not be considered active.
Of course, it should be, since the shader uses gl_VertexID.
v2: Move the var->name dereference past the var != NULL check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Converts gl_VertexID to (gl_VertexIDMESA + gl_BaseVertex). gl_VertexIDMESA
is backed by SYSTEM_VALUE_VERTEX_ID_ZERO_BASE, and gl_BaseVertex is backed
by SYSTEM_VALUE_BASE_VERTEX.
v2: Put the enum in struct gl_constants and propoerly resolve the scope
in C++ code. Fix suggested by Marek.
v3: Reabase on Matt's foreach_in_list changes (was using foreach_list).
v4 (Ken): Use a systemvalue instead of a uniform because
STATE_BASE_VERTEX has been removed.
v5: Use a boolean to select lowering, and only allow one lowering
method. Suggested by Ken.
v6 (Ken): Replace strcmp against literal "gl_BaseVertex"/"gl_VertexID"
with SYSTEM_VALUE enum checks, for efficiency.
v7: Rebase on context constant initialization work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
There exists hardware, such as i965, that does not implement the OpenGL
semantic for gl_VertexID. Instead, that hardware does not include the
value of basevertex in the gl_VertexID value.
SYSTEM_VALUE_VERTEX_ID_ZERO_BASE is the system value that represents
this semantic.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
v2: Additions to the documentation for SYSTEM_VALUE_VERTEX_ID. Quote
the GL_ARB_shader_draw_parameters spec and mention DirectX SV_VertexID.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
181581280b changed the way the
llvm-config version is read from sed to grep and introduced
a requirement for gnu grep extension that treats BREs as EREs.
Avoid this by calling egrep instead of grep which should be
able to handle EREs everywhere.
This allows Mesa to build on OpenBSD again.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
This doesn't quite make depth-tex-compare work, presumably because we're
not hitting equality with itof(sample) * 1.0/0xffffff in the 0xffffff
case. arb_fragment_program_shadow tests pass, though, as well as a bunch
of other shadow-related stuff.
We potentially need to be careful that use of a value stored in r4 isn't
copy-propagated (or something) across another r4 write. That doesn't
appear to happen currently, and this makes the dataflow more obvious. It
also opens up not unpacking the r4 value, which will be useful for depth
textures.
Since software primitive-restart emulation is going to be removed (and
anyways, mostly seemed to be crash prone in combination with
u_primconvert and oddball scenarios (like PIPE_PRIM_POLYGON with only a
single vertex), might as well do it in hardware (which fortunately
didn't turn out to be too hard to figure out).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Triggered by shaders like:
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0]
DCL TEMP[0..2], LOCAL
0: IF CONST[0].xxxx :0
1: MOV TEMP[0], TEMP[1]
2: ELSE :0
3: MOV TEMP[0], TEMP[2]
4: ENDIF
5: MOV OUT[0], TEMP[0]
6: END
not really a sane shader, although driver segfaulting is probably
not the appropriate response.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We currently aren't too clever about dealing with running out of
cmdstream buffer space. Since we use a single buffer for both drawing
and tiling commands, we need to ensure there is enough space at the tail
of the cmdstream buffer to fit the tiling commands.
Until we get more clever, the easy solution is a threshold to trigger
flushing rendering even if the application does not trigger flush (swap,
changing render target, etc). This way we at least don't crash for apps
that do several thousand draw calls (like some piglit tests do).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Most of the things the new compiler still has trouble with basically
amount to cp stage removing too many copies. But without the cp stage,
the shaders the new compiler produces are still better (perf and
correctness) than the old compiler. So a simple thing to do until I
have more time to work on it is first trying falling back to new
compiler without cp, before finally falling back to old compiler.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Excess conversions considered harmful.
Recently Matt reworked the boolean uniform handling to use the value of
UniformBooleanTrue, rather than integer 1, when uploading uniforms:
mesa: Upload boolean uniforms using UniformBooleanTrue.
glsl: Use UniformBooleanTrue value for uniform initializers.
Marek then set the default to 1.0f for drivers without native integer
support:
mesa: set UniformBooleanTrue = 1.0f by default
However, ir_to_mesa was assuming a value of integer 1, and arranging for
it to be converted to 1.0f on upload. Since Marek's commit, we were
uploading 1.0f = 0x3f800000 which was being interpreted as the integer
value 1065353216 and converted to float as 1.06535322E9, which broke
assumptions in ir_to_mesa that "true" was exactly 1.0f.
+13 Piglits on classic swrast (fs-bool-less-compare-true,
{vs,fs}-op-not-bool-using-if, glsl-1.20/execution/uniform-initializer).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83573
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Mesa already defines _GNU_SOURCE for glibc based systems and defining
_GNU_SOURCE will break the Mesa build on other systems such as OpenBSD.
_GNU_SOURCE only seems to be included in llvm-config output when
LLVM is built via autoconf and not when it is built by cmake.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Superseded by HAVE_LOADER_GALLIUM. The latter has a *DRM* brethren
making the whose easier on which one to keep.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Just drop the conditional and simplify our build. This means that
it'll build every time, but it does not require any dependencies nor
does it take that long to compile 200 lines of boilerplate code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The variable was unused and gave false information. The need for nonnull
winsys currently does not relate as it used to. Nowadays one can mix and
match more freely with plenty of winsys' to make your head spin.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
With recent commit we removed the NEED_NONNULL_WINSYS checks when
selecting the hardware (inc svga) winsys. svga has only one winsys
that explicitly requires libdrm (via it's bundled version of
vmwgfx_drm.h) but configure.ac never really checks for it.
Add the check early to prevent people from shooting themselves when
they select the driver but lack libdrm.
$ ./autogen.sh --disable-dri --disable-egl --disable-gallium-llvm
--with-dri-drivers=swrast --with-gallium-drivers=svga,swrast
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82539
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The rest of stencil handling isn't done yet, but it documents an extra
cl_u8(0) and helps make it obvious why we don't need to format clear_depth
the same way the depth/stencil buffer is formatted.
For now it still requires the color buffer to be present -- we're relying
on the store of color buffer contents to end the frame, and we have to do
something with color buffers in the rendering config packet.
According to GLSL-ES Spec(i.e. 1.0, 3.0), gl_Position value is undefined
after the vertex processing stage if we don't write gl_Position. However,
GLSL 1.10 Spec mentions that writing to gl_Position is mandatory. In case
of GLSL-ES, it's not an error and atleast the linking should pass.
Currently, Mesa throws an linker error in case we dont write to gl_position
and Version is less then 140(GLSL) and 300(GLSL-ES). This patch changes
it so that we don't report an error in case of GLSL-ES.
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83380
Similar to the changes to GEN7 command functions, but to GEN6 this time.
As every GPE function has been converted, remove
ilo_cp_assert_no_implicit_flush() calls.
Make these changes
ilo_cp_begin() -> ilo_builder_batch_pointer()
ilo_cp_write() -> direct memory set
ilo_cp_write_bo() -> ilo_builder_batch_reloc()
and use this chance to drop the "_emit_" infix.
Make these changes
ilo_cp_steal_ptr() and memcpy() -> ilo_builder_state_write()
ilo_cp_steal_ptr() -> ilo_builder_state_pointer()
and use this chance to drop the "_emit_" infix.
Make these changes
ilo_cp_steal_ptr() and memcpy() -> ilo_builder_surface_write()
ilo_cp_steal() and ilo_cp_write() -> ilo_builder_surface_write()
ilo_cp_write_bo() -> ilo_builder_surface_reloc()
and use this chance to drop the "_emit_" infix.
Make these changes
ilo_cp_begin() -> ilo_builder_batch_pointer()
ilo_cp_write() -> direct memory set
ilo_cp_write_bo() -> ilo_builder_batch_reloc()
and make sure there is no implicit flush. Use this chance to drop the
"_emit_" infix.
Remove instruction buffer management from ilo_3d and adapt ilo_shader_cache to
upload kernels to ilo_builder. To be able to do that, we also let ilo_builder
manage STATE_BASE_ADDRESS.
Comparing to how we manage batch and instruction buffers, the new builder
- does not flush
- manages both types of buffers
- manages STATE_BASE_ADDRESS
- uploads kernels using unsynchronized mapping
- has its own decoder for the buffers
- provides more helpers
Do not require a toy_compiler so that it can be used in other places, such as
state dumping. Add a bool to control whether the raw instruction words are
shown.
This would just crash. Noticed by accident while checking int divisions by zero
with a quickly hacked piglit test.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Forgot to add it when I fixed up the start instance handling in (llvm) draw.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Calling the variable min when it's really max and vice versa seems a bit
confusing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
UBO loads can be boolean-valued expressions, too, so we need to handle
them in emit_bool_to_cond_code() and emit_if_gen6().
However, unlike most expressions, it doesn't make sense to evaluate
their operands, then do something with the results. We just want to
evaluate the UBO load as a whole---which performs the read from
memory---then load the boolean result into the flag register.
Instead of adding code to handle it, we can simply bypass the
ir_expression handling, and fall through to the default code, which will
do exactly that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83468
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Matt and I believe that Sandybridge actually uses 0xFFFFFFFF for a
"true" comparison result, similar to Ivybridge. This matches the
internal documentation, and empirical results, but contradicts the PRM.
So, the comment is inaccurate, and we can actually just handle these
directly without ever needing to fall through to the condition code
path.
Also, the vec4 backend has always done it this way, and has apparently
been working fine. This patch makes the FS backend match the vec4
backend's behavior.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_triop_csel can return a boolean expression, so we need to handle it
here; we simply forgot when we added ir_triop_csel, and forgot again
when adding it to emit_bool_to_cond_code.
Fixes Piglit's EXT_shader_integer_mix/{vs,fs}-mix-if-bool on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
As long as we don't have a workaround for frame based
decoding in VDPAU we should not advertise NV_vdpau_interop.
v2: fix commit message, check if get_video_param is present
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Instead we cast backend_visitor::prog for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes use of Altivec pack intrinsics on little-endian PowerPC
systems. Since little-endian operation only affects the load and store
instructions, the semantics of pack (and other) instructions that take
two input vectors implicitly change: the pack instructions still fill
a register placing values from the first operand into the "high" parts
of the register, and values from the second operand into the "low" parts
of the register, but since vector loads and stores perform an endian swap,
the high parts end up at high memory addresses.
To still achieve the desired effect, we have to swap the two inputs to
the pack instruction on little-endian systems. This is done automatically
by the back-end for instructions generated by LLVM, but needs to be done
manually when emitting intrisincs (which still result in that instruction
being emitted directly).
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Maarten Lankhorst <dev@mblankhorst.nl>
Instead we store a void pointer to the key, and cast it to
brw_wm_prog_key for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead we store a brw_stage_prog_data pointer, and cast it to
brw_wm_prog_data for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
BitSet::allocate() is being used with the expectation that it would
leave the bitfield untouched if its size hasn't changed, however,
the function always zeroed the last word, which led to obscure bugs
with live set computation.
This also fixes BitSet::resize(), which was broken, but luckily not
being used.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This hasn't been enabled in a long time and is completely stale and
unnecessary. Remove, esp since it doesn't build.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Uniform values are in the UNIFORM register file, not the GRF register file.
Looking in virtual_grf_sizes makes no sense and only makes the output of
dump_instructions confusing.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- bundle the scons buildscript, README and trace.xsl
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android & scons buildscript
- include the headers' README & svga_dump.py
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android & scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & README
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- bundle the android buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & LLVM note
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & custom include
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & the tests
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript
v2: Don't double-include the compiler sources.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- bundle the scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the scons buildscript
v2: Don't double include the test sources.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Commit 60d772cd9d1(st/vega: add headers and SConscript in
the tarball) meant to pick all the headers to be included in
the release tarball yet it missed a few.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
A few files were missing, namely:
- a few of the common headers
- the android + gdi sources
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
With the last revisions of commit 664c2d76947(gallium/ilo: cleanup
intel_winsys.h) we moved the header from winsys to drivers, but we
forgot to update the makefile.sources to reflect this.
Cc: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Currently, BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE* and
BLIT_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_RESOLVE* shaders
in setup_glsl_msaa_blit_shader() are not recompiled
when the source buffer sample count changes. For example,
implementation continued using a 4X msaa shader, even if
source buffer changes from 4X msaa to 8x msaa. It causes
incorrect rendering.
This patch adds new enums in blit_msaa_shader, one for
each supported sample count, and uses them to store
msaa shaders.
Fixes following piglit tests on Broadwell:
ext_framebuffer_multisample-accuracy all_samples color
ext_framebuffer_multisample-accuracy all_samples depth_draw
ext_framebuffer_multisample-accuracy all_samples depth_resolve
ext_framebuffer_multisample-accuracy all_samples stencil_draw
ext_framebuffer_multisample-accuracy all_samples stencil_resolve
ext_framebuffer_multisample-formats all_samples
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstarnd <jason@jlekstrand.net>
Make sure to check the presence of the module in order to pick the
correct libs flag and before feeding them to the compiler/linker.
Current libXvMC*, libvdpau* and libomx_mesa depends unconditionally
upon xcb, due to their usage of the aux/vl gallium module.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Make sure to check the presence of the module in order to pick the
correct libs flag and before feeding them to the compiler/linker.
Current libEGL depends conditionally (when building with x11 platform)
upon xcb.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Make sure to check the presence of the module in order to pick the
correct libs flag and before feeding them to the compiler/linker.
Current libGL depends conditionally (when building with dri3) upon
xcb 1.9.3 and unconditionally on ancient xcb functions -
xcb_generate_id and xcb_request_check amongst others.
v2: Use PKG_CHECK_EXISTS() when checking for dri3 xcb.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80848
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
When a texture is wrapped in a texture view, we can't trust the format in
the miptree itself. This patch allows us to pass the format seperately
through blorp so we can proprerly handled wrapped textures.
It's worth noting here that we can use the miptree format directly for
depth/stencil formats because they cannot be reinterpreted by a texture
view.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
CC: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Before commit 04895f5c we would only reswizzle dot product instructions
(since they wrote the same value into all channels, and we didn't have
to think about anything else). That commit extended reswizzling to cases
when the swizzle was single valued -- i.e., writing the same result into
all channels.
But allowing reswizzling of arbitrary things is actually really easy and
is even less code. (Why didn't we do this in the first place?!)
total instructions in shared programs: 4266079 -> 4261000 (-0.12%)
instructions in affected programs: 351933 -> 346854 (-1.44%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Despite the comment above the function claiming otherwise, the function
did not reswizzle sources, which would lead to bad code generation since
commit 04895f5c, which began claiming we could do such swizzling when we
could not.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82932
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The 'start' instruction is always in the current block, except for the
case of shader time, which emits code in a pattern seen no where else.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise, the basic block start/end IPs don't get updated properly,
leading to a broken CFG. This usually results in the following
assertion failure:
brw_fs_live_variables.cpp:141:
void brw::fs_live_variables::setup_def_use():
Assertion `ip == block->start_ip' failed.
Fixes KWin, WebGL demos, and a score of Piglit tests on Sandybridge and
earlier hardware.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dump() methods don't alter the CFG or basic blocks, so we should
mark them as const. This lets you call them even if you have a const
cfg_t - which is the case in certain portions of the code (such as live
interval handling).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
As older versions of gnu ld did not support --dynamic-list check to see
if it is supported before using it. Non gnu linkers such the apple one
likely lack this option as well.
Fixes the build on OpenBSD which has binutils 2.15 and 2.17.
The --dynamic-list option seems to been have introduced sometime after
binutils 2.17 was released as it is present in 2.18.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Fixes binary garbage in the compilation logs caused by
compat::string::c_str() not being null-terminated (which is a bug on
its own that will be fixed in another commit).
Reported-by: EdB <edb+mesa@sigluy.net>
Reverts
* "i965: Modify state upload to allow 2 different sets of state atoms."
8e27a4d2b3
* "i965: Modify dirty bit handling to support 2 pipelines."
373143ed91
* "i965: Create a macro for checking a dirty bit."
c5bdf9be1e
Conflicts:
src/mesa/drivers/dri/i965/brw_context.h
* "i965: Create a macro for setting all dirty bits."
6f56e1424d
Conflicts:
src/mesa/drivers/dri/i965/brw_blorp.cpp
src/mesa/drivers/dri/i965/brw_state_cache.c
src/mesa/drivers/dri/i965/brw_state_upload.c
* "i965: Create a macro for setting a dirty bit."
88e3d404da
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
We can't rely on the value from the assembler if relative addressing is
used. So instead use the max of declared-consts (which does not include
compiler immediates) and what we get from the assembler (which does).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
all_delayed will also be true if we didn't attempt to schedule anything
due to no more instructions using current addr/pred. We rely on coming
in to block_sched_undelayed() to detect and clean up when there are no
more uses of the current addr/pred, which isn't necessarily an error.
This fixes a regression introduced in b823abed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The split between these two didn't make much sense. I'm going to want the
chance to look at uniform contents in optimization passes, and the QPU
emit I think is going to end up rewriting the uniforms stream.
This common init routine can be used by constructors for multiple program
types.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Debugging a regression in discard support was just too full of duplicate
instructions, so I decided to remove them instead of re-analyzing each of
them as I dumped their outputs in simulation.
There were troubles with bools without using native integers
(st_glsl_to_tgsi seemed to think bool true was 1.0f sometimes, when as a
uniform it's stored as ~0), and since I've got native integers other than
divide, I might as well just support them.
Before, we had some special opcodes like CMP and SNE that emitted multiple
instructions. Now, we reduce those operations significantly, giving
optimization more to look at for reducing redundant operations.
The downside is that QOP_SF is pretty special -- we're going to have to
track it separately when we're doing instruction scheduling, and we want
to peephole it into the instruction generating the destination write in
most cases (and not allocate the destination reg, probably. Unless it's
used for some other purpose, as well).
A bool is 0 or ~0, and KILL_IF takes a float arg that's <0 for discard or
>= 0 for not. By negating it, we ended up doing a floating point subtract
of (0 - ~0), which ended up as an inf. To make this actually work, we
need to convert the bool to a float.
Reviewed-by: Brian Paul <brianp@vmware.com>
While similar in layout, the size of the SVGA3dSize type may be smaller than
the struct drm_vmw_size type that is part of the ioctl interface. The kernel
driver could accordingly overwrite a memory area following the size variable
on the stack. Typically that would be another local variable, causing
breakage in, for example, ubuntu 12.04.5 where the handle local variable
becomes overwritten.
v2: Fix whitespace errors
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Cc: "10.1 10.2 10.3" <mesa-stable@lists.freedesktop.org>
As explained in the previous commit, we want to avoid the possibility of
integer-multiplication overflow while allocating buffers.
In these two cases, the final allocation size is the product of three values:
one variable and two that are fixed constants at compile time.
In this commit, we move the explicit multiplication to involve only the
compile-time constants, preventing any overflow from that multiplication, (and
allowing calloc to catch any potential overflow from the remainining implicit
multiplication).
Reviewed-by: Matt Turner <mattst88@gmail.com>
In commit 32f2fd1c5d, several calls to
_mesa_calloc(x) were replaced with calls to calloc(1, x). This is strictly
equivalent to what the code was doing previously.
But for cases where "x" involves multiplication, now that we are explicitly
using the two-argument calloc, we can do one step better and replace:
calloc(1, A * B);
with:
calloc(A, B);
The advantage of the latter is that calloc will detect any overflow that would
have resulted from the multiplication and will fail the allocation, (whereas
the former would return a small allocation). So this fix can change
potentially exploitable buffer overruns into segmentation faults.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It's been altering the tree and reporting "false" since January 2011.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, opt_copy_propagation_elements would always rewrite the
instruction stream, even if was the same thing as before. In order to
report progress correctly, we'll need to bail if the suggested
replacement is identical (or equivalent) to the original code.
This also introduced unnecessary noop swizzles, as far as I can tell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, if chans < 4, we passed uninitialized stack garbage to the
ir_swizzle constructor for the excess components. Thankfully, it
ignores that data, as it's unnecessary, so no harm actually comes of it.
However, it's obviously better to initialize it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_triop_csel can return a boolean expression, so we need to handle it
here; we simply forgot when we added it.
Fixes Piglit's EXT_shader_integer_mix/{vs,fs}-mix-if-bool.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
All shader stages have these fields, so it makes sense to store them in
the common base structure, rather than duplicating them in each.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In commit 46d03d37bf I renamed a Makefile target
from md5 to checksums, (as we switched from MD5 checksums to SHA-256
checksums, so the more general name is more future proof).
But that commit missed one mention of "md5" as a dependency of the .PHONY
target. Rename that here as well.
According to the GLSL 1.40 spec, section 5.7 Structure and Array Operations:
"Array elements are accessed using an expression whose type is int or uint."
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The default case was accidentally clearing RADEON_FLAG_CPU_ACCESS from the
previous fall-through cases.
Reported-by: Mathias Fröhlich <Mathias.Froehlich@gmx.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Coverity pointed out we never dropped the lock here, so fix
it by using a common exit path.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
tex-miplevel-selection was hammering my memory manager with primconverts
on individual quads. This gets all those converted IBs packed into larger
IBs.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
This is part of fixing extremely long runtimes on some piglit tests that
involve streaming vertex reuploads due to format conversions, and will
similarly be important for X performance, which relies on these flags.
A meta begin/end pair with MESA_META_DRAW_BUFFERS will change visible GL
state. We recreate the draw buffer enums from the buffer bitfield, which
changes GL_BACK to GL_BACK_LEFT (and GL_FRONT to GL_FRONT_LEFT).
This commit modifes the save/restore logic to instead copy the buffer enums
from the gl_framebuffer and then set them on restore using
_mesa_drawbuffers().
It's not clear how this breaks the benchmark in 82796, but fixing meta to not
leak the state change fixes the regression.
No piglit regressions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=82796
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
Coverity reported this, and I think this is the right solution,
since cache->items is struct cache_item ** not struct cache_item *,
we also realloc it using struct cache_item * at some point.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not all of these are used in every context, so this can make a
significant difference for short-lived contexts such as in piglit tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If we fails in reserve_explicit_locations, we leak uniform_map.
Reported-by: coverity scanner.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
i965 will have more than 32 bits when BRW_STATE_COMPUTE_PROGRAM is added.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The set of state atoms for compute shaders is currently empty; it will
be filled in by future patches.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The hardware state for compute shaders is almost entirely orthogonal
to the hardware state for 3D rendering. To avoid sending unnecessary
state to the hardware, we'll need to have a separate set of state
atoms for the compute pipeline and the 3D pipeline. That means we
need to maintain two separate sets of dirty bits to determine which
state atoms need to be run.
But the dirty bits are not completely independent; for example, if
BRW_NEW_SURFACES is flagged while doing 3D rendering, then not only do
we need to re-run 3D state atoms that depend on BRW_NEW_SURFACES, but
we also need to re-run compute state atoms that depend on
BRW_NEW_SURFACES. But we'll also need to re-run those state atoms the
next time the compute pipeline is run.
To accomplish this, we record two sets of dirty bits, one for each
pipeline. When bits are dirtied (via SET_DIRTY_BIT() or
SET_DIRTY_ALL()) we set them to the dirty state in both pipelines.
When brw_state_upload() is run, we clear the dirty bits just for the
pipeline that was run.
Note that since the number of pipelines is known at compile time to be
2, the compiler should unroll the loops in SET_DIRTY_BIT() and
SET_DIRTY_ALL().
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This one path doesn't goto fail, so it seems to leak dec.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With VP2, nv50_miptree is faked because the underlying bo's have to be
laid out in a certain way. This is done by adjusting the address. Make
sure that blits (and everything else for consistency) use the mt address
rather than the bo address as a base.
This fixes retrieving chroma plane with VDPAU.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82255
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
When constant folding a MAD operation, we first fold the multiply and
generate an ADD. However we do so without making sure that the immediate
can be handled in the saturate case. If it can't, load the immediate in
a separate instruction.
Reported-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Experimentally, the sampler doesn't appear to like these, neither as
buffer nor as rect textures. So remove 1D from the list of texture types
to make linear when used for staging.
This fixes the OSD in mplayer for VDPAU.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Samplers are only defined up to num_samplers, so set all samplers above
nr to NULL so that we don't try to read them again later.
Tested-by: Christian Ruppert <idl0r@qasl.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
We always left them enabled, which turned off HiZ in some cases.
This should improve performace with Hyper-Z.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This should be as fast as no HTILE for stencil. I think we can still get full
performance with depth-only rendering even if stencil is present in the buffer
but not used, but I'm not 100% sure. This may be revisited when HiS and fast
stencil clear are implemented.
This fixes a hang in Brutal Legend.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64471
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is a golden setting on RV740, but there is a hw bug which recommends
setting it on all R7xx chipsets.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
It's almost the same.
This enables tiling for HTILE. It also enables Hyper-Z for other texture
targets (1D, 1D_ARRAY, 2D_ARRAY, CUBE, CUBE_ARRAY, 3D, RECT).
2D array depth textures are tested by Unigine Sanctuary and my new piglit
test.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
This should make a machine which is running piglit more responsive at times.
e.g. streaming-texture-leak can easily eat 600 MB because of how fast it
creates new textures.
We were only using it to get at its type, which we already know because
it's a builtin variable.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were only using it to get at its type, which we already know because
it's a builtin variable.
v2 (Ken): Rebase on Matt's optimized gl_FrontFacing calculations.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The replicated data clear shader needs to be SIMD16, or else the GPU
will hang. So, compile it even if INTEL_DEBUG=no16 is set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Current method of generating distribution tar-balls involves manually
invoking make + target name in the appropriate places. This temporary
solution is used until we get 'make dist' working.
Currently it does not work, as in order to have the target (which is
also a filename) available in the final Makefile we need to add a PHONY
target + use the correct target name.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
v3: Since the fs backend can emit saturate as a separate instruction, there is
no need to detect for min/max instructions and to rewrite the instruction tree
accordingly. On the other hand, we don't need to emit a separate saturated
mov either when the expression generating src can do saturate directly.
v4: Add can_do_saturate() check before enabling saturate modifer (Ken)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
When sel conditon is bounded within 0 and 1.0. This allows code as:
mov.sat a b
sel.ge dst a 0.25F
To be propagated as:
sel.ge.sat dst b 0.25F
v3: - Syntax clarifications in inst->saturate assignment
- Remove extra parenthesis when assigning src_reg value
from copy_entry (Matt Turner)
v4: - Take channels into consideration when propagating saturated instructions.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
When sel conditon is bounded within 0 and 1.0. This allows code as:
mov.sat a b
sel.ge dst a 0.25F
To be propagated as:
sel.ge.sat dst b 0.25F
v3: Syntax clarifications in inst->saturate assignment (Matt Turner)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
v2: - Output max(saturate(x),b) instead of saturate(max(x,b))
- Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is > 0.0 and
inner constant is 1.0.
- Fix comments to show that the optimization is a commutative operation
(Matt Turner)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
v2: - Output min(saturate(x),b) instead of saturate(min(x,b)) suggested by Ilia Mirkin
- Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is zero and
inner constant is < 1
- Fix comments to reflect we are doing a commutative operation (Matt Turner)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
v2: - Check that the base type is float (Ian Romanick)
v3: - Make sure comments reflect that we are doing a commutative operation
- Add missing condition where the inner constant is 1.0 and outer constant is 0.0
- Make indexing of operands easier to read (Matt Turner)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Suggested by Matt. This patch combines and moves back the code-generation
functions from generate_vec4_instruction() into generate_code(). Makes
generate_code() a bit larger, but helps us to count loops in a
straightforward manner.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
brw_meta_fast_clear.c:211:17: warning: 'x_scaledown' may be used
uninitialized in this function [-Wmaybe-uninitialized]
unsigned int x_scaledown, y_scaledown;
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are some cases where the scheduler can get itself into impossible
situations, by scheduling the wrong write to pred or addr register
first. (Ie. it could end up being unable to schedule any instruction if
some instruction which depends on the current addr/reg value also
depends on another addr/reg value.)
To solve this we'd need to be able to insert extra mov instructions
(which would also help when register assignment gets into impossible
situations). To do that, we'd need to move the nop padding from sched
into legalize.
But to start with, just detect when we get into an impossible situation
and bail, rather than sitting forever in an infinite loop. This way it
will at least fall back to the old compiler, which might even work if
you are lucky.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
All of the GL image enums fit in 16-bits.
Also move the fields from the anonymous "image" structucture to the next
higher structure. This will enable packing the bits with the other
bitfield.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,465 0
After (32-bit): 70 40,577,421,777 68,487,584 62,973,695 5,513,889 0
Before (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,528 0
After (64-bit): 74 37,124,603,758 95,891,808 88,466,712 7,425,096 0
A real savings of 346KiB on 32-bit and 262KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only values allowed are 0 and 1, and the value is checked before
assigning.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,217 0
After (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,465 0
Before (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,320 0
After (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,528 0
A real savings of 173KiB on 32-bit and no change on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just use ir_variable::data.binding... because that's the where the
binding is stored for everything else that can use layout(binding=).
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0
After (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,217 0
Before (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0
After (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,320 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The VertexProgram and FragmentProgram have a Cache member for dealing
with fixed function programs. There are no fixed function geometry
programs, so this should never have existed, and was just copy and
pasted.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I actually screwed that up in 754319490f,
mistakenly thinking the code actually wanted the non-nan result before.
So, introduce that missing nan behavior case and use that instead.
For sse, there's no actual change in the resulting code at all, the fallback
code wouldn't have done the right thing though.
Of course, the actual issue I saw with pow() was completely unrelated...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Pretty trivial, just fill in the offsets and such. The implementation
is near 100% copy and paste from llvmpipe. Should be useful for debugging.
No piglit change when not using SOFTPIPE_USE_LLVM=1.
Now that it can do the same tests with and without using llvm for vs/gs,
with llvm more pass, the only things failing only with llvm seems to be
edgeflags tests and vs/gs-pow-float-float (and for the latter I'm not
convinced the zero tolerance it requires is somehow mandated by glsl).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The code is all in place now so enable it.
Seems to pass all relevant piglit tests (just like cube maps, some of the
cube map array tests need GALLIVM_DEBUG=no_quad_lod,no_rho_approx)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Pretty easy, just make sure that all paths testing for PIPE_TEXTURE_CUBE
also recognize PIPE_TEXTURE_CUBE_ARRAY, and add the layer * 6 calculation
to the calculated face.
Also handle it for texture size query, looks like OpenGL wants the number
of cubes, not layers (so need division by 6).
No piglit regressions.
v2: fix up adding cube layer to face for seamless filtering (needs to happen
after calculating per-sample face). Undetected by piglit unfortunately.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v1)
Not sure why it was there but it is definitely not an error if gs outputs are
infs/nans. Besides, the outputs can be ints, in which case any small negative
number asserted.
This fixes piglit's texelFetch gs isamplerXX crashes with softpipe (down from
14 to 2).
Bug https://bugs.freedesktop.org/show_bug.cgi?id=80012
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
piglit tex-miplevel-selection nowadays doesn't use repeat wrap mode due to
sampler objects any longer, however at the time of the clear the wrap mode
is still illegal and at this point we get to verify the state, including
samplers (even though they won't get used), and because mesa doesn't treat
it as an incomplete texture as the spec says it should, we hit the assertion.
Just warn about this for now instead.
Gets crashes down from 44 to 14 in a piglit run (all were in various tests of
tex-miplevel-selection with texture rectangles). Though just about all
tex-miplevel-selection tests fail anyway for other reasons.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just handle as ordinary 2d / 2d array resources. Prevents an assertion failure
with softpipe and piglit glsl-resource-not-bound 2DMS/2DMSArray tests.
While here also fix TXD shadowCube similarly, which fixes the crash with piglit
tex-miplevel-selection textureGrad CubeShadow (the test will still fail due to
softpipe being broken).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=80011
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was meant for softpipe to not crash at some point if vertex texturing
was used. It is, however, fishy because it uses values from
draw_set_samplers/draw_set_sampler_views and not from the shader key. Albeit
we should still in all cases actually generate a new shader if this changes
(because the samplers and views themselves are in the key) I don't want to
think again wondering if that's really correct in the future.
Besides, at least today, it does not actually work for softpipe, as this was
relying on softpipe not actually calling draw_set_samplers/sampler_views at
all - I've verified it crashes regardless (if there were a tex instruction in
the vs, which normally should not happen anyway). For drivers which do indeed
not call these functions because they don't support vertex texturing at all
(r300), this should still not crash because the static texture data is all
zero, which causes the sampling functions to take an early out (same as is done
if no texture is bound at the slot used for sampling - verified with hacked up
softpipe).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
mesa was creating a cube map array texture with just one layer, which is
not legal. This caused an assertion failure when using that texture later
in llvmpipe (when enabling cube map arrays) since it verifies the number
of layers in the view is divisible by 6 (the sampling code might well crash
randomly otherwise) with piglit glsl-resource-not-bound CubeArray -fbo -auto.
v2: use appropriately sized texel array...
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
The source can be a register as well as an immediate, and disassembling
a register as an immediate can have some strange results.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It is meant to be private within the actual winsys. Remove it from
the exported header, and fold it into it's only user.
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This define is always set and it had no real purpose according to
git log. Seems like it is a leftover from the vl/vdpau prototype
stage.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Namely
- the private header (xa_priv.h)
- README and
- xa-indent
Sort the sources list while we're here.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit 6a19bb56e0.
The above commit disabled the default build of xvmc as the xvmc tests
were failing. As pointed out by Ilia, the tests are "broken by design"
as they do not test the object that is build but the one that is
installed and setup on the workstation.
With previous commit we moved the programs from the 'make check' to
noinst automake target. This way they won't be run but will be around
for people to use them.
Cc: Tom Stellard <thomas.stellard@amd.com>
All the tests require an installed and setup XvMC, thus they
are not good candidates for 'make check'.
Keep them around as the user might want to actually test the
implementation post installation/setup.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Add the final remaining files into the tarball (make dist), namely:
- SConscripts
- Non-autotooled winsys' - android, gdi and hgl.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Include the headers within.
- Update scons to use them.
- Drop useless include (gallium/drivers) from scons.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Drop duplicate include compiler directives.
- Leave the sw/ prefix for all the software winsys headers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Add top_srcdir/src/gallium/winsys to GALLIUM_DRIVER_C{XXFLAGS}.
- Remove top_srcdir/src/gallium/drivers/radeon from the includes.
As a result:
- Common radeon headers are prefixed with 'radeon/'
- Winsys header inclusion is prefixed 'radeon/drm'
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Make the header location, inclusion and contents more common with
its i915,r* and nouveau counterparts:
- Move the header within drivers/ilo.
- Separate out intel_winsys_create_for_fd into 'drm_public' header.
- Cleanup the compiler includes.
v2: Move the header to drivers/ilo. Suggested by Chia-I.
v3: Correct intel_winsys.h inclusion. Spotted by Chia-I.
Cc: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
V2: moved test for the VertexAttrib*Pointer() functions
to update_array(), and made constant available for drivers to set
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The base instance needs to be passed to the jited function, otherwise the
instanced data fetch will only work with the same start instance when the
jit function was created (and baking that into the key instead is not a viable
option).
This fixes piglit arb_base_instance-drawarrays (modulo some unrelated
core/compat context trouble I get for the test).
And fix the pipe cap bit in llvmpipe for it now that it actually works (it
already worked for softpipe).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The docs were never really up to date for them, missing just about everything.
So mark them off as all done for GL 3.3 (though softpipe is in fact quite
broken for some newer things especially wrt texturing, and both don't have
compliant, real msaa support). And add the extensions missing too (no
guarantee of completeness).
Reviewed-by: Dave Airlie <airlied@gmail.com>
* IEEE Std 1003.1-2001 placed strcasecmp() in strings.h.
* ISO C99 doesn't mention strcase* in string.h
* On all platforms I could find, strcasecmp is in strings.h and string.h
as a compatibility layer for software written pre-2001 POSIX
* Technically strcasecmp should be only in strings.h and the man
pages back this up.
* Tested build on CentOS and Haiku
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is never used. There is another token "OUTPUT" which the lexer can
generate, though. This has been around since the dawn of time; is most
likely a typo.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Now that qpu_inst() ignores the WADDR from the other half of the
instruction, we can set both the ADD and MUL WADDRs in the NOP helper.
Thanks to that, we also no longer need to qpu_inst(NOP, NOP).
This allows setting the opposite-side WADDR to NOP (a non-zero value) in
qpu_* helpers, so that we don't need to qpu_inst() merge them with NOPs
all the time just to get the waddr set.
Fixes piglit draw-vertices and gl-2.0-vertexattribpointer on vc4, where
I'm only advertising R32F to RGBA32F support so far.
Note: regresses gl-1.5-normal3b3s-invariance due to introduced flushes and
missing depth buffer load/store support in the driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Individual caps made supporting new fallbacks more complicated than it
needed to be. Instead, just make a table of fallbacks at context init
time.
v2: Fix inverted "do we need to install vbuf?" flagging caught by Marek.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
When I made the shader cache take the .fs member and moved the binding
point to .bind_fs, I failed to update these. Fixes crashes in
copyteximage-related tests.
The patch fixes the build on Oracle Solaris.
CC os/os_misc.lo
"os/os_misc.c", line 59: #error: unexpected platform in os_sysinfo.c
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
We've found that there's a buffer overrun bug in flex that's triggered by
using alternation in a lookahead pattern.
Fortunately, we don't need to match the exact {NEWLINE} expression to
detect an empty pragma. It suffices to verify that there are no non-space
characters before any newline character. So we can use a simple [\r\n] to
get the desired behavior while avoiding the flex bug.
Fixes the regression of piglit's 17000-consecutive-chars-identifier test,
(which has been crashing since commit
04e40fd337 ).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82472
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
CC: <mesa-stable@lists.freedesktop.org>
The optimization relies on CMP setting the destination to 0, which is
equivalent to 0.0f. However, early platforms only set the least
significant byte, leaving the other bits undefined. So, we must disable
the optimization on those platforms.
Oddly, Sandybridge wasn't reported as broken. The PRM states that it
only sets the LSB, but the internal documentation says that it follows
the IVB behavior. Since it wasn't reported as broken, we believe it
really does follow the IVB behavior.
v2: Allow the optimization on Sandybridge (requested by Matt).
+32 piglits on Ironlake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?=79963
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Operating on this code,
B0: ...
cmp.ne.f0(8)
(+f0) if(8)
B1: break(8)
B2: endif(8)
We can delete B2 without attempting to merge any blocks, since the
break/continue instruction necessarily ends the previous block.
After deleting the if instruction, we attempt to merge blocks B0 and B1.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This pass deletes an IF/ELSE/ENDIF or IF/ENDIF sequence, or the ELSE in
an ELSE/ENDIF sequence.
In the typical case (where IF and ENDIF) aren't the only instructions in
their basic blocks, we can simply remove the instructions (implicitly
deleting the block containing only the ELSE), and attempt to merge
blocks B0 and B2 together.
B0: ...
(+f0) if(8)
B1: else(8)
B2: endif(8)
...
If the IF or ENDIF instructions are the only instructions in their
respective basic blocks (which are deleted by the removal of the
instructions), we'll want to instead merge the next blocks.
Both B0 and B2 are possibly removed by the removal of if & endif.
Same situation for if/endif. E.g., in the following example we'd remove
blocks B1 and B2, and then attempt to combine B0 and B3.
B0: ...
B1: (+f0) if(8)
B2: endif(8)
B3: ...
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
... rather than pointing directly to the associated instruction. This
will let us set the block containing the IF statement's else-pointer to
NULL, when we delete a useless ELSE instruction, as in the case
(+f0) if(8)
...
else(8)
endif(8)
Also, remove the pointer to the ENDIF, since it's unused, and it was
also potentially wrong, in the case of a basic block containing both an
ENDIF and an IF instruction:
endif(8)
cmp.ne.f0(8) ...
(+f0) if(8)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
To avoid invalidating and recreating the control flow graph. Also stop
invalidating the CFG in places we didn't add or remove an instruction.
cfg calculations: 202951 -> 80307 (-60.43%)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Will let us avoid invalidating the CFG if the optimization pass has
removed instructions using the new basic block methods.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes piglit glsl-fs-discard-01 and -03, and allows a lot of mesa demos to
start running. glsl-fs-discard-02 has a problem where the first tile is
not getting stored on the first render.
This should improve performance on real hardware by allowing more shader
instances to run in parallel. It also fixes assertion failures in tests
that don't emit a fragment color, since otherwise we didn't have enough
instructions to fit our signals in.
The spec citation talked about A and B, and I proceeded to pay no
attention to whether the waddrs were for A or B. As a result, this pair
of instructions would claim to conflict:
mov ra4, ra4 ; nop nop, r0, r0
mov.ns ra4, rb4 ; nop nop, r0, r0
Now that tiling is in place, we can expose the other formats. Depth is
still broken (need to make changes in the shader), but if you don't expose
it things crash all over. SNORM is dropped, but we could re-add it later
with some shader fixes to handle converting between [0,1] and [-1,1].
This still treats everything as RGBA8888 for the most part, same as
before. This is a prerequisite for handling other texture formats, since
only RGBA8888 has a raster-layout mode.
It meant that LUMALPHA was being marked as *many* miplevels, and
unsurprisingly wouldn't validate. On the other hand, some miplevel counts
wouldn't get the small mips validated at all.
The header is used by DRI1 drivers, which we've removed a while
back. Now only the dri1 loader in libGL is using it, so let's
move it in src/glx, and prefix it accordingly.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This flag was set to true for the atomic counter intrinsics, but it
never got plumbed through the linker, so by the time it got to the
backends it would always be set to the false. The current i965 backend
code doesn't use is_intrinsic, so this should not change any existing
code, but it's useful for codepaths that want to distinguish between
intrinsics and non-intrinsics without using strcmp.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
This captures all of the steps I have been following in making releases for
the past year or so. This way, the instructions should be sound for anyone who
would like to take over the release process going forward.
This change will double cache size for branches which have a lower
LP_MAX_SHADER_VARIANTS limit (it will not do anything on master).
The reason is that nowadays shaders tend to be quite a bit larger than they
were (they were big when llvmpipe didn't have a fs loop, got much smaller with
that loop, and since then have gradually increased quite a bit though still
smaller than without the fs loop for various reasons - among them being d3d10
compliance, usage of 8-wide vectors, non-swizzled blend code). Thus effectively
less shaders would be cached (unless they were very small and the variant limit
was hit first). Also, since we're getting rid of the IR nowadays, the cached
shaders shouldn't need all that much memory actually.
This captures the set of rules I have been using for stable-branch management,
(starting with a discussion on the mesa-dev mailing list on July 2013, and
then refined through my own experience of performing stable-branch releases
since then).
We switched to these several stable releases ago, (since the MD5 algorithm has
been broken for some time), but only now did I get around to fixing this in
the Makefile rather than just performing this step manually.
CC: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
v2:
- Move dri*_query_renderer_* into their respective dri*_priv.h headers
- Drop then unnneeded include of dri2.h from dri2_query_renderer.c
- Rename dri2_query_renderer.c as dri_common_query_renderer.c, as it's contents
now are used for more than dri[23]
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
If only the flat/smooth shade state changed between
two render calls the prior code would miss updating the
hardware state.
Also add check for sprite coord, potentially same type
of issue otherwise for it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81967
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
cs_vertex_buffer_state.enabled_mask and
cs_vertex_buffer_state.dirty_mask are both updated when
r600_set_constant_buffer() is called, so we don't need to manually
update these values.
This fixes a crash with OpenCL programs that have a kernel with no
arguments.
https://bugs.freedesktop.org/show_bug.cgi?id=82671
CC: "10.2" <mesa-stable@lists.freedesktop.org>
Unlocking the texture is not safe: another thread could come in and grab
it. Now that we use a recursive mutex, this should work. This also fixes
texture lock deadlocks in the new meta fast clear path.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Chris Forbes <chrisf@ijw.co.nz>
This avoids problems with things like meta operations calling functions
that want to take the lock while the lock is already held. Basically,
the point is to guard against API reentrancy across threads...not to
guard against ourselves.
Dave Airlie opposed this change, but it makes master usable again and no
one proposed a better solution. We can revert this if/when someone
does.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Chris Forbes <chrisf@ijw.co.nz>
There were two problems with the way this script used sed on OS X:
1. The OS X sed doesn't interpret "\r" in a replacement list as a
carriage-return character, (instead it was inserting a literal
'r' character).
We fix this by putting an actual ^M character into the source of
the script, (rather than a two-character escape sequence hoping
for sed to do the right thing).
2. When generating the test files with LF-CR ("\n\r") newlines, the
OS X sed was adding an undesired final newline ("\n") at the end
of the file. We avoid this by first using sed to add the ^M
before the newlines, then using tr to swap the \r and \n
characters. This way, sed never sees any lines ending with
anything but \n, so it doesn't get confused and doesn't add any
bogus extra newlines.
Tested-by: Vinson Lee <vlee@freedesktop.org>
Vinson's testing confirmed that this patch fixes FreeBSD as well.
I noticed that with /bin/sh on Mac OS X, "echo -n" does not work as
desired, (it actually prints "-n" rather than suppressing the final
newline). There is a /bin/echo that could be used (it actually works)
instead of the builtin echo.
But I decided it's more robust to just use printf rather than
hardcoding /bin/echo into the script.
This LLVM 3.6 commit changed EngineBuilder constructor.
commit 3f4ed32b4398eaf4fe0080d8001ba01e6c2f43c8
Author: Rafael Espindola <rafael.espindola@gmail.com>
Date: Tue Aug 19 04:04:25 2014 +0000
Make it explicit that ExecutionEngine takes ownership of the modules.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215967 91177308-0d34-0410-b5e6-96231b3b80d8
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
The docs say "When performing a render target resolve, PIPE_CONTROL with end
of pipe sync must be delivered.", which doesn't actually tell us whether we
need to do it before or after. Blorp did it before and after, and doing it
before certainly makes sense. The resolve operation needs to read from the
MCS and if we don't flush the render cache it won't get up-to-date data.
On the other hand, doing it after should not be necessary, since we call
brw_render_cache_set_check_flush() after the resolve.
Fixes rendering corruption in kwin's cover switch effect and various steam
games.
Missing flush spotted by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
The extension requires GL 3.0, so enable on just the generations
exposing that.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
SB needs a bit of special handling to handle
instructions without obvious side effects, to
avoid it deleting them.
Fixes failing non-const ARB_gpu_shader5
textureOffsets piglits with sb enabled.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Based on the old code, the new layout code describes the layout with the new,
well-documented, ilo_layout. It also gains new features such as MCS support
and extended ARYSPC_LOD0 that i965 comes up with (see
6345a94a9b).
Instead create a staging texture with pipe_buffer_create and
PIPE_USAGE_STAGING.
u_upload_mgr sets the usage of its staging buffer to PIPE_USAGE_STREAM.
But since 150ac07b85 CPU -> GPU streaming buffers
are created in VRAM. Therefore the staging texture (in VRAM) does not offer any
performance improvements for buffer downloads.
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
... and store the value in intel_winsys_info/ilo_dev_info.
Suggested-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
olv: check for errors and report raw values
The loop over all instructions is now two-fold, over all of the blocks
and all of the instructions in each block.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The next patch adds a foreach_block (block, cfg) macro, which works
better if it provides a direct bblock_t pointer, rather than a
bblock_link pointer that you have to use to find the actual block.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Doesn't use fewer instructions, but it does avoid writing the flag
register and if we want to switch the representation of true for Gen4/5
in the future, we can just delete the AND instruction.
Dead since the call to _mesa_generate_parameters_list_for_uniforms
was removed in commit 12751ef2. So this was why all of that code that
was supposed to fix up the value of a uniform bool to wasn't happening.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The next patches are going to combine some of the mapi subdirectories'
Makefiles into a single Makefile, giving better build parallelism.
lib_LTLIBRARIES will be set to something like
lib_LTLIBRARIES = shared-glapi/libglapi.la es2api/libGLESv2.la
and the current code in install-lib-links.mk simply prepends .libs/ and
replaces the .la in order to create the filenames that it needs to ln/cp
into the LIBDIR. This doesn't work when the .la file is actually in a
subdirectory.
This patch fixes this and puts .libs/ in the right place.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The SWIZZLE_1 of the winsys destination was dereffing off the end of the
array, which surprisingly often worked out (since nobody reads the
rendered value anyway, so whatever junk was referenced in the QIR didn't
matter), but shader dumping would sometimes segfault.
We weren't accounting for the level 0 offset in the texture setup (so it
only worked if it happened to be a single-level texture), and doing so
required that we get the level 0 offset page aligned so that the offset
bits don't get interpreted as the texture format and such.
I had the right viewports in vc4_emit.c, but grabbed the wrong values in
the uniform setup, so primitives would claim to be in the wrong parts of
the screen. (The vc4_emit.c state looks like it just decides how big the
clipping guardband is).
This gets fbo-viewport closer to working (which still has the problem that
the HW is always guard-band clipping), and fixes inverted FBO rendering in
general.
The function should return GLboolean, not GLenum.
If we detect invalid compressed pixel storage parameters, we should
return GL_TRUE, not GL_FALSE so that the function is no-op'd.
An update to the piglit s3tc-errors test will check this.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Replace the gl_texture_image parameter with mesa_format since we only
used the image's format.
Add some comments.
Reviewed-by: Matt Turner <mattst88@gmail.com>
For gen6 we will use the ALL_SLICES_AT_EACH_LOD miptree layout for
separate stencil/hiz. This is needed because gen6 hiz and separate
stencil only support a single miplevel. When accessing the other LODs,
we will program a tile aligned offset for the bo.
PRM Volume 1, Part 1, 7.18.3.7.2 For separate stencil buffer [DevILK]
to [DevSNB]:
"The separate stencil buffer does not support mip mapping, thus the
storage for LODs other than LOD 0 is not needed."
We still allocate storage for the other stencil mip-levels within a
single texture, but each mip-level will use non-mip-array spacing.
PRM Volume 2, Part 1, 7.5.3 Hierarchical Depth Buffer
"[DevSNB]: The hierarchical depth buffer does not support the LOD
field, it is assumed by hardware to be zero. A separate
hierarachical depth buffer is required for each LOD used, and the
corresponding buffer’s state delivered to hardware each time a new
depth buffer state with modified LOD is delivered."
We allocate storage for the other hiz mip-levels within a single
texture, but each mip-level will use non-mip-array spacing.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Since gen6 separate stencil & hiz only supports LOD0, we need to
program an offset to the LOD when emitting the separate stencil/hiz.
v3:
* Use new array_layout enum
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Gen6 doesn't support multiple miplevels for hiz and stencil.
Therefore, we must point to the LOD directly during rendering.
But, we also have removed the tile offsets from normal depth surfaces,
so we need to align each LOD to a tile boundary for hiz and stencil.
v3:
* Use new array_layout enum
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Previously array_layout ALL_SLICES_AT_EACH_LOD was only used for array
spacing lod0 on gen7+ and therefore was only used with a single mip
level.
gen6 separate stencil & hiz only support LOD0, so we need to allocate
the miptree similar to gen7+ array spacing lod0, except we also need
space for multiple mip levels. (Since OpenGL stencil and depth support
multiple LODs.)
The miptree is allocated with tightly packed array slice spacing, but
we still also pack the miplevels into the region similar to a normal
multi mip level packing.
A 2D Array texture with 2 slices and multiple LODs would look somewhat
like this:
+----------+
| |
| |
+----------+
| |
| |
+----------+
+---+ +-+
| | +-+
+---+ +-+
| | :
+---+
v3:
* Use new array_layout enum
* ASCII art!
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
gen6 does not support multiple miplevels with separate
stencil/hiz. Therefore we need to layout its miptree with no mipmap
spacing between the slices of each miplevel.
v3:
* Use new array_layout enum
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We will want to setup gen6 separate stencil and hiz miptrees in a
layout that is similar to array_spacing_lod0. This is needed because
gen6 hiz and stencil only support a single mip-level.
In both use cases (gen7+ LOD0 spacing & gen6 separate stencil/hiz),
the array slices will be packed at each LOD without reserving extra
space for LODs within each array slice.
So, we generalize the name of this field and add comments to indicate
the old and new uses.
Motivation for the gen6 change comes from the PRM:
PRM Volume 1, Part 1, 7.18.3.7.2 For separate stencil buffer [DevILK]
to [DevSNB]:
"The separate stencil buffer does not support mip mapping, thus the
storage for LODs other than LOD 0 is not needed."
PRM Volume 2, Part 1, 7.5.3 Hierarchical Depth Buffer
"[DevSNB]: The hierarchical depth buffer does not support the LOD
field, it is assumed by hardware to be zero. A separate
hierarachical depth buffer is required for each LOD used, and the
corresponding buffer’s state delivered to hardware each time a new
depth buffer state with modified LOD is delivered."
v2:
* Rename array_spacing_lod0 to non_mip_arrays
v3:
* Instead, replace array_spacing_lod0 with array_layout enum
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
(bf25ee2 for gen6)
Previously we would always find the 2D sub-surface of interest,
and then program the surface to this location. Now we always
program the 3DSTATE_DEPTH_BUFFER at the start of the surface.
To select the lod/slice, we utilize the lod & minimum array
element fields.
We also must disable brw_workaround_depthstencil_alignment for
gen >= 6. Now the hardware will handle alignment when rendering
to additional slices/LODs.
v3:
* Set depth_mt bo RELOC offset to 0, as was done in bf25ee2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56127
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
(bc1acaa for gen6)
This will be used in 3DSTATE_DEPTH_BUFFER in a later patch.
Note: Cube maps are treated as 2D arrays with 6 times as
many array elements as the cube map array would have.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(171e633 for gen6)
This will be used in 3DSTATE_DEPTH_BUFFER in a later patch.
Note: Cube maps are treated as 2D arrays with 6 times as
many array elements as the cube map array would have.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We will program the gen6 hiz depth state differently to enable layered
rendering on gen6.
v2:
* Remove unneeded gen6_emit_depthbuffer as suggested by Topi
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the gen6 PRM Volume 1 Part 1: Graphics Core, Section
7.18.3.7.1 (Surface Arrays For all surfaces other than separate
stencil buffer):
"[DevSNB] Errata: Sampler MSAA Qpitch will be 4 greater than the
value calculated in the equation above , for every other odd Surface
Height starting from 1 i.e. 1,5,9,13"
Since this Qpitch errata only impacts the sampler, we have to adjust
the input for the rendering surface to achieve the same qpitch. For
the affected heights, we increment the height by 1 for the rendering
surface.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than pointing the surface_state directly at a single
sub-image of the texture for rendering, we now point the
surface_state at the top level of the texture, and configure
the surface_state as needed based on this.
v2:
* Use SET_FIELD as suggested by Topi
* Simplify min_array_element assignment as suggested by Topi
v3:
* Use irb->layer_count for depth instead of rb->Depth
* Make gl_target const
* depth - 1, not depth
v4:
* Merge in dd43900b & b875f39e fixes to prevent 3D texture piglit
regressions
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since this code was branched from brw_wm_surface_state.c, it had
support for gen < 6. We can now remove this.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Layered rendering is part of OpenGL 3.2; GL_ARB_draw_instanced is part
of OpenGL 3.1. As such, all drivers supporting layered rendering
already support gl_InstanceID.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Only assign gl_Layer if we have GL_AMD_vertex_shader_layer. Gen6 doesn't
(currently) have that extension, but it also doesn't support layered
rendering.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Passes blendminmax and blendsquare. glean's more serious blendFunc fails
in simulation due to binner memory overflow (I really need to work around
that), and fbo-blending-formats fails due to Mesa refusing one of the
getter requests, even before it could fail due to the driver not actually
supporting different formats yet.
The hw_mask is the set of primitives you actually support, so this attempt
to provide the set of formats that's unsupported was wrong in two ways (it
was intended to be '~' not '!'). However, we only call this code when
prim isn't one of the actually supported hw_mask bits, so missing out on
the memcpy didn't matter anyway.
We were triggering simulator assertion failures for not consuming these,
and presumably we want to actually make use of them some day (for things
like point/line antialiasing)
Note that this has the qreg index as 0, which is the same index as the
first GL varyings read. This doesn't matter currently, since that number
isn't used for anything except dumping.
At some point I'm going to want to move the information necessary for the
host buffer upload/download into the BO so that it's independent of the
current vc4->framebuffer, but for now this fixes pointless derefs on
non-simulator in vc4_context.c since the dump_fbo() removal
This patch uses the infrastructure put in place by previous patches
to implement fast color clears and replicated color clears in terms of
meta operations.
This works all the way back to gen7 where fast clear was introduced and
adds support for fast clear on gen8. It replaces the blorp path
completely and improves on a few cases. Layered clears are now done
using instanced rendering and multiple render-target clears use a
MRT shader with rep16 writes.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The data port has a SIMD16 'replicate data' message, which lets us write
the same color for all 16 pixels by sending the four floats in the
lower half of a register instead of sending 4 times 16 identical
component values in 8 registers.
The message comes with a lot of restrictions and could be made generally
useful by recognizing when those restriction are satisfied. For now,
this lets us enable the optimization when we know it's safe, but we don't
enable it by default. The optimization works for simple color clear shaders
only, but does recognized and support multiple render targets.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We'll use this in the i965 fast clear implementation.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
GetTexParamterfv() doesnt change texture state, so instead of
_mesa_lock_texture() we can use _mesa_lock_context_textures(),
which doesn't increase the texture stamp. With this change,
_mesa_update_state_locked() is now only called from under
_mesa_lock_context_textures(), which is right thing to do. Right now
it's the same mutex, but if we made texture locking more fine grained
locking one day, just locking one texture here would be wrong.
This all ignores the fact that texture locking seem a bit
flaky and broken, but we're trying to not blatantly make it worse.
This change allows us to reliably unlock the context textures in the
dd::UpdateState callback as is necessary for meta color resolves.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
No functional change except for glBegin/glEnd style rendering, where we now
do the resolves at glBegin time instead of FLUSH_VERTICES time. This is also
the reason for this change, so that when we later switch fast clear resolve to
use meta, we won't be doing meta operations in the middle of a begin/end
sequence.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
GEN7+ has the fast clear functionality, which lets us clear the color
buffers using the MCS and a scaled down rectangle. To enable this
we have to set the appropriate bits in the 3DSTATE_PS package.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The clipper doesn't support clipping 3DPRIM_RECTLIST primitives and must
be turned off when we use them.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The brw_draw_prims() function is the draw entry point into the driver,
and takes struct _mesa_prim for input. We want to be able to feed
native primitives into the driver, and to that end we introduce
BRW_PRIM_OFFSET, which lets use describe geometry using the native
GEN primitive types.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lets us disable the viewport transform, which will be useful for
emitting 3DPRIM_RECTLIST.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now, this can only be triggered with a new 'no8' INTEL_DEBUG option
and a new context flag. We'll use the context flag later, but introducing
it now lets us bisect to this commit if it breaks something.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The final step to get GLX_MESA_query_renderer working with gallium
drivers.
v2: Remove __DRI2_RENDERER_PREFERRED_PROFILE handling. It's already
handled in dri/common. Spotted by Marek.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Both report 0xffffffff as both vendor and device id, and the maximum
amount of system memory as video memory.
v2: Use aux helper os_get_total_physical_memory().
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
All the values are are currently hardcoded. One could use
some heuristics to determine the amount of video memory if
a callback to the host is not available.
Do we what to advertise the driver as hardwar accelerated ?
Cc: Brian Paul <brianp@vmware.com>
Cc: José Fonseca <jose.r.fonseca@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Implementation based on the classic driver with the following
changes:
- Use auxiliarry function os_get_total_physical_memory to get the
total amount of memory.
- Move the libdrm_intel specific get_aperture_size to the winsys.
Cc: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Implementation based on the classic driver with the following
changes:
- Use auxiliarry function os_get_total_physical_memory to get the
total amount of memory.
- Move the libdrm_intel specific get_aperture_size to the winsys.
Cc: Stephane Marchesin <stephane.marchesin@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Provide the real vendor and and hardcode the device id as
0xffffffff as the devices currently using freedreno are non-pci.
The device features UMA.
Cc: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Namely vendor/device id, accelerated and UMA, which will be used to describe
the underlying renderer.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2:
- Drop __DRI2_RENDERER_PREFERRED_PROFILE case.
- Cleanup return statements.
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Create radeon{Vendor,GetRenderer}String helpers.
- Drop __DRI2_RENDERER_PREFERRED_PROFILE case.
- Cleanup return statements.
To be used by the upcomming GLX_MESA_query_renderer implementation.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Printing the TCL involves that context is available at the time of
query. The GLX_MESA_query_renderer states that glGetString(GL_RENDERER)
and glXQueryRendererStringMESA(GLX_RENDERER_DEVICE_ID_MESA) will have
the same format, thus removing the context dependenicy will help us
achieve that.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Create nouveau_{vendor,get_renderer}_string helpers.
- Set correct max_gl*version.
- Query the device PCIID via libdrm_nouveau/nouveau_getparam.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Essentially all drivers would like to use to opengl core profile if
available, so avoid duplication by moving the code to a common fallback
within driQueryRendererIntegerCommon.
If a driver uses different approach they can handle it separately.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The extension is used by GLX_MESA_query_renderer, which
can be provided for by hardware and software drivers.
v2: Use designated initializers.
v3: Move drisw_query_renderer_*() to dri2_query_renderer.c
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Generate a GL error and return rather than crashing on a null
ctx->Driver.CopyImageSubData pointer (gallium). This allows apitraces
with glCopyImageSubData() calls to continue rather than crash.
Plus, fix a comment typo.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Similar to the problem described in 2c50212b14, if we copy the clear
value through a regular assignment via a floating point value, then if an
integer clear value is being used that happens to contain a signalling NaN
value then it would get converted to a quiet NaN when stored via the x87
floating-point registers. This would corrupt the integer value. Instead we
should use a memcpy to ensure the exact bit representation is preserved.
This bug can be triggered on 32-bit builds with optimisations by using an
integer clear color with a value like 0x7f817f81.
Reviewed-by: Matt Turner <mattst88@gmail.com>
V2: Set force_writemask_all on ADD; this *is* necessary in the VS case
too.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now, assume that the addressed sampler can be in any of the
16-sampler banks. If we preserved range information this far, we
could avoid emitting these instructions if the sampler were known
to be contained within one bank.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're about to be using this infrastructure to build descriptors in
src1 of non-send instructions, when preparing to do an indirect send.
Don't accidentally clobber the conditionalmod field of those
instructions with SFID bits, which aren't part of the descriptor.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
This provides a reasonable place to enforce the hardware restriction
that indirect descriptors must be in a0.0
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
The pass breaks live ranges of virtual registers by allocating new
registers when it sees an assignment to a virtual GRF it's already seen
written.
total instructions in shared programs: 4337879 -> 4335014 (-0.07%)
instructions in affected programs: 343865 -> 341000 (-0.83%)
GAINED: 46
LOST: 1
[mattst88]: Make pass not break in presence of control flow.
invalidate_live_intervals() only if progress.
Fix up delta_x/delta_y.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Commit c66d928f2c ("i965: Enable INTDIV
in SIMD16 mode.") began using generate_math_gen6 to break SIMD16 INTDIV
into two SIMD8 operations.
generate_math_gen6 takes two registers - for unary operations, we pass
ARF null for the second operand. Prior to Broadwell, real operands were
always GRF. But now they can be IMM as well.
So, check for != ARF instead of == GRF.
+12 piglits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This reverts commit af13cf609f, which
appears to cause huge performance problems on Ivybridge. I'd missed
that the FFTID bits are in the low byte. The documentation doesn't
indicate that the URB write message header actually wants FFTID - it
just labels those bits as "Reserved." But it appears necessary.
This does slightly more than revert the original change: originally,
Broadwell had separate code generation, which used MOV, and this patch
only changed it for Gen4-7. Now that both are unified, reverting this
also makes Broadwell use OR. Which should be fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The extension says GL 4.0 is required. We'll meet the spirit
of that restriction by enabling on just those generations which will
soon support GL 4.0 (Gen7+), although it's technically supportable on
all generations.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The quality level (fine/coarse/dont-care) is plumbed through to the
generator as a constant in src1.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The quadop-based method we currently use on all chipsets already
provides the fine version of the derivatives.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This extension is identical to NV_texture_barrier. Alias
glTextureBarrier to the existing glTextureBarrierNV and use the existing
NV_texture_barrier extension bit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This matches the name of the dd hook. Also convert a couple of nearby
dd implementations to lowercase + underscore as is now the standard.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Right now we decide which kernels to use and the GRF start offsets in
one place and emit the kernel pointers later. The logic of how to map
8, 16 and 32 kernels to kernel start pointers follows the same logic as which
GRF start offsets to use, so lets figure out these two things in one place.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The EGL_EXT_image_dma_buf_import specification was revised (according to
its revision history) on Dec 5th, 2013, for EGL to not take ownership of
the file descriptors.
Do not close the file descriptors passed in to eglCreateImageKHR with
EGL_LINUX_DMA_BUF_EXT target.
It is assumed, that the drivers, which ultimately process the file
descriptors, do not close or modify them in any way either. This avoids
the need to dup(), as it seems we would only need to just close the
dup'd file descriptors right after.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76188
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
gcc 4.6.3 chokes with the following error:
brw_vec4.cpp: In member function 'int brw::vec4_visitor::setup_uniforms(int)':
brw_vec4.cpp:1496:37: error: expected primary-expression before '.' token
Apparently C++ does not do named initializers for unions, except maybe
as a gcc extension, which is not present here.
As .f is the first element of the union, just drop it. Fixes the build
error.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
* --enable-{32,64}-bit is done. Use --build and --host instead.
* Configure does not add "-g -O2" to C{,XX}FLAGS.
* Pkg-config has been mandatory for a while now.
* Avoid using LDFLAGS, refer to pkg-config.
* --with-expat is deprecated. Use pkg-config.
v2:
* Note that CC/CXX will need to be set for multilib builds.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
These two were added ages ago, with an explicit comment "Hacks ..."
They have been insufficient for years and maintainers needed to
explicitly handle the build themselves.
Rather than lying and pretending that it works, just kill this hack and
let maintainers build things the way it should be done for their
distribution.
Document the removal in the release notes.
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This reverts commit 2af28040d6.
The commit was resolving an issue where libtool will not setup the
environment correctly when one explicitly provides --enable-{32,64}-bit
at configure time. It was caused due to the "-m32,64" C{,XX}FLAGS being
set too late relative to LT_INIT.
At the same time this cases the enable_static to be incorrectly set,
amongst others leading to build issues. Rather than being smart and
trying to handle 32/64 bit build ourselves it may be better to delegate
it to the builder/maintainer. The latter should now know better which is
the correct(most appropriate) method.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82536
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82546
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
The brw_stage_prog_data struct previously contained an array of float pointers
to the values of parameters. These were then copied into a batch buffer to
upload the values using a regular assignment. However the float values were
also being overloaded to store integer values for integer uniforms. This can
break if x87 floating-point registers are used to do the assignment because
the fst instruction tries to fix up invalid float values. If an integer
constant happened to look like an invalid float value then it would get
altered when it was copied into the batch buffer.
This patch changes the pointers to be gl_constant_value instead so that the
assignment should end up copying without any alteration. This also makes it
more obvious that the values being stored here are overloaded for multiple
types.
There are some static asserts where the values are uploaded to ensure that the
size of gl_constant_value is the same as a float.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81150
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Requires GLSL 1.50 or higher, which we only support in the core profile.
V2: Fix broken alignment
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes the LUMINANCE_ALPHA formats of fbo-clear-formats piglit test.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
All we need to do is decompose this to two SIMD8 instructions, like we
do in many other cases. We even already have code for that.
I apparently just botched this last time I tried, and it was easy.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When dual source blending, the visitor already stores a flag in
brw_wm_prog_data (dual_src_blend) for the state upload code to use.
The generator also receives this, so there's no need to pass an
additional flag.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The _mesa_swizzle_and_convert path can't do transfer ops, so we should bail
if they're needed.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This just cherry-pick the extensions into a list for GLES 3.1
I'm not actually sure if this list if complete or correct, maybe someone
else can tell me what I missed, and I'm not 100% sure on multi_draw_indirect.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
==17630== Invalid read of size 4
==17630== at 0x400AE10: memcpy (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==17630== by 0x49024A2: u_upload_data (u_upload_mgr.c:253)
==17630== by 0x49050E1: u_vbuf_draw_vbo (u_vbuf.c:980)
==17630== by 0x487DE29: cso_draw_vbo (cso_context.c:1425)
==17630== by 0x487DEA0: cso_draw_arrays (cso_context.c:1445)
==17630== by 0x48A3B0E: hud_draw_colored_prims.constprop.6 (hud_context.c:123)
==17630== by 0x48A4810: hud_draw (hud_context.c:266)
==17630== by 0x48763F7: dri_flush (dri_drawable.c:483)
==17630== by 0x4057510: dri2Flush.constprop.4 (dri2_glx.c:559)
==17630== by 0x405789E: dri2SwapBuffers (dri2_glx.c:851)
==17630== by 0x402C531: glXSwapBuffers (glxcmds.c:842)
==17630== by 0x8049716: ??? (in /usr/bin/glxgears)
==17630== Address 0x4426b2c is 4 bytes after a block of size 1,008 alloc'd
==17630== at 0x4006B11: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==17630== by 0x48A4CE7: hud_pane_add_graph (hud_context.c:625)
==17630== by 0x48A68F0: hud_pipe_query_install (hud_driver_query.c:175)
==17630== by 0x48A6A30: hud_driver_query_install (hud_driver_query.c:207)
==17630== by 0x48A5835: hud_create (hud_context.c:791)
==17630== by 0x48756CB: dri_create_context (dri_context.c:165)
==17630== by 0x4871CD4: driCreateContextAttribs (dri_util.c:435)
==17630== by 0x4871E06: driCreateNewContext (dri_util.c:464)
==17630== by 0x4056A22: dri2_create_context (dri2_glx.c:223)
==17630== by 0x402CF68: CreateContext (glxcmds.c:299)
==17630== by 0x402D265: glXCreateContext (glxcmds.c:430)
==17630== by 0x804B136: ??? (in /usr/bin/glxgears)
This is due to second vertex element being specified, and the upload
tries to fetch over the end. However the pane rendering only requires
a single vertex element, so specify only one.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This got broken by 3dbf5bf657.
GL_COLOR_INDEX data is still supported (in legacy contexts), but the new
texstore_swizzle path cannot handle it (and didn't detect this).
Unfortunately there's no piglit test trying to specify textures with a
GL_COLOR_INDEX source format, and I don't really understand how all the color
map stuff which is used by this works, but this caused conform failures
(with a reported mesa implementation error when trying to figure out the color
mapping).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
accel_working2 returns 3 if the new firmware is used.
The comment wasn't updated in v3 of commit:
36771dc winsys/radeon: fix nop packet padding for hawaii
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Fixes piglit glean "do-loop with continue and break" on RS690
It's based on Tom Stellard patch and improved to handle CMP instruction.
[v2] handle CMP instruction
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: David Heidelberger <david.heidelberger@ixit.cz>
We were leaking the input buffer used for kernel arguments and since
we were allocating it using si_upload_const_buffer() we were leaking
1 MB per kernel invocation.
CC: "10.2" <mesa-stable@lists.freedesktop.org>
This will decrement the reference count for buffers referenced in the
command stream will prevent us from leaking them.
CC: "10.2" <mesa-stable@lists.freedesktop.org>
There is a hard limit in older kernels of 256 MB for buffer allocations,
so report this value as MAX_MEM_ALLOC_SIZE and adjust MAX_GLOBAL_SIZE
to statisfy requirements of OpenCL.
CC: "10.2" <mesa-stable@lists.freedesktop.org>
Before, when we encountered a situation where we had to optimistically
color a node, we would immediately give up and push all the remaining
nodes on the stack in the order of their index - which is a random, and
potentially not optimal, order. Instead, choose one node to
optimistically color in ra_select(), and then once we've optimistically
colored it, keep on going as normal in the hopes that we've opened up
more avenues for the normal select phase to make progress. In cases with
high register pressure, this helps make the order we push things on the
stack much better, and therefore increase the chance that we can allocate
successfully.
total instructions in shared programs: 4545447 -> 4545401 (-0.00%)
instructions in affected programs: 1353 -> 1307 (-3.40%)
GAINED: 124
LOST: 6
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we would consider any optimistically colored nodes for
spilling. However, spilling any optimistically colored nodes below the
node that we failed to color on the stack wouldn't help us make
progress, since it wouldn't help with allowing us to find a color for
the node currently failing to get colored. Only consider nodes
which were above the failing node on the stack for spilling, which
simplifies the logic, and comment the code better so people know what's
going on here.
No shader-db changes with BRW_MAX_GRF reduced to 90 (or with the normal
number of GRF's).
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can store the q total that pq_test() would've calculated in the node
itself, updating it when we add a node to the stack. This way, we only
have to walk the adjacency list when we push a node on the stack (i.e.
when the p, q test succeeds) instead of every time we do the p, q test.
No difference in shader-db run times, but I'm keeping this in because
the q total that it calculates will also be used in the next few commits.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, there were 3 entrypoints into parts of the actual allocator,
and an API called ra_allocate_no_spills() that called all 3. Nobody
would ever want to call any of the 3 entrypoints by themselves, so
everybody just used ra_allocate_no_spills(). So just make them static
functions, and while we're at it rename ra_allocate_no_spills() to
ra_allocate() since there's no equivalent "with spills," because the
backend is supposed to handle spilling.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This would try to allocate 0-sized bo's when the max level was below the
base level.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The ARB_gpu_shader5 extension is made up of a lot of small sub-parts.
Instead of adding PIPE_CAP's for each of these, just rely on the GLSL
version reported by the pipe driver. The remaining extensions lend
themselves naturally to being checked through a single CAP.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The file contains rules that are executed on incremental builds. This
way one can avoid doing a full clean and ensure that the new object
(library) is correctly build.
Inspired by the work of Chih-Wei Huang, from the Android-x86 project.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Will make it easier on us as CleanSpec.mk comes along and improves
consistency across the Android build.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Will make it easier on us as CleanSpec.mk comes along and improves
consistency across the Android build.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Saves us a few lines and brings us closer to the automake build.
Drop DRM_TOP as it's not longer used.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Trying to get rid of the hardcoded dependency of DRM_TOP which
expects that mesa is localted in /external/drm. Will
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The variable i915_C_FILES changed to i915_FILES with commit
34d4216e64 back in mesa 9.1/9.2. Yet we've missed to update the
the android build, essentially creating an dummy/empty driver that
can never work.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
When the compiler is not capable/does not accept -msse4.1 while the target
has the instruction set we'll blow up as _mesa_streaming_load_memcpy is
going to be undefined.
To make sure that never happens, wrap the runtime cpu check+caller in an
ifdef thus do not compile that hunk of the code.
Fix the android build by enabling the optimisation and adding the define
where applicable.
v2: autoconf conditionals end with "fi" rather than endif.
v3: Wrap the definition and call to intel_miptree_{un,}map_movntdqa in
if defined(USE_SSE41). Spotted by Matt.
Cc: Matt Turner <mattst88@gmail.com>
Cc: Adrian Negreanu <adrian.m.negreanu@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Upstream Android (system/core) has dropped these formats with commit
6bac41f1bf9(get rid of HAL pixelformats 5551 and 4444) yet does not
mention why.
These formats never really worked so we're safe to drop them as well.
Identical commit is available in the android-x86 external/mesa repo
commit 06a2d36edc
Author: Chih-Wei Huang <cwhuang@linux.org.tw>
Date: Wed Sep 25 01:16:57 2013 +0800
android: get rid of HAL pixelformats 5551 and 4444
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Recent versions of bionic has picked up support for these functions,
leading to build issues due to the redefition of the symbols.
Note: wrapping things in #ifdef does not cut it :\
Identical patch is available in chromium, android-x86 and perhaps other
projects.
commit 66c1c789ce3407472de9ed620c9f815639058835
Author: rmcilroy@chromium.org
Date: Wed Apr 02 10:59:34 2014 +0000
Porting to x64 Android. Remove redefinitions of log2 and log2f.
BUG=
R=kbr@chromium.org
Review URL: https://codereview.chromium.org/216773005
commit 9cc0a0d2b0
Author: Chih-Wei Huang <cwhuang@linux.org.tw>
Date: Sun Jul 21 23:04:19 2013 +0800
android: remove log2, log2f
The functions are already defined in the latest bionic.
Cc: Chia-I Wu <olvaffe@gmail.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Chia-I Wu <olvaffe@gmail.com>
Android build never really installs the headers, as such we need to
explicitly add their location in the source tree otherwise it will
fail to find them.
v2: Android now installs the headers, so let's use that ;)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Considering the way we've been consolidating things it makes
sense to add the final two (aux and tests) in here.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Yet another makefile less to worry about.
v2: Add state_trackers and targets on a single SUBDIRS line.
Requested by Matt.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
One makefile less, with the potential of further compacting the
automake build.
v2: Rebase on top of vc4 changes.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Rather than having two separate almost empty and identical makefiles,
compact them thus improving the configure and build time.
Additionally this makes the automake build symmetrical to the scons
and android one.
v2: Rebase on top of vc4, compact drivers + winsys on a single line.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For all the people interested in testing the freedreno driver on
their Android devices. The next commit will hook these up within
the libEGL driver (via the gallium-egl backend).
There may be some rough edges but those can be sorted when a
willing builder/tester comes along.
v2:
- s/freefreno/freedreno/. Spotted by Matt Turner.
- Use the installed libdrm headers.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Cc: Rob Clark <robclark@freedesktop.org>
Cc: freedreno@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- link against libdrm_radeon
- link the r600 driver against libstlport
- linkin the newly added libmesa_pipe_radeon library
required by r600 and radeonsi drivers
v2: Include pipe_radeon after pipe_r600/radeonsi.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
[Emil Velikov] Split up and add commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- include the correct folders
- add a new buildscript for the common radeon folder
v2: Use the installed libdrm headers over the DRM_TOP ones.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
[Emil Velikov] Split up and add commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
For a while the nouveau pipe driver has been a static library
and it has been using STL for even longer.
Correct add the link and cleanup the gallium_DRIVERS.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
nouveau uses STL for a while now thus we need to include
external/stlport/libstlport.mk in order to get the build
at least partially working.
v2: Use the installed libdrm headers over the DRM_TOP ones.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Rather than having the sources list duplicated across all three
build systems, define it once and use it whenever needed.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A while back we've mandated that gbm requires enable_dri,
thus this check is no longer required.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
To avoid unresolved symbols in the DRI modules with earlier commit we
wrapped the innards of dri_kms_init_screen() in a
DRI_TARGET/GALLIUM_SOFTPIPE ifdef.
At the same time we forgot to adds the defines to the st/dri build
systems, breaking kms_swrast and gnome-continuous.
Drop the DRI_TARGET define, we're already in st/DRI.
Reported-by: Jasper St. Pierre <jstpierre@mecheye.net>
Reported-by: Vadim Rutkovsky <vrutkovs@redhat.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
We should assert when either the function or the flag pointer
is null or we'll end up with a null reference a few lines later.
Currently unused by mesa thus it has gone unnoticed.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Apps that only want to use core functionality should #include this
header. This version covers everything up to OpenGL 4.5.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We don't actually do them (or even fake them) currently, but it does get
us a bunch of unrelated glean glsl1 tests passing, which previously would
error out due to glean assuming the minimums on a 3D texture that 2 of the
subtests use.
This can be useful for looking at context init setup and texture format
choices, and there's no reason for the silly retval computation we do if
you're not going to have this code (mostly from freedreno) around.
Everything should be in place to unify code generation between Gen4-7
and Gen8+. We should be able to drop the Gen8 generators at this point.
However, leave them hooked up for a brief moment, for testing and
comparison purposes. Set GEN8=1 to use the old Gen8+ code generator
paths.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kind of a moot point since we're deleting Gen8 code generation, but
this at least helps make it match the Gen4-7 code. It's probably more
reasonable than using float.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This improves performance in Trine 2 at 1280x720 (windowed) on "Very
High" settings by 30% (in the interactive menu) to 45% (in the forest
by the giant frog) on Haswell GT3e.
It also now generates the same assembly on Gen7 as it does on Gen8,
which always used the sampler for both types.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to the documentation, we need to set the source 0 register
type to IMM for flow control instructinos that have both JIP and UIP.
Out of paranoia, just make all flow control instructions use IMM;
there's no benefit to using ARF anyway, and it could trouble that's
difficult to diagnose.
See commit 9584959123, which did the
analogous change in the gen8_generator code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We're going to add a Gen8+ case shortly, which would need to duplicate
this code again. Instead, share it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When we combine the Gen4-7 and Gen8+ generators, we'll need to handle
half float packing/unpacking functions somehow. The Gen8+ generator
code today just emulates the behavior of the Gen7 F32TO16/F16TO32
instructions, including the align16 mode bugs.
Rather than messing with fs_generator/vec4_generator, I decided to just
emulate the instructions at the brw_eu_emit.c layer.
v2: Change gen >= 7 asserts to gen == 7 (suggested by Chris Forbes).
Fix regressions on Haswell in VS tests due to type assertions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Broadwell requires the number of vertices written by the geometry shader
to be specified in a separate register, as part of the terminating
message's payload.
This also means GS_OPCODE_THREAD_END needs to increment mlen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
g0.5 has nothing of value to contribute to m0.5. In both the VS and GS
payload, g0.5 contains the scratch space pointer - which is definitely
not of any use. The GS payload also contains FFTID, but the URB write
message header doesn't want FFTID.
The only reason I used OR was because Eric originally requested it.
On Broadwell, I used MOV, and that's worked out fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Haswell, we implement "discard" via predicated SEND messages, using
f0.1 instead of f0.0. To accomplish this, we set inst->flag_subreg to 1
on the FS_OPCODE_FB_WRITE.
Most instructions using fs_inst::flag_subreg expand to a single assembly
instruction. However, FS_OPCODE_FB_WRITE can generate several MOVs for
setting up header information. We don't want to set flag_subreg on
those, so override the default state back to 0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously the Meta implementation of glGetTexImage would fall back to
_mesa_get_teximage if the texturing is not using an unsigned normalised
format. However in order to support the half-float formats of BPTC textures we
can make it render to a floating-point renderbuffer instead. This patch makes
decompression_state have two FBOs, one for the GL_RGBA format and one for
GL_RGBA32F. If a floating-point texture is encountered it will try setting up
a floating-point FBO. It will now also check the status of the FBO and fall
back to _mesa_get_teximage if the FBO is not complete.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Enables the BPTC extension on Gen>=7 and adds the necessary format mappings to
get the right surface type value.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Once we add BPTC texture support we will need to generate mipmaps for
compressed floating point textures too. Most of the code seems to already be
there but it just needs a few extra lines to get it to use GL_FLOAT instead of
GL_UNSIGNED_BYTE as the type for the temporary buffers.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds compressors for all four of the BPTC compressed-texture formats. The
compressor is written from scratch and takes a very simple approach. It always
uses a single mode of the BPTC format (4 for unorm and 3 for half-floats) and
picks the two endpoints by dividing the texels into those which have more or
less than the average luminance of the block and then calculating an average
color of the texels within each division.
It's probably not really sensible to try to use BPTC compression at runtime
because for example with the Nvidia offline compression tool it can take in
the order of an hour to compress a full-screen image. With that in mind I
don't think it's worth having a proper compressor in Mesa and this approach
gives reasonable results for a usage that is basically a corner case.
v2: Always use the custom compressor, even for the unorm formats. Fix the
quantization step for the half-float format compressor. Fixed a typo which
was breaking the right-hand edge of half-float textures with a width that
isn't a multiple of four.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Adds functions to fetch from any of the four BPTC-compressed formats.
v2: Set the alpha component to 1.0 when fetching from the half-float formats
instead of leaving it uninitialised. Don't linearize the alpha component
when fetching from sRGB.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds the following four Mesa image format enums which correspond to the
four BPTC compressed texture formats:
MESA_FORMAT_BPTC_RGBA_UNORM
MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM
MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT
MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT
It also updates the format information functions to handle these and the
corresponding GL enums.
v2: Also modify _mesa_get_format_color_encoding, _mesa_get_srgb_format_linear
and _mesa_get_uncompressed_format
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Adds the ‘bptc’ layout to get_channel_bits. The channel bits for BPTC depend
on the mode but as it only has to be an approximation this sets it to 8 for
the two UNORM formats and 16 for the two half-float formats. These represent
the minimum number of bits of variation that can be generated by the
interpolation of the two formats.
This doesn't quite match what we do for S3TC which only returns 4 even though
it can similarly generate 8 bits from the interpolation. However it does match
what we return for ETC2. For reference, NVidia seems to return 8 bits for the
UNORM formats and 32 bits for the half-float formats.
v2: Change the number of bits to 8/8/8/8 for the UNORM formats and 16/16/16
for the half-float formats.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If the name of a compressed texture format has ‘FLOAT’ in it it will now set
the data type of the format to GL_FLOAT. This will be needed for the BPTC
half-float formats.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The signed and unsigned half-float BPTC-compressed formats were being reported
as having a base format of GL_RGBA but they don't store an alpha channel so it
should be GL_RGB.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds a boolean in the gl_extensions struct for
GL_ARB_texture_compression_bptc as well as an entry in extension_table.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The initial firmware for hawaii does not support type3 nop packet.
Detect the new hawaii firmware with query RADEON_INFO_ACCEL_WORKING2.
If the returned value is 3, then the new firmware is used.
This patch uses type2 for the old firmware and type3 for the new firmware.
It fixes the cases when the old firmware is used and the user wants to
manually enable acceleration.
The two possible scenarios are:
- the kernel has no support for the new firmware.
- the kernel has support for the new firmware but only the old firmware
is available.
Additionaly this patch disables GPU acceleration on hawaii if the kernel
returns a value < 2. In this case the kernel hasn't the required fixes
for proper acceleration.
v2:
- Fix indentation
- Use private struct radeon_drm_winsys instead of public struct radeon_info
- Rename r600_accel_working2 to accel_working2
v3:
- Use type2 nop packet for returned value < 3
v4:
- Fail to initialize winsys for returned value < 2
Cc: mesa-stable@lists.freedesktop.org
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds a limit to the maximum surface size which is
based on the maximum size of a single mob. If this value is not
available, the maximum surface size is by default set to 128 MB.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Replace the plain sampler index with a register reference to a sampler.
We also need to keep track of the sampler array size when there is a
relative reference so that we can mark the whole array used.
To facilitate implementation, we add a separate ADDR register that
exclusively handles the sampler relative address. Other approaches would
be more invasive.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
If the array index is not a constant expression, the existing support
will assume a zero offset (giving us the sampler index of the base of
the array).
For dynamically uniform indexing of sampler arrays, we need both that
and the indexing expression.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
V2: Expand comment to explain what dynamically uniform expressions are
about.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This prevents some simulator assertion failures, but it does mean (since
I've dropped the "* 16" padding) that on real hardware you need a kernel
that does overflow memory management (currently, "drm/vc4: Add support for
binner overflow memory allocation." in my kernel tree).
In addition to reducing sim-specific code, it also avoids our local handle
allocation conflicting with the host GEM's handle numbering, which was
causing vc4_gem_hindex() to not distinguish between winsys BOs and the
same-numbered non-winsys bo.
We don't care if things like vertex data get smashed by render target
data, but we do need to make sure that shader code doesn't get rendered
to.
v2: Fix overflowing read of gl_relocs[] that incorrect flagged of some
VBOs as shader code.
The comment conflicted with the support in the code, so I moved the TMU
write validation to where the comment was, and dropped some dead arguments
from the functions while changing their signatures.
This doesn't load/store the Z contents across submits yet. It also
disables early Z, since it's going to require tracking of Z functions
across multiple state updates to track the early Z direction and whether
it can be used.
v2: Move the key setup to before the search for the key.
Otherwise, once we're not flushing at the end of every draw, we'll free
things like gallium resources, and free the backing GEM object, before
we've flushed the rendering using it to the kernel.
render_cl_size/bin_cl_size includes relocations, while the hardware buffer
doesn't. If you don't emit a HALT packet, the command parser continues
until the end register's value. We can't allow executing unvalidated
buffer contents (and it's actually harmful in the render lists Mesa is
emitting, since VC4_PACKET_STORE_MS_TILE_BUFFER_AND_EOF doesn't trigger a
halt).
This required building a shader parser that would walk the program to find
where the texturing-related uniforms are in the uniforms stream.
Note that as of this commit, a new kernel is required for rendering on
actual VC4 hardware (currently that commit is named "drm/vc4: Introduce
shader validation and better command stream validation.", but is likely to
be squashed as part of an eventual merge of the kernel driver).
This ensures that when I'm using the simulator, I get a closer match to
what behavior on real hardware will be. It lets me rapidly iterate on the
kernel validation code (which otherwise has a several-minute turnaround
time), and helps catch buffer overflow bugs in the userspace driver
faster.
Only rgba8888 works, and only a single texture unit, and it's only under
simulation because I haven't built the kernel interface yet.
v2: Rebase on helpers.
v3: Fold in the don't-break-the-arm-build fix.
This limit is fixed in Mesa core and cannot be changed.
It only affects ARB_vertex_program and ARB_fragment_program.
The minimum value for ARB_vertex_program is 1 according to the spec.
The maximum value for ARB_vertex_program is limited to 1 by Mesa core.
The value should be zero for ARB_fragment_program, because it doesn't
support ARL.
Finally, drivers shouldn't mess with these values arbitrarily.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This computes all GL versions before any context is created.
It's a requirement for GLX_MESA_query_renderer.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Setting Const.MaxSamples needed a rework, so that it doesn't call
st_choose_format, which depends on st_context.
Other than that, there is no change in functionality.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I don't know of any hardware which supports it.
With this, GL_OES_compressed_ETC1_RGB8_texture is supported if RGBA8
is supported.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
The formats are emulated by translating them into plain uncompressed
formats, because I don't know of any hardware which supports them.
This is required for GLES 3.0 and ARB_ES3_compatibility (GL 4.3).
This casues problems when converting atomics to use the GRF. Sometimes the atomic operation would get eaten by CSE when it shouldn't.
v2: Roll the has_side_effects check into is_expression
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This, together with the meta path, provides a complete implemetation of
ARB_copy_image.
v2: Add a fallback memcpy path for when the texture is too big for the
blitter
v3: Properly support copying between two places on the same texture in the
memcpy fallback
v4: Properly handle blit between the same two images in the fallback path
v5: Properly handle blit between the same two compressed images in the
fallback path
v6: Fix a typo in a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
This provides an implementation of CopyImageSubData that works if both
textures are uncompressed. This implementation works by using a
combination of texture views and BlitFramebuffer. If one of the textures
is compressed, it returns false and the driver is expected to provide a
fallback.
v2: Don't leak fbo's
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
v3: Change glGen/DeleteTextures to _mesa_Gen/DeleteTextures
This adds the API entrypoint, error checking logic, and a driver hook for
the ARB_copy_image extension.
v2: Fix a typo in ARB_copy_image.xml and add it to the makefile
v3: Put ARB_copy_image.xml in the right place alphebetically in the
makefile and properly prefix the commit message
v4: Fixed some line wrapping and added a check for null
v5: Check for incomplete renderbuffers
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
v6: Update dispatch_sanity for the addition of CopyImageSubData
There's no need to copy the array of DrawBuffer enums to a temp array.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Fixes failed assertion when _mesa_update_draw_buffers() was called
with GL_DRAW_BUFFER == GL_FRONT_AND_BACK. The piglit gl30basic hit
this.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
No need to return a value. Remove unused ctx parameter. Remove
_mesa_ prefix since it's static.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Silences MinGW warnings:
warning: unknown conversion type character ‘l’ in format [-Wformat]
warning: too many arguments for format [-Wformat-extra-args]
v2: use signed types/formats
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Previously the blorp blitter wouldn't be used if the source and destination
buffer had a different format other than swizzling between RGB and BGR and
adding or removing a dummy alpha channel. However there's no reason why the
blorp code path can't be used to do almost all format conversions so this
patch just removes the checks. However it does explicitly disable converting
to/from MESA_FORMAT_Z24_UNORM_X8_UINT because there is a similar check
brw_blorp_copytexsubimage.
This doesn't cause any Piglit test regressions at least on Ivybridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Broadwell measures jump distances in bytes, so we need to scale by 16.
v2: Update the function in brw_eu.h, not in brw_eu_emit.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Different generations of hardware measure jump distances in different
units. Previously, every function that needed to set a jump target open
coded this scaling, or made a hardcoded assumption (i.e. just used 2).
Most functions start with the number of instructions to jump, and scale
up to the hardware-specific value. So, I made the function match that.
Others start with a byte offset, and divide by a constant (8) to obtain
the jump distance. This is actually 16 / 2 (the jump scale for Gen5-7).
v2: Make the helper a static inline defined in brw_eu.h, instead of
an actual function in brw_eu_emit.c (as suggested by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It has Gen6+ knowledge baked in, and indeed is only called for Gen6+,
but it wasn't immediately obvious that this was the case.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Until now, it's been off implicitly: we never call the compactor
function. When we merge the generators, we'll start calling it, so we
should make it do nothing.
Matt will enable instruction compaction properly later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Broadwell is going to use the brw_eu_emit.c code soon. We want to get
the fake MRF handling and URB HWord channel mask handling.
We don't need the CMP thread switch workaround, though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we no longer use ctx->DrawBuffer->_Xmin and related fields to
program the screen-space viewport extents, we don't depend on any
scissoring state. So we can drop the +_NEW_SCISSOR dependency.
On GEN8, a change in scissor state does not effect anything for the
clipper/sf hardware state. The hardware will always do the right thing
once the viewport extents are programmed. We can therefore remove the
unecessary state emission.
Ken originally spotted this.
v2: Reword the commit message. Remove spurious hunk.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The goal of guardband clipping is to try to avoid 3d clipping because it
is an expensive operation. When guardband clipping is disabled, all
geometry that intersects the viewport is sent to the FF 3d clipper.
Objects which are entirely enclosed within the viewport are said to be
"trivially accepted" while those entirely outside of the viewport are,
"trivially rejected".
When guardband clipping is turned on the above behavior is changed such
that if the geometry is within the guardband, and intersects the
viewport, it skips the 3d clipper. Prior to GEN8, this was problematic
if the viewport was smaller than the screen as it could allow for
rendering to occur outside of the viewport. That could be mitigated if
the programmer specified a scissor region which was less than or equal
to the viewport - but this is not required for correctness in OpenGL. In
theory you could be clever with the guardband so as not to invoke this
problem. We do not do this, and have no data that suggests we should
bother (nor the converse data).
With viewport extents in place on GEN8, it should be safe to turn on
guardband clipping for all cases
While here, add a comment to the code which confused me thoroughly.
v2: Update grammar in commit message. Reword comments based on Ken's
suggestion.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Viewport extents are a 3rd rectangle that defines which pixels get
discarded as part of the rasterization process. The actual pixels drawn
to the screen are an intersection of the drawing rectangle, the viewport
extents, and the scissor rectangle. It permits the use of guardband
clipping in all cases (see later patch). The actual pixels drawn to the
screen are an intersection of the drawing rectangle, the viewport
extents, and the scissor rectangle.
Scissor rectangle is not super important for this discussion as it should
always help do the right thing provided the programmer uses it.
switch (viewport dimensions, drawrect dimension) {
case viewport > drawing rectangle: no effects; break;
case viewport == drawing rectangle: no effects; break;
case viewport < drawing rectangle:
Pixels (after the viewport transformation but before expensive
rastersizing and shading operations) which are outside of the
viewport are discarded.
}
I am unable to find a test case where this improves performance, but in
all my testing it doesn't hurt performance, and intuitively, it should
not ever hurt performance. It also permits us to use the guardband more
freely (see upcoming patch).
v2: Updating commit message.
v3: Commit message updates requested by Ken
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While working in this part of the code I had a great deal of trouble
understanding what it was trying to do, and matching it with the spec.
(mostly due bad wording in the PRM). To help future people, I've cleaned
up the wording and provided some ascii art.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This lets us call dump_instructions() after register allocation without
failing an assertion.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Without this patch I get the following during DMA transfers:
[drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !
radeon 0000:01:00.0: CP DMA dst buffer too small (21475829792 4096)
This is a fixup for e878e154cd.
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tahiti has 12 tile pipes, but P8 pipe config.
It looks like there is no way to get the pipe config except for reading
GB_TILE_MODE. The TILING_CONFIG ioctl doesn't return more than 8 pipes,
so we can't use that for Hawaii.
This fixes a regression caused by 9b046474c9
on Tahiti.
v2: add an assertion and print an error on failure
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The code is rewritten to take known constraints into account, while always
using 0 by default.
This should improve performance for multi-SE parts in theory.
A debug option is also added for easier debugging. (If there are hangs,
use the option. If the hangs go away, you have found the problem.)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
v2: fix a typo, set max_se for evergreen GPUs according to the kernel driver
This is a bug which was probably uncovered recently by Jason's commits
and broke this.
The problem is _mesa_base_tex_format(GL_STENCIL_INDEX) returns -1.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
These values are supposed to be the minimum/maximum index values used to
read from the vertex buffers. This code either copies index values out of
the old IB (so, same min/max as the original draw call), or generates a
new IB (using index values between the start and the start + count of the
old array draw info, which just happens to be what min/max_index are set
to by st_draw.c).
We were incorrectly setting the max_index in the
converting-from-glDrawArrays case to the start vertex plus the number of
vertices generated in the new IB, which broke QUADS primitive conversion
on VC4 (where max_index really has to be correct, or the kernel might
reject your draw call due to buffer overflow).
Reviewed-by: Rob Clark <robclark@freedesktop.org> (from verbal description
of the patch)
Some tests start working (useprogram-flushverts, for example) due to
getitng the right vertices now. Some that used to pass start failing with
memory overflow during binning, which is weird (glsl-fs-texture2drect).
And a couple stop rendering correctly (glsl-fs-bug25902).
v2: Move the attribute format setup in the key from after search time to
before the search.
v3: Fix reading of attributes other than position (I forgot to respect
attr and stored everything in inputs 0-3, i.e. position).
We could get undefined sources in real programs from the wild, so we'll
need to turn off this debug eventually. But for now, using undefined
sources is typically me just mistyping something.
We put in a bunch of extra MOVs for program outputs, and this can clean
those up. We should do uniforms, too, though.
v2: Fix missing flagging of progress when we actually optimize. Caught by
Aaron Watry.
This cleans up a bunch of noise in the compiled coordinate shaders (since
we don't need the varying outputs), and also from writemasked instructions
with negated src operands.
This took a couple of tries, and this is the squash of those attempts.
v2: Fix register file conflicts on the args in the
destination-is-accumulator case.
v3: Rebase on helper change and qir_inst4 change.
We will want to occasionally disable this again when we do clear support.
v2: Squash with the previous commit (I accidentally committed at two
stages of writing the change)
This introduces an IR (QIR, for QPU IR) to do optimization on. It's a
scalar, SSA IR in general. It looks like optimization is pretty easy this
way, though I haven't figured out if it's going to be good for our weird
register allocation or not (or if I want to reduce to basically QPU
instructions first), and I've got some problems with it having some
multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably
want to break down.
Of course, this commit mostly doesn't work, since many other things are
still hardwired, like the VBO data.
v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR
instructions into temporary values, and make qir_inst4 take the 4 args
separately instead of an array (all later callers wanted individual
args).
Note: This is the cutoff point where I switched from developing primarily
on the Pi to developing o the simulator. As a result, from this point on
the code is untested on the Pi (the kernel code I have currently wasn't
rendering anything at this commit, though the simulator renders
successfully, suggesting kernel bugs).
This mostly just takes every draw call and turns it into a sequence of
commands that clear the FBO and draw a single shaded triangle to it,
regardless of the actual input vertices or shaders. I copied the initial
driver skeleton mostly from freedreno, and I've preserved Rob Clark's
copyright for those. I also based my initial hardcoded shaders and
command lists on Scott Mansell (phire)'s "hackdriver" project, though the
bit patterns of the shaders emitted end up being different.
v2: Rebase on gallium megadrivers changes.
v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
v4: Rely on simpenrose actually being installed when building for
simulation.
v5: Add more header duplicate-include guards.
v6: Apply Emil's review (protection against vc4 sim and ilo at the same
time, and dropping the dricommon drm bits) and fix a copyright header
(thanks, Roland)
The non-llvm path made sure that both clip and pre_clip_pos point to the data
output by position, not clipvertex, if user based clipping is disabled.
However, the llvm path did not, which apparently led to failures if
gl_ClipVertex was written but user plane clipping not enabled (bug 80183).
Why I have no idea really, but just make it match the non-llvm behavior...
Reviewed-by: Brian Paul <brianp@vmware.com>
Use both macros as in some cases using AC_CHECK_FUNCS alone may fail.
Thus HAVE_DLADDR will not be defined, and as a result most of the code
in megadriver_stub.c will not be compiled. Breaking the backwards
compatibility between older libGL/xserver(s) and DRI megadrivers.
Cc: Jon TURNEY <jon.turney@dronecode.org.uk>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
[Emil Velikov] Commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The tests in an empty stub, which we're currently building twice.
If anyone is interested in expanding it (adding actual tests) they
can always bring it back.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This support is preliminary due to the fact that MSAA is not
actually implemented.
However, this patch does fix the piglit test:
spec/!OpenGL 3.2/glsl-resource-not-bound 2DMS (bug #79740).
(v2 RS: don't emit 4th coord as explicit lod)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The distinction between system values and ordinary inputs is not very
obvious in gallium - further fueled by the fact that they use the same
semantic names.
Still, if there's any value which imho really is a system value, it's the
primitive id input into the gs (while earlier (tessleation) stages could read
it, it is _always_ generated by the system). For some odd reason though (which
I'd classify as a bug but seems too complicated to fix) the glsl compiler in
mesa treats this as an ordinary varying, and everything else after that
(including the state tracker and other drivers) just go along with that.
But input fetching in gs for llvm based draw was definitely limited to the
ordinary (2-dimensional) inputs so only worked with other state trackers,
the code was also additionally relying on tgsi_scan_shader filling
uses_primid correctly which did not happen neither (would set it only for
all stages if it was a system value, but only set it for the fragment shader
if it was an input value).
This fixes piglit glsl-1.50-geometry-primitive-id-restart and primitive-id-in
in llvmpipe.
Reviewed-by: Brian Paul <brianp@vmware.com>
These values are always uints, casting them to floats does no good.
Fixes piglit glsl-1.50-geometry-primitive-id-restart tests for softpipe.
Reviewed-by: Brian Paul <brianp@vmware.com>
OpenCL 1.2 CL_MAP_WRITE_INVALIDATE_REGION sounds a lot like
PIPE_TRANSFER_DISCARD_RANGE:
From OpenCL 1.2 spec:
The contents of the region being mapped are to be discarded.
From p_defines.h:
Discards the memory within the mapped region.
v2: Move the code for validating flags to the front-end as
suggested by Francisco Jerez
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The PRMs no longer have a single table for format capabilities. Multiple
tables take up less space, and are easier to maintain.
Encode typed write information while at it.
We have a CPU-side implementation of conditional rendering; it really
should be done on the GPU. It's not necessarily that hard, but nobody
has gotten to fixing it yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, we explicitly set the execution size to BRW_EXECUTE_8 and
disabled compression for loop instructions. I can't imagine how this
could be correct in SIMD16 mode.
Looking at the history, it appears that this code has used BRW_EXECUTE_8
since 2007, when we had a SIMD8 backend that supported control flow and
a separate SIMD16 backend that didn't. Presumably, when we added SIMD16
support for shaders with control flow, we simply neglected to update it.
Note that Gen4-5 don't support SIMD16 on shaders with control flow.
This might be a candidate for stable, but would need to be rewritten
completely due to the brw_inst API changes in master.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We shouldn't need to set them, then set them differently.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
format_srgb.c is generated by format_srgb.py python script, having
format_srgb.c in git ignore list will silence git complaints about
untracked file.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also delete the comment before that function. Everything in that
comment was either stale, wrong, or captured elsewhere.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplifies all the callers, and it enables the removal of one of
the function parameters.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With two tests both numbered 118, there was a confusing off-by-two difference
between the last test number and the total number of tests (as reported by
glcpp-test).
With this rename, there's only an off-by-one difference left, (which is easy
to understand given the zero-based test numbering).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Here is some additional stress testing of nested macros where the expansion
of macros involves commas, (and whether those commas are interpreted as
argument separators or not in subsequent function-like macro calls).
Credit to the GCC documentation that directed my attention toward this issue:
https://gcc.gnu.org/onlinedocs/gcc-3.2/cpp/Argument-Prescan.html
Fixing the bug required only removing code from glcpp. When first testing the
details of expansions involving commas, I had come to the mistaken conclusion
that an expanded comma should never be treated as an argument separator, (so
had introduced the rather ugly COMMA_FINAL token to represent this).
In fact, an expanded comma should be treated as a separator, (as tested here),
and this treatment can be avoided by judicious use of parentheses (as also
tested here).
With this simple removal of the COMMA_FINAL token, the behavior of glcpp
matches that of gcc's preprocessor for all of these hairy cases.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Beyond just listing this in the TESTS variable in Makefile.am, only minor
changes were needed to make this work. The primary issue is that the build
system runs the test script from a different directory than the script
itself. So we have to use the $srcdir variable to find the test input files.
Using $srcdir in this way also ensures that this test works when using an
out-of-tree build.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The (optional) test-specific command-line arguments to be passed to glcpp are
embedded within the source files of some tests, and glcpp-test uses grep to
extract them.
Of course, grep is line-based and looks for the native line-separator to
determine line boundaries. So, for files using non-native line separators,
grep was getting quite confused and passing bogus arguments to glcpp.
Fix this by canonical-izing the line separators in the source file prior to
using grep.
With this commit, the glcpp-test-cr-lf tests pass entirely:
\r: 143/143 tests pass
\r\n: 143/143 tests pass
\n\r: 143/143 tests pass
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Sometimes the newline separator is a single character, and sometimes it is two
characters. Before we can fold away and line-continuation backslashes, we
identify the flavor of line separator that is in use.
With this identified, we then correctly search for backslashes followed
immediately by the first character of the line separator.
Also, when re-inserting newlines to replace collapsed newlines, we carefully
insert newlines of the same flavor.
With this commit, almost all remaining test are fixed as tested by
glcpp-test-cr-lf:
\r: 142/143 tests pass
\r\n: 142/143 tests pass
\n\r: 143/143 tests pass
(The only remaining failures have nothing to do with the actual pre-processor
code, but are due to a bug in the way the test suite uses grep to try to
extract test-specific command-line options from the source files.)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some tests were failing because the message printed by #error was including a
'\r' character from the source file in its output.
This is easily avoided by fixing the regular expression for #error to never
include any of the possible newline characters, (neither '\r' nor '\n').
With this commit 2 tests are fixed for each of the '\r' and '\r\n' cases.
Current results after the commit are:
\r: 137/143 tests pass
\r\n 142/143 tests pass
\n\r: 139/143 tests pass
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GLSL specification says that either carriage-return, line-feed, or both
together can be used to terminate lines. Further, it says that when used
together, the pair of terminators shall be interpreted as a single line.
This final requirement has not been respected by glcpp up until now, (it has
been emitting two newlines for every CR+LF pair).
Here, we fix the lexer by using a regular expression for NEWLINE that eats
up both "\r\n" (or even "\n\r") if possible before also considering a single
'\n' or a single '\r' as a line terminator.
Before this commit, the test results are as follows:
\r: 135/143 tests pass
\r\n: 4/143 tests pass
\n\r: 4/143 tests pass
After this commit, the test results are as follows:
\r: 135/143 tests pass
\r\n: 140/143 tests pass
\n\r: 139/143 tests pass
So, obviously, a dramatic improvement.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GLSL specification has a very broad definition of what is a
newline. Namely, it can be the carriage-return character, '\r', the newline
character, '\n', or any combination of the two, (though in combination, the
two are treated as a single newline).
Here, we add a new test-runner, glcpp-test-cr-lf, that, for each possible
line-termination combination, runs through the existing test suite with all
source files modified to use those line-termination characters. Instead of
using the .expected files for this, this script assumes that the regular test
suite has been run already and expects the output to match the .out
files. This avoids getting 4 test failures for any one bug, and instead will
hopefully only report bugs actually related to the line-termination
characters.
The new testing is not yet integrated into "make check". For that, some
munging of the testdir option will be necessary, (to support "make check" with
out-of-tree builds). For now, the scripts can just be run directly by hand.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Prior to this commit, the following snippet would trigger an error in glcpp:
#define FOO defined BAR
#if FOO
#endif
The problem was that support for the "defined" operator was implemented within
the grammar, (where the parser was parsing the tokens of the condition
itself). But what is required is to interpret the "defined" operator that
results after macro expansion is performed.
I could not find any fix for this case by modifying the grammar alone. The
difficulty is that outside of the grammar we already have a recursive function
that performs macro expansion (_glcpp_parser_expand_token_list) and that
function itself must be augmented to be made aware of the semantics of the
"defined" operator.
The reason we can't simply handle "defined" outside of the recursive expansion
function is that not only must we scan for any "defined" operators in the
original condition (before any macro expansion occurs); but at each level of
the recursive expansion, we must again scan the list of tokens resulting from
expansion and handle "defined" before entering the next level of recursion to
further expand macros.
And of course, all of this is context dependent. The evaluation of "defined"
operators must only happen when we are handling preprocessor conditionals,
(#if and #elif) and not when performing any other expansion, (such as in the
main body).
To implement this, we add a new "mode" parameter to all of the expansion
functions to specify whether resulting DEFINED tokens should be evaluated or
ignored.
One side benefit of this change is that an ugly wart in the grammar is
removed. We previously had "conditional_token" and "conditional_tokens"
productions that were basically copies of "pp_token" and "pp_tokens" but with
added productions for the various forms of DEFINED operators. With the new
code here, those ugly copy-and-paste productions are eliminated from the
grammar.
A new "make check" test is added to stress-test the code here.
This commit fixes the following Khronos GLES3 CTS tests:
conditional_inclusion.basic_2_vertex
conditional_inclusion.basic_2_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we were passing these through, just like any other pragma. But the
downstream compiler was tripping up on them. It seems easier to swallow these
in the preprocessor and not pass them on at all rather than fixing the
downstream compiler.
This fixes the following Khronos GLES3 CTS tests:
preprocessor.pragmas.pragma_vertex
preprocessor.pragmas.pragma_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the #pragma directive was swallowing an entire line, (including
the final newline). At that time it was appropriate for it to increment the
line count.
More recently, our handling of #pragma changed to not include the newline. But
the code to increment yylineno stuck around. This was causing __LINE__ to be
increased by one more than desired for every #pragma.
Remove the bogus, extra increment, and add a test for this case.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This new "make check" test stresses out the support from the last two commits,
(to esnure that '#' is correctly interpreted as the null directives,
regardless of any whitespace or comments on the same line).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is the fix for the following line:
# // comment to ignore here
According to the translation-phase rules, the comment should be removed before
the preprocessor looks to interpret the null directive.
So in our implementation we must explicitly look for single-line comments in
the <HASH> start condition as well.
This commit fixes the following Khronos GLES3 CTS tests:
null_directive_vertex
null_directive_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This simply tests the previous commit, (that #define followed by a comment
will still generate the expected "#define without macro name" error message).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were already correctly supporting single-line comments in case like:
#define FOO bar // comment here...
The new support added here is simply for the none-too-useful:
#define // comment instead of macro name
With this commit, this line will now give the expected "#define without
macro name" error message instead of the lexer just going off into the
weeds.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This ensures that the previous commit indeed generates the expected error
message when a "#define" directive is not followed by anything except for a
newline.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, glcpp would emit an error like this if <EOF> happened to occur
immediately after the "#define", but in general would just get confused,
(leading to un-helpful error messages).
To fix things to generate a clean error message, we do a few things:
1. Don't require horizontal whitespace immediately after #define
2. Add a production for the error case, (DEFINE_TOKEN followed
immediately by a NEWLINE token).
3. Make the lexer reset to the <INITIAL> state after every NEWLINE.
This 3rd point prevents the lexer from getting so confused and generating
further spurious errors in the file because it was stuck in the <DEFINE> start
condition.
We also drop the similar error message from the <EOF> rule since the
newly-added rule will have already printed the error message.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Listing the GLSL version as an individual component of a GL version,
separate from the extensions isn't really right. The GLSL changes are
(almost?) entirely comprised of changes listed in the extensions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I think OpenVMS was the only platform that Mesa ran on that used a
non-IEEE representation for floats. We removed OpenVMS support a while
back, and this should alleviate the need to continue updating the
this-platform-uses-IEEE list.
The one bit of this patch that needs review is the IS_INF_OR_NAN,
because I'm not sure if MSVC supports isfinite.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82268
Reviewed-by: Brian Paul <brianp@vmware.com>
Future patches will rearrange the values in gl_system_value, and I want
to catch errors. Designated initializers would make all of this
unnecessary.
v2: Don't use STATIC_ASSERT. Not only does it not work, but GCC doesn't
tell you that it's not going to work. Thanks for nothing!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Due to the destination register width of 1 or 2, these instructions get
ExecSize 1 or 2. But dir and offset (used as src0) are both registers
of width 4, violating the execsize >= width assertion.
I honestly don't think this could have ever worked.
Fixes Piglit's polygon-offset and polygon-mode-offset tests on Gen4-5.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70441
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
If the vertex shader has no position but the gs has, the clipvertex output
was -1 (because it's the same as vs position in this case if there's no
explicit clipvertex output). This caused crashes (or assertion failures) in
clipping since in the end position (which came from gs) was different from
cv (-1) and we then tried to use the bogus cv input.
Rather than just test for -1 cv value in clipping, make it explicitly return
the position output of the gs instead which seems cleaner (since we really
don't want to use the clipvertex value from the vs (it could be a valid value
in the (unsupported) case of vs writing clipvertex but still using a gs).
This fixes piglit shader_runner clip-distance-out-values.shader_test.
Reviewed-by: Zack Rusin <zackr@vmware.com>
The clip stage may crash if there's no position output, for this reason
code was added to avoid running the pipeline stages in this case
(c7c7186045). However, this failed to actually
work when there was a geometry shader, since unlike the vertex shader it did
not initialize the position output to -1, hence the code trying to detect
this didn't trigger. So simply initialize the position output to -1 just like
the vs does.
This fixes piglit glsl-1.50-transform-feedback-type-and-size (segfault->pass).
clip-distance-out-values.shader_test goes from segfault to assertion failure,
suggesting more fixes are needed, no other piglit changes.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
This patch fixes this build error on Mac OS X.
./xmlconfig.h:61:5: error: unknown type name 'uint'; did you mean 'int'?
uint nRanges; /**< \brief Number of ranges */
^~~~
int
./xmlconfig.h:79:5: error: unknown type name 'uint'; did you mean 'int'?
uint tableSize;
^~~~
int
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Again, we delete a lot of functions that aren't really doing anything
interesting anymore.
v2: Comment the texstore_rgba_integer function
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This commit also removes a bunch of functions which aren't doing anything
more interesting than the general path does.
v2: Better comment the texstore_via_float function
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This should be both faster and more accurate than our general slow-path of
converting everything to float.
v2: Add a comment to top of the texstore_swizzle function
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This commit splits the texture storage into three functions:
texstore_depth_stencil, texstore_compressed, and texstore_rgba. Right now
this split seems artificial since we just have one function pointer per
format and there is no difference between these three categories. However,
this split makes it much easier to write a more general function upload
path for one of these categories than the current function pointers.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This commits adds the _mesa_format_to_array function that determines if the
given format can be represented as an array format and computes the array
format parameters. This is a direct helper function for using
_mesa_swizzle_and_convert
v2: Better documentation and commit message
v3: Fixed a potential segfault from an invalid endianness swizzle
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Most format conversion operations required by GL can be performed by
converting one channel at a time, shuffling the channels around, and
optionally filling missing channels with zeros and ones. This adds a
function to do just that in a general, yet efficient, way.
v2:
* Add better comments including full docs for functions
* Don't use __typeof__
* Use inline helpers instead of writing out conversions by hand,
* Force full loop unrolling for better performance
v3: Add another set of parens around the MAX_INT macro
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mesa hasn't supported color-indexed textures for some time. This is 0 for
all texture formats, so we don't need to store it.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead of a having all of the format metadata in a gigantic hard-to-edit
array of type struct format_info, we now have a human-readable CSV file.
The CSV file also contains more format information than the format_info
struct contained so we can potentially make format_info more detailed later.
The python to generate the format information was added the previous
commit. This commit turns it on in both automake and scons builds.
v2: Split into two commits and stuff to generate format_info.c from scons
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds a python script called format_info.py that is used to generate a
single format_info.c file that contains the filled-out format_info array.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The basic concept for the format parser was taken from the format CSV
parser in gallium/auxilliary/util. However, this one has been altered in a
number of ways:
* Removed big endian vs. little endian stuff (mesa doesn't need it)
* Better documentation: Almost every method has a full docstring
* An actual Swizzle class with methods for composition and inverses
* Over-all cleaner (in my opinion) implementation and class interactions
* A few bug fixes
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Specify the quad's Z position in clip coordinate space, not
normalized Z space. Use viewport scale, translation = 0.5, 0.5.
Before, we were specifying the quad's Z position in [0,1] and using
viewport scale=1.0, translate=0.0. That works fine, unless your
driver needs to work in clip coordinate space and needs to
reconstruct viewport near/far values from the scale/translation
factors. The VMware svga driver falls into that category.
When we did that reconstruction we wound up with near=-1 and far=1
which are outside the limits of [0,1]. In some cases, this caused
the quad to be drawn at the wrong depth. In other cases it was
clipped away.
Fixes some scissored depth clears with VMware driver. This should
have no effect on other drivers. We're already using these values
for the glBitmap and glDraw/CopyPixels code.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Compute the bitmask of supported array types once instead of every
time we call a GL vertex array function.
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
According to the GL spec the only fragment operations that should affect
glBlitFramebuffer are “the pixel ownership test, the scissor test, and sRGB
conversion”. That implies that dithering should not be performed so we need to
disable it when implementing the blit with a render.
Before commit 05b52efbc9 the dithering state would be left as whatever the
application picks (the default being GL_TRUE) and after that commit it was
explicitly enabled. Neither of these were correct.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81828
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Add documentation for TEX2/TXL2/TXB2 tgsi opcodes. Also, the texture opcode
documentation wasn't very accurate so fix this up a bit.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In particular need to handle TEX2/TXB2/TXL2 opcodes.
cube map shadow with bias already used TXB2 which didn't work before
at all, despite that there's by default no piglit change (but using
no_quad_lod and no_rho_opt indeed passes some more tex-miplevel-selection
tests).
The actual sampling code still won't handle cube map arrays.
Reviewed-by: Brian Paul <brianp@vmware.com>
This just covers the resource side of things, not the actual sampling.
Here things are trivial as cube map arrays are identical to 2d arrays in
all respects.
Reviewed-by: Brian Paul <brianp@vmware.com>
We would generate EGL_BAD_CONFIG because _eglGetContextAPIBit
returns zero for the combination of EGL_OPENGL_ES_API and a major
version > 3. By just returning zero, the caller can't tell the
difference between a bad version (which should generate
EGL_BAD_MATCH) and a bad API (which should generate
EGL_BAD_CONFIG). This patch causes us to filter out major
versions > 3 at a point where we can generate the correct error.
Fixes gles3 Khronos CTS test:
egl_create_context.egl_create_context
V2: Fix commit message as suggested by Ian.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Changes in the patch will cause datatype to be computed
correctly for 8 and 16 bit integer formats. For example:
GL_RG8I, GL_RG16I etc.
Fixes many failures in gles3 Khronos CTS test:
copy_tex_image_conversions_required
copy_tex_image_conversions_forbidden
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We currently get red bits from ctx->DrawBuffer->Visual.redBits
by making a false assumption that the texture we're writing to
(in glCopyTexImage2D()) is used as a DrawBuffer.
Fixes many failures in gles3 Khronos CTS test:
copy_tex_image_conversions_required
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
GL_TEXTURE_CUBE_MAP is an allowed texture target in glTexStorage2D()
and is allowed to be used (like GL_TEXTURE_2D) with compressed internal
formats.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously we had to keep unreachable global symbols in the symbol table
because the symbol table is used during linking. Having the symbol
table retain pointers to freed memory... what could possibly go wrong?
At the same time, this meant that we kept live references to tons of
memory that was no longer needed.
New strategy: destroy the old symbol table, and make a new one from the
reachable symbols.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 59 40,642,425,451 76,337,968 69,720,886 6,617,082 0
After (32-bit): 46 40,661,487,174 75,116,800 68,854,065 6,262,735 0
Before (64-bit): 79 37,179,441,771 106,986,512 98,112,095 8,874,417 0
After (64-bit): 64 37,200,329,700 104,872,672 96,514,546 8,358,126 0
A real savings of 846KiB on 32-bit and 1.5MiB on 64-bit.
v2: (by Kenneth Graunke) Just add the ir_function from the IR stream,
rather than looking it up in the symbol table; they're now
identical.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Piglit's spec/glsl-1.10/linker/override-builtin-{const,uniform}-05 tests
do the following:
1. Call abs(float) - a built-in function.
2. Create a user-defined replacement for abs(float).
3. Call abs(float) again - now the user function.
At step 1, we created an ir_function which included the built-in
signature, added it to the symbol table, and emitted it into the IR
stream.
Then, when processing the function definition at step 2, we'd see that
there was already an ir_function. But, since there were no user-defined
functions, we skipped over a bunch of code, and ended up creating a
second one. This new ir_function shadowed the original in the symbol
table, but both ended up in the IR stream.
This results in an awkward situation where searching for an ir_function
via the symbol table, a forward linked list walk, and a reverse linked
list walk may return different ir_functions. This seems undesirable.
This patch instead re-uses the existing ir_function, putting both
built-in and user-defined signatures in the same one. The previous
patch's additional filtering ensures everything continues working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Historically, we've implemented the rules for overriding built-in
functions by creating multiple ir_functions and relying on the symbol
table to hide the one containing built-in functions. That works, but
has a few drawbacks, so the next patch will change it.
Instead, we'll have a single ir_function for a particular name, which
will contain both built-in and user-defined signatures. Passing an
extra parameter to matching_signature makes it easy to ignore built-ins
when they're supposed to be hidden.
I didn't add the parameter to exact_matching_signature since it wasn't
necessary.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On Haswell, this cuts 1-3 instructions from 183 vertex shaders in
"Shadowrun Returns", "Shatter", and "Trine 2." It adds 2 instructions
to a single fragment shader in "Closure."
total instructions in shared programs: 278803 -> 278546 (-0.09%)
instructions in affected programs: 41930 -> 41673 (-0.61%)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This code was attemping to align the base of the structure to the required
alignment of the structure. However, it had two problems:
1. It was aligning the target structure member, not the base of the
structure.
2. It was calculating the alignment based on the members previous to the
target member instead of all the members of the structure.
Fixes gles3conform failures in:
ES3-CTS.shaders.uniform_block.random.nested_structs.6
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays_instance_arrays.2
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays_instance_arrays.6
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.5
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.19
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.0
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.2
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.6
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.12
v2: Fix rebase failure noticed by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Use the data that is stored in the ir_variable and the glsl_type to
determine whether or not a UBO member is row-major.
Fixes gles3conform failures in:
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat2x3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat2x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat3x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat3x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat4x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat4x3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat2x3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat2x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat3x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat3x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat4x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat4x3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat2x3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat2x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat3x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat3x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat4x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat4x3
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays.2
ES3-CTS.shaders.uniform_block.random.nested_structs_instance_arrays.5
ES3-CTS.shaders.uniform_block.random.nested_structs_instance_arrays.9
Causes gles3conform failures in:
ES3-CTS.shaders.uniform_block.random.basic_types.8
ES3-CTS.shaders.uniform_block.random.basic_arrays.3
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.0
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.2
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.13
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.18
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.4
These failures will be fixed shortly.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes gles3conform failures in:
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays_instance_arrays.3
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.13
Causes gles3conform failures in:
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.9
This failure will be fixed shortly.
v2: Use without_array() instead of older predicates.
v3: s/GLSL_MATRIX_LAYOUT_DEFAULT/GLSL_MATRIX_LAYOUT_INHERITED/g
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
v2: Rename GLSL_MATRIX_LAYOUT_DEFAULT to GLSL_MATRIX_LAYOUT_INHERITED.
Add comments in glsl_types.h explaining the layouts. Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This causes the thing following the structure to be vec4-aligned.
Fixes gles3conform failures in:
ES3-CTS.shaders.uniform_block.random.nested_structs.2
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.5
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I also considered renaming visit_field(const glsl_struct_field *) to
entry_record and adding an exit_record method. This would be more
similar to the hierarchical visitor.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit 32f32292 (glsl: Allow elimination of uniform block members)
enabled elimination of unused uniform block members to fix a gles3
conformance test failure. This went too far the other way.
Section 2.11.6 (Uniform Variables) of the OpenGL ES 3.0.3 spec says:
"All members of a named uniform block declared with a shared or
std140 layout qualifier are considered active, even if they are not
referenced in any shader in the program. The uniform block itself is
also considered active, even if no member of the block is
referenced."
Fixes gles3conform failures in:
ES3-CTS.shaders.uniform_block.single_nested_struct.per_block_buffer_shared
ES3-CTS.shaders.uniform_block.single_nested_struct.per_block_buffer_std140
ES3-CTS.shaders.uniform_block.single_nested_struct_array.per_block_buffer_shared
ES3-CTS.shaders.uniform_block.single_nested_struct_array.per_block_buffer_std140
ES3-CTS.shaders.uniform_block.random.scalar_types.2
ES3-CTS.shaders.uniform_block.random.scalar_types.9
ES3-CTS.shaders.uniform_block.random.vector_types.1
ES3-CTS.shaders.uniform_block.random.vector_types.3
ES3-CTS.shaders.uniform_block.random.vector_types.7
ES3-CTS.shaders.uniform_block.random.vector_types.9
ES3-CTS.shaders.uniform_block.random.basic_types.5
ES3-CTS.shaders.uniform_block.random.basic_types.6
ES3-CTS.shaders.uniform_block.random.basic_arrays.0
ES3-CTS.shaders.uniform_block.random.basic_arrays.2
ES3-CTS.shaders.uniform_block.random.basic_arrays.5
ES3-CTS.shaders.uniform_block.random.basic_arrays.8
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.0
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.4
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.5
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.6
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.9
ES3-CTS.shaders.uniform_block.random.nested_structs.0
ES3-CTS.shaders.uniform_block.random.nested_structs.1
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays.4
ES3-CTS.shaders.uniform_block.random.nested_structs_instance_arrays.8
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays_instance_arrays.7
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.3
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.6
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.18
v2: Whitespace and other minor fixes suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Just a few lines earlier we may have wrapped the index expression with
ir_unop_i2u expression. Whenever that happens, as_constant will return
NULL, and that almost always happens.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Since the ralloc test in util/tests needs gtest, we need to make sure that
the gtest subdir is loaded first. This fixes bug #82148.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
With this patch, the SVGA_3D_CMD_BIND_GB_SHADER functionality will reserve
two relocations, one for the shader ID and the second for the MOB ID.
Verified with the WDDM winsys path that the number of relocations and patch
locations required is two.
Fixes Bug 1277406
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This gathers macros that have been included across components into util so
that the include chain can be more vertical. In particular, this makes
util stand on its own without any dependence whatsoever on the rest of
mesa.
Signed-off-by: "Jason Ekstrand" <jason.ekstrand@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This hash table is used in core Mesa, the GLSL compiler, and the i965
driver, which makes it a good candidate for the new src/util module.
It's much faster than program/hash_table.[ch] (see commit 6991c2922f
for data), and José's u_hash_table.c has a comment saying Gallium should
probably consider switching to a linear probing hash table at some point.
So this seems like the best candidate for a shared data structure.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Jason Ekstrand): Pick up another hash_table use and patch up scons
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For a long time, we've wanted a place to put utility code which isn't
directly tied to Mesa or Gallium internals. This patch creates a new
src/util directory for exactly that purpose, and builds the contents as
libmesautil.la.
ralloc seemed like a good first candidate. These days, it's directly
used by mesa/main, i965, i915, and r300g, so keeping it in src/glsl
didn't make much sense.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Jason Ekstrand): More realloc uses and some scons fixes
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With earlier commit we've conditionally enabled/added the kms_dri target
for automake builds. Unfortunately the we forgot to add the appropriate
define in the scons build, resulting in a broken library due to the
undefined symbol 'kms_swrast_create_screen'.
Reported-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Roland Scheidegger <sroland@vmware.com>
If building hardware drivers only, then kms_swrast_create_screen
won't be defined in inline_drm_helper.h and hardware drivers will
fail to dlopen as a result.
Copy the #if guards from inline_drm_helper.h to dri_kms_init_screen
to make the definition/use of the function match.
Fixes radeonsi_dri.so dlopen with the following configure:
./configure --with-dri-drivers= --with-dri-driverdir=/usr/local/lib/dri/ \
--enable-gbm --enable-gallium-gbm --enable-debug --enable-opencl \
--enable-opencl-icd --with-gallium-drivers=radeonsi \
--with-egl-platforms=drm --enable-glx-tls --enable-texture-float \
--enable-omx
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Native integers imply a somewhat different handling of booleans. Instead
of being 1.0/0.0 floats, they are 0 (true) / -1 (false) integers. As such
the original optimization no longer applies.
Reported-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
We don't support this type of X acceleration and we never did.
Other drivers might want to do the same thing.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
In commit 16060c5adc, Eric changed the
code to not relayout just for baselevel changes - only if the range of
miplevels actually increases. So this comment is now wrong.
Notably, the i915 version of the code actually does what the comment
says.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We've moved to using bitshifts (like we did for surface state); nothing
uses the structures anymore.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
These are the last users of struct gen7_sampler_state.
v2: Use a local sampler_state_size variable, to help distinguish the
various 16s (suggested by Topi Pohjolainen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This is the last user of the structure.
v2: Use a local variable with a sensible name so people know what 16 is.
(Suggested by Topi Pohjolainen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This simplifies the code, removes use of the old structures, and also
allows us to combine the Gen6 and Gen7+ code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Although the Gen4-6 and Gen7+ variants used different structure types,
they didn't use any of the fields - only the size, which is identical.
So both decoders did exactly the same thing.
Someday we should implement useful decoders for SAMPLER_STATE.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that gen7_sampler_state.c is gone, everything is once again in a
single file.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The code in brw_sampler_state.c now handles all generations; we don't
need the extra Gen7+ only code anymore.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This was the only actual difference between Gen4-6 and Gen7+ in terms of
the values we program. The rest was just mechanical structure
rearrangement.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Instead of stuffing bits directly into the brw_sampler_state structure,
we now store them in local variables, then use brw_emit_sampler_state()
to assemble the packet. This separates the decision about what values
to use from the actual packet emission, which makes the code more
reusable across generations.
v2: Put const on a bunch of local variables and move declarations,
as suggested by Topi Pohjolainen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This simply assembles all the SAMPLER_STATE fields into their proper bit
locations. Making it work on all generations was easy enough; some of
the fields are even in the same place.
Not used by anything yet, but will be soon. I made it non-static so
BLORP can use it too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
It doesn't edit the value, and this lets us use const in more places.
Needed to implement Topi's review comments for the next patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We'll use these to replace the existing structures.
I've adopted the convention that "BRW" applies to all hardware, and
"GENX" applies starting with generation X, but might be replaced by some
later generation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This makes it easy to tell that they're grouped together, and also
improves gdb printing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
brw_upload_sampler_state_table now handles all generations, so we don't
need the vtable mechanism either.
There's still a lot of code duplication; the next patches will address
that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This copies a few changes from gen7_upload_sampler_state_table; the next
patch will delete that function.
Gen7+ has per-stage sampler state pointer update packets, so we emit
them as soon as we emit a new table for a stage. On Gen6 and earlier,
we have a single packet, so we delay until we've changed everything
that's going to be changed.
v2: Split 3DSTATE_SAMPLER_STATE_POINTERS_XS packet emission into a
helper function (suggested by Topi Pohjolainen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The Gen4-6 and Gen7+ code is virtually identical, but both use different
structure types. Switching to use a uint32_t pointer and operate on the
number of DWords will make it possible to share code.
It turns out that SURFACE_STATE is the same number of DWords on every
platform currently; it will be easy to handle a change there, though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Other than this, brw_update_sampler_state only deals with a single
SAMPLER_STATE structure, and doesn't need to know which position it is
in the table. The caller takes care of dealing with multiple surface
states.
Pushing this up a level allows us to drop the ss_index parameter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
sdc_offset is produced and consumed in the same function, so there's no
need to store it in the context, nor pass pointers to it through various
call chains.
Saves 128 bytes per brw_stage_state structure, and makes the code
clearer as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
It's just an array of four floats, and we have an array of four floats,
so this is literally just a memcpy...but with custom structs and strange
macros to give the appearance of doing something more.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
When the driver was originally written, it only supported texturing in
the pixel shader backend; vertex and geometry shader texturing came much
later. Originally, the pixel shader was referred to as "WM" (the
Windowizer/Masker unit). So, this code happened to only be relevant for
the WM stage, at the time.
However, sampler state really applies to all stages, so putting "wm" in
the filename doesn't make sense. I dropped it in gen7_sampler_state.c;
at this point the asymmetry just trips people up.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The "Min/Mag State Not Equal" bit is supposed to be set when the min/mag
filters or address rounding modes differ. BLORP uses identical min/mag
settings, so the bit should be unset.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Define the macro GL_OES_standard_derivatives as 1 if the extension
GL_OES_standard_derivatives is supported.
V2 [Chris]: Correct trailing whitespace
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This could be recalculated, though it turns out the only use of it after
resource allocation is for calculating whole resource size (for scene size
accounting though that isn't quite ideal neither). Thus, instead just store
the whole resource size and drop it (saving a couple bytes of storage per
resource). It makes things simpler too. Note that for the accounting winsys
resources always come back with size 0 but this is unchanged (we don't actually
know the size in any case).
Also reformat llvmpipe_texture_layout (drop unneded indentation).
v2: adapt to previous changes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Seems pointless to just duplicate some of the calculations (the calculation
of actual memory used compared to what was predicted in llvmpipe_texture_layout
actually could have differed slightly in some cases due to different alignment
rules used though this should have been of no consequence).
v2: keep the previous mip alignment of MAX2(64, cacheline). This was added for
ARB_map_buffer_alignment - I'm not convinced it's needed for textures, but
it was supposed to be cleanup without functional change. Also replace div
with 64bit mul / comparison.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
1D array miptrees were being laid out as a 2D texture with 1 slice.
This happened due to the mesa core storing the 1D array slice count in
the height field. On Intel hardware, we want to create a 2D array with
a height of 1 for the 1D array case.
Fixes assertion failure in piglit (gen6, gen8):
spec/glsl-1.30/execution/tex-miplevel-selection textureOffset 1DArrayShadow
In release builds of Mesa, this test was observed to cause a GPU hang
on gen8.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81450
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Adds 0-3 textureGather component selection and non-constant offsets
Caveat: 0 and 1 texture swizzles only work if textureGather component
select is 3 or a component that does not exist in the sampler texture
format. This is a hardware limitation, any other value returns
128/255=0.501961 for both 0 and 1.
Passes all textureGather piglit tests on radeon 6670, except for those
using 0/1 texture swizzles due to aforementioned reason.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Map TGSI_SEMANTIC_SAMPLEMASK to register/component.
Enable face register when sample mask is needed by shader.
Requires Evergreen/Cayman
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We don't want to log every single error (such as all the ones where the file
wasn't even present in our list of search paths), but if you didn't find any
driver, then seeing at least one error is useful (since the common case as a
developer is a single DEFAULT_DRIVER_DIR or GBM_DRIVERS_PATH entry).
v2: Rebase on swrast changes.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
I found myself often wanting this when I'm printing out a uint32_t mapping
of some GPU data, and I want to put in an interpretation of that value as
a float.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
execinfo.h is not available on DragonFly.
Fixes this build error.
CC glapi_gentable.lo
glapi_gentable.c:44:22: fatal error: execinfo.h: No such file or directory
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
When using (d3d10) conformant out-of-bound behavior for texel fetching
(currently always enabled) the level still needs to be set to a safe value
even though the offset in the end won't get used because the level is used
to look up the mip offset itself and the actual strides, which might otherwise
crash.
For simplicity, we'll use level 0 in this case (this ought to be safe, llvmpipe
does not actually fill in level 0 information if first_level is larger, but
some random strides / offsets shouldn't hurt as ultimately we always use
offset 0 in this case).
Fixes a crash in some in-house test where random huge levels appear in
lp_build_fetch_texel() (the test actually uses level 0 always but if the
fetching happens in a block with a execution mask random values may appear).
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The kms-dri swrast driver cannot share buffers using the GEM,
so it must tell the loader to disable extensions relying on
that, without disabling the image DRI extension altogether
(which would prevent the loader from working at all).
This requires a new gallium capability (which is queried on
the pipe_screen and for swrast drivers it's forwarded to the
winsys), and requires a new version of the DRI image extension.
[Emil Velikov]
- Rebased on top of gallium-dri megadrivers.
- Drop PIPE_CAP_BUFFER_SHARE and sw_winsys::get_param hook.
The can_share_buffer cap is set at InitScreen. We use a different
InitScreen (and thus value for the cap) function for kms_dri, due to
deeper differences originating from dri megadrivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Add a new winsys and target that can be used with a dri2 state tracker
and loader instead of drisw. This allows to use gbm as a dri2/image
loader and avoid the extra copy from the backbuffer to the shadow
frontbuffer.
The new driver is called "kms_swrast", and is loaded by gbm as a
fallback, because it is only useful with the gbm platform (as no buffer
sharing is possible)
To force select the driver set the environment variable
GBM_ALWAYS_SOFTWARE
[Emil Velikov]
- Rebase on top of gallium megadriver.
- s/text/test/ in configure.ac (Spotted by Andreas Pokorny).
- Add scons support for winsys/sw/kms-dri and fix the build.
- Provide separate DriverAPI, due to different InitScreen hook.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Turn GBM into a swrast loader (providing putimage/getimage backed
by a dumb KMS buffer). This allows to run KMS+DRM GL applications
(such as weston or mutter-wayland) unmodified on cards that don't
have any client side HW acceleration component but that can do
modeset (examples include simpledrm and qxl)
[Emil Velikov]
- Fix make check.
- Split dri_open_driver() from dri_load_driver().
- Don't try to bind the swrast extensions when using dri.
- Handle swrast->CreateNewScreen() failure.
- strdup the driver_name, as it's free'd at destruction.
- s/LIBGL_ALWAYS_SOFTWARE/GBM_ALWAYS_SOFTWARE/
- Move gbm_dri_bo_map/unmap to gbm_driiint.h.
- Correct swrast fallback logic.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Whenever dd_create_screen/pipe_loader_* fails, gdrm->dev may be NULL.
Thus peeking inside the struct will lead to a crash.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
... on static targets. Otherwise we'll crash badly as gdrm->dev is
NULL when we try to copy the string driver_name.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
ERROR is a #define in the MSVC WinGDI.h header file.
Add the _TOKEN suffix as we do for a few other lexer tokens.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Principle of least surprise: --enable-debug should enable debugging.
Ages ago, Mesa's build system only added -g in dri-debug builds (yay for
the static Makefiles). If you forgot to change it (or wrap the build
with custom scripts), you would often be disappointed when trying to gdb
Mesa bugs. New developers, that may not yet have custom scripts, will
have this same issue.
I think we should enable experienced developers to do what they want,
and make things easier for new developers. I already pass '-ggdb3 -O1'
or '-ggdb3 -Og' for CFLAGS, and I don't want configure to change them
for me.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've had bugs in the past where we have been inadvertently matching the
default rule.
Just as we did in the pre-processor in the previous commit, we can use:
%option warn nodefault
in the compiler to instruct flex to not generate the default rule, and
further to warn if our set of rules could let any characters go unmatched.
With this warning active, flex actually warns that the catch-all rule we
recently added to the compiler could never be matched. Since that is all
safely determined at compile time now, we can safely drop this run-time
compiler error message, (as we do in this commit).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We've had multiple bugs in the past where we have been inadvertently matching
the default rule, (which we never want to do). We recently added a catch-all
rule to avoid this, (and made this rule robust for future start conditions).
Kristian pointed out that flex allows us to go one step better. This syntax:
%option warn nodefault
instructs flex to not generate the default rule at all. Further, flex will
generate a warning at compile time if the set of rules we provide are
inadequate, (such that it would be possible for the default rule to be
matched).
With this warning in place, I found that the catch-all rule was in fact
missing something. The catch-all rule uses a pattern of "." which doesn't
match newlines. So here we extend the newline-matching rule to all start
conditions. That is enough to convince flex that it really doesn't need
any default rule.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Using a single rule here means that we can use the <*> syntax to match
all start conditions. This makes the catch-all rule more robust against
the addition of future start conditions, (no need to maintain an ever-
growing list of start conditions for this rul).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
There is no behavioral change here. It's just easier to verify that lists
of start conditions include all expected conditions when they appear in a
consistent order.
The <INITIAL> state is special, so it appears first in all lists. All others
appear in alphabetical order.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In some of the recent glcpp bug-fixing, we found that glcpp was emitting
unrecognized characters from the input source file to stdout, and dropping
them from the source passed onto the compiler proper.
This was obviously confusing, and totally undesired.
The bogus behavior comes from an implicit default rule in flex, which is
that any unmatched character is implicitly matched and printed to stdout.
To avoid this implicit matching and printing, here we add an explicit
catch-all rule. If this rule ever matches it prints an internal compiler
error. The correct response for any such error is fixing glcpp to handle
the unexpected character in the correct way.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, the '\r' character was not explicitly matched by any lexer
rule. This means that glcpp would have been using the default flex rule to
match '\r' characters, (where they would have been printed to stdout rather
than actually correctly handled).
With this commit, we treat '\r' as equivalent to '\n'. This is clearly an
improvement the bogus printing to stdout. The resulting behavior is compliant
with the GLSL specification for any source file that uses exclusively '\r' or
'\n' to separate lines.
For shaders that use a multiple-character line separator, (such as "\r\n"),
glcpp won't be precisely compliant with the specification, (treating these as
two newline characters rather than one), but this should not introduce any
semantic changes to the shader programs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This test is written to exercise a bug which I recently wrote, (but
fortunately caught and fixed before ever committing it).
For the curious:
The bug happened when the NEWLINE_CATCHUP code didn't actually return the
NEWLINE token (due to the skipping). This resulted in the lexer continuing
on through all the subsequent rules while still in the NEWLINE_CATCHUP start
condition, (which then triggered the internal-compiler-error catch-all
rule).
What is intended is for the return of the NEWLINE token to start a new
iteration of the lexer loop, at which time the NEWLINE_CATCHUP-handling code
will reset from the <NEWLINE_CATCHUP> to the <INITIAL> start condition.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
At one point while rewriting the lexing rule for pre-processing numbers, I
made it a bit too aggressive and within a replacement list sucked up a
parameter name that appeared immediately after a period. This caused the
parameter name to be unreplaced when the macro was expanded.
It was in some piglit tests that I originally found this issue. Here, I'm
adding a test to "make check" to ensure that this behavior remains correct.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
These operators aren't defined for preprocessor expressions, so we never
implemented them. This led them to be misinterpreted as strings of unary
'+' or '-' operators.
In fact, what is actually desired is to generate an error if these operators
appear in any preprocessor condition.
So this commit looks like it is strictly adding support for these
operators. And it is supporting them as far as passing them through to the
subsequent compiler, (which was already happening anyway).
What's less apparent in the commit is that with these tokens now being lexed,
but with no change to the grammar for preprocessor expressions, these
operators will now trigger errors there.
A new "make check" test is added to verify the desired behavior.
This commit fixes the following Khronos GLES3 CTS test:
invalid_op_1_vertex
invalid_op_1_fragment
invalid_op_2_vertex
invalid_op_2_fragment
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This will emit an error for something like:
#define FOO(x,x) ...
Obviously, it's not a legal thing to do, and it's easy to check.
Add a "make check" test for this as well.
This fixes the following Khronos GLES3 CTS tests:
invalid_function_definitions.unique_param_name_vertex
invalid_function_definitions.unique_param_name_fragment
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Just reading the code, it looked like a bug that _define_object_macro had this
check, but _define_function_macro did not. Upon further reading, that's
because the check is to allow for our builtins to be defined, (and there are
no builtin function-like macros).
Add my new understanding as a comment to help the next reader.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, we had a single token for "#if" but now that we have two separate
tokens, it looks much better to see:
HASH_TOKEN IF
than:
HASH_TOKEN HASH_IF
(Note, that for the same reason we use HASH_TOKEN instead of HASH, we also use
DEFINE_TOKEN instead of DEFINE to avoid a conflict with the <DEFINE> start
condition in the lexer.)
There should be no behavioral change from this commit.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Without this, in the <PP> state, we would hit Flex's default rule, which
prints tokens to stdout, rather than returning them as tokens. (Or, after the
previous commit, we would hit the new catch-all rule and generate an internal
compiler error.)
With this commit in place, we generate the desired syntax error.
This manifested as a weird bug where shaders with semicolons after
extension directives, such as:
#extension GL_foo_bar : enable;
would print semicolons to the screen, but otherwise compile just fine
(even though this is illegal).
Fixes Piglit's extension-semicolon.frag test.
This also fixes the following Khronos GLES3 conformance tests, (and for real
this time):
invalid_char_in_name_vertex
invalid_char_in_name_fragment
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is to avoid the default, silent flex rule which simply prints the
character to stdout.
For the following Khronos GLES3 conformance tests:
invalid_char_in_name_vertex
invalid_char_in_name_fragment
With this commit, these tests now report Pass where they previously reported
Fail, but Mesa isn't behaving correctly yet. It's now reporting the internal
error where what is really desired is a syntax error.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It's legal (though highly bizarre) for a pre-processor directive to look like
this:
# /* why? */ define FOO bar
This behavior comes about since the specification defines separate logical
phases in a precise order, and comment-removal occurs in a phase before the
identification of directives.
Our implementation does not use an actual separate phase for comment removal,
so some extra care is necessary to correctly parse this. What we want is for
'#' to introduce a directive iff it is the first token on a line, (ignoring
whitespace and comments). Previously, we had a lexical rule that worked only
for whitespace (not comments) with the following regular expression to find a
directive-introducing '#' at the beginning of a line:
HASH ^{HSPACE}*#{HSPACE}*
In this commit, we switch to instead use a simple literal match of '#' to
return a HASH_TOKEN token and add a new <HASH> start condition for whenever
the HASH_TOKEN is the first non-space token of a line. This requires the
addition of the new bit of state: first_non_space_token_this_line.
This approach has a couple of implications on the glcpp parser:
1. The parser now sees two separate tokens, (such as HASH_TOKEN and
HASH_DEFINE) where it previously saw one token (HASH_DEFINE) for
the sequence "#define". This is a straightforward change throughout
the grammar.
2. The parser may now see a SPACE token before the HASH_TOKEN token of
a directive. Previously the lexical regular expression for {HASH}
would eat up the space and there would be no SPACE token.
This second implication is a bit of a nuisance for the parser. It causes a
SPACE token to appear in a production of the grammar with the following two
definitions of a control_line:
control_line
SPACE control_line
This is really ugly, since normally a space would simply be a token
separator, so it wouldn't appear in the tokens of a production. This leads to
a further problem with interleaved spaces and comments:
/* ... */ /* ... */ #define /* ..*/
For this, we must not return several consecutive SPACE tokens, or else we would need an arbitrary number of new productions:
SPACE SPACE control_line
SPACE SPACE SPACE control_line
ad nauseam
To avoid this problem, in this commit we also change the lexer to emit only a
single SPACE token for any series of consecutive spaces, (whether from actual
whitespace or comments). For this compression, we add a new bit of parser
state: last_token_was_space. And we also update the expected results of all
necessary test cases for the new compression of space tokens.
Fortunately, the compression of spaces should not lead to any semantic changes
in terms of what the eventual GLSL compiler sees.
So there's a lot happening in this commit, (particularly for such a tiny
feature). But fortunately, the lexer itself is looking cleaner than ever. The
only ugly bit is all the state updating, but it is at least isolated to a
single shared function.
Of course, a new "make check" test is added for the new feature, (directives
with comments and whitespace interleaved in many combinations).
And this commit fixes the following Khronos GLES3 CTS tests:
function_definition_with_comments_vertex
function_definition_with_comments_fragment
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is in preparation for the planned addition of a new <HASH> start
condition to the lexer. Both start conditions and token types are, of course,
in the same default C namespace, so a start condition and a token type with
the same name will collide. (And unfortunately, they are both apparently
implemented as equivalent numeric types so the collision is undetected at
compile time and simply leads to unpredictable behavior at run time.)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This commit does not cause any behavioral change for any valid program. Prior
to entering the <DEFINE> start condition, the only valid start condition is
<INITIAL>, so whether pushing/popping <DEFINE> onto the stack or explicit
returning to <INITIAL> is equivalent.
The reason for this change is that we are planning to soon add a start
condition for <HASH> with the following semantics:
<HASH>: We just saw a directive-introducing '#'
<DEFINE>: We just saw "#define" starting a directive
With these two start conditions in place, the only correct behavior is to
leave <DEFINE> by returning to <INITIAL>. But the old push/pop code would have
returned to the <HASH> start condition which would then cause an error when
the next directive-introducing '#' would be encountered.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The verbose debug output from the parser is quite useful when debugging, and
having this available as a command-line option is much more convenient than
manually forcing this into the code when needed, (which is what I had been
doing for too long previously).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For the first line we were initializing the column to 1, but for all
subsequent lines we were initializing the column to 0. The column number is
advanced for each token read before any error message is printed. So the 0
value is the correct initialization, (so that the first column is reported as
column 1).
With this extremely minor change, many of the .expected files are updated such
that error messages for the first line now have the correct column number in
them.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Here, "skipping" refers to the lexer not emitting any tokens for portions of
the file within an #if condition (or similar) that evaluates to false.
Previously, the lexer had a special <SKIP> start condition used to control
this skipping. This start condition was not handled like a normal start
condition. Instead, there was a particularly ugly block of code set to be
included at the top of the generated lexing loop that would change from
<INITIAL> to <SKIP> or from <SKIP> to <INITIAL> depending on various pieces of
parser state, (such as parser->skip_state and parser->lexing_directive).
Not only was that an ugly approach, but the <SKIP> start condition was
complicating several glcpp bug fixes I attempted recently that want to use
start conditions for other purposes, (such as a new <HASH> start condition).
The recently added RETURN_TOKEN macro gives us a convenient way to implement
skipping without using a lexer start condition. Now, at the top of the
generated lexer, we examine all the necessary parser state and set a new
parser->skipping bit. Then, in RETURN_TOKEN, we examine parser->skipping to
determine whether to actually emit the token or not.
Besides this, there are only a couple of other places where we need to examine
the skipping bit (other than when returning a token):
* To avoid emitting an error for #error if skipped.
* To avoid entering the <DEFINE> start condition for a #define that is
skipped.
With all of this in place in the present commit, there are hopefully no
behavioral changes with this patch, ("make check" still passes all of the
glcpp tests at least).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Now that we have a common macro for returning tokens, it makes sense to
perform some of the common work there, (such as copying string values).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The glcpp parser is line-based, so it needs to see a NEWLINE token at the end
of each line. This causes a trick for files that end without a final newline.
Previously, the lexer for glcpp punted in this case by unconditionally
returning a NEWLINE token at end-of-file, (causing most files to have an extra
blank line at the end). Here, we refine this by lexing end-of-file as a
NEWLINE token only if the immediately preceding token was not a NEWLINE token.
The patch is a minor change that only looks huge for two reasons:
1. Almost all glcpp test result ".expected" files are updated to drop
the extra newline.
2. All return statements from the lexer are adjusted to use a new
RETURN_TOKEN macro that tracks the last-token-was-a-newline state.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The glcpp implementation has long had code to support a file that ends without
a final newline. But we didn't have a "make check" test for this.
Additionally, the <EOF> action was restricted only to the <INITIAL> state so
it would fail to get invoked if the EOF was encountered in the <COMMENT> or
the <DEFINE> case. Neither of these was a bug, per se, since EOF in either
of these cases is an error anyway, (either "unterminated comment" or
"missing macro name for #define").
But with the new explicit support for these cases, we not generate clean error
messages in these cases, (rather than "unexpected $end" from before).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The NEWLINE_CATCHUP code is only intended to be invoked after we lex an actual
newline character ('\n'). The two extra calls here were apparently added
accidentally because the pattern happened to contain a (negated) '\n',
(see commit 6005e9cb28).
I don't think either case could have caused any actual bug. (In the first
case, the pattern matched right up to the next newline, so the NEWLINE_CATCHUP
code was just about to be called. In the second case, I don't think it's
possible to actually enter the <SKIP> start condition after commented newlines
without any intervening newline.)
But, if nothing else, the code is cleaner without these extra calls.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The recent adddition of an error for "#define followed by a non-identifier"
was a bit to aggressive since it used a regular expression in the lexer to
flag any character that's not legal as the first character of an identifier.
But we need to allow comments to appear here, (since we aren't removing
comments in a preliminary pass). So we refine the error here to only flag
characters that could not be an identifier, nor a comment, nor whitespace.
We also augment the existing comment support to be active in the <DEFINE>
state as well.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, if the preprocessor encountered a #define with a non-identifier,
such as:
#define 123 456
The lexer had no explicit rules to match non-identifiers in the <DEFINE> start
state. Because of this, flex's default rule was being invoked, (printing
characters to stdout), and all text was being discarded by the compiler until
the next identifier. As one can imagine, this led to all sorts of interesting
and surprising results.
Fix this by adding an explicit rule complementing the existing
identifier-based rules that should catch all non-identifiers after #define and
reliably give a well-formatted error message.
A new test is added to "make check" to ensure this bug stays fixed.
This commit also fixes the following Khronos GLES3 CTS test:
define_non_identifier_vertex
(The "fragment" variant was passing earlier only because the preprocessor was
behaving so randomly and causing the compilation to fail. It's lucky, in fact,
that the "vertex" version succesfully compiled so we could find and fix this
bug.)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The glcpp lexer and parser use the space_tokens state bit to avoid emitting
tokens for spaces while parsing a directive. Previously, this bit was only
being set again by the first non-space token following a directive.
This led to a bug where a space, (or a comment that should emit a space),
immediately following a directive, (optionally searated by newlines), would be
omitted from the output.
Here we fix the bug by also setting the space_tokens bit whenever we lex a
newline in the standard start conditions.
mesa/mesa/src/mesa/drivers/dri/common/xmlconfig.c:104:10: warning: #warning "Per application configuration won't work with your OS version." [-Wcpp]
# warning "Per application configuration won't work with your OS version."
Signed-off-by: Yaakov Selkowitz <yselkowitz@users.sourceforge.net>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>
We can create 3D texture views. Avoids an assertion in piglit
fbo-generatemipmap-3d test and allows it to pass.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
While running https://github.com/nvMcJohn/apitest with apitrace I noticed that Mesa was producing bogus results:
wglChoosePixelFormatARB(hdc, piAttribIList = {...}, pfAttribFList = &0, nMaxFormats = 1, piFormats = {19, 65576, 37, 198656, 131075, 0, 402653184, 0, 0, 0, 0, -573575710}, nNumFormats = &12) = TRUE
However https://www.opengl.org/registry/specs/ARB/wgl_pixel_format.txt states
<nNumFormats> returns the number of matching formats. The returned
value is guaranteed to be no larger than <nMaxFormats>.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes 3D texture support in all these cases, because array_size is 1
with 3D textures and depth0 actually contains the "array size".
util_max_layer is universal and returns the last layer index for any texture
target.
A lot of the cases below can't actually be hit with 3D textures, but let's
be consistent.
This fixes a failure in:
piglit layered-rendering/clear-color-all-types 3d single_level
for r600g and radeonsi, which was caused by an incorrect CMASK size
calculation.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I actually couldn't reproduce this one, but internal docs recommend this
workaround. Better safe than sorry.
Also, the number of dwords for the sync packets is increased by 4 instead
of 2, because it wasn't bumped last time when a new packet was added there.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This fixes the checkerboard pattern in glxgears and anything that triggers
fast color clear.
num_channels is always <= 8, but Hawaii has 16 pipes.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This new name isn't so confusing.
I also changed the gallivm limit, because it looked wrong.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: use sizeof(float[4])
Most image functions are required to return a CL_INVALID_OPERATION
error when used on devices without image support.
v2:
- Simplified the code
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When mapping a busy resource with PIPE_TRANSFER_DISCARD_RANGE or
PIPE_TRANSFER_FLUSH_EXPLICIT, we can avoid blocking by allocating and mapping
a staging bo, and emit pipelined copies at proper places. Since the staging
bo is never bound to GPU, we give it packed layout to save space.
Add xfer_map() to replace map_bo_for_transfer(). Add xfer_unmap() and
xfer_alloc_staging_sys() to simplify texture and buffer mapping/unmapping, and
enable more code sharing between them.
Add a bunch of helper functions and a big comment for
choose_transfer_method(). This also fixes handling of
PIPE_TRANSFER_MAP_DIRECTLY to not ignore tiling.
With MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader, this fixes piglit:
built-in-constants tests/spec/arb_compute_shader/minimum-maximums.txt
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
With MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader, this fixes piglit:
* arb_compute_shader-minmax
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Instead of falling back to just the block name (which we won't find),
look for the first element of the block array. We'll deal with the rest
in the backend by arranging for the blocks to be laid out contiguously.
V2: Squashed together patches 3, 5 of V1, plus a naming tweak.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously this was a block index with special semantics for -1.
With ARB_gpu_shader5, this need not be a compile-time constant, so
allow any rvalue here and convert the -1 to a NULL pointer.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without doing a lot more work, we have no idea which indices may
be used at runtime, so just mark them all.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us two things: we now need less item copies when we have
to defrag+grow the pool (to just one copy per item) and, even in the
case where we don't need to defrag the pool, we reduce the data copied
to just the useful data that the items use.
Note: The fallback path is a bit ugly now, but hopefully we won't need
it much.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Now, before moving everything to host memory, we try to create a
new resource to use as a pool. I we succeed we just use this resource
and delete the previous one. If we fail we fallback to using the
shadow.
This should make growing the pool faster, and we can also save
64KB of memory that were allocated for the 'shadow', even if they
weren't used.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Move the bits we want to share between generations from fd3_program to
ir3_shader. So overall structure is:
fdN_shader_stateobj -> ir3_shader -> ir3_shader_variant -> ir3
|- ...
\- ir3_shader_variant -> ir3
So the ir3_shader becomes the topmost generation neutral object, which
manages the set of variants each of which generates, compiles, and
assembles it's own ir.
There is a bit of additional renaming to s/fd3_compiler/ir3_compiler/,
etc.
Keep the split between the gallium level stateobj and the shader helper
object because it might be a good idea to pre-compute some generation
specific register values (ie. anything that is independent of linking).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
First step of reoganization split out compiler (so it can be shared
between a3xx and a4xx). Rename ir3_shader -> ir3 (since we'll want
the name ir3_shader for a higher level object).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The scheduler also needs to be aware of predicate register (p0) in
addition to address register (a0).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It seems like for the most part, different behaviors, workarounds, etc,
should be conditional on GPU patch revision (ie. a320.0 vs a320.2)
rather than GPU id (a320 vs a330).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The fixed size heap is a remnant of the fdre-a3xx assembler. Yet it is
convenient for being able to free the entire data structure in one shot
without worrying about leaking nodes.
Change it to dynamically grow the heap size (adding chunks) as needed so
we don't have an artificial upper limit on shader size (other than hw
limits) and don't always have to allocate worst-case size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The GBM_DRIVERS_PATH environment variable is not documented, and only
used to set the location of gbm drivers, while LIBGL_DRIVERS_PATH is
used for everything else, and is documented.
Generally this split leads to confusion as to why gbm doesn't work.
This patch will read LIBGL_DRIVERS_PATH as a fallback if
GBM_DRIVERS_PATH is not set.
The comments clearly indicate that using LIBGL_DRIVERS_PATH is
preferred over GBM_DRIVERS_PATH.
v2: - Use GBM_DRIVERS_PATH as a fallback
v3: [jordan.l.justen@intel.com] - Make LIBGL_DRIVERS_PATH the fallback
Signed-off-by: Dylan Baker <baker.dylan.c@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
According to a quick micro-benchmark, this new version is 20% faster on my
Haswell laptop.
v2: Removed the XXX note about x86_64 from the comment
v3: Use an intrinsic instead of an __asm__ block. This should give us MSVC
support for free.
v4: Enable it for all x86_64 builds, not just with USE_X86_64_ASM
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
... to eliminate an ELSE instruction followed immediately by an ENDIF.
instructions in affected programs: 704 -> 700 (-0.57%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In a situation where double-register values are used, the phi nodes can
still end up being u32 values. They all get merged into one RA node
though. When fixing up the merge (which comes after the phi node), the
phi node's def would get fixed, but not its sources which would remain
at the low register value.
This maintains the invariant that a phi node's defs and sources are
allocated the same register.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since intel is always going to be little-endian,
GL_UNSIGNED_INT_8_8_8_8_REV is the same as GL_UNSIGNED_BYTE for RGBA and
BGRA textures, so the same acceleration code will work. We might as well
use it.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In Piglit's EXT_framebuffer_multisample/alpha-to-coverage-dual-src-blend
test, key->nr_color_regions == 2, but the dual source blend FB write has
ir->target set to 0. So we failed to set "Last Render Target Select" on
any FB write message.
We only emit one FB write per render target, so my comment about setting
LastRT on every FB write directed at the last color region is a bit...
misinformed. According to the documentation, depth buffer writes and
scoreboard updates happen on the FB write with LastRT set, so I believe
we want to set it only once.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
This will be useful for INTEL_DEBUG=optimizer in the vec4 backend, which
needs to know whether it's currently processing a VS or GS. It isn't
worth adding virtual methods for this case.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Dropping this helps most lines fit in an 80 column terminal. The
absence of WE_normal also helps call attention to WE_all, where
something unusual is going on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Don't assert (debug builds) or assign random uninitialized value for
predicate register (p0).. that screws up kill, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Accuracy of some operations was recently improved in the R600 backend,
at the cost of slower code. This is required for compute shaders,
but not for graphics shaders. Add unsafe-fp-math hint to make LLVM
generate faster but possibly less accurate code.
Piglit didn't indicate any regressions.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Now that we know that the pool is defragmented, we positively know
that allocated + unallocated will be the total size of the
current pool plus all the items that will be promoted. So we only
need to grow the pool once.
This will allow us to just add the new items to the end of the
item_list without the need of looking for a place to the new item.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This way we can avoid defragmenting the pool, even if it is needed
to defragment it, and looping again through the list of unallocated
items.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This patch adds a new member to the pool to track its status.
For now it is used only for the 'fragmented' status, but if
needed it could be used for more statuses.
The pool will be considered fragmented if: An item that isn't
the last is freed or demoted.
This 'strategy' has a problem, although it shouldn't cause any bug.
If for example we have two items, A and B. We choose to free A first,
now the pool will have the 'fragmented' status. If we now free B,
the pool will retain its 'fragmented' status even if it isn't
fragmented.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This new function will move items forward in the pool, so that
there's no gap between them, effectively defragmenting the pool.
For now this function is a bit dumb as it just moves items
forward without trying to see if other items in the pool could
fit in the gaps.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This function will be used in the future by compute_memory_defrag
to move items forward in the pool.
It does so by first checking for overlaping ranges, if the ranges
don't overlap it will copy the contents directly. If they overlap
it will try first to make a temporary buffer, if this buffer fails
to allocate, it will finally fall back to a mapping.
Note that it will only be needed to move items forward, it only
checks for overlapping ranges in that case. If needed, it can
easily be added by changing the first if.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Actually what we currently handle is just the SCALED versions, and not
the int versions. The difference probably matters more when we actually
support integer in the compiler.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Teach new compiler scheduling and register assignment how to deal with
relative addressing. This gets us what we need to avoid falling back to
old compiler for CONST[ADDR[0].x+n]. It is also a prerequisite for temp
file relative addressing, although that is going to also need some
cleverness in register assignment to keep arrays grouped together.
NOTE: doing address calculation in full precision and then narrowing to
s16 in the mov to addr reg seems to sometimes cause lockups (and
sometimes work?!). It seems more reliable to do the address calculation
in s16, like the blob does. Which means teaching RA how to deal with
mixed half and full precision allocation. Fortunately that didn't turn
out to be too hard, so that is a nice bonus which we could probably take
better advantage of elsewhere.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Technically we should not need these. CP_LOAD_STATE can be pipelined.
But removing them broke a few piglit tests, like fbo-depth-
GL_DEPTH_COMPONENT24-readpixels. I expect these are just masking a
problem elsewhere, or perhaps they are only needed under some more
specific circumstances. But until that is understood properly, give
back a bit of the perf boost we got from c63450e8.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Adds an implementation of the ClearTexSubImage driver entry point that tries
to set up an FBO to render to the texture and then calls glClearBuffer with a
scissor to perform the actual clear. If an FBO can't be created for the
texture then it will fall back to using _mesa_store_ClearTexSubImage.
When used in combination with _mesa_store_ClearTexSubImage this should provide
an implementation that works for all DRI-based drivers. However as this has
only been tested with the i965 driver it is currently only enabled there.
v2: Only enable the extension for the i965 driver instead of all DRI drivers.
Remove an unnecessary goto. Don't require GL_ARB_framebuffer_object. Add
some more comments.
v3: Use glClearBuffer* to avoid having to modify glClearColor and friends.
Handle sRGB textures. Explicitly disable dithering.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
The Meta implementation of glClearTexSubImage is going to want to ensure that
dithering is disabled so that it can get a consistent color across the whole
texture when clearing. This adds a state flag to easily save it and set it to
the default value when performing meta operations.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Adds an implmentation of the ClearTexSubImage driver entry point that just
maps the texture and writes the values in. The extension is not yet enabled by
default because it doesn't work with multisample textures as they don't have a
simple linear layout.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This adds the driver entry point for glClearTexSubImage and fills in the
_mesa_ClearTexImage and _mesa_ClearTexSubImage functions that call it.
v2: Don't clear some of the images if only one of them makes an error
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In texture_error_check() there was a snippet of code to check whether the
given format and internal format are basically compatible. This has been split
out into its own static helper function so that it can be used by an
implementation of glClearTexImage too.
Should reduce overhead because the caching buffer manager doesn't need to
consider buffers of the wrong type.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
since some bits are done in tree, but nobody is working on it anymore.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
because float depth texture data needs clamping to [0.0, 1.0]. Let the
_mesa_texstore() fallback to slower path.
Fixes Khronos GLES3 CTS tests:
shadow_execution_vert
shadow_execution_frag
V2: Move the check to _mesa_texstore_can_use_memcpy() function.
Add check for floating point data types.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We might be able to do this without an extra program key field, but this
is non-invasive and fixes the bug, for now.
This fixes the following Piglit tests on Broadwell:
- ARB_sample_shading/builtin-gl-sample-id 2
- ARB_sample_shading/builtin-gl-sample-position 2
- EXT_framebuffer_multisample/multisample-blit 2 color
- EXT_framebuffer_multisample/multisample-blit 2 color linear
- EXT_framebuffer_multisample/multisample-blit 2 depth
- EXT_framebuffer_multisample/no-color 2 depth combined
- EXT_framebuffer_multisample/no-color 2 depth separate
- EXT_framebuffer_multisample/no-color 2 depth single
- EXT_framebuffer_multisample/no-color 2 depth-computed combined
- EXT_framebuffer_multisample/no-color 2 depth-computed separate
- EXT_framebuffer_multisample/no-color 2 depth-computed single
- EXT_framebuffer_multisample/unaligned-blit 2 color msaa
- EXT_framebuffer_multisample/unaligned-blit 2 depth msaa
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Otherwise, the performance warning for shader recompiles will just say
"something else".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It doesn't exist, so attempting to read it will trigger generation
assertions in the brw_inst API.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Printing the hex offsets makes it basically impossible to diff assembly:
if you add even a single instruction, the entire shader shows up as a
difference. So, every time I want to compare assembly, I have to strip
this out.
The hex offsets might be useful when debugging compaction, or when
inspecting the program cache buffer. Since it's occasionally useful,
but uncommon, this patch disables it by default, but makes it easy to
re-enable it temporarily when the need arises.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Avoids regenerating it unnecessarily.
Every program in shader-db improved, none by an amount less than a 1/3
reduction. One Dota2 shader decreased from 62 -> 24.
cfg calculations: 429492 -> 193197 (-55.02%)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The is used for programs that have arrays of constants that
are accessed using dynamic indices. The shader will compute
the base address of the constants and then access them using
SMRD instructions.
The parameter is an int16_t, and we're check that it's value will fit in
16-bits. Yes, the value that is stored in 16-bits will surely fit in
16-bits.
brw_inst.h: In function 'brw_inst_set_gen6_jump_count':
brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_inst.h: In function 'brw_inst_set_src1_vstride':
brw_inst.h:118:76: warning: unused parameter 'brw' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Before it was only storing one of the color components due to truncation.
With this patch it now properly stores all of them.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
This removes the intermediate storage (pm4 state) and generates descriptors
directly in a staging buffer.
It also reduces the number of flushes, because the descriptors no longer
take CS space.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Sampler descriptors are now represented by si_descriptors.
This also adds support for fine-grained sampler state updates and
the border color update is now isolated in a separate function.
Border colors have been broken if texturing from multiple shader stages is
used. This patch doesn't change that.
BTW, blitting already makes use of fine-grained state updates.
u_blitter uses 2 textures at most, so we only have to save 2.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The draw indirect packets cannot set VGT_INDX_OFFSET, they can only set user
data SGPRs. This is the only way to support start/index_bias with indirect
drawing.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Most (all?) Unigine shaders fail to compile without this if sample shading
is advertised. This is, of course, Unigine developers' fault.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is needed to make Unigine Heaven 4.0 and Unigine Valley 1.0 work
with sample shading.
Also, if this is disabled, the error message at least makes sense now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Only supported on evergreen and later. Currently limited
to single component textures as the hardware GATHER4
instruction ignores texture swizzles.
Piglit quick run passes on radeon 6670 with all
applicable textureGather tests, no regressions.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL
as base internal format and non-zero x, y offsets. Currently x, y
offsets are ignored while updating the texture image.
Fixes Khronos GLES3 CTS tests:
npot_tex_sub_image_2d
npot_tex_sub_image_3d
npot_pbo_tex_sub_image_2d
npot_pbo_tex_sub_image_2d
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This reverts commit bbefb15e01.
Fixes the 11 regressions caused in framebuffer_blit tests in
Khronos GLES3 CTS tests:
Original patch reduced the instruction count but had no performance
benefits. So, it's safe to revert it without causing any performance
regressions.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This commit "mesa: fix packing of float texels to GL_SHORT/GL_BYTE" replaced *_TO_BYTE to *_TO_BYTE_TEX because *_TO_FLOAT_TEX are used to unpack the texels to floats.
In this case *_TO_FLOATZ in function extract_float_rgba also should be replaced to *_TO_FLOAT_TEX. Underline that these macros automatically preserve zero when converting.
The regression was observed on 3 oglconform tests:
snorm-textures basic.getTexImage
snorm-textures advanced.mipmap.manual.getTex
snorm-textures advanced.mipmap.upload.getTex
Signed-off-by: Pavel Popov <pavel.e.popov@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This fixes following tests in es3conform:
shaders.switch.default_not_last_dynamic_vertex
shaders.switch.default_not_last_dynamic_fragment
and makes following tests in Piglit pass:
glsl-1.30/execution/switch/fs-default-notlast-fallthrough
glsl-1.30/execution/switch/fs-default_notlast
No Piglit regressions.
v2: take away unnecessary ir_if, just use conditional assignment
v3: use foreach_in_list instead of foreach_list
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v2)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3)
components() includes matrix columns, so if this code encountered a
matrix, it would ask for something like a vec9 or vec16. This is
clearly not what you want.
Earlier code now prevents this from seeing matrices, but we should still
use vector_elements, for clarity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
It doesn't handle things like (vector * matrix) correctly, and
apparently Matt's intention was to bail.
Fixes shader compilation in Natural Selection 2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit 3178d2474a.
This caused GPU hangs on Ivybridge for some users and huge (80%)
performance regressions across the board on multiple platforms.
We need to find a better solution. I've made several attempts, but none
of them have worked yet. In the meantime, we should revert this.
Reverting it breaks GL_PRIMITIVES_GENERATED for non-zero streams, but
that's okay, since we don't expose GL_ARB_gpu_shader5 yet.
Fixes Piglit's EXT_transform_feedback/generatemipmap prims_generated
test case on Haswell.
This code should execute without regard to the currently executing
channels. Asking for gl_SampleID inside control flow might break in
strange ways. It appears to break even at the top of the program in
SIMD16 mode occasionally as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
gen8_fs_generator uses these to decide whether to set the execution size
to 8 or 16, so we incorrectly made both of these MOVs the full width in
SIMD16 shaders. (It happened to work out on Gen4-7.)
Setting them should also help inform optimization passes what's really
going on, which could help avoid bugs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
Both inst->force_uncompressed and inst->force_sechalf mean that the
generated instruction should be uncompressed and have an execution size
of 8. We don't require the visitor to set both flags - setting
inst->force_sechalf by itself is supposed to be enough.
On Gen4-7, guess_execution_size() demoted instructions to 8-wide based
on the default compression state. On Gen8+, we instead set a default
execution size, which worked great...except that we forgot to check
inst->force_sechalf when deciding whether to use 8 or 16.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
nouveau_fence_update does real work unconditionally. Avoid doing that if
the fence we're checking on has already been signalled.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This complements the existing append function. It's implemented in a
rather simple way right now; it could be changed if performance is a
concern.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
There are no queries for GL_TEXTURE_LUMINANCE_SIZE,
GL_TEXTURE_INTENSITY_SIZE, GL_TEXTURE_LUMINANCE_TYPE, or
GL_TEXTURE_INTENSITY_TYPE in any version of OpenGL ES or desktop OpenGL
core profile.
NOTE: Without changes to piglit, this regresses
required-sized-texture-formats.
v2: Rebase on different initial change.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.2 <mesa-stable@lists.freedesktop.org>
There are no texture borders in any version of OpenGL ES or desktop
OpenGL core profile.
Fixes piglit's gl-3.2-texture-border-deprecated.
v2: Rebase on different initial change.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.2 <mesa-stable@lists.freedesktop.org>
Instead of catching the special case early, handle it by constructing a
fake gl_texture_image that will cause the values required by the OpenGL
4.0 spec to be returned.
Previously, calling
glGenTextures(1, &t);
glBindTexture(GL_TEXTURE_2D, t);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, 0xDEADBEEF, &value);
would not generate an error.
Anuj: Can you verify this does not regress proxy_textures_invalid_size?
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Suggested-by: Brian Paul <brianp@vmware.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
A similar attempt was made in commit 5ff1e446 and was reverted in commit
a39428cf after causing a regression in an ES 3 conformance test. The
test still passes after this commit.
total instructions in shared programs: 1994827 -> 1992858 (-0.10%)
instructions in affected programs: 128247 -> 126278 (-1.54%)
GAINED: 0
LOST: 1
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
If we saw a tree that looked like
vec3
/ \
vec3 float
/ \
vec3 float
/ \
vec3 float
We would see that all of the expression types were vec3, and then
rebalance to
vec3
/ \
vec3 vec3 <-- should be float
/ \ / \
vec3 float float float
This patch adds code to visit the rebalanced tree and update the
expression types from the bottom up.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80880
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
0cbefc1bea added a source argument to
EMIT/ENDPRIM, but it did not update tgsi_ureg accordingly, causing all
users of ureg_EMIT/ENDPRIM to fail at runtime with an assertion failure.
Trivial.
This patch fixes this MinGW build error.
glapi_gentable.c: In function '_glapi_create_table_from_handle':
glapi_gentable.c:123:9: error: implicit declaration of function 'dlsym' [-Werror=implicit-function-declaration]
*procp = dlsym(handle, symboln);
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Brian Paul <brianp@vmware.com>
Vectors are falling in to the ir_dereference_array() path.
Without this change, the following glsl aborts the debug driver,
or gets the wrong answer in release:
mat2x2 a = mat2( vec2( 1.0, vertex.x ), vec2( 0.0, 1.0 ) );
Also submitting piglit tests, will reference in bug.
v2: Rebase on Mesa master.
v3: Remove unneeded check for arrays, which are covered by
process_array_constructor(), recommended by Timothy Arceri.
Signed-off-by: Cody Northrop <cody@lunarg.com>
Reviewed-by: Courtney Goeltzenleuchter <courtney@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79373
On Cygwin and MinGW, linking a shared library also generates an import library
Use a wildcard which also matches the name of the megadriver import lib,
mesa_dri_drivers.dll.a, so that is also removed after megadriver symlinks are
created
(This then matches src/gallium/targets/dri/Makefile.am, which already does
things this way)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
SIMD8-only for now.
V5: - Fix style complaints
- Move prototype to be with other oddball emit functions
- Use unreachable() instead of assert() where possible
V6: - Describe what is happening with the clamping
- Add reg_width to make some expressions clearer
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The backend will have to do a message send, so we want to keep these in
one piece, just like texture ops.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V5: - Split into separate opcodes
- Pass message data in src1 immediate
- Put noperspective bit in fs_inst rather than adding any junk to
backend_instruction
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- Don't try to disassemble send's src1 as a descriptor if it's not an
immediate.
- In the same case, show src1 as an operand (makes it easier to see
bogus register regions, etc -- the hardware is very fussy)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We'd otherwise go looking into virtual_grf_sizes for things that aren't
in there at all.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
V2: - Don't assume everyone wants interpolateAtSample() lowered to
interpolateAtOffset. It turns out this isn't what we want most
of the time for i965. Lowering can be added later in an ir pass
which drivers opt into, rather than bolting it straight into the
builtin definition.
- Only expose the interpolateAt* builtins in the fragment language.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Will be used to implement interpolateAt*() from ARB_gpu_shader5
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The new interpolateAt* builtins have strange restrictions on the
<interpolant> parameter.
- It must be a shader input, or an element of a shader input array.
- It must not include a swizzle.
V2: Don't abuse ir_var_mode_shader_in for this; make a new flag.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Instead of using intr_name in lp_build_tgsi_action, this selects the names
with a switch statement in the emit function. This allows emitting
llvm.SI.sample for instructions without offsets and llvm.SI.image.sample.*.o
otherwise.
This depends on my LLVM changes.
When LLVM 3.5 is released, I'll switch all texture instructions to the new
intrinsics.
This happens when glGetMultisamplefv (or any other non-draw function) is
called, which doesn't invoke the VBO module to update _DrawArrays and
the pointer is invalid at that point.
However st/mesa still dereferences it to setup vertex buffers ==> crash.
Reviewed-by: Brian Paul <brianp@vmware.com>
Adjust definition of _XOPEN_SOURCE appropriately for use of strndup() added with
commit da3a47d6
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Before, we were checking the level against view->u.tex.last_level but
level is not valid for buffers. Plus, the aliasing of the view->u.tex
view->u.buf members (a union) caused the level checking arithmetic to
be totally wrong. The net effect is we always returned early for
PIPE_BUFFER size queries.
This fixes the piglit "textureSize 140 fs samplerBuffer" test.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Commit 54e91e7420 introduced a function declaration that uses
brw_context. While brw_context tends to get included in most files, it
is not when compiling intel_asm_annotation.c resulting in the following
warning:
In file included from brw_shader.h:25:0,
from brw_cfg.h:32,
from intel_asm_annotation.c:24:
brw_reg.h:122:39: warning: 'struct brw_context' declared inside
parameter list [enabled by default]
brw_reg.h:122:39: warning: its scope is only this definition or
declaration, which is probably not what you want [enabled by default]
Add a forward-declaration for struct brw_context to avoid the issue.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the current logic, it's very likely that s/r indirect sources are
right after the "regular" ones. Unset them before moving the texture
arguments over rather than after, as one of those arguments would
likely have assumed one of the s/r positions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Convert the final dri target to the single DRI (megadriver) library.
Cleanup all the automake leftovers from the conversion stage and
update the scons build.
v2: Link in llvmpipe, when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Move the driver_name to dri2/drisw and remove all the SPLIT_TAGETS
mayhem. In the next step we'll unify the dri and dri-swrast targets,
completing the gallium DRI megadriver.
v2: Remove leftover st/dri Makefiles from CONFIG_FILES. Spotted by
Thomas Helland.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Export the approapriate new symbol, and keep backwards compat
via the megadriver_stub helper library.
Our next step would be to unify dri/drm and dri/sw, leading to
a complete megadrivers solution, and having a single library
that provides dri across all targets.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Rather than building two identical ones for dri-vmwgfx and dri-swrast
build a single library, and drop some duplication in the build.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
... and use libmegadriver_stub as their provider.
Teach scons how to build the library archive and use it.
v2: scons: fix build on a drm-less system.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
With all the users converted to __driGetExtensions_* we can
have only a single inclusion of the required header + define.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
The symbol is introduced by the mesa megadrivers, and
adding gallium support for it will allow us to merge
st/dri/drm and st/dri/sw. Resulting in a single dri library
across all of gallium.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
The symbol is introduced by the mesa megadrivers, and adding
gallium support for it will allow us to merge st/dri/drm and
st/dri/sw. Resulting in a single dri library across gallium.
v2: Rebase on top of gallium dri3.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
This enables a gallium driver not to care about the semantics of
ARB_sample_shading vs ARB_gpu_shader5 sample attributes. When
ARB_sample_shading-style sample shading is enabled, all of the fp inputs
are marked for interpolation at the sample location.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The new location field can be either center, centroid, or sample, which
indicates the location that the shader should interpolate at.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
i965 precompiles shaders at link time, and prints a disassembly if
INTEL_DEBUG=vs,gs,fs, including the shader name. However, blit shaders
were showing up as "unnamed" since we hadn't set a name prior to
linking.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Originally, we didn't have direct accessors for all of the GLSL types,
so the only way to get at them was to use the symbol table. Now, we
can just get at them directly, which is simpler and faster.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The lexer was insisting that there be at least one character after "#pragma"
and before the end of the line. This caused an error for a line consisting
only of "#pragma" which volates at least the following sentence from the GLSL
ES Specification 3.00.4:
The scope as well as the effect of the optimize and debug pragmas is
implementation-dependent except that their use must not generate an
error. [Page 12 (Page 28 of PDF)]
and likely the following sentence from that specification and also in
GLSLangSpec 4.30.6:
If an implementation does not recognize the tokens following #pragma,
then it will ignore that pragma.
Add a "make check" test to ensure no future regressions.
This change fixes at least part of the following Khronos GLES3 CTS test:
preprocessor.pragmas.pragma_vertex
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We've always warned about this case, but a recent confromance test expects
this to be an error that causes compilation to fail. Make it so.
Also add a "make check" test to ensure these errors are generated.
This fixes the following Khronos GLES3 conformance tests:
invalid_conditionals.tokens_after_ifdef_vertex
invalid_conditionals.tokens_after_ifdef_fragment
invalid_conditionals.tokens_after_ifndef_vertex
invalid_conditionals.tokens_after_ifndef_fragment
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While writing the previous commit message, I just felt bad documenting the
shortcoming of the change, (that undefined macro names would not be reported
in error messages).
Fix this by preserving the first-encounterd undefined macro name and reporting
that in any resulting error message.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GLSL ES Specification 3.00.4 says:
#if, #ifdef, #ifndef, #else, #elif, and #endif are defined to operate
as for C++ except for the following:
...
• Undefined identifiers not consumed by the defined operator do not
default to '0'. Use of such identifiers causes an error.
[Page 11 (page 127 of the PDF file)]
as well as:
The semantics of applying operators in the preprocessor match those
standard in the C++ preprocessor with the following exceptions:
• The 2nd operand in a logical and ('&&') operation is evaluated if
and only if the 1st operand evaluates to non-zero.
• The 2nd operand in a logical or ('||') operation is evaluated if
and only if the 1st operand evaluates to zero.
If an operand is not evaluated, the presence of undefined identifiers
in the operand will not cause an error.
(Note that neither of these deviations from C++ preprocessor behavior apply to
non-ES GLSL, at least as of specfication version 4.30.6).
The first portion of this, (generating an error for an undefined macro in an
(short-circuiting to squelch errors), was not implemented previously, but is
implemented in this commit.
A test is added for "make check" to ensure this behavior.
Note: The change as implemented does make the error message a bit less
precise, (it just states that an undefined macro was encountered, but not the
name of the macro).
This commit fixes the following Khronos GLES3 conformance test:
undefined_identifiers.valid_undefined_identifier_1_vertex
undefined_identifiers.valid_undefined_identifier_1_fragment
undefined_identifiers.valid_undefined_identifier_2_vertex
undefined_identifiers.valid_undefined_identifier_2_fragment
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The preprocessor defines a notions of a "preprocessing number" that
starts with either a digit or a decimal point, and continues with zero
or more of digits, decimal points, identifier characters, or the sign
symbols, ('-' and '+').
Prior to this change, preprocessing numbers were lexed as some
combination of OTHER and IDENTIFIER tokens. This had the problem of
causing undesired macro expansion in some cases.
We add tests to ensure that the undesired macro expansion does not
happen in cases such as:
#define e +1
#define xyz -2
int n = 1e;
int p = 1xyz;
In either case these macro definitions have no effect after this
change, so that the numeric literals, (whether valid or not), will be
passed on as-is from the preprocessor to the compiler proper.
This fixes the following Khronos GLES3 CTS tests:
preprocessor.basic.correct_phases_vertex
preprocessor.basic.correct_phases_fragment
v2. Thanks to Anuj Phogat for improving the original regular expression,
(which accepted a '+' or '-', where these are only allowed after one of
[eEpP]. I also expanded the test to exercise this.
v3. Also fixed regular expression to require at least one digit at the
beginning (after an optional period). Otherwise, a string such as ".xyz" was
getting sucked up as a preprocessing number, (where obviously this should be a
field access). Again, I expanded the test to exercise this.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, a line such as:
#else garbage
would flag an error if it followed "#if 0", but not if it followed "#if 1".
We fix this by setting a new bit of state (lexing_else) that allows the lexer
to defer switching to the <SKIP> start state until after the NEWLINE following
the #else directive.
A new test case is added for:
#if 1
#else garbage
#endif
which was untested before, (and did not generate the desired error).
This fixes the following Khronos GLES3 CTS tests:
tokens_after_else_vertex
tokens_after_else_fragment
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, the test suite was expecting the compiler to allow a redefintion
of a macro with whitespace added, but gcc is more strict and allows only for
changes in the amounts of whitespace, (but insists that whitespace exist or
not in exactly the same places).
See: https://gcc.gnu.org/onlinedocs/cpp/Undefining-and-Redefining-Macros.html:
These definitions are effectively the same:
#define FOUR (2 + 2)
#define FOUR (2 + 2)
#define FOUR (2 /* two */ + 2)
but these are not:
#define FOUR (2 + 2)
#define FOUR ( 2+2 )
#define FOUR (2 * 2)
#define FOUR(score,and,seven,years,ago) (2 + 2)
This change adjusts the existing "redefine-macro-legitimate" test to work with
the more strict understanding, and adds a new "redefine-whitespace" test to
verify that changes in the position of whitespace are flagged as errors.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch specifically fixes redefinition condition for white space
changes. #define and #undef functionality in GLSL follows the standard
for C++ preprocessors for macro definitions.
From https://gcc.gnu.org/onlinedocs/cpp/Undefining-and-Redefining-Macros.html:
These definitions are effectively the same:
#define FOUR (2 + 2)
#define FOUR (2 + 2)
#define FOUR (2 /* two */ + 2)
but these are not:
#define FOUR (2 + 2)
#define FOUR ( 2+2 )
#define FOUR (2 * 2)
#define FOUR(score,and,seven,years,ago) (2 + 2)
Fixes Khronos GLES3 CTS tests;
invalid_object_whitespace_vertex
invalid_object_whitespace_fragment
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
On nvc0, a counter can have up to 6 sources instead of only one
for nve4+. This fixes a crash when a counter uses more than
one source.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The set of variable uses does not need to be ordered in any way, and
removing/adding elements is a fairly common operation in various
optimization passes.
This shortens runtime of piglit test fp-long-alu to ~22s from ~4h
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
BRW_PREDICATE_ALIGN1_ANY16H was incorrectly being disassembled as
"all16h", and ALL16H would probably print as "(null)".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We clearly don't want to start at the head and walk backwards; we want
to start at the last real element before the tail sentinel. If the list
is empty, tail_pred will be the head sentinel, and we'll stop.
Nothing uses this function, so I guess nobody noticed it was broken.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This doesn't fix any known issue. In fact, radeon drivers ignore all
the discard flags for textures and implicitly do "discard range"
for any write transfer.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Previously instruction scheduling tracked dependencies on a per-register
basis. This meant that there was an artificial dependency between
interpolation instructions writing into the same virtual register.
Instruction scheduling would insert a number of instructions between the
two instructions in this example, when they are actually independent.
linterp vgrf8+0.0:F, hw_reg2:F, hw_reg3:F, hw_reg6:F
linterp vgrf8+1.0:F, hw_reg2:F, hw_reg3:F, hw_reg6+16:F
This lead to cases where the first texture coordinate is interpolated at
the beginning of the shader, but the second is done immediately before
the texture operation that uses it as a source.
After this change, the artificial dependency is removed and the
interpolation instructions are scheduled together.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Revert "build: Build on Cygwin with gnu99 instead of c99." and define
_XOPEN_SOURCE appropriately.
This reverts commit 53e36d333c.
Since Cygwin 1.7.18 (April 2013), it's headers correctly prototype strtoll()
when using -std=c99, and correctly prototype strdup() when _XOPEN_SOURCE is
defined appropriately, so this workaround is no longer needed.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Cc: Vinson Lee <vlee@freedesktop.org>
Apparently TXD wants its offset differently than TEX, accepting it in
the upper bits of the layer index. Unclear what happens when this is
combined with indirect sampler indexing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Something about how we're implementing offsets for TXD is wrong, just
flip to the generic quadop-based implementation in that case.
This is the minimal fix appropriate for backporting.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Unfortunately there's no good way to do this on the nv50 shader isa.
Dropping the bias seems preferable to doing the compare post-filtering.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Although the HSW PRM shows it, the BSpec lists this workaround as being
for Ivybridge only.
total instructions in shared programs: 1994951 -> 1993675 (-0.06%)
instructions in affected programs: 27325 -> 26049 (-4.67%)
Port of commit b16b3c87 to the vec4 code.
No shader-db improvements, but might as well. The fs backend saw an
improvement because it's scalar and multiple identical CMP instructions
were generated by the SEL peepholes.
[mattst88]: Modified to perform CSE on instructions with
the same writemask. Offered no improvement before.
total instructions in shared programs: 1995633 -> 1995185 (-0.02%)
instructions in affected programs: 14410 -> 13962 (-3.11%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
With a hack to place an exec_node in the struct in C to be at the same
location as the inherited exec_node in C++.
Acked-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
If another layout qualifier appeared to the left of `invocations` in the
GS input layout declaration, the invocation count would be dropped on
the floor.
Fixes the piglit tests:
spec/ARB_transform_feedback3/arb_transform_feedback3-ext_interleaved_two_bufs_gs_max
spec/ARB_gpu_shader5/arb_gpu_shader5-invocation-id
spec/ARB_gpu_shader5/compiler/correct-multiple-layout-qualifier-invocations.geom
spec/ARB_gpu_shader5/execution/invocations-conflicting
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The hardware allows multiple simultaneous renders with the same
memory-backed constbufs but with each invocation having different
values. However in order for that to work, the data has to be streamed
in via the right constbuf slot. We weren't doing that for UBOs.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.1" <mesa-stable@lists.freedesktop.org>
Now that this cap is used to determine the availability of both, adjust
its name to reflect the new reality.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The assumption is that any driver capable of emitting layer from the
vertex shader and supporting viewports should be able to also handle
emitting viewport index from the vertex shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tobias Droste <tdroste@gmx.de>
The Linux winsys can no longer relocate shader code, so avoid
reemitting BindGBShader commands. They are costly.
v2: Correctly handle errors from SVGA3D_BindGBShader()
Reported-by: Michael Banack <banackm@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Previously, we were assuming that kernel metadata nodes only had 1 operand.
Kernels which have attributes can have more than 1, e.g.:
!0 = metadata !{void (i32 addrspace(1)*)* @testKernel, metadata !1}
!1 = metadata !{metadata !"work_group_size_hint", i32 4, i32 1, i32 1}
Attempting to get the kernel without the correct number of attributes led
to memory corruption and luxrays crashing out.
Fixes the cl/program/execute/attributes.cl piglit test.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76223
CC: "10.2" <mesa-stable@lists.freedesktop.org>
If XA fails to initialize with pipe_loader enabled, the pipe_loader's
cleanup function will close the drm file descriptor. That's pretty bad
because the file descriptor will probably be the X server driver's only
connection to drm. Temporarily solve this by dup()'ing the file descriptor
before handing it over to the pipe loader.
This fixes freedesktop.org bugzilla bug #80645.
v2: Fix CC addresses.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
If multiple viewports are supported, that implies the presence of a GS
and layered rendering, so we can enable ARB_fragment_layer_viewport as
well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
I wanted to access this value from stage-generic code, so stop storing it
under two different names.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If we had some NOS affecting VS compilation that resulted in optimization
changing the set of constants to be uploaded, we might not have reuploaded
the constants.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At this point, the extra copy of the data and memcmp are as expensive as
just re-uploading.
Note: now that we'll always upload, and brw_constant_buffer watches
BRW_NEW_BATCH anyway, we don't need to explicitly unref the old curbe_bo
at batch reset time.
No significant performance difference on glamor copywinwin10 (n=55),
despite that test having a 98% hit rate on the cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to support ARB_fragment_layer_viewport, we need to explicitly
send these along to the pixel shader, since it has no other way to
retrieve them.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
With the inclusion of xmlconfig in the loader we're providing dri* symbols
which are already available in libdricommon.la. This leads to a build
break due to the multiple definitions.
Temporary allow multiple definitions, until we come with a better solution.
Reported-by: Laurent Carlier <lordheavym@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
With all the hw drivers converted, we can go back to having
a single libdridrm provider.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
v2:
- Drop inclusion of the winsys wrapper and softpipe/llvmpipe.
- Remove old Makefile.am, target.c.
- Correctly append i915 to the megadrivers list.
Cc: Stephane Marchesin <stephane.marchesin@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Similiar to other targets, we'd like to convert all the separate
targets into a single one, thus we'll minimize the duplication and
overall size of mesa. The conversion per API basis, with the drivers
available either statically or shared. Currently the former is the
default.
v2: Correctly append the version script to the linker flags.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Will be used to create the single dri target library, on our
way to convert all the dri targets during the conversion to
to static/shared pipe-drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
With this commit we add a couple of DEFINES making the ST code
conditional, in a way that we can use it to gradually convert
the dri-targets from separate libraries into a single one.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Some of the features are completely implemented by core, while others
have hardware dependencies. Create a list of drivers supporting each
sub-feature that must have hw support.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Because the layout is always linear this didn't really do much any longer -
at some point this triggered per-tile swizzled->linear conversion. The x/y
coords were ignored too.
Apart from triggering conversion, this also invoked alloc_image_data(), which
could only actually trigger mapping of display target resources. So, instead
just call resource_map in the callers (which also gives the ability to unmap
again). Note that mapping/unmapping of display target resources still isn't
really all that clean (map/unmap may be unmatched, and all such mappings use
the same pointer thus usage flags are a lie).
Reviewed-by: Brian Paul <brianp@vmware.com>
The only caller left used it only for non display target textures,
hence it was really the same as llvmpipe_get_texture_image_address - it
also had a usage flag but this was ignored anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
Once used for invoking swizzled->linear conversion for all needed images.
But we now have a single allocation for all images in a resource, thus looping
through all slices is rather pointless, conversion doesn't happen neither.
Also simplify the sampling setup code to use the mip_offsets array in the
resource directly - if the (non display target) resource exists its memory
will already be allocated as well.
Reviewed-by: Brian Paul <brianp@vmware.com>
The deferred allocation doesn't really make much sense anymore, since we no
longer allocate swizzled/linear memory in chunks and not per level / slice
neither.
This means we could fail resource creation a bit more (could already fail in
theory anyway) but should not fail maps later (right now, callers can't deal
with neither really).
Reviewed-by: Brian Paul <brianp@vmware.com>
it looks since ce1a137228 they are now included
in more places, in particular even for things buildable with msvc, and hence
those break the build.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Drop stdbool, due to the X server being a pain and having
struct members called bool, although I've sent a patch to fix
that we should retain stupidity here. Use unsigned char
which is what GLboolean is anyways.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Commit 17c7ead7 exposed a bug in how uniform loading happens in the
presence of discard. It manifested itself in an application as
randomly incorrect pixels on the borders of conditional areas.
This is due to how discards jump to the end of the shader incorrectly
for some channels. The current implementation checks each 2x2
subspan to preserve derivatives. When uniform loading via samplers
was turned on, it uses a full execution mask, as stated in
lower_uniform_pull_constant_loads(), and only populates four channels
of the destination (see generate_uniform_pull_constant_load_gen7()).
It happens incorrectly when the first subspan has been jumped over.
The series that implemented this optimization was done before the
changes to use samplers for uniform loads. Uniform sampler loads
use special execution masks and only populate four channels, so we
can't jump over those or corruption ensues.
This fix only jumps to the end of the shader if all relevant channels
are disabled, i.e. all 8 or 16, depending on dispatch. This
preserves the original GLbenchmark 2.7 speedup noted in commit
beafced2.
It changes the shader assembly accordingly:
before : (-f0.1.any4h) halt(8) 17 2 null { align1 WE_all 1Q };
after(8) : (-f0.1.any8h) halt(8) 17 2 null { align1 WE_all 1Q };
after(16): (-f0.1.any16h) halt(16) 17 2 null { align1 WE_all 1H };
v2: Cleaned up comments and conditional ordering.
v3: Fix typo.
Signed-off-by: Cody Northrop <cody@lunarg.com>
Reviewed-by: Mike Stroyan <mike@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
cfg, for instance, is a pointer to a local variable in
calculate_live_intervals, certainly not valid after that function has
returned.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We want to have the dri common files compiled to define USE_DRICONF.
We need to check both NEED_OPENGL_COMMON and HAVE_DRICOMMON
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Tested-by: Brian Paul <brianp@vmware.com>
UniformBufferSize is in bytes so we need to divide by 16 to get the
number of constant buffer slots. Also, the ureg_DECL_constant2D()
function takes first..last parameters so we need to subtract one
for the last value.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Before, we were always using the address register and indirect addressing
to index into a UBO constant buffer. With this change we only do that
when necessary.
Using the piglit bin/arb_uniform_buffer_object-rendering test as an
example:
Shader code:
uniform ub_rot {float rotation; };
...
m[1][1] = cos(rotation);
Before:
IMM[1] INT32 {0, 1, 0, 0}
1: UARL ADDR[0].x, IMM[1].xxxx
2: MOV TEMP[0].x, CONST[3][ADDR[0].x].xxxx
3: COS TEMP[1].x, TEMP[0].xxxx
After:
0: COS TEMP[0].x, CONST[3][0].xxxx
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Otherwise, if we were creating a const buffer src register for a UBO
the index into the UBO was always zero.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
To implement the unlit_centroid_workaround, previously we emitted
(+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 1Q };
(-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 1Q };
where the flag register contains the channel enable bits from g0.
Since the predicates are complementary, the pair of pln instructions
write to non-overlapping components of the destination, which is the
case that the dependency control hints are designed for.
Typically setting dependency control hints on predicated instructions
isn't safe (if an instruction doesn't execute due to the predicate, it
won't update the scoreboard, leaving it in a bad state) but since we
must have at least one channel executing (i.e., +f0 is true for some
channel) by virtue of the fact that the thread is running, we can put
the +f0 pln instruction last and set the hints:
(-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 NoDDClr 1Q };
(+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 NoDDChk 1Q };
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This sequence (where both x and w are used afterwards) wasn't handled.
mul.sat x, y, z
...
mov.sat w, x
We assumed that if x was used after the mov.sat, that we couldn't
propagate the saturate modifier, but in fact x was already saturated.
So ignore the live range check if the producing instruction already
saturates its result. Cuts one instruction from hundreds of TF2 shaders.
total instructions in shared programs: 1995631 -> 1994951 (-0.03%)
instructions in affected programs: 155248 -> 154568 (-0.44%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
text data bss dec hex filename
4231165 123200 39648 4394013 430c1d i965_dri.so
4186277 123200 39648 4349125 425cc5 i965_dri.so
Cuts 43k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
text data bss dec hex filename
4244821 123200 39648 4407669 434175 i965_dri.so
4231165 123200 39648 4394013 430c1d i965_dri.so
Cuts 13k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
text data bss dec hex filename
4270747 123200 39648 4433595 43a6bb i965_dri.so
4244821 123200 39648 4407669 434175 i965_dri.so
Cuts 25k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This improves GLX DRI3 GPU offloading significantly on CPU
bound benchmarks particularly.
No performance impact for DRI2 GPU offloading.
v2: Add missing tests
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák<marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The differences with DRI2 GPU offloading are:
a) There's no logic for GPU offloading needed in the Xserver
b) for DRI2, the card would render to a back buffer, and
the content would be copied to the front buffer (the same buffers
everytime). Here we can potentially use several back buffers and copy
to buffers with no tiling to share with X. We send them with the
Present extension.
That means than the DRI2 solution is forced to have tearings with GPU
offloading. In the ideal scenario, this DRI3 solution doesn't have this
problem.
However without dma-buf fences, a race can appear (if the card is slow
and the rendering hasn't finished before the server card reads the buffer),
and then old content is displayed. If a user hits this, he should probably
revert to the DRI2 solution (LIBGL_DRI3_DISABLE). Users with cards fast
enough seem to not hit this in practice (I have an Amd hd 7730m, and I
don't hit this, except if I force a low dpm mode)
c) for non-fullscreen apps, the DRI2 GPU offloading solution requires
compositing. This DRI3 solution doesn't have this requirement. Rendering
to a pixmap also works.
d) There is no need to have a DDX loaded for the secondary card.
V4: Fixes some piglit tests
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Dave Airlie <airlied@redhat.com>
DRI_PRIME is not very handy, because you have to launch the executable
with it set, which is not always easy to do.
By using drirc, the user specifies the target executable
and the device to use. After that the program will be launched everytime
on the target device.
For example if .drirc contains:
<driconf>
<device driver="loader">
<application name="Glmark2" executable="glmark2">
<option name="device_id" value="pci-0000_01_00_0" />
</application>
</device>
</driconf>
Then glmark2 will use if possible the render-node of
ID_PATH_TAG pci-0000_01_00_0.
v2: Fix compilation issue
v3: Add "-lm" and rebase.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
v2: Fix the leak of device_name
v3: Rebased
It enables to use the DRI_PRIME env var to specify
which gpu to use.
Two syntax are supported:
If DRI_PRIME is 1 it means: take any other gpu than the default one.
If DRI_PRIME is the ID_PATH_TAG of a device: choose this device if
possible.
The ID_PATH_TAG is a tag filled by udev.
You can check it with 'udevadm info' on the device node.
For example it can be "pci-0000_01_00_0".
Render-nodes need to be enabled to choose another gpu,
and they need to have the ID_PATH_TAG advertised.
It is possible for not very recent udev that the tag
is not advertised for render-nodes, then
ones need to add a file containing:
SUBSYSTEM=="drm", IMPORT{builtin}="path_id"
in /etc/udev/rules.d/
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This in theory changes ABI for the boolean->bool I think,
but nothing in the tree uses configQueryb AFAICS.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just drops all the GL types from the xmlconfig and use
std C types from stdint and stdbool.
v2: drop further double and header include.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Update all three build systems, and add freedreno to the android
build. Pending future work on the ST we can convert egl-static
to provide either static or dynamic access to the pipe-drivers.
There is no functional change with this patch.
v2: Don't add freedreno to android build, drop the wrapper winsys.
Cc: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Move the gbm "target" code to the state-tracker, similar
to other - dri, omx, vdpau... ST.
v2: Drop inclusion of the wrapper winsys and softpipe/llvmpipe.
Cc: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Now we can build the xa target (libxatracker) with either static
pipe-drivers or shared ones. Currently we default to static.
- Remove the unused CFLAGS/CPPFLAGS.
- Use GALLIUM_TARGET_CFLAGS where applicable.
v2: Update the printout messages at configure.
v3: Drop inclusion of the wrapper winsys and softpipe/llvmpipe.
Cc: Jakob Bornecrantz <jakob@vmware.com>
Cc: Rob Clark <robclark@freedesktop.org>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
At this point, brw_disassemble can do everything gen8_disassemble can
do - and, thanks to the new brw_inst API, it supports all generations.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Previously, we decoded render target write messages as:
render ( RT write, 0, 16, 12, 0) mlen 8 rlen 0
which made you remember (or look up) what the numbers meant:
1. The binding table index
2. The raw message control, undecoded:
- Last Render Target Select
- Slot Group Select
- Message Type (SIMD8, normal SIMD16, SIMD16 replicate data, ...)
3. The dataport message type, again (already decoded as "RT write")
4. The write commit bit (0 or 1)
Needless to say, having to decipher that yourself is annoying. Now, we
do:
render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0
with optional "Hi" and "WriteCommit" for slot group/write commit.
Thanks to the new brw_inst API, we can also stop duplicating code on a
per-generation basis.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We haven't used the name "message target" in a while - there are a lot
of things called "target", and it gets confusing. SFID ("Shared
Function ID") is the term commonly used in the modern documentation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The name of this message is "Render Target UNORM Write" (Sandybridge
PRM, Volume 4 Part 1, Page 210). Drop the bogus 'c'.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Most developers will recognize the Gen6+ SFID names more quickly than
the Gen4-5 ones. Given that they're the same values, just use the new
names.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We should print something properly, but I'm not sure how to properly
print an HF, and we don't have any DFs today to test with.
This is at least better than the current Gen8 disassembler, which would
simply assert fail.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This backports the atomic message disassembly support from
gen8_disasm.c, which additionally offers support for decoding atomic
surface read/write messages, and showing SIMD modes and other details.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
I never bothered implementing the disassembler for Gen7+ URB opcodes, so
we were just disassembling them as Ironlake/Sandybridge ones. This
looked pretty bad when running Paul's GS EndPrimitive tests, as the
"write OWord" message was decoded at ff_sync, which doesn't exist.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
While we're adding things, use symbolic constants rather than magic
numbers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
These have existed since Ivybridge. We don't use them today, but the
Gen8+ disassembler supports them, and I'd like to use symbolic names
rather than magic numbers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This makes brw_disasm.c able to disassemble ELSE instructions correctly
on Broadwell. (gen8_disasm.c already handles this correctly.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Previously, our dissasembly for flow control instructions looked like:
0x00000040: else(8) ip 65540D { align16 switch };
It didn't print InstCount properly for ELSE/ENDIF, and didn't even
attempt to disassemble PopCount.
Now it looks like:
0x00000040: else(8) Jump: 4 Pop: 1 { align16 switch };
which is much more readable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Previously, flow control instructions generated output like:
(+f0) if(8) 12 8 null 0x000c0008UD { align16 WE_normal 1Q };
which included a dissasembly of the register fields, even though those
are meaningless for flow control instructions---those bits are reused
for another purpose.
It also wasn't immediately obvious which number was UIP and which was
JIP.
With this patch, we instead output:
(+f0) if(8) JIP: 8 UIP: 12 { align16 WE_normal 1Q };
which is much clearer.
The patch also introduces has_uip/has_jip helper functions which clear
up a some generation/opcode checking mess.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
While we're at it, use proper names rather than magic numbers for the
existing fields.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
brw_disasm.c basically wasn't following the Mesa coding style at all.
It used 4-space indent instead of 3-space, didn't cuddle braces, didn't
put function return types on a separate line, put extra spaces in
function calls (between the name and parenthesis), and a number of other
things.
This made it fairly obnoxious to work on, since my editor is configured
to follow Mesa style in the Mesa source repository. Fixing it to follow
a consistent style now should save time dealing with it later.
These modifications were originally generated by:
$ indent -br -i3 -npcs -ce -cs -l80 --no-tabs
with some manual changes afterwards to fit our style better.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
opcode is just a pointer to opcode_descs; we may as well use that
directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
As far as I can tell, the Intel mesa driver is the only driver in the world
still supporting this legacy extension. If someone wants to do bump
mapping, they can use shaders.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]
Reviewed-by: Ian Romanick <idr@freedesktop.org> [v3]
On i965, enabling and disabling the GS is not free: you have to do a
full pipeline stall, reconfigure the URB and push constant space, and
emit a bunch of state. Most clears aren't layered, so the GS isn't
needed in the common case. But we turned it on universally.
Using AMD_vertex_shader_layer allows us to skip setting up the GS
altogether, while achieving the same effect.
According to Ilia, current nVidia GPUs can't do AMD_vertex_shader_layer.
However, since nouveau is Gallium-based, they're unlikely to ever care
about this path. Intel and AMD GPUs both support the extension.
Since i965 is the only driver using this path which does layered
rendering, we may as well target it at that.
v2: Improve commit message. No code changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It should be possible to query the number of primitives written to each
individual stream by a geometry shader in a single draw call. For that
we need to have up to MAX_VERTEX_STREAM separate query objects.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
So far we have been using CL_INVOCATION_COUNT to resolve this query but this
is no good with streams, as only stream 0 reaches the clipping stage. Instead
we will use SO_PRIM_STORAGE_NEEDED which can keep track of the primitives sent
to each individual stream.
Since SO_PRIM_STORAGE_NEEDED is related to the SOL stage and according to
ARB_transform_feedback3 we need to be able to query primitives generated in
each stream whether transform feedback is active or not what we do is to
enable the SOL unit even if transform feedback is not active but disable all
output buffers in that case. This effectively disables transform feedback
but permits activation of statistics enabling SO_PRIM_STORAGE_NEEDED even
when transform feedback is not active.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Check if non-zero streams are used. Fail to link if emitting to unsupported
streams or emitting to non-zero streams with output type other than GL_POINTS.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will be necessary to implement EndStreamPrimitive().
EndPrimitive() will produce an ir_end_primitive with the default stream 0.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will be necessary to implement EmitStreamVertex().
EmitVertex() will produce an ir_emit_vertex with the default stream 0.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
If the geometry shader is indeed using streams then we need 2 control data
bits per vertex for the StreamID. If the shader is not using streams then
we don't need control data bits.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On Intel hardware when a geometry shader outputs GL_POINTS primitives we
only need to emit vertex control bits if it emits vertices to non-zero
streams, so use a flag to track this.
This flag will be set to TRUE when a geometry shader calls EmitStreamVertex()
or EndStreamPrimitive() with a non-zero stream parameter in a later patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Outputs that are linked to inputs in the next stage must be output to stream 0,
otherwise we should fail to link.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Configure hardware to read vertex data for all streams and have all streams
write their varyings to the corresponsing output buffers.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This implements parsing requirements for multi-stream support in
geometry shaders as defined in ARB_gpu_shader5.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes commit a001ca98e15(st/omx: keep the name,
(name|role)_specific strings dynamically allocated) in which we
dynamically allocated the buffers for name and (name|role)_specific
yet forgot to copy the encoder strings into them.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80614
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
All of the bits appear to already be in place to support this in the
sampler (which the original AMD version didn't allow).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This doesn't change anything to the intel DRI3 implementation,
but enables the gallium implementation to use dri2.stamp instead
of relying on the stamp shared with the st backend.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Move the image extension setup in with all the others in
bind_extensions, and improve the check to both version
and function pointer.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Dave Airlie <airlied@redhat.com>
While the official INTEL_swap_event specification says that the drawable
field should contain the GLXDrawable, not the Drawable, the existing
DRI2 code in dri2.c that translates from DRI2_BufferSwapComplete sends out
GLX_BufferSwapComplete with the Drawable's ID, so existing codebases
like Clutter/Cogl rely on getting the Drawable.
Match DRI2's error here and stuff the event with the X Drawable, not
the GLX drawable.
This fixes apps seeing wrong drawables through an indirect GLX context
or with DRI3, which uses the GLX_BufferSwapComplete event directly on
the wire instead of translates Present in mesa.
At the same time, also modify the structure for the event to make sure
that clients don't make the same mistake. This is not an API or ABI
break, as GLXDrawable and Drawable are both typedefs for XID.
Signed-off-by: Jasper St. Pierre <jstpierre@mecheye.net>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
MCS buffers are never allocated on Broadwell, so this does nothing for
now, but puts the infrastructure in place for when they do exist.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
According to the documentation, we don't need this SINT workaround on
Broadwell. (Or at least, it doesn't mention that we need it.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Broadwell generalizes the MCS fields to allow for multiple kinds of
auxiliary surfaces. This patch adds the plumbing to set those values,
but doesn't yet hook any up.
v2: (by Jordan Justen) Use mt for qpitch; pitch is tiles - 1.
v3: Don't forget to subtract 1 from aux_mt->pitch.
v4: Drop unnecessary aux_mt->offset (caught by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Prior to the new brw_inst API, the brw_instruction structure split off
bits 4 and 5 of msg_control for specific fields, and we failed to
disassemble them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
They don't have a UIP. We used UIP in an array dereference, which never
caused problems on Gen < 8, since UIP was a small integer (number of
instructions). On Gen 8 UIP is in bytes, so it's large enough that it
caused us to read out of bounds of the array.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now nothing uses this, but we can incrementally convert.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use this an an opportunity to clean up the formatting of some old code
(brw_ADD, for instance).
Signed-off-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: (by Kenneth Graunke)
- Fix disassembly of Gen4-5 SEND messages to print base MRF correctly.
- Only print URB opcode on Gen5+, to match previous output (besides,
there is only one opcode AFAICT.)
- Only print the low 3 bits of msg_control, to match previous output.
(We probably should decode all the fields, but hadn't previously due
to the brw_instruction structure definition splitting out bits 4/5
for last_render_target and slot_group_select.)
- Fix 3-source MRF/GRF file decoding on Sandybridge.
- Fix compression code to use qtr_control rather than cmpt_control
(which is compaction, not compression).
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> [v2]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use brw_inst_bits rather than pulling out individual fields and
reassembling them.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're going to use fs_generator/vec4_generator for Gen8+ code soon,
thanks to the new brw_instruction API. When we do, we'll generally
want to take the Haswell paths on Gen8+ as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2:
- Fix IF -> ELSE patching on Sandybridge.
- Don't set base_mrf on Gen6+ in OWord Block Read functions. (Although
- the old code did this universally, it shouldn't have - the field
- doesn't exist on Gen6+ and just got overwritten by the SFID anyway.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Don't set flag_reg_nr prior to Gen7 (as it doesn't exist).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is similar to gen8_instruction, and will eventually replace it.
For now nothing uses this, but we can incrementally convert.
The new API takes the existing brw_instruction pointers to ease
conversion; when done, we can simply drop the old structure and rename
struct brw_instruction -> brw_inst.
v2: (by Matt Turner) Make JIP/UIP functions take a signed argument.
v3: (by Kenneth Graunke)
- Make Gen4-6 jump target functions take a signed argument.
- Fix indirect align1 AddrImm bits on Gen4-7.
- Fix SFID on Sandybridge to use bits 27:24.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> [v1, v3+]
Signed-off-by: Matt Turner <mattst88@gmail.com> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com>
The new brw_inst API is going to require a brw pointer in order
to access fields (so it can do generation checks). Plumb it in now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to the workarounds list, these stalls aren't needed on
production Baytrail systems. Piglit confirms that as well.
These cause a small slowdown when we are sending a large number of small
batches to the GPU. Removing these improves performance by up to 5% on
some CPU bound SynMark tests (Batch[4-7], DrvState1, HdrBloom,
Multithread, ShMapPcf).
Signed-off-by: Gregory Hunt <greg.hunt@mobica.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Intel would like us to include the marketing names. Developers
additionally want "Broadwell GT1/2/3" because it makes it easier
to identify what hardware users have when they request assistance
or report issues.
Including both makes it easy for everyone to map between the names.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
The flags are not specific to the video targets plus
we can reuse them for targets/xa and targets/gbm.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Required for the conversion stage of all VL targets to
a single library per API (static/shared pipe-drivers).
No longer required as per last commit.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The radeonsi counterpart of previous commit - now libomx-radeonsi is
built into the libomx-mesa library. Providing a single library per API.
v2: Include the radeon winsys only when there is a user for it.
v3: Correcly include the winsys. Now with extra brown bag :\
Note: Make sure to rebuild the .omxregister file, by executing
$ omxregister-bellagio
This patch concludes the unification. Now libomx-mesa will be used
for all hardware - r600, radeonsi and nouveau.
Cc: Leo Liu <leo.liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The r600 counterpart of previous commit - now the libomx-r600 is
built into the libomx-mesa library. Providing a single library per API.
v2: Include the radeon winsys only when there is a user for it.
v3: Correcly include the winsys. Now with extra brown bag :\
Note: Make sure to rebuild the .omxregister file, by executing
$ omxregister-bellagio
If you have more than one omx library (libomx-radeonsi, libomx-r600),
make sure to temporary move the unused one. By the end of the series
there will be only one library that will be used for all hardware -
r600, radeonsi and nouveau.
Cc: Leo Liu <leo.liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Similar to the vdpau/xvmc targets, we're going to convert the
multiple target libraries into a single one.
The library can be built with the relevant pipe-drivers
statically linked in, or loaded as shared modules.
Currently we default to static.
Note: Make sure to rebuild the .omxregister file, by executing
$ omxregister-bellagio
If you have more than one omx library (libomx-radeonsi, libomx-r600),
make sure to temporary move the unused one. By the end of the series
there will be only one library that will be used for all hardware -
r600, radeonsi and nouveau.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Strictly speaking we should not have done this in the
first place, as all of the above should be static across
the system.
Currently this may cause some minor issues, which will be
resolved in the following patches, by providing a single
library for the OMX api.
Cleanup a few unneeded strcpy cases while we're around.
Note: Make sure to rebuild the .omxregister file, by executing
$ omxregister-bellagio
If you have more than one omx library (libomx-radeonsi, libomx-r600),
make sure to temporary move the unused one. By the end of the series
there will be only one library that will be used for all hardware -
r600, radeonsi and nouveau.
Cc: Leo Liu <leo.liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The number of components and their names/roles should
be kept constant as all of that information cached.
Note: Make sure to rebuild the .omxregister file, by executing
$ omxregister-bellagio.
Cc: Leo Liu <leo.liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Check calloc return value and report on error, also later skip
results handling if there was no memory to store results to.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This only makes any sense on the GS input or output layout declaration,
nowhere else.
Fixes the piglit tests:
* spec/glsl-1.50/compiler/incorrect-in-layout-qualifiers-with-variable-declarations.geom
* spec/glsl-1.50/compiler/incorrect-out-layout-qualifiers-with-variable-declarations.geom
* spec/glsl-1.50/compiler/layout-fs-no-output.frag
* spec/glsl-1.50/compiler/layout-vs-no-input.vert
* spec/glsl-1.50/compiler/layout-vs-no-output.vert
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we disallowed any combination of layout with interpolation,
invariant, or precise qualifiers. There is very little spec guidance on
exactly which combinations should be allowed, but with ARB_sso it's
useful to allow these qualifiers with rendezvous-by-location.
Since it's unclear exactly where the layout qualifier should appear when
combined with other qualifiers, we will allow it anywhere before the
auxiliary storage qualifier.
This allows enough flexibility for all examples I've seen, while keeping
the auxiliary-storage-qualifier / storage-qualifier pair together (as
they are a single qualifier in the spec prior to
ARB_shading_language_420pack)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa has an optimization that converts expressions like "v.x + v.y + v.z
+ v.w" into dot(v, 1.0). And therein lies the rub: the other operand to
the dot-product is always a float... even if the vector is an ivec or
uvec. This results in an assertion failure in ir_builder.
If the base type of the operand is not float, don't try the
optimization. Dot-product is not valid on integer data.
Fixes piglit vs-integer-reduction.shader_test and OpenGL ES conformance
test ES2-CTS.gtf.GL2Tests.glGetUniform.glGetUniform.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
The emit_math?_gen? functions serve to implement workarounds for the
math instruction, none of which exist on Gen8+.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For the first use of a buffer, we will only need the temporary
resource in the case that a user wants to write/map to this buffer.
But in the cases where the user creates a buffer to act as an
output of a kernel, then we were creating an unneeded resource,
because it will contain garbage, and would be copied to the pool,
and destroyed when promoting.
This patch avoids the creation and copies of resources in
this case.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The important part is the change of the condition to <= 0. Otherwise the loop
gets stuck never actually growing the pool.
The change in the aux-need calculation guarantees max 2 iterations, and
avoids wasting memory in case a smaller item can't fit into a relatively larger
pool.
Reviewed-by: Bruno Jiménez <brunojimen@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
The dst pointer needs to be initialized after any calls to
compute_memory_grow_pool, as the function might change the pool->vbo pointer.
This fixes crashes and assertion failures in two gegl tests.
Reviewed-by: Bruno Jiménez <brunojimen@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
nouveau screens are reused for the same device node. However in the
scenario where we create screen 1, screen 2, and then delete screen 1,
the surrounding code might also close the original device node. To
protect against this, dup the fd and use the dup'd fd in the
nouveau_device. Also tell the nouveau_device that it is the owner of the
fd so that it will be closed on destruction.
Also make sure to free the nouveau_device in case of any failure.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79823
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@ubuntu.com>
Previously, if we had something like:
gl_ViewportIndex = idx;
for(int i = 0; i < gl_in.length(); i++) {
gl_Position = gl_in[i].gl_Position;
EmitVertex();
}
EndPrimitive();
The right viewport index would not be set on the primitive because the
last vertex is the provoking one. However blob drivers appear to move
the gl_ViewportIndex write into the for loop, allowing the application
to be ignorant of this detail.
While the application is technically wrong here, because the blob does
it and other drivers appear to implicitly work this way as well, we add
a buffer register that viewport index writes go into, which is then
exported before every EmitVertex() call.
This fixes the remaining piglit tests in ARB_viewport_array for nv50/nvc0.
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The old logic would let all negative values go through unclamped, with
potentially disastrous results (probably trying to fetch viewport values
from random memory locations). GL has undefined rendering for vp indices
outside valid range but that's a bit too undefined...
(The logic is now the same as in llvmpipe.)
CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
As far as I can tell, Broadwell doesn't need any of the SURFACE_STATE
workarounds for textureGather() bugs, so there's no need to emit
a second set of identical copies.
To keep things simple, just point the gather surface index base to the
same place as the texture surface index base.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
With commit 11e46a32ae and f9ebb1ea77 we resolved the symlink
generation required by the versioning of the library.
Although they incorrectly changed the way hardlinks are created by
linking to the ones from the build tree. If the device used for
building differs from the one set as destination linking will fail.
Reported-by: Andy Furniss <adf.lists@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Previously the blorp blitter would only be used if the format is identical or
there is only a difference between whether there is an alpha component or not.
This patch makes it also allow the blorp blitter if the only difference is the
ordering of the RGB components (ie, RGB or BGR).
This is particularly useful since commit 61e264f4fc because Mesa now
prefers RGB ordering for textures but the window system buffers are still
created as BGR. That means that the blorp blitter won't be used for the
(probably) common case of blitting from a texture to the window system buffer.
This doesn't cause any regressions in the FBO piglit tests on Haswell. On
Sandybridge it causes the fbo-blit-stretch test to fail but that is only
because it was failing anyway before the above commit and that commit hid the
problem.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68365
Reviewed-by: Matt Turner <mattst88@gmail.com>
In file included from ../../src/glsl/builtin_functions.cpp:61:0:
../../src/glsl/glsl_parser_extras.h:154:9: warning: unused parameter 'var' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Fix an off by one in the texture unit walk during texblend
setup on gen2. This caused the last enabled texunit to be
skipped resulting in totally messed up texturing.
This is a regression introduced here:
commit 1ad443ecdd
Author: Eric Anholt <eric@anholt.net>
Date: Wed Apr 23 15:35:27 2014 -0700
i915: Redo texture unit walking on i830.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
The r600 equivalent of previous commit.
v2: Correctly include the radeon winsys/radeon_common.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Similar to vdpau targets, we're going to convert the individual
target libraries into a single one.
The library can be built with the relevant pipe-drivers
statically linked in, or loaded as shared modules.
Currently we default to static.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Similar to previous commits, this allows us to minimise some
of the duplication by compacting all vdpau targets into a
single library.
v2: Include the radeon winsys only when there is a user for it.
v3: Correcly include the winsys. Now with extra brown bag :\
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Similar to previous commit, this allows us to minimise some
of the duplication by compacting all vdpau targets into a
single library.
v2: Include the radeon winsys only when there is a user for it.
v3: Correcly include the winsys. Now with extra brown bag :\
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Create a single library (for the vdpau api) thus reducing
the overall size of mesa. Current commit converts
vdpau-nouveau, with upcomming commits handling the rest.
The library can be built with the relevant pipe-drivers
statically linked in, or loaded as shared modules.
Currently we default to static.
Add SPLIT_TARGETS to guard the other VL targets.
Note: symlink handling is rather ugly and will need an
update to work with BSD and other non-linux platforms.
v2: Split the conversion into per-target basis.
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Blob driver seems to need WFI in some cases after CP_EVENT_WRITE,
implying that this is asynchronous and should reset needs_wfi.
Also, CP_INVALIDATE_STATE seems to need WFI. But CP_LOAD_STATE
does not.
The blob driver also puts WFIs before writing GRAS_CL_VPORT registers.
The latter may be a work-around, as these registers should be banked/
context registers. I haven't yet found a lockup that this averts, but
I expect viewport to change infrequently so out of paranoia I will
keep these for now.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The spec doesn't actually mention adding this, but this is the usual
pattern so I'm assuming it's a spec bug.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This extension is purely GLSL -- there are no new GL API elements.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When the last context in a share group is destroyed, the hash table
containing all of the shader programs (ctx->Shared->ShaderObjects) is
destroyed, throwing away all of the shader programs.
Using a static variable to store program IDs ends up holding on to them
after this, so we think we still have a compiled program, when it
actually got destroyed. _mesa_UseProgram then hits GL errors, since no
program by that ID exists.
Instead, store the program IDs in the context, so we know to recompile
if our context gets destroyed and the application creates another one.
Fixes es3conform tests when run without -minfmt (where it creates
separate contexts for testing each visual).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77865
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Prior to GLX 1.3 there was the glxMakeCurrent() function that took a
single drawable handle. The Drawable could be either a bare XID for a
Window or an XID for a glxpixmap.
GLX 1.3 added glxMakeContextCurrent that takes 2 handles: one for
reading, one for writing. Nowadays the old glxMakeCurrent call is
implemented as a call to glxMakeContextCurrent with the single handle
duplicated.
Because of this it is allowed to use a plain-old Window ID as an
argument to glxMakeContextCurrent, although nobody really documents this
sort of thing. The manpage for the NEW call specifies the arguments as
GLXPixmaps, but the actual code accepts Window XIDs too, and handles
them correctly.
Similarly, the glxSelectEvents function can also take a bare Window XID.
The "piglit" tests all use GLXWindows and/or GLXPixmaps. You never
tested swap events with a bare Window XID. That is what my app was
doing.
The swap_events code worked with Window XIDs in mesa 7.x.y. The new code
added in versions 8, 9, and 10 assumes that all buffer swap events have
a GLXPixmap associated with them. Because of the historical quirks
above, this is not true. Swap events for bare Window XIDs do NOT have a
glxpixmap resulting in a segfault.
Any app that uses the old school glxMakeCurrent call with a Window XID
while trying to use swap_events will crash when the libs try to lookup
the nonexistent GLXPixmap associated with the incoming swap event.
I believe that the people who wrote the spec overlooked this, because
the "sbc" field comes from the OML_sync extension that is defined in
terms of glxpixmaps only.
v2 (idr): Formatting changes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54372
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
With this we can assure that mapped buffers will never change
its position when relocating the pool.
This patch should finally solve the mapping bug.
v2: Use the new is_item_in_pool util function,
as suggested by Tom Stellard
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This function will be used when we want to map an item
that it's already in the pool.
v2: Use temporary variables to avoid so many castings in functions,
as suggested by Tom Stellard
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Acording to the OpenCL spec, it is possible to have a buffer mapped
for reading and at read from it using commands or buffers.
With this we can keep the mapping (that exists against the
temporary item) and read with a kernel (from the item we have
just added to the pool) without problems.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Now we will have a list with the items that are in the pool
(item_list) and the items that are outside it (unallocated_list)
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
These statuses will help track whether the items are mapped
or if they should be promoted to or demoted from the pool
v2: Use the new is_item_in_pool util function,
as suggested by Tom Stellard
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This patch changes completely the way buffers are added to the
compute_memory_pool. Before this, whenever we were going to
map a buffer or write to or read from it, it would get placed
into the pool. Now, every unallocated buffer has its own
r600_resource until it is allocated in the pool.
NOTE: This patch also increase the GPU memory usage at the moment
of putting every buffer in it's place. More or less, the memory
usage is ~2x(sum of every buffer size)
v2: Cleanup
v3: Use temporary variables to avoid so many castings in functions,
as suggested by Tom Stellard
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The intention of this pass was to give us better instruction scheduling
opportunities, but it unexpectedly reduced some instruction counts as
well:
total instructions in shared programs: 1666639 -> 1666073 (-0.03%)
instructions in affected programs: 54612 -> 54046 (-1.04%)
(and trades 4 SIMD16 programs in SS3)
Previously llvm detected cpu features automatically when the execution engine
was created (based on host cpu). This is no longer the case, which meant llvm
was then not able to emit some of the intrinsics we used as we didn't specify
any sse attributes (only on avx supporting systems this was not a problem since
despite at least some llvm versions enabling it anyway we always set this
manually). So, instead of trying to figure out which MAttrs to set just set
MCPU.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=77493.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
An LLVMContext should only be accessed by a single and using the global
context was causing crashes in multi-threaded environments. Now we use
a separate context for each compile.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Will be used to control the linking mode of pipe-drivers
in gallium targets.
Keep this hardcoded to static, as the pipe-drivers bare
an unstable interface which we do not want to expose to
the normal user.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- gallium_pipe_loader_winsys_libs
Will be used in upcomming commits to reduce duplication
in the build.
v2: Drop the megadriver/static_target variables.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Add a couple of helpers to be used by the dri targets when
built with static pipe-drivers. Both functions provide
functionality required by the dri state-tracker.
With this patch ilo, nouveau and r300 gain support for
throttle dri configuration.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Will be used by gallium targets that statically link the
pipe-drivers in the final library. Provides identical
functionality to device_descriptor.create_screan.
v2:
- Don't sw_screen_wrap the i915/svga screen.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
If memory serves me right, at least one debug wrapper does
not return the base screen on failure.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Required for the dri state-tracker. Will be used to retrieve
driver specific configuration parameters:
- share_fd (dmabuf) capability
- throttle
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This is just a hack, it should be possible to create a temporary zeta
surface and render to that instead. However that's more complicated and
this avoids the render being entirely broken and errors being reported
by the card.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Commit ad4dc772 fixed an issue with the viewport not being restored
correctly. However it's rather hackish and confusing. Instead just mark
the viewport dirty and let the viewport validation take care of it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The extension is always supported if GLSL 1.30 is supported.
Softpipe and llvmpipe support is also added (trivial).
Radeon and nouveau support is already done.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It was set to pipe_resource::last_level and _MaxLevel was embedded in max_lod,
that's why it worked for ordinary texturing. However, min_lod doesn't have
any effect on texelFetch and textureQueryLevels, so we must still set
last_level correctly.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Such conversions (which are most likely rather pointless in practice) were
resulting in shifts with negative shift counts and shifts with counts the same
as the bit width. This was always undefined in llvm, the code generated was
rather horrendous but happened to work.
So make sure such shifts are filtered out and replaced with something that
works (the generated code is still just as horrendous as before).
This fixes lp_test_format, https://bugs.freedesktop.org/show_bug.cgi?id=73846.
v2: prettify by using build context shift helpers.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
A drawable size of 0x0 means that we don't have buffers for a drawable yet,
not that we have a zero-sized buffer. Core mesa shouldn't be optimizing out
drawing based on buffer size, since the draw call could be what triggers
the driver to go and get buffers. As discussed in the referenced bug report,
the optimization was added as part of a scatter-shot attempt to fix a
different problem. There's no other example in mesa core of using the
buffer size in this way.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74005
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Type mismatch caused random memory to be copied when casted
memory area was smaller than expected type.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
because no matching fbConfigs or visuals could be found.
Nearly all the error cases in *createScreen() issue an error message to diagnose
the failure to initialize before branching to handle_error. The few remaining
error cases which don't should probably do the same.
(At the moment, it seems this can be triggered in drisw with an X server which
reports definite values for MAX_PBUFFFER_(WIDTH|HEIGHT|SIZE), because those
attributes are checked for an exact match against 0.)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
They have to be marked as structs for C code elsewhere. bblock_t is
already defined as a struct, and all of backend_instruction's fields are
public anyway.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since CSE creates instructions, if we let CSE generate things register
coalescing can't remove, bad things will happen. Only let CSE combine
non-copy load_payloads.
E.g., allow CSE to handle this
load_payload vgrf4+0, vgrf5, vgrf6
but not this
load_payload vgrf4+0, vgrf5+0, vgrf5+1
Patch adds a type check between switch init-expression and case label
and performs a implicit signed->unsigned type conversion when possible.
v2: add GLSL spec reference, do implicit conversion if possible (Matt)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79724
Reviewed-by: Matt Turner <mattst88@gmail.com>
Like on Haswell, we need to use 8x4 aligned rectangle primitives for
hierarchical depth buffer resolves and depth clears. See the comments
in brw_blorp.cpp's brw_hiz_op_params() constructor. (The Broadwell
documentation confirms that this is still necessary.)
This patch makes the Broadwell code follow the same behavior as Chad and
Jordan's Gen7 BLORP code. Based on a patch by Topi Pohjolainen.
This fixes es3conform's framebuffer_blit_functionality_scissor_blit
test, with no Piglit regressions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
We only enable HiZ for miplevels which are aligned on 8x4 blocks. When
debugging HiZ failures, it's useful to know whether a particular
miplevel is using HiZ or not.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
flags.q.local_size has 3 bits. One each for x, y and z.
Fixes piglit's:
* spec/ARB_compute_shader/linker/mismatched_local_work_sizes
* spec/ARB_compute_shader/compiler/default_local_size.comp
* spec/ARB_compute_shader/compiler/work_group_size_too_large
* spec/ARB_compute_shader/compiler/gl_WorkGroupSize_matches_layout.comp
This was regressed in 738c9c3c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, we would parse MESA_EXTENSION_OVERRIDE each time a context
was created. Now we will save the results of that parsing and use it
during context initialization.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will allow us to utilize the early MESA_EXTENSION_OVERRIDE
parsing at the later extension string initialization step.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will allow us to utilize the early MESA_EXTENSION_OVERRIDE
parsing at the later extension string initialization step.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In 25268b93, we added a new environment variable
(INTEL_COMPUTE_SHADER) to allow some constant values to be upgraded
for the ARB_compute_shader extension.
Now, we can look to see if the extension was enabled via the
MESA_EXTENSION_OVERRIDE environment variable.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
During the early one_time_init phase of context creation, we
initialize two global gl_extensions structures.
We read the MESA_EXTENSION_OVERRIDE environment variable, and store
positive and negative overrides in two structures:
* struct gl_extensions _mesa_extension_override_enables
* struct gl_extensions _mesa_extension_override_disables
These are filled before the driver initializes extensions and
constants, therefore the driver can make adjustments based on the
desired overrides.
This can be useful during development of a new extension where the
extension is only partially ready. The driver can't actually advertise
support for the extension, but if it sees that the override is set for
the extension, then it can expose more supported parts of the
extension, such as upgrading context constants.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We will add new gl_extensions structures that capture the environment
variable extension overrides and are available early in context
creation.
This will allow a driver to take actions during its initialization
based on the extension overrides.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously setting:
MESA_EXTENSION_OVERRIDE=-GL_MESA_ham_sandwich
Would cause Mesa to advertise support for the GL_MESA_ham_sandwich
extension, even though the override specifically asked for it to be
disabled.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Patch adds a preprocessor define for the extension and stores explicit
location data for uniforms during AST->HIR conversion. It also sets
layout token to be available when having the extension in place.
v2: change parser check to require GLSL 330 or enabling
GL_ARB_explicit_attrib_location (Ian)
v3: fix the check and comment in AST->HIR (Petri)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Support inactive uniforms that have explicit location set in
glUniform* functions.
v2: remove unnecessary extension check, use new define (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patch refactors the existing uniform processing so explicit locations
are taken in to account during variable processing. These locations
are temporarily stored in gl_uniform_storage before actual locations
are set.
UNMAPPED_UNIFORM_LOC marks unset location so that we can use 0 as a
valid explicit location.
When locations are set, UniformRemapTable is first populated with
uniforms that have explicit location set (inactive and active ones),
rest are put after explicit location slots.
v2: introduce define for locations that have not been set yet (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patch initializes the UniformRemapTable for explicit locations. This
needs to happen before optimizations to make sure all inactive uniforms
get their explicit locations correctly.
v2: fix initialization bug, introduce define for inactive uniforms (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function calculates the number of unique values from
glGetUniformLocation for the elements of the type.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patch adds new implementation dependent value required by the
GL_ARB_explicit_uniform_location extension. Default value for user
assignable locations is calculated as sum of MaxUniformComponents
for each stage.
v2: fix descriptor in get_hash_params.py (Petri)
v3: simpler formula for calculating initial value (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We've used the LD sampler message for pull constant loads on earlier
hardware for some time, and also were already using it for the FS on
Broadwell. This patch makes us use it for Broadwell VS/GS as well.
I believe that when I wrote this code in 2012, we still used the data
port in some cases, and I somehow neglected to convert it while
rebasing.
Improves performance in GLBenchmark 2.7 Egypt by 416.978% +/- 2.25821%
(n = 17). Many other applications should benefit similarly: this speeds
up uniform array access in the VS, which is commonly used for skinning
shaders, among other things.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
When faced with code such as:
mov vgrf31.0:UD, 960D
mov vgrf31.1:UD, vgrf30.xxxx:UD
The dead code eliminator didn't consider reg_offsets, so it decided that
the second instruction was writing was writing to the same register as
the first one, and eliminated the first one. But they're actually
different registers.
This fixes INTEL_DEBUG=shader_time for vertex shaders. In the above
code, vgrf31.0 represents the offset into the shader_time buffer where
the data should be written, and vgrf31.1 represents the actual time
data. With a completely undefined offset, results were...unexpected.
I think this is probably one of the few cases (maybe only case) where we
generate multiple MOVs to a large VGRF. Normally, we just use them as
texturing results; the other SEND-from-GRF uses a size 1 VGRF.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79029
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
"shader_time_add" is a lot more informative than "op152".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fix clang mismatched-tags warnings introduced with commit
4f5445a45d.
./glsl_symbol_table.h:37:1: warning: class 'glsl_type' was previously declared as a struct [-Wmismatched-tags]
class glsl_type;
^
./glsl_types.h:86:8: note: previous use is here
struct glsl_type {
^
./glsl_symbol_table.h:37:1: note: did you mean struct here?
class glsl_type;
^~~~~
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch fixes several clang constant-logical-operand warnings such as
the following.
../../../../../src/mesa/tnl_dd/t_dd_tritmp.h:130:32: warning: use of logical '||' with constant operand [-Wconstant-logical-operand]
if (DO_TWOSIDE || DO_OFFSET || DO_UNFILLED || DO_TWOSTENCIL)
^ ~~~~~~~~~~~
../../../../../src/mesa/tnl_dd/t_dd_tritmp.h:130:32: note: use '|' for a bitwise operation
if (DO_TWOSIDE || DO_OFFSET || DO_UNFILLED || DO_TWOSTENCIL)
^~
|
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some apps seem to give us a null sampler/view for texture slots which
come before the last used texture slot. In particular 0ad triggers
this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Gallium (but not OpenGL) does allow nesting of queries, but there's no
limit specified (d3d10 has no limit neither). Nevertheless, for practical
purposes we need some limit in llvmpipe, otherwise we'd need more complex
handling of queries as we need to keep track of all binned queries (this
only affects queries which gather data past setup). A limit of 16 is too
small though, while 64 would suffice.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The comment for _mesa_is_type_integer is confusing because it says that it
returns whether the type is an “integer (non-normalized)” format. I don't
think it makes sense to say whether a type is normalized or not because it
depends on what format it is used with. For example, GL_RGBA+GL_UNSIGNED_BYTE
is normalized but GL_RGBA_INTEGER+GL_UNSIGNED_BYTE isn't. If the normalized
comment is just a mistake then it still doesn't make much sense because it is
missing the packed-pixel types such as GL_UNSIGNED_INT_5_6_5. If those were
added then it effectively just returns type != GL_FLOAT.
That function was only used in _mesa_is_enum_format_or_type_integer. This
function effectively checks whether the format is non-normalized or the type
is an integer. I can't think of any situation where that check would make
sense.
As far as I can tell neither of these functions have ever been used anywhere
so we should just remove them to avoid confusion.
These functions were added in 9ad8f431b2.
Reviewed-by: Brian Paul <brianp@vmware.com>
That commit made possible that the items could be one just
after the other when their size was a multiple of ITEM_ALIGNMENT.
But compute_memory_prealloc_chunk still looked to leave a gap
between items. Resulting in that we got an infinite loop when
trying to add an item which would left no space between itself and
the next item.
Fixes piglit test: cl-custom-r600-create-release-buffer-bug
And the test for alignment I have just sent:
http://lists.freedesktop.org/archives/piglit/2014-June/011135.html
Sorry about this.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The code that parses LIBGL_DRIVERS_PATH was printing an
error for every attempted dlopen. It's not an error to
have to check multiple items in the path, only an error if
no suitable library is found. Reduced the load error to
a warning to match behavior of dynamic linker.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Use the has_streamout flag as we do elsewhere to check if we need
to call pipe->set_stream_output_targets(). The driver might implement
the set_stream_output_targets() function, but not for all hardware
configurations.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When a multisampled texture is used for sampling the fast clear color value
needs to be programmed into the surface state. This was being left as all
zeroes so if the surface was cleared to a value other than black then it
wouldn't work properly. This doesn't matter for single-sample textures because
in that case the MCS buffer is resolved before it is used as a texture source.
https://bugs.freedesktop.org/show_bug.cgi?id=79729
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
We only need to alter the default state if we're emitting MOVs for
header related fields. So, we can simply move the push/pop of state in
to the if (header_present) block, bypassing it in the common case.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
In commit dc2d3a7f5c, Iago accidentally
moved fire_fb_write() above the brw_pop_insn_state(), which caused the
SEND to lose its predication and change from WE_normal to WE_all.
Haswell uses predicated SENDs for discards, so this broke Piglit's
tests for discards.
We want the Gen4-5 MOV to be uncompressed, unpredicated, and unmasked,
but the actual FB write itself should respect those. So, pop state
first, and force it again around the single MOV.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
Will simplify the automated conversion if we want to allow compiling the
driver for a single generation.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This makes sure to use a no-op swizzle while iteratively rendering each
level of a mipmap otherwise we may loose components and effectively
apply the swizzle twice by the time these levels are sampled.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we would emit the comparison, emit an AND to mask off extra
bits from the comparison result, then convert the result to float. Now,
do the comparison, then use a cleverly constructed SEL to pick either
0.0f or 1.0f.
No piglit regressions on Ivybridge.
total instructions in shared programs: 1642311 -> 1639449 (-0.17%)
instructions in affected programs: 136533 -> 133671 (-2.10%)
GAINED: 0
LOST: 0
Programs that are affected appear to save between 1 and 5 instuctions
(just by skimming the output from shader-db report.py.
v2: s/b2i/b2f/ in commit subject (noticed by Chris Forbes). Remove
extraneous fix_3src_operand (suggested by Matt). The latter change
required swapping the order of the operands and using predicate_inverse.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes the limits for GL 3.2, and subsequently fixes
some segfaults in some varying packing tests and max varying tests
after the limits bumped.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for GL 3.2 layered rendering to softpipe.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the layer info to the tile cache.
This changes clear_flags to be dynamically allocated as
MAX_LAYERS seems like a too big step.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This passes the piglit depth clamp tests.
this is required for GL 3.2.
v2: move min/max up one level, could go further, thanks
to Roland for suggestion.
v1: Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This limits the number of emitted vertices to the shaders max output
vertices, and avoids us writing things into memory that isn't big
enough for it.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add Intel driver hook for glGetTexImage to accelerate the case of reading
texture image into a PBO. This case gets huge performance gains by using
GPU BLIT directly to PBO rather than GPU BLIT to temporary texture followed
by memcpy.
No regressions on Piglit tests with Intel driver.
Performance gain (1280 x 800 FBO, Ivybridge):
glGetTexImage + glMapBufferRange with patch 1.45 msec
glGetTexImage + glMapBufferRange without patch 4.68 msec
v3: (by Kenneth Graunke)
- Fix compile after Eric's change to drop the tiling argument
to intel_miptree_create_for_bo.
- Add GL_TEXTURE_3D to blacklisted texture targets to prevent Piglit
regressions.
- Squash in several whitespace and coding style fixes.
When walking backwards, we want to stop at the head sentinel, which is
where scan_inst->prev->prev == NULL, not scan_inst->prev == NULL.
Fixes random crashes, as well as valgrind errors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Giving the meta clear program a meaningful name makes it easier to find
in output such as INTEL_DEBUG=fs or INTEL_DEBUG=shader_time. We already
did so for integer programs, but neglected to label the primary program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These used to call different math emitters (brw_math vs. brw_math2).
Now that they both call gen6_math, they're virtually identical.
When unrolling SIMD16 to multiple SIMD8 operations, we should take care
not to apply sechalf to brw_null_reg for src1. Otherwise, we'd end up
with BRW_ARF_NULL + 1 as the register number, and I'm not sure if that's
valid.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
These functions are basically identical, so we should combine them.
However, they're so trivial, we may as well just fold them into their
only call sites.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
These are trivial to combine: we should just avoid checking the second
operand if it's brw_null_reg.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Usually, I try to use "brw" for functions that apply to all generations,
and "gen4" for dead end/legacy code that is only used on Gen4-5.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Our existing functions, brw_math and brw_math2, had unclear roles:
Gen4-5 used brw_math for both unary and binary math functions; it never
used brw_math2. Since operands are already in message registers, this
is reasonable.
Gen6+ used brw_math for unary math functions, and brw_math2 for binary
math functions, duplicating a lot of code. The only real difference was
that brw_math used brw_null_reg() for src1.
This patch improves brw_math2's assertions to allow both unary and
binary operations, renames it to gen6_math(), and drops the Gen6+ code
out of brw_math().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Thread switching on control flow instructions is a documented workaround
for Gen4-5 errata. As far as I can tell, it hasn't been needed since
Sandybridge. Thread switching is not free, so in theory this may help
performance slightly.
Flow control instructions with the "switch" flag cannot be compacted, so
removing it will make these instructions compactable. (Of course, we
still have to implement compaction for flow control instructions...)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
total instructions in shared programs: 2081469 -> 2081248 (-0.01%)
instructions in affected programs: 22606 -> 22385 (-0.98%)
No programs were hurt by this patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Only function-defs use glsl_type so forward declare instead.
Compile-tested on my Ivy-bridge system.
IWYU also suggests removing #include <new>, and this compiles fine.
I'm not familiar enough with memory management in C/C++ that I feel
comfortable removing this. Insights would be appreciated.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Found with IWYU. Comment says it's for struct gl_extensions.
Grepping for gl_extensions shows no uses.
Tested by compiling on my Ivy-bridge system.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Found with IWYU, compile-tested on my Ivy-bridge system.
This is not used in the header, and is included in the source.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Found with IWYU, confirmed with grepping for "hash" and "symbol".
No negative effects on compilation.
IWYU also reported core.h and linker.h could be removed,
but I'm unsure if those are false positives.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
This fixes an issue when running cl-program-bitcoin-phatk
piglit test where some of the inputs have negative values
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Now, items whose size is a multiple of 1024 dw won't leave
1024 dw between itself and the following item
The rest of the cases is left as it was
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Removed compute_memory_defrag declaration because it seems
to be unimplemented.
I think that this function would have been the one that solves
the problem with fragmentation that compute_memory_finalize_pending has.
Also removed comments that are already at compute_memory_pool.c
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Explanation of the changes, as requested by Tom Stellard:
Let's take need after is calculated as
item->size_in_dw+2048 - (pool->size_in_dw - allocated)
BEFORE:
If need is positive or 0:
we calculate need += 1024 - (need % 1024), which is like
cealing to the nearest multiple of 1024, for example
0 goes to 1024, 512 goes to 1024 as well, 1025 goes
to 2048 and so on. So now need is always possitive,
we do compute_memory_grow_pool, check its output
and continue.
If need is negative:
we calculate need += 1024 - (need % 1024), in this case
we will have negative numbers, and if need is
[-1024:-1] 0, so now we take the else, recalculate
need as need = pool->size_in_dw / 10 and
need += 1024 - (need % 1024), we do
compute_memory_grow_pool, check its output and continue.
AFTER:
If need is positive or 0:
we jump the if, calculate need += 1024 - (need % 1024)
compute_memory_grow_pool, check its output and continue.
If need is negative:
we enter the if, and need is now pool->size_in_dw / 10.
Now we calculate need += 1024 - (need % 1024)
compute_memory_grow_pool, check its output and continue.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
In this case, NULL checks are added to compute_memory_grow_pool,
so it returns -1 when it fails. This makes necesary
to handle such cases in compute_memory_finalize_pending
when it is needed to grow the pool
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Always default to --enable-driglx-direct, now that will build driswrast, but
won't try to use dri[123] on platforms which don't have that.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some untangling to fix building in the dri_platform=none, --enable-driglx-direct
case, where only driswast can be used.
Turn the test for including the glXGetScreenDriver()/glXGetScreenDriver()
interface used by xdriinfo from !GLX_USE_APPLEGL into a positive form, as it is
only useful when dri_platform=drm
Add additional GLX_USE_DRM tests so DRI[123] renderers are only used when
dri_platform=drm
Note that swrast and indirect must still be disabled in the APPLEGL case at the
moment, which makes things more complex than they need to be. More untangling
is needed to allow that
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Originally all hardware drivers duplicate the driver_name string
from an external source, while for the software rasterizer we set
it to "swrast". Follow the example set by hw drivers this way
we can free the string at dri2_terminate().
v2: Use strdup over strndup. Suggested by Ilia Mirkin.
v3: Handle platform_drm in a similar manner. Cleanup swrast
driver_name in error path.
Cc: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This fixes:
In file included from
/home/adrian/workspace/mesa/mesa-master.git/src/mesa/vbo/vbo_exec_api.c:445:0:
/home/adrian/workspace/mesa/mesa-master.git/src/mesa/vbo/vbo_attrib_tmp.h:28:38:
fatal error: util/u_format_r11g11b10f.h: No such file or directory
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Adrian Negreanu <adrian.m.negreanu@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Fixes linker error:
ld:
.../libmesa_dri_common_intermediates/libmesa_dri_common.a(dri_util.o):
in function globalDriverAPI:dri_util.c(.data.rel+0x0): error:
undefined reference to 'driDriverAPI'
As an example, you can see that mesa_dri_drivers
also uses common/libmegadriver_stub (src/mesa/drivers/dri/Makefile.am)
The _stub part might be confusing, but
it actually provides the dri-driver shared lib constructor,
megadriver_stub_init, which will later on load the real
platform dependent part and call
l __driDriverGetExtensions_<platform>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Adrian Negreanu <adrian.m.negreanu@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
The negation source modifier on src registers has changed meaning in Broadwell when
used with logical operations. Don't copy propagate when negate src modifier is set
and when the destination instruction is a logical op.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
The negation source modifier on src registers has changed meaning in Broadwell when
used with logical operations. Don't copy propagate when negate src modifier is set
and when the destination instruction is a logical op.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
We've been allowing `centroid` and `sample` in all kinds of weird places
where they're not valid.
Insist that `sample` is combined with `in` or `out`;
and that `centroid` is combined with `in`, `out`, or the deprecated
`varying`.
V2: Validate this in a more sensible place. This does require an extra
case for uniform blocks members and struct members, though, since they
don't go through the normal path.
V3: Improve error message wording; eliminate redundant error generation
for inputs in VS or outputs in FS.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Threads must terminate with a SEND message to a particular shared function,
such as a URB write or FB write, so the instruction stream really shouldn't
ever end in an IF/ELSE/ENDIF or similar block structure.
However, if the instruction stream (incorrectly) ends in a block structure
the last block's end pointer will not be set, leading to a crash later on in
fs_live_variables::setup_def_use(). It is better to detect this earlier, so
assert on that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In Gen < 6 the hardware generates a runtime bit that indicates whether AA data
has to be sent as part of the framebuffer write SEND message. This affects the
specific case where we have setup antialiased line rendering and we render
polygons which have one face setup in GL_LINE mode (line antialiasing
will be used) and the other one in GL_FILL mode (no line antialiasing needed).
Currently we are not doing this runtime test and instead we always send AA
data, which produces incorrect rendering of the GL_FILL face of the polygon in
in the aforementioned scenario (verified in ironlake and gm45).
In Gen4 this is, likely, a regression introduced with commit 098acf6c84. In
Gen5 this has never worked properly. Gen > 5 are not affected by this.
The patch fixes the problem by adding the appropriate runtime check and
adjusting the framebuffer write message accordingly in the conflictive
scenario.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78679
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This logic is reusable across CompressedTex*Image* and
GetCompressedTexImage; the strides calculated will also be needed
in the PBO validation functions to ensure that the referenced range of
bytes is valid.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Move _mesa_error call for INVALID_VALUE to one place.
Remove checks for previous value matching -- this was important when we
were flushing vertices before the update, but that hasn't happened for a
long time now.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A recent ApiTrace change, that tries to dump more buffer state
causes Mesa from my distro (10.1.4) to segfaults here.
I haven't actually confirm this fixes it (I can't repro on master),
but it seems a good idea to be defensive here anyway.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit f3cb2e6ed7.
brw_land_fwd_jump() is convenient wherever we produce JMPI instructions
and we will use JMPI to implement framebuffer writes that involve line
antialiasing in gen < 6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm making a lot of changes to this area, and I figured I may as well
not conflate these trivial changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
With my earlier cleaning in place (see git log brw_eu_emit.c), nothing
relies on the instruction emitters for IF/WHILE/JMPI disabling
predication. Drop it in favor of making callers do the right thing
explicitly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
There is a short-immediate version as well, but it should never end up
getting used since it would have gotten folded earlier.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Match the behavior of the SCons MinGW build.
This patch also fixes these build errors.
CC glapi_entrypoint.lo
glapi_entrypoint.c: In function 'init_glapi_relocs_once':
glapi_entrypoint.c:341:4: error: unknown type name 'pthread_once_t'
static pthread_once_t once_control = PTHREAD_ONCE_INIT;
^
glapi_entrypoint.c:341:41: error: 'PTHREAD_ONCE_INIT' undeclared (first use in this function)
static pthread_once_t once_control = PTHREAD_ONCE_INIT;
^
glapi_entrypoint.c:341:41: note: each undeclared identifier is reported only once for each function it appears in
glapi_entrypoint.c:342:4: error: implicit declaration of function 'pthread_once' [-Werror=implicit-function-declaration]
pthread_once( & once_control, init_glapi_relocs );
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This code was originally introduced to fix https://bugs.freedesktop.org/show_bug.cgi?id=53617. The comment says you need to pass NULL in order to unref old views however cso_set_sampler_views() already takes care of old views with the second for loop. Also as of 2355a64414 cso_set_sampler_views() passes the max of the old and new views to the driver for all state trackers making this code obsolete.
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
The nouveau fw currently prints a bunch of errors. No point in seeing
those all the time, esp since compute doesn't really work in the first
place.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
The new hardware actually supports this OpenGL 1.x feature natively,
so we can finally drop our shader workarounds.
Not many applications use GL_CLAMP, and most use it unintentionally, but
it's trivial to do right, so we should.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
When dereferencing an element of gl_SampleMaskIn[], the source register
here will be a HW_REG rather than a VGRF because the payload slot is
now exposed directly.
Fixes an assertion failure in the Piglit test:
tests/spec/arb_gpu_shader5/execution/samplemaskin-basic
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At least on MSVC we statically link against the CRT, so we must disable
the CRT message boxes if we want unattended testing.
The messages are convenient when running manually, so let them be if the
system error message boxes are not disabled.
The ARB_gpu_shader5 spec says:
"To determine whether the conversion for a single argument in one match is
better than that for another match, the following rules are applied, in
order:
1. An exact match is better than a match involving any implicit
conversion.
2. A match involving an implicit conversion from float to double is
better than a match involving any other implicit conversion.
3. A match involving an implicit conversion from either int or uint to
float is better than a match involving an implicit conversion from
either int or uint to double.
If none of the rules above apply to a particular pair of conversions,
neither conversion is considered better than the other."
V3: Add spec citation, including oddball difference between gs5 and GLSL
4.0; comment a bit better as per Jordan's suggestions.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This will facilitate GLSL 4.0 / ARB_gpu_shader5's enhanced overload
resolution rules, and also possibly better error reporting for ambiguous
function calls.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
V2: Fix crashes during linking, where the parse state is NULL. In this
case, all required checks have already been done, so we assume the
extension is enabled.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The available implicit conversions depend on the GLSL version we're
compiling.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're about to add new implicit conversions, first for ARB_gpu_shader5,
and then later for ARB_gpu_shader_fp64. Pull out the opcode
determination into its own function, and get rid of the bool -> float
case that could never be hit anyway [since it fails the is_numeric()
check].
V2: Retain the vector width mangling. It turns out this is necessary for
the conversions done (and then thrown away) when determining the return
type of arithmetic operators.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This works like glsl-1.20+'s invariant redeclarations, but with fewer
restrictions, since `precise` is allowed on pretty much anything.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
There are several common ways to check whether an object is a particular
subclass: dynamic_cast<>, the as_subclass() pattern, or explicit enum
tags. We originally used the virtual as_subclass methods, but later
added enum tags as they are much nicer for debugging.
Since we have the enum tags, we don't necessarily need to use virtual
functions to implement the as_subclass() methods. We can just check the
tag and return the pointer or NULL.
This saves 18 entries in the vtable, and instead of two pointer
dereferences per as_subclass() call most are only three inline
instructions.
Compile time of sam3/112.frag (the longest compile in a recent shader-db
run) is reduced by 5% from 348 to 329 ms (n=500).
perf stat of this workload shows:
24.14% reduction in iTLB-loads: 285,543 -> 216,606
42.55% reduction in iTLB-load-misses: 18,785 -> 10,792
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Now that the constructors set a type, ir_type_unset is not very useful.
Move it to the end of the enum (specifically out of position 0) so that
enums checks for dereferences and rvalues can save an instruction.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Makes checking whether an object is an ir_dereference, an ir_rvalue, or
an ir_jump simpler. Since ir_dereference is a subclass or ir_rvalue,
list its subtypes first so that they can both generate nice code.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This has the added perk that if you forget to set ir_type in the
constructor of a new subclass (or a new constructor of an existing
subclass) the compiler will tell you... instead of relying on
ir_validate or similar run-time detection.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
We have customers using NULL as a way to test the robustness of the API.
Without this check, EGL will segfault trying to dereference
dri2_surf->wl_win->private because wl_win is NULL.
This fix adds a check and sets EGL_BAD_NATIVE_WINDOW
v2: Incorporated feedback from idr - moved the check to a higher level
function.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
For some reason, CP DMA doesn't follow the predicate bit if I enable it,
so this is the only option.
This fixes piglit: spec/NV_conditional_render/clear
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Same as b026b6bbfe, but
COLOR_ARRAY_SIZE/SECONDARY_COLOR_ARRAY_SIZE.
Ideally we wouldn't munge the incoming state, so that we wouldn't need
to unmunge it back on glGet*. But the array size state is copied and
referred in many places, many of which couldn't take an GLenum like
GL_BGRA instead of a plain integer. So just hack around on glGet*,
to ensure there is no risk of introducing regressions elsewhere.
This bug causes problems to Apitrace, resulting in wrong traces. See
https://github.com/apitrace/apitrace/issues/261 for details.
Tested with piglit arb_vertex_array_bgra-get, which was created for this
purpose.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
These were missed in commit e374809819.
Fixes 'make check'.
CC test_eu_compact.o
test_eu_compact.c: In function ‘gen_f0_0_MOV_GRF_GRF’:
test_eu_compact.c:222:4: error: implicit declaration of function ‘brw_set_predicate_control’ [-Werror=implicit-function-declaration]
brw_set_predicate_control(p, true);
^
test_eu_compact.c: In function ‘run_tests’:
test_eu_compact.c:270:6: error: implicit declaration of function ‘brw_set_access_mode’ [-Werror=implicit-function-declaration]
brw_set_access_mode(p, BRW_ALIGN_16);
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Eventually we're going to use functions to set bits on an instruction.
Putting 'default' in the name of functions that alter default state will
help distinguins them.
This patch was generated entirely mechanically, by the following:
for file in brw*.{cpp,c,h}; do
sed -i \
-e 's/brw_set_mask_control/brw_set_default_mask_control/g' \
-e 's/brw_set_saturate/brw_set_default_saturate/g' \
-e 's/brw_set_access_mode/brw_set_default_access_mode/g' \
-e 's/brw_set_compression_control/brw_set_default_compression_control/g' \
-e 's/brw_set_predicate_control/brw_set_default_predicate_control/g' \
-e 's/brw_set_predicate_inverse/brw_set_default_predicate_inverse/g' \
-e 's/brw_set_flag_reg/brw_set_default_flag_reg/g' \
-e 's/brw_set_acc_write_control/brw_set_default_acc_write_control/g' \
$file;
done
No manual changes were done after running that command.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This removes the ability to set the default conditional modifier on all
future instructions. Nothing uses it, and it's not really a sensible
thing to do anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
With the predication changes eliminated, all this does is set the
conditional modifier on a single instruction. Doing that directly is
easy, and avoids mucking about with default state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_set_conditionalmod and brw_next_insn work together to set the
conditional modifier for the next instruction, then turn it off.
The Gen8+ generators don't implement this: we just set it for all future
instructions, and whack it for each fs_inst/vec4_instruction.
Both approaches work out because we only set conditional_mod on
IR instructions like CMP, AND, and so on, which correspond to exactly
one assembly instruction. The Gen8 generators would break if we had
an IR instruction that generated multiple instructions, and the Gen4-7
EU emit layer would do...something.
To safeguard against this, assert that we only generated one instruction
if conditional_mod is set, and just set the flag directly on that
instruction rather than altering default state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_set_conditionalmod has traditionally been complex: it causes
conditionalmod to be set for the next instruction, and then predication
to be set on all future instructions after that.
We may want to generate a flag condition and not use it immediately,
due to instruction scheduling or the like. Even if not, it's easy
to set things explicitly, and that's clearer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_CMP already takes a conditional modifier as a parameter, and sets it
accordingly. brw_set_conditionalmod() also makes everything after the
next instruction predicated, but we don't need that: we always emit an
IF instruction after load_clip_distance(), and that's already
predicated.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It wasn't too bad before, but the macro is going to be nicer once I
start modifying a lot more instructions in this pattern.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Often times, we want to emit an instruction, then set one field on it,
such as predication or a conditional modifier. Normally, we'd have to
declare "struct brw_instruction *inst;" and then use "inst =
brw_FOO(...)" to emit the instruction, which can hurt readability.
The new "brw_last_inst" macro refers to the most recently emitted
instruction, so you can just do:
brw_ADD(...)
brw_last_inst->header.predicate_control = BRW_PREDICATE_NORMAL;
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We use both predicated and unconditional JMPI instructions. But in each
case, it's clear which we want. It's simpler to just specify it as a
parameter, rather than relying on default state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In all cases, we set both dst and src0 to brw_ip_reg(). This is no
accident: according to the ISA reference, both are required to be the IP
register. So, we may as well drop the parameters.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
EGL 1.4 Specification says that
eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT)
can be used to release the current thread's ownership on the surfaces
and context.
MESA's egl implementation was only accepting the parameters when the
KHR_surfaceless_context extension is supported.
[chadv] Add quote from the EGL 1.4 spec.
Cc: "10,1, 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2 get rid of magic value, use DEFINES
v3 update clip_disable together with vs_position_window_space
Big thanks to Marek Olšák!
Signed-off-by: David Heidelberger <david.heidelberger@ixit.cz>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The fs_reg src array is going to turn into a pointer and we'd rather not
consider the implications of shallow copying fs_insts.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Will get more complicated when fs_reg src becomes a pointer.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Running shader-db with INTEL_DEBUG=noann reduces the runtime
from ~90 to ~80 seconds on my machine. It also reduces the disk space
consumed by the .out files from 660 MB (676 on disk) to 343 MB (358 on
disk).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With INTEL_DEBUG=optimizer, write the output of dump_instructions() to a
file each time an optimization pass makes progress. This lets you easily
diff successive files to see what an optimization pass did.
Example filenames written when running glxgears:
fs8-0000-00-start
fs8-0000-01-04-opt_copy_propagate
fs8-0000-01-06-dead_code_eliminate
fs8-0000-01-12-compute_to_mrf
fs8-0000-02-06-dead_code_eliminate
| | | |
| | | `-- optimization pass name
| | |
| | `-- optimization pass number in the loop
| |
| `-- optimization loop interation
|
`-- shader program number
Note that with INTEL_DEBUG=optimizer, we disable compact_virtual_grfs,
so that we can diff instruction lists across loop interations without
the register numbers being changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will allow debugging code to dump the IR after an optimization pass
makes progress (the next patch). Only let it open and write to a file if
the effective user isn't root.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use function overloading rather than default arguments, since gdb
doesn't know about default arguments.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This made sense when swizzled storage layout was used for rendering to tiles.
But nowadays the name just adds confusion (and makes for long lines).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Framebuffers can have NULL attachments since a while. llvmpipe handled
that properly for lp_rast_shade_quads_mask but it seems the change didn't
make it to lp_rast_shade_tile.
This fixes piglit fbo-drawbuffers-none test (though I need to increase
the FB_SIZE from 32 to 256 so the tris cover some tiles fully).
https://bugs.freedesktop.org/show_bug.cgi?id=79421
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch fixes this build error with icc 14.0.2.
In file included from state_tracker/st_glsl_to_tgsi.cpp(63):
../../src/gallium/auxiliary/util/u_math.h(583): error: identifier "__builtin_clrsb" is undefined
return 31 - __builtin_clrsb(i);
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
mesaVisual can be NULL with configless context since this commit:
commit 551d459af4
Author: Neil Roberts <neil@linux.intel.com>
Date: Fri Mar 7 18:05:47 2014 +0000
Add the EGL_MESA_configless_context extension
...
Previously the i965 and i915 drivers were explicitly creating a zeroed visual
whenever 0 is passed for the EGLConfig.
We attempt to dereference the visual in i915 and now we don't create a
zeroed-out one one it crashes, breaking at least weston in an i915. There's
no point in doing so as it would be zero anyway.
v2: Fixed a typo in commit message. Added some tags.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1100967
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These prototypes are necessary because GLES1 library builds will create
dispatch functions for them. We can't directly include GLES/gl.h
because it would conflict the previously-included GL/gl.h. Since GLES1
ABI is not expected to every add more functions, the path of least
resistance is to just duplicate the prototypes for the functions that
aren't already in desktop OpenGL.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79294
Acked-by: Matt Turner <mattst88@gmail.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
The math instruction was Align1-only on Gen6 and we never updated this
to let it use Align16 features like writemasking on newer platforms.
total instructions in shared programs: 1686120 -> 1685507 (-0.04%)
instructions in affected programs: 48593 -> 47980 (-1.26%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should print output both for debug and release builds.
Suggested by Jose.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
wglCreateContextAttribsARB() didn't work previously since it returned
a context ID that wasn't allocated by OPENGL32.DLL. So if that context
ID was later passed to wglMakeCurrent(), etc. it was rejected.
Now when wglCreateContextAttribsARB() is called we actually call
wglCreateContext() in order to get a valid context ID. Then we
replace the context data which was created with new context data
which reflects the arguments passed to wglCreateContextAttribsARB().
If there were a DrvCreateContextAttribs() function in the ICD this
work-around wouldn't be necessary.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Conflicts:
src/gallium/state_trackers/wgl/stw_ext_extensionsstring.c
src/gallium/state_trackers/wgl/stw_getprocaddress.c
If the assertion fails, it means something is really broken. Before,
if this happened we reverted to the GDI renderer without any warning.
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
To reflect our actual SwapBuffers implementation. See
stw_st_swap_framebuffer_locked(). This fixes various rendering issues
with SolidEdge.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just happened to stumble across this registry key while debugging
something else.
This technique is much neater than trying to override opengl32.dll.
Also a few minors cleanups.
If texObj == NULL here it mean there is already GL_INVALID_VALUE
or GL_OUT_OF_MEMORY error set to context.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Introduce a simple PCI identification method of looking up the answer
the /sys filesystem (available on Linux). Attempted after libudev, but
before DRM.
Disabled by default (available only when the --enable-sysfs configure
option is specified).
Signed-off-by: Gary Wong <gtw@gnu.org>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
loader_get_pci_id_for_fd() and loader_get_device_name_for_fd() now attempt
all available strategies to identify the hardware, instead of conditionally
compiling in a single test. The existing libudev and DRM approaches have
been retained, attempting first libudev (if available) and then DRM (if
necessary).
Signed-off-by: Gary Wong <gtw@gnu.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The build fails with implicit delaration of drmGetCap (xf86drm.h)
Were we're including the header only when building the DRM_PLATFORM.
Wayland backend can operate without DRM_PLATFORM so replace the
guard, and fold in drmGetCap() usage to silence compiler warnings.
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
There is no reason anymore to load with RTLD_GLOBAL and for some driver
this even result in dlclose failing to unload leading to catastrophic
failure with swrast fallback.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
This way, when someone modifies create_test_cases.py and forgets to
commit their changes again, people will notice.
v2: make sure we parse the right directories and check for existance the
right way.
v3 (Ken): Use $PYTHON2 instead of calling python directly.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In 088494aa (as well as other commits in the series) Paul Berry modified
the tests for lower_jumps to account for the fact that the s-expression
for the loop IR instruction changed from
(loop () () () () (statements...)) to (loop (statements...)), but he
forgot to update create_test_cases.py which he used to create the tests.
Fix that, so that now create_test_cases.py is synced with the generated
tests.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Make sure that we print the same number of digits when printing 0.0 as
any other floating-point number. This will make generating expected
output files for tests easier. To avoid breaking "make check," update
the generated tests for lower_jumps before the next commit which will
bring create_test_cases.py in line with them.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The call to get_variable_being_redeclared() may delete 'var' so we
can't reference var->name afterward. We fix that by examining the
var's name before making that call.
Fixes valgrind warnings and possible crash when running the piglit
tests/spec/glsl-1.30/execution/clipping/vs-clip-distance-in-param.shader_test
test (and probably others).
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we set up new entries in the params[] array on every access
of a rectangle texture. Unfortunately, we only reserve space for
(2 * MaxTextureImageUnits) extra entries, so programs which accessed
rectangle textures more times than that would write off the end of the
array and likely crash.
We don't really have a decent mapping between the index returned by
_mesa_add_state_reference and our index into the params array, so we
have to manually search for it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78691
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
Fixes framebuffer_blit_functionality_multisampled_to_singlesampled_blit
es3 cts test on bdw. Also fixes this on ivb when ivb is forced to use
the meta path.
No piglit regressions on IVB.
Further input from Ken:
"Unfortunately, this doesn't fix MRT for integer data.
In the single-sampled case, since we're directly copying data, we were
read/copy/write data as "float" values, which actually contained the
integer bits. Here, we can't do that since we need to process the
actual integer data.
I do wonder if we could use intBitsToFloat/uintBitsToFloat to stuff the
integer bits in the float gl_FragColor output. Just a crazy idea.
In the long term (post 10.2), I think we should draft an extension that
allows you to do "layout(location = all)" on user-defined fragment
shader outputs. (Or some similar syntax.)"
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
GK20A is mostly compatible with GK104, but uses the SM35 ISA. Use
the GK110 path when this chip is detected.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
GK20A is mostly compatible with GK104, but features a new 3D
class. Add it to the relevant header and use it when GK20A is
detected.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Each of the subroutine emitters alter the predication state, but
otherwise don't change anything (or put it back when they do).
Resetting predication at the end makes these functions idempotent with
regard to the default instruction state - which is a nice property.
With that in place, push/pop is no longer necessary.
v2: Improve whitespace (requested by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_MOV doesn't alter the default instruction state, so this does
nothing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_JMPI sets predicate_control to BRW_PREDICATE_NONE, but that's
already the value coming in. Otherwise, nothing changes state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This field is only used to track the current value of the flag register
during the SF compile. It has no place in the common compiler code.
While we're changing every call, drop the 'brw' prefix from the function
since it's static.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Only the Gen4-5 SF program compiler actually uses this function; move
it there. Soon the fields will be moved out of brw_compile.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There's no point in pushing and popping the default state; the code
between the two stack operations doesn't alter anything.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
None of the assembly emitters called between push and pop actually
change the state. So, we can drop these.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, brw_CMP with a null destination implicitly set the default
state to make future instructions predicated. This is messy and
confusing - emitting a CMP that populates the flag register and later
using it to predicate instructions are logically separate. With the
main compiler, we may even schedule instructions between the CMP and the
user of the flag value.
This patch simplifies brw_CMP to just emit a CMP instruction, and not
mess with predication. It also updates all necessary callers. These
mostly fell into two patterns:
1. brw_CMP followed by brw_IF.
We don't need to do anything special here; brw_IF already sets up
predication appropriately.
2. brw_CMP followed by a single predicated instruction.
The old model was to call brw_CMP, emit the next (predicated)
instruction, then disable predication for any instructions beyond
that. Instead, just explicitly set predicate_control on the single
instruction we want to predicate. It's no more code, and requires
less cross-module knowledge.
This drops setting flag_value to 0xff as well, which is a field only
used by the SF compile. There is only one brw_CMP call in the SF code,
which is in do_twoside_caller, and called at the start of
brw_emit_tri_setup, where flag_value is already 0xff.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Presumably, this was to reset the default state of predication_control
from brw_CMP. But brw_CMP only sets that if dst is ARF null, which it
isn't here.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When compiling any of the SF program variants, flag_value starts off as
0xff and will be modified when generating code.
brw_emit_anyprim_setup emits several subroutines, saving and restoring
flag_value across each of them. Since it starts out as 0xff, this is
equivalent to simply setting it to 0xff at the start of each subroutine.
Resetting the value makes more logical sense; each subroutine doesn't
know whether one of the others even executed, much less what it did
to the flag register.
This also lets us to drop the brw_set_predicate_control_flag_value call
from brw_init_compile: predicate is already initialized to
BRW_PREDICATE_NONE by the memset, and the value of flag_value is
irrelevant (as it's only used by the SF compiler).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In commit af38ef907, I added a "fix" to color outputs not being assigned
correctly when sample mask was being output. This was totally wrong --
the color indices (i.e. "si" values) were the ones that were wrong. Undo
that hunk.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Commit c5d822dad9 added support for sample mask incorrectly. It became
treated as a color output, and messed up the color output indices.
Revert the hunk that did that, and add explicit support just like for
depth/stencil writes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
We had a handful of cases where we'd used brw_imm_*() to generate an
immediate, rather than fs_reg(). We shouldn't do that but we shouldn't
limit scheduling flexibility on account of immediate arguments either.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Using brw_imm_* creates a source with file=HW_REG, and the scheduler
inserts barrier dependencies when it sees HW_REG. None of these are
hardware-registers in the sense that they're special and scheduling
shouldn't touch them. A few of the modified cases already have HW_REGs
for other sources, so it won't allow extra flexibility in some cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Leave only the gl/glx and mangled gl symbols.
XMesa* was never an official interface and the only
user of it was mesa-demos, while they were still in
the same repo as mesa.
v2: Conditionally use the version-script.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
In the presence of LLVM the final library exports every symbol from
the llvm namespace. Resolve this by using a version script (w/o the
version/name tag).
Considering that there are only ~35 symbols, explicitly list them
to minimize the chances of rogue symbols sneaking in.
v2: Conditionally include the version-script.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> (v1)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The symbol was added with commit 45e2b51c853(DRI2/GLX: check for
vblank_mode in DRI2 GLX code) but was never used as such according
to git log.
Possibly it was marked as public due to confusion with
__driConfigOptions which was used for dri1 drivers.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Without this, I get linking failures (static linking).
The static linking is sort of required for me, because otherwise Steam and
applications using the Steam runtime regularily fail because my LLVM was
compiled and linked against a newer libgcc_s, libstdc++, etc. and uses
features from those newer versions. And instead of Steam just not
starting, my X starts crashing, whenever libGL fails to load a (32 bit)
driver.
Since I hate crashes of X and I don't think Valve/Steam will behave like
a proper distribution soon (rebuilds versus current Debian Testing, since
they base their Steam OS off that), I need a radeonsi which carries its
own LLVM within and doesn't care about what the runtime sets. This means
linking Mesa statically.
v1 → v2: Move logic to configure.ac
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
The profiles are present depending on the defines at build time.
Drop the extra functions and feed the defines directly into the
state-tracker at build time.
v2: Drop unused variable i.
Acked-by: Chia-I Wu <olvaffe@gmail.com> (v1)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
... rather than the one defined in our internal interface (dri_interface.h)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
If we make ann_count non-zero, annotation_finalize() won't bail.
Not modifying it seems to make the code more clear than would modifying
annotation_finalize().
This reverts commit a6860100b8.
Why this code didn't work in all circumstances is unknown and without a
working Ironlake simulator (which uses a different AUB format) we'll
probably never know, short of a lot of experimentation, and spending a
bunch of time to try to optimize a few instructions on Ironlake is not
time well spent.
Moreover, for mix(vec4, vec4, vec4) using the accumulator introduces a
dependence between the otherwise independent per-component calculations.
Not using the accumulator, even if it means an extra instruction per
component might be preferable. We don't know, we don't have data, and
we don't have the necessary register on Ironlake for shader_time to tell
us.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77707
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit 2dfbbeca50 with the
comment about MAC and implicit accumulator removed.
Why this code didn't work in all circumstances is unknown and without a
working Ironlake simulator (which uses a different AUB format) we'll
probably never know, short of a lot of experimentation, and spending a
bunch of time to try to optimize a few instructions on Ironlake is not
time well spent.
Moreover, for mix(vec4, vec4, vec4) using the accumulator introduces a
dependence between the otherwise independent per-component calculations.
Not using the accumulator, even if it means an extra instruction per
component might be preferable. We don't know, we don't have data, and
we don't have the necessary register on Ironlake for shader_time to tell
us.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77703
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Note the weirdness with src1 subregs. The compacted immediate fields are
uncompacted to bits [127:96] and the high five bits of the subreg
mapping maps to bits [100:96].
Number of compacted instructions: 790085 -> 817752 (3.50%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Also perform arithmetic on char* rather than void* since the latter is a
GNU C extension not available in C++.
Reviewed-by: Eric Anholt <eric@anholt.net>
That we were comparing its return value with offsets should have been a
clue. :)
Make it take a void *store in preparation for making the function useful
elsewhere.
Reviewed-by: Eric Anholt <eric@anholt.net>
... to tell us whether it emitted any code. Will be used to determine
whether we need to skip an annotation for it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Will be used to print disassembly after jump targets are set and
instructions are compacted, while still retaining higher-level IR
annotations and basic block information.
An array of 'struct annotation' will live along side the generated
assembly. The generators will populate the array with their IR
annotations, and basic block pointers if the instructions began or ended
a basic block pointer.
We'll then update the instruction offset when we compact instructions
and then using the annotations print the disassembly.
Reviewed-by: Eric Anholt <eric@anholt.net>
The DO instruction doesn't exist on Gen6+. Since before this commit, DO
always ended a basic block, if it also happened to start one (e.g., a
while loop inside an if statement) the block containing only the DO
would actually contain no hardware instructions.
Pre-Gen6's WHILE instructions jumps to the instruction following the DO,
so strictly speaking we won't be modeling that properly, but I claim
there is actually no functional difference.
This will simplify an upcoming change where we want to mark the first
hardware instruction in the loop as beginning a block, and the last
instruction before the loop as ending one.
Reviewed-by: Eric Anholt <eric@anholt.net>
Note that predicated instructions with defs are still not supported
because transformation to SSA doesn't handle them yet.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
The glGetGraphicsResetStatusARB from ARB_robustness extension always
returns GUILTY_CONTEXT_RESET_ARB and never returns NO_ERROR for guilty
context with LOSE_CONTEXT_ON_RESET_ARB strategy. This is because Mesa
returns GUILTY_CONTEXT_RESET_ARB if batch_active !=0 whereas kernel
driver never reset batch_active and this variable always > 0 for guilty
context. The same behaviour also can be observed for batch_pending and
INNOCENT_CONTEXT_RESET_ARB.
But ARB_robustness spec says:
If a reset status other than NO_ERROR is returned and subsequent calls
return NO_ERROR, the context reset was encountered and completed. If a
reset status is repeatedly returned, the context may be in the process
of resetting.
8. How should the application react to a reset context event?
RESOLVED: For this extension, the application is expected to query the
reset status until NO_ERROR is returned. If a reset is encountered, at
least one *RESET* status will be returned. Once NO_ERROR is
encountered, the application can safely destroy the old context and
create a new one.
The main problem is the context may be in the process of resetting and
in this case a reset status should be repeatedly returned. But looks
like the kernel driver returns nonzero active/pending only if the
context reset has already been encountered and completed. For this
reason the *RESET* status cannot be repeatedly returned and should be
returned only once.
The reset_count and brw->reset_count variables can be used to control
that glGetGraphicsResetStatusARB returns *RESET* status only once for
each context. Note the i915 triggers reset_count twice which allows to
return correct reset count immediately after active/pending have been
incremented.
v2 (idr): Trivial reformatting of comments.
Signed-off-by: Pavel Popov <pavel.e.popov@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Define GLX_USE_APPLEGL, as config/darwin used to, to turn on specific code to
use the applegl direct renderer
Convert src/glx/apple/Makefile to automake
Since the applegl libGL is now built by linking libappleglx into libGL, rather
than by linking selected files into a special libGL:
- Remove duplicate code in apple/glxreply.c and apple/apple_glx.c. This makes
apple/glxreply.c empty, so remove it
- Some indirect rendering code is already guarded by !GLX_USE_APPLEGL, but we
need to add those guards to indirect_glx.c, indirect_init.c (via it's
generator), render2.c and vertarr.c so they don't generate anything
Fix and update various includes
glapi_gentable.c (which is only used on darwin), should be included in shared
glapi as well, to provide _glapi_create_table_from_handle()
Note that neither swrast nor indirect is supported in the APPLEGL path at the
moment, which makes things more complex than they need to be. More untangling
is needed to allow that
v2: Correct apple/Makefile.am for srcdir != builddir
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
- Don't require xcb-dri[23] etc. if we aren't building for a target with DRM, as
we won't be using dri[23]
- Enable a more fine-grained control of what DRI code is built, so that a libGL
using direct swrast can be built on targets which don't have DRM.
The HAVE_DRI automake conditional is retired in favour of a number of other
conditionals:
HAVE_DRI2 enables building of code using the DRI2 interface (and possibly DRI3
with HAVE_DRI3)
HAVE_DRISW enables building of DRI swrast
HAVE_DRICOMMON enables building of target-independent DRI code, and also enables
some makefile cases where a more detailled decision is made at a lower level.
HAVE_APPLEDRI enables building of an Apple-specific direct rendering interface,
still which requires additional fixing up to build properly.
v2:
Place xfont.c and drisw_glx.c into correct categories.
Update 'make check' as well
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Fix build for darwin, when ./configured --disable-driglx-direct
- darwin ld doesn't support -Bsymbolic or --version-script, so check if ld
supports those options before using them
- define GLX_ALIAS_UNSUPPORTED as config/darwin used to, as aliasing of non-weak
symbols isn't supported
- default to -with-dri-drivers=swrast
v2:
Use -Wl,-Bsymbolic, as before, not -Bsymbolic
Test that ld --version-script works, rather than just looking for it in ld --help
Don't use -Wl,--no-undefined on darwin, either
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
It works fine, though it requires using ELF objects.
With this change there is nothing preventing us to switch exclusively
to MCJIT, everywhere. It's still off though.
If the source renderbuffer has a depth > 0, then send a Z texcoord
which is set to the source attachment Z offset.
This fixes piglit's gl-3.2-layered-rendering-gl-layer-render with the
GL_TEXTURE_2D_MULTISAMPLE_ARRAY case test on i965/gen8.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
brw_color_buffer_write_enabled depends on brw->fragment_program, which
means we have to listen to BRW_NEW_FRAGMENT_PROGRAM.
On most generations, this was only called from a function that already
subscribed. However, on Broadwell, we failed to listen to the necessary
event in the atom that emits 3DSTATE_PS_BLEND.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
I forgot to disable writemasking on the OR and MOV which set the render
target index and "source 0 alpha present to render target" bit.
Using get_element_ud is equivalent and avoids a line-wrap.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Commit a2fb71e23 introduced 32-bit code for SSE4.1. Fix compilation, and
make sure to check ecx for the SSE4.1 bit.
[imirkin: switch sse4.1 to look at ecx]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes MinGW x64 builds. We don't use assembly on any of the
Windows builds, to avoid divergence between MSVC and MinGW when testing.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch fixes this build error on Mac OS X.
CCLD libglapi.la
clang: warning: argument unused during compilation: '-pthread'
clang: warning: argument unused during compilation: '-pthread'
ld: unknown option: --no-undefined
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
-fstack-protector-strong is not supported by clang.
This patch fixes this build error on Fedora 20 with clang.
CXX gallivm/lp_bld_debug.lo
clang: error: unknown argument: '-fstack-protector-strong'
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75010
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Seems the opcodes are slightly different from a2xx. Resync headers and
move blend_func() helper into hw generation specific code.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For most drivers this if statement is always going to fail so check the constant value first.
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
We already multiply by bytes per pixel for this, so f3ba7611 broke
mem2gmem for depth/stencil. Drop the now-redundant mutiply by cpp.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Note that this covers the Begin/End rendering path, but not user vertex
arrays (so we can't drop copy_array_to_vbo_array() code). Improves
performance of isosurf GLVERTEX|TRIANGLES by 16.7506% +/- 4.98934%
(n=20). No difference on openarena (n=10), which was why this was reverted
back in cbde276580.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In cases where there was no color buf bound, there were inconsistancies
in register settings related to position of depth/stencil inside GMEM.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
These aren't buffers we ever read back from CPU, so using incorrect
reloc fxn wasn't really harming anything. But might as well be correct.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
_mesa_meta_setup_blit_shader() currently generates a fragment shader
which, irrespective of the number of draw buffers, writes the color
to only one 'out' variable. Current shader rely on an undefined
behavior and possibly works by chance.
From OpenGL 4.0 spec, page 256:
"If a fragment shader writes to gl_FragColor, DrawBuffers specifies a
set of draw buffers into which the single fragment color defined by
gl_FragColor is written. If a fragment shader writes to gl_FragData,
or a user-defined varying out variable, DrawBuffers specifies a set
of draw buffers into which each of the multiple output colors defined
by these variables are separately written. If a fragment shader writes
to none of gl_FragColor, gl_FragData, nor any user defined varying out
variables, the values of the fragment colors following shader execution
are undefined, and may differ for each fragment color."
OpenGL 4.4 spec, page 463, added an additional line in this section:
"If some, but not all user-defined output variables are written, the
values of fragment colors corresponding to unwritten variables are
similarly undefined."
V2: Write color output to gl_FragColor instead of writing to multiple
'out' variables. This'll avoid recompiling the shader every time
draw buffers count is updated.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In commit 4be146b1, I neglected to add the new property to the strings
array. This leads to the string '(null)' to be printed instead when
converting a GS shader to text.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Make sure to normalize the z coordinates as well as the x/y ones when
there are mipmaps present. Fixes 3d mipmap generation, which now uses
the blit path.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
These instructions can come in either through IMUL_HI/UMUL_HI TGSI
opcodes, or from OP_DIV constant folding.
Also make sure that the constant foldings which delete the original
instruction still get counted as having done something.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Retrieving the high 32 bits of a signed multiply is rather annoying. It
appears that the simplest way to do this is to compute the absolute
value of the arguments, and perform a u32 x u32 -> u64 operation. If the
arguments' signs differ, then negate the result. Since there is no u64
support in the cvt instruction, we have the perform the 2's complement
negation "by hand".
This logic can come into use by the IMUL_HI instruction (very unlikely
to be seen), as well as from constant folding of division by a constant.
Fixes dolphin's divisions by 255.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Modern applications frequencly use both UNORM buffers and FLOAT buffers
with color clamping disabled. (FLOAT with clamping explicitly enabled
and SNORM buffers appear to be less common.) We don't need to emit
saturates in the fragment shader in either of the common cases.
Mesa sets ctx->Color._ClampFragmentColor to false if all the color
buffers are UNORM. Also, for GL_FIXED_ONLY mode (the default in
legacy OpenGL), it will be false if any FLOAT buffers are bound.
Since the common case is false, that should be our default.
Thanks to Roland Scheidegger for pointing out some faulty logic
in v1 of this patch (unnecessary code and incorrect explanations).
v2: Drop superfluous code and reword commit message.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I think a3xx and later should support (it is part of GLES3), but this
isn't needed for the time being and still needs to be reversed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Separating the software fallbacks from the rest of the meta path (which
is usually hardware accelerated) gives callers better control over their
blitting options.
For example, i965 might want to try meta blit, hardware blits, then
swrast as a last resort. Splitting it makes that possible.
This updates all callers to maintain the existing behavior (even in the
few cases where it isn't desirable behavior - later patches can change
that).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
SCons is required for Windows. Add links to flex/bison for Windows.
Reorder items and improve formatting.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2ea923cf57 had the side effect of IR counting
now being done after IR optimization instead of before. Some quick analysis
shows that there's roughly 1.5 times more IR instructions before optimization
than after, hence the effective shader cache size got quite a bit smaller.
Could counter this with an increase of the instruction limit but it probably
makes more sense to count them after optimizations, so move that code.
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes build error introduced with commit
4b04152db0.
CC test_eu_compact.o
test_eu_compact.c: In function ‘test_compact_instruction’:
test_eu_compact.c:54:3: error: implicit declaration of function ‘brw_disasm’ [-Werror=implicit-function-declaration]
brw_disasm(stderr, &src, brw->gen, false);
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78888
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
We already have a perfectly good copy of the program key, and nobody is
going to modify it. The only reason we copied it was because the
brw_wm_compile structure embedded the key rather than pointing to it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Instead, just pass the key and prog_data as separate parameters.
This moves it up a level - one step further toward getting rid of it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
'c' is going away, but we still need a memory context that lives
for the duration of the compile.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, the memory context situation was a bit of a mess:
fs_visitor allocated its own memory context, and freed it in the
destructor. However, some data produced by fs_visitor (such as the list
of instructions) needs to live beyond when fs_visitor is "done", so the
caller can pass it to fs_generator.
Everything worked out because brw_wm_fs_emit's fs_visitor variables
happen to not go out of scope until the end of the function. But that
meant that moving the declaration of, say, the SIMD16 fs_visitor
instance, could cause everything to explode.
Using a memory context that exists for the duration of the compile is
clearer, and should be equivalent.
Ultimately, we don't want to use 'c', but this matches the behavior of
fs_generator and gen8_fs_generator, so it'll be simple to change later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
We throw away the data generated during compilation on the success path,
so we really ought to on the failure path as well. The caller has no
access to it anyway, so it's purely leaked.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
'c' is going away. This is also a bit shorter.
Marking the key pointer as const will also deter people from changing
it in these classes, as that's absolutely not OK.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
'c' is going away. This is also shorter.
Marking the key pointer as const will also deter people from changing
it in fs_visitor, as it's absolutely not OK to modify it there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This data is created by fs_visitor and only used when emitting code,
so keeping it in fs_visitor makes sense. I decided it would be
reasonable to group these all together in a struct, since they're
highly related.
v2: s/nr_payload_regs/payload.num_regs/ in some comments (chrisf).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
As far as I can tell, there's no point in allocating an extra register
and generating a MOV---we can just use the copy provided as part of our
thread payload directly. It's already in the right format.
Of course, there are zero Piglit tests for this. We don't actually ship
the extension (GL_ARB_gpu_shader5) that exposes this functionality
either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This is actually for gl_SampleMaskIn, which is quite different than
gl_SampleMask. Renaming should help avoid confusion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Nothing outside of fs_visitor uses it, so we may as well keep it
internal.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
With this one use gone, c->last_scratch is now only used inside
fs_visitor. The rest of the driver uses prog_data->total_scratch.
We already compute similar prog_data fields in fs_visitor, so this
seems reasonable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The if (!allocated_without_spills) block is an obvious spot for this
performance warning message.
In the Vec4 backend, scratch is also used for indirect access of
temporary arrays. The FS backend doesn't implement that yet, but
if it did, this message would be inaccurate, since scratch access
wouldn't necessarily mean spilling. Moving it preemptively fixes that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
"Disassemble" is an accurate description of what this function does.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We're going to use "disassemble" for the function that disassembles
the whole program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
dump_prog_cache has interpreted compacted instructions as full size
instructions, decoding garbage and complaining about invalid values.
We can just use brw_dump_compile to handle this correctly in less code.
The output format changes slightly, but it's still perfectly acceptable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Looping over the instructions and calling brw_disasm doesn't handle
compacted instructions. In most cases, this hasn't been a problem since
we don't compact prior to Sandybridge.
However, Sandybridge's transform feedback GS program should already be
compacted, and so this ought to fix decoding of that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
UNION appears to expect that all of its sources are conditionally
defined. Otherwise it inserts an unpredicated mov instruction which
overwrites the desired result. This fixes tests that use UMUL_HI, and
much less directly, unsigned integer division by a constant, which uses
this functionality in a peephole pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Gallium already gives us height==1 for these, so the texture state is
already setup correctly to emulate 1D textures as a Nx1 2D texture. We
just need to supply the .y coord.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
That was easy. Turns out it is just a matter of setting one bit.
Enable sampling from sRGB texture, and therefore enable GL 2.1 :-)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Enabled with GALLIVM_DEBUG=perf (which up to now was only used to print
warnings for unoptimized code).
While some unexpectedly long shader compile times for some shaders were fixed
with 8a9f5ecdb1 this should help recognize such
problems in the future. For now though only available in debug builds (which
are not always suitable for such analysis). And since this uses system time,
it might not be all that accurate (even llvmpipe's own rasterization threads
might be running at the same time, or just other tasks).
(llvmpipe also has LP_DEBUG=counters but this only gives an average per shader
and the the total time for all shaders.)
This prints information like this:
optimizing module fs17_variant0 took 1 msec
optimizing module setup_variant_0 took 0 msec
optimizing module draw_llvm_vs_variant0 took 9 msec
optimizing module draw_llvm_vs_variant0 took 12 msec
optimizing module fs17_variant1 took 2 msec
v2: rebase for recent gallivm compilation changes, and print time for whole
modules instead of functions (otherwise it would be very spammy since it would
include all trivial inline sse2 functions), using the shiny new module names,
prying them off LLVM using new helper (not available through C bindings).
Per function timings, while possibly giving more information (if there'd be
a problem only in for instance the partial not the whole function), don't seem
all that useful for now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When we had just one module "gallivm" was an appropriate name. But now we have
modules containing all functions for a particular variant, so give it a
corresponding name (this is really just for helping debugging).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We allocate dispatch tables for BeginEnd and OutsideBeginEnd. But
when we destroy the context we were freeing the BeginEnd and Exec
tables. If Exec==BeginEnd we did a double-free. This would happen
if the context was destroyed while inside a glBegin/End pair. Now
free the BeginEnd and OutsideBeginEnd pointers.
Cc: "10.1", "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes the valgrind report below and random crashes with piglit on radeonsi.
==30005== Conditional jump or move depends on uninitialised value(s)
==30005== at 0xB13584E: st_translate_program (st_glsl_to_tgsi.cpp:5100)
==30005== by 0xB14698B: st_translate_fragment_program (st_program.c:747)
==30005== by 0xB14777D: st_get_fp_variant (st_program.c:824)
==30005== by 0xB11219C: get_color_fp_variant (st_cb_drawpixels.c:1042)
==30005== by 0xB1131AE: st_DrawPixels (st_cb_drawpixels.c:1154)
==30005== by 0xAFF8806: _mesa_DrawPixels (drawpix.c:162)
==30005== by 0x4EB86DB: stub_glDrawPixels (generated_dispatch.c:6640)
==30005== by 0x4F1DF08: piglit_visualize_image (piglit-util-gl.c:1574)
==30005== by 0x40691D: draw_image_to_window_system_fb(int, bool) (draw-buffers-common.cpp:733)
==30005== by 0x406C8B: draw_reference_image(bool, bool) (draw-buffers-common.cpp:854)
==30005== by 0x40722A: piglit_display (alpha-to-coverage-dual-src-blend.cpp:117)
==30005== by 0x4EA7168: run_test (piglit_fbo_framework.c:52)
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This workaround doesn't list any llvm version, but it was introduced
2010-06-10 (e277d5c1f6). It is unlikely
this bug is still present in llvm versions we support (3.1+).
There's no specific test listed, but I ran lp_test_arit (which uses
the mentioned functions) on llvm 3.1 and 3.3 with sse41 disabled and
this pass enabled without issues.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
32bit code generation and llvm >= 2.7 used a different optimization pass
order - this code was initially introduced (2010-07-23) by
815e79e72c, apparently due to buggy code being
generated with then brand new llvm versions (which was llvm 2.7 plus pre 2.8
devel).
It seems very highly likely that whatever this bug was it has been fixed in
newer llvm versions, though there's no easy way to test this - the mentioned
piglit test has been removed years ago, and even if you'd build it I'm
sceptical the glsl compiler would still produce the required code to trigger
it.
I have no idea what a good order of passes is, but just remove the workaround
and use the same order everywhere.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
gen8_dump_compile will be called indirectly by code common used by
generations before and after the gen8 instruction format change.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_dump_compile will be called indirectly by code common used by
generations before and after the gen8 instruction format change.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Has been misaligned since we added instruction offset prefixes.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_disasm doesn't disassemble compacted instructions, so we uncompact
before disassembling them which would unset the compaction control bit.
Instead pass it as a separate argument.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Create the intel renderbuffer with level hardcoded to zero instead
of overriding it in the surface state configuration. Also moved the
dimension adjustments for tiling, mip level, msaa into the render
buffer creation. Finally prepares for another blit path needed for
miptree updownsampling.
v3 (Ken): Dropped unnecessary memory context for "ralloc_asprintf()"
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
v2: Configure stencil directly for final dimensions instead of
adjusting bit by bit for tiling, mip level and msaa.
v3 (Ken): Used non-static constant for horizontal alignment
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Allow hardware to offset accesses to individual layers. Also leave
the mip-level overriding for the creator of the intel renderbuffer
to handle. Merged with "i965/gen8: Allow stencil buffers to be
configured as single sampled"
Ken: I left the "_mesa_problem()" still in place. I think it is clearer
to remove it in a separate patch.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use intel_mipmap_tree::total_width in order to get correct alignment
automatically. Also use "mt->total_height / mt->physical_depth0" as
surface height allowing hardware to offset to correct slice.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
mt->format is of type mesa_format, and therefore can't be
used with _mesa_base_fbo_format which requires a GLenum input.
On gen8, this fixes various piglit fbo-depthstencil tests with
samples > 1.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
They've served their purpose (in transitioning blorp to using
fs_generator) and now they just necessitate large amounts of manual
labor to regenerate if the disassembler changes.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
With this and the previous patch, we no longer have multiple
definitions in the final egl_gallium.so.
v2: Drop duplicate libloader link.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com> (v1)
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> (v1)
It makes more sense to link the core and common parts of the driver as the
target is build. Additionally this will help us drop duplicating symbols
for targets that static link mulitple pipe-drivers. Only egl-static needs
that currently with more to come.
To simplify things a bit add HAVE_GALLIUM_RADEON_COMMON variable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The loops for updating the multiple packed fields in SP_VS_OUT[] and
SP_VS_VPC_DST[] will zero out one register beyond the last that on
required. Which is normally not a problem (and is kinda convenient
when looking at cmdstream dumps) unless we have maximum (16) varyings.
Fix loop termination condition so that this does not happen.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to size input/output tables big enough for special inputs/
outputs (gl_Position, gl_FrontFacing, etc) which, while they don't
count towards the hw limit of 16 attributes or 16 varyings, we do
still need to track them all the same.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Hardware only supports 16. Which fd3_shader_variant properly reflected,
but the pipe cap did not, leading to array overflow (and shaders that
could not possibly work).
Also a bunch of asserts to make problems like this easier to see.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We are starting to add integer support to the compiler, which does not
get exercised with glsl feature level 120 and without advertising
integer support. But doing so breaks too many things right now. So
for now use a debug flag to conditionally expose the functionality
while it is in development.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The KILL_IF opcode could potentially be merged in to the regular KILL
opcode function. It was a pain to do so, so I've left is separated
for cleanliness.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Adds a large sum of TGSI opcodes to the a3xx compiler.
For integer opcodes we have 28 opcodes added.
Adds 4 floating point compare opcodes
If GLSL 1.30 is enabled, this allows the GLSL 1.30 piglits to have a
completion amount of 432/641.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
All shaders had the same name.
We could probably use some identifier per shader too, but for now only use
the variant number.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The setup shaders were composed of both a fs shader number and a variant
number. But since they aren't tied to a particular fragment shader, the
former was a fixed zero while the latter was also always zero because
it was never assigned. So, similar to what the fs code does, use a ever
increasing number to give it a more catchy name (unlike fragment shaders
though where this number is for each explicitly created shader, we just use
it for the implicitly created variants).
And while here, fix whitespace a bit.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Unused except it was increased for both fs and setup shader variants created.
Probably some leftover from ages ago.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously the code used the total number of ubos being declared in the
linked program (so the ubos of all shaders combined), use the number
from the particular shader instead.
This fixes an assertion failure with piglit arb_uniform_buffer_object-maxblocks
seen in llvmpipe since 8a9f5ecdb1 as it now emits
code for each declared buffer, not just the ones actually used.
CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The big missing part here is proper sched data calculations, but
hopefully the chosen placeholder will be sufficient for now.
Passes piglit as well as GK107 does.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Maxwell no longer has the methods to set constant attributes, and we'll
want to be treating stride 0 vtxbufs the same as for stride > 0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Use "@GL_LIB@" in src/gallium/targets/libgl-xlib/Makefile.am to produce
the library name specified by the configure --with-gl-lib-name option.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
In 1d35f77228 support for multiple constant
buffers was introduced. This meant we had another indirection, and we did
resolve the indirection for each constant buffer access. This looks very
reasonable since llvm can figure out if it's the same pointer, however it
turns out that this can cause llvm compilation time to go through the roof
and beyond (I've seen cases in excess of factor 100, e.g. from 50 ms to more
than 10 seconds (!)), with all the additional time spent in IR optimization
passes (and in the end all of it in DominatorTree::dominate()).
I've been unable to narrow it down a bit more (only some shaders seem affected,
seemingly without much correlation to overall shader complexity or constant
usage) but it is easily avoidable by doing the buffer lookups themeselves just
once (at constant buffer declaration time).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously, ir_unop_any was implemented via a dot-product call, which
uses floating point multiplication and addition. The multiplication was
completely pointless, and the addition can just as well be done with an
or. Since we know that the inputs are booleans, they must already be in
canonical 0/~0 format, and the final SNE can also be avoided.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It wasn't completely clear from the docs, so I had to figure out by
looking at piglit results. Hopefully this saves the next driver writer
implementing queries some time.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Not necessary, now that we will free the whole module (hence all
function bodies) immediately after compiling.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Free up unneeded LLVM stuff immediately after generating vertex shader
code. Saves about 500K per shader.
v2: Don't bother calling gallivm_free_function (Jose)
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Split free_gallivm_state() into two steps. First step is
gallivm_free_ir() which cleans up the LLVM scaffolding used to generate
code while preserving the code itself. Second step is
gallivm_free_code() to free the memory occupied by the code.
v2: s/gallivm_teardown/gallivm_free_ir/ (Jose)
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Provide a JITMemoryManager derivative which puts all generated code into
one memory pool instead of creating a new one each time code is generated.
This saves significant memory per shader as the pool size is 512K and
a small shader occupies just several K.
This memory manager also defers freeing generated code until you tell
it to do so, making it possible to destroy the LLVM engine while keeping
the code, thus enabling future memory savings.
v2: Fix compilation errors with LLVM 3.4 (Jose)
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
I saw that LLVM internally uses its global context for some things, even
when we use our own. Given ours is also global, might as well use
LLVM's.
However, sepearate contexts can still be enabled with a simple source
code modification, for when the need/benefit arises.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Nowadays LLVMModuleProviderRef is just an alias for LLVMModuleRef, so
its use just causes unnecessary confusion.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Older versions haven't been tested probably don't work anyway. But more
importantly, code supporting it is hindering further work.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Many instructions implicitly update the accumulator on Gen < 6. The instruction
scheduling code just calls add_barrier_deps() for each accumulator access on
these platforms, but a large class of operations don't actually update the
accumulator -- mostly move and logical instructions. Teaching the scheduling
code about this would allow more flexibility to schedule instructions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77740
Reviewed-by: Matt Turner <mattst88@gmail.com>
Split out fd_query into an abstract base class, to allow multiple
implementations. The current sw based queries are moved into
fd_sw_query.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
As far as I can tell, Mesa hasn't had a convenient way to dump ARB_vp/fp
source until now. Using MESA_GLSL=dump is convenient, since it means
you can use a single environment variable to dump a program's shaders,
no matter which language they're written in.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The depth extent field is used to limit the allowed slice range that
can be rendered to.
With the previous setting, only slice 0 could be rendered.
This fixes piglit amd_vertex_shader_layer-layered-depth-texture-render.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Fixes piglit's
'gl-3.2-layered-rendering-clear-color-all-types 3d mipmapped'
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
If blorp is disabled for color clears, then piglit's
'gl-3.2-layered-rendering-clear-color-all-types 3d mipmapped'
will fail.
Currently, gen8 fails similarly on this test because gen8
does not use blorp.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
With the more advanced dead code elimination pass already being run,
eliminate_dead_code was making no difference in instruction count, and had
an undesirable O(n^2) runtime. So remove it and rename
eliminate_dead_code_advanced to eliminate_dead_code.
Reviewed-by: Marek Olšák <marek.olsak at amd.com>
That information misleads source code auditing tools to think that
ralloc itself is released under LGPL v3.
Instead, simply state talloc is not licensed under a permissive license.
v2: Use wording suggested by Kenneth.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We always call brw_merge_inputs() right before looping over the primitives but
this can be called inside the loop for each primitive too. In the case we do it
for the first primitive the call is redundant and can be skipped.
Reviewed-by: Eric Anholt <eric@anholt.net>
We're moving towards requiring interface additions to be appended to the
end of the interface block. No functional change, opcodes are assigned as
before, but version 2 additions are now grouped together, which prevents
a scanner warning.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Now that we aren't using pixel_[xy] in live variables, nothing is looking
at these regs after the visitor stage.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the only case where a fs_reg in brw_fs_visitor is used during
optimization/code generation, and it meant that optimizations had to be
careful to not move pixel_x/y's register number without updating it.
Additionally, it turns out we had a couple of other UW values that weren't
getting this treatment (like gl_SampleID), so this more general fix is
probably a good idea (though I wasn't able to replicate problems with
either pixel_[xy]'s values or gl_SampleID, even when telling the register
allocator to reuse registers immediately)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The value depends only on the level, so no need to store the bool per slice.
Shrinks intel_mipmap_slice from 24 bytes to 16, while slotting into an
existing hole in intel_mipmap_level.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Need to adjust coordinates since the shader receives the array index as
depth in z, but the TEX instruction expects it to be the second
coordinate for a 1D array texture. This fixes fbo-generatemipmap-array.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Previously the implication was that queries should be disabled during
blits. However glBlitFramebuffer() is supposed to obey the current
query, and this new bit will indicate that to the driver.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Different textures may be bound to each slot for each stage. So we need
to be able to upload ms parameters for each one without stages
overwriting each other.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
It's the same as setting the 3 regs separately, but shorter, and it also
seems to be required on GFX7.2 and later. This doesn't fix Hawaii.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This reverts commit e6967270c7.
Chris Forbes pointed out that this is broken for texture views which
restrict the number of slices. He committed a better fix which makes
this unnecessary.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
The information was lost during linking, causing the layout to be treated as
FRAG_DEPTH_LAYOUT_NONE.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Both the ast->IR and linker have functions with this name, but different
behavior.
Rename the linker's version to var_counts_against_varying_limit to be
closer to what it is actually used for.
Suggested by Ian a while back.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The screen takes ownership of the winsys, and is responsible for
destroying it. Users of pipe-loader should make sure they destory
and screens they've created to avoid memory leaks.
This fixes a crash in clover introduced by
ce6c17c083 where the pipe-loader was
destroying the winsys while a screen was still using it.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Same issues as the previous commit fixed for Gen7:
- Bogus physical->logical layer conversion; depth/stencil surfaces
are still IMS layout on Gen8.
- mt_layer ignored in layered rendering case, which breaks handling
of views with MinLayer.
- Render target array extent not set correctly for arrays.
I'm not able to test this one since I can't get a Broadwell yet, but
it's the same set of fixes as for Gen7.
V2: Restore the MAX2() to account for zero depth/layer_count.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Again, a few problems:
- Layered attachments did not honor MinLayer.
- Non-layered MSAA attachments rendered to the wrong layer due to
dividing by the layer count. All depth buffers use the IMS layout, so
the physical layer count == logical layer count.
- Layered attachments were not limited to irb->layer_count, so we could
render off the end of the texture.
V2: Restore the MAX2() to account for zero depth/layer_count.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixing the same issues the previous commit does for Gen7.
Note that I can't test this one, since I don't have a Broadwell.
V2: Restore the MAX2() to account for zero depth/layer_count.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There were a few problems here, which mostly just broke layered
rendering into a view:
- Render target view extent was always set to be == depth. This is
benign for non-layered-rendering, but allows writes off the end of the
render target for layered rendering, which ends badly.
- Layered rendering did not honor the mt_layer setting, so would not
properly handle MinLayer being set on a view.
V2: Restore the MAX2() to account for zero depth/layer_count.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm somewhat impressed that current gccs will let you do this, but
sufficiently old ones (including 4.4.7 in RHEL6) won't.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
When the limit was changed to be defined in terms of LP_MAX_SHADER_VARIANTS
(75f1fea14f) when it was increased, this
inadvertently lowered the limit in some branches (that have a lower
LP_MAX_SHADER_VARIANTS number) when merged. So, make sure the limit is always
at least the number it once was.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
1c73e919a4 made it possible to not allocate
the tgsi machine if llvm was used. However, draw_get_option_use_llvm() is
not reliable after draw context creation, since drivers can explicitly
request a non-llvm draw context even if draw_get_option_use_llvm() would
return true (and softpipe does just that) which leads to crashes.
Thus use draw->llvm to determine if we're using llvm or not instead (and
make draw->llvm available even if HAVE_LLVM is false so we don't have to put
even more ifdefs).
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
1D array targets store the number of slices in the Height field.
Fixes Piglit's spec/!OpenGL 3.2/layered-rendering/clear-color-all-types
1d_array single_level, at least when used with Meta clears.
Cc: "10.2 10.1 10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This only matters for TextureView where the texObj's target has not been
set yet, in all other instances, texObj->target should be the same as
the passed-in target parameter.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
The old condition disallowed load propagation any time flags were
defined, even with e.g. set and a constbuf reference. The new condition
disallows it only with immediate propagation. (There are no opcodes that
set the condition flag and have an immediate argument.)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add cases for PIPE_SHADER_CAP_MAX_SAMPLER_VIEWS and
PIPE_SHADER_CAP_PREFERRED_IR. Remove default switch case so we
learn of missing cases at compile time.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Return PIPE_SHADER_IR_TGSI for the PIPE_SHADER_CAP_PREFERRED_IR query.
Remove default switch case so we learn of missing switch cases at
compile time.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In an earlier incarnation of populate_consumer_input_sets and
get_matching_input, the consumer_inputs_with_locations array was indexed
using the user-specified location. In that version, only user-defined
varyings were included in the array.
In the current incarnation, the Mesa location is used to index the
array, and built-in varyings are included.
This change fixes the unit test to exepect gl_ClipDistance in the array,
and it resizes the arrays to actually be big enough. It's just dumb
luck that the existing piglit tests use small enough locations to not
stomp the stack. :(
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78258
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Cc: Vinson Lee <vlee@freedesktop.org>
We added wglCreateContextAttribsARB but not the extension strings.
This allows creation of GL 3.x contexts.
Reviewed-by: Brian Paul <brianp@vmware.com>
This path is used to implement both glClear and glClearBuffer; the
latter is only supposed to clear particular buffers. Core Mesa provides
us that information in the buffers bitmask; we must only clear buffers
mentioned there.
To accomplish this, we save/restore the color draw buffers state, and
use glDrawBuffers to restrict drawing to the relevant buffers.
Fixes Piglit's spec/!OpenGL 3.0/clearbuffer-mixed-formats and
spec/ARB_framebuffer_object/fbo-drawbuffers-none glClearBuffer tests
for drivers using meta clears (such as Broadwell).
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77852
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77856
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will be used for saving/restoring the glDrawBuffers state.
For now, make sure that existing users of MESA_META_ALL don't get
the new bit, since they probably won't want it.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The majority of _mesa_meta_Clear and _mesa_meta_glsl_Clear was the same;
adding a boolean for whether to use GLSL allows us to share most of it
without polluting either path too much.
Tested for regressions by hacking i965 to always use the non-GLSL path.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes glean/texture_srgb, which hit recursive-flush prevention
assertions in vbo_exec_FlushVertices.
This probably hurts the performance of front buffer rendering, but
very few people in their right mind do front buffer rendering.
Fixes Glean's texture_srgb test.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
[olv: Use the real name provided by the patch author. Ideally this could be
moved to somewhere higher level so that we would not need to create a pipe
context to flush resources. Plus, it is not clear if flushing resources for
another context is valid.]
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
If StencilSampling is enabled on the texture object, pass in an
equivalent stencil-only format.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
S8_UINT will become useful when ARB_texture_stencil8 becomes supported by
mesa. The other stencil formats are needed for ARB_stencil_texturing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes:
Program received signal SIGSEGV, Segmentation fault.
bind_samplers (comp=0x21b054, comp=0x21b054, ctx=0x211430)
at ../../../../../src/gallium/state_trackers/xa/xa_composite.c:445
445 mask_pic->srf->tex->format);
(gdb) bt
#0 bind_samplers (comp=0x21b054, comp=0x21b054, ctx=0x211430)
at ../../../../../src/gallium/state_trackers/xa/xa_composite.c:445
#1 xa_composite_prepare (ctx=0x211430, comp=comp@entry=0x21b054)
at ../../../../../src/gallium/state_trackers/xa/xa_composite.c:488
#2 0xb6f454b4 in XAPrepareComposite (op=<optimized out>, pSrcPicture=<optimized out>,
pMaskPicture=<optimized out>, pDstPicture=<optimized out>, pSrc=0x5b3ad8, pMask=0x0,
pDst=0x5923b8) at msm-exa-xa.c:533
We can't yet handle solid fill mask, so explicitly reject that, rather
than segfaulting. Otherwise DDX would need to check XA version to see
if solid fill mask were supported.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Prior to commit 8435b60a35, the region
equivalent of this function called intel_miptree_create_layout, which
set mt->target to target. With that commit, it no longer copied target.
Piglit's ext_image_dma_buf_import-sample_[xa]rgb8888 tests would then
hit an assertion failure, where image->TexObject->Target was
GL_TEXTURE_EXTERNAL_OES, and mt->target was GL_TEXTURE_2D.
Copying the target fixes this assertion failure.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The two paths are really similar, and the extra conditionals will be
dwarfed by the cost of the actual upload.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 30259856a8 moved the state packets to
table generation time, but forgot to make this change. Apparently the
performance win there was about not reemitting the table pointers on
unrelated state changes.
No performance difference on cairo on glamor (n=118).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have the stage state coming into our setup of sampler states,
it's easy to drop an identifier into it of which stage the stage_state is,
and then look up which packet to emit in a little table.
No performance difference on cairo on glamor (n=492).
v2: Don't forget to do the workaround flush on IVB.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should have happend around the time of commit 4680d23, but Keith's
DRI3 patches and my GLX_MESA_query_renderer patches crossed in the mail.
I don't have a working DRI3 setup, so I haven't been able to actually
verify this. I'm hoping that someone can piglit this for me on DRI3...
It's also unfortunate the DRI2 and DRI3 can't share more code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Use the ones provided by the compiler instead.
NOTE: External trees should be updated to not include '#include/c99'
directory directly, but rather rely on scons/gallium.py to do the right
thing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Silence insignificant warnings so significant warnings have a chance to
stand out.
The only abundant warning that's not silenced here is "C4018:
signed/unsigned mismatch", as it could hide security issues, so it's better
to actually fix the code.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Drop the version/name tag from the script as it was never
meant to be there. Add swrast_create_screen as it is used
when loading swrast. Rename the file to pipe.sym.
v2: Rebase on top of the LD_NO_UNDEFINED changes.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Both llvm and clang polute the exported symbol table, as soon
as we try to link with either one. Other than those two
everything else looks good (clean).
Cc: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Namely drop the version/name tag of the exported symbol, and
rename the filename to egl.sym.
v2: Rebase on top of the LD_NO_UNDEFINED changes.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Using export-symbols-regex is the least desirable method of restricting
the exported symbols, as is completely messes up with the symbol table.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Using export-symbols-regex is the least desirable method of restricting
the exported symbols, as is completely messes up with the symbol table.
radeon_drm_winsys_create is not needed, avoid exporting it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
In the presence of LLVM the final library exports every symbol from
the llvm namespace. Resolve this by using a version script (w/o the
version/name tag).
Considering that there are only ~25 symbols, explicitly list them
to minimize the chances of rogue symbols sneaking in.
Drop the *winsys_create functions as they were only meant for
gl-vdpau interop.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The symbol is not meant to be exported, and its presence was
only a side effect due to the missing visibility flags.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The symbol is used for hardware only drivers. For swrast the
loader uses swrast_create_screen. Add VISIBILITY_CFLAGS while
we're here.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This can be called from locations that don't have a context pointer
handy. This patch also adds enough infrastructure so that the unit
tests for the GLSL compiler and the stand-alone compiler will build and
function.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows them to be moved to .rodata, and allow us to be sure that they
will not be modified.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Using the existing driver hooks made for AMD_performance_monitor, implement
INTEL_performance_query functions.
v2: Whitespace changes.
v3: Whitespace changes, add a _mesa_warning()
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Like AMD_performance_monitor, this extension provides an interface for
applications (and OpenGL-based tools) to access GPU performance
counters. Since the exact performance counters available vary between
vendors and hardware generations, the extension provides an API the
application can use to get the names, types, and minimum/maximum
values of all available counters.
Applications create performance queries based on available query
types, and begin/end measurement collection. Multiple queries can be
measuring simultaneously.
v2: Whitespace changes
v3: src/mapi/glapi/gen/gl_API.xml: Also expose the functions to GLES2.
v4: Whitespace changes, static_dispatch="false" for all functions, fix
dispatch_sanity test for GLES2 functions
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was a work-around to allow linking a program with only a fragment
shader in a GLES context. Now that we have GL_EXT_separate_shader_objects
in GLES contexts, we can just use that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
ARB, OES, then everything else. If there's ever a KHR shading language
extension, it should go between ARB and OES.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
I don't know of any applications that actually use it. Now that Mesa
supports GL_ARB_separate_shader_objects in all drivers, this extension
is just cruft.
The entrypoints for the extension remain in the XML. This is done so
that a new libGL will continue to provide dispatch support for old
drivers that try to expose this extension.
Future patches will add OpenGL ES GL_EXT_separate_shader_objects, but
that's a different thing.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
With this patch, the piglit arb_separate_shader_object-dlist test
passes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will be used for GL_ARB_separate_shader_objects. That extension
not only allows separable shaders to rendezvous by location, but it also
allows traditionally linked shaders to rendezvous by location. The spec
says:
36. How does the behavior of input/output interface matching differ
between separable programs and non-separable programs?
RESOLVED: The rules for matching individual variables or block
members between stages are identical for separable and
non-separable programs, with one exception -- matching variables
of different type with the same location, as discussed in issue
34, applies only to separable programs.
However, the ability to enforce matching requirements differs
between program types. In non-separable programs, both sides of
an interface are contained in the same linked program. In this
case, if the linker detects a mismatch, it will generate a link
error.
v2: Make sure consumer_inputs_with_locations is initialized when
consumer is NULL. Noticed by Chia-I.
v3: Rebase on removal of ir_variable::user_location.
v4: Replace a (stale) FINISHME with some good explanation comments from
Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When linking a separable program that contains only a fragment shader,
the producer will be NULL. Similar cases will exist with geometry
shaders and, eventually, tessellation shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
While writing the link_varyings::single_interface_input test, I
discovered that populate_consumer_input_sets assumes that all shader
interface blocks have been lowered to discrete variables. Since there
is a pass that does this, it is a reasonable assumption. It was,
however, non-obvious. Make the code fail when it encounters such a
thing, and add a test to verify that behavior.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Four initial tests:
* Create an IR list with a single input variable and verify that
variable is the only thing in the hash tables.
* Same as the previous test, but use a built-in variable
(gl_ClipDistance) with an explicit location set.
* Create an IR list with a single input variable from an interface block
and verify that variable is the only thing in the hash tables.
* Create an IR list with a single input variable and a single input
variable from an interface block. Verify that each is the only thing
in the proper hash tables.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
I want to make some changes to this code, but first I want to make some
unit tests for it... so that I can capture the pre- and
post-invariants. Pulling the code out into its own function in a
non-anonymous namespace enables that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This code was broken in some odd ways before. Too much state was being
saved, it was being restored in the wrong order, and in the wrong way.
The biggest problem was that the pipeline object was restored before
restoring the programs attached to the default pipeline.
Fixes a regression in the glean texgen test.
v3: Fairly significant re-write. I think it's much cleaner now, and it
avoids a bug with some meta ops that use shaders (reported by Chia-I).
v4: Check Pipeline.Current against NULL instead of Pipeline.Default.
Suggested by Chia-I.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Pull most of the guts out of _mesa_BindPipeline into a new utility
function that can be use elsewhere (e.g., meta).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Don't do anything with variables that have explicitly assigned
locations. This is also how built-in varyings are handled.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In February 2013 Paul unified the values used for shader stage outputs
and shader stage inputs. See commits 8a076c5f0^..eed6baf76. Since that
time, the location_base parameters are always VARYING_SLOT_VAR0.
Instead of passing that around, just hard code it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we are uisng the OpenCL 1.2 headers, applications expect all
the OpenCL 1.2 functions to be implemented.
This fixes linking errors with the piglit CL tests.
v2:
- Use c++ features
- Fix error code handling
v3:
- Move <iostream> into api/util.hpp
- Fix indentation
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Make set_ubo_binding() just update the binding, and move the code
that does validation, flushes the vertices etc. into a new
bind_uniform_buffer() function.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Document the difference between _mesa_lookup_bufferobj() and
_mesa_multi_bind_lookup_bufferobj().
v3: Don't create the buffer objects when they don't exist.
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Make set_atomic_buffer_binding() just update the binding, and move
the code that does validation, flushes the vertices etc. into a new
bind_atomic_buffer() function.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is for glBindTextures(), since it doesn't change the active
texture unit.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch adds functions for locking/unlocking the mutex, along with
_mesa_HashLookupLocked() and _mesa_HashInsertLocked()
that do lookups and insertions without locking the mutex.
These functions will be used by the ARB_multi_bind entry points to
avoid locking/unlocking the mutex for each binding point.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The texture can only be bound to the index that corresponds to its
target, so there is no need to loop over all possible indices
for every unit and checking if the texture is bound to it.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used by glBindTextures() when unbinding textures,
to avoid having to loop over all the targets.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used by glBindTextures() so we don't have to look it up
for each texture.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We had the EGLimage structure laying around in intel_regions.h, but now
it's the only thing left in the file.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Fix bad pointer on unreference (caught by Chad)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
I slipped this in in the region->pitch change from pixels to bytes, but I
don't see any reason for it any more -- the libdrm code doesn't appear to
divide pitch by a cpp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
If the stride wasn't width*cpp, we wouldn't track how much of the src is
busy, and allow a subdata into the end to proceed unsynchronized.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Note: region->width/height used to reflect the total_width/height padding
of separate stencil, though mt->total_width didn't. region->width/height
was being used in EGL images, where the padded value would have been the
wrong one, so I converted them to use rb->Width/Height.
v2: Drop debug printf that slipped in (caught by Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Regions aren't refcounted safely for multithreaded applications, and
they're not terribly useful wrappers of a BO, so I'm trying to remove
them.
Even the stride I added here could probably be reduced to use of an
existing field in the __DRIimageRec, but I want this to be as mechanical
of a change as possible.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I want to do this to get the region removed from DRI images. However, it
does mean that we won't share the intel_region between the rb and the
texture for texture_from_pixmap. I think that's fine.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I noticed that we were doing this while changing the DRI3 path to not use
regions, which involved changing the signature of
intel_update_winsys_renderbuffer_miptree() this way.
v2: Replace my comment with Chad's version.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> (v1)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The drm function to get the tiling is just a getter storing the two
pointers, so we don't need to go out of our way to avoid it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
All the consumers are doing it on a miptree.
v2: fix a silly duplicated dereference (review by Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> (v1)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (v1)
intel_alloc_renderbuffer_storage() will clobber rb->Format which was
already set up by intel_create_renderbuffer(). This causes the driver
to potentially create the depth buffer in the wrong format.
In practice this makes the depth buffer Z24 even if the visual has
depthBits==16.
The incorrect depth buffer format doesn't seem to cause any actual
problems in i965, but it seems like we should fix it anyway. I see
Z16 has been more or less deprecated in the driver except the for
the depthBits==16 case. But if we want to use Z24 even in that
case (not sure it's really legal?) it would look better if the
code made that decision explicitly rather than relying on the
format to get magically overwritten by the renderbuffer code.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
intel_alloc_renderbuffer_storage() will clobber rb->Format which was
already set up by intel_create_renderbuffer(). This causes the driver
to potentially create the depth buffer in the wrong format.
Long time ago things worked by accident because
_mesa_choose_tex_format() checked for ARB_depth_texture
and thus returned MESA_FORMAT_NONE on gen2 hardware. Somehow
that ended up working when depthBits==16 because the driver
would then pick DEPTH_FRMT_16_FIXED. Not sure how, but things
also seemed to work with depthBits==24.
Things started to go more sideways at:
commit 6ae473221a
Author: Eric Anholt <eric@anholt.net>
Date: Mon Apr 22 16:04:25 2013 -0700
intel: Fold the one last function intel_tex_format.c into the caller.
since that caused intel_miptree_create_layout() to divide by zero
when encoutering MESA_FORMAT_NONE (bw==0). So after this
commit things were broken enough that many applications wouldn't even
run.
Things got a bit better at:
commit c245efe7e8
Author: Eric Anholt <eric@anholt.net>
Date: Thu Mar 21 09:50:45 2013 -0700
mesa: Remove extension checking from ChooseTexFormat.
since now _mesa_choose_tex_format() would return MESA_FORMAT_X8_Z24
for GL_DEPTH_COMPONENT due to i915 erroneosly claiming that
MESA_FORMAT_X8_S24 (and others) are supported texture formats even
on gen2 hardware. So now the the div-by-zero was gone, but now the
driver would pick DEPTH_FRMT_24_FIXED_8_OTHER even when
depthBits==16 which caused rendering problems.
If we prevent rb->Format from getting clobbered for the depth buffer
things work much better. This makes the spinning title text visible
again in chromium-bsu at 16bpp, for example.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
V2: Follow the new naming convention for unpack functions.
Use double precision for converting Z24 to a float.
V3: Unpack stencil value to most significant byte.
Use 'struct z32f_x24s8' type.
V4: Unpack stencil value to least significant byte.
Add a comment to clarify stencil packing.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch contains non-functional changes. Assertion checks made
earlier in the functions make the if checks redundant. So, remove
the if checks and unindent the code in if block.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Depth-stencil teture targets are allowed to use source data of type
GL_UNSIGNED_INT_24_8_EXT and GL_FLOAT_32_UNSIGNED_INT_24_8_REV.
Fixes few crashes in Khronos OpenGL CTS packed_pixels tests.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Link error conditions added in previous patch are equally applicable
to GL_ARB_fragment_coord_conventions implementation. Extension's spec
says:
"If gl_FragCoord is redeclared in any fragment shader in a program,
it must be redeclared in all the fragment shaders in that program
that have a static use of gl_FragCoord. All redeclarations of
gl_FragCoord in all fragment shaders in a single program must have
the same set of qualifiers."
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLSL 1.50 spec says:
"If gl_FragCoord is redeclared in any fragment shader in a program,
it must be redeclared in all the fragment shaders in that
program that have a static use gl_FragCoord. All redeclarations of
gl_FragCoord in all fragment shaders in a single program must
have the same set of qualifiers."
This patch causes the shader link to fail if we have multiple fragment
shaders with conflicting layout qualifiers for gl_FragCoord.
V2: Restructure the code and add conditions to correctly handle the
following case:
fragment shader 1:
layout(origin_upper_left) in vec4 gl_FragCoord;
void main()
{
foo();
gl_FragColor = gl_FragData;
}
fragment shader 2:
layout(pixel_center_integer) in vec4 gl_FragCoord;
void foo()
{
}
V3:
Allow linking in the following case:
fragment shader 1:
void main()
{
foo();
gl_FragColor = gl_FragCoord;
}
fragment shader 2:
in vec4 gl_FragCoord;
void foo()
{
...
}
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 4.3.8.1, page 39 of GLSL 1.50 spec says:
"Within any shader, the first redeclarations of gl_FragCoord
must appear before any use of gl_FragCoord."
GLSL compiler should generate an error in following case:
vec4 p = gl_FragCoord;
layout(origin_upper_left) in vec4 gl_FragCoord;
void main()
{
}
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLSL 1.50 spec says:
"If gl_FragCoord is redeclared in any fragment shader in a program,
it must be redeclared in all the fragment shaders in that
program that have a static use gl_FragCoord. All redeclarations of
gl_FragCoord in all fragment shaders in a single program must
have the same set of qualifiers."
This patch makes the glsl compiler to generate an error if we have a
fragment shader defined with conflicting layout qualifier declarations
for gl_FragCoord. For example:
layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
layout(pixel_center_integer) in vec4 gl_FragCoord;
void main()
{
}
V2: Some code refactoring for better readability.
Add compiler error conditions for redeclarations like:
layout(origin_upper_left) in vec4 gl_FragCoord;
layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
and
in vec4 gl_FragCoord;
layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
V3: Simplify function is_conflicting_fragcoord_redeclaration()
V4: Check for null pointer before doing strcmp(var->name, "gl_FragCoord").
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In OpenGL 3.1 attribute 0 becomes non-magic, just like in
OpenGL ES 2.0. Earlier versions of OpenGL used attribute 0
exclusively for vertex position.
V2: Add a utility function _mesa_attr_zero_aliases_vertex() in
varray.h
Fixes 4 Khronos OpenGL CTS failures:
glGetVertexAttrib
depth24_basic
depth24_precision
rgb8_rgba8_rgb
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch makes changes to the behavior of glGetAttribLocation(),
glGetFragDataLocation() and glGetFragDataIndex() functions.
Code changes handle a case described in following example:
shader program:
layout(location = 1)in vec4[4] a;
void main()
{
}
Currently, glGetAttribLocation("a") returns 1.
glGetAttribLocation("a[i]"), where i = {0, 1, 2, 3}, returns -1.
But the expected locations for array elements are: 1, 2, 3 and 4
respectively.
This clarification came up with the addition of
ARB_program_interface_query to OpenGL 4.3.
From Page 326 (page 347 of the PDF) of OpenGL 4.3 spec:
"Otherwise, the command is equivalent to
GetProgramResourceLocation(program, PROGRAM_INPUT, name);"
And, From Page 101 (page 122 of the PDF) of OpenGL 4.3 spec:
"A string provided to GetProgramResourceLocation or
GetProgramResourceLocationIndex is considered to match an active
variable if
• the string exactly matches the name of the active variable;
• if the string identifies the base name of an active array, where
the string would exactly match the name of the variable if the
suffix "[0]" were appended to the string; or
• if the string identifies an active element of the array, where
the string ends with the concatenation of the "[" character, an
integer (with no "+" sign, extra leading zeroes, or whitespace)
identifying an array element, and the "]" character, the integer
is less than the number of active elements of the array variable,
and where the string would exactly match the enumerated name of
the array if the decimal integer were replaced with zero."
V2: Simplify get_matching_index() function.
Add relevant text from OpenGL spec in commit message.
Fixes failures in Khronos OpenGL CTS tests:
explicit_attrib_location_room
draw_instanced_max_vertex_attribs
Proprietary linux drivers of NVIDIA (331.49) matches the behavior
expected by OpenGL 4.3 spec.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently overlapping locations of input variables are not allowed for all
the shader types in OpenGL and OpenGL ES.
From OpenGL ES 3.0 spec, page 56:
"Binding more than one attribute name to the same location is referred
to as aliasing, and is not permitted in OpenGL ES Shading Language
3.00 vertex shaders. LinkProgram will fail when this condition exists.
However, aliasing is possible in OpenGL ES Shading Language 1.00 vertex
shaders."
Taking in to account what different versions of OpenGL and OpenGL ES specs
say about aliasing:
- It is allowed only on vertex shader input attributes in OpenGL (2.0 and
above) and OpenGL ES 2.0.
- It is explictly disallowed in OpenGL ES 3.0.
Fixes Khronos CTS failing test:
explicit_attrib_location_vertex_input_aliased.test
See more details about this at below mentioned khronos bug.
V2: Fix the case where location exceeds the maximum allowed attribute
location.
V3: Simplify the condition added in V2.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: Khronos #9609
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
don't leak the MCSubtargetInfo (not really big, was already fixed with
llvm master) and TargetMachine (big). While this is only used for debugging
the leak is large enough to get you into trouble in some cases.
Tested with llvm 3.1 and master.
Before (llvm 3.1), GALLIVM_DEBUG=asm glxgears:
==14152== LEAK SUMMARY:
==14152== definitely lost: 105,228 bytes in 20 blocks
==14152== indirectly lost: 347,252 bytes in 261 blocks
==14152== possibly lost: 866,625 bytes in 1,453 blocks
==14152== still reachable: 7,344,677 bytes in 6,494 blocks
==14152== suppressed: 0 bytes in 0 blocks
After:
==13799== LEAK SUMMARY:
==13799== definitely lost: 3,108 bytes in 6 blocks
==13799== indirectly lost: 0 bytes in 0 blocks
==13799== possibly lost: 804,143 bytes in 1,429 blocks
==13799== still reachable: 7,314,267 bytes in 6,473 blocks
==13799== suppressed: 0 bytes in 0 blocks
Reviewed-by: Brian Paul <brianp@vmware.com>
The brw_eu_emit.c code manually forces the header present bit when
used in align1 (scalar) mode. So, this has no effect currently.
However, it is nice to have fs_inst::header_present reflect reality.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77221
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is similar to what Eric did for Gen7 a little while ago; it also
has support for untyped surface reads.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Francisco made brw_mark_surface_used a freestanding function in
commit a32817f3c2. We should use it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This must not have existed when I wrote the original code. The atomic
operation header setup code uses this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For platforms using hardware contexts (currently Gen6+), we failed to
emit PIPELINE_SELECT and 3DSTATE_VF_STATISTICS, instead emitting MI_NOOP
for both.
During one of the context initialization reordering patches, we
accidentally moved brw_init_state before we set brw->CMD_PIPELINE_SELECT
and brw->CMD_VF_STATISTICS. So, when brw_init_state uploaded initial
GPU state (brw_init_state -> brw_upload_initial_gpu_state ->
brw_upload_invariant_state), these would be 0 (MI_NOOP).
Storing the commands in the context is not worthwhile. We have many
generation checks in our state upload code, and for platforms with
hardware contexts, this only gets called once per GL context anyway.
The cost is negligable, and it's easy to botch context creation
ordering.
This may fix hangs on Gen6+ when using the media pipeline.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
arekm reported that using Chrome with GPU acceleration enabled on GM45
triggered the hw_ctx != NULL assertion in brw_get_graphics_reset_status.
We definitely do not want to advertise reset notification support on
Gen4-5 systems, since it needs hardware contexts, and we never even
request a hardware context on those systems.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75723
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This was introduced with the comment and code below it, though the code
only touches prog_data (CACHE_NEW_WM_PROG).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This keeps us from having to emit the nonpipelined state packet on every
FBO binding.
-4.42003% +/- 1.09961% effect on cairo-perf-trace runtime on glamor (n=110).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No more walking 96*6 pointers looking to see if they're the current
texture, when we only use the first 2 out of 96 units. -6.26002% +/-
1.87817% effect on cairo runtime on no-fbo-cache glamor (n=36).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of walking 6 shader stages for each of the 96 combined texture
image units, now we just walk the samplers used in each shader stage.
With cairo-perf-trace on Xephyr with glamor, I'm seeing a -6.50518% +/-
2.55601% effect on runtime (n=22) since the "drop _EnabledUnits" change.
No significant performance difference on an apitrace of minecraft (n=442).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I want to avoid walking the entire long array texture image units, but the
obvious way to do so means walking program samplers, and thus hitting the
units in a random order.
This change replaces the previous behavior of only setting up the fallback
texture for a fragment shader with setting up the fallback texture for any
shader that's missing a complete texture of the right target in its unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is kind of ugly, but I think it's worth it to finish off the last
consumers of _ReallyEnabled.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The _MaxEnabledTexImageUnit check assures us that Unit[0].Current != NULL.
This is the last consumer of _ReallyEnabled outside of the radeons.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm probably not the only person that has tried to kill _ReallyEnabled.
This does the mechanical part of the work, and cleans _ReallyEnabled from
i965.
I think that using _Current makes texture management clearer: You can't
have multiple targets in use in the same texture image unit at the same
time, because there's just that one pointer.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm going to try to delete _ReallyEnabled, which is this weird bitfield
with either 0 or 1 bits set with just the reference to _Current.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The field wasn't really valid, since we've got more than 32 units now. It
turns out it was mostly just used for checking != 0, or checking for fixed
function coordinates, though.
v2: Fix mis-conversion in xm_line.c (caught by Ken).
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_EnabledUnits is all of the first 32 image units that are used by fixed
function or programs, while _EnabledCoordUnits is just which fixed function
fragment shader texcoords need to be generated. This is a theoretical bugfix
in the case of a vertex shader texturing from large texture image unit number
(we'd end up flagging something other than a VARYING_SLOT_TEXn as needing to
be generated), but it's actually just motivated by trying to kill
_EnabledUnits.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The original GL_EXT_texture_swizzle extensions said GL_INVALID_OPERATION
was to be generated when the an invalid swizzle was passed to
glTexParameter(). But in OpenGL 3.3 and later, the error should be
GL_INVALID_ENUM.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It is possible that there are multiple input buffers but only one is
relevant for translation. Then there will be only a single translation
group, which might need to source data from a buffer index != 0.
Fixes wrong vertex shader inputs as observed while debugging with an
application and driver combination that requires translation of a
vertex attribute in a non-trivial set of attributes and input buffers.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Igor Gnatenko:
v2: PIPE_COMPUTE_CAP_MAX_CLOCK_FREQUENCY instead of
PIPE_COMPUTE_MAX_CLOCK_FREQUENCY
Bruno Jiménez:
v3: Drivers report clock in Mhz
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Igor Gnatenko:
v2: in define RADEON_INFO_MAX_SCLK use 0x1a instead of 0x19 (upstream changes)
Bruno Jiménez:
v3: Convert the frequency to MHz from kHz after getting it in
'do_winsys_init'
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
We intended to set these 64-bit addresses to 0, and set the enable bit.
But, I accidentally placed the DWord with the high bits first, when it
should have been second.
This generally worked out, by luck - presumably General State Base
Address is initially zero, and ends up remaining that way in our
contexts since we bungled the "modify enable" bit.
v2: Fix MOCS shift on GSBA. It should be 4, and I had 2.
(Caught by Ben Widawsky.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The implementation is basically a NOP but it conforms with OpenCL 1.2.
[ Francisco Jerez: Initialize property return buffer for
CL_DEVICE_PARTITION_PROPERTIES, CL_DEVICE_PARTITION_TYPE,
CL_DEVICE_PARTITION_AFFINITY_DOMAIN, and make the latter a scalar
rather than a vector. Some clean-up and code style fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2: use a new variable for aligned size
add comment
make both vars const
only use the aligned value in argument constructors
fix comment typo
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The C++ headers are *not* updated because they rely on CL 1.2 APIs
that we do not implement yet when the core CL 1.2 headers are present.
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Fixes textureGatherOffset when used with a shadow sampler. Also verified
against blob compiler with textureLodOffset manually (no piglit tests
for texture[Lod]Offset + shadow samplers).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This adds support for:
IBFE, UBFE, BFI, LSB, IMSB, UMSB, BREV, POPC
Which are all required for ARB_gs5 support.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add bitwise reversing and signed MSB helpers for software implementation
of the new TGSI opcodes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Also pipe through [IU]MUL_HI, MAD, and lower ldexp. This provides
coverage of all new ARB_gpu_shader5 functions except uaddCarry,
usubBorrow and interpolateAt*.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Use designated initialisers, and store the extensions pointers as const.
The loader extensions __DRIdri2LoaderExtension and __DRIswrastLoaderExtension
are setup by the platform backends so they should not be constified.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Use designated initialisers, store all extension pointers as const and use
a const __DRIextensions array over assigning each element individually.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Explicitly set the version that is implemented, as that may differ from
the one defined in dri_interface.h. The remaining __DRI*Extensions are
treated as constants, so got ahead and declare them as such.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Use a const array with the extensions, rather than assigning each
one to a fixed size array at runtime.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Make sure that the DRI*Extensions report the version of the interface
implemented over the listed in the headers. While both are currently
the same, this may change in the future.
v2: Keep loader extensions handling as is.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Explicitly set the version that is implemented, as that may differ
from the one defined in dri_interface.h. Use designated initialisers
and constify whereever possible.
Note: __DRIimageExtension should not be made const as it's modified
at runtime. This patch should have no side effects on compilers that
do not support designated initialisers, as the existing code in
dri/common already uses them.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Rather than keeping a separate and unused copy of the screen extensions
within the radeon screen, use a constant array that can be used directly
with __DRIscreen.
[Kristian Høgsberg]
The copy in the radeon screen isn't unused, that's where the array is
built and stored, the dri screen just points to that. The pattern
here was used for cases where the extensions exported by a dri driver
could vary at runtime, for example depending on chipset. In this
case, it's known at compile time, so it makes sense to use a static
const array instead.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Uniformly use the typecasted extension name, constify extension instances
and use designated initialisers. Set the implemented version of the
extension, over the one defined in dri_infertace.h. Patch covers the
following extensions:
__DRItexBufferExtension
__DRIimageExtension
__DRIrobustnessExtension
__DRI2rendererQueryExtension
__DRIdri2LoaderExtension
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
With commit e59fa4c46c8("dri2: release texture image.") we updated the
extension without bumping the version number. The patch itself added an
interface required to enable texture_from_pixmap on certain platforms.
The new code was effectively never build, as it depended on
__DRI_TEX_BUFFER_VERSION >= 3, which never came to be in upstream mesa.
This commit bumps the version number, drops the __DRI_TEX_BUFFER_VERSION
checks and resolves all the build conflicts. Additionally it add a version
check as egl and dri3, as require version 2 of the extension which does
not have the releaseTexBuffer hook.
Cc: Juan Zhao <juan.j.zhao@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Unfortunately, Cygwin defines RTLD_DEFAULT (for glibc compatibility), but can't
provide dladdr(), so add a check for dladdr()
Since I don't think scons is ever used to build for Cygwin, just set HAVE_DLADDR
in SConscript, assuming that if we have RTLD_DEFAULT, we have dladdr().
Cc: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
The old python code used sys.is_big_endian to select between little-endian
and big-endian formats, which meant that the build and host endiannesses
needed to be the same. This patch instead generates both big- and little-
endian layouts, using PIPE_ARCH_BIG_ENDIAN to select between them.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Splits out the code that parses the channel list, so that we
can have different lists for little and big endian.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Rather than iterate over format.channels and format.swizzles directly,
use Python subfunctions that take the channel and swizzle lists as
arguments. This allow the channel and swizzle lists to depend on
endianness.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
With the big-endian changes, there can be two swizzle orders for each format.
This patch turns Format.inv_swizzle() into a global function that takes the
swizzle list as a parameter.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
The main aim is to reduce the number of places that access channels[0],
swizzles[0] and swizzles[1] directly.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This can happen with glamor, which uses EGL_KHR_surfaceless_context and
only explicitly binds GL_READ_FRAMEBUFFER for glReadPixels.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
relnotes weren't updated this whole time, so I went through all the
GL3.txt changes and picked out the nouveau ones since 10.1.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
_mesa_HashTable is not well-suited for us: it locks a mutex unnecessarily and
it does not accept 0 as the key (and have branches to handle 1 specially).
What we really need is a sparse array. Whether it should be implemented as a
hash table, a list, or a bsearch()-able array requires investigations of the
use models.
We choose to implement it as a list for now, assuming it is common to have a
short list of IDs in each (source, type) namespace. The code is simpler, and
the memory footprint is lower. This also fixes several corner cases such as
making messages to have different states at different severities.
v2: use GLbitfield for State/DefaultState, and add a comment
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Do not copy the debug group until it is about to be written. One likely
scenario of using glPushDebugGroup/glPopDebugGroup is to enclose a sequence of
GL commands and give them a human-readable description. There is no message
control change in this scenario, and thus no need to copy.
This also reduces the initial size of gl_debug_state from 306KB to 7KB.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add functions to provide these operations on a struct gl_debug_namespace:
init(): initialize the namespace
copy(): copy all elements from one namespace to another
clear(): clear all elements (to free the memories)
set(): set the value of an element
set_all(): set the value of all elements
get(): get the value of an element
A debug namespace is like a sparse array. The length of the array is huge,
2^sizeof(GLuint), but most of the elements assume the same value sepcified by
set_all().
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add struct gl_debug_group to hold all namespaces of a debug group. Replace
the 3-dimensional array, Namespaces, in struct gl_debug_state by a
1-dimensional array of type struct gl_debug_groups.
Turn the 4-dimensional array, Defaults, in struct gl_debug_state to a
1-dimensional array in struct gl_debug_namespace.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove NextMsgLength, and move members of struct gl_debug_state that belong to
the message log to a new struct, gl_debug_log. Rename gl_debug_msg to
gl_debug_message.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When GL_DEBUG_OUTPUT_SYNCHRONOUS is GL_TRUE, drivers are allowed to log debug
messages from other threads. That requires gl_debug_state to be protected by
a mutex, even when it is a context state. While we do not spawn threads in
Mesa yet, this commit makes it easier to do when we want to.
Since the definition of struct gl_debug_state is no longer needed by the rest
of the driver, move it to main/errors.c. This should make it even harder to
use the struct incorrectly.
v2: add comments for the accessors
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add validate_length, and call it together with log_msg directly instead of
message_insert. No functional change.
v2: make sure length is non-negative (i.e., known) before calling
validate_length, noted by Timothy Arceri
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In both call sites, it could be easily replaced by direct
debug_is_message_enabled calls. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Merge control_app_messages with the only caller. Eliminate set_message_state
and control_messages too as they are unused. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Replace free_errors_data by debug_clear_group. Add debug_pop_group and
debug_destroy for use in _mesa_PopDebugGroup and _mesa_free_errors_data
respectively. No funcitonal change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move group copying to debug_push_group. Save the group message before pushing
instead of after, since we will need it after popping. No functional change
otherwise.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move most of the code to debug_set_message_enable_all. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move message fetching to debug_fetch_message and message deletion to
debug_delete_messages. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move message logging to debug_log_message. Replace store_message_details by
debug_message_store. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move message state update to debug_set_message_enable. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move the message filtering logic to debug_is_message_enabled. No functional
change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move gl_debug_state allocation to a new function, debug_create. No functional
change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
bitfieldInsert takes scalar integers for its last two arguments. Since
bitfieldInsert is lowered on i965 to two instructions that have more
flexible arguments, I didn't notice when I wrote this.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
It looks like this bit of code is trying to disable the prime capability if
the driver doesn't support createImageFromFds. However the logic looks a bit
broken and what it would actually do is disable all other capabilities apart
from prime. This patch fixes it to actually disable prime.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This should give the caller some information of what called the error.
For the gbm_bo_import() case, for instance, it is possible to know if
the import is not supported or the error was caused by an invalid
parameter.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
In all fairness we allow the gallium tests to be build with --disable-dri
which will result in the approapriate winsys to not be build, thus the
build will fail.
./configure --disable-dri --with-gallium-drivers=svga --enable-gallium-tests
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Rather than defining our own set of variables, use NEED_WINSYS_XLIB
and based on it include the sw/xlib winsys.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The function relies on the sw/dri winsys which is build only when --enable-dri
is set. Fixes build issues with the following config
./configure --disable-dri --with-gallium-drivers=svga --enable-xa
Issue can be reproduced with any hw gallium driver + st that uses the pipe-loader.
Cc: Brian Paul <brianp@vmware.com>
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
GL (3.0) allows you to clear individual color buffers in a fb. In fact
for fbs containing both int and float/normalized color buffers this is
required (because the clearing values are otherwise undefined if applied
to all buffers). The gallium interface was changed a while ago, but llvmpipe
ignored it (hence doing such individual clears always resulted in clearing
all buffers, plus some assorted asserts due to the mixed fbs).
So change the clear command to indicate the buffer to be cleared. Also, because
indicating the buffer to be cleared would have made lp_rast_arg_cmd larger
which is unacceptable (we're trying to shrink it some day) allocate the clear
value in the scene and just pass a pointer.
There's several advantages and disadvantages here:
+ clearing individual buffers works (we could also actually bin such clears now
if they'd come through clear_render_target() if the surface is in the current
fb, though we didn't do this before for the single rb case and still don't try).
+ since there's one clear per rb, we do the format conversion in setup rather
than per bin. Aside from the (drop in the ocean...) performance advantage this
means that clearing to very small values (that is, denormal when converted to
the format) should work for small float (fp16 etc.) formats, as the util code
couldn't handle it correctly before (because cpu denorms are disabled when
executing the bin commands, screwing up the magic conversion and flushing
the values to 0, though this was not verified).
- there's some overhead for traditional old-style clear-all MRT cases, since
there's one rast clear command per rb instead of one for all rbs.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=76976.
v2: get rid of the ugly manual memcpy stuff and just use union util_color.
This is 32 bytes instead of 16 but as the allocation is per scene we can live
with those additional 16 bytes (and the additional 128 bytes in the setup
context), which makes the code much more obvious. Suggested by Brian.
Reviewed-by: Brian Paul <brianp@vmware.com>
util_color often merely represents a collection of bytes, however it is
inconvenient if those bytes can only be accessed as floats/doubles for int
formats exceeding 32bits.
(Note that since rgba8 formats use one uint, not 4 bytes, hence the byte and
short member were left as is.)
The current version checking is wrongly refusing to create 3.3 contexts;
unsupported version are checked elsewhere; and the DRI path doesn't do
this sort of checking neither.
This enables piglit glsl 3.30 tests to run without skipping.
Reviewed-by: Brian Paul <brianp@vmware.com>
According to Roland all TGSI support is there in theory.
In practice there are a few piglit failures and crashes, as this hadn't
been tested before.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The GLX_ARB_create_context_profile spec says:
"If version 3.1 is requested, the context returned may implement
any of the following versions:
* Version 3.1. The GL_ARB_compatibility extension may or may not
be implemented, as determined by the implementation.
* The core profile of version 3.2 or greater."
Mesa does not support GL_ARB_compatibility, and there are no plans to
ever support it, therefore the only chance to honour a 3.1 context is
through core profile, i.e, the 2nd alternative from the spec.
This change does that. And with it piglit tests that require 3.1
contexts no longer skip.
Assuming there is no objection with this change, src/glx/dri_common.c
and src/gallium/state_trackers/wgl/stw_context.c should also be updated
accordingly, given they have the same logic.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Lets make draw_get_option_use_llvm function available unconditionally
and use it to avoid useless allocations when LLVM paths are active.
TGSI machine is never used when we're using LLVM.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes a segmentation fault in conform divzero.c test.
This happens when glTexImage(level, width=0, height=0) is called. We
don't allocate texture memory in that case so the ImageSlices array
was never allocated.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This prevents buffer overflow w/ llvmpipe when running piglit
bin/gl-3.2-layered-rendering-clear-color-all-types 1d_array single_level -fbo -auto
v2: Compute the framebuffer size as the minimum size, as pointed out by
Brian; compacted code; ran piglit quick test list (with no
regressions.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In cases where varying fetches are optimized away (just pass-through in
vertex shader, but unused in fragment shader) we need to calculate the
correct TOTALATTROVS based on the actual number of varyings fetched,
otherwise lockup.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
HiZ operations make the depth/render caches out of sync with the sampler
caches. We need to arrange for a TC flush to happen before the target
buffer is used by the sampler. Calling brw_render_cache_set_add_bo
makes that happen.
On previous generations, brw_blorp_exec took care of flushing the
texture cache by calling intel_batchbuffer_emit_mi_flush after doing
any rendering. If we were to use the normal drawing path, then
brw_postdraw_set_buffers_need_resolve would handle this.
On Broadwell, we don't use BLORP, and we don't emit a rectangle
primitive via the normal drawing path. The 3DSTATE_WM_HZ_OP and
PIPE_CONTROL implicitly make drawing happen. So, none of our existing
code makes this flush happen - we need to do it directly.
Fixes 11 Piglit copyteximage subtests.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77223
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77226
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The way cik_num_banks() was calculating the index only makes sense for
the CIK specific macrotile mode array. For SI, we need to use the tile
mode index directly.
This happened to work most of the time because most of the SI tiling
modes use the same number of banks.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Previously this was special-cased for VS and FS; it never got updated
when geometry shaders came along. Generalize using is_varying_var() so
this won't be broken again with tessellation.
Note that there are two copies of the logic for `invariant`: It can be
present as part of a new declaration, and also as a redeclaration of an
existing variable or block member.
Fixes the four new piglits:
spec/glsl-1.50/compiler/invariant-qualifier-*.geom
Note for stable: This won't quite pick cleanly due to whitespace and
state->target -> state->stage renames. Should be straightforward
adjustments though.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 4.3.1, page 220, of OpenGL 3.3 specification explains
the error conditions for glreadPixels():
"If the format is DEPTH_STENCIL, then values are taken from
both the depth buffer and the stencil buffer. If there is
no depth buffer or if there is no stencil buffer, then the
error INVALID_OPERATION occurs. If the type parameter is
not UNSIGNED_INT_24_8 or FLOAT_32_UNSIGNED_INT_24_8_REV,
then the error INVALID_ENUM occurs."
Fixes failing Khronos CTS test packed_depth_stencil_error.test
V2: Avoid code duplication
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the OpenGL 4.4 spec page 275:
"If pname is FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE, param will
contain the format of components of the specified attachment,
one of FLOAT, INT, UNSIGNED_INT, SIGNED_NORMALIZED, or
UNSIGNED_NORMALIZED for floating-point, signed integer,
unsigned integer, signed normalized fixedpoint, or unsigned
normalized fixed-point components respectively. If no data
storage or texture image has been specified for the attachment,
param will contain NONE. This query cannot be performed for a
combined depth+stencil attachment, since it does not have a
single format."
Fixes Khronos CTS test: packed_depth_stencil_parameters.test
Khronos Bug# 9170
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The '**used' pointer was pointing into the stObj->sampler_views array.
If 'free' was null, we'd realloc that array, thus making the 'used'
pointer invalid. This soon led to memory errors.
Just change the pointer to be '*used' so it points directly at the
pipe_sampler_view.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Avoid looping over 32/48/96 (!!) tex image units every draw, most of
which we don't care about.
Improves performance on everyone's favorite not-a-benchmark by 2.9% on
Haswell.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This gives us a better bound for some hot loops in the drivers than
MAX_COMBINED_TEXTURE_IMAGE_UNITS, which is ridiculously large on modern
hardware, and only getting worse as more shader stages are added.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Also rework things so that if someone were to add an opcode without
adjusting the values in these arrays, there will be a compilation error.
This fixes a few quadop-related piglit regressions since commit
d5faf8e786.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We want to center the sample. The old code may have been correct given
the limited values of ms_x/y, but the new logic should be more
intuitive. Note that ms_x can only be 1/2 and ms_y can only be 0/1.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The 2D engine should be usable in more cases, but this fixes MS blits
between textures with the same MS settings. Otherwise a single sample is
selected to be the target texel value.
This allows other tests to work that render to a RB and then blit that
to a texture for input into a shader that uses sampler2DMS to verify it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Back when I originally wrote this code, force_sechalf was only used for
Gen4 code, so I didn't bother hooking it up. However, it's used more
generally these days. In particular, we use it for computing
gl_SamplePosition.
Fixes Piglit's spec/ARB_sample_shading/builtin-gl-sample-position tests.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77222
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We previously only allowed coalescing registers that interfere (i.e.,
whose live ranges overlap) if the destination register's live range was
entirely inside the source's live range. This is unnecessary -- we only
need to check for interfering writes in the intersection of their live
ranges.
total instructions in shared programs: 1639470 -> 1638453 (-0.06%)
instructions in affected programs: 84751 -> 83734 (-1.20%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than any old control flow. Muchnick's algorithm just checks for
interfering writes between the MOV and the end of the program. Handling
this when you have backward branches is hard, so don't, but there's no
reason to bail if you see forward branches.
instructions in affected programs: 4270 -> 4248 (-0.52%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were starting at the beginning of the instruction list, rather than
with the MOV instruction itself. This allows us to coalesce after
control flow.
Excluding the shaders from an unreleased title, the shader-db results:
total instructions in shared programs: 1603791 -> 1594215 (-0.60%)
instructions in affected programs: 678772 -> 669196 (-1.41%)
GAINED: 5
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And avoid rewriting other instructions unnecessarily. Removes a few
self-moves we weren't able to handle because they were components of a
large VGRF.
instructions in affected programs: 830 -> 826 (-0.48%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's a few 3-component vertex attribute formats that have no
equivalent SVGA3D_DECLTYPE_x format. Previously, we had to use
the swtnl code to handle them. This patch lets us use hwtnl for
more vertex attribute types by fetching 3-component attributes as
4-component attributes and explicitly setting the W component to 1.
This lets us handle PIPE_FORMAT_R16G16B16_SNORM/UNORM and
PIPE_FORMAT_R8G8B8_UNORM vertex attribs without using the swtnl path.
Fixes piglit normal3b3s GL_SHORT test.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
There's no SVGA3D_DECLTYPE that directly corresponds to
PIPE_FORMAT_R8G8B8_SNORM. Previously, we used the swtnl fallback
path to handle this but that's slow and causes invariance issues.
Now we fetch the attribute as SVGA3D_DECLTYPE_UBYTE4N and insert
some extra VS instructions to remap the attributes from the range
[0,1] to the range[-1,1].
Fixes Sauerbraten sw fallback.
Fixes piglit normal3b3s-invariance test.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Now only translate the formats once in svga_create_vertex_elements_state().
And rename the array and use the proper SVGA3dDeclType type.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This reverts commit c875d6e57a.
Conflicts:
src/gallium/drivers/svga/svga_context.c
This work-around will no longer be needed after the next patch
which properly supports signed-byte vertex attributes.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This sets up the proper execution mask for sends in SIMD16 mode.
Fixes Piglit's glsl-fs-normalmatrix, glsl-fs-uniform-array-2,
glsl-fs-uniform-array-6, and glsl-fs-uniform-array-7 on Ironlake,
which regressed when I enabled SIMD16 pull parameter support in
commit b207e88b25.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
gl_ViewportIndex doesn't get its own varying slot. It is stored
in VARYING_SLOT_PSIZ.z. This patch fixes the issue for both gen7
and gen8 because gen7_upload_3dstate_so_decl_list() is shared
between them.
Fixes failures in OpenGL Khronos CTS test transform_feedback_builtins.
Makes new piglit test glsl-1.50-transform-feedback-builtins pass for
'gl_ViewportIndex'.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gl_Layer doesn't get its own varying slot. It is stored in
VARYING_SLOT_PSIZ.y. This patch fixes the issue for both gen7
and gen8 because gen7_upload_3dstate_so_decl_list() is shared
between them.
Fixes failures in OpenGL Khronos CTS test transform_feedback_builtins.
Makes new piglit test glsl-1.50-transform-feedback-builtins pass for
'gl_Layer'.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that _debug_assert_fail() has the noreturn attribute, it is better
that execution truly never returns. Not just for sake of silencing the
warning, but because the code at the return IP address may be invalid or
lead to inconsistent results.
This removes support for the GALLIUM_ABORT_ON_ASSERT debugging
environment variable, but between the usefulness of
GALLIUM_ABORT_ON_ASSERT and better static code analysis I think better
static code analysis wins.
Reviewed-by: Brian Paul <brianp@vmware.com>
Same intent as commit a45a50a482,
but this the C compiler is detected via C-preprocessor macros,
similar to how autotools do it, as that seems to be the most
reliable method.
Reviewed-by: Brian Paul <brianp@vmware.com>
This allows the following shader code to work without a weird crash:
struct Foo {
int value[1];
};
int actual_value = Foo[2](Foo(int[1](100)), Foo(int[1](200)))[i].value[0];
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
nouveau_vp3_inter_sizes requires sliec_count as argument just
as the other places that call it from h264 code do. Hopefully
fixes something.
Fix the status_vp code to allow status == 0 too, when processing
hasn't started yet.
set h264->second_field correctly.
Otherwise it will trick the gallium driver into thinking that the render
target has actually changed (due to different pipe_surface pointing to
same underlying pipe_resource). This is really badness for tiling GPUs
like adreno.
This also appears to fix a rendering error with Motif on vmwgfx.
Why that is is still under investigation.
Based on an idea by Rob Clark.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Keep track of the maximal bounds of all the operations and set scissor
accordingly. For tiling GPU's this can be a big win by reducing the
memory bandwidth spent moving pixels from system memory to tile buffer
and back.
You could imagine being more sophisticated and splitting up disjoint
operations. But this simplistic approach is good enough for the common
cases.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Once the relevant branch has been identified do not iterate over the
instructions in the branch, do a linked list insertion instead to avoid the
loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes piglit's fbo-blit-stretch test on drivers which use the meta path.
(i965: should fix Broadwell, but also fixes Sandybridge/Ivybridge/Haswell
since this test falls off the blorp path now due to format conversion)
V2: Use scissor instead of just mangling the rects, to avoid texcoord
rounding problems. (Thanks Marek)
V3: Rebase on Eric's CTSI meta changes; re-add _mesa_update_state in the
CTSI path so that _mesa_clip_blit sees the correct bounds.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77414
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
According to the spec:
<renderbuffertarget> must be RENDERBUFFER and <renderbuffer>
should be set to the name of the renderbuffer object to be
attached to the framebuffer. <renderbuffer> must be either
zero or the name of an existing renderbuffer object of type
<renderbuffertarget>, otherwise an INVALID_OPERATION error is
generated.
This patch changes the previous returned GL_INVALID_VALUE to
GL_INVALID_OPERATION.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76894
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
the integer multiplication pattern:
mul acc0, x, y
mach null, x, y
mov dest, acc0
Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us to emit ADD/MUL/MAC instead of MUL/ADD/MUL/ADD,
saving one instruction and two temporary registers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows us to generate the MAC (multiply-accumulate) instruction,
which can be used to implement some expressions in fewer instructions
than doing a series of MUL and ADDs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows us to emit ADD/MUL/MAC instead of MUL/ADD/MUL/ADD,
saving one instruction and two temporary registers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows us to generate the MAC (multiply-accumulate) instruction,
which can be used to implement some expressions in fewer instructions
than doing a series of MUL and ADDs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Our hardware has an "accumulator" register, which can be used to store
intermediate results across multiple instructions. Many instructions
can implicitly write a value to the accumulator in addition to their
normal destination register. This is enabled by the "AccWrEn" flag.
This patch introduces a new flag, inst->writes_accumulator, which
allows us to express the AccWrEn notion in the IR. It also creates a
n ALU2_ACC macro to easily define emitters for instructions that
implicitly write the accumulator.
Previously, we only supported implicit accumulator writes from the
ADDC, SUBB, and MACH instructions. We always enabled them on those
instructions, and left them disabled for other instructions.
To take advantage of the MAC (multiply-accumulate) instruction, we
need to be able to set AccWrEn on other types of instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
OpenGL 4.0 spec, page 306 suggests an INVALID_OPERATION in glGetTexImage
if :
"format is one of the integer formats in table 3.3 and the internal
format of the texture image is not integer, or format is not one of
the integer formats in table 3.3 and the internal format is integer."
V2: Use helper function _mesa_is_format_integer()
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
mesa currently returns 4 when GL_VERTEX_ATTRIB_ARRAY_SIZE is queried
for a vertex array initially set up with size=GL_BGRA. This patch
makes changes to return size=GL_BGRA as required by the spec.
Fixes Khronos OpenGL CTS test: vertex_array_bgra_basic.test
V2: Use array->Format instead of adding a new variable
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
For graphics, the LLVM compiler backend currently has many shortcomings
compared to the non-LLVM one. E.g. it can't handle geometry shaders yet,
but that's just the tip of the iceberg.
So building Mesa with --enable-r600-llvm-compiler is currently not
recommended for anyone who doesn't want to work on fixing those issues.
However, for protection of users who end up enabling it anyway for some
reason, let's disable the LLVM backend at runtime by default. It can be
enabled with the environment variable R600_DEBUG=llvm.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This reverts commit a45a50a482.
Unfortunately gcc dumps argv[0] as the first word of --version, so it is
unreliable for detecting gcc.
In particular `cc --version` and `i686-w64-mingw32-gcc --version` give
wrong results.
A better solution needs to be found -- most likely using C-preprocessing
like autotools does. Revert for now.
For Clang static code analyzer, the scan-build script will produce more
comprehensive output. Nevertheless you can invoke it as
CC=clang CXX=clang++ scons analyze=1
For MSVC this is the best way to use its static code analysis. Simply
invoke as
scons analyze=1
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently we can have name space collisions between blocks that define the same
fields. For example:
in block
{
vec4 Color;
} In[];
out block
{
vec4 Color;
} Out;
These two blocks will assign the same interface name (block.Color) to the Color
field in flatten_named_interface_blocks_declarations.cpp, leading to havoc.
This was breaking badly the gl-320-primitive-shading test from ogl-samples.
The patch uses the block instance name to avoid collisions, producing names
like block.In.Color and block.Out.Color to avoid the name clash.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76394
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The transfer staging texture is always freshly allocated, so for write-only
transfers we don't need to explicitly wait for the BO to become idle.
Squeezes a few hundered MB/s more out of x11perf -shmput500 with glamor.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This manifested as rendering failures or sometimes GPU hangs in
compositors when they accidentally got MSAA visuals due to a bug in the X
Server. Today we decided that the problem in compositors was equivalent
to a corruption bug we'd noticed recently in resizing MSAA-visual
glxgears, and debugging got a lot easier.
When we allocate our MCS MT, libdrm takes the size we request, aligns it
to Y tile size (blowing it up from 300x300=900000 bytes to 384*320=122880
bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,
32 pages). Because it's Y tiled, we attach a 384-byte-stride fence to it.
When we memset by the BO size in Mesa, between bytes 122880 and 131072 the
data gets stored to the first 20 or so scanlines of each of the 3 tiled
pages in that row, even though only 2 of those pages were allocated by
libdrm. In the glxgears case, the missing 3rd page happened to
consistently be the static VBO that got mapped right after the first MCS
allocation, so corruption only appeared once window resize made us throw
out the old MCS and then allocate the same BO to back the new MCS.
Instead, just memset the amount of data we actually asked libdrm to
allocate for, which will be smaller (more efficient) and not overrun.
Thanks go to Kenneth for doing most of the hard debugging to eliminate a
lot of the search space for the bug.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77207
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't have any piglit tests for this currently.
v2: Use vec3s for the texcoords so it has some hope of working.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
You'll note from the previous commits that there's something of a loop
here: You call CTSI, which calls BlitFB, then if things go wrong that
falls back to CTSI. As a result, meta CTSI reaches over into blitfb to
tell it "no, don't try that fallback".
v2: Drop the _mesa_update_state(), which was only necessary due to use of
_mesa_clip_blit() in _mesa_meta_BlitFramebuffer() in another patch
series.
v3: Drop an _EXT suffix I copy-and-pasted.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I added support to bind_fbo_image in the process of building meta
CopyTexSubImage, and found that it broke generatemipmap because previously
we would just throw a GL error there and then end up with an incomplete
FBO and fallback.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I need to do the same code again for CopyTexSubImage().
v2: Drop incorrect, not-terribly-useful comment (review by Ken)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids a ReadPixels() if there's accelerated CopyTexImage present.
It now requires GLSL as opposed to just fragment programs, but we don't
have any drivers that do ARB_fp but not GLSL.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There shouldn't be anything special about copying out a subset of the src
rb to a temp before texturing from it, so just do it when we're figuring
out our src texture binding.
This drops Anuj's change to copy an extra border of 1 pixel around the src
area. I can't see how that change could be valid, and presumably if
there's some filtering problem at edges we just need to set the right
wrap mode.
v2: Don't fall back to swrast on non-2D/RECT/2D_MS textures when we can
still CopyTexSubImage. Fixes a segfault regression on i965 with
gl-3.2-layered-rendering-blit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
I think we can assert that renderbuffer size is <= maximum 2D texture
size. Our source coordinates should have already been clipped to the src
renderbuffer size, but haven't actually (so we could potentially have
trouble if there's scaling, and we're in the CopyTexImage path that tries
to use src size). However, this texture size dependency was blocking the
next refactors, so I'm not sure if we want to go ahead with this series
before we get the clipping sorted out or not.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Putting NoDDClr and NoDDChk dependency control on instruction
sequences that include math opcodes can cause corruption of channels.
Treat math opcodes like send opcodes and suppress dependency hinting.
Signed-off-by: Mike Stroyan <mike@LunarG.com>
Tested-by: Tony Bertapelli <anthony.p.bertapelli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
total instructions in shared programs: 1653399 -> 1651790 (-0.10%)
instructions in affected programs: 92157 -> 90548 (-1.75%)
GAINED: 2
LOST: 2
Also significantly reduces the number of optimization loop iterations:
total loop iterations in shared programs: 39724 -> 31651 (-20.32%)
loop iterations in affected programs: 21617 -> 13544 (-37.35%)
Including some great pathological cases, like 29 -> 3 in Strike Suit
Zero and 24 -> 3 in Dota2.
Reviewed-by: Eric Anholt <eric@anholt.net>
We previously stopped searching for unread writes after encountering
control flow, but we can instead just search backwards until we hit
control flow.
instructions in affected programs: 22854 -> 22194 (-2.89%)
The generator uses its destination as a source implicitly, which breaks
some assumptions in dead code elimination. Giving the instruction a
source allows us to reason about it better.
We originally thought that GL 3.0 required GL_DEPTH_COMPONENT16 to map
exactly to Z16. However, we misread the specification, thanks in part
to LaTeX reordering the tables in the PDF.
Page 180 of the GL 3.0 specification (glspec30.20080923.pdf) says:
"[...] memory allocation per texture component is assigned by the GL to
match the allocations listed in tables 3.16-3.18 as closely as possible.
[...]
Required Texture Formats
[...]
In addition, implementations are required to support the following sized
internal formats. Requesting one of these internal formats for any
texture type will allocate exactly the internal component sizes and
types shown for that format in tables 3.16-3.17:"
Notably, however, GL_DEPTH_COMPONENT16 does /not/ appear in table 3.16
or table 3.17. It appears in table 3.18, where the "exact" rule doesn't
apply, and it falls back to the "closely as possible" rule.
The confusing part is that the ordering of the tables in the PDF is:
Table 3.16 (pages 182-184)
Table 3.18 (bottom of page 184 to top of 185)
Table 3.17 (page 185)
Presumably, people saw table 3.16, then saw the table immediately
following with DEPTH_COMPONENT* formats, and assumed it was 3.17.
Based on a patch by Chia-I Wu, but without the driconf option to force
Z16 to be used. It's not required, and there's apparently no benefit
to actually using it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
We've learned a few things since we originally disabled Z16; this attempts
to summarize the issue. I am no expert on this subject, though, so the
comment may not be totally accurate.
I did some benchmarking on GM45 and Ironlake, and discovered that for
GLBenchmark 2.7 EgyptHD, using Z16 was 3% slower on GM45 (n=15), and
4.5% slower on Ironlake (n=95). So, we can drop the "on Ivybridge"
aspect of the comment - it's always slower.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Significantly reduces BO allocation / destruction overhead for transfers,
e.g. measurable via x11perf -shm{ge,pu}t* with glamor.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The rules are about to get a bit more complex to account for
gl_InstanceID and gl_VertexID, which are system values.
Extracting this first avoids introducing duplication.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
glClearBuffer() is currently clearing all active draw color buffers (all
buffers that have not been set to GL_NONE when calling glDrawBuffers) instead
of only clearing the one it receives as parameter. Altough brw_clear()
receives a bit mask indicating the color buffers that should be cleared,
this mask is ignored when calling brw_blorp_clear_color().
This was breaking the 'fbo-drawbuffers-none glClearBuffer' piglit test.
The patch provides the bit mask to brw_blorp_clear_color() so it can limit
clearing to the color buffers present in the mask.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76832
Reviewed-by: Eric Anholt <eric@anholt.net>
This always looks crazy when I stumble across it, until I remember
what the hardware is doing. Describing it ought to short-circuit
that process next time :)
V2: Fix indents to 6 spaces, not 7.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Many shaders use a pattern such as:
for (int i = 0; i < NUM_LIGHTS; i++) {
...access a uniform array, or shader input/output array...
}
where NUM_LIGHTS is a small constant (such as 2, 4, or 8).
The expectation is that the compiler will unroll those loops, turning
the array access into constant indexing, which is more efficient, and
which may enable array splitting and other optimizations.
In many cases, our heuristic fails - either there's another tiny nested
loop inside, or the estimated number of instructions is just barely
beyond the threshold. So, we fail to unroll the loop, leaving the
variable indexing in place.
Drivers which don't support the particular flavor of variable indexing
will call lower_variable_index_to_cond_assign(), which generates piles
and piles of immensely inefficient code. We'd like to avoid generating
that.
This patch detects unsupported forms of variable-indexing in loops, where
the array index is a loop induction variable. In that case, it bypasses
the loop-too-large heuristic and forces unrolling.
Improves performance in various microbenchmarks: Gl32PSBump8 by 47%,
Gl32ShMapVsm by 80%, and Gl32ShMapPcf by 27%. No changes in shader-db.
v2: Check ir->array for being an array or matrix, rather than the
ir_dereference_array itself.
v3: Fix and expand statistics in commit message.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The "fail" flag is set if loop_unroll_count encounters a nested loop;
calling the flag "nested_loop" is a bit clearer.
The original reasoning was that count is inaccurate (too small) if there
are nested loops, as we don't do any sort of analysis on the inner loop.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Loop unrolling will need to know a few more options in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we pass in gl_shader_compiler_options, it makes sense to just
use options->MaxUnrollIterations, rather than passing a separate
parameter.
Half of the invocations already passed options->MaxUnrollIterations,
while the other half passed in a hardcoded value of 32.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will prevent the two from getting out of sync again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These were out of sync with the flags used to control
lower_variable_index_to_cond_assign in brw_shader.cpp.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Not setting this would prevented coalescing after a failed attempt if
the sources for both MOVs were the same.
total instructions in shared programs: 1654531 -> 1650224 (-0.26%)
instructions in affected programs: 423167 -> 418860 (-1.02%)
GAINED: 2
LOST: 0
Reviewed-by: Eric Anholt <eric@anholt.net>
It's just the array index, so we can just go look at the array and see
which element we are.
No significant performance difference (n=140)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When considering assignment expressions like:
v.x += u.x;
v.x += u.x;
the vectorizer would incorrectly keep going, attempting to find more
instructions to vectorize. It would overwrite the saved assignment
to point at the second one, and increment channels a second time,
resulting in try_vectorize thinking the expression was a vec2 instead of
a float.
Instead, if we see a repeated assignment to a channel, just try to
vectorize everything we've found so far. This clears the saved state
so it will start over.
Fixes Piglit's repeated-channel-assignments.vert.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For blocks, gl_shader_program::UniformStorage isn't very useful. The
names stored there are the names of the elements of the block, so
finding blocks with an instance name is hard. There is also only one
entry in ::UniformStorage for each element of a block array, and that is
a deal breaker.
Using ::UniformBlocks is what _mesa_GetUniformBlockIndex does. I
contemplated sharing code between set_block_binding and
_mesa_GetUniformBlockIndex, but building the stand-alone compiler and
the unit tests make this hard. I plan to return to this effort shortly.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76323
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Cc: github@socker.lepus.uberspace.de
In the next patch, we'll see that using
gl_shader_program::UniformStorage is not correct for uniform blocks.
That means we can't use ::UniformStorage to select between the sampler
path and the block path. Instead we want to just use the type of the
variable. That's never passed to set_uniform_binding, and it's easier
to just remove the function (especially for later patches in the series)
than to add another parameter.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76323
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Cc: github@socker.lepus.uberspace.de
And remove nonsensical approximation of linear interpolation behavior
for shadow samplers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
These were missed/typo'd in the previous patch series:
s/R8G8B8A/R8G8B8A8/
s/rgba_16/RGBA_UNORM16/
s/rgba_uint/RGBA_UINT/
s/rgba_int/RGBA_SINT/
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The commit 3b0b44f7de introduced a build
error:
error: dereferencing pointer to incomplete type
This patch fixes this issue in all the affected files.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Keep tasks as linked list, this way we can associate
more than one encoding task with each buffer.
Signed-off-by: Christian König <christian.koenig@amd.com>
It was always getting set to -8/7 unconditionally. Use the
driver-reported value instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Defaults to providing the same offsets as MIN/MAX_TEXEL_OFFSET. For
nvc0, the offset can be -32/31.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Create the screen in the winsys while the mutex is locked.
This also results in a nice code cleanup!
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This also hides the reference count from drivers.
v2: update the reference count while the mutex is locked in winsys_create
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This replaces u_gen_mipmap with an extremely simple implementation based
on pipe->blit. st/mesa is also cleaned up.
Pros:
- less code
- correct mipmap generation for NPOT 3D textures (u_blitter uses a better
formula)
- queries are not affected by mipmap generation if drivers disable them
v2: add "first_layer", "last_layer" parameters, drop "face"
v2.1: add format
v2.2: document the format parameter
This is needed by _mesa_generate_mipmap.
This adds an array of pipe_transfers to st_texture_image. Each transfer is
for mapping a single layer.
v2: allocate the array of transfers on demand
No longer used anywhere. These also caused trouble in the Gallium
state tracker code where we include both core Mesa and Gallium util
headers (and the macros were defined differently in each world.)
Removing these macros should help avoid macro mix-ups in the future.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We moved away from MALLOC/FREE in the rest of core Mesa a while ago.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were using REALLOC() from u_memory.h but FREE() from imports.h.
This mismatch caused us to trash the heap on Windows after we
deleted a texture object.
This fixes a regression from commit 6c59be7776.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
This extension is a huge grab-bag of "stuff that's in DX11". Break it
apart to make it clear what still needs to be done.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is the effective layer count, for clears etc. This differs from the
depth of the miptree level when views are involved.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
We'll still avoid MinLayer here since the fast path doesn't understand
arrays at all, but it's straightforward to do levels.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This allows core mesa's TexSubImage paths etc to work correctly
with views which have nonzero MinLevel or MinLayer.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This is the actual mesa_format to use. In non-view cases this is always
the same as the mt's format.
V4: Comment style
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
We need to wire the original texture's mt into the view. All the hard
work of setting up an appropriate tree of gl_texture_image structures
has already been done by core mesa.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
If we were to relayout the miptree, we'd break any views that are
sharing it.
(Simplified based on suggestions from Eric)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
These formats can be cast to others (with different component types or
sizes) via ARB_texture_view or ARB_shader_image_load_store. We want
them to be laid out consistently so that we can just reinterpret the
memory with a different format.
In V1, this was done conditionally on a 'prefer_no_swizzle' flag which
was set in TexStorage/TextureView paths, but we need the same behavior
for ARB_shader_image_load_store (which also works with images created
via TexImage, so we don't want it to be conditional.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
The sampler can handle R8G8B8X8 (and substitute 1.0 for the fourth
component) but we can't use it as a render target.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
None of the other 3-component 16bpc formats are directly supported, so
they get promoted to XRGB equivalents. *Not* promoting RGB16F the same
way makes texture views much more fiddly -- we don't want to have to do
crazy copying behind the scenes.
(with my other master + my experimental ARB_texture_view support) fixes
the piglit test: `spec/ARB_texture_view/view compare 48bit formats`
No regressions in gpu.tests on Haswell.
V4: Don't alter the formats table -- just don't match it to a mesa_format. [Kenneth]
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This is supported by all generations, and is required for memory layout
consistency for texture_view.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Previously, we would unpack the texels to floats using *_TO_FLOAT_TEX,
and then pack them into the desired format using FLOAT_TO_*. Unfortunately,
this isn't quite the inverse operation, and so some texel values would
end up off-by-one.
This fixes the GL_RGB8_SNORM and GL_RGB16_SNORM subcases in piglit's
arb_texture_view-format-consistency-get test on i965. The similar 1-, 2-
and 4-component cases already worked because they took the memcpy path
rather than repacking.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
While linux uses .so as a default extension for shared libraries that is
not the case for other platforms. The loader in libGL (and others) assumes
that the dri module will always have a .so extension, thus it will fail
to load on the affected platforms.
Spotted-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Filenames passed to dlopen() don't need to use the platform's default extension
for shared libraries.
Using the '.so' extension when dlopen()ing DRI drivers is hardcoded into mesa
and the X server, so it should be hardcoded here in the Makefile as well.
A similar fix is probably also needed for gallium DRI drivers.
(Consider that if we were starting from scratch, perhaps we would use a custom
extension like .dri instead)
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit 61bedc3d6b.
As the header is the one defining the API/ABI and is distributed
during installation, we should be using it rather than re-defining
the XA version in configure.ac.
Bump the version in the header to 2.2.0, to reflect what was the
original intent of commit 42158926c6.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
With commit 1f1928db001(glx: Drop _Xglobal_lock while we create and
initialize glx display) we've split the big _Xglobal_lock handling in
a more fine grained manner.
Unfortunatelly we forgot to drop the unlock_mutex on the error paths,
leading to undefined behaviour as the mutex is already unlocked.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We hit this assert with some piglit tests. Which appears to be a bug
outside of freedreno. Previously we were relying on assert() being
redefined to debug_assert() so that we didn't crash in release builds.
Somehow that stopped working. So just use debug_assert() directly.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes a crash in svga_context_flush_buffers() if we use the 'draw' module
for AA lines (when the device doesn't support that feature). We need to
initialize this list before we setup the swtnl pieces.
Found/fixed by Charmaine Lee.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The "new" fragment shader backend has never supported the necessary
color conversion code for this to work. We began using the new backend
in Mesa 7.10 for GLSL (commit a81d423d93, October 2010),
and for ARB_fragment_program in Mesa 9.1 (commit 97615b2d8c,
August 2012).
I haven't heard any complaints, so I don't think anyone will miss this
feature. I believe mplayer used it at one point, but these days
defaults to other paths anyway.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
create_mov() was fixed up to handle neg/abs properly for interal mov's,
using absneg.f, but forgot to fix it for TGSI MOV's. The problem with
using add.f to handle negated mov's is that we can only take a single
const reg src. So:
MOV TEMP[n], -CONST[m]
would turn into:
add.f Rdst, (neg)CONST[m], 0.0
which would not work. Anyways, just remove the extra code and always
use create_mov() which DTRT.
This fixes piglit vs-op-neg-int test.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Applications frequently clear to colors other than 0.0 or 1.0, which
prevents us from doing fast color clears. In that case, we issue this
performance warning on basically every glClear call, resulting in so
much spam that it's nearly impossible to see any other messages.
Plus, I don't think it's useful. We aren't suggesting a better way to
do what the application developers want---we're just telling them it
would be faster to do something they don't want.
Driver developers have no control over the clear color, so this message
is totally useless to them.
A better alternative to get this sort of information is to use
INTEL_DEBUG=blorp, which tells you whether color clears were fast,
simd16 repdata, or slow.
v2: Rebase on has_color_component changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Keep track of whether we actually have any sam instructions in the
resulting shader, rather than using TGSI SAMP declarations. If the sam
instruction is optimized out, because the result is not used, we don't
want to emit texture state, etc. In fact emitting sampler state and/or
setting PIXLODENABLE bit when there are no texture fetches seems to
cause lockup.
In theory this should never happen for a "normal" shader, unless the
state tracker is wonky. But it is a very real possibility for binning
pass shaders.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Decompressing ETC2 textures was causing intermitent segfault
by copying resulting 4x4 texel block to the destination texture
regardless of the size of the destination texture. Issue found
via application crash in GLBenchmark 3.0's Manhattan test.
v2: add more detail comment. Compute limit outside inner loops.
v3: add bugzilla reference
v4: Correct cc syntax in commit log
v5: really grab the right patch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74988
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1, suggested v2-3]
v2: move allocation to a function as first step
to clean vid_enc_EncodeFrame
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
For TEX instructions, the set of samplers and sampler views should
be consistent. The XA state tracker sometimes passes an inconsistent
set of samplers and sampler views. Rather than assert and die, issue
a warning.
v2: add debugging code to detect inconsistent state.
v3: also check for null sampler in svga_state_tss.c
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Given
mov vgrf7, vgrf9.xyxz
add vgrf9.xyz, vgrf4.xyzw, vgrf5.xyzw
add vgrf10.x, vgrf6.xyzw, vgrf7.wwww
the last instruction would be wrongly changed to
add vgrf10.x, vgrf6.xyzw, vgrf9.zzzz
during copy propagation.
The issue is that when deciding if a record should be cleared, the old code
checked for
inst->dst.writemask & (1 << ch)
instead of
inst->dst.writemask & (1 << BRW_GET_SWZ(src->swizzle, ch))
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76749
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Cc: Jordan Justen <jljusten@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romainck <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.1" <mesa-stable@freedesktop.org>
Nothing bad came of this because they weren't used after visitor running,
but leaving them in a bad state seems like a recipe for pain later.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
When you've got a simple solid-color shader that doesn't generate
pixel_x/y interpolation, we were deciding that the first vgrf was both the
undefined pixel_x and pixel_y, and extending its live interval to avoid
the stride problem. That tricked other optimization that tries to see if
a particular instruction is the last use of a variable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We stopped doing variable index lowering for uniforms in
a64c1eb9b1, 5 months after the comment was
added.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While we wish our optimization passes could identify all the cases where
we can coalesce our variables, we miss out on a lot of opportunities.
total instructions in shared programs: 1673849 -> 1673166 (-0.04%)
instructions in affected programs: 299521 -> 298838 (-0.23%)
GAINED: 7
LOST: 0
Note that many programs are "hurt". The notable ones are where we produce
unrolling in cases we didn't before (presumably just because of the lower
instruction count). But there are also some cases where pushing things
right into the variables prevents copy propagation and tree grafting,
since we don't split our variable usage webs apart.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
is_power_of_two() is now provided by mesa so its definition must be removed
from the i915 driver code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The next patch will introduce an optimization that only works when
integers are not represented as floating point values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The next few patches will introduce an optimization that only works when
integers are not represented as floating point values.
v2: Re-word-wrap a line, as requested by Ian Romanick.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_binop_ubo_load takes unsigned integer operands. However, the array
index used to compute these offsets may be a signed integer. (For
example, see Piglit's spec/glsl-1.40/uniform_buffer/fs-bvec-array).
For some reason, we were missing an ir_binop_i2u cast, and ir_validator
was failing to catch that.
Without this change, ir_builder's type inference code broke for me when
writing a new optimization pass.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The vector backend already implemented this optimization, but
surprisingly, we never bothered to implement it in the scalar backend.
In addition to saving two instructions, this eliminates a use of the
accumulator as an explicit source, which is unsupported in SIMD16 mode
on Gen7+, which could help us gain SIMD16 programs.
Cuts 19.23% of the instructions in dolphin/efb2ram.shader_test.
v2: Rebase on is_16bit_integer_constant -> is_uint16_constant rename.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The i965 MUL instruction doesn't natively support 32-bit by 32-bit
integer multiplication; additional instructions (MACH/MOV) are required.
However, we can avoid those if we know one of the operands can be
represented in 16 bits or less. The vector backend's is_16bit_constant
static helper function checks for this.
We want to be able to use it in the scalar backend as well, which means
moving the function to a more generally-usable location. Since it isn't
i965 specific, I decided to make it an ir_constant method, in case it
ends up being useful to other people as well.
v2: Rename from is_16bit_integer_constant to is_uint16_constant, as
suggested by Ilia Mirkin. Update comments to clarify that it does
apply to both int and uint types, as long as the value is
non-negative and fits in 16-bits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Performance warnings are logged via KHR_debug in addition to when the
INTEL_DEBUG=perf environment variable is set. Without this, messages in
debug contexts would have "(null)" for the reason.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These are clearly needed---the comments in the function are even present
for each one of them. I originally had two separate state atoms for
3DSTATE_SBE and 3DSTATE_SBE_SWIZ. When I combined the functions, I must
have forgotten to add the atoms for 3DSTATE_SBE_SWIZ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nothing actually uses this---we handle rasterizer discard in the
clipper in order for statistics counters to work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
renderer_copy_prepare was setting the first sampler but never telling
the cso code how many samplers were actually used. Fix this.
Cc: "10.1" <mesa-stable@freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Binding a new destination may cause the svga driver to emit draw calls
while propagating the surface. Make sure this doesn't happen in the middle
of sampler state setup where state may be incosistent.
In practice, surface propagation should never happen here and even if it did,
it wouldn't be a valid reason for the svga driver to emit partially set up
state, but to avoid future uncertainties, make sure this doesn't happen
anyway.
Found while auditing the state tracker for inconsistent sampler state /
sampler view setup.
Cc: "10.1" <mesa-stable@freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
I think this was used for coalescing out partly dead large virtual
registers, but the patch that enabled that caused regressions and didn't
make it upstream.
Reviewed-by: Eric Anholt <eric@anholt.net>
It's more likely that we won't find writes to all channels than one will
interfere, and calculating interference is more expensive. This change
will also help prepare for coalescing load_payload instructions'
operands.
Also update the live intervals for all channels, and not just the last
that we saw.
Reviewed-by: Eric Anholt <eric@anholt.net>
The intent in 9b6b084eb7 was
for urb .size and .min_vs_entries fields to use the values
from the GEN8_FEATURES macro.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rename functions to match format names.
sed commands:
s/signed_rgba8888_rev/R8G8B8A8_SNORM/g
s/signed_rgba8888/A8B8G8R8_SNORM/g
s/f_rgba8888_rev/R8G8B8A_UNORM/g
s/f_rgba8888/A8B8G8R8_UNORM/g
s/f_rgbx8888_rev/R8G8B8X8_UNORM/g
s/f_rgbx8888/X8B8G8R8_UNORM/g
s/f_argb8888_rev/A8R8G8B8_UNORM/g
s/f_argb8888/B8G8R8A8_UNORM/g
s/f_xrgb8888_rev/X8R8G8B8_UNORM/g
s/f_xrgb8888/B8G8R8X8_UNORM/g
s/signed_rgbx8888/X8B8G8R8_SNORM/g
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Implement guest-backed surface sharing using prime fds. Previously only
legacy surfaces could use this functionality. Also use the vmwgfx 2.6
single-ioctl prime fd reference if available.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This opcode provide support for GL_ARB_texture_query_lod,
Signed-off-by: Dave Airlie <airlied@redhat.com>
[imirkin: rebase, docs update]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cuts a small handful of instructions in Serious Sam 3:
instructions in affected programs: 4692 -> 4666 (-0.55%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Which would lead to translating
mad vgrf9:F, vgrf3:F, u0:F, vgrf6:F
mov.sat vgrf7:F, -vgrf9:F
into
mad.sat vgrf9:F, vgrf3:F, u0:F, vgrf6:F
mov vgrf7:F, -vgrf9:F
Fixes some lighting effects in Dota2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76749
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
ip needs to be initialized to start_ip - 1, since the first thing in the
main loop is ip++. Otherwise we would incorrectly propagate the saturate
from the mov to the mad:
mad a, b, c, d
mov.sat x, a
add y, z, a
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we'd ignore the sources of instructions with non-GRF
destinations when calculating calculating the dead channels. This would
lead to us incorrectly removing the first instruction in this sequence:
mov vgrf11, ...
cmp.ne.f0 null, vgrf11, 1.0
mov vgrf11, ...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76616
Or else poor programmers might mistakenly use the temporary mem_ctx,
instead of the fs_visitor's mem_ctx and wonder why their code is
crashing.
Also remove the parenting. These contexts are local to the optimization
passes they're in and are freed at the end.
Otherwise calling dump_instructions() after declaring a new fs_reg would
segfault when calculate_register_pressure()'s loop over reg walked off
the end of the virtual_grf_start[] array that calculate_live_intervals()
would have reallocated for you, if it had known there was a new
register.
Don't hardcode /dev/dri/card0 but instead use the drm
macros which allows the correct /dev/drm0 device to be
opened on OpenBSD.
v2: use snprintf and fallback to /dev/dri/card0
v3: check for snprintf truncation
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
After the loader changes libudev is no longer required for
gbm or the egl drm/wayland platforms. Lets these build/run
on OpenBSD.
v2: preserve the libudev requirement for Linux as suggested
by Emil Velikov.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
After the loader changes libudev is no longer required to
build gbm or the egl drm/wayland platforms.
Remove a libudev ifdef which allows the the drm egl driver
to be loaded on OpenBSD.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
OpenBSD does not have DT_NEEDED entries for libc by design,
over concerns how the symbols would be referenced after
changing the major version of the library.
So avoid -no-undefined checks on OpenBSD as they will fail.
v2: don't include the -no-undefined libtool option in the variable
and change -Wl,--no-undefined references in Automake.inc as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76856
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The targets do not require expat or selinux. Use GALLIUM_COMMON_LIB_DEPS
which provides the core requirements for each gallium target.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The targets do not require expat or selinux. Use GALLIUM_COMMON_LIB_DEPS
which provides the core requirements for each gallium target.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The build system will use g++ to link the static library due to the
dummy.cpp source(s). Thus one does not need the explicit link against
stdc++.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The build system does not know that the static library is C++.
Mention the cpp file to trigger generation of the proper variable
and drop the hacky stdc++ linking.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
With recent commit we started de-duplicating all of the compiler/
linker flags moving their handling inside Automake.inc.
This did not take into consideration that the above variable was set
at configure time, leading to issues on certain build combinations.
Move the variable to where it's used/handled thus cleaning up
configure.ac.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76848
Tested-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The pkg-config module was called "EXPAT" instead of "expat" in
PKG_CHECK_EXISTS. This seems to have been wrong because the wrong
argument was copied from PKG_CHECK_MODULES.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
_GNU_SOURCE is only set/required for linux*|*-gnu*|gnu*) and as the
functionality is available on other systems check for RTLD_DEFAULT instead.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Platforms that lack libudev (OpenBSD and possibly others) need
this change in order to load the correct dri driver.
Under linux we unconditionally require libudev, thus this code
will never get build.
v2: Add commit message (Emil Velikov)
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit 96e8b916a7.
In the case of VCE encoding with raw YUV file, CPU load directly
to VRAM is faster than combination of CPU writing to GTT and
then blit to VRAM with GPU.
Reviewed-by: Christian König <christian.koenig@amd.com>
Keep a dynamically increasing array of all the views
created for a texture instead of just the last one.
v2: add comments, fix array size calculation,
release only the first sampler view found
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The xa version number had to be set in two places. In configure.ac and in
xa_tracker.h. Furthermore, xa_tracker.h is an installed header so we can't
use mesa internal defines. So therefore, at configure time, modify the
xa_tracker.h header to use the version given by configure.ac
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
/me puts a paper bag on his head and sits in the corner.
This was supposed to be included in 5a68f731, which added
glPointSizePointerOES back to the list of functions exposed by
libGLESv1_CM. It looks like it was an uncommitted change in my tree
when I sent the patch out.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We are checking for no-ops in the CSO module for both of these items
so there's no reason to do it in the driver.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
As we do for sampler states in single_sampler_done() and many other
CSO functions.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously GLX_EXT_buffer_age has always been advertised as supported because
both client_glx_support and client_glx_only where set. So it did not matter
that direct_support is only set when running dri3 and we ended up always
advertising it.
Fix that by not setting client_glx_only for buffer_age in known_glx_extensions.
Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to call pipe->set_sampler_views() with count being the
maximum of the old number of sampler views and the new number.
This makes sure we null-out any old sampler views.
We already do the same thing for sampler states in single_sampler_done().
Fixes some assertions seen in the VMware driver with XA tracker.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Tested-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ages ago Chia-I added an ES compatibility flag to several of the various
generator scripts. The intention was to bridge differences between ES
and desktop in Mesa builds without ES. It doesn't appear that it has
ever been used. Recent changes to static_dispatch status of several ES1
functions caused problems in desktop-only, non-shared-glapi builds.
Enabling the ES compatibility mode appears to fix these build problems.
This is kind of a duct tape solution to this problem. As I mentioned in
the cover letter for the series that triggered the build problem, I
would like to make some major changes to the generator architecture and
the XML. The whole point of the proposed architecture changes is to
better handle the differences between desktop GL and ES. I think duct
tape is okay for now.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76869
Tested-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Cc: Vinson Lee <vlee@freedesktop.org>
Cc: Chia-I Wu <olv@lunarg.com>
Commit fb78fa58 made the GL_ARB_debug_output functions aliases of the
GL_KHR_debug output functions. As a result, the function names in
struct _glapi_table also changed. The table in check_table.cpp used the
ARB names.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Tested-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Cc: Vinson Lee <vlee@freedesktop.org>
C89 has a fairly short minimum-maximum string length. To support
compilers limited by the C89 limits, this script had a mode where it
would generate a character array instead of a giant string. These were
functionally the same, but the code generated for the character array is
HUGE and difficult to read.
As far as I can tell, nothing in Mesa uses '-m short' any more. The
generated files used to be tracked in revision control, but I think we
stopped using '-m short' when we stopped tracking the generated files.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Cc: Vinson Lee <vlee@freedesktop.org>
Nested for loops running through tables against which they
finally do an assert were ran also with optimized builds.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
% operator could return negative value which would cause
indexing before perm table. Change %256 to &0xff
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is to avoid running out of query buffer space due to winsys
limitations. Instead of a fixed size per screen pool of query buffers,
use a slab allocator that allocates a new slab if we run out of space
in the first one.
v2: Correct email addresses.
v3: s/8192/VMW_QUERY_POOL_SIZE/. Improve documentation and log message.
Reported-and-tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
This adds a gallium cap that allows us to fake GL3.0 by
not exposing MSAA on sw rendering.
It also forces the extra extensions needed for GL3.2.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit f6e290f80c.
To fix the broken build. The DRI-enabled build seems OK after reverting.
Th non-DRI/gallium build is still suffering from an unrelated issue in
the pipe-loader code.
Currently, we raise an error when doing this which breaks a conformance
test from the OpenGL samples pack. Even if this is a bit silly it is not
an error.
From http://www.opengl.org/wiki/Rectangle_Texture:
"Rectangle textures contain exactly one image; they cannot have mipmaps.
Therefore, any texture parameters that depend on LODs are irrelevant
when used with rectangle textures; attempting to set these parameters to
any value other than 0 will result in an error."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76496
Reviewed-by: Brian Paul <brianp@vmware.com>
The previous commit stopped exporting 21 libGLESv2 and 88 libGLESv1_CM
functions. This removes the work-arounds for those functions from
ABI-check.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
This has been a long standing issue with the ES libraries. Functions
marked in the XML with 'static_dispatch=false' were still incorrectly
exported. ABI-check is supposed to detect this case, but we have to
paper over failures every time a new extension is added.
This change will cause a big pile of functions to disappear from
libGLESv2 and libGLESv1_CM.
libGLESv2 loses (20 functions):
glBindVertexArrayOES
glCompressedTexImage3DOES
glCompressedTexSubImage3DOES
glCopyTexSubImage3DOES
glDeleteVertexArraysOES
glDiscardFramebufferEXT
glDrawBuffersNV
glFlushMappedBufferRangeEXT
glFramebufferTexture3DOES
glGenVertexArraysOES
glGetBufferPointervOES
glGetProgramBinaryOES
glIsVertexArrayOES
glMapBufferOES
glMapBufferRangeEXT
glProgramBinaryOES
glReadBufferNV
glTexImage3DOES
glTexSubImage3DOES
glUnmapBufferOES
libGLESv1_CM loses (88 functions):
glAlphaFuncxOES
glBindFramebufferOES
glBindRenderbufferOES
glBlendEquationOES
glBlendEquationSeparateOES
glBlendFuncSeparateOES
glCheckFramebufferStatusOES
glClearColorxOES
glClearDepthfOES
glClearDepthxOES
glClipPlanefOES
glClipPlanexOES
glColor4xOES
glDeleteFramebuffersOES
glDeleteRenderbuffersOES
glDepthRangefOES
glDepthRangexOES
glDiscardFramebufferEXT
glDrawTexfOES
glDrawTexfvOES
glDrawTexiOES
glDrawTexivOES
glDrawTexsOES
glDrawTexsvOES
glDrawTexxOES
glDrawTexxvOES
glFlushMappedBufferRangeEXT
glFogxOES
glFogxvOES
glFramebufferRenderbufferOES
glFramebufferTexture2DOES
glFrustumfOES
glFrustumxOES
glGenerateMipmapOES
glGenFramebuffersOES
glGenRenderbuffersOES
glGetBufferPointervOES
glGetClipPlanefOES
glGetClipPlanexOES
glGetFixedvOES
glGetFramebufferAttachmentParameterivOES
glGetLightxvOES
glGetMaterialxvOES
glGetRenderbufferParameterivOES
glGetTexEnvxvOES
glGetTexGenfvOES
glGetTexGenivOES
glGetTexGenxvOES
glGetTexParameterxvOES
glIsFramebufferOES
glIsRenderbufferOES
glLightModelxOES
glLightModelxvOES
glLightxOES
glLightxvOES
glLineWidthxOES
glLoadMatrixxOES
glMapBufferOES
glMapBufferRangeEXT
glMaterialxOES
glMaterialxvOES
glMultiTexCoord4xOES
glMultMatrixxOES
glNormal3xOES
glOrthofOES
glOrthoxOES
glPointParameterxOES
glPointParameterxvOES
glPointSizePointerOES
glPointSizexOES
glPolygonOffsetxOES
glQueryMatrixxOES
glRenderbufferStorageOES
glRotatexOES
glSampleCoveragexOES
glScalexOES
glTexEnvxOES
glTexEnvxvOES
glTexGenfOES
glTexGenfvOES
glTexGeniOES
glTexGenivOES
glTexGenxOES
glTexGenxvOES
glTexParameterxOES
glTexParameterxvOES
glTranslatexOES
glUnmapBufferOES
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Cc: Paul Berry <stereotype441@gmail.com>
This is hella ugly. The same-named function in desktop OpenGL is
hidden, but it needs to be exposed by libGLESv2 for OpenGL ES 3.0.
There's no way to express in the XML that a function should be be hidden
in one API but exposed in another.
This won't affect any change now, but it will prevent a regression in a
later patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Functions that are part of OpenGL ES 1.0 or 1.1 should have static
dispatch functions in libGLESv1_CM. This doesn't affect any change yet,
but it will prevent later regressions.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
It looks like these were added accidentally by Paul in commit 1a1db174.
From the commit message and the look of the patch, I think this was just
some sed-job left overs.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
It was my understanding that the writemask works in SIMD4x2 mode for
texturing instructions and doesn't require a message header. Some bit of
this logic must be wrong, so disable it until it's understood.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76617
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
By doing GC the linker removes all the symbols that are not referenced
and/or used by the final library. This results in a saving of ~100K
up-to ~600K per (stripped) binary (classic vs gallium drivers).
If interested one can ask the compiler to print the sections that are
removed using -Wl,--print-gc-sections.
v2: Check if ld supports the flag before using it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com> (v1)
... apart from the dri drivers.
With this final change we can build mesa without fear that
the resulting libraries will have unresolved symbols.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Set the flag for all but the dri targets. They have missing
glapi symbols which are required for the normal operation with
the X server.
Jon, I fear that you'll need to carry the "no-undefined" hunk
locally when building the dri drivers under cygwin.
Cc: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Explicitly setting the linker variable was required for old and broken
build toolchains. At this point this should no longer be needed, and
setting the sources lists will trigger generation of the correct LINK
variables.
Explicitly include dummy.cpp to use g++ to link the static library which
in most cases is based upon C++ code.
v2: Reword commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This lets us have only one if HAVE_MESA_LLVM block, rather than
one for each driver.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every library uses the same libtool/linker flags. Compact those
into AM_LDFLAGS and append the version script to it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Or it compiles them in, but pretends they don't exist
v2: Rebase (Emil)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Mesa does _not_ link against libudev. Additionally the only place
that deals with it is the loader, thus we can drop the CFLAGS.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It makes little sense to enable the vdpau, xvmc and omx state-trackers
as they do not make use of (don't work with) the software driver.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
grep -q is easier to read and consistent with the rest of configure.ac.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gallium osmesa links against the softpipe driver, thus the build
will fail if it's missing.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Evidently at least static OSMesa is still used as shared one
causes substantial increase in the load time for some programs
that use it (from seconds up-to ~30min).
Rather than forcing everyone to use shared mesa, revert commit
a6efbac9fb and default to shared
build when both shared and static are disabled.
v2: Whitespace cleanup, drop silly comment.
Reported-by: Burlen Loring <burlen.loring@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
GL_INTENSITY has never been valid as a pixel format -- to get the memcpy
pack/unpack paths, the app needs to specify GL_RED as the pixel format
(or GL_RED_INTEGER for the integer formats).
Note: This was briefly merged before, but exposed some breakage in gallium, so
was reverted. Hopefully it will stick this time.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_format_matches_format_and_type() returns true for
GL_RED/GL_RED_INTEGER (with an appropriate type) into an intensity
mesa_format.
We want the `red`-based format instead, regardless of the order we find
them in our walk of the mesa formats list.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The case for this was in the wrong function, and this format's store
func was not set in the table at all.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Whether or not the coords are normalized is handled in the texture
state. But we otherwise need to treat RECT sample instructions as 2D.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
In some cases, we need a register to be assigned up to three components
before the base. Since we can't have negative register #'s, just shift
everything up. May increase register usage for trivial shaders, but I
don't think we are shader limited in those cases. A proper solution is
going to require a better register assignment algorithm (which is on the
TODO list), this is just a hack to get us by until then.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
RB_FRAME_BUFFER_DIMENSION is not a banked context register, so we need
to wait for the GPU to idle before updating it. But we'd rather not
have unnecessary WFI's, so actually keep track if we need to emit it or
not.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This is something that XA triggers. In some cases it will only use
SAMP[1] (composite mask) but not SAMP[0] (composite src).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Based on a patch by Ville Syrjälä.
As usual, these are placeholder values; actual values will come later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is the closing for the "\defgroup IR Intermediate representation
nodes" all the way at the top of the file.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When doing software rendering (i.e. rendering to the selection buffer) we need
to make sure that we have valid index bounds before calling _tnl_draw_prims(),
otherwise we can crash.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59455
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The underlying glDrawArrays() calls weren't getting compiled into
the display list. We simply need to use the current dispatch table
so the CALL_DrawArrays() is routed to the display list save function.
This patch also fixes glMultiModeDrawArraysIBM and
glMultiModeDrawElementsIBM.
Fixes the new piglit gl-1.4-dlist-multidrawarrays test.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we only examined the GL_DEPTH_MODE state to determine the
sampler view swizzle for depth textures. Now we also consider the
texture base format for color textures too.
The basic idea is if we're sampling from a RGB texture we always
want to get A=1, even if the actual hardware format might be RGBA.
We had assumed that the texture's A values were always one since that's
what Mesa's texstore code does. But if we render to the RGBA texture,
the A values might not be 1. Subsequent sampling didn't return the
right values.
Now we examine the user-specified texture base format vs. the actual
gallium format to determine the right swizzle.
Fixes several fbo-blending-formats, fbo-clear-formats and fbo-tex-rgbx
failures with VMware/svga driver (and possibly other drivers).
No other piglit regressions with softpipe or VMware/svga.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
In preparation for following changes.
I used a temporary test harness to compare the old code to the new
for all possible swizzle inputs. No change in results.
This also happens to fix a leak of the current GS pull constant BO on
context destroy, by just not holding on to the pull const bos after the
surface state is generated.
No statistically significant performance difference on GLB2.7 on HSW at
1024x768 (n=40) or 320x240 (n=44), or on BYT at 320x240 (n=47).
v2: Rebase on intel_upload simplification.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The implementation kept a page-sized area for uploading data, and
uploaded chunks from that to a 64kb-sized streamed buffer. This wasted
cache footprint (and extra state tracking to do so) when we want to just
write our data into the buffer immediately.
Instead, build it around an interface like brw_state_batch() that just
gets you a pointer to BO memory to upload your stuff immediately.
Improves OpenArena on HSW by 1.62209% +/- 0.355299% (n=61) and on BYT by
1.7916% +/- 0.415743% (n=31).
v2: Rebase on Mesa master, drop old prototypes. Re-do performance
comparison on a kernel that doesn't punish CPU efficiency
improvements.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
it's useful to know what the llvmbuildstore arguments are going to
be before executing it because it can crash and make sure to
print out the inputs only if we're not generating a gs because
it fetches inputs differently.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We used to overallocate the output buffer sometimes running out
of memory with applications rendering large geometries. The actual
maximum number of vertices out is simply the maximum number of
primitives in (number of gs invocations) multiplied by the maximum
number of output vertices per gs input primitive (i.e. gs invocation).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Don't pass null query object pointers into gallium functions.
This avoids segfaulting in the VMware driver (and others?) if the
pipe_context::create_query() call fails and returns NULL.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
sed commands:
s/ARGB2101010_UINT\b/B10G10R10A2_UINT/g
s/ABGR2101010_UINT\b/R10G10B10A2_UINT/g
Reviewed-by: Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It is quite hard to meet the dependency of the libxml2 python bindings
outside Linux, and in particularly on MacOSX; whereas ElementTree is
part of Python's standard library. ElementTree is more limited than
libxml2: no DTD verification, defaults from DTD, or XInclude support,
but none of these limitations is serious enough to justify using
libxml2.
In fact, it was easier to refactor the code to use ElementTree than to
try to get libxml2 python bindings.
In the process, gl_item_factory class was refactored so that there is
one method for each kind of object to be created, as it simplifies
things substantially.
I confirmed that precisely the same output is generated for GL/GLX/GLES.
v2: Remove m4/ax_python_module.m4 as suggested by Matt Turner.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Release the references to the sampler views before
destroying the pipe context.
v2: remove TODO and unrelated change
v3: move to st_texture.[ch], rename callback, add comment
v4: fix rebase mess up and add further cleanups
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
This can get called in some circumstances if both src type and dst type
have same width (seen with float32->unorm32). While this particular case
was bogus anyway let's just fix that as it can work trivially (due to the
way it was called it actually worked anyway apart from the assert).
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When deciding if a clear color is suitable for fast clear,
take into account if a color channel is active in the
buffer format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch introduces two pre-canned MOCS values: BDW_MOCS_WB
(write-back, all caches) and BDW_MOCS_WT (write-through, all caches).
We use write-through caching for render targets, and write-back for
all other data. (At least on Haswell, I believe write-back LLC/eLLC
didn't work for scan-out buffers, while write-through did.)
No performance analysis has been done on the impact of this patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
All of the functionality is implemented in a private function in the one
file where it is used.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fix eglCreateImage() from a packed dma_buf surface with a non-zero offset
to pixels data. In particular, this fixes support for planar YUV surfaces
when they are individually mapped on a per-plane basis, i.e. when the
OES_EGL_image_external is not used and user application wants to use its
own shader code for composition, or processing on individual plane (OCL).
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Implementation note:
I don't use context for ralloc (don't know how).
The check on PROGRAM_SEPARABLE flags is also done when the pipeline
isn't bound. It doesn't make any sense in a DSA style API.
Maybe we could replace _mesa_validate_program by
_mesa_validate_program_pipeline. For example we could recreate a dummy
pipeline object. However the new function checks also the
TEXTURE_IMAGE_UNIT number not sure of the impact.
V2:
Fix memory leak with ralloc_strdup
Formatting improvement
V3 (idr):
* Actually fix the leak of the InfoLog. :)
* Directly generate logs in to gl_pipeline_object::InfoLog via
ralloc_asprintf isntead of using a temporary buffer.
* Split out from previous uber patch.
* Change spec references to include section numbers, etc.
* Fix a bug in checking that a different program isn't active in a stage
between two stages that have the same program. Specifically,
if (pipe->CurrentVertexProgram->Name == pipe->CurrentGeometryProgram->Name &&
pipe->CurrentGeometryProgram->Name != pipe->CurrentVertexProgram->Name)
should have been
if (pipe->CurrentVertexProgram->Name == pipe->CurrentFragmentProgram->Name &&
pipe->CurrentGeometryProgram->Name != pipe->CurrentVertexProgram->Name)
v4 (idr): Rework to use CurrentProgram array in loops.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is much like _mesa_sampler_uniforms_are_valid, but it operates
across an entire pipeline object.
This function differs from _mesa_sampler_uniforms_are_valid in that it
directly creates the gl_pipeline_object::InfoLog instead of writing to
some temporary buffer.
This was originally included in another patch, but it was split out by
Ian Romanick.
v2 (idr): Fix the loop bounds. shProg isn't an array, so
ARRAY_SIZE(shProg) was 1, so only the vertex program was validated.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
V2 (idr):
* Keep the behavior of other info logs in Mesa: and empty info log
reports a GL_INFO_LOG_LENGTH of zero.
* Use a NULL pointer to denote an empty info log.
* Split out from previous uber patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Test become green in piglit:
The updated ext_transform_feedback-api-errors:useprogstage_noactive useprogstage_active bind_pipeline
arb_separate_shader_object-GetProgramPipelineiv
arb_separate_shader_object-IsProgramPipeline
For the moment I reuse Driver.UseProgram but I guess it will be better
to create a UseProgramStages functions. Opinion is welcome
V2: formatting & rename
V3 (idr):
* Change spec references to core OpenGL versions instead of issues in the
extension spec.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now arb_separate_shader_object-GetProgramPipelineiv should pass.
V3 (idr):
* Change spec references to core OpenGL versions instead of issues in
the extension spec.
* Split out from previous uber patch.
v4 (idr): Use _mesa_has_geometry_shaders in _mesa_UseProgramStages to
detect availability of geometry shaders.
v5 (idr): Whitespace cleanup, use _mesa_lookup_shader_program_err
instead of open-coding it again, and update some comments at the end of
_mesa_UseProgramStages. All suggested by Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Extend use_shader_program to support a different target. Allow to reuse the
function to update the pipeline state. Note I bypass the flush when target
isn't current. Maybe it would be better to create a new UseProgramStages
driver function
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
save and restore _Shader/Pipeline binding point. Rational we don't want any
conflict when the program will be unattached.
V2: formatting improvement
V3 (idr):
* Build fix. The original patch added calls to _mesa_use_shader_program
with 4 parameters, but the fourth parameter isn't added to that
function until a much later patch. Just drop that parameter for now.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Basically a sed but shaderapi.c and get.c.
get.c => GL_CURRENT_PROGAM always refer to the "old" UseProgram behavior
shaderapi.c => the old api stil update the Shader object directly
V2: formatting improvement
V3 (idr):
* Rebase fixes after a block of code was moved from ir_to_mesa.cpp to
shaderapi.c.
* Trivial reformatting.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To avoid NULL pointer check a default pipeline object is installed in
_Shader when no program is current
The spec say that UseProgram/UseShaderProgramEXT/ActiveProgramEXT got an
higher priority over the pipeline object. When default program is
uninstall, the pipeline is used if any was bound.
Note: A careful rename need to be done now...
V2: formating improvement
V3 (idr):
* Build fix. The original patch added calls to _mesa_use_shader_program
with 4 parameters, but the fourth parameter isn't added to that
function until a much later patch. Just drop that parameter for now.
* Trivial reformatting.
* Updated comment of gl_context::_Shader
v4 (idr): Reformat spec quotations to look like spec quotations. Update
comments describing what gl_context::_Shader can point to. Bot
suggested by Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eliminate lp_vertex_shader, as it added nothing over draw_vertex_shader.
Simplify lp_geometry_shader, as most of the incoming state is unneeded.
(We could also just use draw_geometry_shader if we were willing to peek
inside the structure.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
* Add HAVE_PTHREAD, we do have pthread support wrappers now for
non-native Haiku threaded applications.
* Viewport changed behavior recently breaking the build.
We fix this by looking at the gl_context ViewportArray
(Thanks Brian for the idea)
Acked-by: Brian Paul <brianp@vmware.com>
The SIMD16 replicated FB write message only works if we don't need the
color calculator to mask our framebuffer writes. Previously, we bailed
on it if color_mask wasn't <true, true, true, true>. However, this was
needlessly strict for formats with fewer than four components - only the
components that actually exist matter.
WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set
to <true, true, true, false>. This will work perfectly fine with the
replicated data message; we just bailed unnecessarily.
Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by
abound 50%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).
v2: Use _mesa_format_has_color_component() to properly handle ALPHA
formats (and generally be less fragile).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
WebGL Aquarium in Chrome 24 actually hits this.
v2: Move to core Mesa (wisely suggested by Ian); only consider
components which actually exist.
v3: Use _mesa_format_has_color_component to determine whether components
actually exist, fixing alpha format handling.
v4: Add a comment, as requested by Brian. No actual code changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
When considering color write masks, we often want to know whether an
RGBA component actually contains any meaningful data. This function
provides an easy way to answer that question, and handles luminance,
intensity, and alpha formats correctly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
This gets us equivalent code paths on BDW and pre-BDW, except for stencil
(where we don't have MSAA stencil resolve code yet)
Improves MSAA-forced citybench by 7.94496% +/- 2.38429% (n=16). Reduces
DRI2 MSAA glxgears performance by -12.3559% +/- 1.52845% (n=9).
v2: Move the new meta code to brw_meta_updownsample.c, name it
brw_meta_updownsample(), add a comment about
intel_rb_storage_first_mt_slice(), and rename that function and move
the RB generation into it (review ideas by Ken).
v3: Fix 2 src vs dst pasteos in previous change.
v4: Skip this path pre-gen8 for now, until we can analyze the glxgears
performance delta some more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that BindRenderbufferTexImage() is a thing that drivers can do, winsys
FBOs *can* have NeedsFinishRenderTexture set.
v2: Keep the short-circuit for non-BindRenderbufferTexImage() drivers
(review by Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even if the singlesample_mt got reopened from DRI due to
pageflipping/buffer swapping, our private miptree shouldn't need any
changes.
Improves performance of a little swapbuffers-loving microbenchmark with
MSAA forced on, by 1.2371% +/- 0.624802% (n=102)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The formatting was weird, and the tests were duplicated, and it is
guaranteed that mt->region exists.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes a memory leak with MSAA winsys buffers since my move of
singlesample_mt to the rb in 4e0924c5de
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sometimes it would be nice to benchmark some app with MSAA versus not, but
it doesn't offer the controls you want. Just provide a handy knob to
force the issue.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For each write, search previous instructions for unread writes to the
flag register and remove them. Note that this will not eliminate the
last unread write.
total instructions in shared programs: 788074 -> 788004 (-0.01%)
instructions in affected programs: 4930 -> 4860 (-1.42%)
Reviewed-by: Eric Anholt <eric@anholt.net>
With an awful O(n^2) algorithm that searches previous instructions for
dead writes.
total instructions in shared programs: 805582 -> 788074 (-2.17%)
instructions in affected programs: 144561 -> 127053 (-12.11%)
Reviewed-by: Eric Anholt <eric@anholt.net>
That is, modify
mad dst, a, b, c
to be
mad dst.xyz, a, b, c
if dst.w is never read.
total instructions in shared programs: 811869 -> 805582 (-0.77%)
instructions in affected programs: 168287 -> 162000 (-3.74%)
Reviewed-by: Eric Anholt <eric@anholt.net>
A future patch adds support for removing dead writes to the flag
register. This patch simplifies the logic until then.
total instructions in shared programs: 811813 -> 811869 (0.01%)
instructions in affected programs: 3378 -> 3434 (1.66%)
Reviewed-by: Eric Anholt <eric@anholt.net>
To be consistent with the fs backend. Also the instruction scheduler
incorrectly considered SEL with a conditional modifier to read the flag
register.
Reviewed-by: Eric Anholt <eric@anholt.net>
The ARB_framebuffer_object spec lists this case before the
FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER and
FRAMEBUFFER_INCOMPLETE_READ_BUFFER cases.
Fixes two broken cases in piglit's fbo-incomplete test, if
ARB_ES2_compatibility is not advertised. (If it is, this is masked
because the FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER /
FRAMEBUFFER_INCOMPLETE_READ_BUFFER cases are removed by that extension)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
GL_INTENSITY has never been valid as a pixel format -- to get the memcpy
pack/unpack paths, the app needs to specify GL_RED as the pixel format
(or GL_RED_INTEGER for the integer formats).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
With shared glx contexts it is possible that a texture is create and used
in one context and then used in another one resulting in incorrect
sampler view usage.
v2: avoid template copy
v3: add XXX comment
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit 2919c3fdb4.
For formats like BGRX, looping through 0..num_components works fine.
But for formats like XRGB, we'd check the color mask for X and fail to
check it for B.
The SIMD16 replicated FB write message only works if we don't need the
color calculator to mask our framebuffer writes. Previously, we bailed
on it if color_mask wasn't <true, true, true, true>. However, this was
needlessly strict for formats with fewer than four components - only the
components that actually exist matter.
WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set
to <true, true, true, false>. This will work perfectly fine with the
replicated data message; we just bailed unnecessarily.
Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by
abound 40%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Currently, we don't use this path on Sandybridge because we suspect
other paths will be faster. But we potentially could. If we do, we
should allow it to support Y-tiled BLTs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested on ILK and CTG (with the GL3isms taken out of the piglits).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA
(and vice versa), fix this to handle also 16bit 565-style srgb formats.
Still not really all that generic, things like r10g10b10a2_srgb or
r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not
require more work to not crash but near certainly need some higher precision
calculation) but not needed right now.
The code is not fully optimized for this (could use more direct calculation
instead of expanding to 8-bit range first) but should be good enough.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
GL generally doesn't seem to allow srgb formats with less (or more) than 8 bit
for the rgb channels, though some hw could easily do it (typically for formats
with up to 10 bits for the rgb channels, at least for formats with less than 8
bits support is likely widespread even). While it may be true there aren't
really any benefits for such formats, we need for it for d3d, though luckily
only for b5g6r5_srgb it seems.
So add this format along with the util code for conversion - since that util
code is heavily tuned for 8bit srgb this isn't really all that well optimized
and rounding doesn't seem right but at least it should give some halfway
meaningful results.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The nvc0 texfetch instruction expects the sample id to be in the second
source (usually used for the offset) rather than as part of the texture
coordinate.
This fixes all the sampler2DMS/Array tests on nvc0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
This fallback to triangle strips is silly and should be done in drivers
if they need it.
This should fix the case when quad strips are used with flatshading that is
enabled by the "flat" GLSL varying modifier. It also fixes primitive restart
for quad strips.
This fixes piglit:
NV_primitive_restart/primitive-restart-draw-mode-quad_strip
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
The last changes to it are from 2008 and 2009.
It doesn't support most texture formats and some texture targets.
Nobody can possibly be using this.
Reviewed-by: Brian Paul <brianp@vmware.com>
This code is a slightly modified version of evergreen_dma_blit (and
evergreen_dma_copy as well as evergreen_dma_copy_tile).
It would be nice to share some of the code in the long term.
I have reused some "cik"-prefixed functions that also return the right
value for SI. I am not sure if they should be renamed.
v2: Marek> removed gfx.flush in si_dma_copy_tile
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The AoS version of ld_build_blend_factor was assuming that if the first
channel was alpha, there were no rgb components.
Fixes glean/blendFunc on System z. No piglit regressions on x86_64.
The shortcut is still used in tests like spec/ARB_framebuffer_object/
fbo-alpha.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
libdl does not exist on many platforms which have dlopen in libc.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
In the gallium code, the assert() macro could come from either the
system's assert.h file (via c11/threads.h) or from gallium's u_debug.h.
It looks like all known assert.h files unconditionally #undef assert
before defining their own version. So the assert you get depends on
whether threads.h or u_debug.h was included last.
In the gallium code we really want to use the assert() from u_debug.h
(it behaves better on Windows). In gallium, c11/threads.h is only
included after u_debug.h in the os_thread.h wrapper. So Adding
an #ifndef assert test in the threads*.h files avoids using the system's
assert().
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
EXT_packed_depth_stencil is supported by all drivers, but
ARB_depth_texture isn't (notably nouveau_vieux). This should avoid
passing unexpected values down to ChooseTextureFormat.
The EXT_packed_depth_stencil spec does not make any explicit references
to requiring ARB_depth_texture in order to allow textures with that
format, however if there is no dependency, ARB_depth_texture would be
practically implied by the extension.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Note for 10.0 backport: This will produce a conflict, the solution is to
move the surrounding if as well.
In all uses of dotlike() we're writing generic code that operates on 1-4
component vectors. That our IR requires ir_binop_dot expressions'
operands to be 2+ component vectors is an implementation detail that's
not important when implementing built-in functions with dot(), which is
defined for scalar floats in GLSL.
Reviewed-by: Eric Anholt <eric@anholt.net>
ARB_gpu_shader5 and ES 3.0 expose different subsets of
ARB_shading_language_packing.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Volume 4, part 1 of the Ivybridge PRM says, "Generally, the EWA
approximation algorithm results in higher image quality than the legacy
algorithm." Using a classic anisotropic filtering "tunnel" demo, it
appears that there is *no* anisotropic filtering on IVB without this bit
set.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I meant to include this fixes in v3 of commit
de7ad2c88f, but accidentally pushed a
previous version.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Sadly, we can't use actual ALIGN(), since that only supports
power-of-two values for the alignment parameter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Register sets depend on the particular hardware generation, but don't
depend on anything in the actual OpenGL context. Computing them is
fairly expensive, and they take up a large amount of memory. Putting
them in the screen allows us to compute/allocate them once for all
contexts, saving both time and space.
Improves the performance of a context creation/destruction
microbenchmark by about 3x on my Haswell i7-4750HQ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will allow us to use the screen as a memory context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This one saves about 2MB peak allocation in glsl-fs-algebraic-add-add-1,
with no performance difference on timing short shader-db runs (n=9/10,
warmup outlier removed).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This isn't the GL API, so there's no reason to use GLboolean.
Using bool is safer: any non-zero value is treated as "true". When
converting a value to a GLboolean, all but the low byte is discarded,
which means that values like 256 will be incorrectly rendered as false.
Done via the following vim commands:
:%s/GLboolean/bool/g
:%s/GL_TRUE/true/g
:%s/GL_FALSE/false/g
and one line of manual whitespace tidying.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Ideally, we'd like to never even attempt the SIMD16 compile if we could
know ahead of time that it won't succeed---it's purely a waste of time.
This is especially important for state-based recompiles, which happen at
draw time.
The fragment shader compiler has a number of checks like:
if (dispatch_width == 16)
fail("...some reason...");
This patch introduces a new no16() function which replaces the above
pattern. In the SIMD8 compile, it sets a "SIMD16 will never work" flag.
Then, brw_wm_fs_emit can check that flag, skip the SIMD16 compile, and
issue a helpful performance warning if INTEL_DEBUG=perf is set. (In
SIMD16 mode, no16() calls fail(), for safety's sake.)
The great part is that this is not a heuristic---if the flag is set, we
know with 100% certainty that the SIMD16 compile would fail. (It might
fail anyway if we run out of registers, but it's always worth trying.)
v2: Fix missing va_end in early-return case (caught by Ilia Mirkin).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v1]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
This is just a matter of reusing the pull/push constant information set
up by the SIMD8 compile.
This gains us 78 SIMD16 programs in shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we don't renumber uniform registers, assign_constant_locations
and move_uniform_array_access_to_pull_constants use the same names.
So, they can share a single copy of the pull_constant_loc[] array.
This simplifies the code considerably. assign_constant_locations()
doesn't need to walk through pull_params[] to rediscover reladdr
demotions; it just has that information in pull_constant_loc[]. We also
only need to rewrite the instruction stream once, instead of twice.
Even better, we now have a single array describing the layout of
all pull parameters, which we can pass to the SIMD16 program.
This actually hurts a few shaders in Serious Sam 3, and one in KWin:
total instructions in shared programs: 1841957 -> 1842035 (0.00%)
instructions in affected programs: 1165 -> 1243 (6.70%)
Comparing dump_instructions() before and after the pull constant
transformations with and without this patch, it appears that there is
a uniform array with variable indexing (reladdr) and constant indexing
(of array element 0). Previously, we uploaded array element 0 as both
a pull constant (for reladdr) /and/ a push constant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, remove_dead_constants() would renumber the UNIFORM registers
to be sequential starting from zero, and the resulting register number
would be used directly as an index into the params[] array.
This renumbering made it difficult to collect and save information about
pull constant locations, since setup_pull_constants() and
move_uniform_array_access_to_pull_constants() used different names.
This patch generalizes setup_pull_constants() to decide whether each
uniform register should be a pull constant, push constant, or neither
(because it's unused). Then, it stores mappings from UNIFORM register
numbers to params[] or pull_params[] indices in the push_constant_loc
and pull_constant_loc arrays. (We already did this for pull constants.)
Then, assign_curb_setup() just needs to consult the push_constant_loc
array to get the real index into the params[] array.
This effectively folds all the remove_dead_constants() functionality
into assign_constant_locations(), while being less irritable to work
with.
v2: Add assert(remapped <= i), requested by Topi.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
move_uniform_array_access_to_pull_constants() and setup_pull_constants()
both have two parts:
1. Decide which UNIFORM registers to demote to pull constants, and
assign locations.
2. Mechanically rewrite the instruction stream to pull the uniform
value into a temporary VGRF and use that, eliminating the UNIFORM
file access.
In order to support pull constants in SIMD16 mode, we will need to make
decisions exactly once, but rewrite both instruction streams.
Separating these two tasks will make this easier.
This patch introduces a new helper, demote_pull_constants(), which
takes care of rewriting the instruction stream, in both cases.
For the moment, a single invocation of demote_pull_constants can't
safely handle both reladdr and non-reladdr tasks, since the two callers
still use different names for uniforms due to remove_dead_constants()
remapping of things. So, we get an ugly boolean parameter saying
which to do. This will go away.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When demoting a variably indexed uniform array to pull constants, we
only recorded the location for the base of the array (element 0).
Recording locations for all array elements is a trivial amount of code
and will make subsequent refactoring easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, both move_uniform_array_access_to_pull_constants() and
setup_pull_constants() maintained stack-local arrays with this
information. Storing this information will allow it to be used from
multiple functions, allowing us to split and move code around.
We'll also eventually want to pass pull constant location information
to the SIMD16 compile. Saving this information will help us do that.
Unfortunately, the two functions *cannot* share the contents of the
array just yet. remove_dead_constants() renumbers all the UNIFORM
registers to be contiguous starting at zero, so the two functions
talk about uniforms using different names. We can't even remap them,
since move_uniform_array_access_to_pull_constants() deletes UNIFORM
registers that are only accessed with reladdr, so remove_dead_constants
can't even see them.
This situation will improve in the next few patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The SIMD8 compile will determine whether pull parameters are necessary.
If so, it will set prog_data->nr_pull_params to a value greater than 0.
brw_wm_fs_emit checks if nr_pull_params > 0 and skips the SIMD16 compile
altogether. So, this code should never occur.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To emphasize that the result is floating point 1.0 or 0.0, to match
other opcodes like SLE and SEQ.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Handle mapping edgeflag data similar to the code around it.
This fixes a crash in piglit test gl-2.0-edgeflag.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Enable EGL_EXT_platform_base and the Linux platform extensions layered
atop it: EGL_EXT_platform_x11, EGL_EXT_platform_wayland,
and EGL_MESA_platform_gbm.
Tested with Piglit's EGL_EXT_platform_base tests under an X11 session.
To enable running the Wayland and GBM tests, windowed Weston was running
and the kernel had render nodes enabled.
I regression tested my EGL_EXT_platform_base patch set with Piglit on
Ivybridge under X11/EGL, standalone Weston, and GBM with rendernodes. No
regressions found.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
From the EGL_EXT_wayland_spec, version 3:
It is not valid to call eglCreatePlatformPixmapSurfaceEXT with a <dpy>
that belongs to Wayland. Any such call fails and generates
EGL_BAD_PARAMETER.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
From the EGL_MESA_platform_gbm spec, version 5:
It is not valid to call eglCreatePlatformPixmapSurfaceEXT with a <dpy>
that belongs to the GBM platform. Any such call fails and generates
EGL_BAD_PARAMETER.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Internally, much of the EGL code uses EGLNativeDisplayType,
EGLNativeWindowType, and EGLPixmapType. However, the EGLNative type
often does not match the variable's actual type.
The concept of EGLNative types are a bad match for Linux, as explained
below. And the EGL platform extensions don't use EGLNative types at all.
Those extensions attempt to solve cross-platform issues by moving the
EGL API away from the EGLNative types.
The core of the problem is that eglplatform.h can define each EGLNative
type once only, but Linux supports multiple EGL platforms.
To work around the problem, Mesa's eglplatform.h contains multiple
definitions of each EGLNative type, selected by feature macros. Mesa
expects EGL clients to set the feature macro approrpiately. But the
feature macros don't work when a single codebase must be built with
support for multiple EGL platforms, *such as Mesa itself*.
When building libEGL, autotools chooses the EGLNative typedefs based on
the first element of '--with-egl-platforms'. For example,
'--with-egl-platforms=x11,drm,wayland' defines the following:
typedef Display* EGLNativeDisplayType;
typedef Window EGLNativeWindowType;
typedef Pixmap EGLNativePixmapType;
Clearly, this doesn't work well for Wayland and GBM. Mesa works around
the problem by casting the EGLNative types to different things in
different files.
For sanity's sake, and to prepare for the EGL platform extensions, this
patch removes from egl/main and egl/dri2 all internal use of the
EGLNative types. It replaces them with 'void*' and checks each explicit
cast with a static assertion. Also, the patch touches egl_gallium the
minimal amount to keep it compatible with eglapi.h.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_image, set it for each platform, and
let egl_dri2 dispatch eglCreateImageKHR to that.
To remove ambiguity, rename egl_dri2.c:dri2_create_image() to
dri2_create_image_from_dri().
This prepares for the EGL platform extensions.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
dri2_initialize_x11_swrast() does a strange thing. For some extensions
it doesn't support, it sets the corresponding functions in
_EGLDriver::API to NULL. The intention here is clear, but misplaced.
NULL or not, the function pointers never get called because their
extensions aren't supported.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_wayland_buffer_from_image, set it for
each platform, and let egl_dri2 dispatch
eglCreateWaylandBufferFromImageWL to that.
This prepares for the EGL platform extensions.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
egl_dri2.c:dri2_terminate() handled terminating X11 and DRM displays.
The Wayland platform implemented its own dri2_wl_terminate(), which was
nearly a copy of the common one.
To implement the EGL platform extensions, we either need to dispatch
eglTerminate per display or define a common implementation for all
platforms. This patch chooses consolidation. It removes
dri2_wl_terminate() by folding it into the common dri2_terminate().
It was necessary to invert the `if (disp->PlatformDisplay == NULL)` and
the switch statement because, unlike DRM and X11, Wayland's terminator
performed action even when EGL didn't own the native display. In the
inversion, I replaced `disp->PlatformDisplay == NULL` with
`dri2_dpy->own_device` because the two expressions are synonymous, but
the latter's meaning is clearer.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When the user calls eglGetDisplay(EGL_DEFAULT_DISPLAY), the Wayland and
DRM platforms set dri2_dpy->own_device=true. This patch makes the X11
platform do the same for consistency.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::post_sub_buffer, set it for each
platform, and let egl_dri2 dispatch eglPostSubBufferNV to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_buffers_region, set it for each
platform, and let egl_dri2 dispatch eglSwapBuffersRegionNOK to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::copy_buffers, set it for each
platform, and let egl_dri2 dispatch eglCopyBuffers to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::query_buffer_age, set it for each
platform, and let egl_dri2 dispatch API.QueryBufferAge to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::destroy_surface, set it for each
platform, and let egl_dri2 dispatch eglDestroySurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_pbuffer_surface, set it for each
platform, and let egl_dri2 dispatch eglCreatePbufferSurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_pbuffer_surface, set it for each
platform, and let egl_dri2 dispatch eglCreatePixmapSurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_window_surface, set it for each
platform, and let egl_dri2 dispatch eglCreateWindowSurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_buffers_with_damage, set it for each
platform, and let egl_dri2 dispatch eglSwapBuffersWithDamageEXT to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_buffers, set it for each platform, and
let egl_dri2 dispatch eglSwapBuffers to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_interval, set it for each platform, and
let egl_dri2 dispatch eglSwapInterval to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Don't call it through the driver dispatch table. Just call it
statically.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Each of the egl_dri2 platforms (except Android) prefix their function
names with "dri2", not "dri2_${platform}". This means many function
names have three separate definitions in the egl_dri2 directory: one in
each of platform_drm.c, platform_wayland.c, and platform_x11.c. For
example, each of the three files defines dri2_create_window_surface().
The name collisions make it difficult to review patches for correctness
("Is this patch hunk calling a platform_x11 function or a global
egl_dri2 function?"), complicate debugging, and confuse code navigation
tools.
For each function in platform_x11.c prefixed with 'dri2', this patch
changes its prefix to 'dri2_x11'. Likewise for platform_drm.c and
'dri2_drm'; and platform_wayland.c and 'dri2_wl'.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
dri2_egl_display has only one virtual function, 'authenticate'. Define
dri2_egl_display::vtbl and move 'authenticate' there.
This prepares for the EGL platform extensions, which will add many
more virtual functions to dri2_egl_display.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This pulls in EGL_EXT_platform_base, EGL_EXT_platform_wayland,
EGL_EXT_platform_x11, and EGL_MESA_platform_gbm.
This patch has a lot of churn because Khronos recently changed its
method of generating headers. Khronos now generates it headers from XML.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This allows retrieving the existing BO and incrementing its reference count,
instead of creating a separate winsys representation for it, when the kernel
reports that the BO was already assigned a virtual memory address.
This fixes problems with XWayland using radeonsi and the
xf86-video-wlglamor driver, which calls GEM flink outside of the radeon
winsys code and creates BOs from the flinked names using the same DRM file
descriptor.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
install-gallium-links.mk fails to create the compat link for ilo_dri.so
because it looks for dri_LTLIBRARIES instead of noinst_LTLIBRARIES. Fix this
by switching to dri_LTLIBRARIES (and make the driver installable).
Since pci_id_driver_map.h and the DDX both tell libGL.so to look for "i965",
ilo_dri.so will never be loaded even enabled and installed. The change should
not create any more confusion.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The GL 4.4 spec says it's not color-renderable and not accepted
by RenderBufferStorage. The EXT_texture_shared_exponent spec says
it's not color-renderable but it's accepted by RenderBufferStorageEXT.
This seems to be a bug in the extension spec.
Let's do what GL 4.4 says.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can just check the polygon mode when updating the edge flag state.
Also, we can just flag ST_NEW_VERTEX_PROGRAM directly, which makes
ST_NEW_EDGEFLAGS_DATA useless.
Normally, nothing uses live intervals at this point, so this isn't
necessary. However, dump_instructions() calculates them and uses them
to show register pressure. So, calling dump_instructions() in this area
of the code would segfault due to the arrays being the wrong size.
This is not a candidate for stable branches because it only serves to
fix internal debugging code that you manually have to invoke by altering
the source code or using gdb.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, dump_instruction() would print output such as:
{ 2} 3: mov vgrf1:F, u0:F
{ 3} 4: mov vgrf7:F, u0:F
{ 4} 5: mov vgrf8:F, u0:F
which looked like either a scalar access or perhaps a constant-indexed
access of element 0, when it was really a variable index.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In commit e57d77280e, I fixed this for
destinations in the Vec4 backend, and sources in the scalar backend.
But not both types in both backends.
To prevent this mess from continuing, make the reg_encoding table
static, so only the disassembler can use it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
opt_saturate_propagation_local compares scan_inst->dst.reg/reg_offset
with inst->src[0].reg/reg_offset, and ensures that scan_inst->dst.file
is GRF. But nothing ensured that inst->src[0].file was GRF.
In the following program, this resulted in u1:F matching vgrf1:UW,
and a saturate being incorrectly propagated from instruction 8 to
instruction 1.
{ 1} 0: add vgrf0:UW, hw_reg1+8:UW, hw_reg0:V
{ 1} 1: add vgrf1:UW, hw_reg1+10:UW, hw_reg0:V
{ 1} 2: linterp vgrf6:F, hw_reg2:F, hw_reg3:F, hw_reg0:F
{ 2} 3: linterp vgrf27:F, hw_reg2:F, hw_reg3:F, hw_reg0+16:F
{ 4} 4: mov vgrf10+0.0:F, vgrf6:F
{ 3} 5: mov vgrf10+1.0:F, vgrf27:F
{ 6} 6: tex vgrf8+0.0:F, vgrf10+0.0:F
{ 5} 7: mov vgrf32:F, u1:F
{ 5} 8: mov.sat vgrf12:F, u1:F
From shader-db:
total instructions in shared programs: 1841932 -> 1841957 (0.00%)
instructions in affected programs: 5823 -> 5848 (0.43%)
I inspected two of the 25 hurt shaders, and concluded that they were
both hitting this bug, and not legitimately optimized.
This fixes bugs in Left 4 Dead 2 and Team Fortress 2, possibly among
others. The optimization pass didn't exist in 10.0, so this is only
a candidate for 10.1.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out we can allow COHERENT storage/mappings all the time,
regardless of LLC vs non-LLC. It just means never using temporary
mappings to avoid GPU stalls, and on non-LLC we have to use the GTT intead
of CPU mappings. If we were to use CPU maps on non-LLC (which might be
useful if apps end up using buffer_storage on PBO reads, to avoid WC read
slowness), those would be PERSISTENT but not COHERENT, but doing that
would require us driving the clflushes from userspace somehow.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It looks like there's no big difference for write-only workloads, but
using a CPU map means that if they happen to read without having set the
MAP_READ_BIT, they get 100x the performance for those reads.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While in expected usage patterns nobody will ever hit this path, doubling
our bandwidth used seems like a waste, and it cost us extra code too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On LLC, it should always be better to use a cached mapping than the GTT.
On non-LLC, it seems pretty silly to try to optimize read performance for
the INVALIDATE_RANGE_BIT case. This will make the buffer_storage logic
easier.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Similar to the other cases, shift some weight/coord calculations to int
space. This should be slightly faster (on x86 sse it should actually safe one
instruction, and generally int instructions are cheaper).
The previous code used coords which were calculated as
(int) (f_coord * tex_size * 256) >> 8.
This is not only unnecessarily complex but can give the wrong texel due to
rounding for negative coords (as an example, after denormalization coords
from -1.0 to 0.0 should give -1, but this will give -1 for numbers from
-1.0-1/256 - 0.0-1/256.
Instead, juse use ifloor, dropping the shift stuff.
Unfortunately, this will most likely be slower - with arch rounding available
it shouldn't be too bad (trades a int shift for a round but also saves an int
mul (which is shared by all coords) but otherwise it's a mess.
The previous method for converting coords to ints was sligthly inaccurate
(effectively losing 1bit from the 8bit lerp weight). This is probably
especially noticeable when trying to draw a pixel-aligned texture.
As an example, for a 100x100 texture after dernormalization the texture
coords in this case would turn up as
0.5, 1.5, 2.5, 3.5, 4.5, ...
After the mul by 256, conversion to int and 128 subtraction, they end up as
0, 256, 512, 768, 1024, ...
which gets us the correct coords/weights of
0/0, 1/0, 2/0, 3/0, 4/0, ...
But even LSB errors (which are unavoidable) in the input coords may cause
these coords/weights to be wrong, e.g. for a coord of 3.49999 we'd get a
coord/weight of 2/255 instead.
Fix this by using round-to-nearest int instead of FPToSi (trunc). Should be
equally fast on x86 sse though other archs probably suffer a little.
Constify the offsets parameter to silence gcc warning 'assignment
from incompatible pointer type' due to function prototype miss-match.
Use a boolean changed as a shorthand for target != current_target.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
There is little point of continuing if fread returns zero, as it
indicates that either the file is empty or cannot be read from.
Bail out if fread returns zero after closing the file.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Core drm defines that the handle is of type int, while all drivers
treat it as uint internally. Typecast the value to silence gcc
warning messages and be consistent amongst all drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Commit 3805a864b1d(nv50: assert before trying to out-of-bounds access
samplers) introduced a series of asserts as a precausion of a previous
illegal memory access.
Although it failed to encapsulate loop within nv50_sampler_state_delete
effectively failing to clear the sampler state, apart from exadurating
the illegal memory access issue.
Fixes gcc warning "array subscript is above array bounds" and
"Nesting level does not match indentation" and "Out-of-bounds read"
defects reported by Coverity.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Indent according to Mesa style, reuse sbc instead of making a new
swap_count field, and actually get a usable back before returning the
age of the back (fixing updated piglit tests). Changes by anholt.
Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com>
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Adel Gadllah <adel.gadllah@gmail.com> (v2)
Reviewed-by: Eric Anholt <eric@anholt.net>
With the buffer_age code, I need to be able to potentially call this more
than once per frame, and it would be bad if a new special event showing up
meant I chose a different back mid-frame. Now, once we've chosen a back
for the frame, another find_back will choose it again since we know that
it won't have ->busy set until swap.
Note that this makes find_back return a buffer id instead of a backbuffer
index. That's kind of a silly distinction anyway, since it's an identity
mapping between the two (it's the front buffer that is at an offset).
Reviewed-By: Adel Gadllah <adel.gadllah@gmail.com>
This extension provides a way for an application to render to multiple
surfaces with different buffer formats without having to use multiple
contexts. An EGLContext can be created without an EGLConfig by passing
EGL_NO_CONFIG_MESA. In that case there are no restrictions on the surfaces
that can be used with the context apart from that they must be using the same
EGLDisplay.
_mesa_initialze_context can now take a NULL gl_config which will mark the
context as ‘configless’. It will memset the visual to zero in that case.
Previously the i965 and i915 drivers were explicitly creating a zeroed visual
whenever 0 is passed for the EGLConfig. Mesa needs to be aware that the
context is configless because it affects the initial value to use for
glDrawBuffer. The first time the context is bound it will set the initial
value for configless contexts depending on whether the framebuffer used is
double-buffered.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In eglCreateContext there is a check for whether the config parameter is zero
and in this case it will avoid reporting an error if the
EGL_KHR_surfacless_context extension is supported. However there is nothing in
that extension which says you can create a context without a config and Mesa
breaks if you try this so it is probably better to leave it reporting an
error.
The original check was added in b90a3e7d8b based on the API-specific
extensions EGL_KHR_surfaceless_opengl/gles1/gles2. This was later changed to
refer to EGL_KHR_surfacless_context in b50703aea5. Perhaps the original
extensions specified a configless context but the new one does not.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Under GLES 3 it is not valid to pass GL_FRONT to glDrawBuffers. Instead,
GL_BACK has a magic interpretation which means it will render to the front
buffer on single-buffered contexts and the back buffer on double-buffered. We
were incorrectly setting the initial value to GL_FRONT for single-buffered
contexts. This probably doesn't really matter at the moment except that
presumably it would be exposed in the API via glGetIntegerv.
When we switch to configless contexts this is more important because in that
case we always want to rely on the magic interpretation of GL_BACK in order to
automatically switch between the front and back buffer when a new surface with
a different number of buffers is bound. We also do this for GLES 1 and 2
because the internal value doesn't matter in that case and it is convenient to
use the same code to have the magic interpretation of GL_BACK.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In GLES 3 it is not possible to select rendering to the front buffer and
instead selecting GL_BACK has the magic interpretation that it is either the
front buffer on single-buffered configs or the back buffer on double-buffered.
GLES 1 and 2 have no way of selecting the draw buffer at all. In that case we
were initialising the draw buffer to either GL_FRONT or GL_BACK depending on
the context's config and then leaving it at that.
When we switch to having configless contexts we ideally want Mesa to
automatically switch between the front and back buffer whenever a double-
or single-buffered surface is bound. To make this happen we can just allow
the magic behaviour from GLES 3 in GLES 1 and 2 as well. It shouldn't matter
what the internal value of the draw buffer is in GLES 1 and 2 because there
is no way to query it from the external API.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Commit 6e8d04a caused a leak by allocating ctx->Debug but never freeing it.
Release the memory in _mesa_free_errors_data when destroying a context.
Use FREE to match CALLOC_STRUCT from _mesa_get_debug_state.
Reviewed-by: Brian Paul <brianp@vmware.com>
The few paths that were playing with framebuffers and renderbuffer were
saving and restoring them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was being applied in a subset of the places that
intel_prepare_render() was called, to set the same flag that
intel_prepare_render() was setting.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The flag wasn't getting updated correctly when the ctx->DrawBuffer or
ctx->ReadBuffer changed. It usually ended up working out because most
apps only have one window system framebuffer, or if they have more than
one and they have any front read/drawing, they will have called
glReadBuffer()/glDrawBuffer() on it when they get started on the new
buffer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's the ctx->ReadBuffer that gets read from, not the ctx->DrawBuffer.
So, if you happened to have a ctx->ReadBuffer that was the winsys buffer,
and it had previously been intel_prepare_render()ed but not invalidated
since then, and you called glReadBuffer() to switch to front buffer
instead of back buffer reading on the winsys fbo while your drawbuffer was
a user FBO, you'd never get the front buffer's miptree fetched, and
segfault.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I think these are all equivalent to vertex buffer fetches which should be
dword-aligned. Scalar loads are also dword-aligned.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Buffers are disabled by VGT_STRMOUT_BUFFER_CONFIG, but the query only works
if VGT_STRMOUT_CONFIG.STREAMOUT_0_EN is enabled.
This moves VGT_STRMOUT_CONFIG to its own state. The register is set to 1
if either streamout or the primitives-generated query is enabled.
However, the primitives-emitted query is also incremented, so it's disabled
by setting VGT_STRMOUT_BUFFER_SIZE to 0 when there is no buffer bound.
This fixes piglit:
ARB_transform_feedback2/counting with pause
EXT_transform_feedback/primgen-query transform-feedback-disabled
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
When doing fast clear for single-sample color buffers for the first time,
a CMASK buffer has to be allocated and the CMASK state in all pipe_surfaces
referencing the color buffer must be updated. Updating all surfaces is kinda
silly, so let's move the values to r600_texture instead.
This is only for Evergreen and later. R600-R700 don't have fast clear.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This looks like r600g. The shared Cayman MSAA code is used here.
The real motivation for this is that I need the ability to change values
of color registers after the framebuffer state is set. The PM4 state cannot
be modified easily after it's generated. With this, I can just change
r600_surface::cb_color_xxx and set framebuffer.atom.dirty=true and it's done.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes the following build error on OpenBSD:
./.libs/libglsl.a(builtin_functions.o)(.text+0x973): In function `mtx_lock':
../../include/c11/threads_posix.h:195: undefined reference to `pthread_mutex_lock'
./.libs/libglsl.a(builtin_functions.o)(.text+0x9a5): In function `mtx_unlock':
../../include/c11/threads_posix.h:248: undefined reference to `pthread_mutex_unlock'
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
Static and shared builds were possible in the good old days
of static makefiles. Currently the build system does not
distinguish nor does anything special when one requests a
static build.
Print a warning message for the packager that static builds
are not supported and continue building shared libs.
Currently only Debian and derivatives use static build, and
they use it for building a Xlib powered libGL. This patch
will only change the warning message they are seeing but
the binaries produced will be identical.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
- As of commit cb080a10b68(configure.ac: Don't require shared LLVM when
building OpenCL) opencl does not mandate using shared llvm.
- Add a warning message that building with static llvm may cause
compilation problems.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
- xf86dri.h is the old dri1 header, not required by dri2 nor dri3
- fold xf86drm.h inclusiong inside dri2.h
- dri3_glx does not have any drm specific dependencies
- glapi.h is not required by the dri2 and dri3 codepaths
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Recent commit fixed build issues in dri2_query_renderer.c by
wrapping in defined(direct_rendering) && !defined(applegl)
This patch targets the query_renderer tests, so that make check
passes on platforms such as hurd and cygwin.
v2: (Emil)
- Rebase and update commit message.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some platforms different library extension - dll, dylib, a.
Honor that when we are creating the required links.
Rename LIB_EXTENSION to LIB_EXT while we're here.
With libglapi linking aside, building classic drivers on
non-linux platforms should be possible now.
v2: Resolve conflicts.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
In the cases where one links against the static glapi.la there
is no need to create temporary variables only to explicitly
link agaist it.
Instead use SHARED_GLAPI_LIB to explicitly indicate when one
is building and linking with the shared glapi provider.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
All the variables were used before the automake conversion
and do not make sense (nor are used) currently.
Replace GL_LIB_NAME with lib$(GL_LIB).$(LIB_EXTENSION) for
apple-glx. The build has been broken for ages, but this will
ease the recovery process as it happens.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
All three (xvmc and omx) do not have an alternative loading
similar to the dri modules. Thus one needs to explicitly install
them in order to use/test them.
v2:
- Keep vdpau targets, as an equivalent of LIBGL_DRIVERS_PATH
is being worked on.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This helper script will be used to minimise the duplication
during link generation across all gallium targets.
v2:
- Handle vdpau_LTLIBRARIES. Requested by Christian König.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
There is little point in echoing everything that the script does
to stdout. Wrap it in AM_V_GEN so that a reasonable message is
printed as a indication of it's invocation.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
There is little gain in printing whenever a folder is created.
v2:
- Use $(AM_V_at) over @ to have control in verbose builds.
Suggested by Erik Faye-Lund.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
D3D10 allows setting of the internal offset of a buffer, which is
in general only incremented via actual stream output writes. By
allowing setting of the internal offset draw_auto is capable
of rendering from buffers which have not been actually streamed
out to. Our interface didn't allow. This change functionally
shouldn't make any difference to OpenGL where instead of an
append_bitmask you just get a real array where -1 means append
(like in D3D) and 0 means do not append.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The MESA_FORMAT_x enums in formats.h weren't declared in any sort
of reasonable order. Now it should be a little more logical.
This also required reordering tables in formats.c and s_texfetch.c
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Acked-by: Eric Anholt <eric@anholt.net>
There's no real reason to list all the formats in the comments.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Removes unnecessary MOV instructions in L4D2, TF2, Dota2, and many other
Steam games.
total instructions in shared programs: 1668126 -> 1657509 (-0.64%)
instructions in affected programs: 242235 -> 231618 (-4.38%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This keeps us from needing to reemit all the other stage state just
because a surface changed.
Improves unoptimized glamor x11perf -f8text by 1.10201% +/- 0.489869%
(n=296). [v1]
v2:
- Drop binding table packets from Gen8 unit state as well.
- Pass _3DSTATE_BINDING_TABLE_POINTERS_XS to brw_upload_binding_table,
cutting even more code.
v3: Don't forget to drop them from 3DSTATE_GS (botched refactor in v2).
Signed-off-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> [v2, v3]
Reviewed-by: Eric Anholt <eric@anholt.net> [v3]
This makes both the empty and non-empty binding table paths exit through
the bottom of the function, which gives us a place to share code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Explicitly add radeon_drm_winsys_create and nouveau_drm_screen_create to
the dynamic list. This will ensure vdpau interop still works even when
the user links with -Bsymbolic-functions in hardened builds.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Rachel Greenham <rachel@strangenoises.org>
Reported-by: Peter Frühberger <peter.fruehberger@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This reverts most of commit d80f0c34b7. Upon a
closer reading, having the presumed offsets written is not enough to set the
flag. EXEC_OBJECT_NEEDS_GTT and/or EXEC_OBJECT_WRITE of the reloc entries
must also be set appropriately.
Rename
intel_winsys_check_aperture_size() to intel_winsys_can_submit_bo(),
intel_bo_exec() to intel_winsys_submit_bo(), and
intel_winsys_decode_commands() to intel_winsys_decode_bo().
Make a semantic change to ignore intel_context when the ring is not the render
ring.
The only alloc flag is INTEL_ALLOC_FOR_RENDER, which can as well be expressed
by specifying the initial write domain. The change makes it obvious that we
failed to set INTEL_ALLOC_FOR_RENDER in several places.
Rename
intel_bo_emit_reloc() to intel_bo_add_reloc(),
intel_bo_clear_relocs() to intel_bo_truncate_relocs(), and
intel_bo_references() to intel_bo_has_reloc().
Besides, we need intel_bo_get_offset() only to get the presumed offset afer
adding a reloc entry. Remove the function and make intel_bo_add_reloc()
return the presumed offset. While at it, switch to gem_bo->offset64 from
gem_bo->offset.
Patch adds a remap table for uniforms that is used to provide a mapping
from application specified uniform location to actual location in the
UniformStorage. Existing UniformLocationBaseScale usage is removed as
table can be used to set sequential values for array uniform elements.
This mapping helps to implement GL_ARB_explicit_uniform_location so that
uniforms locations can be reorganized and handled in a more easy manner.
v2: small fixes + rename parameters for merge and split functions (Ian)
improve documentation, remove old check for location bounds (Eric)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nothing uses this structure, removal fixes Klocwork error about
the possible oom condition in _mesa_symbol_table_iterator_ctor.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This logic is borrowed from the radeon code. The transfer logic will
only get called for PIPE_BUFFER resources, so it shouldn't be necessary
to worry about them becoming render targets.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
It's only used in this one file as far as I can tell, and exporting a
symbol named 'devices' from a shared library is a recipe for trouble.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
When building out of tree, the file ends up dangling which
may result in a binary with the old git sha.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
That structure member is a pointer, so the loop with
the Elements macro only freed up the first entry.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
After preprocessing by glcpp all adjacent spaces were replaced by
single one and glsl parser received column-shifted shader source.
It negatively affected ast location set up and produced wrong error
messages for heavily-spaced shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes this SCons build error introduced with commit
70e7905608.
build/linux-x86_64-debug/mesa/libmesa.a(driverfuncs.os): In function `_mesa_init_driver_functions':
src/mesa/drivers/common/driverfuncs.c:99: undefined reference to `_mesa_meta_GenerateMipmap'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
I don't know how many people care about this case, but it's easy enough
to do, so we may as well. The tricky part is that for some reason Mesa
stores the number of array slices in Height, not Depth.
I thought the easiest way to handle that here was to make Height = 1
(the actual height), and srcDepth = srcImage->Height. This requires
some munging when calling _mesa_prepare_mipmap_level, so I created a
wrapper that sorts it out for us.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is largely a matter of looping over the number of slices/layers,
and not minifying depth (presumably that code exists for the unfinished
3D texture support).
Normally, I would have made the loop over array slices the outermost
loop. I suspect that would make it trickier to support 3D textures
someday, though, so I didn't. The advantage is that we would only have
one BufferData call per slice, rather than one per miplevel and slice.
However, a GenerateMipmaps microbenchmark indicates that either way is
basically just as fast. So I'm not sure it's worth bothering.
Improves performance in a GenerateMipmaps microbenchmark by nearly 5x.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Almost the exact same code appeared twice, and it needs to expand to
handle additional texture targets. Refactor it to tidy up the code and
avoid duplicating more work in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
KHR_debug
Also update dispatch sanity removing ARB_debug_output checks and
removing KHR_debug placeholders as the checks have already been added
V2: Make sure we exit case statements with conditional breaks rather than
just dropping through.
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the a build breakage caused by
6974eb9076 on build configurations where
all the following are true:
1. radeonsi is not being built
2. r600g is being built
3. opencl is disabled
4. --enable-r600-llvm-compiler is not being used
5. libelf is not installed
v2:
- Add $(RADEON_CFLAGS) to libllvmradeon_la_CFLAGS
Tested-by: Brian Paul <brianp@vmware.com>
And move its definition into r600_pipe_common.h; This struct is a just
a container for shader code and has nothing to do with LLVM.
v2:
- Drop unrelated Makefile change
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
A user would have no idea what "_glthread_" is. This removes the
last remaining instance of the _glthread_ string in Mesa.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
To check that the st_mesa_format_to_pipe_format() and
st_pipe_format_to_mesa_format() functions correctly convert
all corresponding Mesa/Gallium formats.
This found that MESA_FORMAT_YCBCR_REV was missing in
st_mesa_format_to_pipe_format(). Fixed that too.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: rotate in gen_rect_verts instead
v3: clear rotate in vl_compositor_clear_layers,
update calc_drawn_area as well
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Christian König <christian.koenig@amd.com>
It's never used, and it's equivalent to ralloc_parent(ht) if you really
need it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
To match PIPE_FORMAT_R8G8B8A8_SRGB.
v2: fix component name copy&paste bugs
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We've had several problems now with FinishRenderTexture not getting called
enough, and we're ready to just give up on it ever doing what we need. In
particular, an upcoming Steam title had rendering bugs that could be fixed
by always_flush_cache=true.
Instead of hoping Mesa core can figure out when we need to flush our
caches, just track what BOs we've rendered to in a set, and when we render
from a BO in that set, emit a flush and clear the set.
There's some overhead to keeping this set, but most of that is just
hashing the pointer -- it turns out our set never even gets very large,
because cache flushes are so common (even on cairo-gl).
No statistically significant performance difference in cairo-gl (n=100),
despite spending ~.5% CPU in these set operations.
v1: (Original patch by Eric Anholt.)
v2: (Changes by Ken Graunke.)
- Rebase forward from May 7th 2013 -> March 4th 2014.
- Drop the FinishRenderTexture hook entirely; after rebasing the
patch, the hook was just an empty function.
- Move the brw_render_cache_set_clear() call from
intel_batchbuffer_emit_flush() to brw_emit_pipe_control_flush().
In theory, this could catch more cases where we've flushed.
- Consider stencil as a possible texturing source.
v3: (changes by anholt):
- Move set_clear() back to emit_mi_flush() -- it means we can drop
more forced flushes from the code. In the previous location, it
wouldn't have been called when we wanted pre-gen6.
- Move the set clear from batch init to reset -- it should be empty at
the start of every batch, since the kernel handled any inter-batch
flush for us.
v4: Drop the debug code in set.c that I accidentally committed.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com> [v2]
With a non-debug build, gcc has two complaints:
1. 'found' var not used. Silence with '(void) found;'
2. 'id' not initialized. It's assigned by the UniformHash->get()
call, actually. But init it to zero to silence gcc.
Reviewed-by: Matt Turner <mattst88@gmail.com>
sizeof(scissor) returns the size of the full array rather than a single
element. Fix it to consider just the one element.
Fixes: 0705fa35 ("st/mesa: add support for GL_ARB_viewport_array (v0.2)")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The texture formats of winsys fbo are always linear becase the st manager
(st/dri for example) could not know the colorspace used. But it does not mean
that we cannot make the fbo sRGB-capable. By
- setting rb->Visual.sRGBCapable to GL_TRUE when the pipe driver supports the
format in sRGB colorspace,
- giving rb an sRGB internal format, and
- updating code to check rb->Format instead of strb->texture->format,
we should be good.
Fixed bug 75226 for at least llvmpipe and ilo, with no piglit regression.
v2: do not set rb->Visual.sRGBCapable for GLES contexts to avoid surprises
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75226
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
We need the header setup to not be predicated on which pixels are
undiscarded. I'm not sure originally if I had thought that the mask
disable implied predicate disable, or if I had just misread the mask
disable as predicate disable. Either way, I know I had spent more time
thinking about this in the gen8 generator than the gen7 generator.
Plus, it turns out that I had mis-implemented the "the GPU will use the
predicate unless this header is present" comment, by skipping setting up
the pixel mask when the header was present.
Fixes GPU hangs in piglit glsl-fs-discard-mrt, Trine, Trine 2 and
preusmably MLL.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75207
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It was really only used in the radeon driver for a debug printf.
And evidently, libGL.so referenced it just to work around some sort
of linker issue.
This patch removes the two calls to the function and the function
itself.
Fixes undefined _glthread_GetID symbol in libGL reported by 'nm'.
Though, the missing symbol doesn't cause any issues on my system but
it does cause glxinfo to fail on one of our test systems.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Before, it was kind of ugly to set the multisample fields with
assignments after we called _mesa_init_teximage_fields().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
If scissor optimization is used (to avoid bringing scissored portions of
the render target into GMEM and then back out to system memory) in
combination with hw binning pass, the result would be a scissor mismatch
between binning pass and rendering pass. This would cause rendering
bugs in some scenarios with (for example) gnome-shell.
I would have expected that simply using the correct screen-scissor
during the binning pass would be enough, but seems like there is
something else missing. So for now disable binning pass if scissor
optimization is used.
Most of the logic refers to the local variable 'mt' directly but
a few cases use 'intelObj->mt' instead. These are the same for
now but will be different once stencil miptree gets used.
v2 (Ian): fixed also indentation in surrounding lines
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
On earlier hardware, we had to implement math in the shader to translate
Y-tiled or untiled coordinates to W-tiled coordinates (which is what
BLORP does today in order to texture from stencil buffers).
On Broadwell, we can simply state that it's W-tiled in SURFACE_STATE,
and adjust the pitch. This is much easier.
In the surface state code, I chose to handle the "should we sample depth
or stencil?" question separately from the setup for sampling from
stencil. This should make it work with the BindRenderbufferTexImage
hook as well, and hopefully be reusable for GL_ARB_texture_stencil8
someday.
v2: Update docs/GL3.txt (caught by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While the GL_ARB_stencil_texturing extension does not allow the creation
of stencil textures, it does allow shaders to sample stencil values
stored in packed depth/stencil textures.
Specifically, applications can call glTexParameter* with a pname of
GL_DEPTH_STENCIL_TEXTURE_MODE and value of either GL_DEPTH_COMPONENT or
GL_STENCIL_INDEX to select which component they wish to sample. The
default value is GL_DEPTH_COMPONENT (for traditional depth sampling).
Shaders should use an unsigned integer sampler (presumably usampler2D)
to access stencil data. Otherwise, results are undefined. Using shadow
samplers with GL_STENCIL_INDEX selected also is undefined behavior.
This patch creates a new gl_texture_object field, StencilSampling, to
indicate that stencil should be sampled rather than depth. (I chose to
use a boolean since I figured it would be more convenient for drivers.)
It also introduces the [Get]TexParameter code to get and set the value,
and of course the extension plumbing.
v2: Also consider textures incomplete when sampling stencil with
non-NEAREST min/mag filters (caught by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The loader infrastructure for everything but DRI2 requires that udev be
present, so we can figure out an appropriate driver from the fd. We don't
have a portable solution yet, but presumably it will have similar lookup
based on the device node.
It will also be even more required for krh's udev-based hwdb support,
which lets us have a loader that actually loads DRI drivers not included
in the loader's source distribution.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75212
Reviewed-by: Matt Turner <mattst88@gmail.com>
Because in draw we always inject position at slot 0 whenever
fragment shader would take the maximum number of inputs (32) it
meant that we had PIPE_MAX_ATTRIBS + 1 slots to translate, which
meant that we were crashing with fragment shaders that took
the maximum number of attributes as inputs. The actual max number
of attributes we need to translate thus is PIPE_MAX_ATTRIBS + 1.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
draw_current_shader_* functions return a final output when considering
both the geometry shader and the vertex shader. But when code generating
vertex shader we can not be using output slots from the geometry shader
because, obviously, those can be completely different. This fixes a
number of very non-obvious crashes.
A side-effect of this bug was that sometimes the vertex shading code
could save some random outputs as position/clip when the geometry
shader was writing them and vertex shader had different outputs at
those slots (sometimes writing garbage and sometimes something correct).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
From OpenGL 3.3 spec, page 141:
"Textures with a base internal format of DEPTH_COMPONENT or DEPTH_STENCIL
require either depth component data or depth/stencil component data.
Textures with other base internal formats require RGBA component data.
The error INVALID_OPERATION is generated if one of the base internal
format and format is DEPTH_COMPONENT or DEPTH_STENCIL, and the other
is neither of these values."
Fixes Khronos OpenGL CTS test failure: proxy_textures_invalid_size
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From OpenGL 4.0 spec, page 398:
"The initial internal format of a texel array is RGBA
instead of 1. TEXTURE_COMPONENTS is deprecated; always
use TEXTURE_INTERNAL_FORMAT."
Fixes Khronos OpenGL CTS test failure: proxy_textures_invalid_size
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Starting with llvm-3.5svn r202574, LLVM expects C+11 mode.
commit f8bc17fadc8f170c1126328d203f0dab78960137
Author: Chandler Carruth <chandlerc@gmail.com>
Date: Sat Mar 1 06:31:00 2014 +0000
[C++11] Turn off compiler-based detection of R-value references, relying
on the fact that we now build in C++11 mode with modern compilers. This
should flush out any issues. If the build bots are happy with this, I'll
GC all the code for coping without R-value references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202574 91177308-0d34-0410-b5e6-96231b3b80d8
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
`--enable-llvm-shared-libs` option was recently renamed as
`--with-llvm-shared-libs`, but several error messages still mention the
old option, causing confusing.
Trivial.
GetCurrentThread() returns a pseudo-handle (a constant which only makes
sense when used within the calling thread) and not a real handle.
DuplicateHandle() will return a real handle, but it will create a new
handle every time we call. Calling DuplicateHandle() here means we will
leak handles, which can cause serious problems.
In short, the Windows implementation of thrd_t needs a thorough make
over, and it won't be pretty. It looks like C11 committee
over-simplified things: it would be much better to have seperate objects
for threads and thread IDs like C++11 does.
For now, just comment out the thrd_current() implementation, so we get
build errors if anybody tries to use it.
Thanks to Brian Paul for spotting and diagnosing this problem.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
u_thread_self() expects thrd_current() to return a unique numeric ID
for the current thread, but this is not feasible on Windows.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
r600_translate_colorformat is rewritten to look like radeonsi.
r600_translate_colorswap is shared with radeonsi.
r600_colorformat_endian_swap is consolidated.
This adds some formats which were missing. Future "plain" formats will
automatically be supported.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
With commit 0432aa064b(configure: use shared-glapi when more than one
gl* API is used) we removed "disable shared-glapi when building without
dri" hunk.
In the good old days of classic mesa, dri and xlib-glx were mutually
exclusive thus the hunk made sense.
Currently enable-dri is used as a synonym for a range of things thus
it's more appropriate to handle xlib-glx explicitly.
Fixes a missing symbol '_glapi_Dispatch' in a xlib powered libGL,
build using the following
./autogen.sh --enable-xlib-glx --disable-dri --with-gallium-drivers=swrast
Cc: Brian Paul <brianp@vmware.com>
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The _glthread_LOCK/UNLOCK_MUTEX() macros are just wrappers around
the c11 mutex functions. Let's start getting rid of those wrappers.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Update the comments for the packed formats to accurately reflect the
layout of the bits in the pixel. For example, for the packed format
MESA_FORMAT_R8G8B8A8, R is in the least significant position while A
is in the most-significant position of the 32-bit word.
v2: also fix MESA_FORMAT_A1B5G5R5_UNORM, per Roland.
The spec incorrectly used void as return type, when it should have
been GLboolean. This has now been fixed. According to Nvidia, their
implementation always used GLboolean.
Reviewed-by: Christian König <christian.koenig@amd.com>
This is just a simple implementation that stores the extra values into the DRIimage
struct and just uses the fd importer. I haven't looked into what is required
to import YUV or deal with the extra parameters.
Signed-off-by: Dave Airlie <airlied@redhat.com>
i965 recently moved debug printfs to use stderr, including ones which
trigger on MESA_GLSL=dump. This resulted in scrambled output.
For drivers using ir_to_mesa, print_program was already using stderr,
yet all the code around it was using stdout.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
opt_algebraic was translating lrp(x, 0, a) into add(x, -mul(x, a)).
Unfortunately, this references "x" twice, which is invalid in the IR,
leading to assertion failures in the validator.
Normally, cloning IR solves this. However, "x" could actually be an
arbitrary expression tree, so copying it could result in huge piles
of wasted computation. This is why we avoid reusing subexpressions.
Instead, transform it into mul(x, add(1.0, -a)), which is equivalent
but doesn't need two references to "x".
Fixes a regression since d5fa8a9562, which isn't in any stable
branches. Fixes 18 shaders in shader-db (bastion and yofrankie).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The logic to count number of block outputs was out of sync with the
actual array construction. But to simplify / make things less fragile,
we can just allocate the arrays for worst case size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A value may be assigned on only one side of an if/else. In this case we
can simply substitute a mov.f32f32.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add option to generate fragment shader to emulate two sided color.
Additional inputs are added to shader for BCOLOR's (on corresponding to
each COLOR input). CMP instructions are used to select whether to use
COLOR or BCOLOR.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If vertex writes pointsize, there are a few extra bits we need to turn
on in the cmdstream here and there.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Now that we have the infrastructure for shader variants, add support to
generate an optimized shader for hw binning pass (with varyings/outputs
other than position/pointsize removed). This exposes the possibility
that the shader uses fewer constants than what is bound, so we have to
take care to not emit consts beyond what the shader uses, lest we
provoke the wrath of the HLSQ lockup!
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes anything that tries to use gl_FrontFacing/gl_FragCoord. Also,
face support is needed to emulate two sided color.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
An unused input might not have a register assigned. We don't want bogus
regid to result in impossibly high max_reg..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
BRW_MAX_TEX_UNIT is the static limit on the number of textures we
support per-stage, not in total.
Core's `Unit` array is sized by MAX_COMBINED_TEXTURE_IMAGE_UNITS, which
is significantly larger, and across the various shader stages, up to
ctx->Const.MaxCombinedTextureImageUnits elements of it may be actually
used.
Fixes invisible bad behavior in piglit's max-samplers test (although
this escalated to an assertion failure on HSW with texture_view, since
non-immutable textures only have _Format set by validation.)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes glGetTexImage() when converting from MESA_FORMAT_Z32_FLOAT_S8X24_UINT
to GL_UNSIGNED_INT_24_8. Hit by the piglit
ext_packed_depth_stencil-getteximage test.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Though it won't matter on Linux, use _mesa_align_free to release it.
Since i965 doesn't have sys_buffer, I overlooked this in the
GL_ARB_map_buffer_alignment work a few months ago. Fixes i915 (and
presumably i830) regressions in ARB_map_buffer_range tests and the
failure in arb_map_buffer_alignment-sanity_test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74960
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There's no reason to have more vertex texture units than fragment
texture units on this hardware. Since increasing the default maximum
number of texture units from 16 to 32, this has triggered some segfault
in i915 driver. There's probably some array or bitfield that isn't
properly sized now. This really papers over the bug, but I don't think
I'll lose any sleep over that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74071
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: vec4_visitor::pack_uniform_registers(): Use correct comparison in the
assert, this->uniforms is already adjusted. Compare the actual value used to
index uniform_size and uniform_vector_size instead.
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Don't add function parameters, pass the required size in
prog_data->nr_params.
v3:
- Use the name uniform_array_size instead of uniform_param_count.
- Round up when dividing param_count by 4.
- Use MAX2() instead of taking the maximum by hand.
- Don't crash if prog_data passed to vec4_visitor constructor is NULL
v4: Rebase for current master
v5 (idr): Trivial whitespace change.
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71254
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLX can be either dri or xlib based, while enable_dri is
used in a variety of contexts.
With enable_dri_glx the context is clearly visible.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
All hardware drivers including the virtual vmwgfx require
the drm pipe-loader in order to be properly loaded by xa,
gbm and opencl.
Note this does _not_ add support for the above three it only
allows the pipe driver to be loaded by the library.
Eg. GBM will now properly open the pipe-i915 driver, should
one be working on the such hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75453
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Building to provide accelration using swrast does not make
sense.
Note: update your build script to explicitly mention svga
in the gallium drivers list, if you are building the vmwgfx
xa library.
v2: Update error message to provide more clarify, add an example.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Recent patch converted our logic to use test -n and test -z.
An emptry string variable (empty_str="") return true for both
thus making the check unreliable.
Fix this by correctly setting the variable when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The issue is caused by a thinko that an empty string will be
considered of zero length by 'test'. This is not the case,
thus we were building the 'core' of megadrivers even when no
classic drivers were built.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
glGetTexImage(GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8) was just
using memcpy() instead of _mesa_unpack_uint_24_8_depth_stencil_row()
to convert texels from the hardware format to the GL format.
Fixes issue reported by David Meng at Intel. The new piglit
ext_packed_depth_stencil-getteximage test checks for this bug.
Also, add some format/type assertions. We don't yet handle the
GL_FLOAT_32_UNSIGNED_INT_24_8_REV type. That should be fixed in
a follow-on patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
API is always API_OPENGL_COMPAT (since commit 4e4a537ad5,
"meta: Push into desktop GL mode when doing meta operations."),
so most of these checks do nothing.
We could instead check save->API to only bother setting/restoring
relevant GL state, but I'm not sure saving a few _mesa_set_enable
calls is worth the complexity. My understanding is the point of
the ctx->API guards was to avoid raising GL errors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_meta_begin(), we switch to API_OPENGL_COMPAT, then munge a lot
of state (including some that doesn't exist in the actual API - like
PolygonStipple in API_OPENGL_CORE).
It seems reasonable that in _mesa_meta_end(), we should restore it,
then switch back to the original API. This at least makes it symmetric.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Some formats can't be handled - in particular cannot handle ints/uints formats,
which lack the pack_rgba_float/unpack_rgba_float functions. Instead of trying
to call these (and crash) return an error (I'm not sure yet if we should try
to translate such formats too here might not make much sense).
v2: suggested by Jose, use separate checks for pack/unpack of rgba_8unorm and
rgba_float functions (right now if one exists the other should as well).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There are currently only two VUE map layouts: one for Gen4-5, and one
for everything else. We keep having to add new "case N+1" labels for
every new hardware generation, and so far it's always been the same.
This patch makes it so we only have to do work in the case where
something actually changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the BSpec's 3D workarounds page, this is unnecessary on
shipping Haswell hardware, and was never necessary on Broadwell. It
unfortunately doesn't say anything about Baytrail.
The workaround database confirms those results for Ivybridge, Haswell,
and Broadwell. Baytrail is less clear - one page says it's necessary,
while the other says it isn't. For now, be conservative and leave it
enabled.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it easy to compare output between different cards, especially
for ones that you don't have (and/or not in the current machine).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This should pave the way to being able to use the compiler without a
context. Also leads to cleaner code.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
By adding "#define gvec4 %svec4" to the top of our fragment shader, we
can write generic code without needing to specialize it to vec4, ivec4,
or uvec4 via asprintf.
This also makes the INT and UNSIGNED_INT merge function code identical,
so I combined those two cases.
It's not a big savings, but a little bit tidier.
v2: Rebase on Vinson's MSVC build fixes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes fbo-clear-formats GL_ARB_depth_texture on Ironlake, which
regressed since commit f128bcc7c2
("i965: Drop mt->levels[].width/height.") intel_miptree_copy_slice was
calling minify(.., 7) on a 2x2 texture with mt->first_level == 7.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75292
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tt's kind of a trap---calling do_common_optimization() after
lower_instructions() may cause opt_algebraic() to reintroduce
ir_triop_lrp expressions that were lowered, effectively defeating the
point. Because of this, nobody uses it.
v2: Delete more code (caught by Ian Romanick).
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
When the vec4 backend encountered an ir_triop_lrp, it always emitted an
actual LRP instruction, which only exists on Gen6+. Gen4-5 used
lower_instructions() to decompose ir_triop_lrp at the IR level.
Since commit 8d37e9915a ("glsl: Optimize open-coded lrp into lrp."),
we've had an bug where lower_instructions translates ir_triop_lrp into
arithmetic, but opt_algebraic reassembles it back into a lrp.
To avoid this ordering concern, just handle ir_triop_lrp in the backend.
The FS backend already does this, so we may as well do likewise.
v2: Add a comment reminding us that we could emit better assembly if we
implemented the infrastructure necessary to support using MAC.
(Assembly code provided by Eric Anholt).
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75253
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Similar to u_blitter, u_upload_mgr is now a client of the pipe context. Its
creation needs to be delayed until the context has been (almost) initialized.
The sort priorites for GLX_SAMPLES and GLX_SAMPLE_BUFFERS are
not defined in GL_ARB_multisample, but they are defined in
the GLX 1.4 specification.
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The default values for GLX_DRAWABLE_TYPE and GLX_RENDER_TYPE are
GLX_WINDOW_BIT and GLX_RGBA_BIT respectively, as specified in
the GLX 1.4 specification.
This fixes the glx-choosefbconfig-defaults piglit test.
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This variable is no longer needed after the cleanup to the
code prior to the first arrays of array series
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On my gentoo system, llvm libs are in /usr/lib64/llvm, and llvm-config
--ldflags does not provide the rpath (it does, of course, provide a -L).
This adds the llvm dir to the rpath. It should be harmless if the path
is a system path, and should make things work when it's not.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
This will be used for changing texture properties without modifying
pipe_resource like r600g, but not in this series. For now, this change
allows consolidation of pipe_surface functions.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
db_z_info was unused. This just renames the variable to match the register
name.
Now, db_depth_info is unused on Evergreen.
Both variables will be needed on SI though.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
OpenGL allows a buffer to be mapped only once, but we also map buffers
internally, e.g. in the software primitive restart fallback, for PBOs,
vbo_get_minmax_index, etc. This has always been a problem, but it will
be a bigger problem with persistent buffer mappings, which will prevent
all Mesa functions from mapping buffers for internal purposes.
This adds a driver interface to core Mesa which supports multiple buffer
mappings and allows 2 mappings: one for the GL user and one for Mesa.
Note that Gallium supports an unlimited number of buffer and texture
mappings, so it's not really an issue for Gallium.
v2: fix unmapping in xm_dd.c, remove the GL errors there
v3: fix the intel driver (by Fredrik)
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v2: dropped the error that DYNAMIC_STORAGE is required for MAP_WRITE_BIT,
the error is removed in the latest revision of GL 4.4
This adds support for GL_ARB_texture_gather, and one step of
support for GL_ARB_gpu_shader5.
This adds support for passing the TG4 instruction, along
with non-constant texture offsets, and tracking them for the
optimisation passes.
This doesn't support native textureGatherOffsets hw, to do that
you'd need to add a CAP and if set disable the lowering pass,
and bump the MAX offsets to 4, then do the i0,j0 sampling using
those.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to gallium for a TG4 instruction,
and two CAPs. The first CAP is required for GL_ARB_texture_gather.
The second CAP is required to expose GL_ARB_gpu_shader5.
However so far we haven't found any hardware that natively
exposes the textureGatherOffsets feature from GL, so just
lower it for now. If hardware appears for this we can add
another CAP to allow TG4 to take 4 offsets.
v2: add component selection src and a cap to say
hw can do it. (st can use to help control
GL_ARB_gpu_shader5/GLSL 4.00). Add docs.
v3: rename to SM5, add docs.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This lowering pass will be useful for gallium drivers as well, in order to support
the GL TG4 oddity that is textureGatherOffsets.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The offsets will be stored in the handles parameter. This makes
it possible to use sub-buffers.
v2:
- Style fixes
- Add support for constant sub-buffers
- Store handles in device byte order
v3:
- Use endian helpers
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Current automake build does not try to resolve undefined
symbols thus we could end up with a broken library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
With the introduction of the pipe_loader_sw_probe_dri helper we
require the sw/dri winsys during linking stage despite it being
unused by any of the targets. This will cause a minor increase
in the resulting library which will be cleaned up via linker
options with upcoming patches.
v2: Link with libswdri.la only when available.
Reported-and-tested-by: Tom Stellard <thomas.stellard@amd.com> (v1)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
While looking at bug 75356, I've noticed that the presence of
x11 egl platform pulls in sw/xlib as "needed" but fails to
report so at the end of configure.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The above function implies using the the xlib winsys, which
has additional library dependencies that should not be forced.
Make the software xlib pipe loader optional thus avoid all
the dependency hell. A user that wishes to use the particular
pipe-loader would need to set the following within configure.ac.
enable_gallium_xlib_loader=yes
v2:
- Wrap sw/xlib/xlib_sw_winsys.h to handle compilation on systems
lacking X11 headers. Spotted by Christian Prochaska.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75356
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The blitter is completely ignorant of MSAA buffer layouts, so any
attempt to use BLT paths with MSAA buffers is likely to break
spectacularly.
In most cases, BLORP handles MSAA blits, so we never hit this bug.
Until recently, it also wasn't worth fixing, since Meta couldn't handle
MSAA either, so there was nothing to fall back to. But now there is.
+143 piglit tests on Broadwell (which doesn't have BLORP support).
Surprisingly, three also start failing. Since non-IMS MSAA buffers
store samples in successive array slices, using the blitter ought to
access sample 0 and ignore the rest, which is apparently good enough for
a few not-very-picky Piglit tests. Presumably the meta replacement code
is still broken.
No Piglit changes on Ivybridge.
v2: Move the early return to the top of the function (suggested by
Paul).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Using generic shaders caused a measurable fps drop, which was isolated to
use of full precision (vs half precision) output. This is an attempt to
regain that lost performance by using half precision solid/blit shaders
(when the output format is not float32).
Note: for the built-in shaders, I would not expect them to be register
starved. And in fact it is the solid frag shader that seems to have the
biggest impact. So I suspect you get double the pixel pipe units (or
half the cycles) when the output is half precision. So there may be
some gain to using half precision output for application shaders as
well, even though the rest of register usage is still full precision.
But for half precision to work for more complex shaders, we need to deal
with some constraints, like cat2 needing same precision for it's two src
registers. So for now it is not enabled by default except for the
built-in shaders.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Start putting in place infrastructure to deal with multiple shader
variants. Initially we'll use this for two sided color (frag) and
binning pass (vert) shaders. Possibly need for others later (such
as YUV vs RGB eglImage?).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Easier than making more extensive use of rpt, and the more compact
shaders seem to bring some bit of performance boost. (Perhaps repeat
flag benefits are more than just instruction cache, possibly it saves
on instruction decode as well?)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Instead in the common code, construct these shaders from TGSI. For now
we let a2xx keep it's hand coded shaders, as it's compiler isn't quite
up to the job yet. All the same it is a net drop in code size and gets
rid of special cases.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Make things configurable, and tweak the API a bit to avoid an extra
tgsi_shader_scan(). Getting closer to something generic which can be
moved out of freedreno and shaderd by other drivers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
... over the one provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the headers.
Currently both versions are identical, but that is not
guaranteed to be the case in the future.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the spec.
Currently both versions are identical, but that is not
guaranteed to be the case in the future.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the version number provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note the member function releaseTexBuffer was added without
bumping spec version, and currently no drivers implement it.
v2: releaseTexBuffer was introduced by version 3
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
While we want to be able to print to stdout for glsl_compiler, for
debugging drivers we want to be able to dump to stderr because that's
where other driver debug (like LIBGL_DEBUG) tends to go, and because some
apps actually close stdout to shut up their own messages (such as the X
Server, or NWN).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was only going to get worse when tesselation shows up, and was
causing too much extra duplication in my stderr changes coming up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Unfortunately there's only one RT_ARRAY_MODE setting for all
attachments, so clears were previously truncated to the minimum number
of layers any attachment had. Instead set the RT_ARRAY_MODE to 512 (the
max number of layers) before doing the clear. This fixes
gl-3.2-layered-rendering-clear-color-mismatched-layer-count.
Also fix clears of individual layered rt/zeta, in case it ever happens.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Cc: 10.1 <mesa-stable@lists.freedesktop.org>
Use tex->bo_format instead of zs->format in ilo_blitter_rectlist_clear_zs()
because the latter may be combined depth/stencil format. hiz_can_clear_zs()
is no-op for GEN7+, but move the GEN check so that the assertions are tested.
Finally, call the fast depth clear function from ilo_clear().
Using this emit function implicitly creates three copies, which
is pointlessly inefficient.
1. Code creates the original instruction.
2. Calling emit(fs_inst) copies it into the function.
3. It then allocates a new fs_inst and copies it into that.
The second could be eliminated by changing the signature to
fs_inst(const fs_inst &)
but that wouldn't eliminate the third. Making callers heap allocate the
instruction and call emit(fs_inst *) allows us to just use the original
one, with no extra copies, and isn't much more of a burden.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
These functions (modulo emit_lrp, necessitating the small fix-up) pass
these arguments by value unmodified to other functions. No point in
making an additional copy.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The C99 spec says the type of an enum is implementation defined (but can
be char, signed int, or unsigned int). gcc appears to always give enums
four bytes, even when they can fit in less. It does so because this is
what other compilers seem to do [0] and therefore to maintain ABI
compatibility with them.
gcc has an -fshort-enum flag that tells the compiler to use only as much
space as needed for an enum. Adding __attribute__((__packed__)) to an
enum definition has the same behavior, but on a per-enum basis.
brw_reg_type and register_file are not part of the ABI, so we can safely
mark them as PACKED so that they'll take only a byte, rather than four.
[0] http://gcc.gnu.org/onlinedocs/gcc/Non-bugs.html#index-fshort-enums-3868
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes this SCons build error introduced with commit
4f37e52f37.
Compiling src/gallium/targets/libgl-xlib/xlib.c ...
src/gallium/targets/libgl-xlib/xlib.c:35:42: fatal error: state_tracker/xlib_sw_winsys.h: No such file or directory
#include "state_tracker/xlib_sw_winsys.h"
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75347
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
v2: Handle null_sw_create failure, add missing function return type
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v1)
Will be used in the following commits.
v2: Link gallium tests against the library.
v3: Handle dri_create_sw_winsys failure
v4: Rebase on top of the targets/xa changes
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v2)
Will be used in the upcoming patches.
v2: handle xlib_create_sw_winsys failure, drop unneeded header
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v1)
The sw pipe-loader implicitly handles winsys_create, thus we
it would make sense to implicitly destroy it upon releasing
the loader.
Currently we leak the sw_winsys when releasing the pipe-loader.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We don't need an array mapping the shader index to "sampler2DMS",
"isampler2DMS", and so on. We can simply do "%ssampler2DMS" and pass in
vec4_prefix, which is "", "i", or "u".
This eliminates the use of C99 array initializers and should fix the
MSVC build.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75344
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
gl_texture_object's Target field is never a cube face enumeration, so
target_to_target is just the identity function. Aptly named, at least.
I verified this by putting an assert(!"ZOMG, CUBES!") in the cube face
case, and running Piglit. Nothing ever hit it. Beyond that, I
inspected the code in mesa/main.
This could probably also be deleted from i915, but I haven't tested
there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fixes this SCons build error.
build/linux-x86_64-debug/mesa/libmesa.a(context.os): In function `init_attrib_groups':
src/mesa/main/context.c:815: undefined reference to `_mesa_init_pipeline'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Fix build error introduced with commit f4c13a890f.
CC pixeltransfer.lo
main/pipelineobj.c: In function '_mesa_delete_pipeline_object':
main/pipelineobj.c:59:4: error: unknown type name 'unsinged'
unsinged i;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
This was originally included in another patch, but it was split out by
Ian Romanick.
v2 (idr):
* Trivial reformatting.
* Remove GL_COMPUTE_SHADER. Compute shaders don't participate in pipeline
objects anyway. Suggested by Matt Turner.
v3 (idr):
* Use _mesa_has_geometry_shaders.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This was originally included in another patch, but it was split out by
Ian Romanick.
v2 (idr): Return early from _mesa_ActiveShaderProgram if
_mesa_lookup_shader_program_err returns an error. Suggested by Jordan.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v2]
This will allow the guts of the implementation to be shared with
_mesa_CreateShaderProgramv.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Implement IsProgramPipeline based on the VAO code.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Implement GenProgramPipelines based on the VAO code.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Implement DeleteProgramPipelines based on the VAO code.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
V1:
* Extend gl_shader_state as pipeline object state
* Add a new container gl_pipeline_shader_state that contains
binding point of the previous object
* Update mesa init/free shader state due to the extension of
the attibute
* Add an init/free pipeline function for the context
V2:
* Rename gl_shader_state to gl_pipeline_object
* Rename Pipeline.PipelineObj to Pipeline.Current
* Formatting improvement
V3 (idr):
* Split out from previous uber patch.
* Remove '#if 0' debug printfs.
V4 (idr):
* Fix some errors in comments. Suggested by Jordan.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Future patches will use this function outside shaderapi.c.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Nothings implemented yet but glProgramUniform* which are mostly a
copy/paste of the older function glUniform*
I create dedicated pipelineobj.[ch] file that will contains function
related to the "new" pipeline container object.
V2: formatting improvement
V3:
* indentation fix
* Update copyright
* Add a comment on ProgramParameteri already present in another extension
* Remove TODO, will be readded on correct patch
V4 (idr):
* Fix dispatch_sanity unit test
* Make extension string available in core profiles (instead of just
compatibility).
* Trivial reformating
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
GL_ARB_separate_shader_objects adds the ability to specify location
layouts for interstage inputs and outputs.
In addition, this extension makes 'in' and 'out' generally available for
shader inputs and outputs. This mimics the behavior of
GL_ARB_explicit_attrib_location.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Current behaviour states that shared-glapi is usefull when building
with dri, which is not the case. Shared-glapi is used to dispatch
the gl* functions across the one or more gl api's which can be dri
based but do not need to be.
Fixed the following build
./configure --enable-gles2 --disable-dri --enable-gallium-egl \
--with-egl-platforms=fbdev --with-gallium-drivers=swrast
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75098
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit ee55500c22a(configure: cleanup classic dri drivers handling)
cleaned up the logic handling autodetection of dri drivers, but missed
the case when one can explicitly disable dri, and still request opengl.
Fixes build issues for the following
./autogen.sh --disable-dri --with-gallium-drivers=swrast
While we're here, explicitly clear with_dri_drivers whenever building
without such drivers to prevent choking later on.
v2: Simplify with_dri_drivers handling.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75126
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Compared to i965, the code generated doesn't use the AVG instruction. But
I'm not sure that multisampled integer resolves are really that important
to worry about.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are non-stretched, non-resolving blits, so it's just a matter of
sampling once from our gl_SampleID and storing that to our color/depth.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're disabling GL_MULTISAMPLE, so we didn't need to worry about a lot of
that state. But to do MSAA to MSAA blits, we need to start handling more
state.
v2: Fix pasteo caught by Kenneth.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Blending of values would occur when doing GL_LINEAR filtering with
scaling, and in an upcoming commit when doing MSAA resolves.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that this doesn't handle GL_EXT_multisample_scaled_blit yet. The
i965 code for that extension bakes in knowledge of the sample positions
(well, knowledge of the sample positions aligned to a lower-resolution
grid), which we would have to do at runtime somehow for meta.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We haven't been executing this code before the meta-blit case, because
we've been flagging the miptree as validated at texstorage time, and never
having to revalidate.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Identified by Valgrind memory check. Initialized block-opaque in a
different patch. This test seems unnecessary. If opaque must be true,
just set to true.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Define some additional convenience operators, clean up the
implementation slightly, and rename it to 'intrusive_ptr' for reasons
that will be obvious in the next commit.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
I'd neglected to port these to Broadwell. Most of this code is copy
and pasted from Gen7, but instead of using F32TO16/F16TO32, we just
use MOV with HF register types.
Fixes fs-packHalf2x16 and fs-unpackHalf2x16 tests (both the ARB
extension and ES 3.0 variants).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell removed the F32TO16 and F16TO32 instructions. However, it has
actual support for HF values, so they're actually just MOV.
Fixes vs-packHalf2x16 and vs-unpackHalf2x16 tests (both the ARB
extension and ES 3.0 variants).
v2: Emulate F32TO16's align16 zeroing bug, since Chad's front end code
relies on it happening. We can probably refactor this code to be
better later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_init_state() calls brw_upload_initial_gpu_state(). If hardware
contexts are enabled (brw->hw_ctx != NULL), this will upload some
initial invariant state for the GPU. Without hardware contexts, we
rely on this state being uploaded via atoms that subscribe to the
BRW_NEW_CONTEXT bit.
Commit 46d3c2bf4d accidentally moved
the call to brw_init_state() before creating a hardware context.
This meant brw_upload_initial_gpu_state would always early return.
Except on Gen6+, we stopped uploading the initial GPU state via
state atoms, so it never happened.
Fixes a regression since 46d3c2bf4d.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To make sure that both the Gen4 and Gen7 style messages work, I
initially disabled the SHADER_OPCODE_GEN7_SCRATCH_READ optimization,
ran Piglit, re-enabled it, and ran Piglit again. Both worked fine.
Fixes 40 Piglit tests (most of the varying-packing category).
v2: Move num_regs assertion from gen8_fs_generator to
gen8_set_dp_scratch_message() (suggested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The new accessors will make it easy to do Gen7-style scratch messages.
v2: Move num_regs assertion from gen8_fs_generator into
gen8_set_dp_scratch_message() (suggested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the past, 3DSTATE_PS took an absolute number of threads. Conversely,
on Broadwell you always program 64, and it implicitly scales based on
the GT-level with no special programming. So, I stored 64 in
brw_device_info::max_wm_threads.
However, I didn't realize that we also use max_wm_threads to compute the
size of the scratch space buffer. In that case, we really need the
absolute number of threads.
This patch hardcodes 3DSTATE_PS to use the value it expects, and changes
max_wm_threads back to a (completely fake) absolute thread count (once
again copied from Haswell).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Broadwell, g0.5 contains the "Scratch Space Pointer"; using OR
puts some bits of that into "ignored" sections of our message header.
While this doesn't hurt, it's also not terribly /useful/. Using MOV
is sufficient to set the only interesting bits in this part of the
message header.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the latest documentation, any PIPE_CONTROL with the
"Command Streamer Stall" bit set must also have another bit set,
with five different options:
- Render Target Cache Flush
- Depth Cache Flush
- Stall at Pixel Scoreboard
- Post-Sync Operation
- Depth Stall
I chose "Stall at Pixel Scoreboard" since we've used it effectively
in the past, but the choice is fairly arbitrary.
Implementing this in the PIPE_CONTROL emit helpers ensures that the
workaround will always take effect when it ought to.
Apparently, this workaround may be necessary on older hardware as well;
for now I've only added it to Broadwell as it's absolutely necessary
there. Subsequent patches could add it to older platforms, provided
someone tests it there.
v2: Only flag "Stall at Pixel Scoreboard" when none of the other bits
are set (suggested by Ian Romanick).
v3: Prefix the function with "gen8" (requested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Eric Anholt <eric@anholt.net>
Grab the parsed invocation count, check for consistency
during linking, and finally save the result in
gl_shader_program Geom.Invocations.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
_mesa_glsl_parse_state in_qualifier->invocations will store the
invocations count.
v3:
* Use in_qualifier to allow the primitive to be specied
separately from the invocations count (merge_qualifiers)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We introduce a new merge_in_qualifier ast_type_qualifier
which allows specialized handling of merging input layout
qualifiers.
By merging layout qualifiers into state->in_qualifier, we
allow multiple input qualifiers. For example, the primitive
type can be specified specified separately from the
invocations count (ARB_gpu_shader5).
state->gs_input_prim_type is moved into state->in_qualifier->prim_type
state->gs_input_prim_type_specified is still processed separately
so we can determine when the input primitive is specified. This
is important since certain scenerios are not supported until after
the primitive type has been specified in the shader code.
v4:
* Merge with compute shader input layout qualifiers
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Improves performance of a dolphin emulator trace I had laying around by
3.60131% +/- 0.995887% (n=128).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We generate steaming piles of these for the centroid workaround, and this
quickly cleans them up.
total instructions in shared programs: 1591228 -> 1590047 (-0.07%)
instructions in affected programs: 26111 -> 24930 (-4.52%)
GAINED: 0
LOST: 0
(Improved apps are l4d2, csgo, and dolphin)
Reviewed-by: Matt Turner <mattst88@gmail.com>
The previous code relied on cpu denorm support for converting small float
formats (such r11g11b10_float and r16_float) to floats, otherwise denorms
are flushed to zero. We worked around that in llvmpipe blend code by
reenabling denorms, but this did nothing for texture sampling. Now it would
be possible to reenable it there too but I'm not really a fan of messing
with fpu flags (and it seems we can't actually do it reliably with llvm in
any case looking at some bug reports). (Not to mention if you actually have
a lot of denorms in there, you can expect some order-of-magnitude slowdown
with x86 cpus.)
So instead use code which adjusts exponents etc. directly hence not relying
on cpu denorm support for the rescaling mul.
(We still need the fpu flag handling as we can't do float-to-smallfloat
without using cpu denorms at least for now - I actually wanted to keep
both the old and new code and using one or the other depending on from where
it's called but that didn't work out as the parameter would have to be passed
through too many layers than I'd like.)
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
We need to advertise 8x, 4x, and 2x multisamples. Previously, we only
claimed to support 0/1 samples.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I can't find any documentation to explain what ought to be done here, so
I simply guessed based on the pattern I observed in the 4x/8x cases.
It appears to work, but it could be totally wrong.
I was able to find the Sandybridge PRM quote from the comments in the
latest documentation: Shared Functions > 3D Sampler > Multisampled
Surface Behavior. However, it only mentions 4x MSAA - not even 8x.
After a substantial amount more digging, I was able to find a second
page (incorrectly tagged) which confirmed the formulas in our code for
8x MSAA. However, that page didn't mention 2x MSAA at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
According to the "Point Multisample Rasterization" of the OpenGL
specification (3.0 or later), smooth points are supposed to be enabled
implicitly when multisampling, regardless of the GL_POINT_SMOOTH flag.
However, if GL_POINT_SPRITE is enabled, you get square points no matter
what. Core contexts always enable point sprites, so this effectively
makes smooth points go away, even in the case of multisampling.
Fixes Piglit's EXT_framebuffer_multisample/point-smooth tests.
(Yes, that's right folks, we actually have Piglit tests for this.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The meaning and effects of this bit are surprisingly complicated.
See Rasterization > Windower > Multisampling > Multisample ModesState.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This restriction carries forward from earlier platforms. The code is
ported straight from gen7_wm_state.c.
v2: Actually do it right.
v3: Add missing _NEW_MULTISAMPLE bit (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
v2: Also set the "oMask Present to Render Target" bit, which is required
for shaders that write oMask. Otherwise the hardware won't expect
the extra data.
v3: Add missing _NEW_MULTISAMPLE (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I made a few changes which I think simplify the code a bit compared to
the Gen7 implementation, but which are largely pointless.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We already set the number of samples, but were missing the MSAA layout
mode. Reusing gen7_surface_msaa_bits makes it easy to set both.
This also lets us drop the Gen8 surface_num_multisamples function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The enumerations are just log2(num_samples) shifted by 3, which we can
easily compute via ffs().
This also makes it reusable for Broadwell, which has 2x MSAA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
These enumerations are simply log2 of the number of multisamples shifted
by a bit, so we can calculate them using ffs() in a lot less code.
Suggested by Eric Anholt.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The const in
const unsigned foo(void);
is meaningless. Removing it silences this warning:
src/glsl/ast_to_hir.cpp:1802:56: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
From page 14 (page 20 of the PDF) of the GLSL 1.10 spec:
"In addition, all identifiers containing two consecutive underscores
(__) are reserved as possible future keywords."
The intention is that names containing __ are reserved for internal use
by the implementation, and names prefixed with GL_ are reserved for use
by Khronos. Names simply containing __ are dangerous to use, but should
be allowed.
Per the Khronos bug mentioned below, a future version of the GLSL
specification will clarify this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Cc: Tapani Pälli <lemody@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Bugzilla: Khronos #11702
Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and the
GLSL ES spec (all versions) say:
"All macro names containing two consecutive underscores ( __ ) are
reserved for future use as predefined macro names. All macro names
prefixed with "GL_" ("GL" followed by a single underscore) are also
reserved."
The intention is that names containing __ are reserved for internal use
by the implementation, and names prefixed with GL_ are reserved for use
by Khronos. Since every extension adds a name prefixed with GL_ (i.e.,
the name of the extension), that should be an error. Names simply
containing __ are dangerous to use, but should be allowed. In similar
cases, the C++ preprocessor specification says, "no diagnostic is
required."
Per the Khronos bug mentioned below, a future version of the GLSL
specification will clarify this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Cc: Tapani Pälli <lemody@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Bugzilla: Khronos #11702
Linking with LLVM static libraries is easily broken by changes to
the llvm-config program or when LLVM adds, removes, or changes library
components. Keeping up with these changes requires a lot of maintanence
effort to keep the build working on the master and stable branches.
Also, because of issues in the past LLVM static libraries, the release
manager is currently configuring with --with-llvm-shared-libs when
checking the build before release. Enabling shared libraries by
default would allow the release manager to run ./configure with
no arguments, and be reasonably confident that the build would succeed.
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Useful because the total number of uniform components might exceed
MAX_UNIFORMS * 4 in some cases because of the image metadata we'll be
passing as push constants.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Like the VEC4 back-end does. It will make dynamic allocation of the
param_size array easier in a future commit.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Since we are now consuming two ringbuffers at a time, we probably want a
pool larger than 4.. but we don't need each individual ringbuffer to be
so large, so offset the pool size increase by reducing rb size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It seems the write-after-read hazard that applies to texture fetch
instructions, also applies to sfu instructions.
Also, cat5/cat6 instructions do not have a (ss) bit, so in these
cases we need to insert a dummy nop instruction with (ss) bit set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
There doesn't seem to be any reason for it to be a method, and it's
surprising that the expression 'reg.retype(t)' doesn't retype its
object but rather it creates a temporary with the new type. Use
'retype(reg, t)' instead.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Add assertion that the register is not in the HW_REG or IMM file,
calculate the conjunction of the old and new mask instead of replacing
the old [consistent with the behavior of brw_writemask(), causes no
functional changes right now], make it static inline to let the
compiler do a slightly better job at optimizing things, and shorten
its name.
v2: Assert that the new writemask is not zero to avoid undefined
hardware behaviour.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
And define non-mutating helper functions to retype fixed and normal
regs with a common interface. At some point we may want to get rid of
::fixed_hw_reg completely and have fixed regs use the normal register
data members (e.g. backend_reg::reg to select a fixed GRF number,
src_reg::swizzle to store the swizzle, etc.), I have the feeling that
this is not the last headache we're going to get because of the
multiple ways to represent the same thing and the different register
interface depending on the file a register is stored in...
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There doesn't seem to be any reason for nr_params, nr_pull_params,
param, and pull_param to be duplicated in the stage-specific
subclasses of brw_stage_prog_data. Moving their definition to the
common base class will allow some code sharing in a future commit, the
removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and
the simplification of the stage-specific brw_*_prog_data_compare.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Broadwell's 3DSTATE_WM_HZ_OP packet makes this much easier.
Instead of programming the whole pipeline, we simply have to emit the
depth/stencil packets, a state override, and a pipe control. Then
arrange for the state to be put back. This is easily done from a single
function.
v2: Use minify(mt->logical_{width,height}0, level) in 3DSTATE_WM_HZ_OP
instead of intel_mipmap_level's width/height fields. Those were
based on the physical width/height, and thus wrong for MSAA buffers.
Eric also deleted those fields.
v3: Use 0xFFFF as the sample mask regardless of what the user set (as
this operation is unrelated); set the drawing rectangle to the
miplevel being operated on, rather than the whole surface; remove
unnecessary MAX2(..., 1) around mt->logical_depth0 (all suggested
by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The existing code followed the vtable function signature, which is not a
great fit: many of the parameters are unused, and the function still
inspects global state, making it less reusable.
This patch refactors the depth buffer packet emission code into a new
function which takes exactly the parameters it needs, and which uses no
global state. It then makes the existing vtable function call the new
one.
Ideally, we would remove the vtable function, and clean up that
interface. But that can happen once HiZ is working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell's "HiZ Resolve" operation still has the restriction that the
rectangle primitive must be 8x4 aligned. So I believe we still need
this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
HiZ buffers still don't exist, but when they do, we'll set them up.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_depthbuffer_format is not very reusable at the moment, since it
uses global state (ctx->DrawBuffer) to access a particular depth buffer.
For HiZ on Broadwell, I need a function which simply converts the
formats. However, at least one existing user of brw_depthbuffer_format
really wants the existing interface. So, I've created a new function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit 1456ed85f0.
_eglInitResource can and is supposed to be called on subclass objects.
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This reverts commit 498d10e230.
_eglInitResource can and is supposed to be called on subclass objects.
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Even with the other limits raised, TestProxyTexImage would still reject
textures > 1GB in size. This is an artificial limit; nothing prevents
us from having a larger texture. I stayed shy of 2GB to avoid the
larger-than-aperture situation.
For 3D textures, this raises the effective limit:
- RGBA8: 645 -> 738
- RGBA16: 512 -> 586
- RGBA32F: 406 -> 465
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Gen4+ supports 8192x8192 cube maps. Ivybridge and later can actually
support 16384, but that would place GL_MAX_CUBE_MAP_TEXTURE_SIZE above
GL_MAX_TEXTURE_SIZE, which seems like a bad idea.
(Unfortunately, we can't bump GL_MAX_TEXTURE_SIZE to 16384 without
causing regressions due to awful W-tiled stencil buffer interactions.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GL_ARB_ES2_compatibility doesn't say anything about shader linking
when one of the shaders (vertex or fragment shader) is absent. So,
the extension shouldn't change the behavior specified in GLSL
specification.
Tested the behavior on proprietary linux drivers of NVIDIA and AMD.
Both of them allow linking a version 100 shader program in OpenGL
context, when one of the shaders is absent.
Makes following Khronos CTS tests to pass:
successfulcompilevert_linkprogram.test
successfulcompilefrag_linkprogram.test
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This drops the MOVs for header setup, which are totally mis-scheduled.
total instructions in shared programs: 1590047 -> 1589331 (-0.05%)
instructions in affected programs: 43729 -> 43013 (-1.64%)
GAINED: 0
LOST: 0
glb27-trex:
x before
+ after
+-----------------------------------------------------------------------------+
| + x xx + + + |
| ++ + xxx ++x xx + ** *x+ + + + x * |
|+x xx x* x+++xx*x*xx+++*+*xx++** *x* x+***x*+xx+* + * + + *|
| |__|__________MA___A___________|___| |
+-----------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 49 62.33 65.41 63.49 63.53449 0.62757822
+ 50 62.28 65.4 63.7 63.6982 0.656564
No difference proven at 95.0% confidence
Reviewed-by: Matt Turner <mattst88@gmail.com>
It often confused people because it was unclear on whether it was the
physical or logical, and people needed the other one as well. We can
recompute it trivially using the minify() macro, clarifying which value is
being used and making getting the other value obvious.
v2: Fix a pasteo in intel_blit.c's dst flip.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since only window system renderbuffers can have a singlesample_mt, this
lets us drop a bunch of sanity checking to make sure that we're just a
renderbuffer-like thing.
v2: Fix a badly-written comment (thanks Kenneth!), drop the now trivial
helper function for set_needs_downsample.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only DRI2 vs DRI3 delta was just how to decide about frontbuffer-ness
for doing the upsample.
v2: Fix missing singlesample_mt->region->name update in the merged code,
which would have broken the DRI2 don't-recreate-the-miptree
optimization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pretty silly to pass in values dereferenced out of one of the arguments.
v2: Get the destination size from the dst, even though the callers are
always dealing with src size == dst size cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
So far it's happened to be that we're only ever calling
intel_miptree_blit() (up/downsampling) from the ReadBuffer, but I stumbled
over a null ReadBuffer case when debugging later parts of the series.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mostly mt->target == 2D_MS just results in a few checks that we don't try
to allocate multiple LODs and don't try to do slice copies with them. But
with the introduction of binding renderbuffers to textures, we need more
consistency.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lets us simplify our shaders, and rely on GLES-prohibited
functionality (like ARB_texture_multisample) when writing these
driver-internal functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes:
xa_tracker.c: In function 'xa_tracker_create':
xa_tracker.c:147:5: error: implicit declaration of function 'pipe_loader_drm_probe_fd' [-Werror=implicit-function-declaration]
in some build configurations, as XA now implicitly depends on
gallium_drm_loader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Fixes radeonsi emitting command streams to the kernel even when there
have been no draw calls before a flush, potentially powering up the GPU
needlessly.
Incidentally, this also cuts the runtime of piglit gpu.py in about half
on my Kaveri system, probably because an X11 client going away no longer
always results in a command stream being submitted to the kernel via
glamor.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65761
Cc: "10.1" mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Required for libdrm 2.4.37 and earlier. Both scons and automake
require version 2.4.38 now so that guard is not longer needed.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
xorg-server and libkms is no longer required since the removal
of the xorg state-tracker.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is the first version that introduced DRM_CAP_PRIME, which is
implicitly required by egl/wayland.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
* Make sure that only drivers that are handled by configure.ac
are included in DRI_DIRS.
* Change with_dri_drivers default value to auto, and set enable
autodetection, when enable_opengl is on.
v2: Move "test" to the correct location.
v3: Squash DRI_DIRS handling before the switch statement.
Suggested by Ilia Mirkin
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Both x86_64|amd64 and *bsd, already set the full range of available
classic dri drivers. Drop the explicit assignment, and fall back to
the generic default.
Keep explicit list from plafroms/arches that do not handle the default
list.
Update help strings, to explicitly mention "classic" for applicable
DRI drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Move all the cases within one switch statement and handle
i9{1,6}5 and r{adeon,200} independently.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If built without llvm, the following error occurs with mplayer:
Failed to open VDPAU backend .../libvdpau_r600.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
[vo/vdpau] Error when calling vdp_device_create_x11: 1
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
DRM_API_HANDLE_TYPE_SHARED is zero, so doesn't actually fix anything.
But we shouldn't rely on SHARED handle type being zero.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Build two versions of pipe-loader, with only the client version linking
in x11 client side dependencies. This will allow the XA state tracker
to use pipe-loader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Seems texture sample instructions don't immediately consume there
src(s). In fact, some shaders from blob compiler seem to indiciate that
it does not even count the texture sample instructions when calculating
number of delay slots to fill for non-sample instructions. (Although so
far it seems inconclusive as to whether this is required.)
In particular, when a src register of a previous texture sample
instruction is clobbered, the (ss) bit is needed to synchronize with the
tex pipeline to ensure it has picked up the previous values before they
are overwritten.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Was supposed to be a '+', otherwise we end up with a negative offset and
choosing registers below the assigned range.
This seems to fix the scheduling mystery "solved" by adding in extra
delay slots.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since 'kill' does not produce a result, the new compiler was happily
optimizing them out. We need to instead track 'kill's similar to
outputs. But since there is no non-predicated kill instruction,
(and for flattend if/else we do want them to be predicated), we need
to track the topmost branch condition on the stack and use that as src
arg to the kill. For a kill at the topmost level, we have to generate
an immediate 1.0 to feed into the cmps.f for setting the predicate
register.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Thanks to figuring out 32bit float render target, and adding regdump
test in fdre-a3xx, I can more easily play around with instructions to
figure out range of inputs/outputs/etc. And from this I can conclude
that cmps.f works more like expected and I can do something much more
simple in trans_cmp() (compared to before which was more closely
emulating the instruction sequence of the blob compiler).
And using sel.b32 (binary 0/1) often makes more sense than sel.f32
(+/- float) or sel.u32 (+/- uint) as it can use the output directly
from cmps.f without needing the 'add.s tmp0, tmp0, -1'.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
As documented, the _mesa_free_shader_program_data function:
"Frees all the data that hangs off a shader program object, but not
the object itself."
This means that this function may be called multiple times on the same object,
(and has been observed to). Meanwhile, the shProg->Label field was not being
set to NULL after its free(). This led to a second call to free() of the same
address on the second call to this function.
Fix this by setting this field to NULL after free(), (just as with all other
calls to free() in this function).
Reviewed-by: Brian Paul <brianp@vmware.com>
CC: mesa-stable@lists.freedesktop.org
The linux winsys needs to know whether a surface is shared.
For guest-backed surfaces we need this information to avoid allocating a
mob out of the mob cache for shared surfaces, but instead allocate a shared
mob, that is never put in the mob cache, from the kernel.
Also previously, all surfaces were given the "shareable" attribute when
allocated from the kernel. This is too permissive for client-local surfaces.
Now that we have the needed info, only set the "shareable" attribute if the
client indicates that it needs to share the surface.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
In some situations, it may be desirable to bypass the cache at buffer
creation but to insert the buffer in the cache at buffer destruction.
One such situation is where we already have a kernel representation of a
buffer that we want to use, but we also want to insert it in the cache when
it's freed up.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
In some situations it's important to restrict the sizes of buffers that the
cached buffer manager is allowed to return
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
This adds new interface functions for guest-backed surfaces and
adds a mobid parameter to the surface_relocation() function.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
The old svga3d_reg.h file is split into separate header files and we
add new items for guest-backed surfaces.
Plus some minor code fixes because of renamed symbols.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Commit f4ebcd133b ("dri/nouveau: NV17_3D class is not available for
NV1a chipset") fixed this partially by using the correct 3d class.
However there were a lot of checks left over comparing against the
chipset.
Reported-and-tested-by: John F. Godfrey <jfgodfrey@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 9.2 10.0 10.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2 (chk): fix eos handling
v3 (leo): implement scaling configuration support
v4 (leo): fix bitrate bug
v5 (chk): add workaround for bug in Bellagio
v6 (chk): fix div by 0 if framerate isn't known,
user separate pipe object for scale and transfer,
always flush the transfer pipe before encoding
v7 (chk): make suggested changes, cleanup a bit more,
only advertise encoder on supported hardware
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
v2: add fw version query
v3: add README.VCE
v4: avoid error msg when kernel doesn't support it
Signed-off-by: Christian König <christian.koenig@amd.com>
Commit 246ca4b001 ("nv50: implement multiple viewports/scissors, enable
ARB_viewport_array") added dirty tracking to scissors/viewports. However
it neglected to mark them all as dirty on a context switch. This fixes
an apparent regression in webgl in chrome, but probably in any
application that switches contexts.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The bound range is disconnected from the viewport dimensions. This is
the relevant bit from glViewportArray:
"""
The location of the viewport's bottom left corner, given by (x, y) is
clamped to be within the implementaiton-dependent viewport bounds range.
The viewport bounds range [min, max] can be determined by calling glGet
with argument GL_VIEWPORT_BOUNDS_RANGE. Viewport width and height are
silently clamped to a range that depends on the implementation. To query
this range, call glGet with argument GL_MAX_VIEWPORT_DIMS.
"""
Just set it to +/-16384, as that is the minimum required by
ARB_viewport_array and the value that all current drivers provide.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This avoids a CopyTexImage() on Intel i965 hardware without blorp.
v2: Move the !readAtt check up higher.
v3: Rebase on idr's changes, plus readAtt check is totally gone, and also
fix a typo in a comment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
This will let us use meta's acceleration from renderbuffers without having
to do a CopyTexImage first.
This is like what we do for TFP, but just taking an existing renderbuffer
and binding it to a texture with whatever its format was. The
implementation won't work for stencil renderbuffers, and it only does
non-texture renderbuffers (but then, if you're using a texture
renderbuffer, you can just pull the texture object/level/slice out of the
renderbuffer, anyway).
v2: Don't forget to propagate NumSamples to the teximage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function is only handling the color case. We can just unindent as
long as we're willing to do the check for the bit outside of the
function.
v2: Rebase on idr's changes, drop readAtt check that's always non-null
anyway (it's a pointer into to the statically-allocated attachments
array in the renderbuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
I want split some meta.c code off to a separate file, so these functions
can't be static any more.
v2: Rebase on idr's changes, also expose setup_blit_shader,
blit_shader_table_cleanup, setup_vertex_objects,
setup_ff_tnl_for_blit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
I'd like to split some of our code to separate files, since 4k lines and
growing is pretty unreasonable for all these separate operations.
v2: Rebase on idr's changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
There was this funny argument passed to setup for "did alloc decide we
need to allocate new texture storage?", which goes away if we don't have
the caller do alloc as a separate step.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the GL_ARB_fbo spec:
If the source and destination buffers are identical, and the
source and destination rectangles overlap, the result of the blit
operation is undefined.
As far as I know, that's the only thing that would have been of concern
for this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I tripped over one of these when debugging meta, and it's a lot nicer to
just see the internalFormat being complained about.
v2: Drop a note in the other errors path that there is one early return.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
While these structs are generated per GLSL sampler type, they're structs
of data-about-shaders (notably, the ID of a shader program), not
data-about-samplers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Everyone was just immediately calling it and doing nothing else with the
shader program id.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The only thing that wants to track the glsl_sampler structure is the
shader string generator.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Most of the VEC4 back-end agrees on src_reg::swizzle being one of the
BRW_SWIZZLE macros defined in brw_reg.h, except in two places where we
use Mesa's SWIZZLE macros. There is even a doxygen comment saying
that Mesa's macros are the right ones. They are incompatible swizzle
representations (3 bits vs. 2 bits per component), and the code using
Mesa's works by pure luck. Fix it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The same effect can be achieved using ::subreg_offset. Remove the
less flexible alternative and define a convenience function to keep
the fs_reg interface sane.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The same effect can be achieved using a combination of ::stride and
::subreg_offset. Remove the less flexible ::smear to keep the data
members of fs_reg orthogonal.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Some improvements for copy propagation with non-contiguous
register strides and mismatching types.
v3: Add example of the situation that the copy propagation changes are
intended to avoid. Clarify that 'fs_reg::apply_stride()' is expected
to work with zero strides too.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
It would be nice if we could have a single 'reg_offset' field
expressed in bytes that would serve the purpose of both, but the
semantics of 'reg_offset' are quite complex currently (it's measured
in units of one, eight or sixteen dwords depending on the register
file and the dispatch width) and changing it to bytes would be a very
intrusive change at this stage. Add a separate 'subreg_offset' field
for now.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
To fix MSVC compile breakage. Evidently, _restrict is an MSVC keyword,
though the docs only mention __restrict (with two underscores).
Note: we may want to also rename _volatile to volatile_flag to be
consistent.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74900
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fix formatting, add new comments, get rid of extraneous indentation.
Suggested by Ian in bug 74723.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Set driver-specified flag in NewDriverState when glUniform* is
used to bind an image unit.
v3: Abbreviate argument type check.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Because of the combinatorial explosion of different image built-ins
with different image dimensionalities and base data types, enumerating
all the 242 possibilities would be annoying and a waste of .text
space. Instead use a special path in the built-in builder that loops
over all the known image types.
v2: Generate built-ins on GLSL version 4.20 too. Rename
'_has_float_data_type' to '_supports_float_data_type'. Avoid
duplicating enumeration of image built-ins in create_intrinsics()
and create_builtins().
v3: Use a more orthodox approach for passing image built-in generator
parameters.
v4: Cosmetic changes.
Acked-by: Paul Berry <stereotype441@gmail.com>
No opaque types may be statically initialized in the shader, all
opaque variables must be declared uniform or be part of an "in"
function parameter declaration, no opaque types may be used as the
return type of a function.
v2: Add explicit check for opaque types in interface blocks. Check
for opaque types in ir_dereference::is_lvalue().
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Aggregating images inside uniform blocks is explicitly disallowed by
the standard, aggregating them inside structures is not (as of GL
4.4), but there is a similar problem as with atomic counters: image
uniform declarations require either a "writeonly" memory qualifier or
an explicit format qualifier, which are explicitly forbidden in
structure member declarations. In the resolution of Khronos bug
#10903 the same wording applied to atomic counters was decided to mean
that they're not allowed inside structures -- Rejecting image member
declarations within structures seems the most reasonable option for
now.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Add comment next to the read_only and write_only qualifier flags.
Change temporary copies of the type qualifier mask to use uint64_t
too.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Add predicates to query if a GLSL type is or contains an image.
Rename sampler_coordinate_components() to coordinate_components().
v2: Use assert instead of unreachable.
v3: No need to use a separate code-path for images in
coordinate_components() after merging image and sampler fields in
the glsl_type structure.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Reuse the glsl_sampler_dim enum for images. Reuse the
glsl_type::sampler_* fields instead of creating new ones specific
to image types. Reuse the same constructor as for samplers adding
a new 'base_type' argument.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This fixes bug 73200 "vdpau-GL interop fails due to different screen
objects" in the same way radeon does.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Only export __driDriverExtensions by default, and radeon_drm_winsys_create on radeons.
Remove -Bsymbolic which should no longer be needed.
As a side effect, it ought to fix a manifestation of bug 73200 on radeon.
Signed-off-by: Maarten Lankhorst<maarten.lankhorst@canonical.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixed piglit test getteximage-targets S3TC CUBE_ARRAY on systems that
don't have libtxc_dxtn installed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will be necessary to support cubemap array textures because they
use all four components.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rectangle textures were not necessary for mipmap generation (because
they cannot have mipmaps), but all of the future users of this common
code will need to support rectangle textures.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is quite like code we want for blits. Pull it out so that it can
be shared by other paths.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will allow the same table of shader-per-sampler-type to be used for
paths in meta other than just mipmap generation. This is also the
reason the declarations of the structures was moved towards the top of
the file.
v2: Code formatting change suggested by Brian.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also... glOrtho(-1.0, 1.0, -1.0, 1.0, -1.0, 1.0) *is* the identity
matrix, so drop the unnecessary call to _mesa_Ortho.
v2: Rename setup_ff_TNL_for_blit() to setup_ff_tnl_for_blit(). Seems
silly to capitalize one out of two to three acronyms in the name
(change by anholt, acked by idr).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
I set the "address modify enable" bit in the wrong DWord. The first
DWord is the high 16 bits of the address, while the second is the low
32-bits and enable bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We never set it on previous generations, but I had to set it in
3DSTATE_PS for correct behavior. For symmetry, I set it in 3DSTATE_VS
as well, but there's no actual need to do so. Piglit works fine either
way. The documentation also remarks that there should never be a need
to program this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
My earlier patch (i965: Reserve space for "Vertex Count" in GS outputs.)
incremented Global Offset for most URB writes to make room for the new
"Vertex Count" field, but failed to shift the URB writes used for
writing control bits.
Confusingly, Global Offset must be incremented by 2 here, rather than 1.
The URB writes we use for actual data are HWord writes, which treat
Global Offset as a 256-bit offset. These are OWord writes, so it's
treated as a 128-bit offset instead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I added support for these on Haswell, but forgot to update the Broadwell
code before landing it. Fixes Piglit's max-samplers test.
v2: Use get_element_ud() for the destination as well as the source.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I added support for these on Haswell, but forgot to update the Broadwell
code before landing it. Partially fixes Piglit's max-samplers test.
v2: Use get_element_ud() consistently, rather than using it for the
source but using brw_vec1_grf for the destination..
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
MOV_RAW disables masking, but doesn't force the instruction to be
uncompressed. That needs to be done by hand.
Fixes textureGather and texture offset tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Almost every driver already supported it. All current and future
Gallium drivers always support it, and most existing classic drivers
support it.
This only changes radeon and nouveau.
This extension only adds data types that can be passed to, for example,
glTexImage2D. It does not add internal formats. Since you can already
pass GL_FLOAT to glTexImage2D this shouldn't pose any additional issues
with those drivers. Note that r200 and i915 already supported this
extension, and they don't support floating-point textures either.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Half-float TexBOs should require both GL_ARB_half_float_pixel and
GL_ARB_texture_float. This doesn't matter much in practice. Every
driver that supports GL_ARB_texture_buffer_object already supports
GL_ARB_half_float_pixel. We only expose the TexBO extension in core
profiles, and those require GL_ARB_texture_float.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
drivers/common/meta.c: In function '_mesa_meta_CopyTexSubImage':
drivers/common/meta.c:3744:52: warning: unused parameter 'rb' [-Wunused-parameter]
Unfortunately, the parameter can't just be removed because it is part of
the dd_function_table::CopyTexSubImage interface.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
drivers/common/meta.c: In function 'setup_drawpix_texture':
drivers/common/meta.c:1572:30: warning: unused parameter 'texIntFormat' [-Wunused-parameter]
setup_drawpix_texture has never used this paramater. Before the
refactor commit 04f8193aa it was used in several locations. After that
commit, texIntFormat was only used in alloc_texture.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
All of the other meta routines have a particular pattern for creating
and tracking the VAO and VBO. This one function deviated from that
pattern for no apparent reason.
Almost all of the code added in this patch will be removed shortly.
v2: Drop glDeleteBuffers() of the old, now-uninitialized vbo variable.
Fixes getteximage-formats and fbo-mipmap-copypix regression when "2"
landed in the variable (change by anholt).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Final intermediate step leading to some code sharing. Note that the new
GemerateMipmap and decompress vertex structures are the same as the new vertex
structure in BlitFramebuffer and the others.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step leading to some code sharing. Note that the new DrawPixels
vertex structure is the same as the new vertex structure in BlitFramebuffer
and the others.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step leading to some code sharing. Note that the new Clear
vertex structure is the same as the new BlitFramebuffer and CopyPixels
vertex structure.
The "sizeof(float) * 7" hack is temporary. It will magically disappear
in a just a couple more patches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step leading to some code sharing. Note that the new CopyPixels
vertex structure is the same as the new BlitFramebuffer vertex
structure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Initial step of cleaning the exported symbols from targets/omx
- Mark omx_component_library_Setup as public
v2: Keep export-symbols-regex
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
There is no cpp files during the build process, thus we
can safely drop the unused cxxflags.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The function pointer is retrieved via VdpGetProcAddress just
like all the other vdpau functions and should not be exported.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Add VISIBILITY_CFLAGS to automake build, so that
only required symbols are exported.
v2: Rebase
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This symbol is internal and was never part of the API.
Unused by any of the gbm backends, it makes sense to
simply not export it.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
VISIBILITY_CFLAGS
Currently the library exports every symbol imaginable,
rather than the ones defined by the API.
Note: This may cause issues for libraries that are linking
agaist libgbm's internals.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Currently we create a OPENGL_COMPAT context regardless of
what was requested by the program. Correct that by retaining
the program's request and passing it into _mesa_initialize_context.
Based on a similar commit for radeon/r200 by Ian Romanick.
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Add the explicit note about the required version during configure.
Require the same version (151) of udev when building the pipe-loader.
Mention the udev version requirement in GBM Requires.private.
v2: Resolve a couple of silly typos. Spotted by Ilia
v3: Cleanup platfrom/platform typo. Spotten by Stefan
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
As of recently we dlopen the library, additionally the only
code that is including the libudev.h header, is the loader.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Previously the linking was required due to dependency of udev in the
pipe-loader. Now this is no longer the case, as we dlopen the library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Previously the linking was required due to dependency of udev in the
pipe-loader. Now this is no longer the case, as we dlopen the library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
--enable-gallium-llvm is required by radeonsi. Currently we
check only for LLVM_VERSION_INT which is 0, whenever gallium-llvm
is disabled explicitly.
./configure --with-gallium-drivers=r600,radeonsi --disable-gallium-llvm
v2: Correct typo in error message. Spotted by Tom Stellard
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Matt Turner noted the incorrect order, but I somehow forgotten to
change it before pushing upstream. The other one is a typo during rebase.
Signed-off-by: Christian König <christian.koenig@amd.com>
If we don't recognize the PCI ID, we can't reasonably load the driver.
However, calling abort() is quite rude - it means the application that
tried to initialize us (possibly the X server) can't continue via
fallback paths. We already have a more polite mechanism - failing to
create the context. So, just use that.
While we're at it, improve the error message.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73024
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Consider a multithreaded program with two contexts A and B, and the
following scenario:
1. Context A calls initialize(), which allocates mem_ctx and starts
building built-ins.
2. Context B calls initialize(), which sees mem_ctx != NULL and assumes
everything is already set up. It returns.
3. Context B calls find(), which fails to find the built-in since it
hasn't been created yet.
4. Context A finally finishes initializing the built-ins.
This will break at step 3. Adding a lock ensures that subsequent
callers of initialize() will wait until initialization is actually
complete.
Similarly, if any thread calls release while another thread is still
initializing, or calling find(), the mem_ctx/shader would get free'd while
from under it, leading to corruption or use-after-free crashes.
Fixes sporadic failures in Piglit's glx-multithread-shader-compile.
Bugzilla: https://bugs.freedesktop.org/69200
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.1 10.0" <mesa-stable@lists.freedesktop.org>
In the first case, we can simply call stride(mask, 16, 8, 2) rather than
creating a new register with a different stride, then immediately
changing it a second time.
In the second case, the stride was already what we wanted, so we can
just use mask without any changes at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
stride(brw_vec1_reg(...) ...) takes some register, changes the strides,
then changes the strides again. Let's do it once.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
this just ties the mesa code to the pre-existing gallium interface,
I'm not sure what to do with the CSO stuff yet.
0.2: fix min/max bounds
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds GS output and FS input support, even though FS input
support isn't supported until GLSL 4.30 from what I can see.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There are only two sensible placements for 2x MSAA samples - and one is
the mirror image of the other. I chose (0.25, 0.25) and (0.75, 0.75).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The 4x and 8x cases contained identical code for extracting the X and
Y sample offset values and converting them from U0.4 back to float.
Without this refactoring, we'd have to duplicate it a third time in
order to support 2x MSAA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Apparently some players are ill-prepared for us claiming that a decoder
exists only to have creating it fail, and express this poor preparation
with crashes (e.g. flash). Check that firmware is there to increase the
chances of there being a high correlation between reported capabilities
and ability to create a decoder.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 10.0 10.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
In commit eeed49f5f2, Mark accidentally
renamed MESA_FORMAT_S8_Z24 to MESA_FORMAT_Z24_UNORM_X8_UINT and
MESA_FORMAT_X8_Z24 to MESA_FORMAT_Z24_UNORM_S8_UINT, reversing their
sense. The commit message was correct, but what sed commands actually
got run didn't match that.
This patch swaps the two enum names, reversing them. This should undo
the damage, but might break things if people have manually fixed a few
instances in the meantime...
Mark's commit also failed to mention renames:
s/MESA_FORMAT_ARGB2101010_UINT\b/MESA_FORMAT_B10G10R10A2_UINT/g
s/MESA_FORMAT_ABGR2101010\b/MESA_FORMAT_R10G10B10A2_UNORM/g
but those seem okay.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
nvfx_fragprog_assign_generic only allows for up to 10/8 texcoords for
nv40/nv30. This fixes compilation of the varying-packing tests.
Furthermore it appears that the last 2 inputs on nv4x don't seem to
work in those tests, so just report 8 everywhere for now.
Tested on NV42, NV44. NV4B appears to have additional problems.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 9.1 9.2 10.0 10.1 <mesa-stable@lists.freedesktop.org>
We don't need to allocate all the state related to GL_ARB_debug_output
until some aspect of that extension is actually needed.
The sizeof(gl_debug_state) is huge (~285KB on 64-bit systems), not even
counting the 54(!) hash tables and lists that it contains. This change
reduces the size of gl_context alone from 431KB bytes to 145KB bytes on
64-bit systems and from 277KB bytes to 78KB bytes on 32-bit systems.
Reviewed-by: Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The WHILE instruction doesn't have UIP. It only has JIP.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The bits which normally contain the source register descriptions
actually contain the JIP/UIP jump targets, which we already printed.
Interpreting JIP/UIP as source registers results in some really creepy
looking output, like IF statements with acc14.4<0,1,0>UD sources.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell's 3DSTATE_CLEAR_PARAMS packet expects a floating point value
regardless of format. This means we need to stop converting it to
UNORM.
Storing the value as float would make sense, but since we already have a
uint32_t field, this patch continues shoehorning it into that. In a
sense, this makes mt->depth_clear_value the DWord you emit in the
packet, rather than the clear value itself.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Fix pasteo of an extra abs being inserted (caught by many). Rewrite
to drop the silly switch statement.
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
We've had various bug reports over the years where miptrees are missing,
and when I screwed it up while adding DRI2 to the modesetting driver, I
figured I should put the info necessary for debug here.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes it work on Broadwell, too.
v2: Drop bogus double write to 3DPRIM_BASE_VERTEX register
(caught by Chris Forbes).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This saves some boilerplate and hides the OUT_RELOC/OUT_RELOC64
distinction.
Placing the function in intel_batchbuffer.c is rather arbitrary; there
wasn't really an obvious place for it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Since commit 9cee3ff562, INTEL_DEBUG=vs
has caused a NULL pointer dereference for fixed-function/ARB programs.
In the vec4 generators, "prog" is a gl_program, and "shader_prog" is the
gl_shader_program. This is different than the FS visitor.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Mesa fails to retain the precision qualifier when parsing:
#version 300 es
centroid in mediump vec2 v;
Consider how the parser's type_qualifier production is applied.
First, the precision_qualifier rule creates a new ast_type_qualifier:
<precision: mediump>
Then the storage_qualifier rule creates a second one:
<flags: in>
and calls merge_qualifier() to fold in any previous qualifications,
returning:
<flags: in, precision: mediump>
Finally, the auxiliary_storage_qualifier creates one for "centroid":
<flags: centroid>
it then does $$ = $1 and $$.flags |= $2.flags, resulting in:
<flags: centroid, in>
Since precision isn't stored in the flags bitfield, it is lost. We need
to instead call merge_qualifier to combine all the fields.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If st_GetTexImage() is to decompress the texture, avoid the fallback
path even if prefer_blit_based_texture_transfer = false. For drivers
that returned PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER = 0, we
were always taking the fallback path for texture decompression rather
than rendering a quad. The later is a lot faster.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In the specification text of NV_vertex_program1_1, the upper
limit of the RCC instruction is written as 1.884467e+19 in
scientific notation, but as 0x5F800000 in binary. But the binary
version translates to 1.84467e+19 rather than 1.884467e+19 in
scientific notation.
Since the lower-limit equals 2^-64 and the binary version equals
2^+64, let's assume the value in scientific notation is a typo
and implement this using the value from the binary version
instead.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_eglInitResource() was used to memset entire _EGLSurface by
writing more than size of pointed target. This does work
as long as Resource is the first element in _EGLSurface,
this patch fixes such dependency.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_eglInitResource() was used to memset entire _EGLContext by
writing more than size of pointed target. This does work
as long as Resource is the first element in _EGLContext,
this patch fixes such dependency.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
If a driver enables ARB_gpu_shader5 and sets Const.MaxVertexSteams >= 4,
then piglit's arb_gpu_shader5-minmax test should now pass.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In the tests they were the same so it didn't matter, but indications are
that this is the correct behaviour. Also take this opportunity to
(trivially) support using gl_Layer in fp.
Cc: 10.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
GLX_ARB_create_context allows making a GLX context current with None
drawable and readables, but this was never implemented correctly in GLX.
We would create a __DRIdrawable for the None GLX drawable and pass that
to the DRI driver and that would somehow work. Now it's somehow broken.
The way this should have worked is that we pass a NULL DRI drawable
to the DRI driver when the GLX user calls glXMakeContextCurrent()
with None for drawable and readables.
https://bugs.freedesktop.org/show_bug.cgi?id=74143
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Flush the context when we unmap a buffer, otherwise VDPAU might
start rendering the next frame while we still reference that buffer.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: StrangeNoises (rachel@strangenoises.org)
Bugzilla (bug needs XBMC changes as well):
https://bugs.freedesktop.org/show_bug.cgi?id=73191
When VL uploads vertex buffers, it uses PIPE_TRANSFER_DONTBLOCK, which always
flushes the context in the winsys if the buffer being mapped is busy. Since
I added handling of DISCARD_RANGE, DONTBLOCK has had no effect when combined
with DISCARD_RANGE and I think the context isn't flushed anywhere else,
so no commands are submitted to the GPU until the IB is full, which takes
a lot of frames.
Using DISCARD_RANGE is not the only way to trigger this bug. The other way
is to reallocate the vertex buffer before every upload.
BTW, I'm not sure if this is the right place for flushing, but it does fix
the bug.
v2 (chk): move the flush to the right place.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: StrangeNoises (rachel@strangenoises.org)
v2: This doesn't change the behavior. It only moves the tiling check
to r600_init_resource and removes the usage parameter.
Reviewed-by: Christian König <christian.koenig@amd.com>
Featuring a full grown MPEG2 and H264 decoder and a couple of hundred bugs.
v2 (Leo): fix an error for pic_order_cnt_type 1
v3 (Leo): implement support for field decoding
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
dri2_dup_image was not copying the dri_format field.
This was causing some bugs, for example:
. we create an gbm_bo.
. we get an EGLImage from the gbm_bo.
. Bug: impossible to get again the gbm_bo from the EGLImage by
importing. (gbm dri2 backend)
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This regressed when I converted BRW_REGISTER_TYPE_* to be an abstract
type that doesn't match the hardware description. dump_instruction()
was using reg_encoding[] from brw_disasm.c, which no longer matches
(and was incorrect for Gen8+ anyway).
This patch introduces a new function to convert the abstract enum values
into the letter suffix we expect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Mesa now has a real, feature-rich EGL implementation on X11 via xcb.
Therefore I believe there is no longer a practical need for the egl_glx
driver.
Furthermore, egl_glx appears to be unmaintained. The most recent
nontrivial commit to egl_glx was 6baa5f1 on 2011-11-25.
Tested by running weston-smoke in windowed Weston on X with i965.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
ureg_program is allocated on the heap so we can just bump the
number of immediates that it can handle. It's needed for d3d10.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We need to handle a lot more immediates and in order to do that
we also switch from allocating this structure on the stack to
allocating it on the heap.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We only supported up to 256 immediates, which isn't enough. We had
code which was allocating immediates as an allocated array, but it
was always used along a statically backed array for performance
reasons. This commit adds code to skip that performance optimization
and always use just the dynamically allocated immediates if the
number of them is too great.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The number of allowed temporaries increases almost with every
iteration of an api. We used to support 128, then we started
increasing and the newer api's support 4096+. So if we notice
that the number of temporaries is larger than our statically
allocated storage would allow we just treat them as indexable
temporaries and allocate them as an array from the start.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
If multiple color outputs are written, this shader is unlikely to be
useful with a winsys framebuffer.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The driver is supposed to ensure buffers before any drawing operation, but in
do_blit_drawpixels() and do_blit_copypixels() we inspect the buffer format
before calling intel_prepare_render(). That was covered up by the
unconditional call to intel_prepare_render() in intelMakeCurrent(), but we
now only do this on the initial intelMakeCurrent call for a context
(to get the size for the initial viewport values).
https://bugs.freedesktop.org/show_bug.cgi?id=74083
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Alexander Monakov <amonakov@gmail.com>
This will allow testing of compute shader functionality before it is
completed.
To enable ARB_compute_shader functionality in the i965 driver, set
INTEL_COMPUTE_SHADER=1.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Document that the 3-element array MaxComputeWorkGroupCount is
indexed by dimension.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Document that the 3-element array MaxComputeWorkGroupSize is
indexed by dimension.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch adds MESA_SHADER_COMPUTE to the gl_shader_stage enum.
Also, where it is trivial to do so, it adds a compute shader case to
switch statements that switch based on the type of shader. This
avoids "unhandled switch case" compiler warnings.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Linker loops that iterate through all the stages in the pipeline need
to use MESA_SHADER_FRAGMENT as a bound, so that we can add an
additional MESA_SHADER_COMPUTE stage, without it being erroneously
included in the pipeline.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we were really doing F2I. And also move it to generic section.
(Note that for llvmpipe the code generated is definitely bad, due to lack
of unsigned conversions with sse. I think though what llvm does (using scalar
conversions to 64bit signed either with x87 fpu (32bit) or sse (64bit)
including lots of domain changes is quite suboptimal, could do something like
is_large = arg >= 2^31
half_arg = 0.5 * arg
small_c = fptoint(arg)
large_c = fptoint(half_arg) << 1
res = select(is_large, large_c, small_c)
which should be much less instructions but that's something llvm should do
itself.)
This fixes piglit fs/vs-float-uint-conversion.shader_test (maybe more, needs
GL 3.0 version override to run.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
This moves the value from the GS shader to the copy shader so the registers
are setup correctly.
fixes tests/spec/glsl-1.50/execution/geometry/clip-distance-out-values.shader_test
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
attempt to calculate a better value for array size to avoid breaking apps.
v2: use 0xfff like streamout, suggested by Grigori
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
cayman has a different end of program bit, so do that properly.
fixes hangs with geom shader tests on cayman.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
set the correct values so the misc out register is setup correctly
for the copy shader.
This also updates the state for the gs copy shader so the hw
gets programmed correctly.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This just makes r600 and evergreen do what the radeonsi codepaths do
for layered rendering. This makes the 2d amd_vertex_shader_layer test
pass on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This just adds support for emitting the proper value in the VS out misc.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This just enables the workarounds we have for vertex/pixel shaders
for geom shaders as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This follows what fglrx does, it unpacks the input we are
going to indirect into a bunch of registers and indirects
inside them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We need to be able to write to the ring using a base register
for when we emit vertices in a loop, in theory the SB compiler
could collapse these indirect writes to direct writes if the
register value is constant and known, but that is outside my
pay grade.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Vadim's code derived it from the info.mode, but it needs
to be takes from the geometry shader output primitive.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The instance cnt register was missing for a few kernels,
with a new enough kernel we can output it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
If the shader has no CF clauses at all emit an nop
If the last instruction is an ENDLOOP add a NOP for the LOOP to go to
if the last instruction is CALL_FS add a NOP
These fix a bunch of hangs in the geometry shader tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
SB needs fixes for three GS instructions it seems to raise
them outside loops etc despite my best efforts.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Although we don't use SB on geom shaders, the VS copy shader will use it
so we might as well implement MEM_RING support in sb.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This can happen in normal operation, so don't report an error on it,
just continue.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This is Vadim's initial work with a few regression fixes squashed in.
v2: (airlied)
fix regression in glsl-max-varyings - need to use vs and ps_dirty
fix regression in shader exports from rebasing.
whitespace fixing.
v2.1: squash fix assert
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
For geometry shaders we need to call this code from a second place.
Just move it out for now to keep future patches cleaner.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From the GLSL 4.40 spec, section 6.4 (Jumps):
The continue jump is used only in loops. It skips the remainder of
the body of the inner most loop of which it is inside. For while
and do-while loops, this jump is to the next evaluation of the
loop condition-expression from which the loop continues as
previously defined.
Previously, we incorrectly treated a "continue" statement as jumping
to the top of a do-while loop.
This patch fixes the problem by replicating the loop condition when
converting the "continue" statement to IR. (We already do a similar
thing in "for" loops, to ensure that "continue" causes the loop
expression to be executed).
Fixes piglit tests:
- glsl-fs-continue-inside-do-while.shader_test
- glsl-vs-continue-inside-do-while.shader_test
- glsl-fs-continue-in-switch-in-do-while.shader_test
- glsl-vs-continue-in-switch-in-do-while.shader_test
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In addition to making it public, we also need to change its first
argument from an ir_loop * to an exec_list *, so that it can be used
to insert the condition anywhere in the IR (rather than just in the
body of the loop).
This will be necessary in order to make continue statements work
properly in do-while loops.
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is really not needed as blorp blit programs already sample
XRGB normally and get alpha channel set to 1.0 automatically by
the sampler engine. This is simply copied directly to the payload
of the render target write message and hence there is no need for
any additional blending support from the pixel processing pipeline.
The blending formula is anyway broken for color components, it
multiplies the color component with itself (blend factor is the
component itself).
Alpha blending in turn would not fix the alpha to one independent
of the source but simply used the source alpha as is instead
(1.0 * src_alpha + 0.0 * dst_alpha).
Quoting Eric:
"If we want to actually make the no-alpha-bits-present thing work,
we need to override the bits in the surface state or in the
generated code. In the normal draw path, it's done for sampling
by the swizzling code in brw_wm_surface_state.c, and the blending
overrides is just to fix up the alpha blending stage which
doesn't pay attention to that for the destination surface."
If one modifies piglit test gl-3.2-layered-rendering-blit to use
color component values other than zero or one, this change will
kick in on IVB. No regressions on IVB.
This is effectively revert of c0554141a9:
i965/blorp: Support overriding destination alpha to 1.0.
Currently, Blorp requires the source and destination formats to be
equal. However, we'd really like to be able to blit between XRGB and
ARGB formats; our BLT engine paths have supported this for a long time.
For ARGB -> XRGB, nothing needs to occur: the missing alpha is already
interpreted as 1.0. For XRGB -> ARGB, we need to smash the alpha
channel to 1.0 when writing the destination colors. This is fairly
straightforward with blending.
For now, this code is never used, as the source and destination formats
still must be equal. The next patch will relax that restriction.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This moves the intel_batchbuffer_flush before the drm_intel_bo_busy
call, which is a change in behavior. However, the old behavior was
broken.
In the future, we may want to only flush in the batchbuffer references
the BO being mapped. That's certainly more typical.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This additionally measures the time stalled, while also simplifying the
code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Mapping a buffer is a common place where we could stall the CPU.
In a few places, we've added special code to check whether a buffer is
busy and log the stall as a performance warning. Most of these give no
indication of the severity of the stall, though, since measuring the
time is a small hassle.
This patch introduces a new brw_bo_map() function which wraps
drm_intel_bo_map, but additionally measures the time stalled and reports
a performance warning. If performance debugging is not enabled, it
simply maps the buffer with negligable overhead.
We also add a similar wrapper for drm_intel_gem_bo_map_gtt().
This should make it easy to add performance warnings in lots of places.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Hw binning pass doesn't seem to have broken anything. And optimizing
compiler fixes a lot of shaders and doesn't seem to break anything. So
re-org slightly FD_MESA_DEBUG params and make both hw binning and
optimizer enabled by default.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The new compiler generates a dependency graph of instructions, including
a few meta-instructions to handle PHI and preserve some extra
information needed for register assignment, etc.
The depth pass assigned a weight/depth to each node (based on sum of
instruction cycles of a given node and all it's dependent nodes), which
is used to schedule instructions. The scheduling takes into account the
minimum number of cycles/slots between dependent instructions, etc.
Which was something that could not be handled properly with the original
compiler (which was more of a naive TGSI translator than an actual
compiler).
The register assignment is currently split out as a standalone pass. I
expect that it will be replaced at some point, once I figure out what to
do about relative addressing (which is currently the only thing that
should cause fallback to old compiler).
There are a couple new debug options for FD_MESA_DEBUG env var:
optmsgs - enable debug prints in optimizer
optdump - dump instruction graph in .dot format, for example:
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.pnghttp://people.freedesktop.org/~robclark/a3xx/frag-0000.dot
At this point, thanks to proper handling of instruction scheduling, the
new compiler fixes a lot of things that were broken before, and does not
appear to break anything that was working before[1]. So even though it
is not finished, it seems useful to merge it in it's current state.
[1] Not merged in this commit, because I'm not sure if it really belongs
in mesa tree, but the following commit implements a simple shader
emulator, which I've used to compare the output of the new compiler to
the original compiler (ie. run it on all the TGSI shaders dumped out via
ST_DEBUG=tgsi with various games/apps):
163b6306b1
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For the time being, keep old compiler as fallback for things that the
new compiler does not support yet. Split out as it's own commit to make
the later new-compiler commits easier to follow.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Not really used for anything anymore. So strip it out and avoid
conflicting symbols with upcoming new-compiler.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When we clipped a line weren't copying the provoking vertex
color to the second vertex. We also weren't checking for
first vs. last provoking vertex.
Fixes failures found with the new piglit line-flat-clip-color test.
Cc: "10.0, 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This has been wrong for many years. It was originally 0x000FFFFF and long
ago there was discussion about whether GL_ALL_ATTRIB_BITS should include
the then-new GL_MULTISAMPLE_BIT bit. Eventually the ARB decided that
glPushAttrib(GL_ALL_ATTRIB_BITS) should save all current and future
attribute groups (hence ~0). Unfortunately, Mesa's gl.h was never updated.
This was just recently spotted by Eric Anholt and reported as a bug to the
ARB. Ian, Jon Leech and I discussed it at the ARB meeting and decided to
change Mesa's value to reflect the ARB's decision.
Acked-by: Eric Anholt <eric@anholt.net>
If the shader is too large, plug in a dummy shader. This patch also
reworks the existing dummy shader code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
gallivm soa code supported only a single level of nesting for
control flow opcodes (if, switch, loops...) but the d3d10 spec
clearly states that those are nested within functions. To support
nesting of conditionals inside functions we need to store the
nesting data inside function contexts and keep a stack of those.
Furthermore we make sure that if nesting for subroutines is deeper
than 32 then we simply ignore all subsequent 'call' invocations.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
DirectX and most hardware documentation use the term "Index Buffer" to
refer to a buffer containing indexes into arrays of vertex data, which
allows random access to vertex data, rather than sequential access.
OpenGL uses a different term for this concept: "Element Array Buffer".
However, "Index Buffer" has become much more widespread. A quick
Google search shows 29,300 hits for "Element Array Buffer" vs.
82,300 hits for "Index Buffer."
Arguably, "Index Buffer" is clearer: an "element of an array" (or list)
usually refers to an actual item stored in the array, not the index used
to refer to it.
The terminology is also already used in Mesa: some VBO module code for
dealing with ElementArrayBufferObj names local variables "ib".
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/ElementArrayBufferObj/IndexBufferObj/g'
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
For consistency with the previous renames.
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/_mesa_lookup_arrayobj/_mesa_lookup_vao/g'
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_update_vao_client_arrays() is less of a mouthful than
_mesa_update_array_object_client_arrays(), and generally clearer.
Generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/_mesa_\([^_]*\)_array_object/_mesa_\1_vao/g'
with manual whitespace and indentation fixes applied.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I considered replacing it with "gl_vao", but spelling it out seemed to
fit better with Mesa's traditional style. Mesa doesn't shy away from
long type names - consider gl_transform_feedback_object,
gl_fragment_program_state, gl_uniform_buffer_binding, and so on.
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/gl_array_object/gl_vertex_array_object/g'
v2: Rerun command to resolve conflicts with Ian's meta patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that the field is named "VAO" instead of "ArrayObj", it makes sense
to call the local variables "vao" instead of "arrayObj".
Completely generated by:
$ find . -type f -print0 | xargs 0 sed -i 's/arrayObj/vao/g'
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When reading through the Mesa drawing code, it's not immediately obvious
to me that "ArrayObj" (gl_array_object) is the Vertex Array Object (VAO)
state. The comment above the structure explains this, but readers still
have to remember this and translate accordingly.
Out of context, "array object" is a fairly vague. Even in context,
"array" has a lot of meanings: glDrawArrays, vertex data stored in user
arrays, gl_client_arrays, gl_vertex_attrib_arrays, and so on.
Using the term "VAO" immediately associates these fields with the OpenGL
concept, clarifying the situation and aiding programmer sanity.
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
-e 's/ArrayObj;/VAO;/g' \
-e 's/->ArrayObj/->VAO/g' \
-e 's/Array\.ArrayObj/Array.VAO/g' \
-e 's/Array\.DefaultArrayObj/Array.DefaultVAO/g'
v2: Rerun command to resolve conflicts with Ian's meta patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2014-02-03 00:52:58 -08:00
2592 changed files with 236263 additions and 105397 deletions
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=58660">Bug 58660</a> - CAYMAN broken with HyperZ on</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=64471">Bug 64471</a> - Radeon HD6570 lockup in Brütal Legend with HyperZ</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66352">Bug 66352</a> - GPU lockup in L4D2 on TURKS with HyperZ</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68799">Bug 68799</a> - [APITRACE] Hyper-Z lockup with Falcon BMS 4.32u6 on CAYMAN</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71547">Bug 71547</a> - compilation failure :#error "SSE4.1 instruction set not enabled"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=72685">Bug 72685</a> - [radeonsi hyperz] Artifacts in Unigine Sanctuary</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=73088">Bug 73088</a> - [HyperZ] Juniper (6770): Gone Home / Unigine Heaven 4.0 lock up system after several minutes of use</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74428">Bug 74428</a> - hyperz causes gpu hang in Counter-strike: Source</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74803">Bug 74803</a> - [r600g] HyperZ broken on RV630 (Cogs shadows are broken)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74863">Bug 74863</a> - [r600g] HyperZ broken on RV770 and CYPRESS (Left 4 Dead 2 trees corruption) bisected!</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74892">Bug 74892</a> - HyperZ GPU lockup with radeonsi 7970M PITCAIRN and Distance Alpha game</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78225">Bug 78225</a> - Compile error due to undefined reference to `gbm_dri_backend', fix attached</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78537">Bug 78537</a> - no anisotropic filtering in a native Half-Life 2</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (1):</p>
<ul>
<li>mesa: fix double-freeing of dispatch tables inside glBegin/End.</li>
</ul>
<p>Carl Worth (3):</p>
<ul>
<li>docs: Add MD5 sums for 10.1.3</li>
<li>cherry-ignore: Roland and Michel agreed to drop these patches.</li>
<li>VERSION: Update to 10.1.4</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>configure: error out if building GBM without dri</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>i965/vs: Use samplers for UBOs in the VS like we do for non-UBO pulls.</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>nv50/ir: make sure to reverse cond codes on all the OP_SET variants</li>
<li>nv50: fix setting of texture ms info to be per-stage</li>
<li>nv50/ir: fix integer mul lowering for u32 x u32 -> high u32</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>radeonsi: Fix anisotropic filtering state setup</li>
</ul>
<p>Tom Stellard (2):</p>
<ul>
<li>configure.ac: Add LLVM_VERSION_PATCH to DEFINES</li>
<li>radeonsi: Enable geometry shaders with LLVM 3.4.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77865">Bug 77865</a> - [BDW] Many Ogles3conform framebuffer_blit cases fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78581">Bug 78581</a> - OpenCL: clBuildProgram prints error messages directly rather than storing them</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79029">Bug 79029</a> - INTEL_DEBUG=shader_time is full of lies</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79729">Bug 79729</a> - [i965] glClear on a multisample texture doesn't work</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82709">Bug 82709</a> - OpenCL not working on radeon hainan</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82814">Bug 82814</a> - glDrawBuffers(0, NULL) segfaults in _mesa_drawbuffers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83079">Bug 83079</a> - [NVC0] Dota 2 (Linux native and Wine) crash with Nouveau Drivers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83355">Bug 83355</a> - FTBFS: src/mesa/program/program_lexer.l:122:64: error: unknown type name 'YYSTYPE'</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (1):</p>
<ul>
<li>radeonsi: Don't use anonymous struct trick in atom tracking</li>
</ul>
<p>Alex Deucher (2):</p>
<ul>
<li>radeonsi: add new CIK pci ids</li>
<li>radeonsi: add new SI pci ids</li>
</ul>
<p>Andreas Boll (1):</p>
<ul>
<li>winsys/radeon: fix nop packet padding for hawaii</li>
</ul>
<p>Anuj Phogat (1):</p>
<ul>
<li>i965: Bail on vec4 copy propagation for scratch writes with source modifiers</li>
</ul>
<p>Brian Paul (1):</p>
<ul>
<li>mesa: fix NULL pointer deref bug in _mesa_drawbuffers()</li>
</ul>
<p>Carl Worth (2):</p>
<ul>
<li>docs: Add sha256 sums for the 10.2.6 release</li>
<li>Makefile: Switch from md5sums to sha256sums</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>i965: add missing parens in vec4 visitor</li>
</ul>
<p>Emil Velikov (17):</p>
<ul>
<li>configure.ac: bail out if building gallium_gbm without gallium_egl</li>
<li>android: gallium/nouveau: fix include folders, link against libstlport</li>
<li>android: egl/main: fixup the nouveau build</li>
<li>automake: gallium/freedreno: drop spurious include dirs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77493">Bug 77493</a> - lp_test_arit fails with llvm >= llvm-3.5svn r206094</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82539">Bug 82539</a> - vmw_screen_dri.lo In file included from vmw_screen_dri.c:41: vmwgfx_drm.h:32:17: error: drm.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84140">Bug 84140</a> - mplayer crashes playing some files using vdpau output</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84662">Bug 84662</a> - Long pauses with Unreal demo Elemental on R9270X since : Always flush the HDP cache before submitting a CS to the GPU</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85267">Bug 85267</a> - vlc crashes with vdpau (Radeon 3850HD) [r600]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74863">Bug 74863</a> - [r600g] HyperZ broken on RV770 and CYPRESS (Left 4 Dead 2 trees corruption) bisected!</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77865">Bug 77865</a> - [BDW] Many Ogles3conform framebuffer_blit cases fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78225">Bug 78225</a> - Compile error due to undefined reference to `gbm_dri_backend', fix attached</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78258">Bug 78258</a> - make check link_varyings.gl_ClipDistance failure</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78403">Bug 78403</a> - query_renderer_implementation_unittest.cpp:144:4: error: expected primary-expression before ‘.’ token</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78468">Bug 78468</a> - Compiling of shader gets stuck in infinite loop</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78537">Bug 78537</a> - no anisotropic filtering in a native Half-Life 2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78581">Bug 78581</a> - OpenCL: clBuildProgram prints error messages directly rather than storing them</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78648">Bug 78648</a> - Texture artifacts in Kerbal Space Program</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78665">Bug 78665</a> - macros in builtin_functions.cpp make invalid assumptions about M_PI definitions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78692">Bug 78692</a> - Football Manager 2014, gameplay rendered black & white</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78716">Bug 78716</a> - Fix Mesa bugs for running Unreal Engine 4.1 Cave effects demo compiled for Linux</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78803">Bug 78803</a> - gallivm/lp_bld_debug.cpp:42:28: fatal error: llvm/IR/Module.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78888">Bug 78888</a> - test_eu_compact.c:54:3: error: implicit declaration of function ‘brw_disasm’ [-Werror=implicit-function-declaration]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79029">Bug 79029</a> - INTEL_DEBUG=shader_time is full of lies</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79095">Bug 79095</a> - x86/common_x86.c:348:14: error: use of undeclared identifier 'bit_SSE4_1'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79263">Bug 79263</a> - Linking error in egl_gallium.la when compiling 32 bit on multiarch</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79294">Bug 79294</a> - Xlib-based build broken on non x86/x86-64 architectures</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79373">Bug 79373</a> - Non-const initializers for matrix and vector constructors</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79382">Bug 79382</a> - build error: multiple definition of `loader_get_pci_id_for_fd'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79616">Bug 79616</a> - L4D2 crash on startup</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79724">Bug 79724</a> - switch statement type check</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79729">Bug 79729</a> - [i965] glClear on a multisample texture doesn't work</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79809">Bug 79809</a> - radeonsi: mouse cursor corruption using weston on AMD Kaveri</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79823">Bug 79823</a> - [NV30/gallium] Mozilla apps freeze on startup with nouveau-dri-10.2.1 libs on dual-screen</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79885">Bug 79885</a> - commit b52a530 (gallium/egl: st_profiles are build time decision, treat them as such) broke egl</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79903">Bug 79903</a> - [HSW Bisected]Some Piglit and Ogles2conform cases fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=81020">Bug 81020</a> - [radeonsi][regresssion] Wireframe of background rendered through objects in Half-Life 2: Episode 2 with MSAA enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82159">Bug 82159</a> - No rule to make target `../../../../src/mesa/libmesa.la', needed by `collision'.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82255">Bug 82255</a> - [VP2] Chroma planes are vertically stretched during VDPAU playback</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82268">Bug 82268</a> - Add support for the OpenRISC architecture (or1k)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82428">Bug 82428</a> - [radeonsi,R9 270X] System lockup when using mplayer/mpv with VDPAU</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82483">Bug 82483</a> - format_srgb.h:145: undefined reference to `util_format_srgb_to_linear_8unorm_table'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82517">Bug 82517</a> - [RADEONSI,VDPAU] SIGSEGV in map_msg_fb_buf called from ruvd_destroy, when closing a Tab with accelerated video player</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82534">Bug 82534</a> - src\egl\main\eglapi.h : fatal error LNK1107: invalid or corrupt file: cannot read at 0x2E02</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82536">Bug 82536</a> - u_current.h:72: undefined reference to `__imp__glapi_Dispatch'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82538">Bug 82538</a> - Super Maryo Chronicles fails with st/mesa assertion failure</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82539">Bug 82539</a> - vmw_screen_dri.lo In file included from vmw_screen_dri.c:41: vmwgfx_drm.h:32:17: error: drm.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83355">Bug 83355</a> - FTBFS: src/mesa/program/program_lexer.l:122:64: error: unknown type name 'YYSTYPE'</li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.