Compare commits

...

1330 Commits

Author SHA1 Message Date
Emil Velikov
330aa44a0d docs: add release notes for 11.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-13 12:21:22 +02:00
Emil Velikov
e429500dd1 Update version to 11.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-13 12:21:21 +02:00
Sarah Sharp
bf00b36c9d mesa: Add KBL PCI IDs and platform information.
Add PCI IDs for the Intel Kabylake platforms.  The IDs are taken
directly from the Linux kernel patches, which are under review:

http://lists.freedesktop.org/archives/intel-gfx/2015-October/078967.html
http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=kbl-upstream-v2

The Kabylake PCI IDs taken from the kernel are rearranged to be in order
of GT type, then PCI ID.

Please note that if this patch is backported, the following fixes will
need to be added before this patch:

commit 28ed1e08e8 "i965/skl: Remove early platform support"
commit c1e38ad370 "i965/skl: Use larger URB size where available."

Thanks to Ben for fixing a bug around setting urb.size, and being
patient with my questions about what the various fields mean.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (KBL-GT2)
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 39c41be50d)
2016-01-08 12:05:27 +02:00
Brian Paul
fab2039588 st/mesa: check state->mesa in early return check in st_validate_state()
We were checking the dirty->st flags but not the dirty->mesa flags.
When we took the early return, we didn't clear the dirty->mesa flags
so the next time we called st_validate_state() we'd often flush the
glBitmap cache.  And since st_validate_state() is called from
st_Bitmap(), it meant we flushed the bitmap cache for every glBitmap()
call.

This change seems to recover most of the performance loss observed
with the ipers demo on llvmpipe since commit commit 36c93a6fae.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: José Fonseca <jfonseca@vmware.com>
(cherry picked from commit c28d72a347)
2016-01-08 12:05:27 +02:00
Kenneth Graunke
536c8cbcd3 nir: Add a lower_fdiv option, turn fdiv into fmul/frcp.
The nir_opt_algebraic rule

(('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))),

can produce new fdiv operations, which need to be lowered on i965,
as we don't actually implement fdiv.  (Normally, we handle this in
GLSL IR's lower_instructions pass, but in the above case we introduce
an fdiv after that point.  So, make NIR do it for us.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 7295f4fcc2)
2016-01-08 12:05:27 +02:00
Ilia Mirkin
978480d69f nvc0: scale up inter_bo size so that it's 16M for a 4K video
Experimentally, 4M causes corruption and slowness, try to ramp it up
with size instead.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b16c9be4a5)
2016-01-08 12:05:27 +02:00
Ilia Mirkin
c2be35d309 nv50,nvc0: fix crash when increasing bsp bo size for h264
H264 doesn't have a bitplane bo. We just need a device reference, so use
the one from the client.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b5f2f7073f)
2016-01-08 12:05:27 +02:00
Marek Olšák
df47b1e078 st/mesa: fix GLSL uniform updates for glBitmap & glDrawPixels (v2)
Spotted by luck. The GLSL uniform storage is only associated once
in LinkShader and can't be reallocated afterwards, because that would
break the association.

v2: don't remove st_upload_constants calls, clarify why they're needed

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 36c93a6fae)
2016-01-08 12:05:27 +02:00
Marek Olšák
bb3581ca3d program: add _mesa_reserve_parameter_storage
The next commit will use this.

Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 294ed5cd13)
2016-01-08 12:05:27 +02:00
Ilia Mirkin
132131af6b nv50,nvc0: make sure there's pushbuf space and that we ref the bo early
First off, we can't flush in the middle of a command. Secondly
requesting the extra push space might cause a flush to happen. If that
flush happens, we'd have to do the PUSH_REFN again. So instead do
PUSH_REFN after the push space request. This helps avoid rare crashes
with supertuxkart in libdrm due to assertion failures.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c1d14c6817)
[Emil Velikov: resolve trivial conflict]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

Conflicts:
	src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
2016-01-08 12:05:27 +02:00
Kenneth Graunke
f4977656c1 nvc0: Set winding order regardless of domain.
Quads need to respect winding order, too - not just triangles.

Fixes rendering in GFXBench 4.0's tessellation benchmark.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 65d3f85eb3)
2016-01-08 12:05:27 +02:00
Kenneth Graunke
b2352d7585 glsl: Fix varying struct locations when varying packing is disabled.
varying_matches::record tries to compute the number of components in
each varying, which varying_matches::assign_locations uses to assign
locations.  With varying packing, it uses glsl_type::component_slots()
to come up with a reasonable value.

Without varying packing, it fell back to an open-coded computation
that didn't bother to handle structs at all.  I believe we can simply
use 4 * glsl_type::count_attribute_slots(false), which already handles
these cases correctly.

Partially fixes rendering in GFXBench 4.0's tessellation benchmark.
(NVE0 is almost right after this, but i965 is still mostly garbage.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7cdc2b9ca0)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/glsl/link_varyings.cpp
2016-01-08 12:05:27 +02:00
Dave Airlie
6c69561068 glsl: only update doubles inputs for vertex inputs.
This doesn't apply to other stages. This is only
used in the mesa/st code, which needs further fixes.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 1fc39dae22)
2016-01-08 12:05:27 +02:00
Dave Airlie
343fc2c3a3 glsl: fix count_attribute_slots to allow for different 64-bit handling
So vertex shader input attributes are handled different than internal
varyings between shader stages, dvec3 and dvec4 only count as
one slot for vertex attributes, but for internal varyings, they
count as 2.

This patch comments all the uses of this API to clarify what we
pass in, except one which needs further investigation

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 5dc22cadb5)
2016-01-08 12:05:26 +02:00
Dave Airlie
703ad59748 glsl/fp64: add helper for dual slot double detection.
The old function didn't work for matrices, and we need this
in other places to fix some other problems, so move to a helper
in glsl type and fix the one user so far.

A dual slot double is one that has 3 or 4 components in it's
base type.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit d97b060e6f)
2016-01-08 12:05:26 +02:00
Dave Airlie
7aab081f98 glsl: pass stage into mark function
Don't use a bool here, as for some 64-bit fixes we need
the stage.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 9fbcd8e847)
2016-01-08 12:05:26 +02:00
Kenneth Graunke
73844f6edf drirc: Disable ARB_blend_func_extended for Heaven 4.0/Valley 1.0.
Unigine Heaven 4.0 and Valley 1.0 use dual color blending but don't
specify which fragment shader output is which, so there's at best a
50/50 chance of us guessing it correctly.  This is invalid.

Unigine fixed this in 4.1 and 1.1 versions over a year and a half ago,
but hasn't actually released them for whatever reason.  So, add the
workaround back so that it works for most people.

Fixes Heaven 4.0/Valley 1.0 rendering on Ivybridge.  For whatever
reason, Broadwell worked.  4.1 and 1.1 have always worked.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 4acf71c89b)
2016-01-08 12:05:26 +02:00
Ilia Mirkin
1415b6b0ae nv50/ir: float(s32 & 0xff) = float(u8), not s8
Make sure to make conversion unsigned when we're ANDing the high bits
away. Fixes corruption in dolphin.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 724134f683)
2016-01-08 12:05:26 +02:00
Grazvydas Ignotas
e468d4b1d6 r600: fix constant buffer size programming
When buffer size is less than 16, zero ends up being programmed as
size, which prevents the hardware from fetching the correct values.
Fix it by combining shift and align so that the value is always
rounded up.

Cc: "11.1 11.0 10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92229
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit da0e216e06)
2016-01-08 12:05:26 +02:00
Ilia Mirkin
b95dc1a5c8 nvc0: don't forget to reset VTX_TMP bufctx slot after blit completion
Also release the scratch allocation if any.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 109c348284)
2016-01-08 12:05:26 +02:00
Nicolai Hähnle
a8a7af68e8 gallium/radeon: fix regression in a number of driver queries
This rather silly mistake was introduced by commit 01910676.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit ea8c0b16ec)
2016-01-08 12:05:26 +02:00
Ilia Mirkin
56318a9899 glx/dri3: a drawable might not be bound at wait time
A trace of Alien Isolation hit this on nouveau.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f7b7145123)
2016-01-08 12:05:26 +02:00
Kenneth Graunke
28680b36e8 ralloc: Fix ralloc_adopt() to the old context's last child's parent.
I was cleverly using one iteration to obtain a pointer to the last item
in ralloc's singly list child list, while also setting parents.

Unfortunately, I forgot to set the parent on that last item.

Cc: "11.1 11.0 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 14193e4643)
2016-01-08 12:05:26 +02:00
Eric Anholt
96e931fea1 vc4: Keep sample mask writes from being reordered after TLB writes
Fixes a regression I noticed after introducing scheduling on the QIR.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 960f48809f)
2016-01-08 12:05:26 +02:00
Rob Herring
48b580d1cc freedreno/ir3: fix 32-bit builds with pointer-to-int-cast error enabled
Android builds with -Werror=pointer-to-int-cast causing an error on 32-bit
builds.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
(cherry picked from commit b201a6ed9f)
2016-01-08 12:05:26 +02:00
Nicolai Hähnle
d4d2315d65 gallium/radeon: only dispose locally created target machine in radeon_llvm_compile
Unify the cleanup paths of the function rather than duplicating code.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 0a6a17b9d7)
2016-01-08 12:05:26 +02:00
Miklós Máté
ca30800dfd mesa: Don't leak ATIfs instructions in DeleteFragmentShader
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7279453da5)
2016-01-08 12:05:25 +02:00
Oded Gabbay
04f01a844a configura.ac: fix test for SSE4.1 assembler support
This patch modifies the SSE4.1 test in configure.ac to use a global
variable to initialize vector variables. In addition, we now return the
value of the computation instead of 0.

This is done so gcc 4.9 (and lower) won't optimize the SSE4.1 assembly
instructions (when using -O1 and higher), because then the configure test
might incorrectly pass even though the assembler doesn't support the
SSE4.1 instructions (the test will pass because the compiler does support the intrinsics).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91806
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 6e44bbe0f5)
2016-01-08 12:05:25 +02:00
Jonathan Gray
6bb88aa04b configure: check for python2.7 for PYTHON2
Check for a 'python2.7' binary, 'python' and 'python2' are not
provided by the OpenBSD python 2.7.x packages.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 4ef44bb484)
2016-01-08 12:05:25 +02:00
Jonathan Gray
a1843802f8 configure.ac: use pkg-config for libelf
Use PKG_CHECK_MODULES to get the flags to link libelf

v2: keep AC_CHECK_LIB as a fallback for elfutils provided
libelf that doesn't install a pkg-config file.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 7f585a6a98)
2016-01-08 12:05:25 +02:00
Samuel Pitoiset
259734ae3f nv50: free memory allocated by the prog which reads MP perf counters
This fixes a memory leak introduced in 6a9c151
("nv50: add compute-related MP perf counters on G84+")

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 695ae816da)
2016-01-08 12:05:25 +02:00
Samuel Pitoiset
10ec6685d0 nv50,nvc0: free memory allocated by performance metrics
The destroy_query() helper was actually never called. This fixes
a memory leak while monitoring performance metrics.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit aeee7f2a4d)
2016-01-08 12:05:25 +02:00
Samuel Pitoiset
89d585357a nvc0: free memory allocated by the prog which reads MP perf counters
This fixes a long time ago memory leak (even before all my query
related changes).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9aca60bfb0)
2016-01-08 12:05:25 +02:00
Neil Roberts
bdf289e7fb i965: Fix crash when calling glViewport with no surface bound
If EGL_KHR_surfaceless_context is used then glViewport can be called
with NULL for the draw and read surfaces. This was previously causing
a crash because the i965 driver tries to use this point to invalidate
the surfaces and it was derferencing the NULL pointer.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93257
Cc: Nanley Chery <nanley.g.chery@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
(cherry picked from commit 8c5310da9d)
2016-01-08 12:05:25 +02:00
Eric Anholt
ab0b96a939 vc4: Warn instead of abort()ing on exec ioctl failures.
It's really harsh to abort() the X Server because of a momentary failure
(particularly -ENOMEM).  I don't see a way to pass an -ENOMEM up the stack
from here, but we can at least log to stderr before proceeding on.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 02bcb443ee)
2016-01-08 12:05:25 +02:00
Ian Romanick
d4474a96df meta/generate_mipmap: Work-around GLES 1.x problem with GL_DRAW_FRAMEBUFFER
GL_DRAW_FRAMEBUFFER does not exist in OpenGL ES 1.x, and since
_mesa_meta_begin hasn't been called yet, we have to work-around API
difficulties.  The whole reason that GL_DRAW_FRAMEBUFFER is used instead
of GL_FRAMEBUFFER is that the read framebuffer may be different.  This
is moot in OpenGL ES 1.x.

I have another patch series that would also fix this (by removing the
calls to _mesa_BindFramebuffer and friends), but it's not quite ready
yet... and I think it may be a bit heavy for some stable branches.
Consider this a stop-gap fix.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93215
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 96dc732ed8)
2016-01-08 12:05:25 +02:00
Ilia Mirkin
66e0a26b56 glsl: assign varying locations to tess shaders when doing SSO
GRID Autosport uses SSO shaders. When a tessellation evaluation shader
is passed through this, it triggers assertion failures down the line
with unassigned varying locations. Make sure to do this when the first
shader in the pipeline is not a vertex shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit eca8f38dcf)
2016-01-08 12:05:25 +02:00
Emil Velikov
242146977f cherry-ignore: don't pick a specific i965 formats patch
commit 839793680f "MESA_FORMAT_B8G8R8X8_SRGB for RGB visuals" causes a
handfull of regressions, some of which listed in fdo#92759.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-08 12:05:01 +02:00
Neil Roberts
a98bc49997 i965: Add B8G8R8X8_SRGB to the alpha format override
brw_init_surface_formats overrides the render format for RGBX formats
which aren't supported for rendering so that they internally use RGBA
instead. However, B8G8R8X8_SRGB was missing so it wasn't marked as a
renderable format. This patch just adds it.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 43f4be5f06)
2016-01-08 11:54:09 +02:00
Neil Roberts
0193dc6a42 i965: Add MESA_FORMAT_B8G8R8X8_SRGB to brw_format_for_mesa_format
This will be used in a subsequent patch as the format for RGB visuals.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit c769efda93)
2016-01-08 11:54:09 +02:00
Ilia Mirkin
55d9c80d21 gk104/ir: simplify and fool-proof texbar algorithm
With the current algorithm, we only look at tex uses. However there's a
write-after-write hazard where we might decide to, on some path, not use
a texture's output at all, but instead to write a different value to
that register. However without the barrier, the texture might complete
later and overwrite that value.

This fixes Unreal Elemental demo on GK110/GK208, flightgear on GK10x,
and likely other random-looking failures.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7752bbc44e)
2016-01-08 11:54:09 +02:00
Ilia Mirkin
070dbfa810 nv50/ir: can't have predication and immediates
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6aca7fecb7)
2016-01-08 11:54:09 +02:00
Marek Olšák
ceb00fb1b9 gallium/radeon: fix Hyper-Z hangs by programming PA_SC_MODE_CNTL_1 correctly
This is the recommended setting according to hw people and it makes Hyper-Z
stable. Just the two magic states.

This fixes Evergreen, Cayman, SI, CI, VI (using the Cayman code).

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d3c08309ab)
2016-01-08 11:54:09 +02:00
Marek Olšák
408fcfedee radeonsi: apply the streamout workaround to Fiji as well
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 787ada6bf6)
2016-01-08 11:54:08 +02:00
Marek Olšák
f9dd8468b1 radeonsi: don't call of u_prims_for_vertices for patches and rectangles
Both caused a crash due to a division by zero in that function.
This is an alternative fix.

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
(cherry picked from commit 0f9519b938)
2016-01-08 11:54:08 +02:00
Marek Olšák
2b526fc2d9 r600g: write all MRTs only if there is exactly one output (fixes a hang)
This fixes a hang in
piglit/arb_blend_func_extended-fbo-extended-blend-pattern_gles2 on REDWOOD.

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b5b87c4ed1)
2016-01-08 11:54:08 +02:00
Marek Olšák
4974996545 tgsi/scan: add flag colors_written
This is a prerequisite for the following r600g fix.

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit eb4813a952)
2016-01-08 11:54:08 +02:00
Emil Velikov
a1ca42a9ee cherry-ignore: drop the "re-enable" DCC on Stoney
As per Marek's request of the mesa-stable ML.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-08 11:53:00 +02:00
Dave Airlie
fce9699bca mesa/shader: return correct attribute location for double matrix arrays
If we have a dmat2[4], then dmat2[0] is at 17, dmat2[1] at 19,
dmat2[2] at 21 etc. The old code was returning 17,18,19.

I think this code is also wrong for float matricies as well.

There is now a piglit for the float case.

This partly fixes:
GL41-CTS.vertex_attrib_64bit.limits_test

[airlied: update with Tapani suggestion to clean it up].

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 18ad641c3b)
2016-01-07 12:41:23 +02:00
Patrick Rudolph
86100d4ca9 gallium/util: return correct number of bound vertex buffers
In case a state tracker unbinds every slot by a seperate
pipe->set_vertex_buffers() call, starting from slot zero, the number
of bound buffers would not reach zero at all.
The current algorithm does not account for pre-existing holes in the
buffer list.

Unbinding all buffers at once or starting at the top-most slot results
in correct behaviour.

Calculating the correct number of bound buffers fixes a NULL pointer
dereference in nvc0_validate_vertex_buffers_shared().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93004
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 79bff488bc)
2016-01-07 12:40:14 +02:00
Dave Airlie
c054b2dd33 mesa/varray: set double arrays to non-normalised.
Doesn't have any effect in practice I don't think, but
CTS reads back using GetVertexAttrib.

This fixes: GL41-CTS.vertex_attrib_64bit.get_vertex_attrib

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 21abaad8fe)
2016-01-07 12:39:37 +02:00
Patrick Rudolph
47fa06c839 nv50,nvc0: fix use-after-free when vertex buffers are unbound
Always reset the vertex bufctx to make sure there's no pointer to
an already freed pipe_resource left after unbinding buffers.
Fixes use after free crash in nvc0_bufctx_fence().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93004
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
[imirkin: simplify nvc0 fix, apply to nv50]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>

(cherry picked from commit 432a798cf5)
2016-01-07 12:25:19 +02:00
Emil Velikov
525f3c2c28 docs: add sha256 checksums for 11.0.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-15 14:59:56 +00:00
Emil Velikov
5a616125ac docs: Update 11.1.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-15 14:49:25 +00:00
Emil Velikov
a8b2698494 Update version to 11.1.0(final)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-14 12:20:18 +00:00
Francisco Jerez
7753691f1a i965: Resolve color and flush for all active shader images in intel_update_state().
Fixes arb_shader_image_load_store/execution/load-from-cleared-image.shader_test.

Couldn't reproduce any significant FPS regression in CPU-bound
benchmarks from the Finnish benchmarking system on neither VLV nor BSW
after 30 runs with 95% confidence level.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92849
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
(cherry picked from commit 595c818071)
2015-12-12 19:39:03 +00:00
Dave Airlie
ce914d941d radeonsi: handle loading doubles as geometry shader inputs.
This adds the double code to the geometry shader input handling.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e307cfa7d9)
2015-12-12 19:39:03 +00:00
Dave Airlie
300f807649 radeonsi: handle doubles in lds load path.
This handles loading doubles from LDS properly.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.fedoraproject.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8c9e40ac22)
2015-12-12 19:39:03 +00:00
Dave Airlie
61a275b789 r600: handle geometry dynamic input array index
This fixes:
glsl-1.50/execution/geometry/dynamic_input_array_index.shader_test
my profanity.

We need to load the AR register with the value from the index reg

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit cce3864046)
2015-12-12 19:39:03 +00:00
Dave Airlie
0f3892ed9d r600g: fix geom shader input indirect indexing.
This fixes:
gs-input-array-vec4-index-rd

The others run out of gprs unfortunately.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 38542921c7)
2015-12-12 19:39:03 +00:00
Dave Airlie
3d942ee4e5 r600/shader: add utility functions to do single slot arithmatic
These utilities are to be used to do things like integer adds and
multiplies to be used in calculating the LDS offsets etc.

It handles CAYMAN MULLO differences as well.

Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 0696ebc899)
2015-12-12 19:39:03 +00:00
Dave Airlie
efdf841238 r600/shader: split address get out to a function.
This will be used in the tess shaders.

Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4d64459a92)
2015-12-12 19:39:02 +00:00
Dave Airlie
5913a8c9ec r600g: fix outputing to non-0 buffers for stream 0.
This fixes:
arb_transform_feedback3-ext_interleaved_two_bufs_gs
arb_transform_feedback3-ext_interleaved_two_bufs_gs_max
transform-feedback-builtins

If we are only emitting one ring, then emit all output
buffers on it.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e97ac006d7)
[Emil Velikov: squash trivial conflicts]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

Conflicts:
	src/gallium/drivers/r600/r600_shader.c
2015-12-12 19:39:02 +00:00
Ilia Mirkin
3c9e76fc24 nv50/ir: fix cutoff for using r63 vs r127 when replacing zero
The only effect here is a space savings - 822 programs in shader-db
affected with the following overall change:

total bytes used in shared programs   : 44154976 -> 44139880 (-0.03%)

Fixes: 641eda0c (nv50/ir: r63 is only 0 if we are using less than 63 registers)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f920f8eb02)
2015-12-12 19:39:02 +00:00
Matt Turner
67b1e7b947 glsl: Relax qualifier ordering restriction in ES 3.1.
... and allow the "binding" qualifier in ES 3.1 as well.

GLSL ES 3.1 incorporates only a few features from the extension
ARB_shading_language_420pack: the relaxed qualifier ordering
requirements and the binding qualifier.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit eca846e7ae)
2015-12-12 19:39:02 +00:00
Matt Turner
0586c5844f glsl: Use has_420pack().
These features would not have been enabled with #version 420 otherwise.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 79da7220db)
2015-12-12 19:39:02 +00:00
Matt Turner
7d226ee279 glsl: Allow binding of image variables with 420pack.
This interaction was missed in the addition of ARB_image_load_store.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93266
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit c200e606f7)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
36ff210d0e i965/nir: Remove unused indirect handling
The one and only place where the FS backend allows reladdr is on uniforms.
For locals, inputs, and outputs, we lower it away before the backend ever
sees it.  This commit gets rid of the dead indirect handling code.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 22c273de2b)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
017f4755fd i965/state: Get rid of dword_pitch arguments to buffer functions
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit abb569ca18)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
61cb4db868 i965/vec4: Use a stride of 1 and byte offsets for UBOs
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92909
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 05bdc21f84)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
34785fb7b9 i965/fs: Use a stride of 1 and byte offsets for UBOs
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 13ad8d03f2)
2015-12-12 19:39:02 +00:00
Jason Ekstrand
22d6bf5078 i965/vec4: Use byte offsets for UBO pulls on Sandy Bridge
Previously, the VS_OPCODE_PULL_CONSTANT_LOAD opcode operated on
vec4-aligned byte offsets on Iron Lake and below and worked in terms of
vec4 offsets on Sandy Bridge.  On Ivy Bridge, we add a new *LOAD_GEN7
variant which works in terms of vec4s.  We're about to change the GEN7
version to work in terms of bytes, so this is a nice unification.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit e3e70698c3)
2015-12-12 19:39:02 +00:00
Nicolai Hähnle
9908d19699 radeonsi: last_gfx_fence is a winsys fence
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit d5a5dbd71f)
2015-12-12 19:39:02 +00:00
Ilia Mirkin
a500109aad gk110/ir: fix imad sat/hi flag emission for immediate args
According to nvdisasm both the immediate and non-imm cases use the same
bits. Both of these flags are quite rarely set though.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 1d708aacb7)
2015-12-12 19:39:01 +00:00
Ilia Mirkin
0e78a67709 gk104/ir: sampler doesn't matter for txf
We actually leave the sampler unset for OP_TXF, which caused the GK104+
logic to treat some texel fetches as indirect. While this works, it's
incredibly wasteful. This only happened when the texture was > 0 (since
sampler remained == 0).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 63b850403c)
2015-12-12 19:39:01 +00:00
Marek Olšák
4bb16d712a radeonsi: disable DCC on Stoney
Cc: 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 32f05fadbb)
2015-12-12 19:39:01 +00:00
Christian König
950e9886d0 st/va: disable MPEG4 by default v2
The workarounds are too hacky to enable them by default
and otherwise MPEG4 doesn't work reliably.

v2: add docs/envvars.html, CC stable and fix typos

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Cc: "11.1.0" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a2c5200a4b)
2015-12-12 19:39:01 +00:00
Ilia Mirkin
dff89432d8 gk110/ir: fix imul hi emission with limm arg
The elemental demo hits this case.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit db072d2086)
2015-12-12 19:39:01 +00:00
Timothy Arceri
499d409a20 mesa: move pipeline input/output validation inside _mesa_validate_program_pipeline()
This allows validation to be done on rendering calls also.

Fixes 3 dEQP-GLES31.functional.separate tests.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 4dd096d741)
2015-12-12 19:39:01 +00:00
Timothy Arceri
a16f5195ef glsl: don't generate extra errors in ValidateProgramPipeline
From Section 11.1.3.11 (Validation) of the GLES 3.1 spec:

   "An INVALID_OPERATION error is generated by any command that trans-
   fers vertices to the GL or launches compute work if the current set
   of active program objects cannot be executed, for reasons including:"

It then goes on to list the rules we validate in the
_mesa_validate_program_pipeline() function.

For ValidateProgramPipeline the only mention of generating an error is:

   "An INVALID_OPERATION error is generated if pipeline is not a name re-
   turned from a previous call to GenProgramPipelines or if such a name has
   since been deleted by DeleteProgramPipelines,"

Which we handle separately.

This fixes:
ES31-CTS.sepshaderobjs.PipelineApi

No regressions on the eEQP 3.1 tests.

Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit c3ec12ec3c)
Nominated-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-12 19:39:01 +00:00
Timothy Arceri
f65b790089 glsl: re-validate program pipeline after sampler change
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
https://bugs.freedesktop.org/show_bug.cgi?id=93180
(cherry picked from commit da1a01361b)
2015-12-12 19:39:01 +00:00
Gregory Hainaut
aa19234943 glsl: don't sort varying in separate shader mode
This fixes an issue where the addition of the FLAT qualifier in
varying_matches::record() can break the expected varying order.

It also avoids a future issue with the relaxing of interpolation
qualifier matching constraints in GLSL 4.50.

V2: (by Timothy Arceri)
* reworked comment slightly

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 2ab9cd0c4d)
Nominated-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-12 19:39:01 +00:00
Gregory Hainaut
66f216d8ce glsl: don't dead code remove SSO varyings marked as active
GL_ARB_separate_shader_objects allow matching by name variable or block
interface. Input varyings can't be removed because it is will impact the
location assignment.

This fixes the bug 79783 and likely any application that uses
GL_ARB_separate_shader_objects extension.

V2 (by Timothy Arceri):
* simplify now that builtins are not set as always active

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
https://bugs.freedesktop.org/show_bug.cgi?id=79783
(cherry picked from commit 8117f46f49)
Nominated-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-12 19:39:01 +00:00
Gregory Hainaut
4d34038ae5 glsl: add always_active_io attribute to ir_variable
The value will be set in separate-shader program when an input/output
must remains active. e.g. when deadcode removal isn't allowed because
it will create interface location/name-matching mismatch.

v3:
* Rename the attribute
* Use ir_variable directly instead of ir_variable_refcount_visitor
* Move the foreach IR code in the linker file

v4:
* Fix variable name in assert

v5 (by Timothy Arceri):
* Rename functions and reword comments
* Don't set always active on builtins

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 618612f867)
Nominated-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-12 19:39:01 +00:00
Timothy Arceri
781a68555d glsl: copy how_declared when lowering interface blocks
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 76c09c1792)
2015-12-12 19:39:01 +00:00
Marek Olšák
e0b11bcc87 radeonsi: fix occlusion queries on Fiji
Tested.

(cherry picked from commit bfc14796b0)
2015-12-12 19:39:01 +00:00
Matt Turner
359679cb33 i965: Pass brw_context pointer, not gl_context pointer.
Fixes a warning introduced by commit dcadd855.

(cherry picked from commit f1b7fefd4e)
2015-12-12 19:39:00 +00:00
Marta Lofstedt
fcf6091521 gles2: Update gl2ext.h to revision: 32120
This is needed to be able to implement the accepted OES
extensions.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 1d5b88e33b)
2015-12-12 19:38:39 +00:00
Emil Velikov
aa5082b135 Revert "cherry-ignore: ignore unneeded header update"
This reverts commit 79f3aaca4f.

The commit (header update) was not needed for the 11.0 branch as opposed
to this one (11.1)
2015-12-12 19:38:39 +00:00
Eric Anholt
1df00e17d3 vc4: When doing algebraic optimization into a MOV, use the right MOV.
If there were src unpacks, changing to the integer MOV instead of float
(for example) would change the unpack operation.

(cherry picked from commit e3efc4b023)
2015-12-11 17:04:11 -08:00
Eric Anholt
ad3df9d168 vc4: Fix handling of src packs on in qir_follow_movs().
The caller isn't going to expect it from a return, so it would probably
get misinterpreted.  If the caller had an unpack in its reg, that's fine,
but don't lose track of it.

(cherry picked from commit 2591beef89)
2015-12-11 17:04:08 -08:00
Eric Anholt
e4cf550501 vc4: Add missing progress note in opt_algebraic.
(cherry picked from commit b70a2f4d81)
2015-12-11 17:04:00 -08:00
Eric Anholt
ecf2885d7f vc4: Fix handling of sample_mask output.
I apparently broke this in a late refactor, in such a way that I decided
its tests were some of those interminable ones that I should just
blacklist from my testing.  As a result, the refactors related to it were
totally wrong.

(cherry picked from commit 53b2523c6e)
2015-12-11 17:03:51 -08:00
Eric Anholt
fc59ca4064 vc4: Enable MSAA.
We still have several failures in the newly enabled tests in simulation:
sRGB downsampling is done as if it was just linear, stencil blits are not
supported on MSAA either, and derivatives are still not supported
(breaking some MSAA simulation shaders).  So, other than sRGB downsampling
quality, things seem to be in good shape.

(cherry picked from commit f61ceeb3fd)
2015-12-11 17:03:44 -08:00
Eric Anholt
396fbdc721 vc4: Add support for mapping of MSAA resources.
The pipe_transfer_map API requires that we do an implicit
downsample/upsample and return a mapping of that.

(cherry picked from commit fc4a1bfb88)
2015-12-11 17:03:40 -08:00
Eric Anholt
50ac2100df vc4: Add support for texel fetches from MSAA resources.
This is the core of ARB_texture_multisample.  Most of the piglit tests for
GL_ARB_texture_multisample require GL 3.0, but exposing support for this
lets us use the gallium blitter for multisample resolves.  We can
sometimes multisample resolve using just the RCL, but that requires that
the blit is 1:1, unflipped, and aligned to tile boundaries.

(cherry picked from commit 6b4dfd53ae)
2015-12-11 17:03:36 -08:00
Eric Anholt
08cf0f8529 vc4: Add support for multisample framebuffer operations.
This includes GL_SAMPLE_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, and
GL_SAMPLE_ALPHA_TO_COVAGE.

I haven't implemented a dithering function yet, and gallium doesn't give
me a good chance to do so for GL_SAMPLE_COVERAGE.

(cherry picked from commit a97b40dca4)
2015-12-11 17:03:31 -08:00
Eric Anholt
ba51596b1d vc4: Add a workaround for HW-2905, and additional failure I saw with MSAA.
I only stumbled on this while experimenting due to reading about HW-2905.
I don't know if the EZ disable in the Z-clear is actually necessary, but
go with it for now.

(cherry picked from commit edc3305de7)
2015-12-11 17:03:03 -08:00
Eric Anholt
3d13bb8851 vc4: Add support for drawing in MSAA.
(cherry picked from commit edfd4d853a)
2015-12-11 17:03:03 -08:00
Eric Anholt
3bf2c6b96a vc4: Add kernel RCL support for MSAA rendering.
(cherry picked from commit e7c8ad0a6c)
2015-12-11 17:03:03 -08:00
Eric Anholt
5ab1bb4bec vc4: Rename color_ms_write to color_write.
I was thinking this was the only MSAA resolve thing, so it should be noted
separately, but actually load/store general also do MSAA resolve.

(cherry picked from commit 568d3a8e32)
2015-12-11 17:03:03 -08:00
Eric Anholt
c5ca18ec2f vc4: Allow RCL blits to the edge of the surface.
The recent unaligned fix successfully prevented RCL blits that weren't
aligned inside of the surface, but we also want to be able to do RCL blits
for the whole surface when the width or height of the surface aren't
aligned (we don't care what renders inside of the padding).

(cherry picked from commit bf92017ace)
2015-12-11 17:03:03 -08:00
Eric Anholt
f6cca7a0c9 vc4: Fix check for tile RCL blits with mismatched y.
This was a typo in 3a508a0d94 that didn't
show up in testcases at that moment.

(cherry picked from commit 2792d118f1)
2015-12-11 17:03:03 -08:00
Eric Anholt
ae649bf1ad vc4: Fix compiler warning from size_t change.
I missed this when bringing over the kernel changes.

(cherry picked from commit 1529f138ff)
2015-12-11 17:03:03 -08:00
Eric Anholt
132303cfe4 vc4: Fix accidental scissoring when scissor is disabled.
Even if the rasterizer has scissor disabled, we'll have whatever
vc4->scissor bounds were last set when someone set up a scissor, so we
shouldn't clip to them in that case.

Fixes piglit fbo-blit-rect, and a lot of MSAA tests once they're enabled.

(cherry picked from commit a4eff86f4a)
2015-12-11 17:03:03 -08:00
Eric Anholt
9df2431194 vc4: Disable RCL blitting when scissors are enabled.
We could potentially handle scissored blits when they're tile aligned, but
it doesn't seem worth it.  If you're doing a scissored blit, you're
probably a testcase.

Fixes piglit's fbo-scissor-blit fbo

(cherry picked from commit d16d666776)
2015-12-11 17:03:03 -08:00
Eric Anholt
dd409e2a41 vc4: Bring over cleanups from submitting to the kernel.
(cherry picked from commit 0afe83078d)
2015-12-11 17:03:03 -08:00
Eric Anholt
38c770ec29 vc4: Add debug dumping of MSAA surfaces.
(cherry picked from commit a69ac4e89c)
2015-12-11 17:03:03 -08:00
Eric Anholt
d8450616d9 vc4: Add support for laying out MSAA resources.
For MSAA, we store full resolution tile buffer contents, which have their
own tiling format.  Since they're full resolution buffers, we have to
align their size to full tiles.

(cherry picked from commit 3c3b1184eb)
2015-12-11 17:03:02 -08:00
Eric Anholt
c9fe9e4b42 vc4: Add support for storing sample mask.
From the API perspective, writing 1 bits can't turn on pixels that were
off, so we AND it with the sample mask from the payload.

(cherry picked from commit 74c4b3b80c)
2015-12-11 17:03:02 -08:00
Eric Anholt
693e938321 vc4: Fix up tile alignment checks for blitting using just an RCL.
We were checking that the blit started at 0 and was 1:1, but not that it
went to the full width of the surface, or that the width was aligned to a
tile.  We then told it to blit to the full width/height of the surface,
causing contents to be stomped in a bunch of MSAA tests that happen to
include half-screen-width blits to 0,0.

(cherry picked from commit 3a508a0d94)
2015-12-11 17:03:02 -08:00
Eric Anholt
7a0661839b vc4: Add support for loading sample mask.
(cherry picked from commit a664233042)
2015-12-11 17:03:02 -08:00
Eric Anholt
4c234d183b vc4: Use nir_channel() to simplify all of our nir_swizzle() cases.
(cherry picked from commit 4cff16bc3a)
2015-12-11 17:03:02 -08:00
Eric Anholt
b37189523e vc4: Fix point size lookup.
I think I may have regressed this in the NIR conversion.  TGSI-to-NIR is
putting the PSIZ in the .x channel, not .w, so we were grabbing some
garbage for point size, which ended up meaning just not drawing points.

Fixes glean pointAtten and pointsprite.

(cherry picked from commit 81544f231a)
2015-12-11 16:57:39 -08:00
Emil Velikov
20db46c227 Update version to 11.1.0-rc3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-07 13:50:15 +00:00
Michel Dänzer
b2a5efb56f radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now
llvm.exp2.f32 doesn't work in some cases yet.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit d094631936)
2015-12-04 16:37:19 +00:00
Connor Abbott
38c645b60a i965: fix 64-bit immediates in brw_inst(_set)_bits
If we tried to get/set something that was exactly 64 bits, we would
try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
1's like we want.

v2 (Iago)
 - Replace ~0 by ~0ull
 - Removed unnecessary parenthesis

v3 (Kristian)
 - Avoid the conditional

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
(cherry picked from commit b1a83b5d1b)

Squashed with commit

i965: Use ull immediates in brw_inst_bits

This fixes a regression introduced in b1a83b5d1 that caused basically all
shaders to fail to compile on 32-bit platforms.

Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 9d703de85a)
Nominated-by: Ian Romanick <ian.d.romanick@intel.com>
2015-12-04 16:37:07 +00:00
Emil Velikov
2dff4c6fa7 mesa: rework the meaning of gl_debug_message::length
Currently it stores strlen(buf) whenever the user originally provided a
negative value for length.

Although I've not seen any explicit text in the spec, CTS requires that
the very same length (be that negative value or not) is returned back on
Pop.

So let's push down the length < 0 checks, tweak the meaning of
gl_debug_message::length and fix GetDebugMessageLog to add and count the
null terminators, as required by the spec.

v2: return correct total length in GetDebugMessageLog
v3: rebase (drop _mesa_shader_debug hunk).

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 5a23f6bd8d)
2015-12-04 16:37:07 +00:00
Emil Velikov
d81ddb3ed8 mesa: errors: validate the length of null terminated string
We're about to rework the meaning of gl_debug_message::length to only
store the user provided data. Thus we should add an explicit validation
for null terminated strings.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 622186fbdf)
2015-12-04 16:37:07 +00:00
Emil Velikov
c25c1dbf51 mesa: accept TYPE_PUSH/POP_GROUP with glDebugMessageInsert
These new (relative to ARB_debug_output) tokens, have been explicitly
separated from the existing ones in the spec text. With the reference
to glDebugMessageInsert was dropped.

At the same time, further down the spec says:
   "The value of <type> must be one of the values from Table 5.4"

... and these two are listed in Table 5.4.

The GL 4.3 and GLES 3.2 do not give any hints on the former
'definition', plus CTS requires that the tokens are valid values for
glDebugMessageInsert.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 66fea8bd96)
2015-12-04 16:37:07 +00:00
Emil Velikov
bed982c4b7 mesa: add SEVERITY_NOTIFICATION to default state
As per the spec quote:

    "All messages are initially enabled unless their assigned severity
    is DEBUG_SEVERITY_LOW"

We already had MEDIUM and HIGH set, let's toggle NOTIFICATION as well.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 53be28107b)
2015-12-04 16:37:06 +00:00
Emil Velikov
dcaf3989d1 mesa: return the correct value for GroupStackDepth
We already have one group (the default) as specified in the spec. So
lets return its size, rather than the index of the current group.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 078dd6a0b4)
2015-12-04 16:37:06 +00:00
Emil Velikov
996a4958da mesa: rename GroupStackDepth to CurrentGroup
The variable is used as the actual index, rather than the size of the
group stack - rename it to reflect that.

Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit f39954bf7c)
2015-12-04 16:37:06 +00:00
Emil Velikov
0cf5a8159f mesa: do not enable KHR_debug for ES 1.0
The extension requires (cough implements) GetPointervKHR (alias of
GetPointerv) which in itself is available for ES 1.1 enabled mesa.

Anyone willing to fish around and implement it for ES 1.0 is more than
welcome to revert this commit. Until then lets restrict things.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93048
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit 1ca735701b)
2015-12-04 16:37:06 +00:00
Emil Velikov
6cc9a53d84 glapi: add GetPointervKHR to the ES dispatch
The KHR_debug extension implements this.

Strictly speaking it could be used with ES 1.0, although as the original
function is available on ES 1.1, I'm inclined to lift the KHR_debug
requirement to ES 1.1.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93048
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit f53f9eb8d4)

Squashed with commit

mesa/tests: add KHR_debug GLES glGetPointervKHR entry points

Should have been part of commit f53f9eb8d4 "glapi: add GetPointervKHR
to the ES dispatch".

v2: comment out the ES1.1 symbol and use the same description (pattern)
as elsewhere (Matt)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93235
Fixes: f53f9eb8d4 "glapi: add GetPointervKHR to the ES dispatch".
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Vinson Lee <vlee@freedesktop.org> (v1)
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 1074e38fbb)
2015-12-04 16:36:45 +00:00
Emil Velikov
0a51e77fa1 mesa: remove len argument from _mesa_shader_debug()
There was only a single user which was using strlen(buf).
As this function is not user facing (i.e. we don't need to feed back
original length via a callback), we can simplify things.

Suggested-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
(cherry picked from commit d37ebed470)
2015-12-04 16:36:45 +00:00
Ilia Mirkin
ca6d0a3dbe nv50/ir: avoid looking at uninitialized srcMods entries
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2b98914fe0)
2015-12-04 16:36:45 +00:00
Ilia Mirkin
4ae9142f8b nv50/ir: fix DCE to not generate 96-bit loads
A situation where there's a 128-bit load where the last component gets
DCE'd causes a 96-bit load to be generated, which no GPU can actually
emit. Avoid generating such instructions by scaling back to 64-bit on
the first load when splitting.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 49692f86a1)
2015-12-04 16:36:45 +00:00
Marek Olšák
aff9f8a6f7 radeonsi: fix Fiji for LLVM <= 3.7
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dd27825c8c)
2015-12-04 16:36:45 +00:00
Nanley Chery
b0b163c82a mesa/version: Update gl_extensions::Version during version override
Commit a16ffb743c, which introduced
gl_extensions::Version, updates the field when the context version
is computed and when entering/exiting meta. Update this field when
the version is overridden as well.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
(cherry picked from commit 808e752796)
2015-12-04 16:36:45 +00:00
Tapani Pälli
f70574c835 i965: use _Shader to get fragment program when updating surface state
Atomic counters and Images were using ctx::Shader that does not take in
to account program pipeline changes, ctx::_Shader must be used for SSO to
work. Commit c0347705 already changed ubo's to use this.

Fixes failures seen with following Piglit test:
	arb_separate_shader_object-atomic-counter

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 231db5869c)
2015-12-04 16:36:45 +00:00
Ilia Mirkin
26dff8a7bb nv50/ir: don't forget to mark flagsDef on cvt in txb lowering
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 101e315cc1)
2015-12-04 16:36:45 +00:00
Ilia Mirkin
ea21336d15 nv50/ir: fix instruction permutation logic
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 06055121e6)
2015-12-04 16:36:44 +00:00
Ilia Mirkin
7f6e9c5f59 nv50/ir: the mad source might not have a defining instruction
For example if it's $r63 (aka 0), there won't be a definition.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 11fcf46590)
2015-12-04 16:36:44 +00:00
Ilia Mirkin
0828391a34 nv50/ir: deal with loops with no breaks
For example if there are only returns, the break bb will not end up part
of the CFG. However there will have been a prebreak already emitted for
it, and when hitting the RET that comes after, we will try to insert the
current (i.e. break) BB into the graph even though it will be
unreachable. This makes the SSA code sad.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit adcc547bfb)
2015-12-04 16:36:44 +00:00
Ilia Mirkin
75b6f14ab8 nvc0/ir: fold postfactor into immediate
SM20-SM50 can't emit a post-factor in the presence of a long immediate.
Make sure to fold it in.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ff61ac4838)
2015-12-04 16:36:44 +00:00
Roland Scheidegger
69df6ac272 mesa: fix VIEWPORT_INDEX_PROVOKING_VERTEX and LAYER_PROVOKING_VERTEX queries
These are implementation-dependent queries, but so far we just returned the
value of whatever the current provoking vertex convention was set to, which
was clearly wrong.
Just make this a variable in the context constants like for other things
which are implementation dependent (I assume all drivers will want to set
this to the same value for both queries), and set it to GL_UNDEFINED_VERTEX
which is correct for everybody (and drivers can override it).

Reviewed-by: Brian Paul <brianp@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 09f74e6ef4)
2015-12-04 16:36:44 +00:00
Dave Airlie
0f53b2010c r600: SMX returns CONTEXT_DONE early workaround
streamout, gs rings bug on certain r600s, requires a wait idle
before each surface sync.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit af4013d26b)
2015-12-04 16:36:44 +00:00
Dave Airlie
67be605b96 r600: do SQ flush ES ring rolling workaround
Need to insert a SQ_NON_EVENT when ever geometry
shaders are enabled.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b63944e8b9)
2015-12-04 16:36:44 +00:00
Tom Stellard
be20f1d7c1 clover: Handle NULL devices returned by pipe_loader_probe() v2
When probing for devices, clover will call pipe_loader_probe() twice.
The first time to retrieve the number of devices, and then second time
to retrieve the device structures.

We currently assume that the return value of both calls will be the
same, but this will not be the case if a device happens to disappear
between the two calls.

When a device disappears, the pipe_loader_probe() will add a NULL
device to the device list, so we need to handle this.

v2:
  - Keep range for loop

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>

CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9adbb9e713)
2015-12-04 16:36:44 +00:00
Jonathan Gray
15344c978b automake: fix some occurrences of hardcoded -ldl and -lpthread
Correct some occurrences of -ldl and -lpthread to use
$(DLOPEN_LIBS) and $(PTHREAD_LIBS) respectively.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 99cd600835)
2015-12-04 16:36:44 +00:00
Dave Airlie
f1bb27acc5 r600: workaround empty geom shader.
We need to emit at least one cut/emit in every
geometry shader, the easiest workaround it to
stick a single CUT at the top of each geom shader.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 4f34722575)
2015-12-04 16:36:44 +00:00
Dave Airlie
dd37db0c80 r600: rv670 use at least 16es/gs threads
This is specified in the docs for rv670 to work properly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 04efcc6c7a)
2015-12-04 16:36:43 +00:00
Dave Airlie
8e3fbb90a9 r600: geometry shader gsvs itemsize workaround
On some chips the GSVS itemsize needs to be aligned to a cacheline size.

This only applies to some of the r600 family chips.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 8168dfdd4e)
2015-12-04 16:36:43 +00:00
Emil Velikov
79f3aaca4f cherry-ignore: ignore unneeded header update
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-12-04 16:36:35 +00:00
Julien Isorce
f9a2bd212a vl/buffers: fixes vl_video_buffer_formats for RGBX
Fixes: 42a5e143a8 "vl/buffers: add RGBX and BGRX to the supported formats"
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 10c14919c8)
2015-12-04 16:33:10 +00:00
Emil Velikov
aefd6769e8 Update version to 11.1.0-rc2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-30 00:13:23 +00:00
Neil Roberts
82a363b851 i965: Handle lum, intensity and missing components in the fast clear
It looks like the sampler hardware doesn't take into account the
surface format when sampling a cleared color after a fast clear has
been done. So for example if you clear a GL_RED surface to 1,1,1,1
then the sampling instructions will return 1,1,1,1 instead of 1,0,0,1.
This patch makes it override the color that is programmed in the
surface state in order to swizzle for luminance and intensity as well
as overriding the missing components.

Fixes the ext_framebuffer_multisample-fast-clear Piglit test.

v2: Handle luminance and intensity formats
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
(cherry picked from commit 2010de4015)
2015-11-30 00:13:23 +00:00
Nanley Chery
b3183c81c4 mesa/teximage: Fix S3TC regression due to ASTC interaction
A prior, literal reading of the ASTC spec led to the prohibition
of some compressed formats being used against the targets:
TEXTURE_CUBE_MAP_ARRAY and TEXTURE_3D. Since the spec does not specify
interactions with other extensions for specific compressed textures,
remove such interactions.

Fixes the following Piglit tests on Gen9:
piglit.spec.arb_direct_state_access.getcompressedtextureimage
piglit.spec.arb_get_texture_sub_image.arb_get_texture_sub_image-getcompressed
piglit.spec.arb_texture_cube_map_array.fbo-generatemipmap-cubemap array s3tc_dxt1
piglit.spec.ext_texture_compression_s3tc.getteximage-targets cube_array s3tc

v2. Don't interact with other specific compressed formats (Ian).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91927
Suggested-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit d1212abf50)
2015-11-30 00:13:23 +00:00
Nanley Chery
f5e508649d mesa/extensions: Enable overriding permanently enabled extensions
Provide the ability to prevent any permanently enabled extension
from appearing in the string returned by glGetString[i]().

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 21d43fe51a)
2015-11-30 00:13:23 +00:00
Leo Liu
31546c0e8f radeon/vce: disable Stoney VCE for 11.0
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-30 00:13:23 +00:00
Emil Velikov
6b149bedc3 auxiliary/vl/dri: fd management cleanups
Analogous to previous commit, minus the extra dup. We are the one
opening the device thus we can directly use the fd.

Spotted by Coverity (CID 1339867, 1339877)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 5d294d9fa3)
2015-11-30 00:13:23 +00:00
Emil Velikov
7a4ba7bfad auxiliary/vl/drm: fd management cleanups
Analogous to previous commit.

Spotted by Coverity (CID 1339868)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 151290c154)
2015-11-30 00:13:23 +00:00
Emil Velikov
ef6769f18f st/xa: fd management cleanups
Analogous to previous commit.

Spotted by Coverity (CID 1339866)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit fe71059388)
2015-11-30 00:13:23 +00:00
Emil Velikov
a71db1c46e st/dri: fd management cleanups
Add some checks if the original/dup'd fd is valid and ensure that we
don't leak it on error. The former is implicitly handled within the
pipe_loader, although let's make things explicit and check beforehand.

Spotted by Coverity (CID 1339865)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit d90ba57c08)
2015-11-30 00:13:23 +00:00
Emil Velikov
88cd21fefb pipe-loader: check if winsys.name is non-null prior to strcmp
In theory this wouldn't be an issue, as we'll find the correct name and
break out of the loop before we hit the sentinel.

Let's fix this and avoid issues in the future.

Spotted by Coverity (CID 1339869, 1339870, 1339871)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 5f92906b87)
2015-11-30 00:13:23 +00:00
Ilia Mirkin
97d4954f3f mesa: support GL_RED/GL_RG in ES2 contexts when driver support exists
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93126
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0396eaaf80)
2015-11-30 00:13:23 +00:00
Nicolai Hähnle
3d525c8650 radeon: only suspend queries on flush if they haven't been suspended yet
Non-timer queries are suspended during blits. When the blits end, the queries
are resumed, but this resume operation itself might run out of CS space and
trigger a flush. When this happens, we must prevent a duplicate suspend during
preflush suspend, and we must also prevent a duplicate resume when the CS flush
returns back to the original resume operation.

This fixes a regression that was introduced by:

commit 8a125afa6e
Author: Nicolai Hähnle <nhaehnle@gmail.com>
Date:   Wed Nov 18 18:40:22 2015 +0100

    radeon: ensure that timing/profiling queries are suspended on flush

    The queries_suspended_for_flush flag is redundant because suspended queries
    are not removed from their respective linked list.

    Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Reported-by: Axel Davy <axel.davy@ens.fr>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 9e5e702cfb)
2015-11-30 00:13:22 +00:00
Emil Velikov
9b9fff6830 targets: use the non-inline sw helpers
Previously (with the inline ones) things were embedded into the
pipe-loader, which means that we cannot control/select what we want in
each target.

That also meant that at runtime we ended up with the empty
sw_screen_create() as the GALLIUM_SOFTPIPE/LLVMPIPE were not set.

v2: Cover all the targets, not just dri.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Edward O'Callaghan <edward.ocallaghan@koparo.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
(cherry picked from commit 59cfb21d46)

Squashed with commit

targets/xvmc: use the non-inline sw helpers

This was missed in commit 59cfb21d ("targets: use the non-inline sw
helpers").

Fixes build failure:

  CXXLD    libXvMCgallium.la
../../../../src/gallium/auxiliary/pipe-loader/.libs/libpipe_loader_static.a(libpipe_loader_static_la-pipe_loader_sw.o):(.data.rel.ro+0x0): undefined reference to `sw_screen_create'
collect2: error: ld returned 1 exit status
Makefile:756: recipe for target 'libXvMCgallium.la' failed
make[3]: *** [libXvMCgallium.la] Error 1

Trivial.

(cherry picked from commit 22d2dda03b)
2015-11-30 00:12:58 +00:00
Emil Velikov
3d09bede30 target-hepers: add non inline sw helpers
Feeling rather dirty copying the inline ones, yet we need the inline
ones for swrast only targets like libgl-xlib, osmesa.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Edward O'Callaghan <edward.ocallaghan@koparo.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
(cherry picked from commit fbc6447c3d)
2015-11-29 19:36:56 +00:00
Emil Velikov
aad5c7d1ca pipe-loader: fix off-by one error
With earlier commit we've dropped the manual iteration over the fixed
size array and prepemtively set the variable storing the size, that is
to be returned. Yet we forgot to adjust the comparison, as before we
were comparing the index, now we're comparing the size.

Fixes: ff9cd8a67c "pipe-loader: directly use
pipe_loader_sw_probe_null() at probe time"
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93091
Reported-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

(cherry picked from commit f623517188)
2015-11-29 19:36:55 +00:00
Kenneth Graunke
323161333c i965: Fix scalar vertex shader struct outputs.
While we correctly set output[] for composite varyings, we set completely
bogus values for output_components[], making emit_urb_writes() output
zeros instead of the actual values.

Unfortunately, our simple approach goes out the window, and we need to
recurse into structs to get the proper value of vector_elements for each
field.

Together with the previous patch, this fixes rendering in an upcoming
game from Feral Interactive.

v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).

Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 3810c15614)
2015-11-29 19:36:55 +00:00
Kenneth Graunke
80febef0ad i965: Fix fragment shader struct inputs.
Apparently we have literally no support for FS varying struct inputs.
This is somewhat surprising, given that we've had tests for that very
feature that have been passing for a long time.

Normally, varying packing splits up structures for us, so we don't see
them in the backend.  However, with SSO, varying packing isn't around
to save us, and we get actual structs that we have to handle.

This patch changes fs_visitor::emit_general_interpolation() to work
recursively, properly handling nested structs/arrays/and so on.
(It's easier to read with diff -b, as indentation changes.)

When using the vec4 VS backend, this fixes rendering in an upcoming
game from Feral Interactive.  (The scalar VS backend requires additional
bug fixes in the next patch.)

v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).

Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 3e9003e9cf)
2015-11-29 19:36:55 +00:00
Tom Stellard
cf70584907 radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values
The compiler has more information and is able to optimize the bits
it sets in these registers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 89851a2965)
2015-11-29 19:36:55 +00:00
Tom Stellard
96e1bf8791 radeonsi: Rename si_shader::ls_rsrc{1,2} to si_shader::rsrc{1,2}
In the future, these will be used by other shaders types.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 95e0510916)
2015-11-29 19:36:55 +00:00
Ian Romanick
34521c2840 docs: add missed i965 feature to relnotes
Trivial.  GL_ARB_fragment_layer_viewport support was added in 8c902a58
by Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9b41489cb5)
2015-11-29 17:59:46 +00:00
Timothy Arceri
6a71090002 glsl: implement recent spec update to SSO validation
Enables 200+ dEQP SSO tests to proceed past validation,
and fixes a ES31-CTS.sepshaderobjs.PipelineApi subtest.

V2: split out change that reverts a previous patch into its own commit,
move variable declaration to top of function, and fix some formatting
all suggested by Ian.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2571a768d6)
2015-11-29 17:59:22 +00:00
Timothy Arceri
88fd679706 Revert "mesa: return initial value for VALIDATE_STATUS if pipe not bound"
This reverts commit ba02f7a3b6.

The commit checked whether the pipeline was currently bound instead
of checking whether it had ever been bound.  The previous setting
of Validated during object creation makes this unnecessary.  The
real problem was that Validated was not properly set to false
elsewhere in the code.  This is fixed by a later patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 3c4aa7aff2)
2015-11-29 17:58:57 +00:00
Boyuan Zhang
bb7a1ee11f radeon/uvd: uv pitch separation for stoney
v2: set the behaviour default for future ASICs.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit f55f134a03)
2015-11-29 17:58:34 +00:00
Dave Airlie
7a41162b45 texgetimage: consolidate 1D array handling code.
This should fix the getteximage-depth test that currently asserts.

I was hitting problem with virgl as well in this area.

This moves the 1D array handling code to a single place.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 237bcdbab5)
2015-11-29 17:58:09 +00:00
Ilia Mirkin
5e853a4f01 docs: add missed freedreno features to relnotes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit e4c1221d36)
2015-11-29 17:57:46 +00:00
Ilia Mirkin
2e073938d0 freedreno/a4xx: use a factor of 32767 for snorm8 blending
It appears that the hardware wants the integer to be scaled the same way
that the hardware representation is. snorm16 uses one of the float
factors, so this is only relevant for snorm8.

This fixes a number of subcases of
  bin/fbo-blending-formats GL_EXT_texture_snorm

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 81b16350fa)
2015-11-29 17:57:23 +00:00
Emil Velikov
30e1c390b3 configure.ac: default to disabled dri3 when --disable-dri is set
Not too long ago, the dri3 code was living in src/glx, which in itself
was guarded by HAVE_DRI_GLX. As the name suggests we didn't dive into
the folder when dri was disabled, thus we missed that dri3 does not
consider/honour --enable-dri.

Cc: mesa-stable@lists.freedesktop.org
Fixes: 6bd9ba7d07 "loader: Add dri3 helper"
Cc: Pali Rohár <pali.rohar@gmail.com>
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit b89d1b2ccf)
2015-11-29 17:57:00 +00:00
Emil Velikov
72e51e5dfa loader: unconditionally add AM_CPPFLAGS to libloader_la_CPPFLAGS
It seems that due to the conditional autotools is getting confused and
forgetting to add AM_CPPFLAGS when building libloader (when
HAVE_DRICOMMON is not set).

Cc: mesa-stable@lists.freedesktop.org
Fixes: 5a79e0a8e3 "automake: loader: rework the CPPFLAGS"
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit b9b0a1f58e)
2015-11-29 17:56:36 +00:00
Emil Velikov
902378d6c8 pipe-loader: link against libloader regardless of libdrm presence
Whether or not the loader has libdrm support is up-to it. Anyone using
the loader should just include it whenever they depend on it.

Cc: mesa-stable@lists.freedesktop.org
Fixes: 0f39f9cb7a "pipe-loader: add a dummy 'static' pipe-loader"
Reported-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Tested-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 8a6d476588)
2015-11-29 17:56:13 +00:00
Ilia Mirkin
f6f127b597 nv50/ir: fix (un)spilling of 3-wide results
There is no 96-bit load/store operations, so we have to split it up
into a 32-bit parts, with a split/merge around it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90348
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4deb118d06)
2015-11-29 17:55:51 +00:00
Ilia Mirkin
a2f2329cdd nv50,nvc0: properly handle buffer storage invalidation on dsa buffer
In case that the buffer has no bind at all, assume it can be a regular
buffer. This can happen on buffers created through the ARB_dsa
interfaces.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit ad5f6b03e7)
2015-11-29 17:55:27 +00:00
Ilia Mirkin
642b66291c nouveau: use the buffer usage to determine placement when no binding
With ARB_direct_state_access, buffers can be created without any binding
hints at all. We still need to allocate these buffers to VRAM or GART,
as we don't have logic down the line to place them into GPU-mappable
space. Ideally we'd be able to shift these things around based on usage.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92438
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 079f713754)
2015-11-29 17:55:04 +00:00
Eric Anholt
06c3ed8d21 vc4: Take precedence over ilo when in simulator mode.
They're exclusive at build time, but the ilo entry is always present, so
we'd try to use it and fail out.

v2: Add comment in the code, from Emil.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 1b62a4e885)
2015-11-29 17:54:41 +00:00
Eric Anholt
cfbb08168a vc4: Just put USE_VC4_SIMULATOR in DEFINES.
In the pipe-loader reworks, it was missed in one of the new directories it
was used.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit a39eac80fd)
2015-11-29 17:54:18 +00:00
Igor Gnatenko
43b0b8a9a3 virgl: pipe_virgl_create_screen is not static
Cc: mesa-stable@lists.freedesktop.org
Fixes: 17d3a5f857 "target-helpers: add a non-inline drm_helper.h"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93063
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 05eed0eca7)
2015-11-29 17:53:55 +00:00
Ilia Mirkin
85b6f905e1 freedreno/a4xx: disable blending and alphatest for integer rt0
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 22aeb0c568)
2015-11-29 17:53:31 +00:00
Ilia Mirkin
6a6326dcd4 freedreno/a4xx: fix independent blend
This fixes the ext_draw_buffers2 and arb_draw_buffers_blend tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 4c170d9e1d)
2015-11-29 17:53:08 +00:00
Ilia Mirkin
17a64701cb freedreno/a4xx: fix 3d texture setup
Same fix as on a3xx - set the second (tiny) layer size bitfield to the
smallest level's size so that the hw knows not to minify beyond that.

This fixes texelFetch sampler3D piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 740eb63aa7)
2015-11-29 17:52:43 +00:00
Ilia Mirkin
cb4f6e2a30 freedreno/a4xx: only align slices in non-layer_first textures
When layer is the container, slices are tightly packed inside of each
layer. We don't need any additional alignment. On a3xx, each slice
contains all the layers, so having alignment makes sense.

This fixes a whole slew of array-related piglits, including texelFetch
and tex-miplevel-selection varieties.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit ecb0dcd34c)
2015-11-29 17:48:28 +00:00
Ian Romanick
8c564f0376 meta: Don't save or restore the active client texture
This setting is only used by glTexCoordPointer and related glEnable
calls.  Since the preceeding commits removed all of those, it is not
necessary to save, reset to default, or restore this state.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 47b3a0d235)
2015-11-24 11:36:06 -08:00
Ian Romanick
3d2bf5a5f5 meta: Don't save or restore the VBO binding
Nothing left in meta does anything with the VBO binding, so we don't
need to save or restore it.  The VAO binding is still modified.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit c63f9c735d)
2015-11-24 11:36:06 -08:00
Ian Romanick
d1b7a1f5af meta/TexSubImage: Don't pollute the buffer object namespace
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 58aa56d40b)
2015-11-24 11:36:06 -08:00
Ian Romanick
9c2a7cfbbf meta: Don't pollute the buffer object namespace in _mesa_meta_DrawTex
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 76cfe2bc44)
2015-11-24 11:36:06 -08:00
Ian Romanick
089fa07dee meta: Use internal functions for buffer object and VAO access in _mesa_meta_DrawTex
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit a222d4cbc3)
2015-11-24 11:36:06 -08:00
Ian Romanick
7ebc8c36a0 meta: Track VBO using gl_buffer_object instead of GL API object handle in _mesa_meta_DrawTex
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit b8a7369fb7)
2015-11-24 11:36:06 -08:00
Ian Romanick
79468fac69 meta: Partially convert _mesa_meta_DrawTex to DSA
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit d5225ee5d9)
2015-11-24 11:36:06 -08:00
Ian Romanick
756e323f2c meta: Don't pollute the buffer object namespace in _mesa_meta_setup_vertex_objects
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 37d11b13ce)
2015-11-24 11:36:06 -08:00
Ian Romanick
507732ea3d meta: Use internal functions for buffer object and VAO access
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit b1b73a42c8)
2015-11-24 11:36:06 -08:00
Ian Romanick
01909c1f29 meta: Use DSA functions for VBOs in _mesa_meta_setup_vertex_objects
The fixed-function attribute paths don't get the DSA treatment because
there are no DSA entry-points for fixed-function attributes.  These
could have been added, but this is a temporary patch intended to make
later patches easier to review.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 52921f8e08)
2015-11-24 11:36:06 -08:00
Ian Romanick
76b155c9cd meta: Track VBO using gl_buffer_object instead of GL API object handle
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 1035e00a81)
2015-11-24 11:36:06 -08:00
Ian Romanick
4a5c29d877 meta: Don't leave the VBO bound after _mesa_meta_setup_vertex_objects
Meta currently does this, but future changes will make this impossible.
Explicitly do it as a step in the patch series now to catch any possible
kinks.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 3b5a7d450d)
2015-11-24 11:36:06 -08:00
Ian Romanick
bf3f0b9e9b i965: Use _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit ed0bd6573b)
2015-11-24 11:36:06 -08:00
Ian Romanick
e097324fee meta: Use _mesa_NamedBufferData and _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 7f2f300071)
2015-11-24 11:36:05 -08:00
Ian Romanick
aa607c69af meta: Use DSA functions for PBO in create_texture_for_pbo
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 89a61afdd7)
2015-11-24 11:36:05 -08:00
Ian Romanick
72470a9c37 i965: Don't pollute the buffer object namespace in brw_meta_fast_clear
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.

In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions.  The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.

Here's the problem scenario:

 - Application calls a meta function that generates a name.  The first
   Gen will probably return 1.

 - Application decides to use the same name for an object of the same
   type without calling Gen.  Many demo programs use names 1, 2, 3,
   etc. without calling Gen.

 - Application calls the meta function again, and the meta function
   replaces the data.  The application's data is lost, and the app
   fails.  Have fun debugging that.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 4e6b9c11fc)
2015-11-24 11:36:05 -08:00
Ian Romanick
de299e1e2e i965: Use internal functions for buffer object access
Instead of going through the GL API implementation functions, use the
lower-level functions.  This means that we have to keep track of a
pointer to the gl_buffer_object and the gl_vertex_array_object.

This has two advantages.  First, it avoids a bunch of CPU overhead in
looking up objects and validing API parameters.  Second, and much more
importantly, it will allow us to stop calling _mesa_GenBuffers /
_mesa_CreateBuffers and pollute the buffer namespace (next patch).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit e62799bd4e)
2015-11-24 11:36:05 -08:00
Ian Romanick
ded66b1451 i965: Use DSA functions for VBOs in brw_meta_fast_clear
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 1c5423d3a0)
2015-11-24 11:36:05 -08:00
Ian Romanick
b7b4104a7f i965: Pass brw_context instead of gl_context to brw_draw_rectlist
Future patches will use the brw_context instead.  Keeping this
non-functional change separate should make the function changes easier
to review.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit dcadd855f1)
2015-11-24 11:36:05 -08:00
Ian Romanick
236fb067a5 mesa: Refactor enable_vertex_array_attrib to make _mesa_enable_vertex_array_attrib
Pulls the parts of enable_vertex_array_attrib that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).

_mesa_enable_vertex_array_attrib can also be used to enable
fixed-function arrays.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 4a644f1caa)
2015-11-24 11:36:05 -08:00
Ian Romanick
2d9093fdf0 mesa: Refactor update_array_format to make _mesa_update_array_format_public
Pulls the parts of update_array_format that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit a336fcd36a)
2015-11-24 11:36:05 -08:00
Ian Romanick
d757c04215 mesa: Make bind_vertex_buffer avilable outside varray.c
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 8fae494df2)
2015-11-24 11:36:05 -08:00
Emil Velikov
f9339359d5 Update version to 11.1.0-rc1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 13:00:25 +00:00
Emil Velikov
623f64efc1 util: use RTLD_LOCAL with util_dl_open()
Otherwise we risk things blowing up due to conflicting symbols.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:21 +00:00
Emil Velikov
8943a562e2 targets/nine: remove unused static functions
Dead code since commit 8f50614910

Cc: Axel Davy <axel.davy@ens.fr>
Cc: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:21 +00:00
Emil Velikov
42dde5aa24 targets/nine: add note about messy header inclusion order
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:21 +00:00
Emil Velikov
0942250781 targets/nine: add note about fd owndership
v2:
 - move autotools hunk into correct patch
 - correct the note based on Axel's feedback

Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:21 +00:00
Emil Velikov
f8a1665542 auxiliary/vl: Don't close the drm fd on failure
Ported from an identically named commit in st/xa

commit 35cf3831d7
Author: Thomas Hellstrom <thellstrom@vmware.com>
Date:   Thu Jul 3 02:07:36 2014 -0700

    st/xa: Don't close the drm fd on failure v2

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:21 +00:00
Emil Velikov
e43a771dfa st/dri: NULL check the pscreen earlier
We delay the null check only to jump through hoops to work around that.
Check early to make our lives easier.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
13bccee87d st/dri: Don't close the drm fd on failure
Ported from an identically named commit in st/xa

commit 35cf3831d7
Author: Thomas Hellstrom <thellstrom@vmware.com>
Date:   Thu Jul 3 02:07:36 2014 -0700

    st/xa: Don't close the drm fd on failure v2

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
b7f5c2ee48 target-helpers: remove inline_drm_helper.h
As of earlier all the targets use the non inline version. Don't forget
to remove the function prototypes/declarations.

v2: rebase on top of virgl support.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
dddedbec0e {st,targets}/nine: use static/dynamic pipe-loader
Analogous to previous commits.

v2: add the missing winsys libs linkage

Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
611ef64ed5 {st,targets}/xa: use static/dynamic pipe-loader
Analogous to previous commits.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
1eb6e8a23c {auxiliary,targets}/vl: use static/dynamic pipe-loader
Analogous to previous commit.

v2: rebase on top of vl_winsys_drm.c addition

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
23fb11455b {st,targets}/dri: use static/dynamic pipe-loader
Covert DRI to use only the pipe-loader interface.

With drisw_create_screen and kms_swrast_create_screen replaced by their
pipe-loader equivalent, we can now drop them.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
c4d337146a pipe-loader: add preliminary Android support
Add a 'static' pipe-loader build, which will be used with follow-up
commits.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
234b03cc23 pipe-loader: add preliminary scons support
Add a 'static' pipe-loader build, which will be used with follow-up
commits.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
7999e6ddba pipe-loader: don't mix code and variable declarations
We cannot use this C99 feature here quite yet, as the code needs to be
build with MSVC prior to 2013.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:20 +00:00
Emil Velikov
17d3a5f857 target-helpers: add a non-inline drm_helper.h
Unlike the inline ones, here we'd want to have an extern definition of
the functions. This is required as with follow-up commits, we'll
gradually start using the static pipe-loader, with the latter needing
the symbols.

These are direct copy from the inline version.

v2:
 - rebase on top of virgl support
 - add "driver missing" printfs (Nicolai)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
af031deed6 target-helpers: move the DRI specifics to the target
Rather than having all targets include the file, with only some defining
the relevant guard macro, just move things where they are used.

v2: rebase on top of virgl support.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
950e06a29b automake: remove no longer needed HAVE_LOADER_GALLIUM conditional
As of last few commits we have a static and dynamic pipe-loader. Either
of which will be used with (almost) all targets..

We can look into allowing the user to select which way the targets are
built, be that 'static for all' or 'per target' in follow up commits.
After which we can look into building only the static or dynamic
version, although building both shouldn't cause any issues.

Hack/workaround alert:
Control the standalone pipe-drivers via HAVE_CLOVER. Will need to be
fixed as the targets are converted/configure knobs are in.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
be78f73b37 pipe-loader: wire up the 'static' sw pipe-loader
Analogous to previous commit with a small catch.

As the sw inline helpers are mere wrappers, and the screen <> winsys
split is more prominent (with the latter not being part of the final
pipe-driver), things will just work.

v2: rebase on top of earlier 'consolitate teardown' changes

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
1b589207de pipe-loader: wire up the 'static' drm pipe-loader
Add a list of driver descriptors and select one from the list, during
probe time.

As we'll need to have all the driver pipe_foo_screen_create() functions
provided externally (i.e. from another static lib) we need a separate
(non-inline) drm_helper, which contains the function declarations.

v2: rebase on top of virgl support.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
0f39f9cb7a pipe-loader: add a dummy 'static' pipe-loader
It is to be used in contrast of the dynamic one. The state-tracker does
not need to know if the pipe-driver is built into the final blob or
a separate object. This will allow us to move the logic to the final
step (in target) where the appropriate pipe-loader will be chosen.

Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
ad12027d8f gallium: rename libpipe_loader to libpipe_loader_dynamic
With the next commits we'll introduce a 'static' version, which will
essentially load the statically linked-in pipe-drivers, rather than the
standalone pipe-$foo.so ones.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
3ca12ee976 pipe-loader: dlopen/dlsym the pipe-driver at probe time
Rather than giving false hopes that things might work, just check at
probe time. This allows us to remove the duplication and consolidate
the code wrt the upcomming static pipe-loader.

Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
e465de5a51 pipe-loader: annotate the ops as const data
Already defined as such in struct pipe_loader_device::ops.

Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
46991ab9aa pipe-loader: teardown the winsys, if create_screen fails
i.e. plug some (hard to hit) memory leaks.

v2: fix rebase fallout - really teardown the winsys (Brian)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:19 +00:00
Emil Velikov
d54ca54faa pipe-loader: rework the sw backend
Move the winsys into the pipe-target, similar to the hardware
pipe-driver.

v2:
 - move int declaration outside of loop (Brian)
 - fold the teardown into a goto + separate function.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
f58a6f7be3 gallium: keep the libdrm link alongside libkmsdri.la
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
ff9cd8a67c pipe-loader: directly use pipe_loader_sw_probe_null() at probe time
Due to the nature of the other sw winsys' we cannot use them during the
generic probe stage. As such there is little point in keeping the
abstraction layer.

Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
4e3c06a501 pipe-loader: add pipe_loader_sw_probe_init_common() helper
Allows us to fold the duplication in pipe_loader_sw_probe_*().

Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
6d68d714c0 gallium/tests: remove unneeded include paths
The tests don't (and shouldn't) need to have anything driver and/or
winsys specific.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
74d41a32bc gallium: remove library_path argument from pipe_loader_create_screen()
Currently the location is determined at configure/build time and
consistently copied across gallium. Just remove the extra argument, and
use PIPE_SEARCH_DIR where appropriate.

This will allow us to remove the duplication in the *configuration and
*screen_create APIs by moving util_dl_get_proc_address() and friends to
probe time.

v2: rebase on top of vl_winsys_drm.c addition

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
cbc4d9730a targets/nine: remove the custom pipe-driver path management
Since the up-streaming of nine, the static target was used by default.
The dynamic pipe-drivers being available only via manual tweak of
configure.ac.

As we'll be removing the library_path argument from the pipe-loader with
follow-up commits, we can remove D3D9_DRIVERS_PATH/D3D9_DRIVERS_DIR.
Everyone doing local hacking on nine, or wishing to have a env override
can bring them back within the pipe-loader.

Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
149454bb13 pipe-loader: remove HAVE_DRM_LOADER_GALLIUM and HAVE_PIPE_LOADER_DRM
... in favour of HAVE_LIBDRM. After all we solely want to build the code
when the latter is available.

In the not too distant future we will remove the libudev/sysfs
dependency and simplify configure.ac even further.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
33f1db1eb4 pipe-loader: add pipe_loader_sw_probe_kms() implementation
Will be used as a counterpart for target-helpers'
kms_swrast_create_screen().

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
be430726e2 configure: use HAVE_DRISW_KMS when handling kms swrast
Using HAVE_DRI2 to manage it seems counter-intuitive.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:18 +00:00
Emil Velikov
f9c9471b76 targets/nine: use the existing sw_screen_wrap() over our custom version
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:17 +00:00
Emil Velikov
6bcd5f0d02 automake: use GALLIUM_PIPE_LOADER_DEFINES only where applicable
As of last commit we no longer need the defines in order to have the
function prototypes.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:17 +00:00
Emil Velikov
b7875ca493 pipe-loader: remove HAVE_PIPE_LOADER_foo function prototype guards
They serve little to no purpose, as we don't need any additional
dependencies (headers and/or symbols). On the other hand dropping them
will allow us to use GALLIUM_PIPE_LOADER_DEFINES in only one single
place - the pipe-loader.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:17 +00:00
Emil Velikov
c751d33a20 gallium/trace: remove useless NULL check from trace_screen_create()
Currently every target makes sure that the screen is non-null prior to
using the debug (trace including) wrappers. If that no longer holds true
we want to know and fix this ASAP rather than silently bailing out.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:17 +00:00
Emil Velikov
e762a46a07 configure: remove obsolete _CLIENT comment
The referenced variable(s) have been removed with commit abc20120e4
(automake: pipe-loader: remove the 'client' pipe-loader)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
2015-11-21 12:52:17 +00:00
Emil Velikov
1a18457a52 docs: add news item and link release notes for 11.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 12:42:48 +00:00
Emil Velikov
da2cb8a2ee docs: add sha256 checksums for 11.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 2555e000fc)
2015-11-21 12:41:22 +00:00
Emil Velikov
380aec1703 docs: add release notes for 11.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 04fd3a6f62)
2015-11-21 12:41:21 +00:00
Ilia Mirkin
d8c26969d5 freedreno/a4xx: add missing formats to enable ARB_vertex_type_2_10_10_10_rev
Same as commit 84d087aea but for a4xx. The RE'd enums had the same issue
too.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 20:41:39 -05:00
Matt Turner
f6986a81c9 i965: Test that nonrepresentable floats cannot be converted to VF.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-20 17:39:34 -08:00
Matt Turner
f450030f66 i965: Use ldexpf() in VF float test set up.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-20 17:39:34 -08:00
Matt Turner
0684aed8ab i965/vec4: Initialize nir_inputs with src_reg().
nir_locals, nir_ssa_values, and nir_system_values are all dst_reg (not
that that makes a whole lot of sense to me), and only nir_inputs is a
src_reg.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-20 17:39:34 -08:00
Matt Turner
c875e3cdd2 i965/fs: Add support for gl_HelperInvocation system value.
In most cases (when the negate is copy propagated and the MOV removed),
this is two instructions on Gen >= 8 and only two instructions on
earlier platforms -- and it doesn't use the flag register.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-20 17:39:33 -08:00
Matt Turner
4b15281295 i965: Add brw_imm_uv(). 2015-11-20 17:39:33 -08:00
Matt Turner
ce11d4f369 i965: Don't bother setting regioning on immediates.
The region fields are unioned with the immediate storage.
2015-11-20 17:39:33 -08:00
Matt Turner
c28b574170 nir: Add support for gl_HelperInvocation system value.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-20 17:39:33 -08:00
Ilia Mirkin
fe29330406 freedreno/a4xx: use hardware RGTC texture samplers
a4xx hardware has real support for RGTC so there's no need to fake it
like we do on a3xx. Undo the hacks, and keep track of an "internal
format" of a resource, which on a3xx will be different, triggering the
transfer-time conversions to take place.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 19:46:21 -05:00
Ilia Mirkin
39fa5c8419 freedreno/a4xx: hook up RGB565 format
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 19:46:21 -05:00
Ilia Mirkin
3b77826cc1 freedreno/a4xx: logic op handling
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 19:46:21 -05:00
Ilia Mirkin
e1319dcdd6 freedreno/a4xx: add 16-bit unorm/snorm format texturing/rendering
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 19:46:21 -05:00
Ilia Mirkin
ff9450ecd1 freedreno/a4xx: point regid to "red" even for alpha-only rb formats
Looks like a4xx hw does this in a more standard way and we don't need to
hack around it like we do on a3xx. Fixes GL_ALPHA formats in
fbo-blending-formats, fbo-colormask-formats, and fbo-alphatest-formats.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-20 18:15:15 -05:00
Ilia Mirkin
4fd24caf92 ttn: add TEX2 support
This fixes CubeArrayShadow tests (where the shadow comes in via a second
arg to the TEX2 instruction).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-20 17:45:08 -05:00
Ilia Mirkin
c1babbd85c freedreno: always set all border colors
Instead of playing the guessing game as to which texture format reads
from which border color encoding type, just write both of them always.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 17:44:10 -05:00
Ilia Mirkin
ec106e9f62 freedreno/a4xx: fix dst_alpha blend for RGBX render targets
There are not native RGBX render formats, so we must manually force
dst_alpha to be one, same as for a3xx.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 17:44:10 -05:00
Nicolai Hähnle
5bda3d0958 radeon: re-prepare query buffers on begin_query for predicate queries
The point of prepare_buffer is to ensure that the query buffer contains valid
initial data for conditional rendering: as long as the buffer is initialized
correctly, the GPU is able to tell whether query results have been written
already (and wait or fall back to unconditional rendering if desired).

This means prepare_buffer needs to be called again when a buffer is reused.

Conversely, for queries that cannot be used for conditional rendering
(notably pipeline statistics), we can re-use buffers immediately, and they
do not need to be initialized.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
2015-11-20 22:46:11 +01:00
Nicolai Hähnle
6f4fe8e76a radeon: reset query buffers for PIPE_QUERY_TIMESTAMP
Since begin_query is not called for this query type, we need to reset the
query buffer state in end_query instead.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93015
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Tested-by: Mathias Tillman <master.homer@gmail.com>
2015-11-20 22:46:11 +01:00
Brian Paul
47fae842d0 mesa: update some old-style (K&R?) function pointer calls
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 14:09:15 -07:00
Brian Paul
1def5ef958 docs: mention GL 3.3 support for VMware driver in Mesa 11.1 relnotes
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-11-20 14:06:25 -07:00
Brian Paul
527466d9a1 svga: add num-bytes-uploaded HUD query
To graph the number of bytes uploaded to GPU per frame (vertex buffer data,
constant buffer data, texture data, etc).

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-20 13:40:06 -07:00
Brian Paul
e96d7a1489 svga: add some sanity check assertions in svga_buffer_transfer_map()
Make sure y and z values of buffers are as expected.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-20 13:40:06 -07:00
Timothy Arceri
b109cd3c27 docs: mark compile-time constant expressions as done
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:30:18 +11:00
Timothy Arceri
f7af69c350 glsl: add subroutine index qualifier support
ARB_explicit_uniform_location allows the index for subroutine functions
to be explicitly set in the shader.

This patch reduces the restriction on the index qualifier in
validate_layout_qualifiers() to allow it to be applied to subroutines
and adds the new subroutine qualifier validation to ast_function::hir().

ast_fully_specified_type::has_qualifiers() is updated to allow the
index qualifier on subroutine functions when explicit uniform locations
is available.

A new check is added to ast_type_qualifier::merge_qualifier() to stop
multiple function qualifiers from being defied, before this patch this
would cause a segfault.

Finally a new variable is added to ir_function_signature to store the
index. This value is validated and the non explicit values assigned in
link_assign_subroutine_types().

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-21 07:30:12 +11:00
Timothy Arceri
02d2ab2378 glsl: add support for complie-time constant expressions
This patch replaces the old interger constant qualifiers with either
the new ast_layout_expression type if the qualifier requires merging
or ast_expression if the qualifier can't have mulitple declarations
or if all but the newest qualifier is simply ignored.

We also update the process_qualifier_constant() helper to be
similar to the one in the ast_layout_expression class, but in
this case it will be used to process the ast_expression qualifiers.

Global shader layout qualifier validation is moved out of the parser
in this change as we now need to evaluate any constant expression
before doing the validation.

V2: Fix minimum value check for vertices (Emil)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:28:06 +11:00
Timothy Arceri
0954b813a3 glsl: add new type for compile time constants
In this patch we introduce a new ast type for holding the new
compile-time constant expressions. The main reason for this is that
we can no longer do merging of layout qualifiers before they have been
converted into GLSL IR so we need to store them to be proccessed later.

The new type has two helper functions:

- process_qualifier_constant()

 Used to merge and then evaluate qualifier expressions

- merge_qualifier()

 Simply appends a qualifier to a list to be merged later by
 process_qualifier_constant()

In order to avoid cascading error messages the process_qualifier_constant()
helpers return a bool

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:56 +11:00
Timothy Arceri
4196af4ce7 glsl: call set_shader_inout_layout() earlier
This will allow us to add error checking to this function
in a later patch, if we don't move it the error messages
will go missing.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:49 +11:00
Timothy Arceri
e74fe2a844 glsl: replace binding layout min boundary check
Use new helper that will in a later patch allow for
compile time constants.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:42 +11:00
Timothy Arceri
64710db664 glsl: encapsulate binding validation and setting
This change moves the binding layout handing code into an apply
function to be consistent with other helper functions in the ast
code, and to encapsulate the code so that when we introduce
compile time constants the code will be much cleaner.

One small downside is for unnamed interface blocks we will now
be revalidating the binding for each member its applied to.
However this seems a small sacrifice in order to have code which
is readable.

We also remove the incorrect comment in the named interface code
about propagating bindings to members which seems to have been
copied from the unnamed interface code.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:30 +11:00
Timothy Arceri
db3c36aedf glsl: move stream layout max validation
This validation is moved later so we can validate the
max value when compile time constant support is added in a
later patch.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:21 +11:00
Timothy Arceri
17e224e8ec glsl: move stream layout qualifier validation
We are moving this out of the parser in preparation for compile
time constant support.

The reason a validation function is used rather than an apply
function like what is used with bindings is because glsl allows
streams to be defined on members of blocks even though they must
match the stream thats associated with the current block, this
means we need access to the value after validation to do this
comparision.

V2: Fix typo in comment (Emil)

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:15 +11:00
Timothy Arceri
efa34e4a1d glsl: replace index layout min boundary check
Use new helper that will in a later patch allow for
compile time constants.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:09 +11:00
Timothy Arceri
1d87d6f9ca glsl: remove duplicate validation for index layout qualifier
The minimum value for index is validated in apply_explicit_location()
and we want to remove validation from the parser so we can add
compile time constant support.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:04 +11:00
Timothy Arceri
d1f23545a1 glsl: move location layout qualifier validation
We are moving this out of the parser in preparation for compile
time constant support.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:27:00 +11:00
Timothy Arceri
de8f0c9ab9 glsl: add process_qualifier_constant() helper
For now this just validates that a qualifier is inside its
minimum boundary, in a later patch we will expand it to
evaluate compile time constants.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-21 07:26:55 +11:00
Samuel Pitoiset
f57285c8fc docs: mark GL_AMD_performance_monitor for nv50
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 21:03:14 +01:00
Samuel Pitoiset
aede8ca9a7 nv50: expose two groups of compute-related MP perf counters
This turns on GL_AMD_performance_monitor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 21:03:14 +01:00
Ben Widawsky
0288f92e7b i965/gen9: Support fast clears for 32b float
SKL supports the ability to do fast clears and resolves of 32b RGBA as both
integer and floats. This patch only enables float color clears because we
haven't yet enabled integer color clears, (HW support for that was added in
BDW).

v2: Remove LUMINANCE16F and INTENSITY16F special cases since they are now
handled by Neil's patch to disable MSAA fast clears.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-20 11:45:44 -08:00
Ben Widawsky
7c690da29c Revert "i965/gen9: Enable rep clears on gen9"
This reverts commit 8a0c85b258.

It's not a strict revert because I don't want to bring back the gen < 9 check at
this point in time.

Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-20 11:45:32 -08:00
Ben Widawsky
f838e53c70 Revert "i965/gen9: Disable MCS for 1x color surfaces"
This reverts commit dcd59a9e32.

Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-20 11:45:32 -08:00
Ben Widawsky
c4edc048c6 i965/meta/gen9: Individually fast clear color attachments
The impetus for this patch comes from a seemingly benign statement within the
spec (quoted within the patch).

It is very important for clearing multiple color buffer attachments and can be
observed in the following piglit tests:
spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
spec/ext_framebuffer_multisample/blit-multiple-render-targets 0

v2: Doing the framebuffer binding only once (Chad)
Directly use the renderbuffers from the mt (Chad)

v3: Patch from Neil whose feedback I originally missed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-20 11:45:32 -08:00
Ben Widawsky
6fa1130cd2 i965/skl: skip fast clears for certain surface formats
Some of the information originally in this commit message is now in the patch
before this.

SKL adds compressible render targets and as a result mutates some of the
programming for fast clears and resolves. There is a new internal surface type
called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "Auxiliary Surfaces For
Sampled Tiled Resource".

The formats which are supported are defined in the table titled "Render Target
Surface Types [SKL+]". There is no PRM yet to reference. The previously
implemented helper function already does the right thing provided the table is
correct.

v2: Use better English in commit message (Matt)
s/compressable/compressible/ (Matt)
Don't compare bools to true (Matt)
Use the helper function and don't increase the context size - this is mostly
implemented in the patch just before this (Chad, Neil)
Remove an "invalid" assert (Chad)
Fix assertion to check num_samples > 1, instead of num_samples (Chad)

v3:
Use Matt's code as Requested-by: Chad. I didn't even look at it since Chad said
he was fine with that, and presumably Matt is fine with it.

v4: Use better quote from spec (Topi)

Cc: Chad Versace <chad.versace@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-11-20 11:45:32 -08:00
Ben Widawsky
9d94eeb8a4 i965: Add lossless compression to surface format table
Background: Prior to Skylake and since Ivybridge Intel hardware has had the
ability to use a MCS (Multisample Control Surface) as auxiliary data in
"compression" operations on the surface. This reduces memory bandwidth.  This
hardware was either used for MSAA compression, or fast clear operations. On
Gen8, a similar mechanism exists to allow the hiz buffer to be sampled from, and
therefore this feature is sometimes referred to more generally as "AUX buffers".

Skylake adds the ability to have the display engine directly source compressed
surfaces on top of the ability to sample from them. Inference dictates that
enabling this display features adds a restriction to the formats which could
actually be compressed. This is backed up by a blurb in the AUX_CCS_D section
from the RENDER_SURFACE_STATE: "In addition, if the surface is bound to the
sampling engine, Surface Format must be supported for Render Target Compression
for surfaces bound to the sampling engine." The current set of surfaces seems
to be a subset as compared to previous gens (see the next patch). Also, if I had
to guess I would guess that future gens add support for more surface formats. To
make handling this a bit easier to read, and more future proof, the support for
this is moved into the surface formats table.

Along with the modifications to the table, a helper function is also provided to
determine if a surface is CCS_E compatible. Because fast clears are currently
disabled on SKL, we can plumb the helper all the way through here, and not
actually have anything break.

v2:
- rename ccs to ccs_e; Requested-by: Chad
- rename lossless_compression to lossless_compression Requested-by: Chad
- change meaning of brw_losslessly_compressible_format Requested-by: Chad
  - related changes to the code to reflect this.
- remove excess ccs (Chad)

v3:
- Commit message changes (Topi)
- Const some things which could be const (Topi)

Requested-by: Chad Versace <chad.versace@intel.com>
Requested-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-20 11:45:32 -08:00
Ben Widawsky
d23aa634e0 i965/skl: Add fast color clear infrastructure
Patch was originally called:
i965/skl: Enable fast color clears on SKL

Skylake introduces some differences in the way that fast clears are programmed
and in the restrictions for using fast clears. Since some of these are
non-obvious, and fast clears are currently disabled globally, we can enable the
simple stuff here and leave the weirder stuff and separately reviewable work.

Based on a patch originally from Kristian.

Note that within this patch the change in scaling factors could be achieved with
this hunk instead. I've opted to keep things more like how the docs describe it
however.
   --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
   +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
   @@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
          /* In release builds, fall through */
       case I915_TILING_Y:
          *width_px = 32 / mt->cpp;
   -      *height = 4;
   +      if (brw->gen >= 9)
   +         *height = 2;
   +      else
   +         *height = 4;

v2: Add braces for the multiline (Matt + Chad)
Comment updates (requested by Chad)
Modified commit message
Commit message from Chad explaining the MCS height change (Chad)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-20 11:45:32 -08:00
Ian Romanick
2f7d2fd997 docs: Add GL_EXT_shader_samples_identical to the release notes
Trivial

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-20 11:38:11 -08:00
Leo Liu
8762570cc5 radeon/vce: disable two pipe mode for stoney
Only one encoding pipe available for Stoney

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-20 13:21:54 -05:00
Leo Liu
99d92de5d0 radeon/vce: add new firmware interface support
Add new interface to create and encode

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-20 13:21:54 -05:00
Emil Velikov
8fdb548799 egl: don't forget to ship platform_x11_dri3.h into the tarball
Should have been a part of f35198bade

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 18:08:04 +00:00
Emil Velikov
ae6d6941f6 glsl: move builtin_type_macros.h into the correct list
Commit b9b40ef9b7 moved the file, but forgot to update the reference in
the makefile. Thus the out of tree build was busted :\

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 18:07:58 +00:00
Emil Velikov
c45b4257c2 automake: use static llvm for make distcheck
With llvm 3.7 semi-dropping the autoconf build, we rely on their cmake
build. With the latter of which annoyingly using another (busted?)
SONAME.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 18:07:52 +00:00
Brian Paul
0743e14aee mesa: remove unused var in _mesa_PushDebugGroup()
Trivial.
2015-11-20 09:35:18 -07:00
Brian Paul
108013b8e5 mesa: whitespaces fixes in _mesa_one_time_init_extension_overrides()
Trivial.
2015-11-20 09:35:05 -07:00
Nicolai Hähnle
8a125afa6e radeon: ensure that timing/profiling queries are suspended on flush
The queries_suspended_for_flush flag is redundant because suspended queries
are not removed from their respective linked list.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-20 17:27:40 +01:00
Nicolai Hähnle
6a14a39fab st/mesa: add support for batch driver queries to perfmon
v2 + v3: forgot null-pointer checks (spotted by Samuel Pitoiset)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:36 +01:00
Nicolai Hähnle
424a614ff1 gallium/hud: add support for batch queries
v2 + v3: be more defensive about allocations

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:32 +01:00
Nicolai Hähnle
d61d4df02e gallium: add the concept of batch queries
Some drivers (in particular radeon[si], but also freedreno judging from
a quick grep) may want to expose performance counters that cannot be
individually enabled or disabled.

Allow such drivers to mark driver-specific queries as requiring a new
type of batch query object that is used to start and stop a list of queries
simultaneously.

v3: adjust recently added nv50 queries

v2: documentation for create_batch_query

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:28 +01:00
Nicolai Hähnle
c235300bfc st/mesa: maintain active perfmon counters in an array
It is easy enough to pre-determine the required size, and arrays are
generally better behaved especially when they get large.

v2: make sure init_perf_monitor returns true when no counters are active
(spotted by Samuel Pitoiset)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:23 +01:00
Nicolai Hähnle
afa6121b4e st/mesa: use BITSET_FOREACH_SET to loop through active perfmon counters
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:18 +01:00
Nicolai Hähnle
0aea83dc4a st/mesa: store mapping from perfmon counter to query type
Previously, when a performance monitor was initialized, an inner loop through
all driver queries with string comparisons for each enabled performance
monitor counter was used. This hurts when a driver exposes lots of queries.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:27:09 +01:00
Nicolai Hähnle
4e1339691d st/mesa: map semantic driver query types to underlying type
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:26:59 +01:00
Nicolai Hähnle
050db20d37 gallium/hud: remove unused field in query_info
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:26:50 +01:00
Nicolai Hähnle
ddf27a3dd0 gallium: remove pipe_driver_query_group_info field type
This was only used to implement an unnecessarily restrictive interpretation
of the spec of AMD_performance_monitor. The spec says

  A performance monitor consists of a number of hardware and software
  counters that can be sampled by the GPU and reported back to the
  application.

I guess one could take this as a requirement that counters _must_ be sampled
by the GPU, but then why are they called _software_ counters? Besides,
there's not much reason _not_ to expose all counters that are available,
and this simplifies the code.

v3: add a missing change in the nouveau driver (thanks Samuel Pitoiset)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-20 17:26:39 +01:00
Roland Scheidegger
24dc0316b4 gallivm: use sampler index 0 for texel fetches
texel fetches don't use any samplers. Previously we just set the same
number for both texture and sampler unit (as per "ordinary" gl style
sampling where the numbers are always the same) however this would trigger
some assertions checking that the sampler index isn't over PIPE_MAX_SAMPLERS
limit elsewhere with d3d10, so just set to 0.
(Fixing the assertion instead isn't really an option, the sampler isn't
really used but might still pass an out-of-bound pointer around and even
copy some things from it.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-11-20 17:00:15 +01:00
Ilia Mirkin
9a93da4e83 freedreno/a4xx: add BPTC support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-20 09:25:39 -05:00
François Tigeot
8a94ba5e0c xmlconfig: Add support for DragonFly
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 11:18:47 +00:00
Mauro Rossi
480ba46bcb android: export the path of glsl nir headers
The change is necessary to avoid building errors in glsl and i965
modules due to missing glsl_types.h header

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 11:18:19 +00:00
Boyan Ding
b8547a5063 mesa: re-enable KHR_debug for ES contexts
With the earlier issues resolved we can expose the extension.

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 11:14:05 +00:00
Boyan Ding
ab7294668c main: Don't restrict several KHR_debug enum to desktop GL
In preparation for supporting GL_KHR_debug in OpenGL ES

v2: add a missing hunk in _mesa_IsEnabled (Emil)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 11:14:05 +00:00
Emil Velikov
af27236854 mesa: use the correct string for the ES GL_KHR_debug functions
As defined in the spec

    when implemented in an OpenGL ES context, all entry points defined
    by this extension must have a "KHR" suffix.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-20 11:14:05 +00:00
Gregory Hainaut
9108a785a0 glsl: avoid linker and user varying location to overlap
Current behavior on the interface matching:

layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
out1; // Assigned to VARYING_SLOT_VAR0 by the linker

New behavior on the interface matching:

layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
out1; // Assigned to VARYING_SLOT_VAR1 by the linker

v4:
* Fix variable name in assert

Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-20 22:04:02 +11:00
Emil Velikov
3afb253e9b auxiliary/vl/dri2: coding style fixes
Rewrap long(ish) lines, add space between struct foo and *.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
b31f092bfb auxiliary/vl/dri2: hide internal functions
Analogous to previous commit. While we're here prefix all functions
identically -> vl_dri2_foo

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
4533c022f4 auxiliary/vl/drm: hide internal functions
As of last commit everyone is using the vl_screen dispatch, thus we can
hide this function from the headers and make it static.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
abbfda60d8 st/vdpau: use the vl_screen dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
4307155127 st/xvmc: use the vl_screen dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
422356ed2f st/va: use the vl_screen dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:45 +00:00
Emil Velikov
9eb109f4d3 st/omx: use the vl_screen dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:44 +00:00
Emil Velikov
32094979f7 auxiliary/vl/dri2: setup the dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:44 +00:00
Emil Velikov
6150d8d4bd auxiliary/vl/drm: use a label for the error path
... just like every other place in gallium.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:44 +00:00
Emil Velikov
d03d9ecafa auxiliary/vl/drm: setup the dispatch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:44 +00:00
Emil Velikov
6b152ee7b6 auxiliary/vl: add dispatch table
As mentioned previously, it will allow us to use different vl backend in
a generic way from either video state-tracker.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:58:41 +00:00
Emil Velikov
2bd9116b82 auxiliary/vl: rename vl_screen_create to vl_dri2_screen_create
In a preparation of having proper multi-platform/backend handling in VL.

With follow up commits we'll introduce a dispatch within vl_screen
similar to the one in pipe_screen. This way any VL state-tracker can
operate seamlessly, considering the backend/platform is properly setup.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:56:34 +00:00
Emil Velikov
c31218cdb3 st/va: trivial cleanup
Drop the temporary variable and fold the two conditional.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-20 10:56:17 +00:00
Emil Velikov
a8f45e0161 st/omx: straighten get/put_screen
The current code is busted in a number of ways.

 - initially checks for omx_display (rather than omx_screen), which may
or may not be around.
 - blindly feeds the empty env variable string to loader_open_device()
 - reads the env variable every time get_screen is called
 - the latter manifests into memory leaks, and other issues as one sets
the variable between two get_screen calls.

Additionally it cleans up a couple of extra bits
 - drops unneeded set/check of omx_display.
 - make the teardown (put_screen) order was not symmetrical to the setup
(get_screen)

v2: Drop the "is empty string" check (Leo)

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-11-20 10:56:10 +00:00
Emil Velikov
7157085140 automake: loader: don't create an empty dri3 helper
Seems that creating an empty one does not fair too well with MacOSX's
ar. Considering that all the users of the helper include it only when
needed, let's reshuffle the makefile.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92985
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-20 10:40:35 +00:00
Emil Velikov
115f179852 automake: loader: honour the XCB_DRI3 cflags
Without this the compilation will fail, as the headers are installed in
a non-default location.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-20 10:40:29 +00:00
Emil Velikov
166314dd88 automake: egl: add symbols test
Should help us catch issues where we expose any extra symbols by
mistake. Just like the ones fixes with previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-11-20 10:40:23 +00:00
Emil Velikov
5a79e0a8e3 automake: loader: rework the CPPFLAGS
Rather than duplicating things, just use the generic AM_CPPFLAGS. This
has the fortunate side-effect of adding VISIBILITY_CFLAGS for the dri3
helper. The latter of which was erroneously exposing some internal
symbols.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-20 10:40:11 +00:00
Ian Romanick
99840eb983 i965: Enable EXT_shader_samples_identical
On the vec4 backend, textureSamplesIdentical() will always return
false.  There are currently no test cases for the vec4 backend, so we
don't have much confidence in any implementation.  We also don't think
anyone is likely to miss it.

v2: Handle immediate value for MCS smarter.  Rebase on changes to
nir_texop_sampels_identical (missing second parameter).  Suggested by
Jason.

v3: Add Neil's code to handle 16x MSAA in the FS.  Also rebase on top of
f9a9ba5e.  Stub out the vec4 implementation.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v2]
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]
2015-11-19 20:17:16 -08:00
Ian Romanick
84b6c64efc i965/vec4: Handle nir_tex_src_ms_index more like the scalar
v2: Rebase on top of f9a9ba5e.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:16 -08:00
Ian Romanick
457bb290ef nir: Add nir_texop_samples_identical opcode
This is the NIR analog to GLSL IR ir_samples_identical.

v2: Don't add the second nir_tex_src_ms_index parameter.  Suggested by
Ken and Jason.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:16 -08:00
Ian Romanick
06c56f443a glsl: Add textureSamplesIdenticalEXT built-in functions
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:16 -08:00
Ian Romanick
8343583557 glsl: Add ir_samples_identical opcode
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:16 -08:00
Ian Romanick
ef54434c52 glsl: Extension tracking for EXT_shader_samples_indentical
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:16 -08:00
Ian Romanick
ff59700d29 mesa: Extension tracking for EXT_shader_samples_indentical
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:15 -08:00
Ian Romanick
b1b9f68d4c Import current draft of EXT_shader_samples_identical spec
v2: Add Neil to the list of contributors.  I meant to do that before,
but Matt reminded me.

v3: Fix typos noticed by Nicolai.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-19 20:17:15 -08:00
Rob Clark
acca6c65d3 nir: add nir_ssa_for_alu_src()
Using something like:

   numer = nir_ssa_for_src(bld, alu->src[0].src,
                           nir_ssa_alu_instr_src_components(alu, 0));

for alu src's with swizzle, like:

   vec1 ssa_10 = intrinsic load_uniform () () (0, 0)
   vec2 ssa_11 = intrinsic load_uniform () () (1, 0)
   vec2 ssa_2 = udiv ssa_10.xx, ssa_11

ends up turning into something like:

   vec1 ssa_10 = intrinsic load_uniform () () (0, 0)
   vec2 ssa_11 = intrinsic load_uniform () () (1, 0)
   vec2 ssa_13 = imov ssa_10
   ...

because nir_ssa_for_src() ignore's the original nir_alu_src's swizzle.
Instead for alu instructions, nir_src_for_alu_src() should be used to
ensure the original alu src's swizzle doesn't get lost in translation:

   vec1 ssa_10 = intrinsic load_uniform () () (0, 0)
   vec2 ssa_11 = intrinsic load_uniform () () (1, 0)
   vec2 ssa_13 = imov ssa_10.xx
   ...

v2: check for abs/neg, and re-use existing nir_alu_src

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-19 20:03:32 -05:00
Rob Clark
c73f40c473 nir: fix missing increments of num_inputs/num_outputs
Note: not quite perfect, we should use type_size vfunc (in
compiler_options or nir_shader?) to determine how much we
increment num_inputs/outputs/uniforms.  But we don't have
that yet, so let's at least fix things for the existing
users of these passes.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-19 20:03:32 -05:00
Rob Clark
fec9367deb nir/print: show # of uniforms/inputs/outputs
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-19 20:03:32 -05:00
Rob Clark
01e94d8d5d nir/print: show shader name/label if set
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-19 20:03:32 -05:00
Rob Clark
006e4f070f nir: add nir_var_all enum
Otherwise, passing -1 gets you:

  error: invalid conversion from 'int' to 'nir_variable_mode' [-fpermissive]

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-19 20:03:32 -05:00
Ilia Mirkin
769b3ab6c5 freedreno/a4xx: fix 5_5_5_1 texture sampler format
This fixes teximage-colors, fbo-generatemipmap-formats, and probably
others (in relation to the RGB5 formats, others still fail).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-19 19:00:18 -05:00
Ilia Mirkin
a05e5491c3 freedreno/a4xx: add depth clamp and halfz clip
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 19:00:18 -05:00
Ilia Mirkin
b17a405609 freedreno/a4xx: allow seamless cubemap filtering to be enabled per-texture
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 19:00:18 -05:00
Ilia Mirkin
0a4462ad6e freedreno/a4xx: support lod_bias
The lower layers assume that we support this, and it's been core since
GL 1.4. This fixes a slew of piglit tests, especially around
tex-miplevel-selection.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-19 19:00:17 -05:00
Samuel Pitoiset
0cfc1304be nv50: allow using inline vertex data submit when gl_VertexID is used
The hardware can actually generates vertexid when vertices come from
a client-side buffer like when glDrawElements is used.

This doesn't fix (or break) any piglit tests but it improves the
previous attempt of Ilia (c830d19 "nv50: avoid using inline vertex
data submit when gl_VertexID is used")

The only disadvantage is that only works on G84+, but we don't really
care of that weird and old NV50 chipset.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 21:11:38 +01:00
Samuel Pitoiset
9e40a621c1 nv50: add NV84_3D macro
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 21:11:27 +01:00
Matt Turner
a5b3115f0a i965: Drop IMM fs_reg/src_reg -> brw_reg conversions.
The previous two commits make this unnecessary.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-19 11:12:24 -08:00
Matt Turner
f9a9ba5eac i965/vec4: Replace src_reg(imm) constructors with brw_imm_*().
Cuts 1.5k of .text.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-19 11:12:24 -08:00
Matt Turner
9b978046eb i965/fs: Use brw_imm_uw().
W/UW immediates are 16-bits, but those 16-bits must be replicated
in the high 16-bits of the 32-bit field.

Remove the useless W/UW immediate saturating code, since we'll now be
using the appropriate immediate (and W/UW immediates in the IR can now
no longer be larger than 16-bits).

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-19 11:12:24 -08:00
Matt Turner
3ccc41ecfc i965/fs: Replace fs_reg(imm) constructors with brw_imm_*().
Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor
implementations themselves.

   text     data      bss      dec      hex  filename
5204535   214112    27784  5446431   531b1f  i965_dri.so before
5193977   214112    27784  5435873   52f1e1  i965_dri.so after

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-19 11:12:24 -08:00
Matt Turner
c15a407eb4 i965: Make brw_imm_vf4() take 8-bit restricted floats.
This partially reverts commit bbf8239f92.

I didn't like that commit to begin with -- computing things at compile
time is fine -- but for purposes of verifying that the resulting values
are correct, looking up 0x00 and 0x30 in a table is a lot better than
evaluating a recursive function.

Anyway, by making brw_imm_vf4() take the actual 8-bit restricted floats
directly (instead of only integral values that would be converted to
restricted float), we can use this function as a replacement for the
vector float src_reg/fs_reg constructors.

brw_float_to_vf() is not currently an inline function, so it will not be
evaluated at compile time. I'll address that in a follow-up patch.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-19 11:12:24 -08:00
Nanley Chery
e8c5ef3eca mesa: Add test for sorted extension table
Enable developers to know if the table's alphabetical sorting
is maintained or lost.

v2: Move "*" next to pointer name (Matt)
    Include extensions_table.h instead of extensions.h (Ian)
    Remove extra " *" in comment (Ian)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-19 11:12:45 -08:00
Nanley Chery
f030227f46 mesa/extensions: Sort the extension table alphabetically
Make it easier to determine where to add new extensions.
Performed with the vim sort command.

v2: Insert newline after last #define (Matt)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-19 10:35:20 -08:00
Ilia Mirkin
bcda79676a docs: GL3.1 for a3xx and a4xx
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 12:26:28 -05:00
Ryan Houdek
0ec218d167 mesa: enable EXT_blend_func_extended if the driver supports the ARB version
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
f7c23f225f mesa: allow MAX_DUAL_SOURCE_DRAW_BUFFERS to be available to ES
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
4b549f0d8c mesa: enable usage of blend_func_extended blend factors in GLES2
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
33ddc8e865 glsl: add a parse check to check for the index layout qualifier
This can only be used if EXT_blend_func_extended is enabled

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
ef9e6d1ec8 glsl: add GL_EXT_blend_func_extended preprocessor define
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
1d1d02f2ac glsl: add support for EXT_blend_func_extended builtins
gl_MaxDualSourceDrawBuffersEXT - Maximum dual-source draw buffers supported

For ESSL 1.0, it provides two builtins since you can't have user-defined
color output variables:
  gl_SecondaryFragColorEXT
  gl_SecondaryFragDataEXT[MaxDSDrawBuffers]

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
ceecb0876f glsl: add EXT_blend_func_extended parser enables
This adds a state for the maximum dual source draw variables available
and the variable for determining if the extension has been enabled
in the program shaders.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Ryan Houdek
625414f78c glapi: add EXT_blend_func_extended XML definitions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-19 11:39:51 -05:00
Brian Paul
15f8dc7b23 os: check for GALLIUM_PROCESS_NAME to override os_get_process_name()
Useful for debugging and for glretrace.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-19 09:23:04 -07:00
Connor Abbott
f1ba0a5ea0 glsl: fix ir_constant::equals() for doubles
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-11-19 09:16:18 +01:00
Connor Abbott
84ed3819a4 glsl: fix isinf() for doubles
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-11-19 09:16:18 +01:00
Connor Abbott
7820b2c071 nir: fix constant folding of bfi
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-11-19 09:16:18 +01:00
Brian Paul
1cfffb95eb hud: fix Windows build break
Protect signal-related code with PIPE_OS_UNIX test.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-11-19 07:57:09 +00:00
Ian Romanick
2f55476153 glsl: Fix off-by-one error in array size check assertion
Apparently, this has been a bug since 2010 (c30f6e5d).

Also use ARRAY_SIZE instead of open coding it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2015-11-18 18:35:56 -08:00
Ian Romanick
0aded03046 mesa: Don't expose GL_EXT_shader_integer_mix in GLES 1.x
There are no shaders, so it doesn't even make sense to expose the
extension.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Nanley Chery <nanley.g.chery@intel.com>
2015-11-18 18:35:56 -08:00
Ian Romanick
37c2cfa6bc glsl: Silence unused parameter warnings
builtin_functions.cpp:5289:52: warning: unused parameter 'num_arguments' [-Wunused-parameter]
                                           unsigned num_arguments,
                                                    ^
builtin_functions.cpp:5290:52: warning: unused parameter 'flags' [-Wunused-parameter]
                                           unsigned flags)
                                                    ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-18 18:35:56 -08:00
Ian Romanick
c82498c4da glsl: Silence ignored qualifier warning
I think the intention was to mark the "this" parameter as const, but
const goes on the other end to do that.

In file included from glsl_symbol_table.cpp:26:0:
ast.h:339:35: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
    const bool is_single_dimension()
                                   ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-18 18:35:56 -08:00
Kenneth Graunke
fc19a0d2e4 i965: Allow indirect GS input indexing in the scalar backend.
This allows arbitrary non-constant indices on GS input arrays,
both for the vertex index, and any array offsets beyond that.

All indirects are handled via the pull model.  We could potentially
handle indirect addressing of pushed data as well, but it would add
additional code complexity, and we usually have to pull inputs anyway
due to the sheer volume of input data.  Plus, marking pushed inputs
as live due to indirect addressing could exacerbate register pressure
problems pretty badly.  We'd need to be careful.

v2: Use updated MOV_INDIRECT opcode.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-18 15:42:36 -08:00
Jimmy Berry
09d610796c gallium/hud: document GALLIUM_HUD_PERIOD in envvars.html.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-11-19 00:02:34 +01:00
Jimmy Berry
56a1c10bb8 gallium/hud: control visibility at startup and runtime.
- env GALLIUM_HUD_VISIBLE: control default visibility
- env GALLIUM_HUD_SIGNAL_TOGGLE: toggle visibility via signal

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-11-19 00:02:33 +01:00
Jason Ekstrand
0bee3acc2a i965/nir: Add hooks for testing nir_shader_clone
This commit adds code for testing nir_shader_clone by running it after each
and every optimization pass and throwing away the old shader.  Testing
nir_shader_clone is hidden behind a new INTEL_CLONE_NIR environment
variable.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-18 12:28:55 -08:00
Jason Ekstrand
9fbd390dd4 nir: Add support for cloning shaders
This commit is heavily based on one by Rob Clark <robdclark@gmail.com> but
reworked to re-use nir_create functions and do less hashing.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 12:28:32 -08:00
Kenneth Graunke
9ff71b649b i965/nir: Validate that NIR passes call nir_metadata_preserve().
Failing to call nir_metadata_preserve() can have nasty consequences:
some pass breaks dominance information, but leaves it marked as valid,
causing some subsequent pass to go haywire and probably crash.

This pass adds a simple validation mechanism to ensure passes handle
this properly.  We add a new bogus metadata flag that isn't used for
anything in particular, set it before each pass, and ensure it *isn't*
still set after the pass.  nir_metadata_preserve will reset the flag,
so correct passes will work, and bad passes will assert fail.

(I would have made these functions static inline, but nir.h is included
in C++, so we can't bit-or enums without lots of casting...)

Thanks to Dylan Baker for the idea.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-18 12:28:32 -08:00
Kenneth Graunke
7bc0978999 i965/nir: Add OPT() and OPT_V() macros for invoking NIR passes.
OPT() is the normal macro for passes that return booleans, while OPT_V()
is a variant that works for passes that don't properly report progress.
(Such passes should be fixed to return a boolean, eventually.)

These macros take care of calling nir_validate_shader() and setting
progress appropriately.  In the future, it would be easy to add shader
dumping similar to INTEL_DEBUG=optimizer by extending the macro.

v2 (Jason Ekstrand):
 - Fix an unused variable warning

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-18 12:28:32 -08:00
Rob Clark
d27ae2cf8c nir: add array length field
This will simplify things somewhat in clone.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-18 12:28:32 -08:00
Rob Clark
624ec66653 nir: remove nir_variable::max_ifc_array_access
No users.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-18 12:28:32 -08:00
Rob Clark
4671c13852 freedreno/a4xx: add fake RGTC support (required for GL3)
The a4xx bits corresponding to 'freedreno/a3xx: add fake RGTC support
(required for GL3)'

TODO some more r/e.. maybe we get lucky and hw supports some of this
directly?  For now this will help us enable gl3.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Rob Clark
2379cc9fe0 freedreno/a4xx: add compressed texture formats
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Rob Clark
fadd39442b freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
4607b2b9b6 freedreno: expose GLSL 140 and fake MSAA for GL3.0/3.1 support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
9c409c8df3 freedreno/a3xx: fix texture buffers, enable offsets
The main issue is that the current logic looked into cso->u.tex, which
is the wrong side of the union to look into for texture buffers. While I
was at it, it was easy enough to add the logic to handle offsets
(first_element).

 - reduce texture buffer size limit (determined experimentally)
 - don't look at first/last levels, instead look at first/last element
 - include the first element offset
 - set offset alignment to 16 (determined experimentally)

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
d69e557f2a freedreno: add support for conditional rendering, required for GL3.0
A smarter implementation would make it possible to attach this to emit
state for the BY_REGION versions to avoid breaking the tiling. But this
is a start.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
059da344ec freedreno/a3xx: add fake RGTC support (required for GL3)
Also throw in LATC while we're at it (same exact format). This could be
made more efficient by keeping a shadow compressed texture to use for
returning at map time. However... it's not worth it for now...
presumably compressed textures are not updated often.

Lastly fix up Z32S8 transfers to non-0 layers.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
84d087aea2 freedreno/a3xx: add missing formats to enable ARB_vertex_type_2_10_10_10_rev
The previously RE'd formats were from an ES driver implementing
OES_vertex_type_10_10_10_2 and thus backwards. A future change could add
the 2_10_10_10 support.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Rob Clark
8106fec74c freedreno/a3xx+a4xx: fix for stk binning pass hang
We'd end up in a state where shader uses no inputs, yet num_elements is
greater than zero.  Triggered by a TF vertex shader which did:

  gl_Position = vec4(0.0, 0.0, 0.0, 0.0);

resulting in a binning pass variant with no inputs.

Includes equiv fix in a4xx, even though we don't have binning-pass
enabled yet on a4xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Rob Clark
b24c9a8aee freedreno/a3xx+a4xx: fix GL_POINTS lockup w/ GLES
point_size_per_vertex is always TRUE for GLES, causing us to configure
the hw as if gl_PointSize was written, even if it was not.  Which makes
for grumpy hw.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Ilia Mirkin
b40e144a66 nir: fix typo in idiv lowering, causing large-udiv-udiv failures
In nv50, and in the python script that Rob circulated, we do:

   bld.mkCmp(OP_SET, CC_GE, TYPE_U32, (s = bld.getSSA()), TYPE_U32, m, b);

Do the same in the nir div lowering pass. This fixes the large-udiv-udiv
piglit tests on freedreno.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-18 14:31:13 -05:00
Oded Gabbay
4581f8428e llvmpipe: disable VSX in ppc due to LLVM PPC bug
This patch disables the use of VSX instructions, as they cause some
piglit tests to fail

For more details, see: https://llvm.org/bugs/show_bug.cgi?id=25503#c7

With this patch, ppc64le reaches parity with x86-64 as far as piglit test
suite is concerned.

v2:
- Added check that we have at least LLVM 3.4
- Added the LLVM bug URL as a comment in the code

v3:

- Only disable VSX if Altivec is supported, because if Altivec support
is missing, then VSX support doesn't exist anyway.

- Change original patch description.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-18 21:27:29 +02:00
Ilia Mirkin
8e68113c1a nvc0/ir: actually emit AFETCH on kepler
Looks like this was forgotten in the commit which added the AFETCH
logic.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-18 14:26:16 -05:00
Kenneth Graunke
2631bfd62c nir: Store the size of the TCS output patch in nir_shader_info.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-18 10:49:18 -08:00
Kenneth Graunke
b196f1fff3 i965: Add enums for 3DSTATE_TE field values.
3DSTATE_TE has partitioning, output topology, and domain fields,
each of which has several enumerated values.  We'll also need to
switch on the domain, so enums (rather than #defines) seem like a
natural fit.

I chose to put these in brw_compiler.h because they'll be stored
in struct brw_tes_prog_data, which will live there.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-18 10:49:18 -08:00
Ian Romanick
72e232374e meta/generate_mipmap: Don't leak the framebuffer object
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-11-18 09:38:21 -08:00
Brian Paul
1a48326a84 svga: use more VGPU10 formats
We always want to prefer the VGPU10 formats over the VGPU9 ones when
we have VGPU10 support.

Original patch by Jose and updated by Brian.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-18 09:16:12 -07:00
Brian Paul
1a90e3e1e3 svga: add/use new svga_sampler_format() function
This is important for the case of sampling from a depth texture.  In
that case, we need to sample the texture as if it were a single-channel
color texture.  For other/color formats, we can use the format as-is.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-18 09:15:54 -07:00
Nicolai Hähnle
27ce75ed12 radeon: count cs dwords separately for query begin and end
This will be important for perfcounter queries.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
ffd01b7781 radeon: expose r600_query_hw functions for reuse
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
50f0f938e3 radeon: implement r600_query_hw_get_result via function pointers
We will need the clear_result override for the batch query implementation.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
c207c55fc0 radeon: split hw query buffer handling from cs emit
The idea here is that driver queries implemented outside of common code
will use the same query buffer handling with different logic for starting
and stopping the corresponding counters.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
1d10b3d01e radeon: convert hardware queries to the new style
Move r600_query and r600_query_hw into the header because we will want to
reuse the buffer handling and suspend/resume logic outside of the common
radeon code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
019106760d radeon: convert software queries to the new style
Software queries are all queries that do not require suspend/resume
and explicit handling of result buffers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
829a9808a9 radeon: add query handler function pointers
The goal here is to be able to move the implementation details of hardware-
specific queries (in particular, performance counters) out of the common code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
50cab4788d radeon: move R600_QUERY_* constants into a new query header file
More query-related structures will have to be moved into their own
header file to support hardware-specific performance counters.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
c56e83e518 radeon: cleanup driver query list
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
e117e74baf radeon: move get_driver_query_info to r600_query.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:11 +01:00
Neil Roberts
5dfb4dbc05 i965: Prevent fast clears for MSRTs on SKL
There are currently a bunch of formats that behave strangely when
sampling the cleared color from the MCS buffer on SKL. They seem to
mostly be formats that don't have an alpha component, although it's
not all of them, and we haven't yet found anything in the specs which
would explain this. For now to be on the safe side this patch just
prevents fast clears for MSRTs on SKL altogether so that when fast
clears are eventually enabled it will only be for single-sampled
surfaces. The assumption is that clears are probably more likely to be
used in single-sampled applications anyway so we can at least get them
working and we can enable MSRTs later once we understand the problem
better.

This patch should have no functional effect other than perhaps
receiving fewer perf_debug messages on SKL+.

v2: Improve the commit message to avoid saying the patch disables fast
    clears because it will be merged before fast clears are enabled
    for any surfaces so it doesn't actually disable anything.
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-18 10:29:07 +01:00
Eric Anholt
dd05ffebfc vc4: Don't bother lowering uniforms when the same value is used twice.
DEQP likes to do math on uniforms, and the "fmaxabs dst, uni, uni" to get
the absolute value would get lowered.  The lowering doesn't bother to try
to restrict the lifetime of the lowered uniforms, so we'd end up register
allocation failng due to this on 5 of the tests (More tests still fail in
RA, which look like we'll need to reduce lowered uniform lifetimes to
fix).

No changes on shader-db, though fewer extra MOVs are generated on even
glxgears (MOVs pair well enough that it ends up being the same instruction
count).
2015-11-17 17:45:23 -08:00
Eric Anholt
dffe7260cd vc4: Fix uniform reordering to support reading the same uniform twice.
This does actually happen in the wild (particularly fabs of a uniform), so
we'd like to support it.
2015-11-17 17:45:23 -08:00
Eric Anholt
d18d1ba587 vc4: Fix documentation on vc4_qir_lower_uniforms.c. 2015-11-17 17:45:23 -08:00
Eric Anholt
a4bf28178f vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.
It looks like nir_lower_idiv is going to use it soon, so add support.
With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with
GL 3.0 forced on).

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-17 17:45:23 -08:00
Kenneth Graunke
27b1d34438 i965: Fix PIPE_CONTOL typo.
PIPE_CONTOL!!!
2015-11-17 16:33:48 -08:00
Ben Widawsky
c531d40927 i965: Add assertion for src_stencil payload size
This helps address a coverity warning and prevents future questions about this
code.

Reported-by: Coverity (via Ilia)
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-17 14:47:33 -08:00
Kenneth Graunke
2bec154b47 i965: Implement ARB_pipeline_statistics_query tessellation counters.
We basically just need to uncomment Ben's code.

v2: Fix obvious bugs caught by Ben.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-17 14:23:26 -08:00
Timothy Arceri
d4fbf11b58 glsl: rename location layout helper
Change name from validate -> apply to more accurately describe what
the function does.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-18 07:30:23 +11:00
Timothy Arceri
03bbddd139 glsl: don't validate binding when its not needed
Checking that the flag has been set is all the validation thats
needed here.

Also not calling the binding validation function will make things
much simpler when adding compile time constant support as we
won't need to resolve the binding value.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:30:19 +11:00
Timothy Arceri
4f4ca6b90a glsl: remove temp variable to make code easier to read
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:30:07 +11:00
Timothy Arceri
a01b8c7e77 glsl: cleanup and fix validate matrix function for arrays
Previously if the member was an array of matrices then a
warning message would be incorrectly given.

Also the struct case could never be met so it has been removed.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:58 +11:00
Timothy Arceri
f8b5cc827e glsl: use better location in struct and block error messages
Previously we only gave the location for some members and never
gave the variable location. In those cases we were just giving
the location of the struct/block.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:53 +11:00
Timothy Arceri
c54865db78 glsl: only do type and qualifier validation once per declaration
For struct and block members previously we were doing it for
every variable declaration.

So for example

struct S {
  atomic_uint x, y, z;
};

Would previously generate three error messages when one is sufficient.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:47 +11:00
Timothy Arceri
14d343b024 glsl: rename function that processes struct and iface members
As of the previous commit this function handles only struct/iface
members.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:37 +11:00
Timothy Arceri
8cf795dc7c glsl: move block validation outside function that validates members
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:32 +11:00
Timothy Arceri
649803742d glsl: move ast layout qualifier handling code into its own function
We now also only apply these rules to variables rather than also
trying to apply them to function params.

V2: move code for handling stream layout qualifier

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-18 07:29:25 +11:00
Kenneth Graunke
5b596f3878 i965: Add INTEL_DEBUG=shader_time support for tessellation shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:34:04 -08:00
Kenneth Graunke
df87cb837f i965: Add INTEL_DEBUG=tcs,tes and hs,ds flags for tessellation shaders.
Even though both tessellation shader stages must be used together, I
still think it makes sense to add separate debug flags for each stage.
It makes it possible to read the TCS/HS, rule out problems, then read
the TES/DS separately, without sifting through as much printed text.

I decided to add both the GL names (tcs/tes) and hardware names (hs/ds)
so they can be used interchangeably.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:33:54 -08:00
Kenneth Graunke
e9b0fa496c i965: Add more MAX_*_URB_ENTRY_SIZE_BYTES #defines.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2015-11-17 10:18:08 -08:00
Kenneth Graunke
874a1ed813 i965: Add missing stdio.h include to brw_compiler.h.
This is needed for the FILE * type in brw_print_vue_map().

Apparently, all files that include brw_compiler.h already pick this up
via some include chain, so this isn't actually a build fix.  However,
I have patches which introduce new consumers of brw_compiler.h that
fail to build because of the missing #include.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-17 10:18:08 -08:00
Martin Peres
4518eea065 egl: make it clear which platform x11 backend is being used (dri2 or 3)
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
fcdc798515 egl/x11_dri3: Implement EGL_KHR_image_pixmap
v2: from Martin Peres
 - Replace a tab with spaces

v3: from Martin Peres
 - disable EGL_KHR_image_pixmap when is_different_gpu is set (Axel Davy)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
bd6131a8d1 loader/dri3: Expose function to create __DRIimage from pixmap
Used to support EGL_KHR_image_pixmap.

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
f35198bade egl/x11: Implement dri3 support with loader's dri3 helper
v2: From Martin Peres
 - Tell we are compiling the dri3 backend in configure.ac
 - Update the Makefile.am
 - get rid of the LIBDRM_HAS_RENDERNODE_SUPPORT macro
 - fix some warnings related to EGLuint64KHR to int64_t conversions
 - use dri2_get_dri_config to get the __DRIconfig instead of open-coding it
 - replace the occasional tabs with spaces

v3: From Martin Peres
 - fix and indent problem (Matt Turner)
 - drop the authenticate function, use NULL in the vtable instead (Emil)
 - drop some useless includes (Emil Velikov)
 - mandate libdrm (Emil Velikov)
 - link to xcb-dri3 (Kristian Høgsberg)
 - convert to the new loader interface for drwable (Kristian)
 - remove some dead code after the dropping of some vfuncs (Kristian)
 - add a comment on the topic of rendering to the frontbuffer

v4: From Martin Peres
 - do not expose the preserved swap behavior (Acked by Eric Anholt)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
a25df54571 egl_dri2: Add a function to let platform code return dri drawable from _EGLSurface
dri3 for EGL will use different struct other than dri2_egl_surface for
an EGL surface, the common code only uses __DRIdrawable from that
struct, so instead of converting _EGLSurface to dri2_egl_surface, let
the platform code return the __DRIdrawable by its own (although the
current platforms use the same function).

v2: From Martin Peres
 - convert to the new drawable interface (Kristian)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
fdacbc439e glx/dri3: Convert to use dri3 helper in loader library
v2: From Martin Peres
 - convert to the new drawable interface
 - delete dead code after the dropping of some vfuncs
 - delete the width and height attributes since they are found in the helper

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Boyan Ding
6bd9ba7d07 loader: Add dri3 helper
v2: From Martin Peres
 - Try to fit in the 80-col limit as much as possible

v3: From Martin Peres
 - introduce loader_dri3_helper.la to avoid dragging the xcb dep everywhere (Kristian & Emil)
 - get rid of the width, height, dri_screen and is_different_gpu vfuncs (Kristian)
 - replace the create/destroy functions with init/fini for dri3 drawables
 - prefix static functions with dri3_ and exported ones with loader_dri3 (Emil)
 - keep the function definition consistent (Emil)

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-17 17:26:20 +02:00
Eduardo Lima Mitev
252b143e9e i965: Return the correct value type from brw_compile_gs()
brw_compile_gs() should return a pointer to unsigned, but it is returning the
bool 'false' at some point, hence annoying us with a compiler warning:

In function 'const unsigned int* brw::brw_compile_gs(const brw_compiler*,
   void*, void*, const brw_gs_prog_key*, brw_gs_prog_data*, const nir_shader*,
   gl_shader_program*, int, unsigned int*, char**)':

brw_vec4_gs_visitor.cpp:776:14: warning: converting 'false' to pointer type
                                'const unsigned int*' [-Wconversion-null]
                                return false;
                                       ^
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-17 12:50:09 +01:00
Samuel Iglesias Gonsálvez
dfa60e7057 glsl: copy each field's precision information in glsl_types's structure constructor
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:42 +01:00
Samuel Iglesias Gonsálvez
688b58c40c glsl: copy each field's precision information from the old gl_PerVertex interface block
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:42 +01:00
Samuel Iglesias Gonsálvez
cfe32cfa8e glsl: copy each field's precision information when generating varying variables
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:42 +01:00
Samuel Iglesias Gonsálvez
91eefe8505 glsl: initialize data.precision value in ir_variable constructor
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:42 +01:00
Samuel Iglesias Gonsálvez
58954e4daa glsl/nir: initialize precision field in glsl_struct_field constructor
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:42 +01:00
Samuel Iglesias Gonsálvez
a96afaced8 nir: reduce memory footprint of glsl_struct_field's precision
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-17 10:36:41 +01:00
Tapani Pälli
f4f30ad730 mesa: do runtime validation of precision varyings only on ES
Precision qualifier should be ignored on desktop OpenGL.

v2: include spec quote (Samuel)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-17 09:23:54 +02:00
Tapani Pälli
023fd58fd6 glsl: initialize precision when adding per vertex record fields
Fixes issues with tessellation builtin variables since precision was
introduced to IR with commit f84bc57d7d.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-17 07:37:13 +02:00
Kenneth Graunke
292df19401 i965: Set MaxCombinedUniformBlocks properly.
Up until now, we've been letting core Mesa initialize it to 36 for us
(which is presumably BRW_MAX_UBO (12) * (VS+GS+FS stages -> 3)).

With compute and tessellation, we need to increase this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-16 16:24:44 -08:00
Kenneth Graunke
5ee5dfddea i965: Clean up context constant initialization code.
This was getting pretty out of hand, and with compute partially in place
and tessellation on the way, it was only going to get worse.

This patch makes a "stage exists?" predicate and a "number of stages"
count and uses them to clean up a lot of calculations.  We can just
loop over shader stages and set things for the ones that exist.  For
combined counts, we can just multiply by the number of stages.

It also tries to organize a little bit.

We should probably use _mesa_has_geometry_shaders/tessellation/compute
here, but we can't because ctx->Version isn't initialized yet.  Perhaps
that could be fixed in the future.

No change in "glxinfo -l" on Broadwell.

v2: Drop stray compute shader hunk.  Mark stage_exists as const.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-16 16:24:44 -08:00
Kenneth Graunke
44d6c0c805 i965: Convert scalar_* flags to a scalar_stage array.
I was going to add scalar_tcs and scalar_tes flags, and then thought
better of it and decided to convert this to an array.  Simpler.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-16 16:24:44 -08:00
Roland Scheidegger
a2611ffe4b r200: fix bgrx8/xrgb8 blits
Since 779cabfc7d the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeon tex format chooser will never chose
them, but we get that format from the dri buffers (at least I assume we got
it from there).
This is untested but essentially addressing the same bug as for radeon.
(I don't think that the second entry per le/be table is actually necessary,
but shouldn't hurt...)

Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-17 01:04:09 +01:00
Roland Scheidegger
983614dbed radeon: fix bgrx8/xrgb8 blits
Since d21320f625 the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeon tex format chooser will never chose
them, but we get that format from the dri buffers (at least I assume we got
it from there). This caused lots of piglit regressions (and probably lots of
trouble outside piglit too).
This fixes bug https://bugs.freedesktop.org/show_bug.cgi?id=92900.

Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-17 01:01:38 +01:00
Ian Romanick
c40a88b6c5 meta/generate_mipmap: Only modify the draw framebuffer binding in fallback_required
Previously GL_FRAMEBUFFER was used.  However, if GL_EXT_framebuffer_blit
is supported (note: it is supported by every Mesa driver), this is
*sometimes* an alias for GL_DRAW_FRAMEBUFFER (getters) and *sometimes*
an alias for *both* GL_DRAW_FRAMEBUFFER and GL_READ_FRAMEBUFFER
(setters).  As a result, the code saved one binding but modified both.
If the bindings were different, the GL_READ_FRAMEBUFFER would be
incorrect on exit.

Fixes the piglit fbo-generatemipmap-versus-READ_FRAMEBUFFER test.

Ideally this function would use DSA functions and not modify the binding
at all.  However, that would be a much more intrusive change because
_mesa_meta_bind_fbo_image would also need to be modified.
_mesa_meta_bind_fbo_image has a lot of callers.  Much of this code is
about to get a major rework due to bug #92363, so I don't think it
matters too much.  In fact, I discovered this bug while working on the
other bug.  Le bon temps!

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-11-16 10:30:10 -08:00
Matt Turner
d564b5b58e nir/glsl: Fix copy-n-paste mistakes from commit 213f864.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-16 09:05:53 -08:00
Alex Deucher
00f554abba radeonsi: enable optimal raster config setting for fiji (v2)
Requires proper kernel tiling configuration so check the tiling
config registers.

v2: send the right version of the patch

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-16 10:09:47 -05:00
Alex Deucher
5b37d8b50c radeonsi: use proper GRBM_GFX_INDEX offset for CI+
The offset is different on CI and newer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-16 10:09:34 -05:00
Neil Roberts
2ca018cb65 docs: Add 16x MSAA on i965 to the release notes
Signed-off-by: Neil Roberts <neil@linux.intel.com>
2015-11-16 14:36:27 +01:00
Emil Velikov
1780a562bc nv50: add missing header into the sources list
Otherwise it won't end up in the tarball.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-16 10:49:14 +00:00
Juan A. Suarez Romero
40c2acef5c nir/glsl_to_nir: use _mesa_fls() to compute num_textures
Replace the current loop by a direct call to _mesa_fls() function.

It also fixes an implicit bug in the current code where num_textures
seems to be one value less than it should be when sh->Program->SamplersUsed > 0.

For instance, num_textures is 0 instead of 1 when
sh->Program->SamplersUsed is 1.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-16 09:24:28 +01:00
Iago Toral Quiroga
3f34afa0aa nir/copy_propagate: do not copy-propagate MOV srcs with source modifiers
If a source operand in a MOV has source modifiers, then we cannot
copy-propagate it from the parent instruction and remove the MOV.

v2: remove the check for source modifiers from is_move() (Jason)

v3: Put the check for source modifiers back into is_move() since
    this function is called from copy_prop_alu_src(). Add source
    modifiers checks to is_vec() instead.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-16 08:11:13 +01:00
Ilia Mirkin
ff17b3ccf4 nv50,nvc0: disable render condition around clear_* functions
Only the regular "clear" call is supposed to respect the render
condition. The rest should ignore it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 20:15:22 -05:00
Kenneth Graunke
d2f089ba17 i965: Introduce a MOV_INDIRECT opcode.
The geometry and tessellation control shader stages both read from
multiple URB entries (one per vertex).  The thread payload contains
several URB handles which reference these separate memory segments.

In GLSL, these inputs are represented as per-vertex arrays; the
outermost array index selects which vertex's inputs to read.  This
array index does not necessarily need to be constant.

To handle that, we need to use indirect addressing on GRFs to select
which of the thread payload registers has the appropriate URB handle.
(This is before we can even think about applying the pull model!)

This patch introduces a new opcode which performs a MOV from a
source using VxH indirect addressing (which allows each of the 8
SIMD channels to select distinct data.)

Based on a patch by Jason Ekstrand.

v2: Rename from INDIRECT_THREAD_PAYLOAD_MOV to MOV_INDIRECT; make it
    a bit more generic.  Use regs_read() instead of hacking up the
    register allocator.  (Suggested by Jason Ekstrand.)

v3: Fix regs_read() to be more accurate for small unaligned regions.
    Also rebase on Matt's work.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v3]
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> [v1]
2015-11-14 16:41:37 -08:00
Samuel Pitoiset
848fa3101d nv50: add support for performance metrics on G84+
Currently only one metric is exposed but more will be added later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:42:46 +01:00
Samuel Pitoiset
6a9c151dbb nv50: add compute-related MP perf counters on G84+
These compute-related MP performance counters have been reverse
engineered using CUPTI which is part of NVIDIA CUDA.

As for nvc0, we use a compute kernel to read out those performance
counters, and the command stream to configure them. Note that Tesla
only exposes 4 MP performance counters, while Fermi has 8.

Only G84+ is supported because G80 is an old and weird card.

Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64,
xonotic-glx, heaven and valley.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:42:42 +01:00
Samuel Pitoiset
ff72440b40 nv50: implement a basic compute support
This adds the ability to launch simple compute kernels like the one I
will use to read out MP performance counters in the upcoming patch.

This compute support is based on the work of Francisco Jerez (aka curro)
that he did as part of his EVoC project in 2011/2012 to get OpenCL
working on Tesla. His original work can be found here:
https://github.com/curro/mesa/commits/nv50-compute

I did some improvements on the original code, like fixing using both 3D
and COMPUTE simultaneously, improving global buffers binding, and making
the code closer to what nvc0 already does. This compute support has been
tested by Pierre Moreau and myself with some compute kernels. This is a
step towards OpenCL.

Speaking about this, it seems like compute programs overlap fragment
programs when they are used both. To fix this, we need to re-validate
fragment programs when binding compute programs and vice versa.

Note that, textures, samplers and surfaces still need to be implemented.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:42:15 +01:00
Samuel Pitoiset
7167a058ba nv50: free interpolation parameters in nv50_program_destroy()
As for nvc0, we need to free memory allocated by interpolation
parameters. This fixes a memory leak spotted by valgrind.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:16:12 +01:00
Samuel Pitoiset
69271bba06 nvc0: reduce the number of GPR used when reading MP perf counters
No need to allocate more GPR than used in the compute kernel which
reads MP performance counters on Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-14 17:38:57 +01:00
Ilia Mirkin
f94e1d9738 nouveau: don't expose HEVC decoding support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-14 10:32:10 -05:00
Vinson Lee
3a0fef0005 nir: Silence GCC maybe-uninitialized warnings.
nir/nir_control_flow.c: In function ‘split_block_cursor.isra.11’:
nir/nir_control_flow.c:460:15: warning: ‘after’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       *_after = after;
               ^
nir/nir_control_flow.c:458:16: warning: ‘before’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       *_before = before;
                ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-13 16:19:11 -08:00
Kenneth Graunke
5480bbd90e i965: Add a SHADER_OPCODE_URB_READ_SIMD8_PER_SLOT opcode.
We need to use per-slot offsets when there's non-uniform indexing,
as each SIMD channel could have a different index.  We want to use
them for any non-constant index (even if uniform), as it lives in
the message header instead of the descriptor, allowing us to set
offsets in GRFs rather than immediates.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2015-11-13 16:11:02 -08:00
Kenneth Graunke
511de1a80c glsl: Allow implicit int -> uint conversions for the % operator.
GLSL 4.00 and GL_ARB_gpu_shader5 introduced a new int -> uint implicit
conversion rule and updated the rules for modulus to use them.  (In
earlier languages, none of the implicit conversion rules did anything
relevant, so there was no point in applying them.)

This allows expressions such as:

   int foo;
   uint bar;
   uint mod = foo % bar;

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-13 16:09:58 -08:00
Kenneth Graunke
a4ba476c30 i965: Print input/output VUE maps on INTEL_DEBUG=vs, gs.
I've been carrying around a patch to do this for the last few months,
and it's been exceedingly useful for debugging GS and tessellation
problems.  I've caught lots of bugs by inspecting the interface
expectations of two adjacent stages.

It's not that much spam, so I figure we may as well just print it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-11-13 16:08:51 -08:00
Kenneth Graunke
f88c175a29 i965: Make convert_attr_sources_to_hw_regs handle stride == 0.
This makes expressions like component(fs_reg(ATTR, n), 7) get a proper
<0,1,0> region instead of the invalid <0,8,0>.

Nobody uses this today, but I plan to.

v2: Rebase on Matt's changes; simplify.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
2015-11-13 15:17:58 -08:00
Kenneth Graunke
26f9469a46 nir: Add helpers for getting input/output intrinsic sources.
With the many variants of IO intrinsics, particular sources are often in
different locations.  It's convenient to say "give me the indirect
offset" or "give me the vertex index" and have it just work, without
having to think about exactly which kind of intrinsic you have.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 15:15:46 -08:00
Kenneth Graunke
d12bde0944 nir: Don't lower TCS outputs to temporaries.
We'd like to shadow these when possible, but the current code doesn't
work properly for TCS outputs.  For now, disable it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 15:15:46 -08:00
Kenneth Graunke
134728fdae nir: Allow outputs reads and add the relevant intrinsics.
Normally, we rely on nir_lower_outputs_to_temporaries to create shadow
variables for outputs, buffering the results and writing them all out
at the end of the program.  However, this is infeasible for tessellation
control shader outputs.

Tessellation control shaders can generate multiple output vertices, and
write per-vertex outputs.  These are arrays indexed by the vertex
number; each thread only writes one element, but can read any other
element - including those being concurrently written by other threads.
The barrier() intrinsic synchronizes between threads.

Even if we tried to shadow every output element (which is of dubious
value), we'd have to read updated values in at barrier() time, which
means we need to allow output reads.

Most stages should continue using nir_lower_outputs_to_temporaries(),
but in theory drivers could choose not to if they really wanted.

v2: Rebase to accomodate Jason's review feedback.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 15:15:41 -08:00
Kenneth Graunke
c51d7d5fe3 nir/lower_io: Introduce nir_store_per_vertex_output intrinsics.
Similar to nir_load_per_vertex_input, but for outputs.  This is not
useful in geometry shaders, but will be useful in tessellation shaders.

v2: Change stage_uses_per_vertex_outputs() to is_per_vertex_output(),
    taking a nir_variable (requested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 15:15:10 -08:00
Kenneth Graunke
0df452cd0d nir/lower_io: Use load_per_vertex_input intrinsics for TCS and TES.
Tessellation control shader inputs are an array indexed by the vertex
number, like geometry shader inputs.  There aren't per-patch TCS inputs.

Tessellation evaluation shaders have both per-vertex and per-patch
inputs.  Per-vertex inputs get the new intrinsics; per-patch inputs
continue to use the ordinary load_input intrinsics, as they already
work like we want them to.

v2: Change stage_uses_per_vertex_inputs into is_per_vertex_input(),
    which takes a variable (requested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 15:15:10 -08:00
Ian Romanick
1cb49eedb5 i965: Silence unused parameter warnings in get_buffer_rect
brw_meta_fast_clear.c: In function 'get_buffer_rect':
brw_meta_fast_clear.c:318:37: warning: unused parameter 'brw' [-Wunused-parameter]
 get_buffer_rect(struct brw_context *brw, struct gl_framebuffer *fb,
                                     ^
brw_meta_fast_clear.c:319:44: warning: unused parameter 'irb' [-Wunused-parameter]
                 struct intel_renderbuffer *irb, struct rect *rect)
                                            ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-13 12:29:57 -08:00
Ian Romanick
758f12fd98 meta/generate_mipmap: Don't leak the sampler object
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-13 12:29:56 -08:00
Matt Turner
7a879e422b i965: Remove unneeded #includes.
Some of these are no longer needed since all the backends switched to
NIR.
2015-11-13 12:16:48 -08:00
Matt Turner
386759b02d i965: Silence warning.
intel_asm_annotation.c: In function ‘annotation_insert_error’:
intel_asm_annotation.c:214:18:
warning: ‘ann’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
       ann->error = ralloc_strdup(annotation->mem_ctx, error);
                         ^

I initially tried changing the type of ann_count to unsigned (is
currently int), since that in addition to the check that it's non-zero
at the beginning of the function seems sufficient to prove that it must
be greater than zero. Unfortunately that wasn't sufficient.
2015-11-13 12:13:14 -08:00
Juha-Pekka Heikkila
8b145d6a3d i965: Don't write beyond allocated memory.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
2015-11-13 12:06:11 -08:00
Matt Turner
0eb3db117b i965: Use BRW_MRF_COMPR4 macro in more places.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:51 -08:00
Matt Turner
49b3215d70 i965: Combine register file field.
The first four values (2-bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:51 -08:00
Matt Turner
b3315a6f56 i965: Replace HW_REG with ARF/FIXED_GRF.
HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.

Instead use ARF/FIXED_GRF/MRF for fixed registers in those files.

After a sufficient amount of time has passed since "GRF" was used, we
can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:51 -08:00
Matt Turner
4b0fbebf02 i965/fs: Set stride correctly for immediates in fs_reg(brw_reg).
The fs_reg() constructors for immediates set stride to 0, except for
vector-immediates, which set stride to 1.  This patch makes the fs_reg
constructor that takes a brw_reg do likewise, so that stride is set
correctly for cases such as fs_reg(brw_imm_v(...)).

The generator asserts that this is true (and presumably it's useful in
some optimization passes?) and the VF fs_reg constructors did this (by
virtue of the fact that it doesn't override what init() does).

In the next commit, calling this constructor with brw_imm_* will generate
an IMM file register rather than a HW_REG, making this change necessary
to avoid breakage with existing uses of brw_imm_v().

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:51 -08:00
Matt Turner
b99e1fd547 i965/fs: Handle type-V immediates in brw_reg_from_fs_reg().
We use brw_imm_v() to produce type-V immediates, which generates a
brw_reg with fs_reg's .file set to HW_REG. The next commit will rid us
of HW_REGs, so we need to handle BRW_REGISTER_TYPE_V in the IMM case.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:51 -08:00
Matt Turner
b163aa0148 i965: Rename GRF to VGRF.
The 2-bit hardware register file field is ARF, GRF, MRF, IMM.

Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
5a23b31c75 i965: Move BAD_FILE from the beginning of enum register_file.
I'm going to begin using brw_reg's file field in backend_reg and its
derivatives, and in order to keep the hardware value for ARF as 0, we
have to do something different.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
dba309fc14 i965: Initialize registers.
The test (file == BAD_FILE) works on registers for which the constructor
has not run because BAD_FILE is zero.  The next commit will move
BAD_FILE in the enum so that it's no longer zero.

In the case of this->outputs, the constructor was being run implicitly,
and we were unnecessarily memsetting is to zero.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
7638e75cf9 i965: Use brw_reg's nr field to store register number.
In addition to combining another field, we get replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.

Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
3048053908 i965: Unwrap some lines.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
58fa9d47b5 i965/vec4: Remove swizzle/writemask fields from src/dst_reg.
Also allows us to handle HW_REGs in the swizzle() and writemask()
functions.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
94b1031703 i965: Remove fixed_hw_reg field from backend_reg.
Since backend_reg now inherits brw_reg, we can use it in place of the
fixed_hw_reg field.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
1392e45bfb i965: Use immediate storage in inherited brw_reg.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
d74dd703f8 i965: Add and use enum brw_reg_file.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
977df90d65 i965: Reorganize brw_reg fields.
Put fields that are meaningless with an immediate in the same storage
with the immediate. This leaves fields type, file, nr, subnr in the
first dword where there's now extra room for expansion.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
e42fb0c2a6 i965: Make 'dw1' and 'bits' unnamed structures in brw_reg.
Generated by

   sed -i -e 's/\.bits\././g' *.c *.h *.cpp
   sed -i -e 's/dw1\.//g' *.c *.h *.cpp

and then reverting changes to comments in gen7_blorp.cpp and
brw_fs_generator.cpp.

There wasn't any utility offered by forcing the programmer to list these
to access their fields. Removing them will reduce churn in future
commits.

This is C11 (and gcc has apparently supported it for sometime
"compatibility with other compilers")

See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
182f137521 i965: Delete type field from backend_reg.
Switching from an implicitly-sized type field to field with an explicit
bit width is safe because we have fewer than 2^4 types, and gcc will
warn if you attempt to set a value that will not fit.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
433df2e03c i965: Delete abs/negate fields from backend_reg.
Instead use the ones provided by brw_reg. Also allows us to handle
HW_REGs in the negate() functions.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
c7ed5d1d1c i965: Make backend_reg inherit from brw_reg.
Some fields (file, type, abs, negate) in brw_reg are shadowed by
backend_reg.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner
88f349c4e1 i965/fs: Replace nested ternary with if ladder.
Since the types of the expression were

   bool ? src_reg : (bool ? brw_reg : brw_reg)

the result of the second (nested) ternary would be implicitly
converted to a src_reg by the src_reg(struct brw_reg) constructor. I.e.,

   bool ? src_reg : src_reg(bool ? brw_reg : brw_reg)

In the next patch, I make backend_reg (the parent of src_reg) inherit
from brw_reg, which changes this expression to return brw_reg, which
throws away any fields that exist in the classes derived from brw_reg.
I.e.,

   src_reg(bool ? brw_reg(src_reg) : bool ? brw_reg : brw_reg)

Generally this code was gross, and wasn't actually shorter or easier to
read than an if ladder.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-11-13 11:27:50 -08:00
Marek Olšák
3694d58e6c radeonsi: remove dead code after ES-GS linkage change
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
d79a3449a7 radeonsi: link ES-GS just like LS-HS
This reduces the shader key for ES.

Use a fixed attrib location based on (semantic name,  index).

The ESGS item size is determined by the physical index of the highest ES
output, so it's almost always larger than before, but I think that
shouldn't matter as long as the ESGS ring buffer is large enough.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
b1c5f3faa9 radeonsi: calculate optimal GS ring sizes to fix GS hangs on Tonga
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.

There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.

This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
2f5d911ba2 radeonsi: rename si_update_gs_rings
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
4acd856088 radeonsi: calculate ESGS_RING_ITEMSIZE in create_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
a0cf589961 radeonsi: move maximum gs stream calculation into create_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
3ab0c49f04 radeonsi: clean up small duplication in si_shader_gs
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
eb0d3e8a90 gallium/radeon: shorten render_cond variable names
and ..._cond -> ..._invert

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
70c40cc989 gallium/radeon: remove predicate_drawing flag
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
12596cfd4c gallium/radeon: atomize render condition (SET_PREDICATION)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
3521907622 gallium/radeon: simplify restoring render condition after flush
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
600e212d87 gallium/radeon: don't use PREDICATION_OP_CLEAR
Not setting the predication bit is sufficient.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
6eff5415e4 gallium/radeon: simplify disabling render condition for u_blitter
just disable it by not setting the predication bit

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
8dd1ee6ff3 r600g: don't set predication on non-draw packets
This has no effect.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
6cc8f6c6a7 gallium/radeon: inline the r600_rings structure
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
3d963abc81 radeonsi: prevent recursion in si_context_gfx_flush
The recursion can only occur if you modify need_cs_space to always flush.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
8569f9a87e gallium/radeon: remove the IB flushing flag
Not needed anymore. A similar flag will be introduced in the next commit,
which will be private in radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
81d412e02c gallium/radeon: move GFX/DMA flushing from add_to_buffer_list to need_cs_space
need_cs_space isn't invoked so often and is called before all commands too.
This is a lot cleaner. The code in radeon_add_to_buffer_list always seemed
dodgy to me.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
c6012a6650 radeonsi: rename cache flushing flags once more
KCACHE, TC L1 and TC L2 are renamed to:
- SMEM L1
- VMEM L1
- GLOBAL L2

You can easily tell what they are used for now.
Shaders must deal with coherency issues between both L1s manually,
e.g. by setting GLC=1 or by using s_dcache_*.

BOTH_ICACHE_KCACHE was an unused definition.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
10130ccd8c radeonsi: set the DISABLE_WR_CONFIRM flag on CI-VI as well
I missed this in commit c3e527f93d
    radeonsi: only enable write confirmation on the last CP DMA packet

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
40912dd91e radeonsi: initialize SX_PS_DOWNCONVERT to 0 on Stoney
otherwise the SX or CB blocks can go bananas

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-13 19:54:41 +01:00
Marek Olšák
f7757100f2 radeonsi: add glClearBufferSubData acceleration
8-bit and 16-bit clears which are not aligned to dwords are done in software.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
19773f9805 radeonsi: add SI_SAVE_FRAGMENT_STATE blitter flag
Buffer clears via transform feedback won't set this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
19a9c1ecc7 gallium/u_blitter: add support for multi-dword clear values in clear_buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
e15c5c7a06 radeonsi: fix a future crash in emit_cb_target_mask
This can't crash currently, but it would crash if clear_buffer
from u_blitter were used with a clean context.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
65d0c558d5 radeonsi: fix unaligned clear_buffer fallback
This is unreachable currently, but it will be used by unaligned 8-bit and
16-bit fills.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:40 +01:00
Marek Olšák
7f1e34e6c8 r600g: fix clear_buffer fallback with offset != 0
Discovered by luck. This code path hasn't been exercised since transform
feedback was implemented.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:40 +01:00
Marek Olšák
01526136ba gallium/radeon: fix PIPE_QUERY_GPU_FINISHED
Broken by the addition of r600_multi_fence
in 3b37155a68

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89014

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-13 19:54:40 +01:00
Brian Paul
40663864d2 mesa: minor comment fix in blend.c 2015-11-13 08:02:19 -07:00
Brian Paul
5a5efbf804 docs: add link to Coverity on developer utilities page
Signed-off-by: Brian Paul <brianp@vmware.com>
2015-11-13 08:02:19 -07:00
Brian Paul
00046393f8 docs: update VMware driver instructions
Use a LIBDIR variable, set per-platform.
Update the Mesa configuration flags.
Run update-initramfs or dracut, update /etc/modules

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-11-13 08:02:19 -07:00
Daniel Stone
d1314de293 egl/wayland: Ignore rects from SwapBuffersWithDamage
eglSwapBuffersWithDamage accepts damage-region rectangles to hint the
compositor that it only needs to redraw certain areas, which was passed
through the wl_surface_damage request, as designed.

Wayland also offers a buffer transformation interface, e.g. to allow
users to render pre-rotated buffers. Unfortunately, there is no way to
query buffer transforms, and the damage region was provided in surface,
rather than buffer, co-ordinate space.

Users could in theory account for this themselves, but EGL also requires
co-ordinates to be passed in GL/mathematical co-ordinate space, with an
inversion to Wayland's natural/scanout co-ordinate space, so
transformations other than a 180-degree rotation will fail as EGL
attempts to subtract the region from (its view of the) surface height.

Pending creation and acceptance of a wl_surface.buffer_damage request,
which will accept co-ordinates in buffer co-ordinate space, pessimise to
always sending full-surface damage.

bce64c6c provides the explanation for why we send maximum-range damage,
rather than the full size of the surface: in the presence of buffer
transformations, full-surface damage may not actually cover the entire
surface.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 10:09:23 +00:00
Iago Toral Quiroga
a29d922c1a Revert "nir/copy_propagate: do not copy-propagate MOV srcs with source modifiers"
The change proposed in the review leads to piglit regressions because
is_move() is used in other places and relies on the checks for source
modifiers to be there.

Revert this until we agree on a better solution.
2015-11-13 08:53:10 +01:00
Samuel Iglesias Gonsálvez
5f004fd197 glsl: fix 'shared' layout qualifier related regressions
Commit 8b28b35 added 'shared' as a keyword for compute shaders
but it broke the existing 'shared' layout qualifier support for
uniform and shader storage blocks.

This patch fixes 578 dEQP-GLES31.functional.ssbo.* tests.

v2:
- Move SHARED to interface_block_layout_qualifier (Timothy)
- Don't remove "shared" case insensitive check (Timothy)
- Remove the clearing of shared_storage flag (Timothy)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-13 08:04:49 +01:00
Iago Toral Quiroga
8610cd6b8c nir/copy_propagate: do not copy-propagate MOV srcs with source modifiers
If a source operand in a MOV has source modifiers, then we cannot
copy-propagate it from the parent instruction and remove the MOV.

v2: remove the check for source source modifiers from is_move() (Jason)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-13 07:54:33 +01:00
Jason Ekstrand
5f43e074d4 nir/vars_to_ssa: Delete dead output set code
This was a remnant of an early attempt to handle output reads in
vars_to_ssa.  That attempt was abandon a long time ago but these few lines
were aparently left in the pass and managed to evade review.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-12 22:08:43 -08:00
Jason Ekstrand
226ba889a0 nir/vars_to_ssa: Rework copy set handling in lower_copies_to_load_store
Previously, we walked through a given deref_node's copies and, after
lowering the copy away, removed it from both the source and destination
copy sets.  This commit changes this to only remove it from the other
node's copy set (not the one we're lowering).  At the end of the loop, we
just throw away the copy set for the node we're lowering since that node no
longer has any copies.  This has two advantages:

 1) It's more efficient because we're doing potentially half as many set
    search operations.

 2) It now properly handles copies from a node to itself.  Perviously, it
    would delete the copy from the set when processing the destinatioon and
    then assert-fail when we couldn't find it for the source.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92588
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-12 22:08:43 -08:00
Jason Ekstrand
4bbf2ac06e nir/validate: Allow subroutine types for the tails of derefs
The shader-subroutine code creates uniforms of type SUBROUTINE for
subroutines that are then read as integers in the backends.  If we ever
want to do any optimizations on these, we'll need to come up with a better
plan where they are actual scalars or something, but this works for now.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92859
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-12 22:08:43 -08:00
Nanley Chery
79f68306d2 mesa: Replace gl_extensions::EXT_texture3D with ::dummy_true
Mesa unconditionally sets this driver flag to true in
_mesa_init_extensions(). There is therefore no need for
the driver to communicate support for this extension.
Replace the driver capability flag with ::dummy_true.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 21:31:05 -08:00
Brian Paul
2de2e1702b mesa: fix MSVC build break in extensions.h
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-12 16:57:18 -07:00
Ilia Mirkin
39f51ec96f nvc0/ir: add support for TGSI_SEMANTIC_HELPER_INVOCATION
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-12 17:58:42 -05:00
Ilia Mirkin
e3d9dbe304 gallium: add support for gl_HelperInvocation semantic
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-11-12 17:58:23 -05:00
Ilia Mirkin
20748318c5 glsl: add gl_HelperInvocation system value
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-12 17:58:23 -05:00
Jordan Justen
b52cb9ec6a glsl: Correctly handle vector extract on function parameter
This commit accidentally used a '==' when '=' was intended.

commit 96b22fb080
Author: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Date:   Wed Nov 4 14:58:54 2015 -0800

    glsl: Use array deref for access to vector components

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-12 14:11:16 -08:00
Nanley Chery
a16ffb743c mesa: In helpers, only check driver capability for meta
Make API context and version checks done by the helper functions pass
unconditionally while meta is in progress. This transparently makes
extension checks solely dependent on struct gl_extensions while in meta.

v2: Use an 8-bit data type instead of a GLuint

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
5645770742 mesa/extensions: Prefix global struct and extension type
Rename the following types and variables:
* struct extension -> struct mesa_extension,
  like the mesa_format type.
* extension_table -> _mesa_extension_table,
  like the _mesa_extension_override_{enables,disables} structs.

Suggested-by: Marek Olšák <marek.olsak@amd.com>
Suggested-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
ab129a44ae mesa: Generate a helper function for each extension
Generate functions which determine if an extension is supported in the
current context. Initially, enums were going to be explicitly used with
_mesa_extension_supported(). The idea to embed the function and enums
into generated helper functions was suggested by Kristian Høgsberg.

For performance, the function body no longer uses
_mesa_extension_supported() and, as suggested by Chad Versace, the
functions are also declared static inline.

v2: Place function qualifiers on separate line (Chad)
v3: Move function curly brace to new line (Chad)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
eda15abd84 mesa/extensions: Replace extension::api_set with ::version
The api_set field has no users outside of _mesa_extension_supported().
Remove it and allow the version field to take its place.

The brunt of the transformation was performed with the following vim commands:
s/\(GL [^,]\+\),\s*\d*,\s*\d*\(,\s*\d*\)\(,\s*\d*\)/\1, GLL, GLC\2\3/g
s/\(GLL [^,]\+\)\,\s*\d*/\1, GLL/g
s/\(GLC [^,]\+\)\(,\s*\d*\),\s*\d*\(,\s*\d*\)\(,\s*\d*\)/\1\2, GLC\3\4/g
s/\( ES1[^,]*\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\),\s*\d*/\1\2\4, ES1/g
s/\( ES2[^,]*\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\),\s*\d*/\1\2\4\6, ES2/g

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
a82bc779af mesa/extensions: Use _mesa_extension_supported()
Replace open-coded checks for extension support with
_mesa_extension_supported().

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
f6a818e76d mesa/extensions: Create _mesa_extension_supported()
Create a function which determines if an extension is supported in the
current context.

v2: Use common variable names (Emil)
    Insert new line between variables and return statement (Chad)
    Rename api_set variable to api_bit (Chad)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
f47df8f729 mesa/extensions: Add extension::version
Enable limiting advertised extension support by context version with
finer granularity. This new field is currently unused and is set to
0 everywhere. When it is used, a value of 0 will indicate that the
extension is supported for any version of a context.

v2: Use uint*t type for version and note the expected values (Emil)
    Use an 8-bit data type
    Reformat macro for better readability (Chad)

v3: Note preparatory nature of commit (Chad)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
8bd82a91c0 mesa/extensions: Move entries entries to separate file
With this infrastructure set in place, we can now reuse the entries to
generate useful code.

v2: Add the new file into Makefile.sources (Emil)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
c0b568f3db mesa/extensions: Wrap array entries in macros
Now that we're using macros, remove the redundant text from each entry.

Remove comments between the entries to make editing easier and separate
the sections with blank lines. Structure the EXT macros in a way that
helps reviewers verify that no meaning has been altered.

v2: Indent the entries (Chad)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Nanley Chery
e5af09f9ba mesa/extensions: Remove array sentinel
Simplify future updates to the extension struct array by removing
the sentinel.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-11-12 13:10:37 -08:00
Matt Turner
74e48e9544 i965: Check instructions appear only on supported hardware.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:06:08 -08:00
Matt Turner
0b45d47f71 i965: Add initial assembly validation pass.
Initially just checks that sources are non-NULL, which would have
alerted us to the problem fixed by commit 6c846dc5.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:06:04 -08:00
Matt Turner
34ed45557e i965: Add annotation_insert_error() and support for printing errors.
Will allow annotations to contain error messages (indicating an
instruction violates a rule for instance) that are printed after the
disassembly of the block.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:00:10 -08:00
Matt Turner
a280e83d71 i965: Combine assembly annotations if possible.
Often annotations are identical between sets of consecutive
instructions. We can perhaps avoid some memory allocations by reusing
the previous annotation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:00:10 -08:00
Matt Turner
93e371c140 i965: Set annotation_info's mem_ctx.
It was being memset to 0 previously.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-11-12 11:00:10 -08:00
Matt Turner
9ab45b4df9 i965: Don't consider control flow instructions to have sources.
And why did IFF have a destination?

I suspect that once upon a time the disassembler used this information
to know which fields to find the jump targets in. The jump targets have
moved, so the disassembler has to know how to handle these
per-generation anyway.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:00:10 -08:00
Matt Turner
0865e743c1 i965: Fill out instruction list.
Add some instructions: illegal, movi, sends, sendsc.

Remove some instructions with reused opcodes: msave, mrestore, push,
pop, goto. I did have some gross code for disassembling opcodes
per-generation, but there's very little meaningful overlap so it's
probably not needed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:00:10 -08:00
Matt Turner
238877207e ralloc: Set *start in ralloc_vasprintf_rewrite_tail() if str is NULL.
We were leaving it undefined, even though we were writing a string to
*str.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-12 11:00:10 -08:00
Matt Turner
903050694b i965: Consolidate is_3src() functions.
Otherwise I'll have to add another later in this series.
2015-11-12 11:00:10 -08:00
Brian Paul
3e74038280 st/wgl: add a comment about recursive locking in stw_make_current()
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
f45b644e11 st/wgl: add a lock assertion in stw_framebuffer_from_hwnd_locked()
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
José Fonseca
a1c9feafd5 st/wgl: add some mutex checking code
This would have caught the locking bug that was fixed in the earlier
"st/wgl: fix locking issue in stw_st_framebuffer_present_locked()"
patch.

v2: minor coding style changes by Brian.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
166769fe4b st/wgl: rename stw_framebuffer_release() to stw_framebuffer_unlock()
To match the new stw_framebuffer_lock() function.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
dabc423ed0 st/wgl: reimplement stw_framebuffer::mutex with CRITICAL_SECTION
v2: update comments on the stw_framebuffer::mutex field regarding locking
order.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
f71508ae79 st/wgl: include u_debug.h
To get declaration for debug_printf() directly instead of getting it
indirectly through os_thread.h

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
fce68832c5 st/wgl: reimplement stw_device::fb_mutex with CRITICAL_SECTION
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:25 -07:00
Brian Paul
fa30de7643 st/wgl: re-implement stw_device::ctx_mutex with CRITICAL_SECTION
This is Windows-only code so we can use the native Win32 functions for
critical sections.  This will also allow us to (cleanly) add some mutex
check/debug code in subsequent patches.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-12 11:21:24 -07:00
Brian Paul
a02385cd69 gallium/hud: add cpu graph support for Windows
We support "cpu" but not "cpu#" because there's no good way of querying
per-cpu usage.  Also, the cpu usage is for the process, not the whole
system.

Original code cobbled together by Brian and then fixed/polished by Jose.

Signed-off-by: Brian Paul <brianp@vmware.com>
2015-11-12 09:11:15 -07:00
Tapani Pälli
f2fe607261 glsl: set matrix_stride for non matrices with atomic counter buffers
Patch sets matrix_stride as 0 for non matrix uniforms that are in a
atomic counter buffer. Matrix stride calculation for actual matrix
uniforms is done during link_assign_uniform_locations.

From ARB_program_interface_query specification:

GL_MATRIX_STRIDE:

   "For active variables not declared as a matrix or array of matrices,
   zero is written to <params>.  For active variables not backed by a
   buffer object, -1 is written to <params>, regardless of the variable
   type."

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-11-12 14:15:29 +02:00
Tapani Pälli
7e6dac1186 mesa: validate precision of varyings during ValidateProgramPipeline
Fixes following failing ES3.1 CTS tests:

   ES31-CTS.sepshaderobjs.InterfacePrecisionMatchingFloat
   ES31-CTS.sepshaderobjs.InterfacePrecisionMatchingInt
   ES31-CTS.sepshaderobjs.InterfacePrecisionMatchingUInt

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-12 09:50:14 +02:00
Tapani Pälli
5bd122cad9 glsl: do not lose precision information when packing varyings
This information will be used by cross stage validation of varyings
for pipeline objects.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-12 09:50:14 +02:00
Iago Toral Quiroga
f84bc57d7d glsl: Add precision information to ir_variable
We will need this later on when we implement proper support for
precision qualifiers in the drivers and also to do link time checks for
uniforms as indicated by the spec.

This patch also adds compile-time checks for variables without precision
information (currently, Mesa only checks that a default precision is set
for floats in fragment shaders).

As indicated by Ian, the addition of the precision information to
ir_variable has been done using a bitfield and pahole to identify an
available hole so that memory requirements for ir_variable stay the
same.

v2 (Ian):
  - Avoid if-ladders by defining arrays of supported sampler names and
    indexing
    into them with type->sampler_array + 2 * type->sampler_shadow
  - Make the code that selects the precision qualifier to use an utility
    function
  - Fix a typo

v3 (Tapani):
  - rebased
  - squashed in "Precision qualifiers are not allowed on structs"
  - fixed select_gles_precision for sampler arrays
  - fixed precision_qualifier_allowed for arrays of structs

v4 (Tapani):
  - add atomic_uint handling
  - do not allow precision qualifier on images
  (issues reported by Marta)

v5 (Tapani):
  - support precision qualifier on image types

v6 (Tapani):
  - set precision qualifier on interface block members

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-12 09:50:13 +02:00
Iago Toral Quiroga
9a00e1a69d glsl: Move the definition of precision_qualifier_allowed
We will need this to build later patches

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-12 09:50:13 +02:00
Iago Toral Quiroga
e6629d814f glsl: Add user-defined default precision qualifiers to the symbol table
Notice that the spec requires that a default precision has been set for every
type used by a shader that can use a precision qualifier and does not have a
predefined precision, however, at the moment, Mesa only checks this for floats
in the fragment shader. This is probably because the GLSL ES 1.0 specs mentions
this case specifically, but GLSL ES 3.0 clarifies that the same applies to
other types:

"The fragment language has no default precision qualifier for floating point
 types. Hence for float, floating point vector and matrix variable
 declarations, either the declaration must include a precision qualifier or
 the default float precision must have been previously declared. Similarly,
 there is no default precision qualifier for the following sampler types in
 either the vertex or fragment language:

 sampler3D;
 samplerCubeShadow;
 sampler2DShadow;
 sampler2DArray;
 sampler2DArrayShadow;
 isampler2D;
 isampler3D;
 isamplerCube;
 isampler2DArray;
 usampler2D;
 usampler3D;
 usamplerCube;
 usampler2DArray;"

we will fix this in a later patch.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-12 09:50:13 +02:00
Iago Toral Quiroga
e3082fb273 glsl: Add default precision qualifiers to the symbol table
The GLSL ES spec specifies default precision qualifiers for certain types,
so populate the symbol table with these.

Notice that the desktop GLSL spec also indicates defaults for some types
but this is not really useful since precision qualifiers are completely
ignored in desktop GLSL.

v2: simplify and add samplerExternalOES, specified by
    OES_EGL_image_external (Tapani)

v3: add atomic_uint (reported missing by Marta)

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-12 09:50:13 +02:00
Iago Toral Quiroga
d6a6167354 glsl: Add API to put default precision qualifiers in the symbol table
These have scoping rules that match the ones defined for other things such
as variables, so we want them in the symbol table.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-12 09:50:13 +02:00
Samuel Iglesias Gonsálvez
d4fdb84f80 i965/fs/nir: fix the number of register written by FS_OPCODE_GET_BUFFER_SIZE
FS_OPCODE_GET_BUFFER_SIZE is calculated with a resinfo's sampler message.

This patch adjusts the number of registers written by the opcode
following what the PRM spec says about the number of registers written
by the SIMD8 and SIMD16's writeback messages for sampler messages.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-12 08:39:14 +01:00
Ben Widawsky
55314c5be4 i965/skl/gt4: Fix URB programming restriction.
The comment in the code details the restriction. Thanks to Ken for having a very
helpful conversation with me, and spotting the blurb in the link I sent him :P.

There are still stability problems for me on GT4, but this definitely helps with
some of the failures.

v2: Comment fixes

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-11 18:13:19 -08:00
Ilia Mirkin
c4182bb9b0 nv50,nvc0: add ARB_clear_texture support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-11 19:20:41 -05:00
Ilia Mirkin
ae39b0fda8 st/mesa: implement ARB_clear_texture
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-11 19:20:41 -05:00
Ilia Mirkin
3695b253f9 gallium: add PIPE_CAP_CLEAR_TEXTURE and clear_texture prototype
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-11 19:20:41 -05:00
Timothy Arceri
725fcdfbb1 glsl: add helper to check for enhanced layouts support
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-12 10:18:14 +11:00
Timothy Arceri
82e4f22d1e mesa: add ARB_enhanced_layouts
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
2015-11-12 10:18:08 +11:00
Dave Airlie
df8af7d751 r600: initialised PGM_RESOURCES_2 for ES/GS
This fixes the corruption on rendering that we are seeing in
certain geometry shaders.

Fixes:  https://bugs.freedesktop.org/show_bug.cgi?id=91780
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested / Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-12 09:03:13 +10:00
Kenneth Graunke
918bda23dd i965: Split nir_emit_intrinsic by stage with a general fallback.
Many intrinsics only apply to a particular stage (such as discard).
In other cases, we may want to interpret them differently based on
the stage (such as load_primitive_id or load_input).

The current method isn't that pretty - we handle all intrinsics in
one giant function.  Sometimes we assert on stage, sometimes we forget.
Different behaviors are handled via if-ladders based on stage.

This commit introduces new nir_emit_<stage>_intrinsic() functions,
and makes nir_emit_instr() call those.  In turn, those fall back to
the generic nir_emit_intrinsic() function for cases they don't want
to handle specially.

This makes it clear which intrinsics only exist in one stage, and makes
it easy to handle inputs/outputs differently for various stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-11 11:57:37 -08:00
Ilia Mirkin
912babba7b mesa/copyimage: allow width/height to not be multiples of block
For compressed textures, the image size is not necessarily a multiple of
the block size (e.g. the last mip levels). Section 18.3.2 (Copying
Between Images) of the OpenGL 4.5 Core Profile spec says:

    An INVALID_VALUE error is generated if the dimensions of either
    subregion exceeds the boundaries of the corresponding image
    object, or if the image format is compressed and the dimensions of
    the subregion fail to meet the alignment constraints of the
    format.

and Section 8.7 (Compressed Texture Images) says:

    An INVALID_OPERATION error is generated if any of the following
    conditions occurs:

      * width is not a multiple of four, and width + xoffset is not
        equal to the value of TEXTURE_WIDTH.
      * height is not a multiple of four, and height + yoffset is not
        equal to the value of TEXTURE_HEIGHT.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92860
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-11 14:37:55 -05:00
Jason Ekstrand
80890eb0d3 i965/brw_reg: Add a brw_VxH_indirect helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-11 10:52:30 -08:00
Brian Paul
68993f77cd mesa: remove old comments in arrayobj.c 2015-11-11 09:38:22 -07:00
Brian Paul
9870a5c6c9 st/wgl: clarify code in stw_framebuffer_from_hwnd_locked()
Just a minor code change to make it obvious that NULL is returned when
we don't find the given HWND.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-11 09:38:22 -07:00
Brian Paul
004ed6f4a9 st/wgl: improve some function comments
In particular, explain when stw_framebuffer objects are
locked/unlocked/etc.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-11 09:38:22 -07:00
Brian Paul
b93cb6c1dc st/wgl: whitespace/formatting fixes 2015-11-11 09:38:22 -07:00
Brian Paul
eb812921ac st/wgl: fix locking issue in stw_st_framebuffer_present_locked()
When stw_st_framebuffer_present_locked() is called, the
stw_framebuffer's mutex will already be locked.  Normally, the
stw_framebuffer_present_locked() function calls
stw_framebuffer_release() to unlock the mutex when it's done.  But if
for some reason the 'resource' pointer in
stw_st_framebuffer_present_locked() is null, we'd return without
unlocking the stw_framebuffer.  This fixes that to avoid potential
deadlocks.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-11 09:38:22 -07:00
Kenneth Graunke
e42a29531a i965: Print force_writemask_all in dump_instructions().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-11 08:35:15 -08:00
Kenneth Graunke
ecb5e0a986 i965: Combine BRW_NEW_*_BINDING_TABLE dirty bits.
A while back, we moved to directly emitting the Gen7+ state when
constructing the binding tables.  These flags are only used on
Gen4-6, which emit all the binding table pointers at once.

We gain nothing by having separate flags, so combine them.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-11-11 08:33:58 -08:00
Kenneth Graunke
a2987ff57f i965: Map GL_PATCHES to 3DPRIM_PATCHLIST_n.
Inspired by a patch by Fabian Bieler.

Fabian defined a _3DPRIM_PATCHLIST_0 macro (which isn't actually a valid
topology type); I instead chose to make a macro that takes an argument.
He also took the number of patch vertices from _mesa_prim (which was set
to ctx->TessCtrlProgram.patch_vertices) - I chose to use it directly to
avoid the need for the VBO patch.

v2: Change macro to 0x20 + (n - 1) instead of 0x1F + n to better match
    the documentation (suggested by Ian).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-11 08:33:48 -08:00
Emil Velikov
cbb7d90e57 docs: add news item and link release notes for 11.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-11 11:18:32 +00:00
Emil Velikov
6435d8ac5a docs: add sha256 checksums for 11.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 66c949d0a1)
2015-11-11 11:16:43 +00:00
Emil Velikov
07948b03fb docs: add release notes for 11.0.5
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ee57c22141)
2015-11-11 11:16:42 +00:00
Glenn Kennard
3f45d29fe4 r600g: Pass conservative depth parameters to hw
Supported on R700 and up.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-11 09:06:25 +10:00
Dave Airlie
b3e793f2db Revert "r600g: Pass conservative depth parameters to hw"
This reverts commit a1fc78911e.

I pushed the wrong patch.
2015-11-11 09:05:50 +10:00
Glenn Kennard
c878d61124 r600g: Implement ARB_texture_view
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-11 08:36:08 +10:00
Glenn Kennard
a1fc78911e r600g: Pass conservative depth parameters to hw
Supported on R700 and up.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-11 08:32:35 +10:00
Eduardo Lima Mitev
de51676b41 i965/nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const
When both fadd and fmul instructions have at least one operand that is a
constant and it is only used once, the total number of instructions can
be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because
the constants will be progagated as immediate operands of fmul and fadd.

This patch detects these situations and prevents fusing fmul+fadd into ffma.

Shader-db results on i965 Haswell:

total instructions in shared programs: 6235835 -> 6225895 (-0.16%)
instructions in affected programs:     1124094 -> 1114154 (-0.88%)
total loops in shared programs:        1979 -> 1979 (0.00%)
helped:                                7612
HURT:                                  843
GAINED:                                4
LOST:                                  0

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-10 21:13:35 +01:00
Eduardo Lima Mitev
fb3b5669ce util: Add list_is_singular() helper function
Returns whether the list has exactly one element.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-10 21:13:35 +01:00
Eduardo Lima Mitev
94ff35204d nir/nir_opt_peephole_ffma: Move this lowering pass to the i965 driver
Because the next patch will add an optimization that is specific to i965,
we want to move this loweing pass to that driver altogether.

This is safe because i965 is the only consumer.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-10 21:13:35 +01:00
Kristian Høgsberg Kristensen
96b22fb080 glsl: Use array deref for access to vector components
We've assumed that we could lower per-component vector access from

  vec[i] = scalar

to

  vec = ir_triop_vector_insert(vec, scalar, i)

but with SSBOs (and compute shader SLM and tesselation outputs) this is
no longer valid. If a vector is "externally visible", multiple threads
can write independent components simultaneously. With lowering to
ir_triop_vector_insert, each thread read the entire vector, changes one
component, then writes out the entire vector. This is racy.

Instead of generating a ir_binop_vector_extract when we see v[i], we
generate ir_dereference_array. We then add a lowering pass to lower the
ir_dereference_array to ir_binop_vector_extract for rvalues and for to
vector_insert for lvalues in a separate lowering pass.

The resulting IR is the same as before, but we now have a window between
ast->ir conversion and the lowering pass where v[i] appears in the IR as
an array deref. This lets us run lowering passes that lower the vector
access to I/O (eg for SSBO load/store) before we lower the per-component
access to full vector writes.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-11-10 12:02:46 -08:00
Kristian Høgsberg Kristensen
60dd5287ff glsl: Lower UBO and SSBO access in glsl linker
All GLSL IR consumers run this lowering pass so we can move it to the
linker. This moves the pass up quite a bit, but that's the point: it
needs to run before we throw away information about per-component vector
access.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-11-10 12:02:46 -08:00
Kristian Høgsberg Kristensen
f0e95c2500 glsl: Drop exec_list argument to lower_ubo_reference
We always pass in shader->ir and we already pass in the shader, so just
drop the exec_list. Most passes either take just a exec_list or a
shader, so this seems more consistent.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-11-10 12:02:46 -08:00
Connor Abbott
213f86416f nir/glsl: switch to using the builder
v2: use nir_bulder_cf_insert (Ken)

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-10 13:56:43 -05:00
Connor Abbott
fbbfb7c025 nir/glsl: make emit() take nir_ssa_def * sources
Again, this matches what the builder will have to do.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-10 13:56:35 -05:00
Connor Abbott
a60e990dd2 nir/glsl: convert nir_visitor::result to a nir_ssa_def *
Its only user now returns a nir_ssa_def *, and we'll need this since the
builder returns a nir_ssa_def *.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-10 13:55:54 -05:00
Connor Abbott
30fe8eaa8e nir/glsl: make evaluate_rvalue() return a nir_ssa_def *
A long time ago, before NIR was even merged to master, glsl_to_nir used
registers and these sources were actually register sources. But nowadays
everything in glsl_to_nir is an SSA value, so stop pretending that by
evaluating an rvalue we can get an arbitrary nir_src. Most importantly,
we need this since the builder takes nir_ssa_def * sources directly.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-10 13:55:14 -05:00
Jose Fonseca
6f42162329 st/mesa: Destroy buffer object's mutex.
Ideally we should have a _mesa_cleanup_buffer_object function in
src/mesa/bufferobj.c so that the destruction logic resided in a single
place.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-11-10 11:04:28 +00:00
Kenneth Graunke
db54673b54 nir: Store PatchInputsRead and PatchOutputsWritten in nir_shader_info.
These tessellation shader related fields need plumbing through NIR.

v2: Use uint32_t instead of uint64_t to match the source type of
    GLbitfield (caught by Iago Toral).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-10 01:03:43 -08:00
Eric Anholt
437d7b6119 vc4: Avoid loading undefined (newly-allocated) FBO contents.
Since X has undefined contents in new pixmaps, it will allocate new
textures for an FBO and draw to them without an explicit clear.  For
VC4, it's much faster to emit a clear than the load of the actual
undefined memory contents, so just do that instead.
2015-11-09 19:17:36 -08:00
Eric Anholt
5980389bbf vc4: Return NULL when we can't make our shadow for a sampler view.
I'm not sure what the caller does is appropriate (just have a NULL sampler
at this slot), but it fixes the immediate crash.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-09 19:17:36 -08:00
Eric Anholt
eb8fb0064d vc4: Return GL_OUT_OF_MEMORY when buffer allocation fails.
I was afraid our callers weren't prepared for this, but it looks like
at least for resource creation, mesa/st throws an error appropriately.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-09 19:17:36 -08:00
Eric Anholt
84608e07e7 vc4: Add CL dumping for GL_ARRAY_PRIMITIVE. 2015-11-09 19:17:36 -08:00
Eric Anholt
855a3ca598 vc4: Fix a compiler warning. 2015-11-09 19:17:36 -08:00
Jordan Justen
fb3da129d1 glsl: Use shared storage variable type for shared variables
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-09 17:21:24 -08:00
Jordan Justen
32746fc9b4 glsl: Add shared variable type
Shared variables are stored in a common pool accessible by all threads
in a compute shader local work group.

These variables are similar to OpenCL's local/__local variables.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-09 17:21:24 -08:00
Jordan Justen
c0ac4740a7 glsl: Add space to shader_storage in print_visitor
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-09 17:21:17 -08:00
Jordan Justen
007d96730e glsl: Align comments on variables types
v2:
 * Split from patch to add ir_var_shader_shared (tarceri)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-09 17:21:17 -08:00
Jordan Justen
8b28b35531 glsl: Parse shared keyword for compute shader variables
v2:
 * Move shared parsing under storage qualifiers (tarceri)
 * Fail to compile if shared is used in non-compute shader (tarceri)
 * Use separate shared_storage bit for shared variables (tarceri)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-09 17:21:12 -08:00
Timothy Arceri
a4a46fe3fa glsl: simplify interface block stream qualifier validation
Qualifiers on member variables are redundent all we need to do
if check if it matches the stream associated with the block and
throw an error if its not.

Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-10 12:02:30 +11:00
Ilia Mirkin
3ea3727998 docs: note that ARB_copy_image was added to nv50, nvc0 in this release
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-09 07:14:07 -05:00
Brian Paul
28f6faca51 st/wgl: add null pointer check for HUD texture
Fixes crash when using HUD with Nobel Clinician Viewer.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-11-09 11:25:59 +00:00
Brian Paul
75d1e363ff st/wgl: fix double-present on swapbuffers bug
The stw_st_framebuffer_present_locked() function was getting called
twice per SwapBuffers.  First, when st_context_iface::flush() was
called from DrvSwapBuffers() because the ST_FLUSH_FRONT flag was
given.  Second, by stw_st_swap_framebuffer_locked() which does the
actual SwapBuffers.

Two code changes:
1. Pass ST_FLUSH_END_OF_FRAME, instead of ST_FLUSH_FRONT.
2. Move the implementation of stw_flush_current_locked() into
DrvSwapBuffers() since it's not called anywhere else.

Not much change in perf for benchmarks like Lightsmark, but some simple
Mesa demos are measurably faster.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-09 11:25:59 +00:00
Brian Paul
8083943e2e st/wgl: reorder pixel formats to put MSAA formats last
And put 8-bit/channel formats before 5/6/5 formats.

The ChoosePixelFormat() function seems to be finicky about format
selection.  Putting the MSAA formats after the non-MSAA formats
means most apps get a low-numbered format.  Now we generally get
the same pixel format regardless of whether using vgpu9 or 10.

VMware bug 1455030

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-09 11:25:59 +00:00
José Fonseca
e524df5ef3 st/wgl: Don't rely on GDI to bookkeep pixelformat for us.
This allows to use apitrace's retracediff script on Windows to retrace and
compare two builds of a Mesa based opengl32.dll/ICD side-by-side.

See also e4a4f15f5b
2015-11-09 11:08:27 +00:00
Michel Dänzer
24abbaff9a winsys/radeon: Use CPU page size instead of hardcoding 4096 bytes v3
Fixes GPUVM conflicts with non-4K page size.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92738

v2: Replace sanitization of VM base address alignment with comment why
    that's not necessary.
v3: Use unsigned instead of long as the type for the size_align member.
    (Marek)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-09 17:24:32 +09:00
Christian König
df4f9b0236 radeon/uvd: add H.265/HEVC to legal notes
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-08 18:16:01 -05:00
Leo Liu
519502d08f st/omx: add headless support
This will allow dec/enc/transcode without X

v2:  use env override even with X,
     use loader_open_device instead of open
v3:  clean up

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-08 18:15:57 -05:00
Leo Liu
25526d77b1 st/va: use vl screen drm support from vl_wys_drm
v2: move the dup to vl_wys_drm for pipe loader

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-08 18:15:57 -05:00
Leo Liu
7da86e0ec0 vl: add drm support for vl_screen
This will allow the state trackers to use render nodes
with screen creation

v2: dup fd for pipe loader

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-08 18:15:57 -05:00
Leo Liu
d115e47099 st/va: fix build fails with pipe loader
There is no dev in drv, and dev should be from vl_screen here

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-11-08 18:15:57 -05:00
Samuel Pitoiset
ffb60e7788 nvc0: enable compute support on Fermi
Altough the compute support is still not complete because textures and
surfaces need to be implemented, it allows to launch very simple compute
kernel like one which reads reading MP performance counters.

This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-08 16:47:59 +01:00
Ilia Mirkin
e06238cb9e nv50/ir: fix emission of s[] args in certain situations
There might only be a single arg (e.g. cvt), so use mode rather than
looking at the source directly. Also we don't want to rely on the type
of the value, which can be unreliable, but instead use the
instruction's. This works out well since mkSplit doesn't adjust the
type.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-07 18:58:58 -05:00
Ilia Mirkin
af218217d7 nv50/ir: only take abs value when computing high result
Not reachable from TGSI since it only has UMUL, no IMUL. However it's
surprising that setting argument types to s32 will cause sign to get
lost.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-07 18:58:58 -05:00
Ilia Mirkin
53cbb11707 nouveau: avoid queueing too much work onto a single fence
Force the fence to get kicked off, which won't actually wait for its
completion, but any additional work will be put onto a fresh list.

This fixes crashes in teximage-colors --benchmark with too many active
maps.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-07 18:58:58 -05:00
Dave Airlie
0f5b1409fd llvmpipe: disable front updates for now
As pointed out by Emil, this sometimes hangs, appears to be due to threading

need to rethink how this stuff works for llvmpipe.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-08 07:55:17 +10:00
Dave Airlie
87711183ac virgl: wrap ret assignment with braces to do correct thing
Coverity reported that ret could only be 0 or 1, since it
was setting ret = fn() > 0, instead of doing (ret = fn()) > 0.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-08 06:27:02 +10:00
Jason Ekstrand
6c731d8566 nir: Add a nir_deref_tail helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-07 12:09:44 -08:00
Jason Ekstrand
7d90e570f3 nir/types: Add an is_vector_or_scalar helper
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-07 12:09:38 -08:00
Jason Ekstrand
d43e16b163 i965/fs: Use regs_read/written for post-RA scheduling in calculate_deps
Previously, we were assuming that everything read/wrote exactly 1 logical
GRF (1 in SIMD8 and 2 in SIMD16).  This isn't actually true.  In
particular, the PLN instruction reads 2 logical registers in one of the
components.  This commit changes post-RA scheduling to use regs_read and
regs_written instead so that we add enough dependencies.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92770
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-07 08:41:48 -08:00
Jason Ekstrand
c839174d55 nir/validate: Add better validation of load/store types
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-11-07 08:41:35 -08:00
Marek Olšák
d57ede92b7 radeonsi: add register definitions for Stoney
There are a few non-stoney changes too.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-07 10:22:13 +01:00
Marek Olšák
2658777f46 radeonsi: add workarounds for CP DMA to stay on the fast path
v2: set emit_scratch_reloc, add a NULL check

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:13 +01:00
Marek Olšák
fc0416ef5d radeonsi: unify CP DMA preparation logic
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:13 +01:00
Marek Olšák
89da3b4458 radeonsi: unify CP DMA code determining various flags
v2: don't call get_flush_flags twice per function

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:12 +01:00
Marek Olšák
c3e527f93d radeonsi: only enable write confirmation on the last CP DMA packet
This should improve performance for big copies that need to be split.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-07 10:22:12 +01:00
Ilia Mirkin
8e9ade7eb3 nv50/ir: allow emission of immediates in imul/imad ops
Nothing actually uses this yet (due to complications), but the emission
logic is right.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-07 00:42:15 -05:00
Ilia Mirkin
393d0c336b nv50/ir: properly set the type of the constant folding result
This removes the hack used for merge, which only covers a fraction of
the cases.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 19:39:32 -05:00
Ilia Mirkin
2f9aaed749 nv50/ir: add support for const-folding OP_CVT with F64 source/dest
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 19:39:32 -05:00
Ilia Mirkin
76957389fc nv50/ir: add fp64 opcode emission support for G200 (NVA0)
Need to emulate rcp/rsq before providing full fp64 support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:36:25 -05:00
Hans de Goede
f979d3cfec nv50/ir: Add support for 64bit immediates to checkSwapSrc01
Now that we support 64 bit immediates in insnCanLoad, we need to swap
64 bit immediate sources too for optimal effect.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Hans de Goede
9f2f8bda6e nvc0/ir: Teach insnCanLoad about double immediates
Teach insnCanLoad about double immediates, together with the
"Add support for merge-s to the ConstantFolding pass"

This turns the following (nvc0) code:
  1: mov u32 $r2 0x00000000 (8)
  2: mov u32 $r3 0x3fe00000 (8)
  3: add f64 $r0d $r0d $r2d (8)

Into:
  1: add f64 $r0d $r0d 0.500000 (8)

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Hans de Goede
428506ece2 nv50/ir: Add support for merge-s to the ConstantFolding pass
This allows later passes like LoadPropagation to properly deal with 64
bit immediates.

If the new 64 bit load this introduces does not get optimized away then
split64BitOpPostRA() will split this into 2 instructions again.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Ilia Mirkin
2437f00853 nv50/ir: disallow 64-bit immediates on nv50 targets
No instructions are able to load short immediates like nvc0 can.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Ilia Mirkin
11e3dac36e nv50/ir: allow movs with TYPE_F64 destinations to be split
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 18:13:31 -05:00
Hans de Goede
b487b55f7d gm107/ir: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision)
into the generated gm107 machine-code.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 17:22:40 -05:00
Hans de Goede
12c850d01c nvc0/ir: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision)
into the generated nvc0 machine-code.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 17:22:40 -05:00
Francisco Jerez
5169407221 i965/nir/fs: Add comment for no-op memory barrier functions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-11-06 13:19:56 -08:00
Jordan Justen
faa1193070 i965/nir/fs: Implement new barrier functions for compute shaders
For these nir intrinsics, we emit the same code as
nir_intrinsic_memory_barrier:

 * nir_intrinsic_memory_barrier_atomic_counter
 * nir_intrinsic_memory_barrier_buffer
 * nir_intrinsic_memory_barrier_image

We treat these nir intrinsics as no-ops:
 * nir_intrinsic_group_memory_barrier
 * nir_intrinsic_memory_barrier_shared

v3:
 * Add comment for no-op cases (curro)

v4:
 * Moving comment to a separate patch authored by curro

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-06 13:16:11 -08:00
Jordan Justen
9d65f3208b nir: Add new barrier functions for compute shaders
When these functions are called in glsl-ir, we create a corresponding
nir intrinsic function call.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-06 13:15:16 -08:00
Jordan Justen
91f188710a glsl: Add new barrier functions for compute shaders
When these functions are called in GLSL code, we create an intrinsic
function call:

 * groupMemoryBarrier => __intrinsic_group_memory_barrier
 * memoryBarrierAtomicCounter => __intrinsic_memory_barrier_atomic_counter
 * memoryBarrierBuffer => __intrinsic_memory_barrier_buffer
 * memoryBarrierImage => __intrinsic_memory_barrier_image
 * memoryBarrierShared => __intrinsic_memory_barrier_shared

v2:
 * Consolidate with memoryBarrier function/intrinsic creation (curro)

v3:
 * Instead of add_memory_barrier_function, add an intrinsic_name
   parameter to _memory_barrier (curro)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-06 13:14:44 -08:00
Boyuan Zhang
6bad554d98 radeon/uvd: fix VC-1 simple/main profile decode v2
We just needed to set the extra width/height fields to get this working.

v2 (chk): rebased, CC stable added, commit message added, fixed coding style

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-11-06 20:07:23 +01:00
Boyuan Zhang
ed55def44f st/vaapi: fix vaapi VC-1 simple/main corruption v2
Apply the start code fix only to advanced profile.

v2 (chk): add commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-11-06 20:07:23 +01:00
Julien Isorce
cc1e5c972e st/va: add support for RGBX and BGRX in VPP
Before it was only possible to convert a NV12 surface to
RGBA or BGRA. This patch uses the same post processing
function, "handleVAProcPipelineParameterBufferType", but
add definitions for RGBX and BGRX.

This patch also makes vlVaQuerySurfaceAttributes more generic
to avoid copy and pasting the same lines.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:45 +00:00
Julien Isorce
42a5e143a8 vl/buffers: add RGBX and BGRX to the supported formats
Useful is one wants to create RGBX or BGRX surfaces.
The infrastructure is such that it required just a
few definitions to support these formats.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:38 +00:00
Julien Isorce
bf6acbb2db st/va: properly use brackets in vlVaAcquireBufferHandle's switch
In "switch (mem_type)" the brackets were surrounding "case+default"
instead of "case" only.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:16 +00:00
Julien Isorce
bfc245e9ac st/va: properly indent buffer.c, config.c, image.c and picture.c
Some lines were using 4 indentation spaces instead of 3.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-06 17:33:01 +00:00
Rob Clark
6459e780ae freedreno/a4xx: fix blend color
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:19:04 -05:00
Rob Clark
7465e16124 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:18:47 -05:00
Guillaume Charifi
6f5e0c08a4 freedreno: add a305 support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:17:58 -05:00
Boyan Ding
8f55ebe802 freedreno/ir3: Use nir_foreach_variable
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-06 11:17:53 -05:00
Rob Clark
99597d033a nir: some small cleanups
The various cf nodes all get allocated w/ shader as their ralloc_parent,
so lets make this more explicit.  Plus couple other corrections/
clarifications.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-06 11:15:41 -05:00
Ilia Mirkin
d68226087c nvc0: reintroduce BGRA4 format support
Commit 342e68dc60 (nvc0: remove BGRA4 format support) removed the
support to fix a WoW trace. However after further experimentation, I was
able to get the blit to work by using a different "fake" format in the
2d engine.

The reason why this worked on nv50 is that nv50 falls back to the 3d
blit path in case either the src or the dst aren't "faithfully"
supported, while nvc0 only does it for the dst format. RG8 is better
supported by the nvc0 2d engine than R16.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-06 00:47:44 -05:00
Brian Paul
581111c4d6 mesa: report enum name in glClientActiveTexture() error string
As we do for glActiveTexture().  Trivial.
2015-11-05 20:12:33 -07:00
Julien Isorce
497bde6727 st/va: fix memory leak on error in vlVaCreateSurfaces2
Found by coverity: CID #1337953

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-05 23:39:45 +00:00
Julien Isorce
e0b896c86c st/va: indent vlVaQuerySurfaceAttributes and vlVaCreateSurfaces2
Some lines were using 4 indentation spaces instead of 3.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-11-05 23:39:43 +00:00
Kenneth Graunke
8dcf807cb4 i965: Fix scalar VS float[] and vec2[] output arrays.
The scalar VS backend has never handled float[] and vec2[] outputs
correctly (my original code was broken).  Outputs need to be padded
out to vec4 slots.

In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot
by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4.  However,
this is wrong: type_size_scalar() for a float[2] would return 2, or
for vec2[2] it would return 4.  This looked like a single slot, even
though in reality each array element would be stored in separate vec4
slots.

Because of this bug, outputs[] and output_components[] would not get
initialized for the second element's VARYING_SLOT, which meant
emit_urb_writes() would skip writing them.  Nothing used those values,
and dead code elimination threw a party.

To fix this, we introduce a new type_size_vec4_times_4() function which
pads array elements correctly, but still counts in scalar components,
generating correct indices in store_output intrinsics.

Normally, varying packing avoids this problem by turning varyings into
vec4s.  So this doesn't actually fix any Piglit or dEQP tests today.
However, if varying packing is disabled, things would be broken.
Tessellation shaders can't use varying packing, so this fixes various
tcs-input Piglit tests on a branch of mine.

v2: Shorten the implementation of type_size_4x to a single line (caught
    by Connor Abbott), and rename it to type_size_vec4_times_4()
    (renaming suggested by Jason Ekstrand).  Use type_size_vec4
    rather than using type_size_vec4_times_4 and then dividing by 4.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 15:26:07 -08:00
Roland Scheidegger
5ae37ae615 llvmpipe: disable texture cache
There are some weird problems with 8-wide vectors.
2015-11-05 18:00:42 +01:00
Ilia Mirkin
ba093a099a nouveau: send back a debug message when waiting for a fence to complete
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin
4f6cd5fad0 nv50,nvc0: provide debug messages with shader compilation stats
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin
4335b28840 nouveau: add support for sending debug messages via KHR_debug
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-05 11:22:19 -05:00
Ilia Mirkin
6706cc1671 st/clover: provide a path for drivers to call through to pfn_notify
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

[ Francisco Jerez: Clean up clover::context interface by passing
  around a function object. ]
2015-11-05 11:22:19 -05:00
Ilia Mirkin
c93c9d220b st/mesa: set debug callback for debug contexts
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-11-05 11:22:19 -05:00
Ilia Mirkin
fc76cc05e3 gallium: expose a debug message callback settable by context owner
This will allow gallium drivers to send messages to KHR_debug endpoints

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-05 11:22:18 -05:00
Ilia Mirkin
e587590a83 st/mesa: account for texture views when doing CopyImageSubData
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-05 11:22:18 -05:00
Iago Toral Quiroga
eea3c907cc i965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE
Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga
eca4c43a33 i965/vec4: Do not mark used surfaces in VS_OPCODE_GET_BUFFER_SIZE
Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga
6105d1d0a0 i965/vec4: Do not mark used direct surfaces in VS_OPCODE_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const, do not add unnecessary temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga
d7013988fb i965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Iago Toral Quiroga
027b64a55a i965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const and remove useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-05 16:11:52 +01:00
Neil Roberts
6c5f371a27 i965/skl+: Enable support for 16x multisampling
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:21 +01:00
Neil Roberts
aa3f9aaf31 mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit
Previously there was a problem in i965 where if 16x MSAA is used then
some of the sample positions are exactly on the 0 x or y axis. When
the MSAA copy blit shader interpolates the texture coordinates at
these sample positions it was possible that it would jump to a
neighboring texel due to rounding errors. It is likely that these
positions would be used on 16x MSAA because that is where they are
defined to be in D3D.

To fix that this patch makes it use interpolateAtOffset in the blit
shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
is available. This forces it to interpolate the texture coordinates at
the pixel center to avoid these problematic positions.

This fixes ext_framebuffer_multisample-unaligned-blit and
ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on
SKL+.

v2: Use interpolateAtOffset instead of interpolateAtSample
v3: Always try to enable GL_ARB_gpu_shader5 in the shader
    [Ian Romanick]

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts
b080b3d54d meta/blit: Always try to enable GL_ARB_sample_shading
Previously this extension was only enabled when blitting between two
multisampled buffers. However I don't think it does any harm to just
enable it all the time. The ‘enable’ option is used instead of
‘require’ so that the shader will still compile if the extension isn't
available in the cases where it isn't used. This will make the next
patch simpler because it wants to add another optional extension.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts
2dd76ec16e meta: Support 16x MSAA in the multisample scaled blit shader
v2: Fix the x_scale in the shader. Remove the doubts in the commit
    message.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts
1a22b12fc5 i965/meta: Support 16x MSAA in the meta stencil blit
The destination rectangle is now drawn at 4x4 the size and the shader
code to calculate the sample number is adjusted accordingly.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
a680465428 i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA
In order to accomodate 16x MSAA, the starting sample pair index is now
3 bits rather than 2 on SKL+.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-11-05 10:33:16 +01:00
Neil Roberts
bf6bd7eaf0 i965: Support allocating the MCS buffer for 16x MSAA
When 16 samples are used the MCS buffer needs 64 bits per pixel.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
b4c2e6054f i965: Support calculating the bits needed to set up 16x MSAA
The gen7_surface_msaa_bits function already returns the right values
for 16 samples but it just needs its assert to be relaxed.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
1a97cac767 i965/fs: Add a sampler program key for whether the texture is 16x MSAA
When 16x MSAA is used for sampling with texelFetch the compiler needs
to use a different instruction which passes more arguments for the MCS
data. Previously on skl+ it was unconditionally using this new
instruction. However since 16x MSAA is probably going to be pretty
rare, it is probably worthwhile to avoid using this instruction for
the other sample counts. In order to do that this patch adds a new
member to brw_sampler_prog_key_data to track when a sampler refers to
a buffer with 16 samples.

Note that this isn't done for the vec4 backend because it wouldn't
change how many registers it uses.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
4ef27745c8 i965/vec4/skl+: Use ld2dms_w instead of ld2dms
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data in the response
still fits in a single register so we just need to ensure we copy both
values rather than just the lower one.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
e386fb0dee i965/fs/skl+: Use ld2dms_w instead of ld2dms
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data retrieved from the
ld_mcs instruction already returns 4 or 8 registers and is documented
to return zeroes for the mcsh value when the sample count is less than
16.

v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when
    the message length would be too long in SIMD16.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:16 +01:00
Neil Roberts
20250e854e i965: Program 16x MSAA sample positions.
This is the standard pattern used by the other 3D graphics API.

BDW has slots for these values, but they aren't actually used until
SKL. Even though the documentation for BDW says they must be zero, it
doesn't seem to cause any harm to program them anyway.

The comment above for the 8x sample positions says that the hardware
implements centroid interpolation by picking the centre-most sample
that is inside the primitive. That implies that it might be worthwhile
to pick a pattern that includes 0.5,0.5. However by experimentation
this doesn't seem to actually be the case. With the sample positions
in this patch, if I modify the piglit test below so that it instead
reports the centroid position, it reports 0.492188,0.421875 which
doesn't match any of the positions. If I modify the sample positions
so that they include one at exactly 0.5,0.5 it doesn't help and it
reports another position which is even further from the center for
some reason.

arb_gpu_shader5-interpolateAtSample-different

Kenneth Graunke experimented with some other patterns that have a
higher standard deviation but I think after some discussion it was
decided that it would be better to pick the same pattern as the other
graphics API in case there are games that rely on this pattern.

(Based on a patch by Kenneth Graunke)

Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben at bwidawsk.net>
2015-11-05 10:33:15 +01:00
Kenneth Graunke
5048da974e i965: Handle 16x MSAA in IMS dimension munging code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-11-05 10:33:15 +01:00
Kenneth Graunke
b9f8e729c8 nir: Rename nir_live_variables.c to nir_liveness.c.
It doesn't actually operate on variables.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 00:09:40 -08:00
Kenneth Graunke
5c6f21579d nir: Rename live_variables to live_ssa_defs.
This computes liveness of SSA values, not nir_variables.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-05 00:09:40 -08:00
Alejandro Piñeiro
56774e6302 i965/vec4: select predicate based on writemask for sel emissions
Equivalent to commit 8ac3b525c but with sel operations. In this case
we select the PredCtrl based on the writemask.

This patch helps on cases like this:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

In this case, cmod propagation can't optimize instruction #2, because
instructions #1 and #2 have different writemasks, and we can't update
directly instruction #2 writemask because our code thinks that sel at
instruction #3 reads all four channels of the flag, when it actually
only reads .x.

So, with this patch, the previous case becames this:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

Now only the x channel of the flag is used, allowing dead code
eliminate to update the writemask at the second instruction:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D
 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

So now cmod propagation can simplify out #2:
 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F
 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

Shader-db numbers:
total instructions in shared programs: 6235835 -> 6228008 (-0.13%)
instructions in affected programs:     219850 -> 212023 (-3.56%)
total loops in shared programs:        1979 -> 1979 (0.00%)
helped:                                1192
HURT:                                  0
2015-11-05 08:57:23 +01:00
Ilia Mirkin
bb73fc4cb8 nouveau: relax fence emit space assert
We also have the "reserved for kick" space available. Some of my earlier
changes can probably be removed, but this is a quick fix for some of the
rarer fallout.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2015-11-04 22:43:56 -05:00
Eric Anholt
6d3a24bce8 vc4: When the create ioctl fails, free our cache and try again.
This greatly increases the pressure you can put on the driver before
create fails.  Ultimately we need to let the kernel take control of
our cached BOs and just take them from us (and other clients)
directly, but this is a very easy patch for the moment.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-04 14:04:14 -08:00
Eric Anholt
3f7c96c36c vc4: Print the rounded shader size in debug output.
It's surprising to see "0kb" printed for debug on short shaders, while
4kb alignment won't be suprising.
2015-11-04 13:32:07 -08:00
Eric Anholt
4a951f1c08 vc4: Fix dumping the size of BOs allocated/cached.
60MB of cached BOs are a lot less scary than 600MB.
2015-11-04 13:32:07 -08:00
Ilia Mirkin
5bbd522452 mesa/tests: add glBufferStorageEXT to ES 3.1 dispatch list
I thought that aliased functions didn't need to be added, but that might
only be if the function aliases something in the same {desktop,ES}
space. Resolves the dispatch sanity test failure.

Fixes: 13b19aa81 (mesa: expose support for GL_EXT_buffer_storage)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92824
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-04 14:28:57 -05:00
Brian Paul
bdf6cef033 vbo: fix another GL_LINE_LOOP bug
Very long line loops which spanned 3 or more vertex buffers were not
handled correctly and could result in stray lines.

The piglit lineloop test draws 10000 vertices by default, and is not
long enough to trigger this.  Even 'lineloop -count 100000' doesn't
trigger the bug.

For future reference, the issue can be reproduced by changing Mesa's
VBO_VERT_BUFFER_SIZE to 4096 and changing the piglit lineloop test to
use glVertex2f(), draw 3 loops instead of 1, and specifying -count
1023.

Acked-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-04 11:51:59 -07:00
Brian Paul
d31481e70a svga: implement 'white_fragments' option for VGPU10 fragment shaders
When we emulate XOR logicop mode with blend-subtract, we need to ensure
that the fragment shader always emits white.  We had this implemented
for VGPU9, but not VGPU10.

VMware bug 1545492.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-04 11:51:41 -07:00
Brian Paul
149ac1fe43 u_vbuf: minor code reformatting / line wrapping
Trivial.
2015-11-04 11:51:41 -07:00
Brian Paul
e450d4371a u_vbuf: add some const qualifiers
Trivial.
2015-11-04 11:51:40 -07:00
Brian Paul
3f98c812b3 svga: use new enum indices_mode type
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-04 11:51:40 -07:00
Brian Paul
fa6efbd27d util/indices: replace #define tokens with enum type
To ease debugging in gdb.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-11-04 11:51:40 -07:00
Alejandro Piñeiro
c3d7caa1e0 i965: check inst->predicate when clearing flag_live at dead code eliminate
Detected by Matt Turner while reviewing commit
a59359ecd2

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-11-04 19:33:56 +01:00
Roland Scheidegger
c19443bc8b gallivm: fix sampling for s3tc srgb formats when using texture cache
This actually stored the values as 8bit linear values in the cache,
then did another srgb->linear conversion...
We don't want to do the former (decoding 8bit srgb values to 8bit linear
completely defeats the purpose of srgb in the first place), so just decode
to 8bit srgb.
Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.
2015-11-04 14:21:43 +01:00
Ben Widawsky
d56a1478a8 i965/meta: Assert fast clears and rep clears never overlap
There is nothing wrong with the code today, but as one modifies the code it
turns out to be not too difficult to mess up the code, and this easy assertion
should catch such driver implementation failures quickly.

Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-03 21:54:11 -08:00
Ryan Houdek
13b19aa815 mesa: expose support for GL_EXT_buffer_storage
This extension requires ES 3.1 since it relies on glMemoryBarrier.
For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0
function.
This has been tested with the piglit in the ML and the Dolphin emulator.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-04 00:01:03 -05:00
Timothy Arceri
8e4cf900f0 glsl: make sure to only add subroutines to resource list
Over looked in 763cd8c080.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-04 15:43:12 +11:00
Timothy Arceri
f6b3c163f9 glsl: remove old TODO
SSBO support now exists as of commits f24e5e and f408a13dd3.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2015-11-04 15:40:38 +11:00
Timothy Arceri
6e3b380387 docs: Mark AoA as done for i965
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-04 13:41:16 +11:00
Timothy Arceri
5b75dbd7be i965: enable ARB_arrays_of_arrays
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-11-04 13:39:08 +11:00
Timothy Arceri
fb77da89f5 i965: add support for image AoA
V3: clamp array index to the correct size (the size of the current array
rather than the inner array) Francisco Jerez.

V2: avoid useless zero-initialization and addition for the first AoA level,
avoid redundant temporary, make use of type_size_scalar(), rename aoa_size
to element_size, assign the indirect indexing temporary directly to
image.reladdr, and replace while loop with a for loop. All suggested
by Francisco Jerez.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-11-04 13:38:32 +11:00
Roland Scheidegger
9285ed98f7 llvmpipe: add cache for compressed textures
compressed textures are very slow because decoding is rather complex
(and because there's no jit code code to decode them too for non-technical
reasons).
Thus, add some texture cache which holds a couple of decoded blocks.
Right now this handles only s3tc format albeit it could be extended to work
with other formats rather trivially as long as the result of decode fits into
32bit per texel (ideally, rgtc actually would decode to more than 8 bits
per channel, but even then making it work for it shouldn't be too difficult).
This can improve performance noticeably but don't expect wonders (uncompressed
is unsurprisingly still faster). It's also possible it might be slower in
some cases (using nearest filtering for example or if there's otherwise not
many cache hits, the cache is only direct mapped which isn't great).
Also, actual decode of a block relies on util code, thus even though always
full blocks are decoded it is done texel by texel - this could obviously
benefit greatly from simd-optimized code decoding full blocks at once...
Note the cache is per (raster) thread, and currently only used for fragment
shaders.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-11-04 02:51:02 +01:00
Oded Gabbay
39b4dfe6ab llvmpipe: use simple coeffs calc for 128bit vectors
There are currently two methods in llvmpipe code to calculate coeffs to
be used as inputs for the fragment shader. The two methods use slightly
different ways to do the floating point calculations and thus produce
slightly different results.

The decision which method to use is determined by the size of the vector
that is used by the platform.

For vectors with size of more than 128bit, a single-step method is used,
in which coeffs_init_simple() + attribs_update_simple() are called.

For vectors with size of 128bit or less, a two-step method is used, in
which coeffs_init() + attribs_update() are called.

This causes some piglit tests (clip-distance-bulk-copy,
interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with
128bit vectors (such as ppc64le or x86-64 without AVX).

This patch makes platforms with 128bit vectors use the single-step
method (aka "simple" method) instead of the two-step method.
This would make the resulting coeffs identical between more platforms,
make sure the piglit tests passes, and make debugging and maintainability
a bit easier as the generated LLVM IR will be the same for more platforms.

The performance impact is negligible for x86-64 without AVX, and
basically non-existent for ppc64le, as it can be seen from the following
benchmarking results:

- glxspheres, on ppc64le:

   - original code:  4.892745317 frames/sec 5.460303857 Mpixels/sec
   - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
   - Additional 0.8% performance boost

- glxspheres, on x86-64 without AVX:

   - original code:  20.16418809 frames/sec 22.50323395 Mpixels/sec
   - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
   - Additional 0.74% performance boost

- glmark2, on ppc64le:

  - original code:  score of 58
  - with my change: score of 57

- glmark2, on x86-64 without AVX:

  - original code:  score of 175
  - with the patch: score of 167
  - Impact of of -4.5% on performance

- OpenArena, on ppc64le:

  - original code:  3398 frames 1719.0 seconds 2.0 fps
                    255.0/505.9/2773.0/0.0 ms

  - with the patch: 3398 frames 1690.4 seconds 2.0 fps
                    241.0/497.5/2563.0/0.2 ms

  - 29 seconds faster with the patch, which is about 2%

- OpenArena, on x86-64 without AVX:

  - original code:  3398 frames 239.6 seconds 14.2 fps
                    38.0/70.5/719.0/14.6 ms

  - with the patch: 3398 frames 244.4 seconds 13.9 fps
                    38.0/71.9/697.0/14.3 ms

  - 0.3 fps slower with the patch (about 2%)

Additional details can be found at:
http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-11-04 02:38:53 +01:00
Kenneth Graunke
59bbe2681b nir: Properly invalidate metadata in nir_opt_remove_phis().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
bc3942e297 nir: Properly invalidate metadata in nir_lower_vec_to_movs().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
0f037bd71f nir: Properly invalidate metadata in nir_opt_copy_prop().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
4cb7546066 nir: Properly invalidate metadata in nir_remove_dead_variables().
v2: Preserve live_variables too (Jason).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-11-03 17:06:48 -08:00
Kenneth Graunke
8bb44510fc nir: Properly invalidate metadata in nir_split_var_copies().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-03 17:06:48 -08:00
Kenneth Graunke
aea40091f0 nir: Properly invalidate metadata in nir_lower_global_vars_to_local().
v2: Preserve nir_metadata_live_variables as well (caught by Jason).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-11-03 17:06:48 -08:00
Jason Ekstrand
531be601d5 nir: Unexpose _impl versions of copy_prop and dce
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-03 17:06:48 -08:00
Jordan Justen
4bc16ad217 mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Iago Toral <itoral@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
2015-11-03 16:44:22 -08:00
Matt Turner
cf3121ed18 i965/vec4: Send from GRF in atomic operations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-03 16:38:36 -08:00
Marek Olšák
3b37155a68 gallium/radeon: allow returning SDMA fences from pipe->flush
pipe->flush never returned SDMA fences. This fixes it.
This is only an issue on amdgpu where fences can signal out of order.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-04 00:43:14 +01:00
Marek Olšák
7f9122c968 gallium/radeon: always return the last SDMA fence on SDMA flush if needed
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-04 00:43:14 +01:00
Kenneth Graunke
36fd653817 i965: Add scalar geometry shader support.
This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support
instanced geometry shaders, and Orbital Explorer's shader spills like
crazy.  But the infrastructure is in place, and it's largely working.

v2: Lots of rebasing.

v3: (feedback from Kristian Høgsberg)
- Handle stride and subreg_offset correctly for ATTRs; use a helper.
- Fix missing emit_shader_time_end() call.
- Delete dead code after early EOT in static vertex case to avoid
  tripping asserts in emit_shader_time_end().
- Use proper D/UD type in intexp2().
- Fix "EndPrimitve" and "to that" typos.
- Assert that invocations == 1 so we know this is missing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-03 15:08:49 -08:00
Kenneth Graunke
c9541a74e4 i965: Add scalar GS input lowering code.
We really ought to compute the VUE map at link time and stash it, rather
than recomputing it here, but with the mess of program structures I
wasn't sure where to put it.  We can improve that later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-03 15:08:49 -08:00
Kenneth Graunke
4861835d1c i965: Fix the fs_visitor GS constructor to take shader_time_index.
Jason reworked this so it isn't simply ST_GS anymore...it's either -1
(not enabled) or an actual offset.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-11-03 15:08:49 -08:00
Ben Widawsky
5d4b019d2a i965/gen8+: Extract color clear surface state
On future generation platforms the color clear value is stored elsewhere in the
surface state. By extracting this logic, we can cleanly implement the difference
in an upcoming patch.

Should have no functional impact.

v2: Move hunk from the next patch into this patch (Matt)
Whitespace fix (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-03 13:49:21 -08:00
Ben Widawsky
f3223ebd6c i965/gen8+: Remove redundant zeroing of surface state
The allocate_surface_state already zeroes out the surface state, and doing it
later in the function is destructive for what we want to accomplish when we
split out support for gen9 fast clears (next patch).

NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent
to remove the other instances as well. I can make an argument both ways (open
coding it, vs. not). I can rework the next patch if requires.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-11-03 13:49:21 -08:00
Samuel Pitoiset
e887407491 nvc0: add missing compute parameters required by clover
This fixes crashes with some piglit OpenCL tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-03 22:17:00 +01:00
Samuel Pitoiset
e640ba41ed nvc0: handle NULL pointer in nvc0_get_compute_param()
To get the size (in bytes) of a compute parameter, clover first calls
get_compute_param() with a NULL data pointer. The RET() macro is based
on nv50.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-03 22:16:45 +01:00
Ben Widawsky
dde33fc23c i965/skl: PCI ID cleanup and brand strings
A few new PCI ids are added here, and one is removed (0x190B) because it no
longer seems to exist anywhere.

v2-4:
Only use ascii characters (Ilia)
0x1921 is no longer marked as f

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-11-03 10:00:17 -08:00
Ben Widawsky
7cbd6608f5 i965/skl: Add GT4 PCI IDs
Like other gen8+ hardware, the hardware automatically scales up thread counts.
We must be careful about the URB sizes since GT4 adds another slice.

One of the existing PCI IDs is actually mislabeled as GT3. Arguably this is a
real bug since the URB size will be wrong. Because this patch is simply meant to
add the missing IDs, that will be fixed in a later patch.

v2: No longer relevant.

v3: Update the wm thread count to support GT4. The WM thread count is used to
determine the maximum scratch space required. Currently the code always
allocates the maximum amount even though lower GT SKUs require less. The formula
is threads_per_psd * subslices_per_slice * slices

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
2015-11-03 09:45:04 -08:00
Jordan Justen
55365a7ad5 mesa: Add spec citations for DispatchCompute*
Note: The OpenGL 4.3 - 4.5 specification language for DispatchCompute
appears to have an error regarding the max allowed values. When adding
the specification citation, we note why the code does not match the
specification language.

v2:
 * Updates based on review from Iago

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-11-02 15:25:37 -08:00
Jordan Justen
44c399f20a mesa: Update DispatchComputeIndirect errors for indirect parameter
There is some discrepancy between the return values for some error
cases for the DispatchComputeIndirect call in the ARB_compute_shader
specification. Regarding the indirect parameter, in one place the
extension spec lists that the error returned for invalid values should
be INVALID_OPERATION, while later it specifies INVALID_VALUE.

The OpenGL 4.3 and OpenGLES 3.1 specifications appear to be consistent
in requiring the INVALID_VALUE error return in this case.

Here we update the code to match the main specifications, and update
the citations use the main specification rather than the extension
specification.

v2:
 * Updates based on review from Iago

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-11-02 15:25:37 -08:00
Matt Turner
0b19f65195 i965/fs: Clean up FBH code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-02 09:33:31 -08:00
Matt Turner
c22d62f599 i965/vec4: Clean up FBH code.
It did a bunch of unnecessary stuff, emitting an extra MOV included.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-02 09:33:31 -08:00
Matt Turner
7c81a6a647 i965: Replace default case with list of enum values.
If we add a new file type, we'd like to get warnings if it's not
handled.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-11-02 09:33:31 -08:00
Matt Turner
d9b09f8a30 i965/vec4: Don't disable channels in any/all comparisons.
We've made a mistake in calling the Channel Enable bits "writemask",
because they do more than control which channels of the destination are
written -- they actually control which channels are enabled (surprise!
surprise!)

So, if we emit

               cmp.z.f0(8) null.xy<1>D  g10<4,4,1>.xyzzD g2<0,4,1>.xyzzD
               mov(8)      g12<1>.xUD   0x00000000UD
   (+f0.all4h) mov(8)      g12<1>.xUD   0xffffffffUD

where the CMP instruction has only .xy channel enables, it won't write
the .zw channels of the flag register, which are of course read by the
+f0.all4 predicate.

We need to always emit CMP instructions whose flag result might be read
by such a predicate with all channels enabled.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-11-02 09:33:31 -08:00
Tapani Pälli
f4466c856f mesa: fix uniforms calculation in glGetProgramiv
Since introduction of SSBO, UniformStorage contains not just uniforms
but also buffer variables, this needs to be taken in to account when
calculating active uniforms with GL_ACTIVE_UNIFORMS and
GL_ACTIVE_UNIFORM_MAX_LENGTH.

No Piglit regressions.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-11-02 11:22:10 +02:00
Tapani Pälli
efb333acb7 mesa: fix program resource queries for atomic counter buffers
gl_active_atomic_buffer contains index to UniformStorage, we need to
calculate resource index for that gl_uniform_storage.

Fixes following CTS tests:
   ES31-CTS.program_interface_query.atomic-counters
   ES31-CTS.program_interface_query.atomic-counters-one-buffer

No Piglit regressions.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-11-02 11:22:06 +02:00
Juha-Pekka Heikkila
c2c124f891 glsl: join calculate_array_size() and calculate_array_stride()
These helpers are ran for same case the same loop. Here joined
their operation so the loop is ran just once. Also fixed
out-of-memory condition here.

v2: Make the loop simpler to read as per Tapani's suggestion

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2015-11-02 10:03:32 +02:00
Ryan Houdek
af7c98a9c7 mesa: expose support for OES/EXT_draw_elements_base_vertex to OpenGL ES
This has been tested with the piglits in the mailing list and
on the Dolphin emulator.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-01 23:02:06 -05:00
Ilia Mirkin
985b51551a nouveau: set MaxDrawBuffers to the same value as MaxColorAttachments
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-01 20:15:15 -05:00
Samuel Pitoiset
00bb524716 nv50: use correct heaps for FP and GP code segments
This is just a cosmetic change. Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-01 23:29:20 +01:00
Jordan Justen
39bb59a566 mesa/sso: Add compute shader support
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
[itoral@igalia.com: Reviewed-by for all except the ctx->_Shader change]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-01 01:10:01 -07:00
Jordan Justen
6e11855050 mesa/sso: Add MESA_VERBOSE=api trace support
v2:
 * Use %u for unsigned values (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-01 01:09:20 -07:00
Jordan Justen
5bfe2835c2 i965: Setup pull constant state for compute programs
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-11-01 00:35:12 -07:00
Jordan Justen
a4a416f567 main/get: Add MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-11-01 00:11:42 -07:00
Jordan Justen
218e94906d glsl: OpenGLES GLSL 3.1 precision qualifiers ordering rules
The OpenGLES GLSL 3.1 specification uses the precision qualifier
ordering rules from ARB_shading_language_420pack.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-31 23:17:06 -07:00
Jordan Justen
b6e9b2b7a0 glsl: Add compute shader builtin variables for OpenGLES 3.1
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-31 23:08:09 -07:00
Ilia Mirkin
67635a0a71 nouveau: get rid of tabs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-31 19:58:14 -04:00
Connor Abbott
0ef8c5cb96 i965/sched: don't calculate live intervals for post-RA scheduling
For some reason, this causes assertions on gm965 only. In any case, it's
unnecessary since we don't need liveness information in the post-RA
scheduler.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92744
Cc: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-31 08:05:52 -07:00
Dave Airlie
425d8c2578 virgl/vtest: fix extra malloc
This somehow got added twice, drop the first one.

Reported by Coverity.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 18:05:33 +10:00
Dave Airlie
8d731ebd33 virgl: free sampler view on failure path
Reported by Coverity.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 16:16:44 +10:00
Dave Airlie
7153b12651 gallium/swrast: fixup build breakage and warnings
The front buffer rendering changes broke an interface, I didn't
fix up all of them.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 16:16:44 +10:00
Dave Airlie
2b67657096 gallium/swrast: fix front buffer blitting. (v2)
So I've known this was broken before, cogl has a workaround
for it from what I know, but with the gallium based swrast
drivers BlitFramebuffer from back to front or vice-versa
was pretty broken.

The legacy swrast driver tracks when a front buffer is used
and does the get/put images when it is mapped/unmapped,
so this patch attempts to add the same functionality to the
gallium drivers.

It creates a new context interface to denote when a front
buffer is being created, and passes a private pointer to it,
this pointer is then used to decide on map/unmap if the
contents should be updated from the real frontbuffer using
get/put image.

This is primarily to make gtk's gl code work, the only
thing I've tested so far is the glarea test from
https://github.com/ebassi/glarea-example.git

v2: bump extension version,
check extension version before calling get image. (Ian)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91930

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-31 16:04:36 +10:00
Timothy Arceri
103de0225b glsl: set image access qualifiers for AoA
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-31 08:37:08 +11:00
Ian Romanick
7b3684877c i965: Do legacy userclipping in OpenGL ES 1.x contexts.
Commit fba4823a disabled user clipping for everything except
compatibility profile.  Core profile and OpenGL ES 2.0+ have all removed
the classic, OpenGL 1.0 user clip planes.  ES 1.x, however, still has
them.

Fixes OpenGL ES 1.1 conformance mustpass.c and userclip.c

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Olivier Berthier <olivierx.berthier@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92639
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92641
2015-10-30 14:25:33 -07:00
Emmanuel Gil Peyrot
f3d4d10a1d gbm.h: Add a missing stddef.h include for size_t.
This was causing compilation issues when one of its providers wasn’t
already included before gbm.h.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-30 19:12:14 +00:00
Emil Velikov
7bac333508 winsys/virgl: rework line wrapping/indent
Wrap some of the 'omg it's getting out of hand' long lines, and
re-indent where things feel off.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
493e410d55 virgl: unwrap the includes
Include what you want, rather than relying on a header foo.h N levels
down the include chain, to provide something that you need.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
7154d48c6e winsys/virgl: remove temporary ret variable
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
bdcb005788 winsys/virgl: always memset prior to ioctl
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
e992715da2 winsys/virgl: use MALLOC to match FREE
The uppercase versions are wrappers which must be matched.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
72d7d1e224 winsys/virgl: remove calloc/malloc casts
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
1ce685f05e winsys/virgl: throw in some inline wrappers
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
78be78b681 virgl: introduce virgl_query() inline wrapper
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
dafcb21405 virgl: use virgl_screen/surface upcast wrappers
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
7af46b9c74 virgl: introduce and use virgl_transfer/texture/resource inline wrappers
The only two remaining cases of (struct virgl_resource *) require a
closer look. Either the error checking is missing or the arguments
provided feel wrong.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:09 +00:00
Emil Velikov
6b123fa07f virgl: add virgl_context/sampler_view/so_target() upcast wrappers
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
1f43e4e1a3 winsys/virgl/drm: drop unneeded forward declaration
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
e0056228f6 virgl: remove sw_winsys pointer from virgl_screen
The screen already has a pointer to the (base) winsys object.
With the latter of which implemented/sub-classed as either drm or sw
based one, depending on the target.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
0c82c2fb0b virgl: rename virgl.h to virgl_screen.h
Provide a more meaningful name considering it's purpose.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
87f7d61e19 virgl: move virgl_hw.h into the driver dir
Strictly speaking virgl_hw.h should reside in the driver folder, as
it describes the hardware. Moving it allows us to nuke the following
strange dependency

winsys/vtest > driver > winsys/drm

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
014f8ef2ff virgl: straighten the includes confusion
Use the relevant GALLIUM_foo_CFLAGS which has all the requirements
(not to mention VISIBITY_CFLAGS) and keep ../ out of the include
directives.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
2c705d2220 virgl: remove the _FILE_OFFSET_BITS defines
The build already sets it as needed.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
a05648fd7e winsys/virgl/drm: add all files to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:37:08 +00:00
Emil Velikov
8b9e69e2ea winsys/virgl/vtest: list all files in Makefile.sources
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:36:46 +00:00
Emil Velikov
73308ca802 virgl: move sources list to Makefile.sources
... and add the missing files while we're at it.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:33:11 +00:00
Emil Velikov
c1bf71f77c virgl: fix drm.h include path
The drm/ prefix is required, if using the kernel provided headers. As
most distros don't ship them it and we already depend on libdrm (which
adds the relevant -I flag) just drop the drm/ from the include.

Once a libdrm release with the virtgpu_drm.h header is released, we can
drop our local copy of the file.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-30 17:29:01 +00:00
Emil Velikov
60418a28ea i965: enable ARB_shader_clock on gen7+
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:23:18 +00:00
Emil Velikov
4379ca22f1 i965: Implement nir_intrinsic_shader_clock
v2:
 - Add a few const qualifiers for good measure.
 - Drop unneeded retype()s (Matt)
 - Convert timestamp to SIMD8/16, as fs_visitor::get_timestamp() returns
SIMD4 (Connor)

v3:
 - Remove unneeded temporary + MOV (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:40 +00:00
Emil Velikov
6a15517242 i965/fs: move the fs_reg::smear() from get_timestamp() to the callers
We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock.
In the latter the generalisation does not apply, so move the smear()
where needed. This also makes the function analogous to the vec4 one.

v2: Tweak the comment - The caller -> We (Matt, Connor).
v3: More comment tweaks (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:36 +00:00
Emil Velikov
7682844f34 nir: add shader_clock intrinsic
v2: Add flags and inline comment/description.
v3: None of the input/outputs are variables
v4: Drop clockARB reference, relate code motion barrier comment wrt
intrinsic flag.
v5: Drop the "thus we can eliminate..." comment (Connor)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:32 +00:00
Emil Velikov
f1d98fc90a glsl: add support for the clock2x32ARB function
v2: correctly set the return type

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:29 +00:00
Emil Velikov
51265c1b85 glsl: add ARB_shader_clock infrastructure
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:27 +00:00
Emil Velikov
e916d5e013 mesa: add infra for ARB_shader_clock
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-30 17:22:23 +00:00
Samuel Pitoiset
0d0329df8f nv50: do not create an invalid HW query type
While we are at it, store the rotate offset for occlusion queries to
nv50_hw_query like on nvc0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
5f1eeb799b nv50: move HW queries to nv50_query_hw.c/h files
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
76b48ceee9 nv50: move nva0_so_target_save_offset() to its correct location
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2015-10-30 17:57:15 +01:00
Samuel Pitoiset
2e3fe0379e nv50: add a header file for nv50_query
Like for nvc0, this will allow to split different types of queries and
to prepare the way for both global performance counters and MP counters.

While we are at it, make use of nv50_query struct instead of pipe_query.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-30 17:57:15 +01:00
Julien Isorce
e7ed3963ed st/va: add support to export a surface as dmabuf
I.e. implements:
VaAcquireBufferHandle
VaReleaseBufferHandle
for memory of type VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME

And apply relatives change to:
vlVaMapBuffer
vlVaUnMapBuffer
vlVaDestroyBuffer

Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver

Tested with gstreamer-vaapi with nouveau driver.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:21:20 +01:00
Julien Isorce
802ba6f865 st/va: implement VaDeriveImage
And apply relatives change to:
vlVaBufferSetNumElements
vlVaCreateBuffer
vlVaMapBuffer
vlVaUnmapBuffer
vlVaDestroyBuffer
vlVaPutImage

It is unfortunate that there is no proper va buffer type and struct
for this. Only possible to use VAImageBufferType which is normally
used for normal user data array.
On of the consequences is that it is only possible VaDeriveImage
is only useful on surfaces backed with contiguous planes.
Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:21:11 +01:00
Julien Isorce
5e763aaa21 st/va: add more errors checks in vlVaBufferSetNumElements and vlVaMapBuffer
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:41 +01:00
Julien Isorce
86eb4131a9 st/va: add headless support, i.e. VA_DISPLAY_DRM
This patch allows to use gallium vaapi without requiring
a X server running for your second graphic card.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:35 +01:00
Julien Isorce
1bdea0e579 st/va: handle Video Post Processing for configs
Add support for VA_PROFILE_NONE and VAEntrypointVideoProc
in the 4 following functions:

vlVaQueryConfigProfiles
vlVaQueryConfigEntrypoints
vlVaCreateConfig
vlVaQueryConfigAttributes

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:29 +01:00
Julien Isorce
0b868807e4 st/va: add colospace conversion through Video Post Processing
Add support for VPP in the following functions:
vlVaCreateContext
vlVaDestroyContext
vlVaBeginPicture
vlVaRenderPicture
vlVaEndPicture

Add support for VAProcFilterNone in:
vlVaQueryVideoProcFilters
vlVaQueryVideoProcFilterCaps
vlVaQueryVideoProcPipelineCaps

Add handleVAProcPipelineParameterBufferType helper.

One application is:
VASurfaceNV12 -> gstvaapipostproc -> VASurfaceRGBA

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:10 +01:00
Julien Isorce
05b6ce4209 st/va: implement dmabuf import for VaCreateSurfaces2
For now it is limited to RGBA, BGRA, RGBX, BGRX surfaces.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:20:03 +01:00
Julien Isorce
adf1133118 st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributes
Inspired from http://cgit.freedesktop.org/vaapi/intel-driver/
especially src/i965_drv_video.c::i965_CreateSurfaces2.

This patch is mainly to support gstreamer-vaapi and tools that uses
this newer libva API. The first advantage of using VaCreateSurfaces2
over existing VaCreateSurfaces, is that it is possible to select which
the pixel format for the surface. Indeed with the simple VaCreateSurfaces
function it is only possible to create a NV12 surface. It can be useful
to create a RGBA surface to use with video post processing.

The avaible pixel formats can be query with VaQuerySurfaceAttributes.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:19:54 +01:00
Julien Isorce
d42029d2d9 st/va: do not destroy old buffer when new one failed
If formats are not the same vlVaPutImage re-creates the video
buffer with the right format. But if the creation of this new
video buffer fails then the surface looses its current buffer.
Let's just destroy the previous buffer on success.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:19:47 +01:00
Julien Isorce
87109e5f88 st/va: properly defines VAImageFormat formats and improve VaCreateImage
Added PIPE_VIDEO_CHROMA_FORMAT_NONE in p_format.h
and return it by default in ChromaToPipe.

Renamed YCbCrToPipe to VaFourccToPipeFormat because it now
contains RGB.

Implemented PipeFormatToVaFourcc which will be used later in
VlVaDeriveImage.

Note that gstreamer-vaapi check all the VAImageFormat fields.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-30 13:05:23 +01:00
Samuel Iglesias Gonsalvez
7b8cc37585 main: fix basename match's check if it's an array or struct
Commit 4565b6f did not update the basename match's check for
the case that string would exactly match the name of the
variable if the suffix "[0]" were appended to it.

Fixes two dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element

v2:
- Change the position of rname_has_array_index_zero to avoid an out-of-bounds
  read. Reported by Tapani Pälli.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-30 08:12:53 +01:00
Kristian Høgsberg
f7f1bc6cca i965: Fix invalid memory accesses after resizing brw_codegen's store table
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-30 07:49:10 +01:00
Connor Abbott
73caa26e43 i965/sched: use liveness analysis for computing register pressure
Previously, we were using some heuristics to try and detect when a write
was about to begin a live range, or when a read was about to end a live
range. We never used the liveness analysis information used by the
register allocator, though, which meant that the scheduler's and the
allocator's ideas of when a live range began and ended were different.
Not only did this make our estimate of the register pressure benefit of
scheduling an instruction wrong in some cases, but it was preventing us
from knowing the actual register pressure when scheduling each
instruction, which we want to have in order to switch to register
pressure scheduling only when the register pressure is too high.

This commit rewrites the register pressure tracking code to use the same
model as our register allocator currently uses. We use the results of
liveness analysis, as well as the compute_payload_ranges() function that
we split out in the last commit. This means that we compute live ranges
twice on each round through the register allocator, although we could
speed it up by only recomputing the ranges and not the live in/live out
sets after scheduling, since we only shuffle around instructions within
a single basic block when we schedule.

Shader-db results on bdw:

total instructions in shared programs: 7130187 -> 7129880 (-0.00%)
instructions in affected programs: 1744 -> 1437 (-17.60%)
helped: 1
HURT: 1

total cycles in shared programs: 172535126 -> 172473226 (-0.04%)
cycles in affected programs: 11338636 -> 11276736 (-0.55%)
helped: 876
HURT: 873

LOST:   8
GAINED: 0

v2: use regs_read() in more places.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:19:43 -04:00
Connor Abbott
c1860299b8 i965/fs: split out calculation of payload live ranges
We'll need this for the scheduler too, since it wants to know when the
live ranges of payload registers end in order to model them in our
register pressure calculations.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:19:33 -04:00
Connor Abbott
45cd76e342 i965: dump scheduling cycle estimates
The heuristic we're using is rather lame, since it assumes everything is
non-uniform and loops execute 10 times, but it should be enough for
measuring improvements in the scheduler that don't result in a change in
the number of instructions.

v2:
- Switch loops and cycle counts to be compatible with older shader-db.
- Make loop heuristic 10x to match with spilling code.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:19:24 -04:00
Connor Abbott
486268bdb0 i965: always run the post-RA scheduler
Before, we would only do scheduling after register allocation if we
spilled, despite the fact that the pre-RA scheduler was only supposed to
be for register pressure and set the latencies of every instruction to
1. This meant that unless we spilled, which we rarely do, then we never
considered instruction latencies at all, and we usually never bothered
to try and hide texture fetch latency. Although a later commit removes
the setting the latency to 1 part, we still want to always run the
post-RA scheduler since it's able to take the false dependencies that
the register allocator creates into account, and it can be more
aggressive than the pre-RA scheduler since it doesn't have to worry
about register pressure at all.

Test                   master      post-ra-sched     diff       %diff
bench_OglPSBump2       396.730     402.386           5.656      +1.400%
bench_OglPSBump8       244.370     247.591           3.221      +1.300%
bench_OglPSPhong       241.117     242.002           0.885      +0.300%
bench_OglPSPom         59.555      59.725            0.170      +0.200%
bench_OglShMapPcf      86.149      102.346           16.197     +18.800%
bench_OglVSTangent     388.849     395.489           6.640      +1.700%
bench_trex             65.471      65.862            0.390      +0.500%
bench_trexoff          69.562      70.150            0.588      +0.800%
bench_heaven           25.179      25.254            0.074      +0.200%

Reviewed-by: Jason Ekstrand <jasoan.ekstrand@intel.com>
2015-10-30 02:19:00 -04:00
Connor Abbott
85fce2d2f5 i965/sched: write-after-read dependencies are free
Although write-after-write dependencies have the same latency as
read-after-write dependencies due to how the register scoreboard works,
write-after-read dependencies aren't checked by the EU at all, so
they're purely a constraint on how the scheduler can order the
instructions.

v2: fix accumulator dependencies too.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:18:56 -04:00
Connor Abbott
6f231fddff i965: fix cycle estimates when there's a pipeline stall
The issue time for an instruction is how many cycles it takes to
actually put it into the pipeline. If there's a pipeline stall that
causes the instruction to be delayed, we should first take that into
account to figure out when the instruction would start executing and
*then* add the issue time. The old code had it backwards, and so we
would underestimate the total time whenever we thought there would be a
pipeline stall by up to the issue time of the instruction.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-30 02:18:53 -04:00
Eric Anholt
04c42f3ab5 vc4: Allow user index buffers, to avoid slow readback for shadow IBs.
Improves low-settings openarena performance by 31.9975% +/- 0.659931%
(n=7).
2015-10-29 22:58:01 -07:00
Ilia Mirkin
06fa2e864a nv50: mark contexts shareable, compile at creation time
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 23:25:08 -04:00
Ilia Mirkin
f768eaa87d nv50: allow per-sample interpolation to be forced via rast
Uses the same technique as for nvc0 of fixups before upload, and
evicting in case of state change. Removes one source of variants kept by
st/mesa.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 22:42:38 -04:00
Matt Turner
85ee2f7fcf i965: Add INTEL_DEBUG=nocompact to disable instruction compaction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
93268939e4 i965: Add INTEL_DEBUG=hex to print the hex with the disassembly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
18b194925f i965: Print the type and writemask on null destinations.
These are often useful in debugging, and the writemask (actually
"Channel Enables") determines more than just what goes into the
destination.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
bcdf664682 i965: Test fixed_hw_reg.file against BRW_IMMEDIATE_VALUE, not IMM.
No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is
the hardware value and IMM was the IR value -- and you can see that
BRW_IMMEDIATE_VALUE was correctly used in the context of this patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
ee46c1e626 i965/vec4: Test against BRW_IMMEDIATE_VALUE, not IMM.
No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is
the hardware value and IMM was the IR value -- and you can see that
BRW_IMMEDIATE_VALUE was correctly used in the context of this patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
8c4151b866 i965/fs: Use group(4, 0) to emit an exec-size 4 MOV.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
9115fa1d13 i965/cfg: Handle no-idom case in cfg_t::dump_domtree().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
5916b073f6 i965/disasm: Remove unused _addr_mode argument from src_ia1().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
e09e5f992e i965: Set correct field for indirect align16 addrimm.
This has been wrong since the initial import of the i965 driver.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
fa142773d9 i965/vec4: Drop brw_set_default_* before popping insn state.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-29 17:51:16 -07:00
Matt Turner
11a7b6bbaa i965/vec4: Remove unnecessary #includes from the generator.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-29 17:51:16 -07:00
Dave Airlie
744cc036b9 r600: enable SB for geom shaders on pre-evergreen
I've checked with piglit and one tests fails, but it fails
on evergreen as well, so will get fixed later.

Otherwise SB seems to be working fine for geom shaders on my
rv635.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-30 10:40:59 +10:00
Kenneth Graunke
c6b24448b5 i965/vec4: Eliminate the vec4_generator class altogether.
We really weren't taking advantage of vec4_generator being a class.
By adding a "p" parameter to the helper methods, and "prog_data" to
ones which need binding table information, we can convert everything
to static functions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-29 16:56:41 -07:00
Kenneth Graunke
1a094a2ee2 i965/vec4: Move vec4_generator class definition into the .cpp file.
The public API for the generator is brw_vec4_generate_code(); nobody
actually needs to use the class.  This means we can extend it without
triggering the recompiles associated with altering brw_vec4.h.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-29 16:56:41 -07:00
Kenneth Graunke
4cba8f5d21 i965/vec4: Wrap vec4_generator in a C function.
vec4_generator is a class for convenience, but only exports a single
method as its public API.  It makes much more sense to just export a
single function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-29 16:56:41 -07:00
Kenneth Graunke
73ff0ead36 i965/vec4: Convert src_reg/dst_reg to brw_reg at the end of the visitor.
This patch makes the visitor convert registers to the HW_REG file at the
very end, after register allocation, post-RA scheduling, and dependency
control flagging.  After that, everything is in fixed brw_regs.

This simplifies the code generator, as it can just use the hardware
registers rather than having to interpret our abstract files.  In
particular, interpreting the UNIFORM file meant reading prog_data
to figure out where push constants are supposed to start.

Having the part of the code that performs register allocation also
translate everything to hardware registers seems sensible.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-29 16:56:41 -07:00
Ivan Kalvachev
f75f21a24a r600g: Fix special negative immediate constants when using ABS modifier.
Some constants (like 1.0 and 0.5) could be inlined as immediate inputs
without using their literal value. The r600_bytecode_special_constants()
function emulates the negative of these constants by using NEG modifier.

However some shaders define -1.0 constant and want to use it as 1.0.
They do so by using ABS modifier. But r600_bytecode_special_constants()
set NEG in addition to ABS. Since NEG modifier have priority over ABS one,
we get -|1.0| as result, instead of |1.0|.

The patch simply prevents the additional switching of NEG when ABS is set.

[According to Ivan Kalvachev, this bug was fond via
https://github.com/iXit/Mesa-3D/issues/126 and
https://github.com/iXit/Mesa-3D/issues/127]

Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
CC: <mesa-stable@lists.freedesktop.org>
2015-10-29 23:56:57 +01:00
Nicolai Hähnle
24c90888ae st/mesa: fix mipmap generation for immutable textures with incomplete pyramids
Without the clamping by NumLevels, the state tracker would reallocate the
texture storage (incorrect) and even fail to copy the base level image
after reallocation, leading to the graphical glitch of
https://bugs.freedesktop.org/show_bug.cgi?id=91993 .

A piglit test has been submitted for review as well (subtest of
arb_texture_storage-texture-storage).

v2: also bypass all calls to st_finalize_texture (suggested by Marek Olšák)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-29 23:56:57 +01:00
Nanley Chery
65f6caf43e mesa: Enable ASTC in GLES' [NUM_]COMPRESSED_TEXTURE_FORMATS queries
In OpenGL ES, the COMPRESSED_TEXTURE_FORMATS query returns the set of
supported specific compressed formats. Since ASTC formats fit within
that category, include them in the set and update the
NUM_COMPRESSED_TEXTURE_FORMATS query as well.

This enables GLES2-based ASTC dEQP tests to run. See the Bugzilla for
more info.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92193
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-29 15:40:37 -07:00
Nanley Chery
8090a1c326 mesa/texcompress: Restrict FXT1 format to desktop GL subset
In agreement with the extension spec and commit
dd0eb00487, filter FXT1 formats to the
desktop GL profiles. Now we no longer advertise such formats as supported
in an ES context and then throw an INVALID_ENUM error when the client
tries to use such formats with CompressedTexImage2D.

Fixes the following 26 dEQP tests:
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_size
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_cube_pos
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_cube
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_tex2d

v2. Use _mesa_is_desktop_gl() (Ilia, Ian)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2015-10-29 15:31:59 -07:00
Samuel Pitoiset
0260620ab3 nvc0: expose a group of performance metrics on Fermi
This allows to monitor those performance metrics through
GL_AMD_performance_monitor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-29 23:01:29 +01:00
Ilia Mirkin
6166a8e369 st/mesa: create temporary textures with the same nr_samples as source
Not sure if this is actually reachable in practice (to have a complex
copy with MS textures).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-29 13:20:45 -04:00
Tapani Pälli
afbe8b6085 glsl: add fragdata arrays to program resource list
This makes sure that user is still able to query properties about
variables that have gotten removed by opt_dead_builtin_varyings pass.

Fixes following OpenGL ES 3.1 test:
   ES31-CTS.program_interface_query.output-layout

No Piglit regressions.

v2: cleanup, drop extra parenthesis (Topi)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-29 17:17:42 +02:00
Tapani Pälli
6ce0857e30 mesa: add fragdata_arrays list to gl_shader
This is required to store information about fragdata arrays, currently
these variables get lost and cannot be retrieved later in sensible way
for program interface queries. List will be utilized by next patch.

Patch also modifies opt_dead_builtin_varyings pass to build list when
lowering fragdata arrays. This is identical approach as taken with
packed varyings pass.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-29 17:16:22 +02:00
Samuel Iglesias Gonsalvez
85f1f04413 glsl: fix GL_BUFFER_DATA_SIZE value for shader storage blocks with unsize arrays
From ARB_program_interface_query:

"For the property of BUFFER_DATA_SIZE, then the implementation-dependent
 minimum total buffer object size, in basic machine units, required to hold
 all active variables associated with an active uniform block, shader
 storage block, or atomic counter buffer is written to <params>.  If the
 final member of an active shader storage block is array with no declared
 size, the minimum buffer size is computed assuming the array was declared
 as an array with one element."

Fixes the following dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.named_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.unnamed_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.block_array

v2:
- Fix comment's indentation and explain that the parser already
  checked that unsized array is in last element of a shader
  storage block (Iago).
- Add assert (Iago).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-29 08:29:06 +01:00
Kenneth Graunke
cf93251bed docs: Mark GL_ARB_fragment_layer_viewport as done on i965. 2015-10-28 22:05:08 -07:00
Kenneth Graunke
8c902a580a i965: Implement ARB_fragment_layer_viewport.
Normally, we could read gl_Layer from bits 26:16 of R0.0.  However, the
specification requires that bogus out-of-range 32-bit values written by
previous stages need to appear in the fragment shader as-written.

Instead, we pass in the full 32-bit value from the VUE header as an
extra flat-shaded varying.  We have the SF override the value to 0
when the previous stage didn't actually write a value (it's actually
defined to return 0).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Kenneth Graunke
5392328a32 i965: Make calculate_attr_overrides return the URB read offset.
Traditionally, we've hardcoded "URB Entry Read Offset" to 1 (which
represents 2 vec4 varying slots) to skip over the 8 DWord VUE header.

In order to support ARB_fragment_layer_viewport, we'll need to read
from that header.  This patch adds the basic plumbing necessary to
calculate a value dynamically and hook it up in the SBE packets.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Kenneth Graunke
b3d19d20f2 glsl: Mark gl_ViewportIndex and gl_Layer varyings as flat.
Integer varyings need to be flat qualified - all others were already.
I think we just missed this.  Presumably some hardware passes this via
sideband and ignores attribute interpolation, so no one has noticed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Kenneth Graunke
b94cdcdada i965/fs: Properly check for PAD in fragment shaders with > 16 varyings.
Commit 268008f98c changed unused VUE map
slots to be initialized with BRW_VARYING_SLOT_PAD, not COUNT.  I missed
updating this.  It also means that commit message was wrong, as some
code *did* rely slots being initialized to COUNT.

This may fix a bug with SSO programs with > 16 FS input varyings.
I think we probably just emitted extra pointless code, but probably
didn't break anything.  We might also just have no tests for that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Kenneth Graunke
6ae47a3eb4 i965: Update stale comment about unused VUE map slots.
I changed this from COUNT to PAD in commit 268008f98c.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Ilia Mirkin
5227e91580 nv50/ir: adapt to new method for passing in cull/clip distance masks
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 00:44:22 -04:00
Ilia Mirkin
a5bae7b31d nvc0: share shaders between contexts and build immediately
Avoid deferring building shaders until draw time, should hopefully
reduce any stuttering, as well as enable shader-db style analysis.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 00:44:22 -04:00
Ilia Mirkin
b75fff70d8 nvc0: do upload-time fixups for interpolation parameters
Unfortunately flatshading is an all-or-nothing proposition on nvc0,
while GL 3.0 calls for the ability to selectively specify explicit
interpolation parameters on gl_Color/gl_SecondaryColor which would
override the flatshading setting. This allows us to fix up the
interpolation settings after shader generation based on rasterizer
settings.

While we're at it, we can add support for dynamically forcing all
(non-flat) shader inputs to be interpolated per-sample, which allows
st/mesa to not generate variants for these.

Fixes the remaining failing glsl-1.30/execution/interpolation piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 00:44:22 -04:00
Kenneth Graunke
77f58c04cc nir: Copy "patch" flag from ir_variable to nir_variable.
This was introduced in GLSL IR after NIR development had branched.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-28 21:15:29 -07:00
Kenneth Graunke
9c8208f2c1 nir: Add intrinsics for tessellation shader system values.
nir_intrinsic_load_patch_vertices_in corresponds to gl_PatchVerticesIn,
a special input in both the TCS and TES stages.

nir_intrinsic_load_tess_coord corresponds to gl_TessCoord, a special
tessellation evaluation shader input.

nir_intrinsic_load_tess_level_outer/inner correspond to the
gl_TessLevelOuter[] and gl_TessLevelInner[] evaluation shader inputs,
which we treat as system values because they're stored specially.
(These intrinsics are only for the TES - the TCS uses output variables.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-28 21:14:53 -07:00
Kenneth Graunke
bf05af3f0e i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.
Consider the case of two nearly identical GLSL fragment shaders:

   out vec4 color;
   void main() { color = vec4(1); }

and

   layout(early_fragment_tests) in;
   out vec4 color;
   void main() { color = vec4(1); }

These shaders compile to the exact same assembly, but have distinct
values for brw_wm_prog_data::early_fragment_tests.

Since these are two independent GLSL shaders, they have different
program keys - notably, brw_wm_prog_key::program_string_id differs.

When uploading the second, brw_upload_cache will find an existing copy
of the assembly in the cache BO, which means matching_data will be
non-NULL.  Although we create a second cache item (with the new key
and prog_data), we set item->offset to the existing copy and avoid
re-uploading duplicate assembly.

However, brw_search_cache() would only flag BRW_NEW_*_PROG_DATA if
item->offset differed from the supplied offset.  With reuse, both
programs have the same offset, but prog_data changed.  We have to
flag it, but failed to.

To fix this, we simply need to check if the aux (prog_data) pointer
changed.  If either the assembly or the prog_data differs, flag it.

This fixes a regression since 1bba29ed40,
where Topi fixed brw_upload_cache() to actually reuse identical
assembly.  Prior to that, reuse basically never happened due to bugs.
Unfortunately, this code apparently wasn't prepared to handle reuse!

Fixes GPU hangs in Dolphin on Broadwell.

Huge thanks to Pierre Bourdon and Ilia Mirkin for debugging this
and helping track down the real issue.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92623
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Pierre Bourdon <delroth@gmail.com>
2015-10-28 21:13:54 -07:00
Laurent Carlier
37402014e8 clover: fix building fix clang-3.8
https://bugs.freedesktop.org/show_bug.cgi?id=92705

v2.1: use Linker::Flags::None instead of 0 and emplace_back()

Signed-off-by: Laurent Carlier <lordheavym@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-29 12:35:37 +09:00
Ilia Mirkin
d0693d7515 nv50: add ARB_copy_image support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-28 20:53:30 -04:00
Ilia Mirkin
ebbd7b41c0 nvc0: add ARB_copy_image support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-28 20:42:59 -04:00
Julien Isorce
3bbb8715ac nvc0: fix crash when nv50_miptree_from_handle fails
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-28 18:26:20 +01:00
Brian Paul
2bf224b3f9 vbo: replace assertion with conditional in vbo_compute_max_verts()
With just the right sequence of per-vertex commands and state changes,
it's possible for this assertion to fail (such as with viewperf11's
lightwave-06-1 test).  Instead of asserting, return 0 so that the
caller knows the VBO is full and needs to be flushed.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-28 11:03:27 -06:00
Brian Paul
8e9c3070bf mesa: minor formatting fix in get_tex_rgba_compressed() 2015-10-28 11:03:21 -06:00
Marek Olšák
f04f13622f st/mesa: implement ARB_copy_image
I wonder if the craziness was worth it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-28 11:52:17 +01:00
Marek Olšák
ce9db16e1c gallium: add PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS
For ARB_copy_image.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-28 11:52:17 +01:00
Marek Olšák
e82c527f1f radeonsi: allow copying between compatible compressed and uncompressed formats
which is where a block in src maps to a pixel in dst and vice versa.
e.g. DXT1 <-> R32G32_UINT
     DXT5 <-> R32G32B32A32_UINT

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-28 11:52:17 +01:00
Marek Olšák
6a4dc1ad49 mesa: set TargetIndex in VDPAURegister*SurfaceNV (v2)
We initialized Target, but not TargetIndex.
This is required since 7d7dd18711.

v2: do it in the right place. Noticed by Brian Paul.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92645

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-28 11:52:17 +01:00
Emil Velikov
bfc73ff10e i965: remove unneeded src_reg copy in emit_shader_time_write
The variable is already of type src_reg. creating a new instance only to
destroy it seems unnecessary.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-28 02:28:38 -07:00
Emil Velikov
0325f68228 i965: remove cache_aux_free_func array
There is only one function that can be called, which is well known at
compilation time.

The abstraction used here seems unnecessary, so let's use a direct call
to brw_stage_prog_data_free() when appropriate, cut down the size of
struct brw_cache.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-28 02:28:38 -07:00
Samuel Iglesias Gonsalvez
74fcc4c41f main: fix GL_MAX_NUM_ACTIVE_VARIABLES value for shader storage blocks
The maximum number of active variables for shader storage blocks should
take into account the specific rules for shader storage blocks, i.e. for
an active shader storage block member declared as an array, an entry
will be generated only for the first array element, regardless of its type.

Fixes 3 dEQP-GLES31.functional.* tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.named_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.unnamed_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.block_array

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-28 08:40:51 +01:00
Boyuan Zhang
03c92ffbf6 st/vdpau: disable RefPicList for Vdpau HEVC
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Boyuan Zhang
ad2752e94b st/va: add VAAPI HEVC decode support
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Boyuan Zhang
38c3d7cfc4 radeon/uvd: implement and add flag for VAAPI HEVC decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Boyuan Zhang
231605d14d vl: add RefPicList defines for VAAPI HEVC decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Marta Lofstedt
16c49da63a mesa: Draw indirect is not allowed if the default VAO is bound.
From OpenGL ES 3.1 specification, section 10.5:
"DrawArraysIndirect requires that all data sourced for the
command, including the DrawArraysIndirectCommand
structure,  be in buffer objects,  and may not be called when
the default vertex array object is bound."

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-27 12:16:23 +01:00
Marek Olšák
93eb4f9287 winsys/amdgpu: remove the dcc_enable surface flag
dcc_size is sufficient and doesn't need a further comment in my opinion.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marek Olšák
3aebc596b3 radeonsi: add debug flags that disable DCC and DCC fast clear
For debugging, bug reports, etc.
This is not in the radeonsi directory, but it is about radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marek Olšák
235d38584c radeonsi: properly check if DCC is enabled and allocated
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marek Olšák
5bc5dca0cb radeonsi: simplify DCC handling in si_initialize_color_surface
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marta Lofstedt
3daa7e5147 mesa: Draw indirect is not allowed when xfb is active and unpaused
OpenGL ES 3.1 specification, section 10.5:
"An INVALID_OPERATION error is generated if
transform feedback is active and not paused."

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-27 08:49:21 +01:00
Marta Lofstedt
2c91e08656 mesa: Draw Indirect return wrong error code on unalinged
From OpenGL 4.4 specification, section 10.4 and
Open GL Es 3.1 section 10.5:
"An INVALID_VALUE error is generated if indirect is not a multiple
of the size, in basic machine units, of uint."

However, the current code follow the ARB_draw_indirect:
https://www.opengl.org/registry/specs/ARB/draw_indirect.txt
"INVALID_OPERATION is generated by DrawArraysIndirect and
DrawElementsIndirect if commands source data beyond the end
of a buffer object or if <indirect> is not word aligned."

V2: After discussions on the list, it was suggested to
only keep the INVALID_VALUE error.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-27 08:49:21 +01:00
Samuel Iglesias Gonsalvez
4565b6f4fb main: Remove interface block array index for doing the name comparison
From ARB_program_query_interface spec:

"uint GetProgramResourceIndex(uint program, enum programInterface,
                                   const char *name);
 [...]
 If <name> exactly matches the name string of one of the active resources
 for <programInterface>, the index of the matched resource is returned.
 Additionally, if <name> would exactly match the name string of an active
 resource if "[0]" were appended to <name>, the index of the matched
 resource is returned. [...]"

"A string provided to GetProgramResourceLocation or
 GetProgramResourceLocationIndex is considered to match an active variable
 if:
[...]
   * if the string identifies the base name of an active array, where the
     string would exactly match the name of the variable if the suffix
     "[0]" were appended to the string;
[...]
"

Fixes the following two dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element

v2:
- Add AoA support (Timothy)
- Apply it too for GetUniformLocation(), GetUniformName() and others
  because ARB_program_interface_query says that they are equivalent
  to GetProgramResourceLocation() and GetProgramResourceName() (Tapani)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-27 08:10:04 +01:00
Eric Anholt
3359ad6cda vc4: Add support for copy propagation with unpack flags present.
total instructions in shared programs: 89251 -> 87862 (-1.56%)
instructions in affected programs:     52971 -> 51582 (-2.62%)
2015-10-26 16:48:34 -07:00
Eric Anholt
01ca4f207e vc4: Rewrite the pack instructions as a MOV with a dst pack flag
Another step in reducing the special-casing of instructions.
2015-10-26 16:48:34 -07:00
Eric Anholt
72fa2ae20b vc4: Move dst pack setup out to a helper function with more asserts. 2015-10-26 16:48:34 -07:00
Eric Anholt
99a9a5a345 vc4: Switch the unpack ops to being unpack flags on a mov.
This paves the way for copy propagating our unpacks.  We end up with a
small change on shader-db:

total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs:     19041 -> 18902 (-0.73%)

which appears to be because we no longer convert MOVs for an FMAX dst,
r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
this ends up with a slightly better schedule.
2015-10-26 16:48:34 -07:00
Eric Anholt
548b05d53f vc4: Drop some confused code about pack/unpack handling.
At one point I thought packs and unpacks were in the same field of the
instruction.  They aren't.  These instructions therefore never cause a
pack.

total instructions in shared programs: 89472 -> 89390 (-0.09%)
instructions in affected programs:     15261 -> 15179 (-0.54%)
2015-10-26 16:48:34 -07:00
Eric Anholt
a7b424e835 vc4: Reduce MOV special-casing in QIR-to-QPU.
I'm going to introduce some more types of MOV, which also want the elision
of raw MOVs.
2015-10-26 16:48:34 -07:00
Eric Anholt
652a864b25 vc4: Fix up the test for whether the unpack can be from r4.
We can do 16a/16b from float as well.  No difference on shader-db.
2015-10-26 16:48:34 -07:00
Eric Anholt
3d7a088608 vc4: Don't try to follow MOVs across a pack. 2015-10-26 16:48:34 -07:00
Eric Anholt
6eb0760f48 vc4: Only copy propagate raw MOVs.
No problems being fixed, but needed for the new unpack changes.
2015-10-26 16:48:34 -07:00
Eric Anholt
0ccacfa017 vc4: If a QIR source has an unpack set, print it.
Not used yet, but will be.
2015-10-26 16:48:34 -07:00
Kenneth Graunke
8034e7d6f1 glsl: Convert TES gl_PatchVerticesIn into a constant when using a TCS.
When a TCS is present, the TES input gl_PatchVerticesIn is actually a
constant - it's simply the # of output vertices specified by the TCS
layout qualifiers.  So, we can replace the system value with a constant,
which may allow further optimization, and will likely be more efficient.

If the TCS is absent, we can't do this optimization.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-26 16:37:07 -07:00
Ian Romanick
8f84a8e257 i965: Add missing close-parenthesis in error messages
Trivial.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-26 16:15:55 -07:00
Ian Romanick
7070c8879a i965: Fix is-renderable check in intel_image_target_renderbuffer_storage
Previously we could create a renderbuffer with format
MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage,
then FAIL to convert the EGLImage back to a renderbuffer because
reasons.  Just use the same check in
intel_image_target_renderbuffer_storage that brw_render_target_supported
uses.

There are more checks in brw_render_target_supported, but I don't think
they are necessary here.  A different approach would be to refactor
brw_render_target_supported to take rb->Format and rb->NumSamples as
parameters (instead of a gl_renderbuffer) and use the new function here.

Fixes:

    ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476
Cc: "10.3 10.4 10.5 10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-26 16:15:55 -07:00
Timothy Arceri
a3d0359aff glsl: keep track of intra-stage indices for atomics
This is more optimal as it means we no longer have to upload the same set
of ABO surfaces to all stages in the program.

This also fixes a bug where since commit c0cd5b var->data.binding was
being used as a replacement for atomic buffer index, but they don't have
to be the same value they just happened to end up the same when binding is 0.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90175
2015-10-27 07:03:05 +11:00
Roland Scheidegger
711489648b gallivm: disable f16c when not using AVX
f16c intrinsic can only be emitted when AVX is used. So when we disable AVX
due to forcing 128bit vectors we must not use this intrinsic (depending on
llvm version, this worked previously because llvm used AVX even when we didn't
tell it to, however I've seen this fail with llvm 3.3 since
718249843b which seems to have the side effect
of disabling avx in llvm albeit it only touches sse flags really, but
with ea421e919a it's now really disabled).
Albeit being able to use AVX with 128bit vectors also would have its uses, the
code as is really was meant to emulate jit code creation for less capable cpus.
v2: add some (ifdefed out) missing de-featuring options for simulating
less capable cpus.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-26 16:45:49 +01:00
Julien Isorce
a61be1a798 st/va: pass picture desc to begin and decode
At least vl_mpeg12_decoder uses the picture
desc in begin_frame and decode_bitstream.

https://bugs.freedesktop.org/show_bug.cgi?id=92634

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-26 13:53:10 +01:00
Tapani Pälli
8ae4317c36 mesa: add additional checks for uniform location query
Patch adds additional check to make sure we don't return locations for
structures or arrays of structures.

From page 79 of the OpenGL 4.2 spec:
    "A valid name cannot be a structure, an array of structures, or any
    portion of a single vector or a matrix."

v2: use without-array() to simplify code (Timothy)

No Piglit or CTS regressions observed.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-26 12:52:17 +02:00
Emil Velikov
a305d59baa docs: add news item and link release notes for 11.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-25 10:17:14 +00:00
Emil Velikov
47dd80a35d docs: add sha256 checksums for 11.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ec14e6f8fd)
2015-10-25 10:14:04 +00:00
Emil Velikov
bddb7a51c3 docs: add release notes for 11.0.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 31bf247031)
2015-10-25 10:14:03 +00:00
Kenneth Graunke
fcb39f5b6a i965: Make brw_varying_to_offset take a const pointer to the VUE map.
It doesn't modify it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-24 20:30:14 -07:00
Eric Anholt
a2eba3362f vc4: Fix names of the 16-bit unpacks
They're only f16-to-f32 on a float operation, otherwise they're
i16-to-i32.
2015-10-24 17:55:55 -07:00
Eric Anholt
a238ad372d vc4: Don't try to register coalesce into the VPM across non-raw MOVs.
No known bugs, just something I noticed while updating optimization code
for other changes.
2015-10-24 17:55:38 -07:00
Eric Anholt
ae1d3322cc vc4: Take advantage of the 8888 pack function in pack_unorm_4x8.
One instruction instead of four, and it turns out you do this a lot for
the Over operator.

total uniforms in shared programs: 32168 -> 32087 (-0.25%)
uniforms in affected programs:     318 -> 237 (-25.47%)
total instructions in shared programs: 89830 -> 89472 (-0.40%)
instructions in affected programs:     6434 -> 6076 (-5.56%)
2015-10-24 17:55:22 -07:00
Eric Anholt
f09ed63f43 vc4: Fix the test for skipping raw MOVs.
I don't know what previous test was trying to do, but it dates back to the
first add of vc4_qpu_emit.c.  No change to shader-db.
2015-10-24 17:55:22 -07:00
Ben Widawsky
9ecfc6baf1 i965: Remove unused devinfo revision
I left the function to obtain the revision because it is, and will continue to
be useful in the future. I'd rather not have to dig it up every time we need it.
Comments left at the implementation to say as much.

This was accidentally left here when I moved the early platform support:
commit 28ed1e08e8
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Fri Aug 7 13:58:37 2015 -0700

    i965/skl: Remove early platform support

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-24 12:48:01 -07:00
Fabio Pedretti
b0342f48d0 docs/index.html: fix typo
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-24 19:27:24 +01:00
Rob Clark
1e8d0cc628 freedreno: remove unnecessary null checks
According to piglit/xonotic/neverball/stc, blend/rasterize/zsa state
will always be bound (never null).  And the null checks were in-
consistent anyways, so remove them.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-24 12:38:33 -04:00
Bas Nieuwenhuizen
6529daca39 radeonsi: Implement DCC fast clear.
Uses the DCC buffer instead of the CMASK buffer. The ELIMINATE_FAST_CLEAR
still works. Furthermore, with DCC compression we can directly clear
to a limited set of colors such that we do not need a postprocessing step.

v2 Marek: check dcc_buffer && dirty_level_mask in set_sampler_view

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 17:46:08 +02:00
Roland Scheidegger
205a3ce5c1 gallivm: fix tex offsets with mirror repeat linear
Can't see why anyone would ever want to use this, but it was clearly broken.
This fixes the piglit texwrap offset test using this combination.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
Roland Scheidegger
71ff5af5dd gallivm: fix sampling with texture offsets in SoA path
When using nearest filtering and clamp / clamp to edge wrapping results could
be wrong for negative offsets. Fix this by adding the offset before doing
the conversion to int coords (could also use floor instead of trunc int
conversion but probably more complex on "typical" cpu).

This fixes the piglit texwrap offset failures with this filter/wrap combo
(which only leaves the linear/mirror repeat combination broken).

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
Roland Scheidegger
fb586e1edb softpipe: fix using non-zero layer in non-array view from array resource
For vertex/geometry shader sampling, this is the same as for llvmpipe - just
use the original resource target.
For fragment shader sampling though (which does not use first-layer based mip
offsets) adjust the sampling code to use first_layer in the non-array cases.
While here also fix up some code which looked wrong wrt buffer texel fetch
(no piglit change).

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
Roland Scheidegger
fe707c0373 llvmpipe: fix using non-zero layer in non-array view from array resource
Just need to use resource target not view target when calculating
first-layer based mip offsets. (This is a gl specific problem since
d3d10 does not distinguish between non-array and array resources neither
at the resource nor view level, only at the shader level.)
Fixes new piglit arb_texture_view sampling-2d-array-as-2d-layer test.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
Alex Deucher
830e57b82d radeonsi: add Stoney to si_init_gs_info()
This patch was originally written before stoney support
was merged.  Add stoney.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-23 18:56:45 -04:00
Bas Nieuwenhuizen
48b5f104ac radeonsi: Enable DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:30 +02:00
Bas Nieuwenhuizen
81ebd6a882 radeonsi: Add FLUSH_AND_INV_CB_DATA_TS for DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:28 +02:00
Bas Nieuwenhuizen
bb77467df9 radeonsi: Disable operations that do not work with DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:24 +02:00
Bas Nieuwenhuizen
afa357c3b0 radeonsi: Allocate buffers for DCC.
As the alignment requirements can be 32 KiB or more, also adding
an aligned buffer creation function.

DCC is disabled for textures that can be shared as sharing the
DCC buffers has not been implemented yet.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:01 +02:00
Marek Olšák
edf6a4537c radeonsi: only apply the SNORM blit workaround to *8_SNORM
Like the comment says. This fixes DCC, which doesn't like blitting RG16
as RGBA8.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
e1c098f238 util/format: add helper util_format_is_snorm8
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
06083046a4 radeonsi: add another requirement for PARTIAL_ES_WAVE
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
0d2cb35f68 radeonsi: merge two ifs setting WD_SWITCH_ON_EOP
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
ca18f12dbb radeonsi: make PARTIAL_ES_WAVE globally dependent on SWITCH_ON_EOI
This catches the other cases that enable SWITCH_ON_EOI.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
2070af2fb1 radeonsi: add one more SWITCH_ON_EOI requirement for Hawaii and VI
The VI condition depends on geometry shaders and MAX_PRIMGRP_IN_WAVE.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
a6b5684e99 radeonsi: only apply the instancing bug workaround to Bonaire
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
96d5879d38 radeonsi: add SWITCH_ON_EOI requirement for 4 SE parts
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
7e056f872f radeonsi: remove unnecessary PARTIAL_VS_WAVE setting for streamout
hardware does this automatically

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
3a157e6e68 radeonsi: allow unbinding vertex shaders
Draw calls without a vertex shader are skipped.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
07b3cc6ecf radeonsi: allow unbinding pixel shaders and remove the dummy shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
50bb2decf7 radeonsi: add draw_vbo check for a NULL pixel shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
ed95cb3a31 radeonsi: add checks for a NULL pixel shader
This will allow removing the dummy PS.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
d842d2f251 gallium/util: add a test for NULL fragment shaders
Just to validate that radeonsi doesn't crash.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
dd05824b89 st/mesa: don't load state parameters if there are none
Out of 7063 shaders from my shader-db:
- 6564 (93%) shaders don't have any state parameters.
- 347 (5%) shaders have 1 state parameter for WPOS lowering.
- The remaining 2% have more state parameters, usually matrices.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-24 00:01:20 +02:00
Samuel Li
98546bfd03 radeonsi: add Stoney pci ids
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Samuel Li <samuel.li@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-10-23 17:53:48 -04:00
Samuel Li
bf0d0ce0d5 radeonsi: add support for Stoney asics (v3)
v2 (agd): rebase on mesa master, split pci ids to
separate commit
v3 (agd): use carrizo for llvm processor name for
llvm 3.7 and older

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Samuel Li <samuel.li@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-10-23 17:53:14 -04:00
Ilia Mirkin
e05021ff72 nvc0: respect edgeflag attribute width
The edgeflag comes in as ubyte with glEdgeFlagPointer but as float with
plain immediate glEdgeFlag. Avoid reading bytes that weren't meant for
the edgeflag in the pointer case.

Fixes intermittent failures with gl-2.0-edgeflag piglit (and valgrind
complaints about reading uninitialized memory).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-10-23 16:43:06 -04:00
Jose Fonseca
ea421e919a gallivm: Explicitly disable unsupported CPU features.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92214
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-23 20:25:19 +01:00
Eric Anholt
70b06fb5d5 vc4: Convert blending to being done in 4x8 unorm normally.
We can't do this all the time, because you want blending to be done in
linear space, and sRGB would lose too much precision being done in 4x8.
The win on instructions is pretty huge when you can, though.

total uniforms in shared programs: 32065 -> 32168 (0.32%)
uniforms in affected programs:     327 -> 430 (31.50%)
total instructions in shared programs: 92644 -> 89830 (-3.04%)
instructions in affected programs:     15580 -> 12766 (-18.06%)

Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
2015-10-23 18:11:21 +01:00
Eric Anholt
8e701fda49 vc4: Add QIR/QPU support for the 8-bit vector instructions. 2015-10-23 18:11:21 +01:00
Eric Anholt
817a7eb588 vc4: Don't try to CSE non-SSA instructions.
This can happen when we're doing destination packing -- we don't know
what's in the rest of the register.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-10-23 18:11:21 +01:00
Eric Anholt
5b2fb138bc nir: Add opcodes for saturated vector math.
This corresponds to instructions used on vc4 for its blending inside of
shaders.  I've seen these opcodes on other architectures before, but I
think it's the first time these are needed in Mesa.

v2: Rename to 'u' instead of 'i', since they're all 'u'norm (from review
    by jekstrand)
2015-10-23 18:11:21 +01:00
Eric Anholt
1066a372d8 vc4: Add dumping of VC4_PACKET_GL_INDEXED_PRIMITIVE. 2015-10-23 18:11:21 +01:00
Eric Anholt
7d7fbcdf4e vc4: Add a workaround for HW-2116 (state counter wrap fails).
I haven't proven that this happens (I've got other GPU hangs in the
way), but the closed driver also does this and it's documented as an
errata.
2015-10-23 18:11:21 +01:00
Eric Anholt
73f6104532 vc4: Fix missing \n in a perf_debug(). 2015-10-23 18:11:21 +01:00
Kristian Høgsberg Kristensen
8f60dc83f7 i965/fs: Allow copy propagating into new surface access opcodes
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
0cb7d7b4b7 i965/fs: Optimize ssbo stores
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Write groups of enabled components together.

Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
feff21d1a6 i965/fs: Drop offset_reg temporary in ssbo load
Now that we don't read each component one-by-one, we don't need the
temoprary vgrf for the offset. More importantly, this register was type
UD while the nir source was type D. This broke copy propagation and left
a redundant MOV in the generated code.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
0a5a738252 i965/fs: Avoid scalar destinations in emit_uniformize()
The scalar destination registers break copy propagation. Instead compute
the results to a regular register and then reference a component when we
later use the result as a source.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
a19bf6d3cc i965/fs: Don't uniformize surface index twice
The emit_untyped_read and emit_untyped_write helpers already uniformize
the surface index argument. No need to do it before calling them.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
aedc0aab19 i965/fs: Use unsigned immediate 0 when eliminating SHADER_OPCODE_FIND_LIVE_CHANNEL
The destination for SHADER_OPCODE_FIND_LIVE_CHANNEL is always a UD
register.  When we replace the opcode with a MOV, make sure we use a UD
immediate 0 so copy propagation doesn't bail because of non-matching
types.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
24a3a697e5 i965/fs: Read all components of a SSBO field with one send
Instead of looping through single-component reads, read all components
in one go.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
de5a450bd3 i965: Don't use message headers for untyped reads
We always set the mask to 0xffff, which is what it defaults to when no
header is present. Let's drop the header instead.

v2: Only remove header for untyped reads. Typed reads always need the
    header.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Alejandro Piñeiro
2f1bc1da86 i965/vec4: check opcode on vec4_instruction::reads_flag(channel)
Commit f17b78 added an alternative reads_flag(channel) that returned
if the instruction was reading a specific channel flag. By mistake it
only took into account the predicate, but when the opcode is
VS_OPCODE_UNPACK_FLAGS_SIMD4X2 there isn't any predicate, but the flag
are used.

That mistake caused some regressions on old hw. More information on
this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=92621

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-23 18:11:09 +02:00
Eric Anholt
fb064901e9 vc4: Use Rob's NIR-based user clip lowering. 2015-10-23 14:30:15 +01:00
Eric Anholt
b3797a8f88 vc4: Also dump the decimation mode for resolved stores. 2015-10-23 14:30:15 +01:00
Eric Anholt
7516cbd261 vc4: Use VC4_GET_FIELD and other defines in dumping VC4_RENDER_CONFIG. 2015-10-23 14:30:15 +01:00
Eric Anholt
b0963ce758 vc4: Add a sentinel after simulator buffers for buffer overflow detection.
This is a little bit like the mprotect-based fencing I've experimented
with, but it's simple and low overhead.  The downside is that only catches
writes, not reads.

It didn't catch any bad writes on a current piglit run, but may be useful
in the future.
2015-10-23 14:29:07 +01:00
Samuel Iglesias Gonsalvez
f408a13dd3 glsl: fix shader storage block member rules when adding program resources
Commit f24e5e did not take into account arrays of named shader
storage blocks.

Fixes 20 dEQP-GLES31.functional.ssbo.* tests:

dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.2
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.29
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.33
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.3

V2:
- Rename some variables (Timothy)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-10-23 13:12:43 +02:00
Chia-I Wu
582ecb3b91 ilo: add support for scratch spaces
When a kernel reports a non-zero per-thread scratch space size, make sure the
hardware state is correctly set up, and a scratch bo is allocated.
2015-10-23 17:29:58 +08:00
Chia-I Wu
4a7d18296a ilo: fix scratch space setup in core
Move scratch_size out of ilo_state_shader_kernel_info and
ilo_state_compute_interface_info.  A scratch space is shared by all
kernels/interfaces.  Update builder to emit relocs for scratch bos.
2015-10-23 17:29:58 +08:00
Timothy Arceri
3994ef5f1b glsl: remove excess location qualifier validation
Location has never been able to be a negative value because it has
always been validated in the parser.

Also the linker doesn't check for negatives like the comment claims.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-23 17:05:56 +11:00
Dave Airlie
5145243018 docs: update relnotes to mention virgl driver. 2015-10-23 14:40:07 +10:00
Dave Airlie
b3b82fe8ea virgl/vtest: add vtest driver
virgl/vtest is a swrast driver that allows the
virgl acceleration to be tested without having
a virtual machine.

The backend has a unix socket server that
this connects to.

This is run by setting
LIBGL_ALWAYS_SOFTWARE=y
GALLIUM_DRIVER=virpipe

In this mode all renderering is sent over
a socket to the remote renderer, and the
results are readback and copies to the screen
using drisw. This works well enough to develop
new features and to help debug.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-23 14:40:07 +10:00
Dave Airlie
a8987b88ff virgl: add driver for virtio-gpu 3D (v2)
virgl is the 3D acceleration backend for the
virtio-gpu shipping with qemu.

The 3D acceleration is designed around gallium
and TGSI as the virtualisation layer. The backend
renderer translates the virgl interface into
OpenGL currently.

This is the initial import of the driver to mesa.

The kernel driver portions are lined up for drm-next.

Currently this driver supports up to GL3.3 and some
misc extensions if the host driver exposes it. It is
planned to iterate the virgl API to new GL levels
as mesa host drivers gain features.

v2: fix resource tracking across flushes to avoid
->bind hack in mapping.
consolidate mapping and waiting code for transfers.
use u_range for dirt tracking.
handle larger shaders in protocol.
include virtgpu_drm.h in mesa for now.
add translation layer for gallium tgsi to virgl tgsi.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-23 14:40:07 +10:00
Dave Airlie
531f5d1270 tgsi: try and handle overflowing shaders. (v2)
This is used to detect error in virgl if we overflow the shader
dumping buffers.

v2: return a bool.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-23 11:57:56 +10:00
Dave Airlie
041081dc21 tgsi: add option to dump floats as hex values
This adds support to the parser to accept hex values as floats,
and then adds support to the dumper to allow the user to select
to dump float as 32-bit hex numbers.

This is required to get accurate values for virgl use of TGSI.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-23 11:55:02 +10:00
Sinclair Yeh
231d539239 svga: Condition preemptive flush on draw emission
On ultra high resolution modes, the preemptive flush flag can be
set midway through command submission, a condition that cannot be
recovered from a flush-retry, causing rendering artifacts.

This patch prevents a preemtive_flush until a draw has been
emitted.

Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
99effaa965 svga: try to avoid index generation for some primitive types
The svga device doesn't directly support quads, quad strips or polygons
so we have to convert those types to indexed triangle lists.  But we
can sometimes avoid that if we're drawing flat/constant-colored prims
and we don't have to worry about provoking vertex.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
129d34da49 svga: avoid provoking vertex conversion when possible
Provoking vertex comes into play when doing flat shading.  But if we know
that all fragments in a primitive are the same color, the provoking vertex
doesn't matter.  Check for that case and use whichever provoking vertex
convention is supported by the device.

This avoids generating an index buffer to do the PV conversion.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
1082735bb6 svga: detect constant color writes in fragment shaders
Examine the fragment shader to try to detect TGSI shaders which use
"MOV OUT[0], CONST[i]" to write a constant value for the fragment color.
In this case, all fragments will have the same color (unless blending is
enabled).

This is a common case for OpenGL code such as: glColor(), glBegin(),
glVertex(), ..., glEnd() when lighting/fog/etc are disabled.  In this
case, the Mesa/gallium state tracker actually generates a simple
"MOV OUT[0], CONST[i]" fragment shader.

This will be used by the next commit to avoid provoking vertex conversion
(creating/rewriting an index buffer) when drawing flat-shaded primitives.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
df0f817e31 mesa: check for unchanged line width before error checking
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 17:19:20 -06:00
Brian Paul
990afdc045 st/mesa: use _mesa_RasterPos() when possible
The st_RasterPos() function goes to great pains to implement the
rasterpos transformation.  It basically uses gallium's draw module to
execute the vertex shader to draw a point, then capture that point's
attributes.

But glRasterPos isn't typically used with a vertex shader so we can
usually use the old/fixed-function implementation which is a lot simpler
and faster.

This can add up for legacy apps that make a lot of calls to glRasterPos.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
af0399a1ce tnl: remove t_rasterpos.c
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
234d5320bb drivers/common: use _mesa_RasterPos instead of _tnl_RasterPos
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
614a743767 mesa: copy rasterpos evaluation code into core Mesa
We'll remove it from the tnl module next.  By lifting this code into core
Mesa we can use it from the gallium state tracker.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-22 17:19:20 -06:00
Brian Paul
9919f56099 vbo: optimize vertex copying when 'wrapping'
Instead of calling memcpy() 'n' times, we can do it all at once since
the source and dest regions are all contiguous.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 17:19:20 -06:00
Alex Deucher
7b63658125 radeon/uvd: don't expose HEVC on old UVD hw (v3)
The section for UVD 2 and older was not updated
when HEVC support was added. Reported by Kano
on irc.

v2: integrate the UVD2 and older checks into the
main switch statement.
v3: handle encode checking as well.  Encode is
already checked in the top case statement, so
drop encode checks in the lower case statement.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-10-22 16:22:44 -04:00
Alejandro Piñeiro
8cf84a7e47 i965/vec4: print predicate control at brw_vec4 dump_instruction
v2: externalize pred_ctrl_align16 from brw_disasm.c instead of adding
    a copy on brw_vec4.c, as suggested by Matt Turner

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 21:58:03 +02:00
Alejandro Piñeiro
92ae101ed0 i965/vec4: use an envvar to decide to print the assembly on cmod_propagation tests
The complete way to do this would be parse INTEL_DEBUG and
print the output if DEBUG_VS (or a new one) is present
(see intel_debug.c).

But that seems like an overkill for the unit tests, that
after all, the most common use case is being run when
calling make check.

v2: use the same idea for the fs counterpart too, as suggested by
    Matt Turner

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 21:58:03 +02:00
Alejandro Piñeiro
8fc8fcc04f i965/vec4: Add unit tests for cmod propagation pass
This include the same tests coming from test_fs_cmod_propagation, (non
vector glsl types included) plus some new with vec4 types, inspired on
the regressions found while the optimization was a work in progress.

Additionally, the check of number of instructions after the
optimization was changed from EXPECT_EQ to ASSERT_EQ. This was done to
avoid a crash on failing tests that expected no optimization, as after
checking the number of instructions, there were some checks related to
this last instruction opcode/conditional mod.

v2: update tests after Matt Turner's review of the optimization pass
v3: tweaks on the tests (mostly on the comments), after Matt Turner's
    review

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 21:58:03 +02:00
Alejandro Piñeiro
627f94b72e i965/vec4: adding vec4_cmod_propagation optimization
vec4 port of fs_cmod_propagation.

Shader-db results (no vec4 grepping):
total instructions in shared programs: 6240413 -> 6235841 (-0.07%)
instructions in affected programs:     401933 -> 397361 (-1.14%)
total loops in shared programs:        1979 -> 1979 (0.00%)
helped:                                2265
HURT:                                  0

v2: remove extra space and combine two if blocks, as suggested by
    Matt Turner
v3: add condition check to bail out if current inst and inst being
    scanned has different writemask, as pointed by Matt Turner
v3: updated shader-db numbers
v4: remove block from foreach_inst_in_block_*_starting_from after
    commit 801f151917

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 21:58:03 +02:00
Alejandro Piñeiro
a59359ecd2 i965/vec4: track and use independently each flag channel
vec4_live_variables tracks now each flag channel independently, so
vec4_dead_code_eliminate can update the writemask of null registers,
based on which component are alive at the moment. This would allow
vec4_cmod_propagation to optimize out several movs involving null
registers.

v2: added support to track each flag channel independently at vec4
    live_variables, as v1 assumed that it was already doing it, as
    pointed by Francisco Jerez
v3: general cleaningn after Matt Turner's review

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-22 21:58:03 +02:00
Alejandro Piñeiro
8ac3b525c7 i965/vec4: nir_emit_if doesn't need to predicate based on all the channels
v2: changed comment, as suggested by Matt Turner

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-22 21:58:03 +02:00
Matt Turner
1095d837dc i965/vec4/gs: Fix signed/unsigned comparison warning. 2015-10-22 12:27:04 -07:00
Matt Turner
e2707c8765 i965/fs: Emit a single ADD instruction for SET_SAMPLE_ID on Gen8+.
Gen8+ lifted the register region restriction that an instruction whose
destination spans two registers must have sources that also span two
registers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-22 12:27:00 -07:00
Matt Turner
0f74796e33 i965/fs: Drop unnecessary write-enable-all from SET_SAMPLE_ID.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-22 12:26:57 -07:00
Matt Turner
e2344e11ce i965/fs: Trim unneeded channels in SampleID setup.
The AND and SHR produce a scalar value that we had been replicating
across $dispatch_width channels. The immediate MOV produces only four
useful channels of data.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-22 12:26:54 -07:00
Matt Turner
e10fc055e7 i965/fs: Use type-W for immediate in SampleID setup.
Not a functional difference, but register is loaded with a signed
immediate (V) and added to a signed type (D) producing a signed result
(D).

Also change the type of g0 to allow for compaction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-22 12:26:49 -07:00
Matt Turner
cfb67c3d06 i965/vec4: Initialize LOD to 0.0f for textureQueryLevels() and texture().
We implement textureQueryLevels (which takes no arguments, save the
sampler) using the resinfo message (which takes an argument of LOD).
Without initializing it, we'd generate a MOV from the null register to
load the LOD argument.

Essentially the same logic applies to texture. A vertex shader cannot
compute derivatives and so cannot produce an LOD, so TXL with an LOD of
0.0 is used.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-22 10:16:52 -07:00
Matt Turner
65ffaf2740 i965: Note that the UV immediate type is Gen6+. 2015-10-22 10:16:52 -07:00
Jose Fonseca
718249843b gallivm: Translate all util_cpu_caps bits to LLVM attributes.
This should prevent disparity between features Mesa and LLVM
believe are supported by the CPU.

http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990

Tested on a i7-3720QM w/ LLVM 3.3 and 3.6.

v2: Increase SmallVector initial size as suggested by Gustaw Smolarczyk.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-22 11:11:40 +01:00
Jordan Justen
627c15cde4 i965/fs: Disable CSE optimization for untyped & typed surface reads
An untyped surface read is volatile because it might be affected by a
write.

In the ES31-CTS.compute_shader.resources-max test, two back to back
read/modify/writes of an SSBO variable looked something like this:

  r1 = untyped_surface_read(ssbo_float)
  r2 = r1 + 1
  untyped_surface_write(ssbo_float, r2)
  r3 = untyped_surface_read(ssbo_float)
  r4 = r3 + 1
  untyped_surface_write(ssbo_float, r4)

And after CSE, we had:

  r1 = untyped_surface_read(ssbo_float)
  r2 = r1 + 1
  untyped_surface_write(ssbo_float, r2)
  r4 = r1 + 1
  untyped_surface_write(ssbo_float, r4)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-22 00:36:37 -07:00
Chia-I Wu
13a5805b64 ilo: make sure there is HiZ before resolving
We do not want to perform a depth resolve on an MCS enabled surface.
2015-10-22 14:06:21 +08:00
Chia-I Wu
0b6f6ee50f ilo: fix max thread count for HS on Gen8
It is in DW2 on Gen8.
2015-10-22 14:06:21 +08:00
Ben Widawsky
8eefdacb38 i965: Advertise ARB_shader_stencil_export (gen9+)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 21:14:44 -07:00
Ben Widawsky
1db44252d0 i965: Implement ARB_shader_stencil_export (gen9+)
v2: remove useless source_stencil_to_render_target (Ken)
Squash in the actual packing function, which also got to
  v2:
Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt)
Reorder the regioning to be in VWH order (Matt)
Don't retype src in the backend, just assert instead (Matt)
Rename the debug prints to something better (Matt)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 21:14:44 -07:00
Ben Widawsky
5fa7114652 i965/fs: Enumerate logical fb writes arguments
Gen9 adds the ability to write out a stencil value, so we need to expand the
virtual payload by one. Abstracting this now makes that change easier to read.

I was admittedly confused early on about some of the hardcoding. If people
believe the resulting code is inferior, I am not super attached to the patch.

v2:
Remove explicit numbering from the enumeration (Matt).
Use a real naming scheme, and reference it in the opcode definition (Curro)
Add a missed hardcoded logical position in get_lowered_simd_width (Ben)
Add an assertion to make sure the component numbering is correct (Ben)

Cc: Matt Turner <mattst88@gmail.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 21:14:44 -07:00
Brian Paul
18a631eb90 svga: fix clip plane regression after recent tgsi_scan change
Before the change "tgsi/scan: use properties for clip/cull distance
writemasks", the tgsi_shader_info::num_written_clipdistance field
was a multiple of four, now it's an accurate count.  In the svga
driver, we need a minor change to the loop test.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-21 17:12:19 -06:00
Kenneth Graunke
48c76eae8e i965: Implement gl_InvocationID.
It's stored in bits 31:27 of g1 (along with the URB handles).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:58 -07:00
Kenneth Graunke
c5ae34f38f i965: Implement nir_intrinsic_load_primitive.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:56 -07:00
Kenneth Graunke
b3ebf03b84 i965: Add a fs_visitor constructor that takes a brw_gs_compile.
Unlike the vs/wm structs, brw_gs_compile is actually useful: it contains
the input VUE map and information about the control data headers.
Passing this in allows us to share that code in brw_gs.c, and calculate
them before deciding on vec4 vs. scalar mode, as it's independent of
that choice.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:54 -07:00
Kenneth Graunke
55dfd39b5f i965: Add a brw->scalar_gs flag controlled by INTEL_SCALAR_GS=1.
This patch introduces a brw->scalar_gs flag, similar to brw->scalar_vs,
which controls whether or not to use SIMD8 geometry shaders.

For now, we control it via a new environment variable, INTEL_SCALAR_GS.
This provides a convenient way to try it out.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:53 -07:00
Kenneth Graunke
ac0a33666b i965: Make emit_urb_writes() reserve space for GS header information.
Geometry shaders have additional header data at the beginning of their
output URB entries.  Shaders that use EndPrimitive() or multiple streams
have a control data header; shaders with a dynamic vertex count have an
additional vec4 slot to hold the 32-bit vertex count (and 96 bits of
padding).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:52 -07:00
Kenneth Graunke
cb755996d9 i965: Make emit_urb_writes() only set EOT for the VS.
The GS will emit a bunch of vertices, and we don't want to do an EOT
prematurely.  We'll emit GS_OPCODE_THREAD_END when we want to terminate
the thread.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:50 -07:00
Kenneth Graunke
6ae419b94d i965: Make fs_visitor::emit_urb_writes reusable for scalar GS.
GS doesn't have ClampVertexColor, and we don't want to go through VS
structures.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:49 -07:00
Kenneth Graunke
72d84ae7ce i965: Introduce a brw_vue_prog_data::include_vue_handles flag.
Tessellation shaders and SIMD8 geometry shaders may need to resort to
the pull model for inputs at times.  When set, the state upload code
will tell the hardware to provide URB handles for input data.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:48 -07:00
Kenneth Graunke
ac98888afd i965: Introduce a new SHADER_OPCODE_URB_READ_SIMD8 opcode.
In scalar mode, geometry shader inputs can easily take up hundreds of
registers.  This makes pushing VUE entries impractical; we'll need to
resort to the pull model in some cases.

To support this, we introduce a new opcode corresponding to the "URB
Read SIMD8" message.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:46 -07:00
Kenneth Graunke
bea7522782 i965: Introduce new SHADER_OPCODE_URB_WRITE_SIMD8_MASKED/PER_SLOT opcodes.
In the vec4 backend, we have a vec4_instruction::urb_write_flags field.
There are many kinds of flags for SIMD4x2 messages.

However, there are really only two (per-slot offset, use channel masks)
for SIMD8 messages.  Rather than adding a boolean flag for per-slot
offsets (polluting all instructions), I decided to just make three new
opcodes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-21 14:27:41 -07:00
Jason Ekstrand
0e57694745 i965/gs: Do prog_data setup and other calculations in brw_compile_gs
This commit moves the large pile of setup calculations we have to do for
geometry shaders out of brw_gs_emit and into brw_compile_gs.  This has a
couple of nice implications.  First, it's less work that the caller of
brw_compile_gs has to do.  Second, it's consistent with the vertex and
fragment stages.  Finally, it allows us to put brw_gs_compile back behind
the API boundary where it belongs.

v2 (Jason Ekstrand):
 - Pull the changes to use nir info into a separate patch
 - Put brw_gs_compile into brw_shader.h rather than brw_vec4_gs_visitor.h
   so that we can use it for scalar GS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
f3bc73073a i965/gs: Use NIR info for setting up prog_data
Previously, we were pulling bits from GL data structures in order to set up
the prog_data.  However, in this brave new world of NIR, we want to be
pulling it out of the NIR shader whenever possible.  This way, we can move
all this setup code into brw_compile_gs without depending on the old GL
stuff.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
fac9b21e03 i965/gs: Pull prog_data out of brw_gs_compile
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
6ac2bbec16 i965/gs: Use NIR instead of the brw_geometry_program for GS metadata
With this, we can remove the geometry program from brw_gs_compile.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
72148de217 i965/gs: Move the mem_ctx argument to brw_compile_gs
This makes it better match the other brw_compile_* functions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
8e8b527b27 i965/gs: Set static_vertex_count unconditionally on GEN8+
We always have NIR, so there's no reason for the check.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
2686477d37 nir: Constify nir_gs_count_vertices
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Jason Ekstrand
4eb84a03be nir/info: Add more information about geometry shaders
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 14:20:32 -07:00
Ben Widawsky
3c5d24363a i965: (trivial) rename computes stencil to gen9
All the documentation I can find says that this bit (and functionality) only
exists on SKL+. Since the bit isn't yet used, there is no real impact here.

The original code was added by Ken here (a surprisingly long time ago):
commit f3c6d6f1e1
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Thu Nov 29 21:00:27 2012 -0800

    i965: Update 3DSTATE_PS, 3DSTATE_WM, and add 3DSTATE_PS_EXTRA.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-21 11:00:03 -07:00
Ben Widawsky
c643518452 i965: Correct the comment about fb write payload
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-21 11:00:00 -07:00
Nanley Chery
f1147a238a mesa/glformats: Undo code changes from _mesa_base_tex_format() move
The refactoring commit, c6bf1cd, accidentally reverted cd49b97
and 99b1f47. These changes caused more code to be added to the
function and removed the existing support for ASTC. This patch
reverts those modifications.

v2. Actually include ASTC support again.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92221
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-21 10:36:31 -07:00
Matt Turner
2ce659b5e4 i965: Mark compacted 3-src instructions as Gen8+.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:38 -07:00
Matt Turner
05cc56cca3 i965: Add const to brw_compact_inst_bits.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:38 -07:00
Matt Turner
b29f92daec i965: Add mask_control_ex field and handle it in compaction.
Documentation is sparse, but it appears to have existed on G45 and ILK
as a second bit extension of the mask_control field. Setting the pair of
bits to 0b11 enables "NoCMask".

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:38 -07:00
Matt Turner
3ec9d96d43 i965: Add devinfo->gen assertions for acc_wr_control.
... and for flag_subreg_nr since it's right near by.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:38 -07:00
Matt Turner
d14907b946 i965: Prepare for next commit by adding more whitespace.
We're going to add a field with a longer name that wouldn't align with
the rest.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:38 -07:00
Matt Turner
35f3f06c8a i965: Compact acc_wr_control only on Gen6+.
It only exists on Gen6+, and the next patches will add compaction
support for the (unused) field in the same location on earlier
platforms.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:37 -07:00
Matt Turner
ee868c46e8 i965: Add devinfo parameter to brw_compact_inst_* funcs.
The next commit will add assertions dependent on devinfo->gen.

Use compact()/uncompact() macros where possible, like the 3-src code
does.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:37 -07:00
Matt Turner
4a132349c3 i965/vec4: Don't emit MOVs for unused URB slots.
Otherwise we'd emit a MOV from the null register (which isn't allowed).

Helps 24 programs in shader-db (the geometry shaders in GSCloth):

instructions in affected programs:     302 -> 262 (-13.25%)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-21 10:17:37 -07:00
Nigel Stewart
04703762e5 osmesa: Expose GL entry points for Windows build via DEF file.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92437
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-21 14:06:58 +01:00
Jonathan Gray
99c4079c37 configure.ac: ensure RM is set
GNU make predefines RM to rm -f but this is not required by POSIX
so ensure that RM is set.  This fixes "make clean" on OpenBSD.

v2: use AC_CHECK_PROG

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-21 14:09:38 +01:00
Neil Roberts
ee77796a5c i965/fs: Disable opt_sampler_eot for more message types
In bfdae9149e I disabled the opt_sampler_eot optimisation for TG4
message types because I found by experimentation that it doesn't work.
I wrote in the comment that I couldn't find any documentation for this
problem. However I've now found the documentation and it has
additional restrictions on further message types so this patch updates
the comment and adds the others.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-21 11:08:37 +02:00
Neil Roberts
801f151917 i965: Remove block arg from foreach_inst_in_block_*_starting_from
Since 49374fab5d these macros no longer actually use the block
argument. I think this is worth doing to make the macros easier to use
because they already have really long names and a confusing set of
arguments.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-21 11:07:04 +02:00
Timothy Arceri
38ceeeadaa glsl: check for arrays of arrays when assigning explicit locations
This fixes assigning explicit locations in the CTS test:

ES31-CTS.explicit_uniform_location.uniform-loc-arrays-of-arrays

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-21 15:49:32 +11:00
Timothy Arceri
9a04057ef1 glsl: add is_array_of_arrays() helper
As suggested by Ian Romanick

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-21 15:49:17 +11:00
Kenneth Graunke
156b7d3113 glsl: Fix bad indentation in bit_logic_result_type().
The first level of indentation was using 4 spaces.  Mesa uses 3.

Trivial.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-20 21:25:11 -07:00
Timothy Arceri
fd01840c0b glsl: add AoA support to subroutines
process_parameters() will now be called earlier because we need
actual_parameters processed earlier so we can use it with
match_subroutine_by_name() to get the subroutine variable, we need
to do this inside the recursive function generate_array_index() because
we can't create the ir_dereference_array() until we have gotten to the
outermost array.

For the remainder of the array dimensions the type doesn't matter so we
can just use the existing _mesa_ast_array_index_to_hir() function to
process the ast.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-10-21 14:56:57 +11:00
Tapani Pälli
a59c1adcc6 glsl: fix record type detection in explicit location assign
Check current_var directly instead of using the passed in record_type.

This fixes following failing CTS test:
	ES31-CTS.explicit_uniform_location.uniform-loc-types-structs

No Piglit regressions.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-21 06:12:15 +03:00
Tapani Pälli
1f48ea1193 glsl: do not try to reserve explicit locations for buffer variables
Explicit locations are only used with uniform variables.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-21 06:11:38 +03:00
Tapani Pälli
96bbb3707f glsl: skip buffer variables when filling UniformRemapTable
UniformRemapTable is used only for remapping user specified uniform
locations to driver internally used ones, shader storage buffer
variables should not utilize uniform locations.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-21 06:10:52 +03:00
Brian Paul
f1682fdafa svga: add switch case for PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT
A third instance of this was needed but missed in the previous commit.
Return 32 as for the two other cases.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-20 19:14:51 -06:00
Brian Paul
b48e16fa2f draw: fix splitting of line loops (v2)
When the draw module splits long line loops, the sections are emitted
as line strips.  But the primitive type wasn't set correctly so each
section was being drawn as a loop, introducing extra line segments.

To fix this, we pass a new DRAW_LINE_LOOP_AS_STRIP flag to the run()
function.  The linear/elt_run() functions have to check for this flag
and set their primitive type accordingly.

No piglit regressions.  Fixes piglit's lineloop with -count 4097 or
higher.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-20 19:14:51 -06:00
Anuj Phogat
876d07d837 i965/gen9: Remove temporary variable 'bpp' in tr_mode_..._texture_alignment()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-20 13:26:25 -07:00
Anuj Phogat
06ec19bca4 i965/gen9: Remove temporary variable 'align_yf' in tr_mode_..._texture_alignment()
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-20 13:26:25 -07:00
Anuj Phogat
8f8c450bc7 i965/gen9: Remove parameter 'brw' from tr_mode_..._texture_alignment()
V2: Rebased on master.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-20 13:26:25 -07:00
Anuj Phogat
a5a00bd747 i965/gen9: Reuse YF alignment tables in tr_mode_..._texture_alignment()
Patch just does some refactoring to make the code look better. No
functional changes in here.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-20 13:26:25 -07:00
Brian Paul
f221580937 vbo: convert display list GL_LINE_LOOP prims to GL_LINE_STRIP
When a long GL_LINE_LOOP prim was split across primitives we drew
stray lines.  See previous commit for details.

This patch converts GL_LINE_LOOP prims into GL_LINE_STRIP prims so
that drivers don't have to worry about the _mesa_prim::begin/end flags.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Acked-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
d79595bf02 vbo: fix GL_LINE_LOOP stray line bug
When long GL_LINE_LOOP primitives don't fit in one vertex buffer they
have to be split across buffers.  The code to do this was basically correct
but drivers had to pay special attention to the _mesa_prim::begin,end flags
in order to draw the sections of the line loop properly.  Apparently, the
only drivers to do this were those using the old 'tnl' module for software
vertex processing.

Now we convert the split pieces of GL_LINE_LOOP prims into GL_LINE_STRIP
primitives so that drivers don't have to worry about the special begin/end
flags.  The only time a driver will get a GL_LINE_LOOP prim is when the
whole thing fits in one vertex buffer.

Mostly fixes bug 81174, but not completely.  There's another bug somewhere
in the src/gallium/auxiliary/draw/ code.  If the piglit lineloop test is
run with -count 4096, rendering is correct, but with -count 4097 there are
stray lines.  4096 is a magic number in the draw code (search for "4096").

Also note that this does not fix long line loops in display lists.  The
next patch fixes that.

v2: fix incorrect -1 in vbo_compute_max_verts(), per Charmaine.  Remove
incorrect assertion which was added in vbo_copy_vertices().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49779
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28130

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
03d2f08539 vbo: add new vbo_compute_max_verts() helper function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
002c5c1da3 vbo: simplify some code in vbo_exec_End()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
d916175c4d vbo: simplify some code in vbo_copy_vertices()
As before, use a new 'last_prim' pointer to simplify things.  Plus, add
some const qualifiers.

v2: use 'sz' in another place, per Sinclair.  And update subject line.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
d24c3a680e vbo: simplify some code in vbo_exec_wrap_buffers()
Use a new 'last_prim' pointer to simplify things.

v2: remove unneeded assert(exec->vtx.prim_count > 0)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
1637cec8f8 vbo: replace the comment on vbo_copy_vertices()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
e05ffcf1d9 vbo: make vbo_exec_vtx_wrap() static
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
971b56c643 vbo: remove unneeded ctx parameter for merge_prims()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:41 -06:00
Brian Paul
6cc596c66b tnl: add some comments in render_line_loop code
And remove '(void) flags' line which is not needed.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:40 -06:00
Brian Paul
f7272032be mesa: simple whitespace fix in texstore.c 2015-10-20 12:52:40 -06:00
Brian Paul
f6d4e20d10 vbo: reduce number of vertex buffer mappings for vertex attributes
Whenever we got a glColor, glNormal, glTexCoord, etc. call outside a
glBegin/End pair, we'd immediately map a vertex buffer to begin
accumulating vertex data.  In some cases, such as with display lists,
this led to excessive vertex buffer mapping.  For example, if we have
a display list such as:

glNewList(42, GL_COMPILE);
glBegin(prim);
glVertex2f();
...
glVertex2f();
glEnd();
glEndList();

Then did:

glColor3f();
glCallList(42);

We'd map a vertex buffer as soon as we saw glColor3f but we'd never
actually write anything to it.  Note that the vertex position data
was put into a vertex buffer during display list compilation.

With this change, we delay mapping the vertex buffer until we actually
have a vertex to write to it (triggered by a glVertex() call).  In the
above case, we no longer map a vertex buffer when setting the color and
calling the list.

For drivers such as VMware's, reducing buffer mappings gives improved
performance.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-20 12:52:40 -06:00
Brian Paul
d11fefa961 st/mesa: optimize 4-component ubyte glDrawPixels
If we didn't find a gallium surface format that exactly matched the
glDrawPixels format/type combination, we used some other 32-bit packed
RGBA format and swizzled the whole image in the mesa texstore/format code.

That slow path can be avoided in some common cases by using the
pipe_samper_view's swizzle terms to do the swizzling at texture sampling
time instead.

For now, only GL_RGBA/ubyte and GL_BGRA/ubyte combinations are supported.
In the future other formats and types like GL_UNSIGNED_INT_8_8_8_8 could
be added.

v2: fix incorrect swizzle setup (need to invert the tex format's swizzle)

Reviewed by: Jose Fonseca <jfonseca@vmware.com>
2015-10-20 12:52:40 -06:00
Brian Paul
cf405922eb mesa: make memcpy_texture() non-static
So that we can use it directly from the mesa/gallium state tracker.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-20 12:52:40 -06:00
Brian Paul
31ae52acce st/mesa: check for out-of-memory in st_DrawPixels()
Before, if make_texture() or st_create_texture_sampler_view() failed
we silently no-op'd the glDrawPixels.  Now, set GL_OUT_OF_MEMORY.
This also allows us to un-nest a bunch of code.

v2: also check if allocation of sv[1] fails, per Jose.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-20 12:52:40 -06:00
Brian Paul
c5de38abc9 st/mesa: use MAX3() instead of MAX2(MAX2) in draw_textured_quad()
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-20 12:52:40 -06:00
Brian Paul
e24d04e436 mesa: fix incorrect opcode in save_BlendFunci()
Fixes assertion failure with new piglit
arb_draw_buffers_blend-state_set_get test.

Cc: mesa-stable@lists.freedesktop.org

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-20 12:52:40 -06:00
Brian Paul
b1f8ef5ae3 mesa: add more cases to print_list() in dlist.c
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-20 12:52:40 -06:00
Emil Velikov
6994d8ec01 i965: silence incompatible pointer type warning
src/mesa/drivers/dri/i965/brw_program.c:94:39:
warning: passing argument 1 of ‘_mesa_init_gl_program’ from incompatible
pointer type [-Wincompatible-pointer-types]
          return _mesa_init_gl_program(&prog->program, target, id);

                                       ^

Runtime was unaffected as brw_geometry_program is subclassed from
gl_geometry_program, thus the address passed was the same.

Fixes: bcb56c2c69 (program: convert _mesa_init_gl_program() to take
struct gl_program *)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-20 18:37:22 +01:00
Marek Olšák
814f31457e gallium: add PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT
This avoids a serious r600g bug leading to a GPU hang.
The chances this bug will get fixed are pretty low now.

I deeply regret listening to others and not pushing this patch, leaving
other users with a GPU-crashing driver. Yes, it should be fixed
in the compiler and it's ugly, but users couldn't care less about that.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86720

Cc: 11.0 10.6 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-20 18:27:11 +02:00
Eric Anholt
921feb8782 vc4: Switch our vertex attr lowering to being NIR-based.
This exposes more information to NIR's optimization, and should be
particularly useful when we do range-based optimization.

total uniforms in shared programs: 32066 -> 32065 (-0.00%)
uniforms in affected programs:     21 -> 20 (-4.76%)
total instructions in shared programs: 93104 -> 92630 (-0.51%)
instructions in affected programs:     31901 -> 31427 (-1.49%)
2015-10-20 12:47:27 +01:00
Eric Anholt
85b946478c vc4: Add limited support for ibfe/ubfe.
This is just enough to cover our unpack modes, which will be used by some
new NIR-based lowering in the next commit.
2015-10-20 12:47:27 +01:00
Marek Olšák
8910ebd8e8 tgsi/scan: use properties for clip/cull distance writemasks
No changes needed for drivers already relying on tgsi_shader_info.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-20 12:58:25 +02:00
Marek Olšák
7c75f23cb9 st/mesa: pass the clip distance array size to drivers
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-20 12:58:25 +02:00
Marek Olšák
e70c66197e gallium: add new properties for clip and cull distance usage
The TGSI usage mask can't be used, because these are declared as an output
array of 2 elements.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-20 12:58:25 +02:00
Marek Olšák
67f489ded3 mesa: replace UsesClipDistance with ClipDistanceArraySize
This is more practical and needed by gallium.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-20 12:58:25 +02:00
Marek Olšák
8339585b12 radeonsi: enable BC_OPTIMIZE if centroid isn't used
This solution was recommended by a Catalyst developer.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:56:46 +02:00
Marek Olšák
38391835b5 radeonsi: fix the export_prim_id field size in the shader key
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:56:40 +02:00
Marek Olšák
9b54ce3362 radeonsi: support thread-safe shaders shared by multiple contexts
The "current" shader pointer is moved from the CSO to the context, so that
the CSO is mostly immutable.

The only drawback is that the "current" pointer isn't saved when unbinding
a shader and it must be looked up when the shader is bound again.

This is also a prerequisite for multithreaded shader compilation.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:51:51 +02:00
Marek Olšák
e57dd7a08b st/mesa: create shaders which have only one variant immediatelly (v2)
v2: fix the condition when lacking sample shading

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-20 12:51:51 +02:00
Marek Olšák
b99645f819 st/mesa: negate the can_force_persample_interp flag
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-20 12:51:51 +02:00
Marek Olšák
f4e938e9ae st/mesa: decouple shaders from contexts if they are shareable
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-20 12:51:51 +02:00
Marek Olšák
d74e7b6fb9 gallium: add PIPE_CAP_SHAREABLE_SHADERS
I'll let drivers figure out how to do it.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-20 12:51:51 +02:00
Marek Olšák
12321966ae radeonsi: add support for ARB_texture_view
All tests pass. We don't need to do much - just set CUBE if the view
target is CUBE or CUBE_ARRAY, otherwise set the resource target.

The reason this can be so simple is that texture instructions
have a greater effect on the target than the sampler view.

Thanks Glenn for the piglit test.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-20 12:25:19 +02:00
Boyan Ding
6bd9e03512 vc4: Use nir_foreach_variable
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-20 09:54:53 +01:00
Timothy Arceri
2832ca95ec glsl: fix stream qualifier for blocks with an instance name
This also removes the validation from the parser as it is not required
and once arb_enhanced_layouts comes along we wont be able to do validation
on the stream qualifier in the parser anyway as it adds constant expression
support to the stream qualifier.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
2015-10-20 11:58:28 +11:00
Timothy Arceri
aa9f06b3ea glsl: fix regression when building interface field name for SSBOs
Fixes regression cased by bb5aeb8549

We don't care about the swizzle when building the name so just skip over it.

Tested-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-20 11:54:09 +11:00
Leo Liu
867284a8f0 st/omx/dec/h264: fix field picture type 0 poc disorder
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-19 20:43:03 -04:00
Anuj Phogat
2eed9e6b75 i965/gen9: Handle the GL_TEXTURE_{1D, 1D_ARRAY} targets inside switch
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 13:43:44 -07:00
Matt Turner
de862f03ac i965/fs: Localize variables' scopes. 2015-10-19 10:19:32 -07:00
Matt Turner
35a2d259f2 i965/fs: Consider type mismatches in saturate propagation.
NIR considers bcsel to produce and consume unsigned types, leading to
SEL instructions operating on unsigned types when the data is really
floating-point. Previous to this patch, saturate propagation would
happily transform

   (+f0) sel      g20:UD, g30:UD, g40:UD
         mov.sat  g50:F,  g20:F

into

   (+f0) sel.sat  g20:UD, g30:UD, g40:UD
         mov      g50:F,  g20:F

But since the meaning of .sat is dependent on the type of the
destination register, this is not valid.

Instead, allow saturate propagation to change the types of dest/source
on instructions that are simply copying data in order to propagate the
saturate modifier.

Fixes bad code gen in 158 programs.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-19 10:19:32 -07:00
Matt Turner
9e17c36b8b i965: Extract can_change_source_types() functions.
Make them members of fs_inst/vec4_instruction for use elsewhere.

Also fix the fs version to check that dst.type == src[1].type and for
!saturate.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-19 10:19:32 -07:00
Jason Ekstrand
41c474df53 i965/vs: Move URB entry_size and read_length calculations to compile_vs
Reviewed-By: Eduardo Lima Mitev <elima@igalia.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
6980372010 i965: Move the entire compiler API into a single file
At this point, the compiler API has been substantially simplified.  In the
spirit of Kristian's making a compiler library, this commit makes a single
header file that contains, more-or-less, the entire compiler API.

There's still a bit of cleanup to do particularly in the area of geometry
shaders.  However, this gets us much closer to having a separate compiler.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
4467344c82 i965: Rename brw_foo_emit to brw_compile_foo
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
67db9072b9 i965/fs: Move some of the prog_data setup into brw_wm_emit
This commit moves the common/modern stuff.  Some legacy stuff such as
setting use_alt_mode was left because it needs to know whether or not we're
an ARB program.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
4e711872d0 i965/cs: Rework cs_emit to take a nir_shader and a brw_compiler
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
657863bb5c i965/gs: Rework gs_emit to take a nir_shader and a brw_compiler
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.  Unfortunately, we still
have to pass in the gl_shader_program for gen6 because it's needed for
transform feedback.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
5d8bf6de61 i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.

v2 (Jason Ekstrand):
   - Patch use_legacy_snorm_formula through as a function argument rather
     than trying to go through the shader key.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
22ad44910e i965/fs: Rework wm_fs_emit to take a nir_shader and a brw_compiler
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
0ca401327e i965: Use a const nir_shader in backend_shader
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
8f1d968704 i965/vec4: Remove gl_program and gl_shader_program from the generator
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
5e86f5b3d2 i965/fs: Remove the gl_program from the generator
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
688d2e4585 nir/info: Add a few bits of info for fragment shaders
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
4889c73dd1 nir/info: Add compute shader local size to nir_shader_info
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
fe399f3a69 nir/info: Move the GS info into a stage-specific info union
This way we can have other stage-specific info without consuming too much
extra space.  While we're at it, we make sure that the geometry info is
only set if we're actually a goemetry shader.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
16619477bc mesa: Move gl_frag_depth_layout from mtypes.h to shader_enums.h
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:47:03 -07:00
Jason Ekstrand
5d4bc5ec13 nir: Add a label to nir_shader_info
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:45:14 -07:00
Jason Ekstrand
e00314bc57 i965/asm: Explicitly use a nir_instr for IR annotations
Now that everything goes through NIR, we don't need this to be a void
pointer anymore.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-19 08:45:14 -07:00
Jose Fonseca
b23a4859f4 scons: Build nir/glsl_types.cpp once.
Undoes early hacks, and ensures nir/glsl_types.cpp is built once, and
only once.

The root problem is that SCons doesn't know about NIR nor any source
file in the NIR_FILES source list.

Tested with libgl-gdi and libgl-xlib scons targets.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-19 15:59:59 +01:00
Brian Paul
530eb39c71 svga: fix incorrect round-down arithmetic
Spotted by Roland.  Luckily, this code should never really be hit
since the const buffer size and offset should already be multiples
of 16.  I could probably add more assertions to that effect, but
let's just fix the arithmetic for now.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-19 08:54:42 -06:00
Samuel Iglesias Gonsalvez
6f3954618b glsl: fix segfault when indirect indexing a buffer variable which is an array
Fixes a regression added by bb5aeb8549.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-19 11:16:50 +02:00
Indrajit Das
b0a44f1017 st/va: Added support for NV12 to IYUV conversion in vlVaGetImage
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-19 09:47:33 +02:00
Indrajit Das
381c17d695 st/va: Used correct parameter to derive the value of the "h" variable in vlVaCreateImage
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-19 09:47:24 +02:00
Iago Toral Quiroga
36c93e9659 glsl_to_tgsi: Use {Num}UniformBlocks instead of {Num}BufferInterfaceBlocks
The latter holds both UBOs and SSBOs, but here we only want UBOs.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-19 08:20:40 +02:00
Iago Toral Quiroga
5a9ff87d0f st/mesa: Use {Num}UniformBlocks instead of {Num}BufferInterfaceBlocks
The latter holds both UBOs and SSBOs, but here we only want UBOs.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-19 08:20:40 +02:00
Iago Toral Quiroga
55403665b6 i965: Do not use NumBufferInterfaceBlocks
This is the only place in the driver where we use this. Since we now work
with separate index spaces, always use NumUniformBlocks and
NumShaderStorageBlocks instead of NumBufferInterfaceBlocks to be more
consistent with the rest of the code.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-19 08:20:40 +02:00
Iago Toral Quiroga
14c3db7bc5 main: GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH is about UBOS, not SSBOs
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-19 08:20:40 +02:00
Iago Toral Quiroga
fba582efc7 main: Use NumUniformBlocks to count UBOs
Now that we have separate index spaces for UBOs and SSBOs we do not need
to iterate through BufferInterfaceBlocks any more, we can just take the
UBO count directly from NumUniformBlocks.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-19 08:20:40 +02:00
Chia-I Wu
86ccb2a16f ilo: set VME for 3DSTATE_PS
When the bit is not set, we can see sampling artifacts on triangle edges when
the mip filter is not GEN6_MIPFILTER_NONE.
2015-10-18 21:35:16 +08:00
Chia-I Wu
d04126a773 ilo: ignore prefer_linear_threshold when zero
This was the intended behavior but it did not work as intended until now.
2015-10-18 21:04:52 +08:00
Chia-I Wu
a445e0f7ef ilo: remove some unused kernel params 2015-10-18 21:04:52 +08:00
Chia-I Wu
6e132f4730 ilo: remove unused ilo_shader_get_type() 2015-10-18 21:04:52 +08:00
Chia-I Wu
29a0f7479d ilo: remove u_debug.h inclusion from ilo_core.h
Move it to ilo_debug.h.
2015-10-18 21:04:52 +08:00
Chia-I Wu
3fe568e2a4 ilo: remove u_memory.h inclusion from ilo_core.h
We do not make allocations generally in the core.
2015-10-18 21:04:52 +08:00
Samuel Pitoiset
fc5ae0c13f nvc0: do not bind input params at compute state init on Fermi
It looks like binding a constant buffer on compute overwrites the 3D
state. To avoid that, we already re-bind all the 3D constant buffers
after launching a compute grid but this is not enough.

Binding the constant buffer of input parameters for the compute state at
initialization corrupts the 3D constant buffers, and it's just useless
to bind it because this is not needed until we really launch a grid.

This fixes some piglit regressions related to interpolation tests
introduced in "nvc0: enable compute support by default on Fermi".

Fixes: 00d6186 (nvc0: enable compute support by default on Fermi)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-18 14:25:05 +02:00
Kenneth Graunke
ca2b807ca3 i965/vs: Drop hack that created NIR for fixed function vertex programs.
Marek made core Mesa call ProgramStringNotify(), which solves this
properly.  The hack is no longer needed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-17 17:26:11 -07:00
Kenneth Graunke
dbac0a6352 i965/nir: Switch on shader stage in nir_lower_outputs().
VS, GS, and FS continue doing the same thing they did before.  We can
simplify the FS code a bit because it is always scalar.

Compute shaders now assert that there are no outputs instead of doing
a loop over 0 outputs.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-17 17:26:11 -07:00
Marek Olšák
7c10af6425 radeonsi: don't use the AMDGPU intrinsic for CMP
No difference according to shader-db.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
f2cdb68c8b radeonsi: use LRP from gallivm
Totals:
SGPRS: 344552 -> 344368 (-0.05 %)
VGPRS: 197132 -> 197552 (0.21 %)
Code Size: 7375376 -> 7366304 (-0.12 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1679360 -> 1615872 (-3.78 %) bytes per wave

Totals from affected shaders:
SGPRS: 47736 -> 47552 (-0.39 %)
VGPRS: 27952 -> 28372 (1.50 %)
Code Size: 1392724 -> 1383652 (-0.65 %) bytes
LDS: 39 -> 39 (0.00 %) blocks
Scratch: 513024 -> 449536 (-12.38 %) bytes per wave

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
eb11efc989 radeonsi: don't emit AMDGPU intrinsics for integer abs, min, max
No difference according to shader-db. (with the new S_ABS_I32 pattern)

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
d72a26ec5d radeonsi: don't emit AMDGPU intrinsics for EX2, ROUND, TRUNC
No difference according to shader-db.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:04 +02:00
Marek Olšák
6660ca7121 radeonsi: initialize output, temp, and address registers to "undef"
This removes "v_mov v0, 0" which typically occurs before exports.

Totals:
SGPRS: 345216 -> 344552 (-0.19 %)
VGPRS: 197684 -> 197132 (-0.28 %)
Code Size: 7390408 -> 7375376 (-0.20 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1842176 -> 1679360 (-8.84 %) bytes per wave

Totals from affected shaders:
SGPRS: 101336 -> 100672 (-0.66 %)
VGPRS: 53920 -> 53368 (-1.02 %)
Code Size: 2170176 -> 2155144 (-0.69 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 1015808 -> 852992 (-16.03 %) bytes per wave

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
529c5e7740 gallivm: implement the correct version of LRP
The previous version has precision issues. This can be a problem
with tessellation. Sadly, I can't find the article where I read it
anymore. I'm not sure if the unsafe-fp-math flag would be enough to revert
this.

v2: added the comment
2015-10-17 21:40:03 +02:00
Marek Olšák
a2197cac7f gallivm: set correct opcode info from unary/binary/ternary emits
and clear the emit_data structure.

The new radeonsi min/max opcode implementation requires this.

(it looks good according to Roland S.)
2015-10-17 21:40:03 +02:00
Marek Olšák
5bc871a4ca radeonsi: implement vertex color clamping
This is only supported in the compatibility profile (without GS and tess).

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
208d1ed38d radeonsi: implement fragment color clamping
using the shader key for now.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
acc6a07874 radeonsi: clean up other scratch buffer functions
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
9098d7e9bd radeonsi: clean up copy-pasted scratch buffer updates
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
938a1bee34 radeonsi: unify shader create functions
The shader specifies the processor type, so use that instead.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
b0167809f1 radeonsi: unify shader delete functions
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
aa060e276c radeonsi: fix a GS copy shader leak
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
c4f086f399 radeonsi: remove an unused ctx parameter in si_shader_destroy
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
4f4f477d6d radeonsi: print export_prim_id from the shader key
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
b11edf8872 radeonsi: disable NaNs for LS and HS
They're disabled for all other shaders except compute, but I forgot
to do this for tess stages.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
73e3fba335 radeonsi: clean up si_llvm_init_export_args
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Marek Olšák
82335978bb tgsi: move pipe_shader_from_tgsi_processor function to util
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-17 21:40:03 +02:00
Brian Paul
8c5647db5e mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()
Changing the matrix mode alone has no effect on rendering and does
not need to trigger a flush or state validation.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-17 19:36:46 +02:00
Marek Olšák
3c6156a4a7 st/mesa: fix clip state dependencies
This allows removing FLUSH_VERTICES in MatrixMode.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-17 19:36:44 +02:00
Marek Olšák
006fcc0da6 gallium/hud: fix possible NULL pointer dereference
Trivial.
2015-10-17 19:06:27 +02:00
Brian Paul
3272f632ee scons: fix MSVC, MinGW build
Duplicate the glsl_types_hack.cpp work-around from the libgl-xlib target.
2015-10-17 10:06:49 -06:00
Rob Clark
7e6aafd6ab build: fix make-check after a6a6a71
commit a6a6a71092
   Author:     Rob Clark <robclark@freedesktop.org>
   AuthorDate: Sat Oct 10 14:13:50 2015 -0400

       glsl: (mostly) remove libglsl_util

Was a bit too ambitious on removal of libglsl_util.. it is still needed
by some of the tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-17 09:51:29 -04:00
Rob Clark
b7963b6926 build: fix out-of-tree build after b9b40ef
commit b9b40ef9b7
   Author:     Rob Clark <robclark@freedesktop.org>
   AuthorDate: Sat Oct 10 13:55:07 2015 -0400

       nir: remove dependency on glsl

broke things for i965 out of tree build.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-17 09:51:29 -04:00
Samuel Pitoiset
c188235d1b nvc0: add support for performance monitoring metrics on Fermi
As explained in the CUDA toolkit documentation, "a metric is a
characteristic of an application that is calculated from one or more
event values."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-17 10:50:00 +02:00
Rob Clark
a6a6a71092 glsl: (mostly) remove libglsl_util
Now that NIR does not depend on glsl, we can (mostly[*]) get rid of the
libglsl_util hack.

[*] glsl_compiler is the one remaining user of libglsl_util

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-16 19:33:38 -04:00
Rob Clark
b9b40ef9b7 nir: remove dependency on glsl
Move glsl_types into NIR, now that the dependency on glsl_symbol_table
has been split out.

Possibly makes sense to rename things at this point, but if we do that
I'd like to keep it split out into a separate patch to make git history
easier to follow (IMHO).

v2: fix android build
v3: I f***ing hate scons.. but at least it builds

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-16 19:33:38 -04:00
Rob Clark
183db3a645 glsl: move half<->float convertion to util
Needed in NIR too, so move out of mesa/main/imports.c

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-16 19:33:37 -04:00
Rob Clark
60690cb3b3 glsl: move builtin vector types to glsl_types.cpp
First step at untangling NIR's dependency on glsl_types without bringing
in the dependency on glsl_symbol_table.  The builtin types are now in
glsl_types (which will end up in NIR), but adding them to the symbol-
table stays in builtin_types.cpp (which will not be part of NIR).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-16 19:33:37 -04:00
Rob Clark
33de998230 glsl: couple shader_enums cleanups
Add missing enum to gl_system_value_name() and move VARYING_SLOT_MAX /
FRAG_RESULT_MAX / etc into shader_enums.h as suggested by Emil.

v2: add STATIC_ASSERT()'s

Reported-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-16 19:33:37 -04:00
Timothy Arceri
698cdbf492 glsl: initialise record array count to 1
This was only being done in one of the two process methods.

Fixes an issue with samplers using the array size of a previous record.

Tested-by: Marek Olšák <marek.olsak@amd.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
2015-10-17 08:50:40 +11:00
Timothy Arceri
3c87377d0b nir: add atomic lowering support for AoA
Cc: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-17 08:43:21 +11:00
Timothy Arceri
2e1798f183 nir: wrapper for glsl_type arrays_of_arrays_size()
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-17 08:43:15 +11:00
Ilia Mirkin
fd5e0581dd configure: show which gallium drivers/sts are built
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-16 17:18:43 -04:00
Brian Paul
2023906667 tgsi: initialize ctx.file in tgsi_dump_instruction()
Fixes segfault because of uninitialized file pointer.
Trivial.
2015-10-16 14:32:09 -06:00
Samuel Pitoiset
a3b1757551 nvc0: add a note about MP counters on GF100/GF110
MP counters on GF100/GF110 (compute capability 2.0) are buggy
because there is a context-switch problem that we need to fix.
Results might be wrong sometimes, be careful!

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
0461260d77 nvc0: add MP counters variants for GF100/GF110
GF100 and GF110 chipsets are compute capability 2.0, while the other
Fermi chipsets are compute capability 2.1. That's why, some MP counters
are different between these chipsets and we need to handle variants.

Signed-off-by: Samuel Pitoiet <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
ec5001d25b nvc0: move SW/HW queries info to their respective files
This will help for handling HW SM queries variants on Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
00d61869a5 nvc0: enable compute support by default on Fermi
Compute support was not enabled by default because weird effects
on 3D state happened, but I can't reproduce them anymore.

This also enables MP performance counters by default on Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
8cd4b8478a nvc0: allow only one active query for the MP counters group
Because we can't expose the number of hardware counters needed for each
different query, we don't want to allow more than one active query
simultaneously to avoid failure when the maximum number of counters
is reached. Note that these groups of GPU counters are currently only
used by AMD_performance_monitor.

Like for Kepler, this limits the maximum number of active queries
to 1 on Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
cef22f3490 nvc0: read MP counters of all GPCs on Fermi
When a card has more than one GPC, the grid used by the compute
kernel which reads MP performance counters seems to be too small.
The consequence is that the kernel is not launched on all TPCs.

Increasing the grid size using the number of GPCs now launches
enough blocks and we can read MP performance counters of all TPCs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
1825898e04 nvc0: store the number of GPCs to nvc0_screen
NOUVEAU_GETPARAM_GRAPH_UNITS param returns the number of GPCs, the total
number of TPCs and the number of ROP units. Note that when the DRM
version is too old the default number of GPCs is fixed to 4.

This will be used to launch the compute kernel which is used to read MP
performance counters over all GPCs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
c4896c99cb nvc0: fix unaligned mem access when reading MP counters on Fermi
Memory access have to be aligned to 128-bits. Note that this
doesn't happen when the card only has TPC.

This patch fixes the following dmesg fail:

gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 000f
[UNALIGNED_MEM_ACCESS]

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
7abd707251 nvc0: fix monitoring multiple MP counters queries on Fermi
For strange reasons, the signal id depends on the slot selected on Fermi
but not on Kepler. Fortunately, the signal ids are just offseted by the
slot id!

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
4fcb661711 nvc0: fix queries which use multiple MP counters on Fermi
Queries which use more than one MP counters was misconfigured and
computing the final result was also wrong because sources need to
be configured on different hardware counters instead.

According to the blob, computing the result is now as follows:

FOR  i..n
val += ctr[i] * pow(2, i)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
6353f620cd nvc0: allow to use 8 MP counters on Fermi
On Fermi, we have one domain of 8 MP counters while we have
two domains of 4 MP counters on Kepler.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
cac897197b nvc0: fix sequence field init for MP counters on Fermi
Sequence fields are located at MP[i] + 0x20 in the buffer object.
This is used to check if result is available for MP[i].

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
409658c367 nvc0: correctly enable the MP counters' multiplexer on Fermi
Writing 0x408000 to 0x419e00 (like on Kepler) has no effect on Fermi
because we only have one domain of 8 counters. Instead, we have to
write 0x80000000.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
c3570c3fb9 nvc0: rip off the kepler MP-enabling logic from the Fermi codepath
Writing 0x1fcb to 0x419eac is definitely not related to MP counters and
has no effect on Fermi (although this enables MP counters on Kepler).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
dab7e0ed09 nvc0: split out begin_query() hook used by MP counters
The way we configure MP performance counters is going to pretty
different between Fermi and Kepler. Having two separate functions
is much better.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Samuel Pitoiset
d4ecc2bce4 nvc0: remove useless call to query_get_cfg() in nvc0_hw_sm_query_end()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-16 21:57:44 +02:00
Brian Paul
efe37519b0 svga: only count hardware buffer mappings for HUD
Don't count client memory buffer mappings since they're basically free.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-16 11:44:00 -06:00
Neha Bhende
9bc7e3105a svga: add new GALLIUM_HUD queries
Add new GALLIUM_HUD queries for:
    num-shaders
    num-resources
    num-state-objects
    num-validations
    map-buffer-time
    num-surface-views
    num-resources-mapped
    num-flushes

Most of this patch was originally written by Neha.  Additional clean-ups
and num-flushes counter added by Brian Paul.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-16 11:43:28 -06:00
Brian Paul
f413f1a17c svga: use new svga_new_shader_variant() function
To simplify upcoming new HUD shader count implementation.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-16 11:43:28 -06:00
Brian Paul
8d0d5dca5b svga: pass context to svga_tgsi_vgpu9_translate()
Will be used for upcoming change.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-16 11:43:28 -06:00
Brian Paul
615b37a0e2 svga: remove svga_tgsi_vgpu9_translate() call in GS path
We can never have geometry shaders with vgpu9.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-16 11:43:28 -06:00
Brian Paul
cb473c46fe glsl: silence warning about unhandled ast_unsized_array_dim case in switch
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-10-16 11:34:05 -06:00
Brian Paul
afff809fea st/mesa: fix incorrect pointer type arguments in st_new_program()
Silences 5 warnings of the type:
state_tracker/st_cb_program.c: In function 'st_new_program':
state_tracker/st_cb_program.c:108:7: warning: passing argument 1 of
'_mesa_init_gl_program' from incompatible pointer type [enabled by default]
       return _mesa_init_gl_program(&prog->Base, target, id);
       ^
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-16 09:15:05 -06:00
Brian Paul
4627e8058e Revert "mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()"
This reverts commit 0de5e0f3fb.

Michel Dänzer spotted two piglit regressions from the change.  I suspect
that removing the FLUSH_VERTICES() actually exposed a bug elsewhere but
I don't have time to hunt down the root issue at this time.
2015-10-16 09:10:22 -06:00
Samuel Iglesias Gonsalvez
ccbb52ac11 glsl: fix check SSBOs support for builtin functions
has_shader_storage_buffer_objects() returns true also if the OpenGL
context is 4.30 or ES 3.1.

Previously, we were saying that all atomic*() GLSL builtin functions
for SSBOs were not available when OpenGL ES 3.1 context was in use.

Fixes 48 dEQP-GLES31 tests:

dEQP-GLES31.functional.ssbo.atomic.*

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-16 12:47:58 +02:00
Tapani Pälli
dc8c221e28 mesa: Set api prefix to version string when overriding version
Otherwise there are problems when user overrides version and application
such as Piglit wants to detect used api with glGetString(GL_VERSION).

This makes it currently impossible to run glslparsertest tests for
OpenGL ES when using version override.

Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1.

Before:
	"3.1 Mesa 11.1.0-devel (git-24a1a15)"

After:
	"OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)"

v2: only include api prefix for OpenGL ES (Boyan Ding)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-10-16 12:58:52 +03:00
Iago Toral Quiroga
c8f5274b52 nir: Get the number of SSBOs and UBOs right
Before d31f98a272 and 56e2bdbca3 we had a sigle index space for UBOs
and SSBOs, so NumBufferInterfaceBlocks would contain the combined number of
blocks, not just one kind. This means that for shader programs using both
UBOs and SSBOs, we were setting num_ssbos and num_ubos to a larger number than
we should. Since the above commits  we have separate index spaces for each
so we can just get the right numbers.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-16 10:12:44 +02:00
Iago Toral Quiroga
f534f331ca i965/vec4: Use the right number of UBOs
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-16 10:12:44 +02:00
Iago Toral Quiroga
6f9ca30266 i965/fs: use the right number of UBOs
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-16 10:12:44 +02:00
Rob Clark
ef7a563829 freedreno: add debug option to dirty state after draw
Similar to "dclear", "ddraw" will mark all state dirty after each draw.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-15 18:04:18 -04:00
Rob Clark
6206da736c freedreno/a3xx: cache-flush is needed after MEM_WRITE
Otherwise the mem2gmem blit would see potentially bogus texture
coordinates.  Fixes an issue that shows up with glamor.

CC: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-10-15 18:04:17 -04:00
Rob Clark
fefffdc2b2 gallium/util: fix debug_get_flags_option on 32-bit harder
(yes, we want PRI?64, but we want the x version rather than the u
version)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-15 18:04:17 -04:00
Chih-Wei Huang
7599f8b167 nv30: include the header of ffs prototype
It fixes a building error of the android 6.0 64-bit target.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-10-15 15:00:14 -04:00
Chih-Wei Huang
d31005e3e5 nv50/ir: use C++11 standard std::unordered_map if possible
Note Android version before Lollipop is not supported.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-10-15 14:59:59 -04:00
Jason Ekstrand
5f106153f5 nir/prog: Don't double-insert the fog-coord variable
nir_variable_create already inserts it in the right list for us so
inserting it again causes a linked list corruption.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-15 10:48:21 -07:00
Jason Ekstrand
b705005584 nir/glsl: Use shader_prog->Name for naming the NIR shader
This has the better name to use. Aparently, sh->Name is usually 0.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
2015-10-15 07:31:09 -07:00
Jason Ekstrand
eb893c220c nir: Add helpers for creating variables and adding them to lists
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-15 07:31:09 -07:00
Jason Ekstrand
635daef76e nir/prog: Use nir_foreach_variable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-15 07:31:09 -07:00
Brian Paul
5d954fd5cb mesa: wrap a ridiculously long line in es1_conversion.c
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:08 -06:00
Brian Paul
d8c23d156d mesa: add num_buffers() helper in blend.c
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:08 -06:00
Brian Paul
dfbd62e772 mesa: optimize _UsesDualSrc blend flag setting
For glBlendFunc and glBlendFuncSeparate(), the _UsesDualSrc flag
will be the same for all buffers, so no need to compute it N times.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:08 -06:00
Brian Paul
d21e17f48f mesa: fix incorrect error string in _mesa_BlendEquationiARB()
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
1d75165501 mesa: move validate_blend_factors() call after no-change check
A redundant call to glBlendFuncSeparateiARB() is more likely than getting
invalid values, so do the no-op check first.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
34de3c4c16 mesa: optimize no-change check in _mesa_BlendEquationSeparate()
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
2dfedf105d mesa: optimize no-change check in _mesa_BlendEquation()
Same story as preceeding change to _mesa_BlendFuncSeparate().

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
6fd29e6c31 mesa: optimize no-change check in _mesa_BlendFuncSeparate()
Streamline the checking for no state change in _mesa_BlendFuncSeparate()
(and _mesa_BlendFunc()).  If _BlendFuncPerBuffer is false, we only need
to check the 0th buffer state.  Move argument validation after the no-op
check.

I'm looking at an app that issues about 1000 redundant glBlendFunc()
calls per frame!

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
083b3f5cb4 mesa: short-cut new_state == _NEW_LINE in _mesa_update_state_locked()
We can skip to the end of _mesa_update_state_locked() if only the
_NEW_LINE flag is set since none of the derived state depends on it
(just like _NEW_CURRENT_ATTRIB).  Note that we still call the
ctx->Driver.UpdateState() function, of course.

v2: use bitmask-based test, per Eric.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Brian Paul
0de5e0f3fb mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()
Changing the matrix mode alone has no effect on rendering and does
not need to trigger a flush or state validation.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-10-15 07:21:07 -06:00
Chih-Wei Huang
67d8518a0e mesa: android: Fix the incorrect path of sse_minmax.c
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Fixes: 669cfc267a (android: mesa: fix the path of the SSE4_1
optimisations)
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-15 13:41:02 +01:00
Mauro Rossi
45f0392ceb i965: android: add the i965_compile_FILES sources to the driver
i965_compile_FILES are needed otherwise we'll error out as below:

target SharedLib: i915_dri (out/target/product/x86/obj/SHARED_LIBRARIES/i915_dri_intermediates/LINKED/i915_dri.so)
external/mesa/src/mesa/drivers/dri/i965/brw_ir_fs.h:181: error: undefined reference to 'fs_inst::~fs_inst()'
...
...
external/mesa/src/mesa/drivers/dri/i965/intel_screen.c:1484: error: undefined reference to 'brw_compiler_create'
collect2: error: ld returned 1 exit status
build/core/shared_library.mk:81: recipe for target 'out/target/product/x86/obj/SHARED_LIBRARIES/i965_dri_intermediates/LINKED/i965_dri.so' failed
make: *** [out/target/product/x86/obj/SHARED_LIBRARIES/i965_dri_intermediates/LINKED/i965_dri.so] Error 1

[Emil Velikov: tweak commit message]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-10-15 13:35:19 +01:00
Emil Velikov
bcb56c2c69 program: convert _mesa_init_gl_program() to take struct gl_program *
Rather than accepting a void pointer, only to down and up cast around
it, convert the function to take the base (struct gl_program) pointer.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-15 13:30:52 +01:00
Emil Velikov
2034bdd46c nir: include nir_instr_set.h in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-15 13:30:22 +01:00
Timothy Arceri
8da9e154b7 glsl: Allow arrays of arrays in GLSL ES 3.10 and GLSL 4.30
V3: use a check_*_allowed style function for requirements checking
rather than has_* which doesn't encapsulate the error message

V2: add missing 's' to the extension name in error messages
 and add decimal place in version string

Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
f22b7933e2 glsl: allow for AoA in calculating offset to ubo start region
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
bb5aeb8549 glsl: build ubo name and indexing offset for AoA
V2: split out unrelated change as suggested by Samuel

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
8cf1333b18 glsl: link uniform block arrays of arrays
This adds support for setting up the UniformBlock structures for AoA
and also adds support for resizing AoA blocks with a packed layout.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
d9f1f2bbc6 glsl: Add AoA support when checking for non-const index
When checking for non-const indexing of interfaces
take into account arrays of arrays

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
082b1ca2fe glsl: Add support for lowering interface block arrays of arrays
V2: make array processing functions static

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
132b9e9dd9 glsl: add AoA support for an inteface with unsized array members
Add support for setting the max access of an unsized member
of an interface array of arrays.

For example ifc[j][k].foo[i] where foo is unsized.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
d1d05c0f85 glsl: add AoA support for linking interface blocks with unsized members
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 21:42:24 +11:00
Timothy Arceri
dd89880dc0 glsl: avoid hitting assert for arrays of arrays
Also add TODO comment about adding proper support

Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 21:21:33 +11:00
Timothy Arceri
2d7a98de18 glsl: add AoA support for atomic counters
This marks all counters in an AoA as active.

For AoA all but the innermost array are treated as separate
counters/uniforms. The Nvidia binary also goes further and
finds inactive counters in the AoA, in future we should do
this too, however this gets things working for the time being.

This change also removes the use of UniformHash for atomic counters,
this avoids having to generate name strings used as hash keys.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 21:21:27 +11:00
Timothy Arceri
261a434996 glsl: add std140 layout support for AoA
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 20:44:33 +11:00
Timothy Arceri
176e6930e6 i965: add arrays of arrays support for varyings
V2: get the correct vector elements value for outputs

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 20:44:26 +11:00
Timothy Arceri
be822b89ac glsl: calculate AoA uniform offset correctly for structs
This allows the correct offset to be calculated for use in indirect
indexing of samplers.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 20:44:20 +11:00
Timothy Arceri
410609c968 glsl: remove dead code in a single pass
Currently only one ir assignment is removed for each var in a single
dead code optimisation pass. This means if a var has more than one
assignment, then it requires all the glsl optimisations to be run again
for each additional assignment to be removed.
Another pass is also required to remove the variable itself.

With this change all assignments and the variable are removed in a single
pass.

Some of the arrays of arrays conformance tests that were looping
through 8 dimensions ended up with a var with hundreds of assignments.

This change helps ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1
go from around 3 min 20 sec -> 2 min

ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2 went from
around 9 min 20 sec to 7 min 30 sec

I had difficulty getting the public shader-db to give a consistent result
with or without this change but the results seemed unchanged at between
15-20 seconds.

Thomas Helland measured change with shader-db on his machine from
approx 117 secs to 112 secs.

V3: Simplify freeing of list as suggested by Ian, and spelling fixes.

V2: Add assert to be sure references are counted before assignments.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-By: Thomas Helland <thomashelland90@gmail.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 20:36:14 +11:00
Timothy Arceri
d337da81f2 glsl: dont allow gl_PerVertex to be redeclared as an array of arrays
V3: move patch after fixes to ast for AoA and add const to helper
as suggested by Ian

V2: move single dimensional array detection into a helper

Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 20:36:01 +11:00
Timothy Arceri
dea0af8f82 glsl: check that only the outermost array is unsized
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 20:35:54 +11:00
Timothy Arceri
3129359ed7 glsl: allow AoA to be sized by initializer or constructor
V2: Split out unsized array validation to its own patch as
suggested by Samuel.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-15 20:35:45 +11:00
Timothy Arceri
296a7ea471 glsl: add support for initialising sampler AoA
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 20:35:40 +11:00
Timothy Arceri
db280e951a glsl: Add support for linking uniform arrays of arrays
V3: Fix setting of data.location for struct AoA UBO members

V2: Handle arrays of arrays in the same way structures are handled

The ARB_arrays_of_arrays spec doesn't give very many details on how
AoA uniforms are intended to be implemented. However in the
ARB_program_interface_query spec there are details that show AoA are
intended to be handled in a similar way to structs.

Issues 7 from the ARB_program_interface_query spec:

 We define rules consistent with our enumeration rules for
 other complex types.  For existing one-dimensional arrays, we enumerate
 a single entry if the array is an array of basic types, or separate
 entries for each array element if the array is an array of structures.
 We follow similar rules here.  For a uniform array such as:

   uniform vec4 a[5][4][3];

 we enumerate twenty different entries ("a[0][0][0]" through
 "a[4][3][0]"), each of which is treated as an array with three elements.
 This is morally equivalent to what you'd get if you worked around the
 limitation in current GLSL via:

    struct ArrayBottom { vec4 c[3]; };
    struct ArrayMid    { ArrayBottom b[3]; };
    uniform ArrayMid   a[5];

 which would enumerate "a[0].b[0].c[0]" through "a[4].b[3].c[0]".

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-15 20:35:35 +11:00
Kenneth Graunke
ff31c243e3 i965: Don't hardcode FS in "validation failed!" message.
Instead, print "Scalar VS" or "Scalar FS".  Otherwise it's really
confusing which stage is broken.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 14:07:47 -07:00
Jordan Justen
a274eff9ff glsl: Support uint index in lower_vector_insert
The ES31-CTS.compute_shader.pipeline-compute-chain test case generates
an unsigned index by using gl_LocalInvocationID.x and
gl_LocalInvocationID.y as array indices.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-14 13:16:35 -07:00
Jordan Justen
ab04adcf63 glsl: Support uint index in do_vec_index_to_cond_assign
The ES31-CTS.compute_shader.pipeline-compute-chain test case generates
an unsigned index by using gl_LocalInvocationID.x and
gl_LocalInvocationID.y as array indices.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-14 13:16:34 -07:00
Jordan Justen
0d1eef536b i965/fs: Ignore compute shaders in brw_nir_lower_inputs
The commit shown below caused compute shaders to hit the unreachable
in the default of the switch block. Since compute shaders don't have
any inputs, we can make brw_nir_lower_inputs a no-op for CS.

commit 2953c3d761
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Fri Aug 14 15:15:11 2015 -0700

    i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-14 13:16:30 -07:00
Jordan Justen
63728dac57 i965/fs: Simplify FS in brw_nir_lower_inputs to only support scalar mode
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-14 13:16:29 -07:00
Brian Paul
9abbf65d0a mesa: remove unused functions in program.c
replace_registers() and adjust_param_indexes() were unused.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-14 12:47:15 -06:00
Brian Paul
9d4ce80736 mesa: minor indentation fix in _mesa_BindTextureUnit() 2015-10-14 12:47:15 -06:00
Brian Paul
77eef81370 mesa: remove unused texUnit local in _mesa_BindTextureUnit()
The texture unit is error-checked before this and the texUnit var
is unused, so remove it.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-10-14 12:47:15 -06:00
Krzysztof Sobiecki
14f7ce4248 st/fbo: use pipe_surface_release instead of pipe_surface_reference
pipe_surface_reference have problems with deleted contexts,
so use of pipe_surface_release might be more appropriate.

Fixes Wasteland 2 Director's Cut crash on start.

Cc: mesa-stable@lists.freedesktop.org

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-14 12:47:07 -06:00
Marta Lofstedt
93267887a0 glsl: Enable split of lower UBOs and SSBO also for compute shaders
The split of Uniform blocks and shader storage block only loops
up to MESA_SHADER_FRAGMENT and igonres compute shaders.
This cause segfault when running the OpenGL ES 3.1 CTS tests
with GL_ARB_compute_shader enabled.

V2: Changed to use MESA_SHADER_STAGES instead of
MESA_SHADER_COMPUTE

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
2015-10-14 16:05:42 +02:00
Jose Fonseca
5423c1e855 glsl: Include util/strndup.h.
Fixes Windows builds.

Trivial.
2015-10-14 11:50:06 +01:00
Tapani Pälli
ac257f1070 glsl: calculate TOP_LEVEL_ARRAY_SIZE and STRIDE when adding resources
Patch moves existing calculation code from shader_query.cpp to happen
during program resource list creation.

No Piglit or CTS regressions were observed during testing.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-14 12:39:04 +03:00
Tapani Pälli
b76159b096 glsl: add top level array size and stride to gl_uniform_storage
Patch adds 2 new fields to gl_uniform_storage so that we don't need to
calculate these values during runtime shader queries. This is required by
upcoming changes to free GLSL IR after linking.

Patch moves 3 booleans inside structure so that structure size stays the
same after this change.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-14 09:32:58 +03:00
Iago Toral Quiroga
d3f4588804 i965: Adapt SSBOs to work with their own separate index space
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 08:11:13 +02:00
Iago Toral Quiroga
56e2bdbca3 glsl/lower_ubo_reference: lower UBOs and SSBOs to separate index spaces
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 08:11:13 +02:00
Iago Toral Quiroga
d31f98a272 mesa: Add {Num}UniformBlocks and {Num}ShaderStorageBlocks to gl_shader{_program}
These arrays provide backends with separate index spaces for UBOS and SSBOs.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 08:11:13 +02:00
Iago Toral Quiroga
27dccf097d mesa: Rename {Num}UniformBlocks to {Num}BufferInterfaceBlocks
Currently, these arrays in gl_shader and gl_shader_program hold both
UBOs and SSBOs, so this looks like a better name. We were already
using NumBufferInterfaceBlocks in gl_shader_program, so this makes
things more consistent as well.

In a later patch we will add {Num}UniformBlocks and
{Num}ShaderStorageBlocks which will contain only references to
UBOs and SSBOs respectively that will provide backends with
a separate index space for both types of objects.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 08:11:13 +02:00
Iago Toral Quiroga
9de651b261 glsl: Fix variable_referenced() for vector_{extract,insert} expressions
We get these when we operate on vector variables with array accessors
(i.e. things like a[0] where 'a' is a vec4). When we call variable_referenced()
on these expressions we want to return a reference to 'a' instead of NULL.

This fixes a problem where we pass a[0] as the first argument to an atomic
SSBO function that expects a buffer variable. In order to check this, we use
variable_referenced(), but that is currently returning NULL in this case, since
the underlying rvalue is a vector_extract expression.

Tested-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-10-14 08:08:12 +02:00
Iago Toral Quiroga
baee16bf02 nir: split SSBO min/max atomic instrinsics into signed/unsigned versions
NIR is typeless so this is the only way to keep track of the
type to select the proper atomic to use.

v2:
  - Use imin,imax,umin,umax for the intrinsic names (Connor Abbott)
  - Change message for unreachable paths (Michael Schellenberger)

Tested-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-14 08:03:58 +02:00
Iago Toral Quiroga
be800ef6d8 i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-14 08:01:58 +02:00
Iago Toral Quiroga
9d2bbca98d i965/fs: Fix indentation in fs_live_variables::compute_start_end
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-14 08:01:46 +02:00
Brian Paul
4a168ad797 mesa: clean up comments for gl_current_attrib struct
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
a7b6e6192a vbo: make void vbo_exec_BeginVertices() static
Not called from any other file.  Rename and move before use.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
84719ad9df vbo: document vbo_exec_context fields
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
d65b029dc2 vbo: minor clean-ups for vbo_exec_fixup_vertex()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
7f67bfaa74 vbo: add assertion in ATTR_UNION macro
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
3491ec5930 vbo: add comments, braces in ATTR_UNION() in vbo_exec_api.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
e729f36c09 vbo: fix whitespace in vbo_exec_draw.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
8fbb72c297 vbo: move 'tmp' var initialization
Improve readability a bit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
a1cbf85de0 vbo: improve fprintf() formatting
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
a639bbf098 vbo: simplify vertex array initializations in vbo_context.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
20f31ae37c vbo: get rid of needless NR_MAT_ATTRIBS constant
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:23 -06:00
Brian Paul
dd293d8aae vbo: fix incorrect switch statement in init_mat_currval()
The variable 'i' is a value in [0, MAT_ATTRIB_MAX-1] so subtracting
VERT_ATTRIB_GENERIC0 gave a bogus value and we executed the default
switch clause for all loop iterations.

This doesn't fix any known issues but was clearly incorrect.

Cc: mesa-stable@lists.freedesktop.org

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-13 08:28:22 -06:00
Brian Paul
c73c481c4a mesa: pass caller name to create_textures()
Simpler than the dsa flag approach.
2015-10-13 08:28:22 -06:00
Samuel Iglesias Gonsalvez
6a506689db glsl: fix matrix stride calculation for std430's row_major matrices with two columns
This is the result of applying several rules:

From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform Block Layout":

"2. If the member is a two- or four-component vector with components
consuming N basic machine units, the base alignment is 2N or 4N,
respectively."
[...]
"4. If the member is an array of scalars or vectors, the base alignment
and array stride are set to match the base alignment of a single array
element, according to rules (1), (2), and (3), and rounded up to the
base alignment of a vec4."
[...]
"7. If the member is a row-major matrix with C columns and R rows, the
matrix is stored identically to an array of R row vectors with C
components each, according to rule (4)."
[...]
"When using the std430 storage layout, shader storage blocks will be
laid out in buffer storage identically to uniform and shader storage
blocks using the std140 layout, except that the base alignment and
stride of arrays of scalars and vectors in rule 4 and of structures in
rule 9 are not rounded up a multiple of the base alignment of a vec4."

In summary: vec2 has a base alignment of 2*N, a row-major mat2xY is
stored like an array of Y row vectors with 2 components each. Because
of std430 storage layout, the base alignment of the array of vectors
is not rounded up to vec4, so it is still 2*N.

Fixes 15 dEQP tests:

dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x4
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2x3
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2x4

v2:
- Add spec quote in both commit log and code (Timothy)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2015-10-13 15:58:54 +02:00
Christian König
685335639a r600/vce: enable VCE for trinity/richland
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-13 14:32:52 +02:00
Christian König
83de93309e r600/uvd: disable UVD tiling by default
It has only minimal advantages for post processing and doesn't work with VCE.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-13 14:32:48 +02:00
Glenn Kennard
24a1a157a6 r600g: Enable GL_ARB_gpu_shader5 extension
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-13 08:55:42 +10:00
Glenn Kennard
1befb7ed98 r600g/sb: SB support for UBO indexing
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-13 08:55:33 +10:00
Glenn Kennard
80c5062abf r600g/sb: Support gs5 sampler indexing (v2)
[airlied: v2 cayman fixups]

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-13 08:53:35 +10:00
Kenneth Graunke
bd198b9f0a i965/vs: Simplify fs_visitor's ATTR file.
Previously, ATTR was indexed by VERT_ATTRIB_* slots; at the end of
compilation, assign_vs_urb_setup() translated those into GRF units,
and converted ATTR to HW_REGs.

This patch moves the transslation earlier, making ATTR work in terms of
GRF units from the beginning.  assign_vs_urb_setup() simply has to add
the number of payload registers and push constants to obtain the final
hardware GRF number.  (We can't do this earlier as those values aren't
known.)

ATTR still supports reg_offset; however, it's simply added to reg.
It's not clear whether this is valuable or not.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-12 14:33:26 -07:00
Ilia Mirkin
bf97f8d467 nouveau: avoid double-emitting fence
The act of ensuring that there is space can cause a flush to happen,
which will emit the current screen fence. If that is the fence we're
trying to wait on, then it will have been emitted as a result of doing
the PUSH_SPACE. Don't attempt to emit it a second time.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes: 8053c9208f (nouveau: avoid emitting new fences unnecessarily)
Cc: mesa-stable@lists.freedesktop.org
2015-10-12 17:21:29 -04:00
Ian Romanick
eeb444bc99 glsl: Never allow the sequence operator anywhere in an array size
Fixes:

    spec/glsl-1.20/compiler/structure-and-array-operations/array-size-sequence-in-parenthesis.vert
    spec/glsl-es-1.00/compiler/array-sized-by-sequence-in-parenthesis.vert
    spec/glsl-es-3.00/compiler/array-sized-by-sequence-in-parenthesis.vert

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-12 10:15:14 -07:00
Ian Romanick
92635a84a7 glsl: In later GLSL versions, sequence operator is cannot be a constant expression
Fixes:
    ES3-CTS.shaders.negative.constant_sequence

    spec/glsl-es-3.00/compiler/global-initializer/from-sequence.vert
    spec/glsl-es-3.00/compiler/global-initializer/from-sequence.frag

v2: Fix a couple copy-and-paste mistake in the spec quotations.
Suggested by Matt.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:14 -07:00
Ian Romanick
05e4601c6b glsl: Add method to determine whether an expression contains the sequence operator
This will be used in the next patch to enforce some language sematics.

v2: Fix inverted logic in
ast_function_expression::has_sequence_subexpression.  The method
originally had a different name and a different meaning.  I fixed the
logic in ast_to_hir.cpp, but I only changed the names in
ast_function.cpp.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
bb329f2ff6 glsl: Restrict initializers for global variables to constant expression in ES
v2: Combine this check with the existing const and uniform checks.  This
change depends on the previous patch (glsl: Only set
ir_variable::constant_value for const-decorated variables).

Fixes:

    ES2-CTS.shaders.negative.initialize
    ES3-CTS.shaders.negative.initialize

    spec/glsl-es-1.00/compiler/global-initializer/from-attribute.vert
    spec/glsl-es-1.00/compiler/global-initializer/from-uniform.vert
    spec/glsl-es-1.00/compiler/global-initializer/from-uniform.frag
    spec/glsl-es-1.00/compiler/global-initializer/from-global.vert
    spec/glsl-es-1.00/compiler/global-initializer/from-global.frag
    spec/glsl-es-1.00/compiler/global-initializer/from-varying.frag
    spec/glsl-es-3.00/compiler/global-initializer/from-uniform.vert
    spec/glsl-es-3.00/compiler/global-initializer/from-uniform.frag
    spec/glsl-es-3.00/compiler/global-initializer/from-in.vert
    spec/glsl-es-3.00/compiler/global-initializer/from-in.frag
    spec/glsl-es-3.00/compiler/global-initializer/from-global.vert
    spec/glsl-es-3.00/compiler/global-initializer/from-global.frag

Note: spec/glsl-es-3.00/compiler/global-initializer/from-sequence.*
still fail because the result of a sequence operator is still considered
to be a constant expression.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92304
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v1]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
3524d6df33 glsl: Only set ir_variable::constant_value for const-decorated variables
Right now we're also setting for uniforms, and that doesn't seem to hurt
things.  The next patch will make general global variables in GLSL ES,
and those definitely should not have constant_value set!

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
5bc68f0f2b glsl: Use constant_initializer instead of constant_value to determine whether to keep an unused uniform
This even matches the comment "uniform initializers are precious, and
could get used by another stage."

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
313372cae8 glsl/linker: Use constant_initializer instead of constant_value to initialize uniforms
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
8acce5d53a ff_fragment_shader: Use binding to set the sampler unit
This is the way layout(binding=xxx) works from GLSL.  The old method
just happened to work (and significantly predated support for
layout(binding=xxx)), but future changes will break this.

v2: Remove some stale comments.  Suggested by Matt and Chris Forbes.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-12 10:15:13 -07:00
Ian Romanick
43b07eb60f glsl: Allow built-in functions as constant expressions in OpenGL ES 1.00
In d4a24745 (August 2012), Paul made functions calls not be constant
expressions in GLSL ES 1.00.  Since this feature was added in desktop
GLSL 1.20, we believed that it was added in GLSL ES 3.00.  That turns
out to be completely wrong.  Built-in functions have always been allowed
as constant expressions in GLSL ES, and the patch adds the (many) spec
quotations to prove it.

While we never previously encountered this, a later patch enforces a GLSL
ES 1.00 rule that global variable initializers must be constant
expressions.  Without this fix, several dEQP tests fail.

Fixes:

    tests/spec/glsl-es-1.00/compiler/const-initializer/from-function.frag
    tests/spec/glsl-es-1.00/compiler/const-initializer/from-function.vert
    tests/spec/glsl-es-1.00/compiler/const-initializer/from-sequence-in-function.frag
    tests/spec/glsl-es-1.00/compiler/const-initializer/from-sequence-in-function.vert

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.0 10.1 10.2 10.3 10.4 10.5 10.6 11.0" <mesa-stable@lists.freedesktop.org>

Yes, I know we don't maintain stable branches that far back, but that
*is* how far back this bug goes!
2015-10-12 10:15:13 -07:00
Nicolai Hähnle
45ed627d89 u_vbuf: fix vb slot assignment for translated buffers
Vertex attributes of different categories (constant/per-instance/
per-vertex) go into different buffers for translation, and this is now
properly reflected in the vertex buffers passed to the driver.

Fixes e.g. piglit's point-vertex-id divisor test.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-12 16:46:30 +02:00
Iago Toral Quiroga
7a1143f29e glsl: include variable name in error messages about initializers
Also fix style / wrong indentation along the way and make the messages
more uniform.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-12 08:31:08 +02:00
Iago Toral Quiroga
f09c229cc6 glsl: shader outputs cannot have initializers
GLSL Spec 4.20.8, 4.3 Storage Qualifiers:

"Initializers in global declarations may only be used in declarations of
 global variables with no storage qualifier, with a const qualifier or
 with a uniform qualifier."

We do this for input variables, but not for output variables. AMD and NVIDIA
proprietary drivers don't allow this either.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-12 08:31:08 +02:00
Iago Toral Quiroga
8281a7c533 i965: Fix unsafe pointer when dumping VS/FS IR
For the VS and FS stages that use ARB_vertex_program or
ARB_fragment_program we don't have a shader program, however,
when debuging is enabled, we call brw_dump_ir like this:

brw_dump_ir("vertex", prog, &vs->base, &vp->program.Base);

where vs will be NULL (since prog is NULL).

As pointed out by Chris, this &vs->base is not really a dereference,
it simply computes a new address that just happens to be 0x0 because
the offset of base in brw_shader is 0. Then brw_dump_ir will see a
NULL pointer and not do anything. This is why this does not crash at
the moment. However, this does not look very safe (it would crash
for any location of base that is not the first in brw_shader), so
patch it to prevent a potential (even if unlikely) problem in the
future.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2015-10-12 08:30:57 +02:00
Dave Airlie
bcfaab3885 mesa/uniforms: fix get_uniform for doubles (v2)
The initial glGetUniformdv support didn't cover all the
casting cases that are apparantly legal, and cts seems to
test for them.

I've updated the piglit test to cover these cases now.

v2: fix indentation - it's all broken in this file (Ilia)
fix src/dst index tracking in light of fp64 support (Ilia)

cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-12 13:10:59 +10:00
Chia-I Wu
c8083b1adc ilo: improve Gen8 defines based on its PRMs 2015-10-12 10:15:28 +08:00
Matt Turner
4642d53a03 i965/vec4: Implement b2f and b2i using negation.
Curro added this in commit 3ee2daf23d (before the vec4/NIR backend was
added) but it was missed in the new NIR backend. Add it there as well.

instructions in affected programs:     1857 -> 1810 (-2.53%)
helped:                                15

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-11 16:19:52 -07:00
Ilia Mirkin
9fe458335f nv50,nvc0: don't base decisions on available pushbuf space
We still have to push everything out, might as well kick earlier and
flip pushbufs when we know we'll need it. This resolves some issues with
the new policy of making sure that we always leave a bit of room at the
end for fences.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
Cc: mesa-stable@lists.freedesktop.org
2015-10-11 17:57:04 -04:00
Ilia Mirkin
8053c9208f nouveau: avoid emitting new fences unnecessarily
Right now we emit on every kick, but this is only necessary if something
will ever be able to observe that the fence completed. If there are no
refs, leave the fence alone and emit it another day.

This also happens to work around an issue for the kick handler -- a kick
can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be
due to lack of space in the pushbuf. We want the emit to happen in the
current batch, so we want there to always be enough space. However an
explicit kick could take the reserved space for the implicitly-triggered
kick's fence emission if it happened right after. With the new mechanism,
hopefully there's no way to cause two fences to be emitted into the same
reserved space.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
Cc: mesa-stable@lists.freedesktop.org
2015-10-11 17:57:04 -04:00
Samuel Pitoiset
06abd1a25e nvc0: make use of NVC0_COMPUTE_CLASS for GF110
In theory, GF110+ should also support NVC8_COMPUTE_CLASS but, in practice,
a ILLEGAL_CLASS dmesg fail appears when using it.

This fixes compute support and MP performance counters on GF110.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-10 22:11:03 +02:00
Kenneth Graunke
a23bdd1fae i965/gs: Make MAX_GS_INPUT_VERTICES a #define in brw_context.h.
For scalar VS, I'll need this in brw_fs.cpp as well.  It seems silly to
redeclare it in three places.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-10 11:40:19 -07:00
Kenneth Graunke
2953c3d761 i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.
Previously, we used nir_lower_io with the scalar type_size function,
which mapped VERT_ATTRIB_* locations to...some numbers.  Then, in
fs_visitor::nir_setup_inputs(), we created temporaries indexed by
those numbers, and emitted MOVs from the actual ATTR registers to
those temporaries.  Virtually all of these were copy propagated away,
but it's still ugly.

This patch reworks our input lowering to produce NIR lower_input
intrinsics that properly index into the ATTR file, so we can access
it directly.

No changes in shader-db.

v2: Fix unreachable() message (Ken), update commit message (Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-10 11:40:19 -07:00
Kenneth Graunke
6842ad7912 i965/vs: Fix a subtlety in the nr_attributes == 0 workaround.
nr_attributes is used to compute first_non_payload_grf, which is the
first register we're allowed to use for ordinary register allocation.

The hardware requires us to read at least one pair of values, but we're
completely free to overwrite that garbage register with whatever we like.

Instead of altering nr_attributes, we should alter urb_read_length, which
only affects the amount we ask the VF to read.  This should save us a
register in trivial cases (which admittedly isn't very useful).

While we're at it, improve the explanation in the comments.

v2: Actually do what I said (caught by Ilia).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-10 11:40:19 -07:00
Kenneth Graunke
031d350132 i965/vs: Unify URB entry size/read length calculations between backends.
Both the vec4 and scalar VS backends had virtually identical URB entry
size and read length calculations.  We can move those up a level to
backend-agnostic code and reuse it for both.

Unfortunately, the backends need to know nr_attributes to compute
first_non_payload_grf, so I had to store that in prog_data.  We could
use urb_read_length, but that's nr_attributes rounded up to a multiple
of two, so doing so would waste a register in some cases.

There's more code to be removed in the vec4 backend, but that will
come in a follow-on patch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-10 11:40:19 -07:00
Kenneth Graunke
a4e988f481 i965/cfg: Fix cfg_t::dump() when a block has no immediate dominator.
Switch statements introduce a bogus loop with an unconditional break at
the end of the loop, just before the while...so the while is unreachable
and has no immediate dominator.

v2: With less exuberance

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-10 11:40:19 -07:00
Emil Velikov
2496cfd771 docs: add news item and link release notes for 11.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-10-10 17:10:26 +01:00
Emil Velikov
55a8f072ea docs: add sha256 checksums for 11.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit b4bfea0094)
2015-10-10 17:10:26 +01:00
Emil Velikov
8337a31bcc docs: add release notes for 11.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 914966befc)
2015-10-10 17:10:26 +01:00
924 changed files with 50509 additions and 18190 deletions

View File

@@ -32,6 +32,7 @@ AM_DISTCHECK_CONFIGURE_FLAGS = \
--enable-vdpau \
--enable-xa \
--enable-xvmc \
--disable-llvm-shared-libs \
--with-egl-platforms=x11,wayland,drm \
--with-dri-drivers=i915,i965,nouveau,radeon,r200,swrast \
--with-gallium-drivers=i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast

View File

@@ -1 +1 @@
11.1.0-devel
11.1.1

5
bin/.cherry-ignore Normal file
View File

@@ -0,0 +1,5 @@
# As per Marek http://lists.freedesktop.org/archives/mesa-stable/2015-December/003600.html
37208c4fd7b1ec679d10992b42a2811cab8245a5 Revert "radeonsi: disable DCC on Stoney"
# causes regression in xwayland, kde/plasma, mpv, steam ... fdo#92759
839793680f99b8387bee9489733d5071c10f3ace i965: Use MESA_FORMAT_B8G8R8X8_SRGB for RGB visuals

View File

@@ -81,7 +81,7 @@ PRESENTPROTO_REQUIRED=1.0
LIBUDEV_REQUIRED=151
GLPROTO_REQUIRED=1.4.14
LIBOMXIL_BELLAGIO_REQUIRED=0.0
LIBVA_REQUIRED=0.35.0
LIBVA_REQUIRED=0.38.0
VDPAU_REQUIRED=1.1
WAYLAND_REQUIRED=1.2.0
XCB_REQUIRED=1.9.3
@@ -98,7 +98,7 @@ AC_PROG_CXX
AM_PROG_CC_C_O
AM_PROG_AS
AX_CHECK_GNU_MAKE
AC_CHECK_PROGS([PYTHON2], [python2 python])
AC_CHECK_PROGS([PYTHON2], [python2.7 python2 python])
AC_PROG_SED
AC_PROG_MKDIR_P
@@ -107,6 +107,8 @@ AC_SYS_LARGEFILE
LT_PREREQ([2.2])
LT_INIT([disable-static])
AC_CHECK_PROG(RM, rm, [rm -f])
AX_PROG_BISON([],
AS_IF([test ! -f "$srcdir/src/glsl/glcpp/glcpp-parse.c"],
[AC_MSG_ERROR([bison not found - unable to compile glcpp-parse.y])]))
@@ -374,10 +376,11 @@ save_CFLAGS="$CFLAGS"
CFLAGS="$SSE41_CFLAGS $CFLAGS"
AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
#include <smmintrin.h>
int param;
int main () {
__m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c;
__m128i a = _mm_set1_epi32 (param), b = _mm_set1_epi32 (param + 1), c;
c = _mm_max_epu32(a, b);
return 0;
return _mm_cvtsi128_si32(c);
}]])], SSE41_SUPPORTED=1)
CFLAGS="$save_CFLAGS"
if test "x$SSE41_SUPPORTED" = x1; then
@@ -765,6 +768,11 @@ linux*)
dri3_default=no
;;
esac
if test "x$enable_dri" = xno; then
dri3_default=no
fi
AC_ARG_ENABLE([dri3],
[AS_HELP_STRING([--enable-dri3],
[enable DRI3 @<:@default=auto@:>@])],
@@ -864,7 +872,7 @@ GALLIUM_DRIVERS_DEFAULT="r300,r600,svga,swrast"
AC_ARG_WITH([gallium-drivers],
[AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@],
[comma delimited Gallium drivers list, e.g.
"i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4"
"i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4,virgl"
@<:@default=r300,r600,svga,swrast@:>@])],
[with_gallium_drivers="$withval"],
[with_gallium_drivers="$GALLIUM_DRIVERS_DEFAULT"])
@@ -954,8 +962,13 @@ gnu*|cygwin*)
dri_platform='drm' ;;
esac
if test "x$enable_dri" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes; then
have_drisw_kms='yes'
fi
AM_CONDITIONAL(HAVE_DRICOMMON, test "x$enable_dri" = xyes )
AM_CONDITIONAL(HAVE_DRISW, test "x$enable_dri" = xyes )
AM_CONDITIONAL(HAVE_DRISW_KMS, test "x$have_drisw_kms" = xyes )
AM_CONDITIONAL(HAVE_DRI2, test "x$enable_dri" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes )
AM_CONDITIONAL(HAVE_DRI3, test "x$enable_dri3" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes )
AM_CONDITIONAL(HAVE_APPLEDRI, test "x$enable_dri" = xyes -a "x$dri_platform" = xapple )
@@ -990,10 +1003,6 @@ if test -n "$with_gallium_drivers" -a "x$enable_glx$enable_xlib_glx" = xyesyes;
NEED_WINSYS_XLIB="yes"
fi
if test "x$enable_dri" = xyes; then
enable_gallium_loader="$enable_shared_pipe_drivers"
fi
if test "x$enable_gallium_osmesa" = xyes; then
if ! echo "$with_gallium_drivers" | grep -q 'swrast'; then
AC_MSG_ERROR([gallium_osmesa requires the gallium swrast driver])
@@ -1224,7 +1233,8 @@ xyesno)
if test x"$enable_dri3" = xyes; then
PKG_CHECK_EXISTS([xcb >= $XCB_REQUIRED], [], AC_MSG_ERROR([DRI3 requires xcb >= $XCB_REQUIRED]))
dri_modules="$dri_modules xcb-dri3 xcb-present xcb-sync xshmfence >= $XSHMFENCE_REQUIRED"
dri3_modules="xcb-dri3 xcb-present xcb-sync xshmfence >= $XSHMFENCE_REQUIRED"
PKG_CHECK_MODULES([XCB_DRI3], [$dri3_modules])
fi
fi
if test x"$dri_platform" = xapple ; then
@@ -1565,6 +1575,12 @@ if test "x$enable_egl" = xyes; then
if test "x$enable_shared_glapi" = xno; then
AC_MSG_ERROR([egl_dri2 requires --enable-shared-glapi])
fi
if test "x$enable_dri3" = xyes; then
HAVE_EGL_DRIVER_DRI3=1
if test "x$enable_shared_glapi" = xno; then
AC_MSG_ERROR([egl_dri3 requires --enable-shared-glapi])
fi
fi
else
# Avoid building an "empty" libEGL. Drop/update this
# when other backends (haiku?) come along.
@@ -1576,6 +1592,8 @@ fi
AM_CONDITIONAL(HAVE_EGL, test "x$enable_egl" = xyes)
AC_SUBST([EGL_LIB_DEPS])
gallium_st="mesa"
dnl
dnl XA configuration
dnl
@@ -1588,7 +1606,7 @@ if test "x$enable_xa" = xyes; then
enabling XA.
Example: ./configure --enable-xa --with-gallium-drivers=svga...])
fi
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st xa"
fi
AM_CONDITIONAL(HAVE_ST_XA, test "x$enable_xa" = xyes)
@@ -1633,25 +1651,25 @@ AM_CONDITIONAL(NEED_GALLIUM_VL_WINSYS, test "x$need_gallium_vl_winsys" = xyes)
if test "x$enable_xvmc" = xyes; then
PKG_CHECK_MODULES([XVMC], [xvmc >= $XVMC_REQUIRED])
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st xvmc"
fi
AM_CONDITIONAL(HAVE_ST_XVMC, test "x$enable_xvmc" = xyes)
if test "x$enable_vdpau" = xyes; then
PKG_CHECK_MODULES([VDPAU], [vdpau >= $VDPAU_REQUIRED])
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st vdpau"
fi
AM_CONDITIONAL(HAVE_ST_VDPAU, test "x$enable_vdpau" = xyes)
if test "x$enable_omx" = xyes; then
PKG_CHECK_MODULES([OMX], [libomxil-bellagio >= $LIBOMXIL_BELLAGIO_REQUIRED])
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st omx"
fi
AM_CONDITIONAL(HAVE_ST_OMX, test "x$enable_omx" = xyes)
if test "x$enable_va" = xyes; then
PKG_CHECK_MODULES([VA], [libva >= $LIBVA_REQUIRED])
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st va"
fi
AM_CONDITIONAL(HAVE_ST_VA, test "x$enable_va" = xyes)
@@ -1673,7 +1691,7 @@ if test "x$enable_nine" = xyes; then
AC_MSG_WARN([using nine together with wine requires DRI3 enabled system])
fi
enable_gallium_loader=$enable_shared_pipe_drivers
gallium_st="$gallium_st nine"
fi
AM_CONDITIONAL(HAVE_ST_NINE, test "x$enable_nine" = xyes)
@@ -1688,7 +1706,15 @@ AC_ARG_WITH([clang-libdir],
[CLANG_LIBDIR=''])
PKG_CHECK_EXISTS([libclc], [have_libclc=yes], [have_libclc=no])
AC_CHECK_LIB([elf], [elf_memory], [have_libelf=yes;ELF_LIB=-lelf])
PKG_CHECK_MODULES([LIBELF], [libelf], [have_libelf=yes], [have_libelf=no])
if test "x$have_libelf" = xno; then
LIBELF_LIBS=''
LIBELF_CFLAGS=''
AC_CHECK_LIB([elf], [elf_memory], [have_libelf=yes;LIBELF_LIBS=-lelf], [have_libelf=no])
AC_SUBST([LIBELF_LIBS])
AC_SUBST([LIBELF_CFLAGS])
fi
if test "x$enable_opencl" = xyes; then
if test -z "$with_gallium_drivers"; then
@@ -1711,8 +1737,7 @@ if test "x$enable_opencl" = xyes; then
AC_SUBST([LIBCLC_LIBEXECDIR])
fi
# XXX: Use $enable_shared_pipe_drivers once converted to use static/shared pipe-drivers
enable_gallium_loader=yes
gallium_st="$gallium_st clover"
if test "x$enable_opencl_icd" = xyes; then
OPENCL_LIBNAME="MesaOpenCL"
@@ -1992,10 +2017,6 @@ AC_SUBST([XVMC_LIB_INSTALL_DIR])
dnl
dnl Gallium Tests
dnl
if test "x$enable_gallium_tests" = xyes; then
# XXX: Use $enable_shared_pipe_drivers once converted to use static/shared pipe-drivers
enable_gallium_loader=yes
fi
AM_CONDITIONAL(HAVE_GALLIUM_TESTS, test "x$enable_gallium_tests" = xyes)
dnl Directory for VDPAU libs
@@ -2050,14 +2071,8 @@ gallium_require_llvm() {
}
gallium_require_drm_loader() {
if test "x$enable_gallium_loader" = xyes; then
if test "x$need_pci_id$have_pci_id" = xyesno; then
AC_MSG_ERROR([Gallium drm loader requires libudev >= $LIBUDEV_REQUIRED or sysfs])
fi
enable_gallium_drm_loader=yes
fi
if test "x$enable_va" = xyes && test "x$7" != x; then
GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS $7"
if test "x$need_pci_id$have_pci_id" = xyesno; then
AC_MSG_ERROR([Gallium drm loader requires libudev >= $LIBUDEV_REQUIRED or sysfs])
fi
}
@@ -2172,7 +2187,15 @@ if test -n "$with_gallium_drivers"; then
gallium_require_drm_loader
PKG_CHECK_MODULES([SIMPENROSE], [simpenrose],
[USE_VC4_SIMULATOR=yes], [USE_VC4_SIMULATOR=no])
[USE_VC4_SIMULATOR=yes;
DEFINES="$DEFINES -DUSE_VC4_SIMULATOR"],
[USE_VC4_SIMULATOR=no])
;;
xvirgl)
HAVE_GALLIUM_VIRGL=yes
gallium_require_drm "virgl"
gallium_require_drm_loader
require_egl_drm "virgl"
;;
*)
AC_MSG_ERROR([Unknown Gallium driver: $driver])
@@ -2245,26 +2268,19 @@ AM_CONDITIONAL(HAVE_GALLIUM_FREEDRENO, test "x$HAVE_GALLIUM_FREEDRENO" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_SOFTPIPE, test "x$HAVE_GALLIUM_SOFTPIPE" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_LLVMPIPE, test "x$HAVE_GALLIUM_LLVMPIPE" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_VC4, test "x$HAVE_GALLIUM_VC4" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_VIRGL, test "x$HAVE_GALLIUM_VIRGL" = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_STATIC_TARGETS, test "x$enable_shared_pipe_drivers" = xno)
# NOTE: anything using xcb or other client side libs ends up in separate
# _CLIENT variables. The pipe loader is built in two variants,
# one that is standalone and does not link any x client libs (for
# use by XA tracker in particular, but could be used in any case
# where communication with xserver is not desired).
if test "x$enable_gallium_loader" = xyes; then
if test "x$enable_dri" = xyes; then
GALLIUM_PIPE_LOADER_DEFINES="$GALLIUM_PIPE_LOADER_DEFINES -DHAVE_PIPE_LOADER_DRI"
fi
if test "x$enable_gallium_drm_loader" = xyes; then
GALLIUM_PIPE_LOADER_DEFINES="$GALLIUM_PIPE_LOADER_DEFINES -DHAVE_PIPE_LOADER_DRM"
fi
AC_SUBST([GALLIUM_PIPE_LOADER_DEFINES])
if test "x$enable_dri" = xyes; then
GALLIUM_PIPE_LOADER_DEFINES="$GALLIUM_PIPE_LOADER_DEFINES -DHAVE_PIPE_LOADER_DRI"
fi
if test "x$have_drisw_kms" = xyes; then
GALLIUM_PIPE_LOADER_DEFINES="$GALLIUM_PIPE_LOADER_DEFINES -DHAVE_PIPE_LOADER_KMS"
fi
AC_SUBST([GALLIUM_PIPE_LOADER_DEFINES])
AM_CONDITIONAL(HAVE_I915_DRI, test x$HAVE_I915_DRI = xyes)
AM_CONDITIONAL(HAVE_I965_DRI, test x$HAVE_I965_DRI = xyes)
AM_CONDITIONAL(HAVE_NOUVEAU_DRI, test x$HAVE_NOUVEAU_DRI = xyes)
@@ -2278,8 +2294,6 @@ AM_CONDITIONAL(NEED_RADEON_DRM_WINSYS, test "x$HAVE_GALLIUM_R300" = xyes -o \
AM_CONDITIONAL(NEED_WINSYS_XLIB, test "x$NEED_WINSYS_XLIB" = xyes)
AM_CONDITIONAL(NEED_RADEON_LLVM, test x$NEED_RADEON_LLVM = xyes)
AM_CONDITIONAL(USE_R600_LLVM_COMPILER, test x$USE_R600_LLVM_COMPILER = xyes)
AM_CONDITIONAL(HAVE_LOADER_GALLIUM, test x$enable_gallium_loader = xyes)
AM_CONDITIONAL(HAVE_DRM_LOADER_GALLIUM, test x$enable_gallium_drm_loader = xyes)
AM_CONDITIONAL(HAVE_GALLIUM_COMPUTE, test x$enable_opencl = xyes)
AM_CONDITIONAL(HAVE_MESA_LLVM, test x$MESA_LLVM = x1)
AM_CONDITIONAL(USE_VC4_SIMULATOR, test x$USE_VC4_SIMULATOR = xyes)
@@ -2287,8 +2301,6 @@ if test "x$USE_VC4_SIMULATOR" = xyes -a "x$HAVE_GALLIUM_ILO" = xyes; then
AC_MSG_ERROR([VC4 simulator on x86 replaces i965 driver build, so ilo must be disabled.])
fi
AC_SUBST([ELF_LIB])
AM_CONDITIONAL(HAVE_LIBDRM, test "x$have_libdrm" = xyes)
AM_CONDITIONAL(HAVE_X11_DRIVER, test "x$enable_xlib_glx" = xyes)
AM_CONDITIONAL(HAVE_OSMESA, test "x$enable_osmesa" = xyes)
@@ -2365,6 +2377,7 @@ AC_CONFIG_FILES([Makefile
src/gallium/drivers/svga/Makefile
src/gallium/drivers/trace/Makefile
src/gallium/drivers/vc4/Makefile
src/gallium/drivers/virgl/Makefile
src/gallium/state_trackers/clover/Makefile
src/gallium/state_trackers/dri/Makefile
src/gallium/state_trackers/glx/xlib/Makefile
@@ -2405,6 +2418,8 @@ AC_CONFIG_FILES([Makefile
src/gallium/winsys/sw/wrapper/Makefile
src/gallium/winsys/sw/xlib/Makefile
src/gallium/winsys/vc4/drm/Makefile
src/gallium/winsys/virgl/drm/Makefile
src/gallium/winsys/virgl/vtest/Makefile
src/gbm/Makefile
src/gbm/main/gbm.pc
src/glsl/Makefile
@@ -2498,6 +2513,9 @@ if test "$enable_egl" = yes; then
if test "x$HAVE_EGL_DRIVER_DRI2" != "x"; then
egl_drivers="$egl_drivers builtin:egl_dri2"
fi
if test "x$HAVE_EGL_DRIVER_DRI3" != "x"; then
egl_drivers="$egl_drivers builtin:egl_dri3"
fi
echo " EGL drivers: $egl_drivers"
fi
@@ -2513,7 +2531,8 @@ fi
echo ""
if test -n "$with_gallium_drivers"; then
echo " Gallium: yes"
echo " Gallium drivers: $gallium_drivers"
echo " Gallium st: $gallium_st"
else
echo " Gallium: no"
fi

View File

@@ -96,18 +96,18 @@ GL 4.0, GLSL 4.00 --- all DONE: nvc0, radeonsi
GL_ARB_draw_buffers_blend DONE (i965, nv50, r600, llvmpipe, softpipe)
GL_ARB_draw_indirect DONE (i965, r600, llvmpipe, softpipe)
GL_ARB_gpu_shader5 DONE (i965)
GL_ARB_gpu_shader5 DONE (i965, r600)
- 'precise' qualifier DONE
- Dynamically uniform sampler array indices DONE (r600, softpipe)
- Dynamically uniform UBO array indices DONE (r600)
- Dynamically uniform sampler array indices DONE (softpipe)
- Dynamically uniform UBO array indices DONE ()
- Implicit signed -> unsigned conversions DONE
- Fused multiply-add DONE ()
- Packing/bitfield/conversion functions DONE (r600, softpipe)
- Enhanced textureGather DONE (r600, softpipe)
- Geometry shader instancing DONE (r600, llvmpipe, softpipe)
- Packing/bitfield/conversion functions DONE (softpipe)
- Enhanced textureGather DONE (softpipe)
- Geometry shader instancing DONE (llvmpipe, softpipe)
- Geometry shader multiple streams DONE ()
- Enhanced per-sample shading DONE (r600)
- Interpolation functions DONE (r600)
- Enhanced per-sample shading DONE ()
- Interpolation functions DONE ()
- New overload resolution rules DONE
GL_ARB_gpu_shader_fp64 DONE (r600, llvmpipe, softpipe)
GL_ARB_sample_shading DONE (i965, nv50, r600)
@@ -149,14 +149,14 @@ GL 4.2, GLSL 4.20:
GL 4.3, GLSL 4.30:
GL_ARB_arrays_of_arrays started (Timothy)
GL_ARB_arrays_of_arrays DONE (i965)
GL_ARB_ES3_compatibility DONE (all drivers that support GLSL 3.30)
GL_ARB_clear_buffer_object DONE (all drivers)
GL_ARB_compute_shader in progress (jljusten)
GL_ARB_copy_image DONE (i965) (gallium - in progress, VMware)
GL_ARB_copy_image DONE (i965, nv50, nvc0, radeonsi)
GL_KHR_debug DONE (all drivers)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_fragment_layer_viewport DONE (nv50, nvc0, r600, radeonsi, llvmpipe)
GL_ARB_fragment_layer_viewport DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe)
GL_ARB_framebuffer_no_attachments DONE (i965)
GL_ARB_internalformat_query2 not started
GL_ARB_invalidate_subdata DONE (all drivers)
@@ -169,7 +169,7 @@ GL 4.3, GLSL 4.30:
GL_ARB_texture_buffer_range DONE (nv50, nvc0, i965, r600, radeonsi, llvmpipe)
GL_ARB_texture_query_levels DONE (all drivers that support GLSL 1.30)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_texture_view DONE (i965, nv50, nvc0, llvmpipe, softpipe)
GL_ARB_texture_view DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_vertex_attrib_binding DONE (all drivers)
@@ -177,9 +177,9 @@ GL 4.4, GLSL 4.40:
GL_MAX_VERTEX_ATTRIB_STRIDE DONE (all drivers)
GL_ARB_buffer_storage DONE (i965, nv50, nvc0, r600, radeonsi)
GL_ARB_clear_texture DONE (i965) (gallium - in progress, VMware)
GL_ARB_clear_texture DONE (i965, nv50, nvc0)
GL_ARB_enhanced_layouts in progress (Timothy)
- compile-time constant expressions in progress
- compile-time constant expressions DONE
- explicit byte offsets for blocks in progress
- forced alignment within blocks in progress
- specified vec4-slot component numbers in progress
@@ -209,7 +209,7 @@ GL 4.5, GLSL 4.50:
These are the extensions cherry-picked to make GLES 3.1
GLES3.1, GLSL ES 3.1
GL_ARB_arrays_of_arrays started (Timothy)
GL_ARB_arrays_of_arrays DONE (i965)
GL_ARB_compute_shader in progress (jljusten)
GL_ARB_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
@@ -243,7 +243,7 @@ GLES3.2, GLSL ES 3.2
GL_KHR_texture_compression_astc_ldr DONE (i965/gen9+)
GL_OES_copy_image not started (based on GL_ARB_copy_image, which is done for some drivers)
GL_OES_draw_buffers_indexed not started
GL_OES_draw_elements_base_vertex not started (based on GL_ARB_draw_elements_base_vertex, which is done for all drivers)
GL_OES_draw_elements_base_vertex DONE (all drivers)
GL_OES_geometry_shader not started (based on GL_ARB_geometry_shader4, which is done for all drivers)
GL_OES_gpu_shader5 not started (based on parts of GL_ARB_gpu_shader5, which is done for some drivers)
GL_OES_primitive_bounding box not started

View File

@@ -2,8 +2,8 @@ The software may implement third party technologies (e.g. third party
libraries) that are not licensed to you by AMD and for which you may need
to obtain licenses from other parties. Unless explicitly stated otherwise,
these third party technologies are not licensed hereunder. Such third
party technologies include, but are not limited, to H.264, MPEG-2, MPEG-4,
AVC, and VC-1.
party technologies include, but are not limited, to H.264, H.265, HEVC, MPEG-2,
MPEG-4, AVC, and VC-1.
For MPEG-2 Encoding Products ANY USE OF THIS PRODUCT IN ANY MANNER OTHER
THAN PERSONAL USE THAT COMPLIES WITH THE MPEG-2 STANDARD FOR ENCODING VIDEO

View File

@@ -179,6 +179,14 @@ Mesa EGL supports different sets of environment variables. See the
<li>GALLIUM_HUD - draws various information on the screen, like framerate,
cpu load, driver statistics, performance counters, etc.
Set GALLIUM_HUD=help and run e.g. glxgears for more info.
<li>GALLIUM_HUD_PERIOD - sets the hud update rate in seconds (float). Use zero
to update every frame. The default period is 1/2 second.
<li>GALLIUM_HUD_VISIBLE - control default visibility, defaults to true.
<li>GALLIUM_HUD_TOGGLE_SIGNAL - toggle visibility via user specified signal.
Especially useful to toggle hud at specific points of application and
disable for unencumbered viewing the rest of the time. For example, set
GALLIUM_HUD_VISIBLE to false and GALLIUM_HUD_SIGNAL_TOGGLE to 10 (SIGUSR1).
Use kill -10 <pid> to toggle the hud as desired.
<li>GALLIUM_LOG_FILE - specifies a file for logging all errors, warnings, etc.
rather than stderr.
<li>GALLIUM_PRINT_OPTIONS - if non-zero, print all the Gallium environment
@@ -230,6 +238,12 @@ for details.
</ul>
<h3>VA-API state tracker environment variables</h3>
<ul>
<li>VAAPI_MPEG4_ENABLED - enable MPEG4 for VA-API, disabled by default.
</ul>
<p>
Other Gallium drivers have their own environment variables. These may change
frequently so the source code should be consulted for details.

View File

@@ -16,13 +16,37 @@
<h1>News</h1>
<h2>November 21, 2015</h2>
<p>
<a href="relnotes/11.0.6.html">Mesa 11.0.6</a> is released.
This is a bug-fix release.
</p>
<h2>November 11, 2015</h2>
<p>
<a href="relnotes/11.0.5.html">Mesa 11.0.5</a> is released.
This is a bug-fix release.
</p>
<h2>October 24, 2015</h2>
<p>
<a href="relnotes/11.0.4.html">Mesa 11.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>October 10, 2015</h2>
<p>
<a href="relnotes/11.0.3.html">Mesa 11.0.3</a> is released.
This is a bug-fix release.
</p>
<h2>October 3, 2015</h2>
<p>
<a href="relnotes/10.6.9.html">Mesa 10.6.9</a> is released.
This is a bug-fix release.
<br>
NOTE: It is anticipated that 10.6.9 will be the final release in the 10.6
series. Users of 10.5 are encouraged to migrate to the 11.0 series in order
series. Users of 10.6 are encouraged to migrate to the 11.0 series in order
to obtain future fixes.
</p>

View File

@@ -21,6 +21,10 @@ The release notes summarize what's new or changed in each Mesa release.
</p>
<ul>
<li><a href="relnotes/11.0.6.html">11.0.6 release notes</a>
<li><a href="relnotes/11.0.5.html">11.0.5 release notes</a>
<li><a href="relnotes/11.0.4.html">11.0.4 release notes</a>
<li><a href="relnotes/11.0.3.html">11.0.3 release notes</a>
<li><a href="relnotes/10.6.9.html">10.6.9 release notes</a>
<li><a href="relnotes/11.0.2.html">11.0.2 release notes</a>
<li><a href="relnotes/11.0.1.html">11.0.1 release notes</a>

185
docs/relnotes/11.0.3.html Normal file
View File

@@ -0,0 +1,185 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.3 Release Notes / October 10, 2015</h1>
<p>
Mesa 11.0.3 is a bug fix release which fixes bugs found since the 11.0.2 release.
</p>
<p>
Mesa 11.0.3 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
c2210e3daecc10ed9fdcea500327652ed6effc2f47c4b9cee63fb08f560d7117 mesa-11.0.3.tar.gz
ab2992eece21adc23c398720ef8c6933cb69ea42e1b2611dc09d031e17e033d6 mesa-11.0.3.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=71789">Bug 71789</a> - [r300g] Visuals not found in (default) depth = 24</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91044">Bug 91044</a> - piglit spec/egl_khr_create_context/valid debug flag gles* fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91718">Bug 91718</a> - piglit.spec.arb_shader_image_load_store.invalid causes intermittent GPU HANG</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92072">Bug 92072</a> - Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92265">Bug 92265</a> - Black windows in weston after update mesa to 11.0.2-1</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (1):</p>
<ul>
<li>st/mesa: try PIPE_BIND_RENDER_TARGET when choosing float texture formats</li>
</ul>
<p>Daniel Scharrer (1):</p>
<ul>
<li>mesa: Add abs input modifier to base for POW in ffvertex_prog</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.2</li>
<li>Revert "nouveau: make sure there's always room to emit a fence"</li>
<li>Update version to 11.0.3</li>
</ul>
<p>Francisco Jerez (1):</p>
<ul>
<li>i965/fs: Fix hang on IVB and VLV with image format mismatch.</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>meta: Handle array textures in scaled MSAA blits</li>
</ul>
<p>Ilia Mirkin (6):</p>
<ul>
<li>nouveau: be more careful about freeing temporary transfer buffers</li>
<li>nouveau: delay deleting buffer with unflushed fence</li>
<li>nouveau: wait to unref the transfer's bo until it's no longer used</li>
<li>nv30: pretend to have packed texture/surface formats</li>
<li>nv30: always go through translate module on big-endian</li>
<li>nouveau: make sure there's always room to emit a fence</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>mesa: Correctly handle GL_BGRA_EXT in ES3 format_and_type checks</li>
</ul>
<p>Kyle Brenneman (3):</p>
<ul>
<li>glx: Fix build errors with --enable-mangling (v2)</li>
<li>mapi: Make _glapi_get_stub work with "gl" or "mgl" prefix.</li>
<li>glx: Don't hard-code the name "libGL.so.1" in driOpenDriver (v3)</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>radeon/vce: fix vui time_scale zero error</li>
</ul>
<p>Marek Olšák (21):</p>
<ul>
<li>st/mesa: fix front buffer regression after dropping st_validate_state in Blit</li>
<li>radeonsi: handle index buffer alloc failures</li>
<li>radeonsi: handle constant buffer alloc failures</li>
<li>gallium/radeon: handle buffer_map staging buffer failures better</li>
<li>gallium/radeon: handle buffer alloc failures in r600_draw_rectangle</li>
<li>gallium/radeon: add a fail path for depth MSAA texture readback</li>
<li>radeonsi: report alloc failure from si_shader_binary_read</li>
<li>radeonsi: add malloc fail paths to si_create_shader_state</li>
<li>radeonsi: skip drawing if the tess factor ring allocation fails</li>
<li>radeonsi: skip drawing if GS ring allocations fail</li>
<li>radeonsi: handle shader precompile failures</li>
<li>radeonsi: handle fixed-func TCS shader create failure</li>
<li>radeonsi: skip drawing if VS, TCS, TES, GS fail to compile or upload</li>
<li>radeonsi: skip drawing if PS fails to compile or upload</li>
<li>radeonsi: skip drawing if updating the scratch buffer fails</li>
<li>radeonsi: don't forget to update scratch relocations for LS, HS, ES shaders</li>
<li>radeonsi: handle dummy constant buffer allocation failure</li>
<li>gallium/u_blitter: handle allocation failures</li>
<li>radeonsi: add scratch buffer to the buffer list when it's re-allocated</li>
<li>st/dri: don't use _ctx in client_wait_sync</li>
<li>egl/dri2: don't require a context for ClientWaitSync (v2)</li>
</ul>
<p>Matthew Waters (1):</p>
<ul>
<li>egl: rework handling EGL_CONTEXT_FLAGS</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>st/dri: Use packed RGB formats</li>
</ul>
<p>Roland Scheidegger (1):</p>
<ul>
<li>mesa: fix mipmap generation for immutable, compressed textures</li>
</ul>
<p>Tom Stellard (3):</p>
<ul>
<li>gallium/radeon: Use call_once() when initailizing LLVM targets</li>
<li>gallivm: Allow drivers and state trackers to initialize gallivm LLVM targets v2</li>
<li>radeon/llvm: Initialize gallivm targets when initializing the AMDGPU target v2</li>
</ul>
<p>Varad Gautam (1):</p>
<ul>
<li>egl: restore surface type before linking config to its display</li>
</ul>
<p>Ville Syrjälä (3):</p>
<ul>
<li>i830: Fix collision between I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0)</li>
<li>i915: Fix texcoord vs. varying collision in fragment programs</li>
<li>i915: Remember to call intel_prepare_render() before blitting</li>
</ul>
</div>
</body>
</html>

168
docs/relnotes/11.0.4.html Normal file
View File

@@ -0,0 +1,168 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.4 Release Notes / October 24, 2015</h1>
<p>
Mesa 11.0.4 is a bug fix release which fixes bugs found since the 11.0.3 release.
</p>
<p>
Mesa 11.0.4 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
ed412ca6a46d1bd055120e5c12806c15419ae8c4dd6d3f6ea20a83091d5c78bf mesa-11.0.4.tar.gz
40201bf7fc6fa12a6d9edfe870b41eb4dd6669154e3c42c48a96f70805f5483d mesa-11.0.4.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86281">Bug 86281</a> - brw_meta_fast_clear (brw=brw&#64;entry=0x7fffd4097a08, fb=fb&#64;entry=0x7fffd40fa900, buffers=buffers&#64;entry=2, partial_clear=partial_clear&#64;entry=false)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86720">Bug 86720</a> - [radeon] Europa Universalis 4 freezing during game start (10.3.3+, still broken on 11.0.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91788">Bug 91788</a> - [HSW Regression] Synmark2_v6 Multithread performance case FPS reduced by 36%</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92304">Bug 92304</a> - [cts] cts.shaders.negative conformance tests fail</li>
</ul>
<h2>Changes</h2>
<p>Alejandro Piñeiro (2):</p>
<ul>
<li>i965/vec4: check writemask when bailing out at register coalesce</li>
<li>i965/vec4: fill src_reg type using the constructor type parameter</li>
</ul>
<p>Brian Paul (2):</p>
<ul>
<li>vbo: fix incorrect switch statement in init_mat_currval()</li>
<li>mesa: fix incorrect opcode in save_BlendFunci()</li>
</ul>
<p>Chih-Wei Huang (3):</p>
<ul>
<li>mesa: android: Fix the incorrect path of sse_minmax.c</li>
<li>nv50/ir: use C++11 standard std::unordered_map if possible</li>
<li>nv30: include the header of ffs prototype</li>
</ul>
<p>Chris Wilson (1):</p>
<ul>
<li>i965: Remove early release of DRI2 miptree</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>mesa/uniforms: fix get_uniform for doubles (v2)</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.3</li>
</ul>
<p>Francisco Jerez (5):</p>
<ul>
<li>i965: Don't tell the hardware about our UAV access.</li>
<li>mesa: Expose function to calculate whether a shader image unit is valid.</li>
<li>mesa: Skip redundant texture completeness checking during image validation.</li>
<li>i965: Use _mesa_is_image_unit_valid() instead of gl_image_unit::_Valid.</li>
<li>mesa: Get rid of texture-dependent image unit derived state.</li>
</ul>
<p>Ian Romanick (8):</p>
<ul>
<li>glsl: Allow built-in functions as constant expressions in OpenGL ES 1.00</li>
<li>ff_fragment_shader: Use binding to set the sampler unit</li>
<li>glsl/linker: Use constant_initializer instead of constant_value to initialize uniforms</li>
<li>glsl: Use constant_initializer instead of constant_value to determine whether to keep an unused uniform</li>
<li>glsl: Only set ir_variable::constant_value for const-decorated variables</li>
<li>glsl: Restrict initializers for global variables to constant expression in ES</li>
<li>glsl: Add method to determine whether an expression contains the sequence operator</li>
<li>glsl: In later GLSL versions, sequence operator is cannot be a constant expression</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>nouveau: make sure there's always room to emit a fence</li>
</ul>
<p>Indrajit Das (1):</p>
<ul>
<li>st/va: Used correct parameter to derive the value of the "h" variable in vlVaCreateImage</li>
</ul>
<p>Jonathan Gray (1):</p>
<ul>
<li>configure.ac: ensure RM is set</li>
</ul>
<p>Krzysztof Sobiecki (1):</p>
<ul>
<li>st/fbo: use pipe_surface_release instead of pipe_surface_reference</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>st/omx/dec/h264: fix field picture type 0 poc disorder</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>st/mesa: fix clip state dependencies</li>
<li>radeonsi: fix a GS copy shader leak</li>
<li>gallium: add PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>u_vbuf: fix vb slot assignment for translated buffers</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>freedreno/a3xx: cache-flush is needed after MEM_WRITE</li>
</ul>
<p>Tapani Pälli (3):</p>
<ul>
<li>mesa: add GL_UNSIGNED_INT_24_8 to _mesa_pack_depth_span</li>
<li>mesa: Set api prefix to version string when overriding version</li>
<li>mesa: fix ARRAY_SIZE query for GetProgramResourceiv</li>
</ul>
</div>
</body>
</html>

174
docs/relnotes/11.0.5.html Normal file
View File

@@ -0,0 +1,174 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.5 Release Notes / November 11, 2015</h1>
<p>
Mesa 11.0.5 is a bug fix release which fixes bugs found since the 11.0.4 release.
</p>
<p>
Mesa 11.0.5 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
8495ef5c06f7f726452462b7d408a5b40048373ff908f2283a3b4d1f49b45ee6 mesa-11.0.5.tar.gz
9c255a2a6695fcc6ef4a279e1df0aeaf417dc142f39ee59dfb533d80494bb67a mesa-11.0.5.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91993">Bug 91993</a> - Graphical glitch in Astromenace (open-source game).</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92214">Bug 92214</a> - Flightgear crashes during splashboot with R600 driver, LLVM 3.7.0 and mesa 11.0.2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92437">Bug 92437</a> - osmesa: Expose GL entry points for Windows build, via .def file</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92476">Bug 92476</a> - [cts] ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92623">Bug 92623</a> - Differences in prog_data ignored when caching fragment programs (causes hangs)</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeon/uvd: don't expose HEVC on old UVD hw (v3)</li>
</ul>
<p>Ben Widawsky (1):</p>
<ul>
<li>i965/skl: Add GT4 PCI IDs</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.4</li>
<li>cherry-ignore: ignore a possible wrong nomination</li>
<li>Revert "mesa/glformats: Undo code changes from _mesa_base_tex_format() move"</li>
<li>Update version to 11.0.5</li>
</ul>
<p>Emmanuel Gil Peyrot (1):</p>
<ul>
<li>gbm.h: Add a missing stddef.h include for size_t.</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: When the create ioctl fails, free our cache and try again.</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>i965: Fix is-renderable check in intel_image_target_renderbuffer_storage</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>nvc0: respect edgeflag attribute width</li>
<li>nouveau: set MaxDrawBuffers to the same value as MaxColorAttachments</li>
<li>nouveau: relax fence emit space assert</li>
</ul>
<p>Ivan Kalvachev (1):</p>
<ul>
<li>r600g: Fix special negative immediate constants when using ABS modifier.</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>nir/lower_vec_to_movs: Pass the shader around directly</li>
<li>nir: Report progress from lower_vec_to_movs().</li>
</ul>
<p>Jose Fonseca (2):</p>
<ul>
<li>gallivm: Translate all util_cpu_caps bits to LLVM attributes.</li>
<li>gallivm: Explicitly disable unsupported CPU features.</li>
</ul>
<p>Julien Isorce (4):</p>
<ul>
<li>st/va: pass picture desc to begin and decode</li>
<li>nvc0: fix crash when nv50_miptree_from_handle fails</li>
<li>st/va: do not destroy old buffer when new one failed</li>
<li>st/va: add more errors checks in vlVaBufferSetNumElements and vlVaMapBuffer</li>
</ul>
<p>Kenneth Graunke (6):</p>
<ul>
<li>i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.</li>
<li>nir: Report progress from nir_split_var_copies().</li>
<li>nir: Properly invalidate metadata in nir_split_var_copies().</li>
<li>nir: Properly invalidate metadata in nir_opt_copy_prop().</li>
<li>nir: Properly invalidate metadata in nir_lower_vec_to_movs().</li>
<li>nir: Properly invalidate metadata in nir_opt_remove_phis().</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: add register definitions for Stoney</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>mesa/glformats: Undo code changes from _mesa_base_tex_format() move</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>st/mesa: fix mipmap generation for immutable textures with incomplete pyramids</li>
</ul>
<p>Nigel Stewart (1):</p>
<ul>
<li>osmesa: Expose GL entry points for Windows build via DEF file.</li>
</ul>
<p>Roland Scheidegger (1):</p>
<ul>
<li>gallivm: disable f16c when not using AVX</li>
</ul>
<p>Samuel Li (2):</p>
<ul>
<li>radeonsi: add support for Stoney asics (v3)</li>
<li>radeonsi: add Stoney pci ids</li>
</ul>
</div>
</body>
</html>

145
docs/relnotes/11.0.6.html Normal file
View File

@@ -0,0 +1,145 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.0.6 Release Notes / November 21, 2015</h1>
<p>
Mesa 11.0.6 is a bug fix release which fixes bugs found since the 11.0.5 release.
</p>
<p>
Mesa 11.0.6 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
4bdf054af66ebabf3eca0616f9f5e44c2f234695661b570261c391bc2f4f7482 mesa-11.0.6.tar.gz
8340e64cdc91999840404c211496f3de38e7b4cb38db34e2f72f1642c5134760 mesa-11.0.6.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91780">Bug 91780</a> - Rendering issues with geometry shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92588">Bug 92588</a> - [HSW,BDW,BSW,SKL-Y][GLES 3.1 CTS] ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2 - assert</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92738">Bug 92738</a> - Randon R7 240 doesn't work on 16KiB page size platform</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92860">Bug 92860</a> - [radeonsi][bisected] st/mesa: implement ARB_copy_image - Corruption in ARK Survival Evolved</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92900">Bug 92900</a> - [regression bisected] About 700 piglit regressions is what could go wrong</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeonsi: enable optimal raster config setting for fiji (v2)</li>
</ul>
<p>Ben Widawsky (1):</p>
<ul>
<li>i965/skl/gt4: Fix URB programming restriction.</li>
</ul>
<p>Boyuan Zhang (2):</p>
<ul>
<li>st/vaapi: fix vaapi VC-1 simple/main corruption v2</li>
<li>radeon/uvd: fix VC-1 simple/main profile decode v2</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>r600: initialised PGM_RESOURCES_2 for ES/GS</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.5</li>
<li>cherry-ignore: add the swrast front buffer support</li>
<li>automake: use static llvm for make distcheck</li>
<li>Update version to 11.0.6</li>
</ul>
<p>Eric Anholt (3):</p>
<ul>
<li>vc4: Return GL_OUT_OF_MEMORY when buffer allocation fails.</li>
<li>vc4: Return NULL when we can't make our shadow for a sampler view.</li>
<li>vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>meta/generate_mipmap: Don't leak the sampler object</li>
<li>meta/generate_mipmap: Only modify the draw framebuffer binding in fallback_required</li>
</ul>
<p>Ilia Mirkin (2):</p>
<ul>
<li>mesa/copyimage: allow width/height to not be multiples of block</li>
<li>nouveau: don't expose HEVC decoding support</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>nir/vars_to_ssa: Rework copy set handling in lower_copies_to_load_store</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>glsl: Allow implicit int -&gt; uint conversions for the % operator.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: initialize SX_PS_DOWNCONVERT to 0 on Stoney</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>winsys/radeon: Use CPU page size instead of hardcoding 4096 bytes v3</li>
</ul>
<p>Oded Gabbay (1):</p>
<ul>
<li>llvmpipe: use simple coeffs calc for 128bit vectors</li>
</ul>
<p>Roland Scheidegger (2):</p>
<ul>
<li>radeon: fix bgrx8/xrgb8 blits</li>
<li>r200: fix bgrx8/xrgb8 blits</li>
</ul>
</div>
</body>
</html>

View File

@@ -14,7 +14,7 @@
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.1.0 Release Notes / TBD</h1>
<h1>Mesa 11.1.0 Release Notes / 15 December 2015</h1>
<p>
Mesa 11.1.0 is a new development release.
@@ -33,7 +33,8 @@ because compatibility contexts are not supported.
<h2>SHA256 checksums</h2>
<pre>
TBD.
e3bc44be4df5e4dc728dfda7b55b1aaeadfce36eca6a367b76cc07598070cb2d mesa-11.1.0.tar.gz
9befe03b04223eb1ede177fa8cac001e2850292c8c12a3ec9929106afad9cf1f mesa-11.1.0.tar.xz
</pre>
@@ -44,23 +45,236 @@ Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>OpenGL 3.1 support on freedreno (a3xx, a4xx)</li>
<li>OpenGL 3.3 support for VMware guest VM driver (supported by Workstation 12
and Fusion 8).
<li>GL_AMD_performance_monitor on nv50</li>
<li>GL_ARB_arrays_of_arrays on i965</li>
<li>GL_ARB_blend_func_extended on freedreno (a3xx)</li>
<li>GL_ARB_clear_texture on nv50, nvc0</li>
<li>GL_ARB_clip_control on freedreno/a4xx</li>
<li>GL_ARB_copy_image on nv50, nvc0, radeonsi</li>
<li>GL_ARB_depth_clamp on freedreno/a4xx</li>
<li>GL_ARB_fragment_layer_viewport on i965 (gen6+)</li>
<li>GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips</li>
<li>GL_ARB_gpu_shader5 on r600 for Evergreen and later chips</li>
<li>GL_ARB_seamless_cubemap_per_texture on freedreno/a4xx</li>
<li>GL_ARB_shader_clock on i965 (gen7+)</li>
<li>GL_ARB_shader_stencil_export on i965 (gen9+)</li>
<li>GL_ARB_shader_storage_buffer_object on i965</li>
<li>GL_ARB_shader_texture_image_samples on i965, nv50, nvc0, r600, radeonsi</li>
<li>GL_ARB_texture_barrier / GL_NV_texture_barrier on i965</li>
<li>GL_ARB_texture_buffer_range on freedreno/a3xx</li>
<li>GL_ARB_texture_compression_bptc on freedreno/a4xx</li>
<li>GL_ARB_texture_query_lod on softpipe</li>
<li>GL_ARB_texture_view on radeonsi and r600 (for evergeen and newer)</li>
<li>GL_ARB_vertex_type_2_10_10_10_rev on freedreno (a3xx, a4xx)</li>
<li>GL_EXT_blend_func_extended on all drivers that support the ARB version</li>
<li>GL_EXT_buffer_storage implemented for when ES 3.1 support is gained</li>
<li>GL_EXT_draw_elements_base_vertex on all drivers</li>
<li>GL_EXT_texture_compression_rgtc / latc on freedreno (a3xx & a4xx)</li>
<li>GL_KHR_debug (GLES)</li>
<li>GL_NV_conditional_render on freedreno</li>
<li>GL_OES_draw_elements_base_vertex on all drivers</li>
<li>EGL_KHR_create_context on softpipe, llvmpipe</li>
<li>EGL_KHR_gl_colorspace on softpipe, llvmpipe</li>
<li>new virgl gallium driver for qemu virtio-gpu</li>
<li>16x multisampling on i965 (gen9+)</li>
<li>GL_EXT_shader_samples_identical on i965.</li>
</ul>
<h2>Bug fixes</h2>
TBD.
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=28130">Bug 28130</a> - vbo: premature flushing breaks GL_LINE_LOOP</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=49779">Bug 49779</a> - Extra line segments in GL_LINE_LOOP</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=71789">Bug 71789</a> - [r300g] Visuals not found in (default) depth = 24</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=79783">Bug 79783</a> - Distorted output in obs-studio where other vendors &quot;work&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=80821">Bug 80821</a> - When LIBGL_ALWAYS_SOFTWARE is set, KHR_create_context is not supported</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=81174">Bug 81174</a> - Gallium: GL_LINE_LOOP broken with more than 512 points</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=83508">Bug 83508</a> - [UBO] Assertion for array of blocks</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86281">Bug 86281</a> - brw_meta_fast_clear (brw=brw&#64;entry=0x7fffd4097a08, fb=fb&#64;entry=0x7fffd40fa900, buffers=buffers&#64;entry=2, partial_clear=partial_clear&#64;entry=false)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86469">Bug 86469</a> - Unreal Engine demo doesn't run</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86720">Bug 86720</a> - [radeon] Europa Universalis 4 freezing during game start (10.3.3+, still broken on 11.0.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89014">Bug 89014</a> - PIPE_QUERY_GPU_FINISHED is not acting as expected on SI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90175">Bug 90175</a> - [hsw bisected][PATCH] atomic counters doesn't work for a binding point different to zero</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90348">Bug 90348</a> - Spilling failure of b96 merged value</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90631">Bug 90631</a> - Compilation failure for fragment shader with many branches on Sandy Bridge</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90734">Bug 90734</a> - glBufferSubData is corrupting data when buffer is &gt; 32k</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90887">Bug 90887</a> - PhiMovesPass in register allocator broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91044">Bug 91044</a> - piglit spec/egl_khr_create_context/valid debug flag gles* fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91114">Bug 91114</a> - ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_vert fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91551">Bug 91551</a> - DXTn compressed normal maps produce severe artifacts on all NV5x and NVDx chipsets</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91716">Bug 91716</a> - [bisected] piglit.shaders.glsl-vs-int-attrib regresses on 32 bit BYT, HSW, IVB, SNB</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91718">Bug 91718</a> - piglit.spec.arb_shader_image_load_store.invalid causes intermittent GPU HANG</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91780">Bug 91780</a> - Rendering issues with geometry shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91785">Bug 91785</a> - make check DispatchSanity_test.GLES31 regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91788">Bug 91788</a> - [HSW Regression] Synmark2_v6 Multithread performance case FPS reduced by 36%</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91847">Bug 91847</a> - glGenerateTextureMipmap not working (no errors) unless glActiveTexture(GL_TEXTURE1) is called before</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91857">Bug 91857</a> - Mesa 10.6.3 linker is slow</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91881">Bug 91881</a> - regression: GPU lockups since mesa-11.0.0_rc1 on RV620 (r600) driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91890">Bug 91890</a> - [nve7] witcher2: blurry image &amp; DATA_ERRORs (class 0xa097 mthd 0x2380/0x238c)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91898">Bug 91898</a> - src/util/mesa-sha1.c:250:25: fatal error: openssl/sha.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91927">Bug 91927</a> - [SKL] [regression] piglit compressed textures tests fail with kernel upgrade</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91930">Bug 91930</a> - Program with GtkGLArea widget does not redraw</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91970">Bug 91970</a> - [BSW regression] dEQP-GLES3.functional.shaders.precision.int.highp_mul_vertex</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91985">Bug 91985</a> - [regression, bisected] FTBFS with commit f9caabe8f1: R600_UCP_CONST_BUFFER is undefined</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91993">Bug 91993</a> - Graphical glitch in Astromenace (open-source game).</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92009">Bug 92009</a> - ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92033">Bug 92033</a> - [SNB,regression,dEQP,bisected] functional.shaders.random tests regressed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92052">Bug 92052</a> - nir/nir_builder.h:79: error: expected primary-expression before . token</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92054">Bug 92054</a> - make check gbm-symbols-check regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92066">Bug 92066</a> - [ILK,G45,regression] New assertion on BRW_MAX_MRF breaks ilk and g45</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92072">Bug 92072</a> - Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92095">Bug 92095</a> - [Regression, bisected] arb_shader_atomic_counters.compiler.builtins.frag</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92122">Bug 92122</a> - [bisected, cts] Regression with Assault Android Cactus</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92124">Bug 92124</a> - shader_query.cpp:841:34: error: strndup was not declared in this scope</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92183">Bug 92183</a> - linker.cpp:3187:46: error: strtok_r was not declared in this scope</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92193">Bug 92193</a> - [SKL] ES2-CTS.gtf.GL2ExtensionTests.compressed_astc_texture.compressed_astc_texture fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92214">Bug 92214</a> - Flightgear crashes during splashboot with R600 driver, LLVM 3.7.0 and mesa 11.0.2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92221">Bug 92221</a> - Unintended code changes in _mesa_base_tex_format commit</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92265">Bug 92265</a> - Black windows in weston after update mesa to 11.0.2-1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92304">Bug 92304</a> - [cts] cts.shaders.negative conformance tests fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92437">Bug 92437</a> - osmesa: Expose GL entry points for Windows build, via .def file</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92438">Bug 92438</a> - Segfault in pushbuf_kref when running the android emulator (qemu) on nv50</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92476">Bug 92476</a> - [cts] ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92588">Bug 92588</a> - [HSW,BDW,BSW,SKL-Y][GLES 3.1 CTS] ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2 - assert</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92621">Bug 92621</a> - [G965 ILK G45] Regression: 24 piglit regressions in glsl-1.10</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92623">Bug 92623</a> - Differences in prog_data ignored when caching fragment programs (causes hangs)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92634">Bug 92634</a> - gallium's vl_mpeg12_decoder does not work with st/va</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92639">Bug 92639</a> - [Regression bisected] Ogles1conform mustpass.c fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92641">Bug 92641</a> - [SKL BSW] [Regression] Ogles1conform userclip.c fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92645">Bug 92645</a> - kodi vdpau interop fails since mesa,meta: move gl_texture_object::TargetIndex initializations</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92705">Bug 92705</a> - [clover] fail to build with llvm-svn/clang-svn 3.8</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92709">Bug 92709</a> - &quot;LLVM triggered Diagnostic Handler: unsupported call to function ldexpf in main&quot; when starting race in stuntrally</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92738">Bug 92738</a> - Randon R7 240 doesn't work on 16KiB page size platform</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92744">Bug 92744</a> - [g965 Regression bisected] Performance regression and piglit assertions due to liveness analysis</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92770">Bug 92770</a> - [SNB, regression, dEQP] deqp-gles3.functional.shaders.discard.dynamic_loop_texture</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92824">Bug 92824</a> - [regression, bisected] `make check` dispatch-sanity broken by GL_EXT_buffer_storage</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92849">Bug 92849</a> - [IVB HSW BDW] piglit image load/store load-from-cleared-image.shader_test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92859">Bug 92859</a> - [regression, bisected] validate_intrinsic_instr: Assertion triggered</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92860">Bug 92860</a> - [radeonsi][bisected] st/mesa: implement ARB_copy_image - Corruption in ARK Survival Evolved</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92900">Bug 92900</a> - [regression bisected] About 700 piglit regressions is what could go wrong</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92909">Bug 92909</a> - Offset/alignment issue with layout std140 and vec3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92985">Bug 92985</a> - Mac OS X build error &quot;ar: no archive members specified&quot;</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93015">Bug 93015</a> - Tonga Elemental segfault + VM faults since radeon: implement r600_query_hw_get_result via function pointers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93048">Bug 93048</a> - [CTS regression] mesa af2723 breaks GL Conformance for debug extension</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93063">Bug 93063</a> - drm_helper.h:227:1: error: static declaration of pipe_virgl_create_screen follows non-static declaration</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93091">Bug 93091</a> - [opencl] segfault when running any opencl programs (like clinfo)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93126">Bug 93126</a> - wrongly claim supporting GL_EXT_texture_rg</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93180">Bug 93180</a> - [regression] arb_separate_shader_objects.active sampler conflict fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93235">Bug 93235</a> - [regression] dispatch sanity broken by GetPointerv</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93266">Bug 93266</a> - gl_arb_shading_language_420pack does not allow binding of image variables</li>
</ul>
<h2>Changes</h2>
TBD.
<li>MPEG4 decoding has been disabled by default in the VAAPI driver</li>
</div>
</body>

196
docs/relnotes/11.1.1.html Normal file
View File

@@ -0,0 +1,196 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 11.1.1 Release Notes / January 13, 2016</h1>
<p>
Mesa 11.1.1 is a bug fix release which fixes bugs found since the 11.1.0 release.
</p>
<p>
Mesa 11.1.1 implements the OpenGL 4.1 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.1. OpenGL
4.1 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91806">Bug 91806</a> - configure does not test whether assembler supports sse4.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92229">Bug 92229</a> - [APITRACE] SOMA have serious graphical errors</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92233">Bug 92233</a> - Unigine Heaven 4.0 silhuette run</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93004">Bug 93004</a> - Guild Wars 2 crash on nouveau DX11 cards</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93215">Bug 93215</a> - [Regression bisected] Ogles1conform Automatic mipmap generation test is fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93257">Bug 93257</a> - [SKL, bisected] ASTC dEQP tests segfault</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (1):</p>
<ul>
<li>st/mesa: check state-&gt;mesa in early return check in st_validate_state()</li>
</ul>
<p>Dave Airlie (6):</p>
<ul>
<li>mesa/varray: set double arrays to non-normalised.</li>
<li>mesa/shader: return correct attribute location for double matrix arrays</li>
<li>glsl: pass stage into mark function</li>
<li>glsl/fp64: add helper for dual slot double detection.</li>
<li>glsl: fix count_attribute_slots to allow for different 64-bit handling</li>
<li>glsl: only update doubles inputs for vertex inputs.</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.1</li>
<li>cherry-ignore: drop the "re-enable" DCC on Stoney</li>
<li>cherry-ignore: don't pick a specific i965 formats patch</li>
<li>Update version to 11.1.1</li>
</ul>
<p>Eric Anholt (2):</p>
<ul>
<li>vc4: Warn instead of abort()ing on exec ioctl failures.</li>
<li>vc4: Keep sample mask writes from being reordered after TLB writes</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>r600: fix constant buffer size programming</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>meta/generate_mipmap: Work-around GLES 1.x problem with GL_DRAW_FRAMEBUFFER</li>
</ul>
<p>Ilia Mirkin (9):</p>
<ul>
<li>nv50/ir: can't have predication and immediates</li>
<li>gk104/ir: simplify and fool-proof texbar algorithm</li>
<li>glsl: assign varying locations to tess shaders when doing SSO</li>
<li>glx/dri3: a drawable might not be bound at wait time</li>
<li>nvc0: don't forget to reset VTX_TMP bufctx slot after blit completion</li>
<li>nv50/ir: float(s32 &amp; 0xff) = float(u8), not s8</li>
<li>nv50,nvc0: make sure there's pushbuf space and that we ref the bo early</li>
<li>nv50,nvc0: fix crash when increasing bsp bo size for h264</li>
<li>nvc0: scale up inter_bo size so that it's 16M for a 4K video</li>
</ul>
<p>Jonathan Gray (2):</p>
<ul>
<li>configure.ac: use pkg-config for libelf</li>
<li>configure: check for python2.7 for PYTHON2</li>
</ul>
<p>Kenneth Graunke (5):</p>
<ul>
<li>ralloc: Fix ralloc_adopt() to the old context's last child's parent.</li>
<li>drirc: Disable ARB_blend_func_extended for Heaven 4.0/Valley 1.0.</li>
<li>glsl: Fix varying struct locations when varying packing is disabled.</li>
<li>nvc0: Set winding order regardless of domain.</li>
<li>nir: Add a lower_fdiv option, turn fdiv into fmul/frcp.</li>
</ul>
<p>Marek Olšák (7):</p>
<ul>
<li>tgsi/scan: add flag colors_written</li>
<li>r600g: write all MRTs only if there is exactly one output (fixes a hang)</li>
<li>radeonsi: don't call of u_prims_for_vertices for patches and rectangles</li>
<li>radeonsi: apply the streamout workaround to Fiji as well</li>
<li>gallium/radeon: fix Hyper-Z hangs by programming PA_SC_MODE_CNTL_1 correctly</li>
<li>program: add _mesa_reserve_parameter_storage</li>
<li>st/mesa: fix GLSL uniform updates for glBitmap &amp; glDrawPixels (v2)</li>
</ul>
<p>Mark Janes (1):</p>
<ul>
<li>Add missing platform information for KBL</li>
</ul>
<p>Miklós Máté (1):</p>
<ul>
<li>mesa: Don't leak ATIfs instructions in DeleteFragmentShader</li>
</ul>
<p>Neil Roberts (3):</p>
<ul>
<li>i965: Add MESA_FORMAT_B8G8R8X8_SRGB to brw_format_for_mesa_format</li>
<li>i965: Add B8G8R8X8_SRGB to the alpha format override</li>
<li>i965: Fix crash when calling glViewport with no surface bound</li>
</ul>
<p>Nicolai Hähnle (2):</p>
<ul>
<li>gallium/radeon: only dispose locally created target machine in radeon_llvm_compile</li>
<li>gallium/radeon: fix regression in a number of driver queries</li>
</ul>
<p>Oded Gabbay (1):</p>
<ul>
<li>configura.ac: fix test for SSE4.1 assembler support</li>
</ul>
<p>Patrick Rudolph (2):</p>
<ul>
<li>nv50,nvc0: fix use-after-free when vertex buffers are unbound</li>
<li>gallium/util: return correct number of bound vertex buffers</li>
</ul>
<p>Rob Herring (1):</p>
<ul>
<li>freedreno/ir3: fix 32-bit builds with pointer-to-int-cast error enabled</li>
</ul>
<p>Samuel Pitoiset (3):</p>
<ul>
<li>nvc0: free memory allocated by the prog which reads MP perf counters</li>
<li>nv50,nvc0: free memory allocated by performance metrics</li>
<li>nv50: free memory allocated by the prog which reads MP perf counters</li>
</ul>
<p>Sarah Sharp (1):</p>
<ul>
<li>mesa: Add KBL PCI IDs and platform information.</li>
</ul>
</div>
</body>
</html>

View File

@@ -0,0 +1,176 @@
Name
EXT_shader_samples_identical
Name Strings
GL_EXT_shader_samples_identical
Contact
Ian Romanick, Intel (ian.d.romanick 'at' intel.com)
Contributors
Chris Forbes, Mesa
Magnus Wendt, Intel
Neil S. Roberts, Intel
Graham Sellers, AMD
Status
XXX - Not complete yet.
Version
Last Modified Date: November 19, 2015
Revision: 6
Number
TBD
Dependencies
OpenGL 3.2, or OpenGL ES 3.1, or ARB_texture_multisample is required.
This extension is written against the OpenGL 4.5 (Core Profile)
Specification
Overview
Multisampled antialiasing has become a common method for improving the
quality of rendered images. Multisampling differs from supersampling in
that the color of a primitive that covers all or part of a pixel is
resolved once, regardless of the number of samples covered. If a large
polygon is rendered, the colors of all samples in each interior pixel will
be the same. This suggests a simple compression scheme that can reduce
the necessary memory bandwidth requirements. In one such scheme, each
sample is stored in a separate slice of the multisample surface. An
additional multisample control surface (MCS) contains a mapping from pixel
samples to slices.
If all the values stored in the MCS for a particular pixel are the same,
then all the samples have the same value. Applications can take advantage
of this information to reduce the bandwidth of reading multisample
textures. A custom multisample resolve filter could optimize resolving
pixels where every sample is identical by reading the color once.
color = texelFetch(sampler, coordinate, 0);
if (!textureSamplesIdenticalEXT(sampler, coordinate)) {
for (int i = 1; i < MAX_SAMPLES; i++) {
vec4 c = texelFetch(sampler, coordinate, i);
//... accumulate c into color
}
}
New Procedures and Functions
None.
New Tokens
None.
Additions to the OpenGL 4.5 (Core Profile) Specification
None.
Modifications to The OpenGL Shading Language Specification, Version 4.50.5
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_EXT_shader_samples_identical
A new preprocessor #define is added to the OpenGL Shading Language:
#define GL_EXT_shader_samples_identical
Add to the table in section 8.7 "Texture Lookup Functions"
Syntax:
bool textureSamplesIdenticalEXT(gsampler2DMS sampler, ivec2 coord)
bool textureSamplesIdenticalEXT(gsampler2DMSArray sampler,
ivec3 coord)
Description:
Returns true if it can be determined that all samples within the texel
of the multisample texture bound to <sampler> at <coord> contain the
same values or false if this cannot be determined."
Additions to the AGL/EGL/GLX/WGL Specifications
None
Errors
None
New State
None
New Implementation Dependent State
None
Issues
1) What should the new functions be called?
RESOLVED: textureSamplesIdenticalEXT. Initially
textureAllSamplesIdenticalEXT was considered, but
textureSamplesIdenticalEXT is more similar to the existing textureSamples
function.
2) It seems like applications could implement additional optimization if
they were provided with raw MCS data. Should this extension also
provide that data?
There are a number of challenges in providing raw MCS data. The biggest
problem being that the amount of MCS data depends on the number of
samples, and that is not known at compile time. Additionally, without new
texelFetch functions, applications would have difficulty utilizing the
information.
Another option is to have a function that returns an array of tuples of
sample number and count. This also has difficulties with the maximum
array size not being known at compile time.
RESOLVED: Do not expose raw MCS data in this extension.
3) Should this extension also extend SPIR-V?
RESOLVED: Yes, but this has not yet been written.
4) Is it possible for textureSamplesIdenticalEXT to report false negatives?
RESOLVED: Yes. It is possible that the underlying hardware may not detect
that separate writes of the same color to different samples of a pixel are
the same. The shader function is at the whim of the underlying hardware
implementation. It is also possible that a compressed multisample surface
is not used. In that case the function will likely always return false.
Revision History
Rev Date Author Changes
--- ---------- -------- ---------------------------------------------
1 2014/08/20 cforbes Initial version
2 2015/10/23 idr Change from MESA to EXT. Rebase on OpenGL 4.5,
and add dependency on OpenGL ES 3.1. Initial
draft of overview section and issues 1 through
3.
3 2015/10/27 idr Typo fixes.
4 2015/11/10 idr Rename extension from EXT_shader_multisample_compression
to EXT_shader_samples_identical.
Add issue #4.
5 2015/11/18 idr Fix some typos spotted by gsellers. Change the
name of the name of the function to
textureSamplesIdenticalEXT.
6 2015/11/19 idr Fix more typos spotted by Nicolai Hähnle.

View File

@@ -30,6 +30,10 @@
<dt><a href="http://www.valgrind.org">Valgrind</a></dt>
<dd>is a very useful tool for tracking down
memory-related problems in your code.</dd>
<dt><a href="http:scan.coverity.com/projects/mesa">Coverity</a><dt>
<dd>provides static code analysis of Mesa. If you create an account
you can see the results and try to fix outstanding issues.</dd>
</dl>
</div>

View File

@@ -148,10 +148,33 @@ To get the latest code from git:
<h2>Building the Code</h2>
<ul>
<li>Build libdrm: If you're on a 32-bit system, you should skip the --libdir configure option. Note also the comment about toolchain libdrm above.
<li>
Determine where the GL-related libraries reside on your system and set
the LIBDIR environment variable accordingly.
<br><br>
For 32-bit Ubuntu systems:
<pre>
export LIBDIR=/usr/lib/i386-linux-gnu
</pre>
For 64-bit Ubuntu systems:
<pre>
export LIBDIR=/usr/lib/x86_64-linux-gnu
</pre>
For 32-bit Fedora systems:
<pre>
export LIBDIR=/usr/lib
</pre>
For 64-bit Fedora systems:
<pre>
export LIBDIR=/usr/lib64
</pre>
</li>
<li>Build libdrm:
<pre>
cd $TOP/drm
./autogen.sh --prefix=/usr --libdir=/usr/lib64
./autogen.sh --prefix=/usr --libdir=${LIBDIR}
make
sudo make install
</pre>
@@ -162,12 +185,9 @@ The libxatracker library is used exclusively by the X server to do render,
copy and video acceleration:
<br>
The following configure options doesn't build the EGL system.
<br>
As before, if you're on a 32-bit system, you should skip the --libdir
configure option.
<pre>
cd $TOP/mesa
./autogen.sh --prefix=/usr --libdir=/usr/lib64 --with-gallium-drivers=svga --with-dri-drivers= --enable-xa --disable-dri3
./autogen.sh --prefix=/usr --libdir=${LIBDIR} --with-gallium-drivers=svga --with-dri-drivers=swrast --enable-xa --disable-dri3 --enable-glx-tls
make
sudo make install
</pre>
@@ -177,25 +197,39 @@ if they're not installed in your system. You should be told what's missing.
<br>
<br>
<li>xf86-video-vmware: Now, once libxatracker is installed, we proceed with building and replacing the current Xorg driver. First check if your system is 32- or 64-bit. If you're building for a 32-bit system, you will not be needing the --libdir=/usr/lib64 option to autogen.
<li>xf86-video-vmware: Now, once libxatracker is installed, we proceed with
building and replacing the current Xorg driver.
First check if your system is 32- or 64-bit.
<pre>
cd $TOP/xf86-video-vmware
./autogen.sh --prefix=/usr --libdir=/usr/lib64
./autogen.sh --prefix=/usr --libdir=${LIBDIR}
make
sudo make install
</pre>
<li>vmwgfx kernel module. First make sure that any old version of this kernel module is removed from the system by issuing
<pre>
<pre>
sudo rm /lib/modules/`uname -r`/kernel/drivers/gpu/drm/vmwgfx.ko*
</pre>
Then
<pre>
</pre>
Build and install:
<pre>
cd $TOP/vmwgfx
make
sudo make install
sudo cp 00-vmwgfx.rules /etc/udev/rules.d
sudo depmod -ae
</pre>
sudo depmod -a
</pre>
If you're using a Ubuntu OS:
<pre>
sudo update-initramfs -u
</pre>
If you're using a Fedora OS:
<pre>
sudo dracut --force
</pre>
Add 'vmwgfx' to the /etc/modules file:
<pre>
echo vmwgfx | sudo tee -a /etc/modules
</pre>
Note: some distros put DRM kernel drivers in different directories.
For example, sometimes vmwgfx.ko might be found in

View File

@@ -495,7 +495,7 @@ struct __DRIdamageExtensionRec {
* SWRast Loader extension.
*/
#define __DRI_SWRAST_LOADER "DRI_SWRastLoader"
#define __DRI_SWRAST_LOADER_VERSION 2
#define __DRI_SWRAST_LOADER_VERSION 3
struct __DRIswrastLoaderExtensionRec {
__DRIextension base;
@@ -528,6 +528,15 @@ struct __DRIswrastLoaderExtensionRec {
void (*putImage2)(__DRIdrawable *drawable, int op,
int x, int y, int width, int height, int stride,
char *data, void *loaderPrivate);
/**
* Put image to drawable
*
* \since 3
*/
void (*getImage2)(__DRIdrawable *readable,
int x, int y, int width, int height, int stride,
char *data, void *loaderPrivate);
};
/**

File diff suppressed because it is too large Load Diff

View File

@@ -109,21 +109,51 @@ CHIPSET(0x162A, bdw_gt3, "Intel(R) Iris Pro P6300 (Broadwell GT3e)")
CHIPSET(0x162B, bdw_gt3, "Intel(R) Iris 6100 (Broadwell GT3)")
CHIPSET(0x162D, bdw_gt3, "Intel(R) Broadwell GT3")
CHIPSET(0x162E, bdw_gt3, "Intel(R) Broadwell GT3")
CHIPSET(0x1902, skl_gt1, "Intel(R) Skylake DT GT1")
CHIPSET(0x1906, skl_gt1, "Intel(R) Skylake ULT GT1")
CHIPSET(0x190A, skl_gt1, "Intel(R) Skylake SRV GT1")
CHIPSET(0x190B, skl_gt1, "Intel(R) Skylake Halo GT1")
CHIPSET(0x190E, skl_gt1, "Intel(R) Skylake ULX GT1")
CHIPSET(0x1912, skl_gt2, "Intel(R) Skylake DT GT2")
CHIPSET(0x1916, skl_gt2, "Intel(R) Skylake ULT GT2")
CHIPSET(0x191A, skl_gt2, "Intel(R) Skylake SRV GT2")
CHIPSET(0x191B, skl_gt2, "Intel(R) Skylake Halo GT2")
CHIPSET(0x191D, skl_gt2, "Intel(R) Skylake WKS GT2")
CHIPSET(0x191E, skl_gt2, "Intel(R) Skylake ULX GT2")
CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake ULT GT2F")
CHIPSET(0x1926, skl_gt3, "Intel(R) Skylake ULT GT3")
CHIPSET(0x192A, skl_gt3, "Intel(R) Skylake SRV GT3")
CHIPSET(0x192B, skl_gt3, "Intel(R) Skylake Halo GT3")
CHIPSET(0x1902, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
CHIPSET(0x1906, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
CHIPSET(0x190A, skl_gt1, "Intel(R) Skylake GT1")
CHIPSET(0x190E, skl_gt1, "Intel(R) Skylake GT1")
CHIPSET(0x1912, skl_gt2, "Intel(R) HD Graphics 530 (Skylake GT2)")
CHIPSET(0x1913, skl_gt2, "Intel(R) Skylake GT2f")
CHIPSET(0x1915, skl_gt2, "Intel(R) Skylake GT2f")
CHIPSET(0x1916, skl_gt2, "Intel(R) HD Graphics 520 (Skylake GT2)")
CHIPSET(0x1917, skl_gt2, "Intel(R) Skylake GT2f")
CHIPSET(0x191A, skl_gt2, "Intel(R) Skylake GT2")
CHIPSET(0x191B, skl_gt2, "Intel(R) HD Graphics 530 (Skylake GT2)")
CHIPSET(0x191D, skl_gt2, "Intel(R) HD Graphics P530 (Skylake GT2)")
CHIPSET(0x191E, skl_gt2, "Intel(R) HD Graphics 515 (Skylake GT2)")
CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake GT2")
CHIPSET(0x1923, skl_gt3, "Intel(R) Iris Graphics 540 (Skylake GT3e)")
CHIPSET(0x1926, skl_gt3, "Intel(R) HD Graphics 535 (Skylake GT3)")
CHIPSET(0x1927, skl_gt3, "Intel(R) Iris Graphics 550 (Skylake GT3e)")
CHIPSET(0x192A, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x192B, skl_gt3, "Intel(R) Iris Graphics (Skylake GT3fe)")
CHIPSET(0x1932, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x193A, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x193B, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x193D, skl_gt4, "Intel(R) Skylake GT4")
CHIPSET(0x5902, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x5906, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x590A, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x590B, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x590E, kbl_gt1, "Intel(R) Kabylake GT1")
CHIPSET(0x5913, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
CHIPSET(0x5915, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
CHIPSET(0x5917, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
CHIPSET(0x5912, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x5916, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x591A, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x591B, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x591D, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x591E, kbl_gt2, "Intel(R) Kabylake GT2")
CHIPSET(0x5921, kbl_gt2, "Intel(R) Kabylake GT2F")
CHIPSET(0x5926, kbl_gt3, "Intel(R) Kabylake GT3")
CHIPSET(0x592A, kbl_gt3, "Intel(R) Kabylake GT3")
CHIPSET(0x592B, kbl_gt3, "Intel(R) Kabylake GT3")
CHIPSET(0x5932, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x593A, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x593B, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x593D, kbl_gt4, "Intel(R) Kabylake GT4")
CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x22B1, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")

View File

@@ -181,3 +181,5 @@ CHIPSET(0x9876, CARRIZO_, CARRIZO)
CHIPSET(0x9877, CARRIZO_, CARRIZO)
CHIPSET(0x7300, FIJI_, FIJI)
CHIPSET(0x98E4, STONEY_, STONEY)

View File

@@ -47,12 +47,21 @@ libEGL_la_LDFLAGS = \
$(LD_NO_UNDEFINED)
dri2_backend_FILES =
dri3_backend_FILES =
if HAVE_EGL_PLATFORM_X11
AM_CFLAGS += -DHAVE_X11_PLATFORM
AM_CFLAGS += $(XCB_DRI2_CFLAGS)
libEGL_la_LIBADD += $(XCB_DRI2_LIBS)
dri2_backend_FILES += drivers/dri2/platform_x11.c
if HAVE_DRI3
dri3_backend_FILES += \
drivers/dri2/platform_x11_dri3.c \
drivers/dri2/platform_x11_dri3.h
libEGL_la_LIBADD += $(top_builddir)/src/loader/libloader_dri3_helper.la
endif
endif
if HAVE_EGL_PLATFORM_WAYLAND
@@ -88,7 +97,8 @@ AM_CFLAGS += \
libEGL_la_SOURCES += \
$(dri2_backend_core_FILES) \
$(dri2_backend_FILES)
$(dri2_backend_FILES) \
$(dri3_backend_FILES)
libEGL_la_LIBADD += $(top_builddir)/src/loader/libloader.la
libEGL_la_LIBADD += $(DLOPEN_LIBS) $(LIBDRM_LIBS)
@@ -111,7 +121,10 @@ egl_HEADERS = \
$(top_srcdir)/include/EGL/eglmesaext.h \
$(top_srcdir)/include/EGL/eglplatform.h
TESTS = egl-symbols-check
EXTRA_DIST = \
egl-symbols-check \
SConscript \
drivers/haiku \
docs \

View File

@@ -352,6 +352,12 @@ struct dri2_extension_match {
int offset;
};
static struct dri2_extension_match dri3_driver_extensions[] = {
{ __DRI_CORE, 1, offsetof(struct dri2_egl_display, core) },
{ __DRI_IMAGE_DRIVER, 1, offsetof(struct dri2_egl_display, image_driver) },
{ NULL, 0, 0 }
};
static struct dri2_extension_match dri2_driver_extensions[] = {
{ __DRI_CORE, 1, offsetof(struct dri2_egl_display, core) },
{ __DRI_DRI2, 2, offsetof(struct dri2_egl_display, dri2) },
@@ -385,13 +391,13 @@ dri2_bind_extensions(struct dri2_egl_display *dri2_dpy,
void *field;
for (i = 0; extensions[i]; i++) {
_eglLog(_EGL_DEBUG, "DRI2: found extension `%s'", extensions[i]->name);
_eglLog(_EGL_DEBUG, "found extension `%s'", extensions[i]->name);
for (j = 0; matches[j].name; j++) {
if (strcmp(extensions[i]->name, matches[j].name) == 0 &&
extensions[i]->version >= matches[j].version) {
field = ((char *) dri2_dpy + matches[j].offset);
*(const __DRIextension **) field = extensions[i];
_eglLog(_EGL_INFO, "DRI2: found extension %s version %d",
_eglLog(_EGL_INFO, "found extension %s version %d",
extensions[i]->name, extensions[i]->version);
}
}
@@ -400,7 +406,7 @@ dri2_bind_extensions(struct dri2_egl_display *dri2_dpy,
for (j = 0; matches[j].name; j++) {
field = ((char *) dri2_dpy + matches[j].offset);
if (*(const __DRIextension **) field == NULL) {
_eglLog(_EGL_WARNING, "DRI2: did not find extension %s version %d",
_eglLog(_EGL_WARNING, "did not find extension %s version %d",
matches[j].name, matches[j].version);
ret = EGL_FALSE;
}
@@ -493,6 +499,25 @@ dri2_open_driver(_EGLDisplay *disp)
return extensions;
}
EGLBoolean
dri2_load_driver_dri3(_EGLDisplay *disp)
{
struct dri2_egl_display *dri2_dpy = disp->DriverData;
const __DRIextension **extensions;
extensions = dri2_open_driver(disp);
if (!extensions)
return EGL_FALSE;
if (!dri2_bind_extensions(dri2_dpy, dri3_driver_extensions, extensions)) {
dlclose(dri2_dpy->driver);
return EGL_FALSE;
}
dri2_dpy->driver_extensions = extensions;
return EGL_TRUE;
}
EGLBoolean
dri2_load_driver(_EGLDisplay *disp)
{
@@ -550,7 +575,9 @@ dri2_setup_screen(_EGLDisplay *disp)
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
unsigned int api_mask;
if (dri2_dpy->dri2) {
if (dri2_dpy->image_driver) {
api_mask = dri2_dpy->image_driver->getAPIMask(dri2_dpy->dri_screen);
} else if (dri2_dpy->dri2) {
api_mask = dri2_dpy->dri2->getAPIMask(dri2_dpy->dri_screen);
} else {
assert(dri2_dpy->swrast);
@@ -570,7 +597,7 @@ dri2_setup_screen(_EGLDisplay *disp)
if (api_mask & (1 << __DRI_API_GLES3))
disp->ClientAPIs |= EGL_OPENGL_ES3_BIT_KHR;
assert(dri2_dpy->dri2 || dri2_dpy->swrast);
assert(dri2_dpy->image_driver || dri2_dpy->dri2 || dri2_dpy->swrast);
disp->Extensions.KHR_surfaceless_context = EGL_TRUE;
disp->Extensions.MESA_configless_context = EGL_TRUE;
@@ -578,7 +605,8 @@ dri2_setup_screen(_EGLDisplay *disp)
__DRI2_RENDERER_HAS_FRAMEBUFFER_SRGB))
disp->Extensions.KHR_gl_colorspace = EGL_TRUE;
if ((dri2_dpy->dri2 && dri2_dpy->dri2->base.version >= 3) ||
if (dri2_dpy->image_driver ||
(dri2_dpy->dri2 && dri2_dpy->dri2->base.version >= 3) ||
(dri2_dpy->swrast && dri2_dpy->swrast->base.version >= 3)) {
disp->Extensions.KHR_create_context = EGL_TRUE;
@@ -641,7 +669,14 @@ dri2_create_screen(_EGLDisplay *disp)
dri2_dpy = disp->DriverData;
if (dri2_dpy->dri2) {
if (dri2_dpy->image_driver) {
dri2_dpy->dri_screen =
dri2_dpy->image_driver->createNewScreen2(0, dri2_dpy->fd,
dri2_dpy->extensions,
dri2_dpy->driver_extensions,
&dri2_dpy->driver_configs,
disp);
} else if (dri2_dpy->dri2) {
if (dri2_dpy->dri2->base.version >= 4) {
dri2_dpy->dri_screen =
dri2_dpy->dri2->createNewScreen2(0, dri2_dpy->fd,
@@ -677,7 +712,7 @@ dri2_create_screen(_EGLDisplay *disp)
extensions = dri2_dpy->core->getExtensions(dri2_dpy->dri_screen);
if (dri2_dpy->dri2) {
if (dri2_dpy->image_driver || dri2_dpy->dri2) {
if (!dri2_bind_extensions(dri2_dpy, dri2_core_extensions, extensions))
goto cleanup_dri_screen;
} else {
@@ -1024,7 +1059,26 @@ dri2_create_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLConfig *conf,
else
dri_config = NULL;
if (dri2_dpy->dri2) {
if (dri2_dpy->image_driver) {
unsigned error;
unsigned num_attribs = 8;
uint32_t ctx_attribs[8];
if (!dri2_fill_context_attribs(dri2_ctx, dri2_dpy, ctx_attribs,
&num_attribs))
goto cleanup;
dri2_ctx->dri_context =
dri2_dpy->image_driver->createContextAttribs(dri2_dpy->dri_screen,
api,
dri_config,
shared,
num_attribs / 2,
ctx_attribs,
& error,
dri2_ctx);
dri2_create_context_attribs_error(error);
} else if (dri2_dpy->dri2) {
if (dri2_dpy->dri2->base.version >= 3) {
unsigned error;
unsigned num_attribs = 8;
@@ -1119,11 +1173,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
{
struct dri2_egl_driver *dri2_drv = dri2_egl_driver(drv);
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_dsurf = dri2_egl_surface(dsurf);
struct dri2_egl_surface *dri2_rsurf = dri2_egl_surface(rsurf);
struct dri2_egl_context *dri2_ctx = dri2_egl_context(ctx);
_EGLContext *old_ctx;
_EGLSurface *old_dsurf, *old_rsurf;
_EGLSurface *tmp_dsurf, *tmp_rsurf;
__DRIdrawable *ddraw, *rdraw;
__DRIcontext *cctx;
@@ -1135,8 +1188,8 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
if (old_ctx && dri2_drv->glFlush)
dri2_drv->glFlush();
ddraw = (dri2_dsurf) ? dri2_dsurf->dri_drawable : NULL;
rdraw = (dri2_rsurf) ? dri2_rsurf->dri_drawable : NULL;
ddraw = (dsurf) ? dri2_dpy->vtbl->get_dri_drawable(dsurf) : NULL;
rdraw = (rsurf) ? dri2_dpy->vtbl->get_dri_drawable(rsurf) : NULL;
cctx = (dri2_ctx) ? dri2_ctx->dri_context : NULL;
if (old_ctx) {
@@ -1156,10 +1209,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
return EGL_TRUE;
} else {
/* undo the previous _eglBindContext */
_eglBindContext(old_ctx, old_dsurf, old_rsurf, &ctx, &dsurf, &rsurf);
_eglBindContext(old_ctx, old_dsurf, old_rsurf, &ctx, &tmp_dsurf, &tmp_rsurf);
assert(&dri2_ctx->base == ctx &&
&dri2_dsurf->base == dsurf &&
&dri2_rsurf->base == rsurf);
tmp_dsurf == dsurf &&
tmp_rsurf == rsurf);
_eglPutSurface(dsurf);
_eglPutSurface(rsurf);
@@ -1173,6 +1226,14 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
}
}
__DRIdrawable *
dri2_surface_get_dri_drawable(_EGLSurface *surf)
{
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(surf);
return dri2_surf->dri_drawable;
}
/*
* Called from eglGetProcAddress() via drv->API.GetProcAddress().
*/
@@ -1235,7 +1296,7 @@ void
dri2_flush_drawable_for_swapbuffers(_EGLDisplay *disp, _EGLSurface *draw)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(draw);
__DRIdrawable *dri_drawable = dri2_dpy->vtbl->get_dri_drawable(draw);
if (dri2_dpy->flush) {
if (dri2_dpy->flush->base.version >= 4) {
@@ -1253,12 +1314,12 @@ dri2_flush_drawable_for_swapbuffers(_EGLDisplay *disp, _EGLSurface *draw)
* after calling eglSwapBuffers."
*/
dri2_dpy->flush->flush_with_flags(dri2_ctx->dri_context,
dri2_surf->dri_drawable,
dri_drawable,
__DRI2_FLUSH_DRAWABLE |
__DRI2_FLUSH_INVALIDATE_ANCILLARY,
__DRI2_THROTTLE_SWAPBUFFER);
} else {
dri2_dpy->flush->flush(dri2_surf->dri_drawable);
dri2_dpy->flush->flush(dri_drawable);
}
}
}
@@ -1315,7 +1376,8 @@ static EGLBoolean
dri2_wait_client(_EGLDriver *drv, _EGLDisplay *disp, _EGLContext *ctx)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(ctx->DrawSurface);
_EGLSurface *surf = ctx->DrawSurface;
__DRIdrawable *dri_drawable = dri2_dpy->vtbl->get_dri_drawable(surf);
(void) drv;
@@ -1323,7 +1385,7 @@ dri2_wait_client(_EGLDriver *drv, _EGLDisplay *disp, _EGLContext *ctx)
* we need to copy fake to real here.*/
if (dri2_dpy->flush != NULL)
dri2_dpy->flush->flush(dri2_surf->dri_drawable);
dri2_dpy->flush->flush(dri_drawable);
return EGL_TRUE;
}
@@ -1346,10 +1408,10 @@ dri2_bind_tex_image(_EGLDriver *drv,
_EGLDisplay *disp, _EGLSurface *surf, EGLint buffer)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(surf);
struct dri2_egl_context *dri2_ctx;
_EGLContext *ctx;
GLint format, target;
__DRIdrawable *dri_drawable = dri2_dpy->vtbl->get_dri_drawable(surf);
ctx = _eglGetCurrentContext();
dri2_ctx = dri2_egl_context(ctx);
@@ -1357,7 +1419,7 @@ dri2_bind_tex_image(_EGLDriver *drv,
if (!_eglBindTexImage(drv, disp, surf, buffer))
return EGL_FALSE;
switch (dri2_surf->base.TextureFormat) {
switch (surf->TextureFormat) {
case EGL_TEXTURE_RGB:
format = __DRI_TEXTURE_FORMAT_RGB;
break;
@@ -1369,7 +1431,7 @@ dri2_bind_tex_image(_EGLDriver *drv,
format = __DRI_TEXTURE_FORMAT_RGBA;
}
switch (dri2_surf->base.TextureTarget) {
switch (surf->TextureTarget) {
case EGL_TEXTURE_2D:
target = GL_TEXTURE_2D;
break;
@@ -1380,7 +1442,7 @@ dri2_bind_tex_image(_EGLDriver *drv,
(*dri2_dpy->tex_buffer->setTexBuffer2)(dri2_ctx->dri_context,
target, format,
dri2_surf->dri_drawable);
dri_drawable);
return EGL_TRUE;
}
@@ -1390,10 +1452,10 @@ dri2_release_tex_image(_EGLDriver *drv,
_EGLDisplay *disp, _EGLSurface *surf, EGLint buffer)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_surface *dri2_surf = dri2_egl_surface(surf);
struct dri2_egl_context *dri2_ctx;
_EGLContext *ctx;
GLint target;
__DRIdrawable *dri_drawable = dri2_dpy->vtbl->get_dri_drawable(surf);
ctx = _eglGetCurrentContext();
dri2_ctx = dri2_egl_context(ctx);
@@ -1401,7 +1463,7 @@ dri2_release_tex_image(_EGLDriver *drv,
if (!_eglReleaseTexImage(drv, disp, surf, buffer))
return EGL_FALSE;
switch (dri2_surf->base.TextureTarget) {
switch (surf->TextureTarget) {
case EGL_TEXTURE_2D:
target = GL_TEXTURE_2D;
break;
@@ -1413,7 +1475,7 @@ dri2_release_tex_image(_EGLDriver *drv,
dri2_dpy->tex_buffer->releaseTexBuffer != NULL) {
(*dri2_dpy->tex_buffer->releaseTexBuffer)(dri2_ctx->dri_context,
target,
dri2_surf->dri_drawable);
dri_drawable);
}
return EGL_TRUE;

View File

@@ -35,6 +35,10 @@
#include <xcb/dri2.h>
#include <xcb/xfixes.h>
#include <X11/Xlib-xcb.h>
#ifdef HAVE_DRI3
#include "loader_dri3_helper.h"
#endif
#endif
#ifdef HAVE_WAYLAND_PLATFORM
@@ -145,6 +149,8 @@ struct dri2_egl_display_vtbl {
EGLBoolean (*get_sync_values)(_EGLDisplay *display, _EGLSurface *surface,
EGLuint64KHR *ust, EGLuint64KHR *msc,
EGLuint64KHR *sbc);
__DRIdrawable *(*get_dri_drawable)(_EGLSurface *surf);
};
struct dri2_egl_display
@@ -158,6 +164,7 @@ struct dri2_egl_display
const __DRIconfig **driver_configs;
void *driver;
const __DRIcoreExtension *core;
const __DRIimageDriverExtension *image_driver;
const __DRIdri2Extension *dri2;
const __DRIswrastExtension *swrast;
const __DRI2flushExtension *flush;
@@ -190,6 +197,9 @@ struct dri2_egl_display
#ifdef HAVE_X11_PLATFORM
xcb_connection_t *conn;
int screen;
#ifdef HAVE_DRI3
struct loader_dri3_extensions loader_dri3_ext;
#endif
#endif
#ifdef HAVE_WAYLAND_PLATFORM
@@ -203,8 +213,9 @@ struct dri2_egl_display
int formats;
uint32_t capabilities;
int is_render_node;
int is_different_gpu;
#endif
int is_different_gpu;
};
struct dri2_egl_context
@@ -324,9 +335,15 @@ dri2_setup_screen(_EGLDisplay *disp);
EGLBoolean
dri2_load_driver_swrast(_EGLDisplay *disp);
EGLBoolean
dri2_load_driver_dri3(_EGLDisplay *disp);
EGLBoolean
dri2_create_screen(_EGLDisplay *disp);
__DRIdrawable *
dri2_surface_get_dri_drawable(_EGLSurface *surf);
__DRIimage *
dri2_lookup_egl_image(__DRIscreen *screen, void *image, void *data);

View File

@@ -650,6 +650,7 @@ static struct dri2_egl_display_vtbl droid_display_vtbl = {
.query_buffer_age = dri2_fallback_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri2_fallback_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
EGLBoolean

View File

@@ -594,6 +594,7 @@ static struct dri2_egl_display_vtbl dri2_drm_display_vtbl = {
.query_buffer_age = dri2_drm_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri2_fallback_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
EGLBoolean

View File

@@ -703,18 +703,10 @@ dri2_wl_swap_buffers_with_damage(_EGLDriver *drv,
dri2_surf->dx = 0;
dri2_surf->dy = 0;
if (n_rects == 0) {
wl_surface_damage(dri2_surf->wl_win->surface,
0, 0, INT32_MAX, INT32_MAX);
} else {
for (i = 0; i < n_rects; i++) {
const int *rect = &rects[i * 4];
wl_surface_damage(dri2_surf->wl_win->surface,
rect[0],
dri2_surf->base.Height - rect[1] - rect[3],
rect[2], rect[3]);
}
}
/* We deliberately ignore the damage region and post maximum damage, due to
* https://bugs.freedesktop.org/78190 */
wl_surface_damage(dri2_surf->wl_win->surface,
0, 0, INT32_MAX, INT32_MAX);
if (dri2_dpy->is_different_gpu) {
_EGLContext *ctx = _eglGetCurrentContext();
@@ -1033,6 +1025,7 @@ static struct dri2_egl_display_vtbl dri2_wl_display_vtbl = {
.query_buffer_age = dri2_wl_query_buffer_age,
.create_wayland_buffer_from_image = dri2_wl_create_wayland_buffer_from_image,
.get_sync_values = dri2_fallback_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
static EGLBoolean
@@ -1760,6 +1753,7 @@ static struct dri2_egl_display_vtbl dri2_wl_swrast_display_vtbl = {
.query_buffer_age = dri2_fallback_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri2_fallback_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
static EGLBoolean

View File

@@ -45,6 +45,10 @@
#include "egl_dri2_fallbacks.h"
#include "loader.h"
#ifdef HAVE_DRI3
#include "platform_x11_dri3.h"
#endif
static EGLBoolean
dri2_x11_swap_interval(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf,
EGLint interval);
@@ -703,7 +707,7 @@ dri2_x11_local_authenticate(_EGLDisplay *disp)
static EGLBoolean
dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy,
_EGLDisplay *disp)
_EGLDisplay *disp, bool supports_preserved)
{
xcb_screen_iterator_t s;
xcb_depth_iterator_t d;
@@ -724,8 +728,10 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy,
surface_type =
EGL_WINDOW_BIT |
EGL_PIXMAP_BIT |
EGL_PBUFFER_BIT |
EGL_SWAP_BEHAVIOR_PRESERVED_BIT;
EGL_PBUFFER_BIT;
if (supports_preserved)
surface_type |= EGL_SWAP_BEHAVIOR_PRESERVED_BIT;
while (d.rem > 0) {
EGLBoolean class_added[6] = { 0, };
@@ -1112,6 +1118,7 @@ static struct dri2_egl_display_vtbl dri2_x11_swrast_display_vtbl = {
.query_buffer_age = dri2_fallback_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri2_fallback_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
static struct dri2_egl_display_vtbl dri2_x11_display_vtbl = {
@@ -1130,6 +1137,7 @@ static struct dri2_egl_display_vtbl dri2_x11_display_vtbl = {
.query_buffer_age = dri2_fallback_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri2_x11_get_sync_values,
.get_dri_drawable = dri2_surface_get_dri_drawable,
};
static EGLBoolean
@@ -1179,7 +1187,7 @@ dri2_initialize_x11_swrast(_EGLDriver *drv, _EGLDisplay *disp)
if (!dri2_create_screen(disp))
goto cleanup_driver;
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp))
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp, true))
goto cleanup_configs;
/* Fill vtbl last to prevent accidentally calling virtual function during
@@ -1250,6 +1258,100 @@ dri2_x11_setup_swap_interval(struct dri2_egl_display *dri2_dpy)
}
}
#ifdef HAVE_DRI3
static EGLBoolean
dri2_initialize_x11_dri3(_EGLDriver *drv, _EGLDisplay *disp)
{
struct dri2_egl_display *dri2_dpy;
dri2_dpy = calloc(1, sizeof *dri2_dpy);
if (!dri2_dpy)
return _eglError(EGL_BAD_ALLOC, "eglInitialize");
disp->DriverData = (void *) dri2_dpy;
if (disp->PlatformDisplay == NULL) {
dri2_dpy->conn = xcb_connect(0, &dri2_dpy->screen);
dri2_dpy->own_device = true;
} else {
Display *dpy = disp->PlatformDisplay;
dri2_dpy->conn = XGetXCBConnection(dpy);
dri2_dpy->screen = DefaultScreen(dpy);
}
if (xcb_connection_has_error(dri2_dpy->conn)) {
_eglLog(_EGL_WARNING, "DRI3: xcb_connect failed");
goto cleanup_dpy;
}
if (dri2_dpy->conn) {
if (!dri3_x11_connect(dri2_dpy))
goto cleanup_conn;
}
if (!dri2_load_driver_dri3(disp))
goto cleanup_conn;
dri2_dpy->extensions[0] = &dri3_image_loader_extension.base;
dri2_dpy->extensions[1] = &use_invalidate.base;
dri2_dpy->extensions[2] = &image_lookup_extension.base;
dri2_dpy->extensions[3] = NULL;
dri2_dpy->swap_available = true;
dri2_dpy->invalidate_available = true;
if (!dri2_create_screen(disp))
goto cleanup_fd;
dri2_x11_setup_swap_interval(dri2_dpy);
if (!dri2_dpy->is_different_gpu)
disp->Extensions.KHR_image_pixmap = EGL_TRUE;
disp->Extensions.NOK_texture_from_pixmap = EGL_TRUE;
disp->Extensions.CHROMIUM_sync_control = EGL_TRUE;
disp->Extensions.EXT_buffer_age = EGL_TRUE;
#ifdef HAVE_WAYLAND_PLATFORM
disp->Extensions.WL_bind_wayland_display = EGL_TRUE;
#endif
if (dri2_dpy->conn) {
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp, false))
goto cleanup_configs;
}
dri2_dpy->loader_dri3_ext.core = dri2_dpy->core;
dri2_dpy->loader_dri3_ext.image_driver = dri2_dpy->image_driver;
dri2_dpy->loader_dri3_ext.flush = dri2_dpy->flush;
dri2_dpy->loader_dri3_ext.tex_buffer = dri2_dpy->tex_buffer;
dri2_dpy->loader_dri3_ext.image = dri2_dpy->image;
dri2_dpy->loader_dri3_ext.config = dri2_dpy->config;
/* Fill vtbl last to prevent accidentally calling virtual function during
* initialization.
*/
dri2_dpy->vtbl = &dri3_x11_display_vtbl;
_eglLog(_EGL_INFO, "Using DRI3");
return EGL_TRUE;
cleanup_configs:
_eglCleanupDisplay(disp);
dri2_dpy->core->destroyScreen(dri2_dpy->dri_screen);
dlclose(dri2_dpy->driver);
cleanup_fd:
close(dri2_dpy->fd);
cleanup_conn:
if (disp->PlatformDisplay == NULL)
xcb_disconnect(dri2_dpy->conn);
cleanup_dpy:
free(dri2_dpy);
return EGL_FALSE;
}
#endif
static EGLBoolean
dri2_initialize_x11_dri2(_EGLDriver *drv, _EGLDisplay *disp)
{
@@ -1321,7 +1423,7 @@ dri2_initialize_x11_dri2(_EGLDriver *drv, _EGLDisplay *disp)
disp->Extensions.WL_bind_wayland_display = EGL_TRUE;
#endif
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp))
if (!dri2_x11_add_configs_for_visuals(dri2_dpy, disp, true))
goto cleanup_configs;
/* Fill vtbl last to prevent accidentally calling virtual function during
@@ -1329,6 +1431,8 @@ dri2_initialize_x11_dri2(_EGLDriver *drv, _EGLDisplay *disp)
*/
dri2_dpy->vtbl = &dri2_x11_display_vtbl;
_eglLog(_EGL_INFO, "Using DRI2");
return EGL_TRUE;
cleanup_configs:
@@ -1355,9 +1459,16 @@ dri2_initialize_x11(_EGLDriver *drv, _EGLDisplay *disp)
int x11_dri2_accel = (getenv("LIBGL_ALWAYS_SOFTWARE") == NULL);
if (x11_dri2_accel) {
if (!dri2_initialize_x11_dri2(drv, disp)) {
initialized = dri2_initialize_x11_swrast(drv, disp);
#ifdef HAVE_DRI3
if (getenv("LIBGL_DRI3_DISABLE") != NULL ||
!dri2_initialize_x11_dri3(drv, disp)) {
#endif
if (!dri2_initialize_x11_dri2(drv, disp)) {
initialized = dri2_initialize_x11_swrast(drv, disp);
}
#ifdef HAVE_DRI3
}
#endif
} else {
initialized = dri2_initialize_x11_swrast(drv, disp);
}

View File

@@ -0,0 +1,547 @@
/*
* Copyright © 2015 Boyan Ding
*
* Permission to use, copy, modify, distribute, and sell this software and its
* documentation for any purpose is hereby granted without fee, provided that
* the above copyright notice appear in all copies and that both that copyright
* notice and this permission notice appear in supporting documentation, and
* that the name of the copyright holders not be used in advertising or
* publicity pertaining to distribution of the software without specific,
* written prior permission. The copyright holders make no representations
* about the suitability of this software for any purpose. It is provided "as
* is" without express or implied warranty.
*
* THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
* INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO
* EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR
* CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
* DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
* TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
* OF THIS SOFTWARE.
*/
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <xcb/xcb.h>
#include <xcb/dri3.h>
#include <xcb/present.h>
#include <xf86drm.h>
#include "egl_dri2.h"
#include "egl_dri2_fallbacks.h"
#include "platform_x11_dri3.h"
#include "loader.h"
#include "loader_dri3_helper.h"
static struct dri3_egl_surface *
loader_drawable_to_egl_surface(struct loader_dri3_drawable *draw) {
size_t offset = offsetof(struct dri3_egl_surface, loader_drawable);
return (struct dri3_egl_surface *)(((void*) draw) - offset);
}
static int
egl_dri3_get_swap_interval(struct loader_dri3_drawable *draw)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
return dri3_surf->base.SwapInterval;
}
static int
egl_dri3_clamp_swap_interval(struct loader_dri3_drawable *draw, int interval)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
if (interval > dri3_surf->base.Config->MaxSwapInterval)
interval = dri3_surf->base.Config->MaxSwapInterval;
else if (interval < dri3_surf->base.Config->MinSwapInterval)
interval = dri3_surf->base.Config->MinSwapInterval;
return interval;
}
static void
egl_dri3_set_swap_interval(struct loader_dri3_drawable *draw, int interval)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
dri3_surf->base.SwapInterval = interval;
}
static void
egl_dri3_set_drawable_size(struct loader_dri3_drawable *draw,
int width, int height)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
dri3_surf->base.Width = width;
dri3_surf->base.Height = height;
}
static bool
egl_dri3_in_current_context(struct loader_dri3_drawable *draw)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
_EGLContext *ctx = _eglGetCurrentContext();
return ctx->Resource.Display == dri3_surf->base.Resource.Display;
}
static __DRIcontext *
egl_dri3_get_dri_context(struct loader_dri3_drawable *draw)
{
_EGLContext *ctx = _eglGetCurrentContext();
struct dri2_egl_context *dri2_ctx = dri2_egl_context(ctx);
return dri2_ctx->dri_context;
}
static void
egl_dri3_flush_drawable(struct loader_dri3_drawable *draw, unsigned flags)
{
struct dri3_egl_surface *dri3_surf = loader_drawable_to_egl_surface(draw);
_EGLDisplay *disp = dri3_surf->base.Resource.Display;
dri2_flush_drawable_for_swapbuffers(disp, &dri3_surf->base);
}
static struct loader_dri3_vtable egl_dri3_vtable = {
.get_swap_interval = egl_dri3_get_swap_interval,
.clamp_swap_interval = egl_dri3_clamp_swap_interval,
.set_swap_interval = egl_dri3_set_swap_interval,
.set_drawable_size = egl_dri3_set_drawable_size,
.in_current_context = egl_dri3_in_current_context,
.get_dri_context = egl_dri3_get_dri_context,
.flush_drawable = egl_dri3_flush_drawable,
.show_fps = NULL,
};
static EGLBoolean
dri3_destroy_surface(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surf);
(void) drv;
if (!_eglPutSurface(surf))
return EGL_TRUE;
loader_dri3_drawable_fini(&dri3_surf->loader_drawable);
free(surf);
return EGL_TRUE;
}
static EGLBoolean
dri3_set_swap_interval(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf,
EGLint interval)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surf);
loader_dri3_set_swap_interval(&dri3_surf->loader_drawable, interval);
return EGL_TRUE;
}
static xcb_screen_t *
get_xcb_screen(xcb_screen_iterator_t iter, int screen)
{
for (; iter.rem; --screen, xcb_screen_next(&iter))
if (screen == 0)
return iter.data;
return NULL;
}
static _EGLSurface *
dri3_create_surface(_EGLDriver *drv, _EGLDisplay *disp, EGLint type,
_EGLConfig *conf, void *native_surface,
const EGLint *attrib_list)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_config *dri2_conf = dri2_egl_config(conf);
struct dri3_egl_surface *dri3_surf;
const __DRIconfig *dri_config;
xcb_drawable_t drawable;
xcb_screen_iterator_t s;
xcb_screen_t *screen;
STATIC_ASSERT(sizeof(uintptr_t) == sizeof(native_surface));
drawable = (uintptr_t) native_surface;
(void) drv;
dri3_surf = calloc(1, sizeof *dri3_surf);
if (!dri3_surf) {
_eglError(EGL_BAD_ALLOC, "dri3_create_surface");
return NULL;
}
if (!_eglInitSurface(&dri3_surf->base, disp, type, conf, attrib_list))
goto cleanup_surf;
if (type == EGL_PBUFFER_BIT) {
s = xcb_setup_roots_iterator(xcb_get_setup(dri2_dpy->conn));
screen = get_xcb_screen(s, dri2_dpy->screen);
if (!screen) {
_eglError(EGL_BAD_NATIVE_WINDOW, "dri3_create_surface");
goto cleanup_surf;
}
drawable = xcb_generate_id(dri2_dpy->conn);
xcb_create_pixmap(dri2_dpy->conn, conf->BufferSize,
drawable, screen->root,
dri3_surf->base.Width, dri3_surf->base.Height);
}
dri_config = dri2_get_dri_config(dri2_conf, type,
dri3_surf->base.GLColorspace);
if (loader_dri3_drawable_init(dri2_dpy->conn, drawable,
dri2_dpy->dri_screen,
dri2_dpy->is_different_gpu, dri_config,
&dri2_dpy->loader_dri3_ext,
&egl_dri3_vtable,
&dri3_surf->loader_drawable)) {
_eglError(EGL_BAD_ALLOC, "dri3_surface_create");
goto cleanup_pixmap;
}
return &dri3_surf->base;
cleanup_pixmap:
if (type == EGL_PBUFFER_BIT)
xcb_free_pixmap(dri2_dpy->conn, drawable);
cleanup_surf:
free(dri3_surf);
return NULL;
}
/**
* Called via eglCreateWindowSurface(), drv->API.CreateWindowSurface().
*/
static _EGLSurface *
dri3_create_window_surface(_EGLDriver *drv, _EGLDisplay *disp,
_EGLConfig *conf, void *native_window,
const EGLint *attrib_list)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
_EGLSurface *surf;
surf = dri3_create_surface(drv, disp, EGL_WINDOW_BIT, conf,
native_window, attrib_list);
if (surf != NULL)
dri3_set_swap_interval(drv, disp, surf, dri2_dpy->default_swap_interval);
return surf;
}
static _EGLSurface *
dri3_create_pixmap_surface(_EGLDriver *drv, _EGLDisplay *disp,
_EGLConfig *conf, void *native_pixmap,
const EGLint *attrib_list)
{
return dri3_create_surface(drv, disp, EGL_PIXMAP_BIT, conf,
native_pixmap, attrib_list);
}
static _EGLSurface *
dri3_create_pbuffer_surface(_EGLDriver *drv, _EGLDisplay *disp,
_EGLConfig *conf, const EGLint *attrib_list)
{
return dri3_create_surface(drv, disp, EGL_PBUFFER_BIT, conf,
XCB_WINDOW_NONE, attrib_list);
}
static EGLBoolean
dri3_get_sync_values(_EGLDisplay *display, _EGLSurface *surface,
EGLuint64KHR *ust, EGLuint64KHR *msc,
EGLuint64KHR *sbc)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surface);
return loader_dri3_wait_for_msc(&dri3_surf->loader_drawable, 0, 0, 0,
(int64_t *) ust, (int64_t *) msc,
(int64_t *) sbc) ? EGL_TRUE : EGL_FALSE;
}
static _EGLImage *
dri3_create_image_khr_pixmap(_EGLDisplay *disp, _EGLContext *ctx,
EGLClientBuffer buffer, const EGLint *attr_list)
{
struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
struct dri2_egl_image *dri2_img;
xcb_drawable_t drawable;
xcb_dri3_buffer_from_pixmap_cookie_t bp_cookie;
xcb_dri3_buffer_from_pixmap_reply_t *bp_reply;
unsigned int format;
drawable = (xcb_drawable_t) (uintptr_t) buffer;
bp_cookie = xcb_dri3_buffer_from_pixmap(dri2_dpy->conn, drawable);
bp_reply = xcb_dri3_buffer_from_pixmap_reply(dri2_dpy->conn,
bp_cookie, NULL);
if (!bp_reply) {
_eglError(EGL_BAD_ALLOC, "xcb_dri3_buffer_from_pixmap");
return NULL;
}
switch (bp_reply->depth) {
case 16:
format = __DRI_IMAGE_FORMAT_RGB565;
break;
case 24:
format = __DRI_IMAGE_FORMAT_XRGB8888;
break;
case 32:
format = __DRI_IMAGE_FORMAT_ARGB8888;
break;
default:
_eglError(EGL_BAD_PARAMETER,
"dri3_create_image_khr: unsupported pixmap depth");
free(bp_reply);
return EGL_NO_IMAGE_KHR;
}
dri2_img = malloc(sizeof *dri2_img);
if (!dri2_img) {
_eglError(EGL_BAD_ALLOC, "dri3_create_image_khr");
return EGL_NO_IMAGE_KHR;
}
if (!_eglInitImage(&dri2_img->base, disp)) {
free(dri2_img);
return EGL_NO_IMAGE_KHR;
}
dri2_img->dri_image = loader_dri3_create_image(dri2_dpy->conn,
bp_reply,
format,
dri2_dpy->dri_screen,
dri2_dpy->image,
dri2_img);
free(bp_reply);
return &dri2_img->base;
}
static _EGLImage *
dri3_create_image_khr(_EGLDriver *drv, _EGLDisplay *disp,
_EGLContext *ctx, EGLenum target,
EGLClientBuffer buffer, const EGLint *attr_list)
{
(void) drv;
switch (target) {
case EGL_NATIVE_PIXMAP_KHR:
return dri3_create_image_khr_pixmap(disp, ctx, buffer, attr_list);
default:
return dri2_create_image_khr(drv, disp, ctx, target, buffer, attr_list);
}
}
/**
* Called by the driver when it needs to update the real front buffer with the
* contents of its fake front buffer.
*/
static void
dri3_flush_front_buffer(__DRIdrawable *driDrawable, void *loaderPrivate)
{
/* There does not seem to be any kind of consensus on whether we should
* support front-buffer rendering or not:
* http://lists.freedesktop.org/archives/mesa-dev/2013-June/040129.html
*/
_eglLog(_EGL_WARNING, "FIXME: egl/x11 doesn't support front buffer rendering.");
(void) driDrawable;
(void) loaderPrivate;
}
const __DRIimageLoaderExtension dri3_image_loader_extension = {
.base = { __DRI_IMAGE_LOADER, 1 },
.getBuffers = loader_dri3_get_buffers,
.flushFrontBuffer = dri3_flush_front_buffer,
};
static EGLBoolean
dri3_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *draw)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(draw);
/* No-op for a pixmap or pbuffer surface */
if (draw->Type == EGL_PIXMAP_BIT || draw->Type == EGL_PBUFFER_BIT)
return 0;
return loader_dri3_swap_buffers_msc(&dri3_surf->loader_drawable,
0, 0, 0, 0,
draw->SwapBehavior == EGL_BUFFER_PRESERVED) != -1;
}
static EGLBoolean
dri3_copy_buffers(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf,
void *native_pixmap_target)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surf);
xcb_pixmap_t target;
STATIC_ASSERT(sizeof(uintptr_t) == sizeof(native_pixmap_target));
target = (uintptr_t) native_pixmap_target;
loader_dri3_copy_drawable(&dri3_surf->loader_drawable, target,
dri3_surf->loader_drawable.drawable);
return EGL_TRUE;
}
static int
dri3_query_buffer_age(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surf)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surf);
return loader_dri3_query_buffer_age(&dri3_surf->loader_drawable);
}
static __DRIdrawable *
dri3_get_dri_drawable(_EGLSurface *surf)
{
struct dri3_egl_surface *dri3_surf = dri3_egl_surface(surf);
return dri3_surf->loader_drawable.dri_drawable;
}
struct dri2_egl_display_vtbl dri3_x11_display_vtbl = {
.authenticate = NULL,
.create_window_surface = dri3_create_window_surface,
.create_pixmap_surface = dri3_create_pixmap_surface,
.create_pbuffer_surface = dri3_create_pbuffer_surface,
.destroy_surface = dri3_destroy_surface,
.create_image = dri3_create_image_khr,
.swap_interval = dri3_set_swap_interval,
.swap_buffers = dri3_swap_buffers,
.swap_buffers_with_damage = dri2_fallback_swap_buffers_with_damage,
.swap_buffers_region = dri2_fallback_swap_buffers_region,
.post_sub_buffer = dri2_fallback_post_sub_buffer,
.copy_buffers = dri3_copy_buffers,
.query_buffer_age = dri3_query_buffer_age,
.create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image,
.get_sync_values = dri3_get_sync_values,
.get_dri_drawable = dri3_get_dri_drawable,
};
static char *
dri3_get_device_name(int fd)
{
char *ret = NULL;
ret = drmGetRenderDeviceNameFromFd(fd);
if (ret)
return ret;
/* For dri3, render node support is required for WL_bind_wayland_display.
* In order not to regress on older systems without kernel or libdrm
* support, fall back to dri2. User can override it with environment
* variable if they don't need to use that extension.
*/
if (getenv("EGL_FORCE_DRI3") == NULL) {
_eglLog(_EGL_WARNING, "Render node support not available, falling back to dri2");
_eglLog(_EGL_WARNING, "If you want to force dri3, set EGL_FORCE_DRI3 environment variable");
} else
ret = loader_get_device_name_for_fd(fd);
return ret;
}
EGLBoolean
dri3_x11_connect(struct dri2_egl_display *dri2_dpy)
{
xcb_dri3_query_version_reply_t *dri3_query;
xcb_dri3_query_version_cookie_t dri3_query_cookie;
xcb_present_query_version_reply_t *present_query;
xcb_present_query_version_cookie_t present_query_cookie;
xcb_generic_error_t *error;
xcb_screen_iterator_t s;
xcb_screen_t *screen;
const xcb_query_extension_reply_t *extension;
xcb_prefetch_extension_data (dri2_dpy->conn, &xcb_dri3_id);
xcb_prefetch_extension_data (dri2_dpy->conn, &xcb_present_id);
extension = xcb_get_extension_data(dri2_dpy->conn, &xcb_dri3_id);
if (!(extension && extension->present))
return EGL_FALSE;
extension = xcb_get_extension_data(dri2_dpy->conn, &xcb_present_id);
if (!(extension && extension->present))
return EGL_FALSE;
dri3_query_cookie = xcb_dri3_query_version(dri2_dpy->conn,
XCB_DRI3_MAJOR_VERSION,
XCB_DRI3_MINOR_VERSION);
present_query_cookie = xcb_present_query_version(dri2_dpy->conn,
XCB_PRESENT_MAJOR_VERSION,
XCB_PRESENT_MINOR_VERSION);
dri3_query =
xcb_dri3_query_version_reply(dri2_dpy->conn, dri3_query_cookie, &error);
if (dri3_query == NULL || error != NULL) {
_eglLog(_EGL_WARNING, "DRI3: failed to query the version");
free(dri3_query);
free(error);
return EGL_FALSE;
}
free(dri3_query);
present_query =
xcb_present_query_version_reply(dri2_dpy->conn,
present_query_cookie, &error);
if (present_query == NULL || error != NULL) {
_eglLog(_EGL_WARNING, "DRI3: failed to query Present version");
free(present_query);
free(error);
return EGL_FALSE;
}
free(present_query);
s = xcb_setup_roots_iterator(xcb_get_setup(dri2_dpy->conn));
screen = get_xcb_screen(s, dri2_dpy->screen);
if (!screen) {
_eglError(EGL_BAD_NATIVE_WINDOW, "dri3_x11_connect");
return EGL_FALSE;
}
dri2_dpy->fd = loader_dri3_open(dri2_dpy->conn, screen->root, 0);
if (dri2_dpy->fd < 0) {
int conn_error = xcb_connection_has_error(dri2_dpy->conn);
_eglLog(_EGL_WARNING, "DRI3: Screen seems not DRI3 capable");
if (conn_error)
_eglLog(_EGL_WARNING, "DRI3: Failed to initialize");
return EGL_FALSE;
}
dri2_dpy->fd = loader_get_user_preferred_fd(dri2_dpy->fd, &dri2_dpy->is_different_gpu);
dri2_dpy->driver_name = loader_get_driver_for_fd(dri2_dpy->fd, 0);
if (!dri2_dpy->driver_name) {
_eglLog(_EGL_WARNING, "DRI3: No driver found");
close(dri2_dpy->fd);
return EGL_FALSE;
}
dri2_dpy->device_name = dri3_get_device_name(dri2_dpy->fd);
if (!dri2_dpy->device_name) {
close(dri2_dpy->fd);
return EGL_FALSE;
}
return EGL_TRUE;
}

View File

@@ -0,0 +1,41 @@
/*
* Copyright © 2015 Boyan Ding
*
* Permission to use, copy, modify, distribute, and sell this software and its
* documentation for any purpose is hereby granted without fee, provided that
* the above copyright notice appear in all copies and that both that copyright
* notice and this permission notice appear in supporting documentation, and
* that the name of the copyright holders not be used in advertising or
* publicity pertaining to distribution of the software without specific,
* written prior permission. The copyright holders make no representations
* about the suitability of this software for any purpose. It is provided "as
* is" without express or implied warranty.
*
* THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
* INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO
* EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR
* CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
* DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
* TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
* OF THIS SOFTWARE.
*/
#ifndef EGL_X11_DRI3_INCLUDED
#define EGL_X11_DRI3_INCLUDED
#include "egl_dri2.h"
_EGL_DRIVER_TYPECAST(dri3_egl_surface, _EGLSurface, obj)
struct dri3_egl_surface {
_EGLSurface base;
struct loader_dri3_drawable loader_drawable;
};
extern const __DRIimageLoaderExtension dri3_image_loader_extension;
extern struct dri2_egl_display_vtbl dri3_x11_display_vtbl;
EGLBoolean
dri3_x11_connect(struct dri2_egl_display *dri2_dpy);
#endif

55
src/egl/egl-symbols-check Executable file
View File

@@ -0,0 +1,55 @@
#!/bin/bash
FUNCS=$(nm -D --defined-only ${1-.libs/libEGL.so} | grep -o "T .*" | cut -c 3- | while read func; do
( grep -q "^$func$" || echo $func ) <<EOF
eglBindAPI
eglBindTexImage
eglChooseConfig
eglClientWaitSync
eglCopyBuffers
eglCreateContext
eglCreateImage
eglCreatePbufferFromClientBuffer
eglCreatePbufferSurface
eglCreatePixmapSurface
eglCreatePlatformPixmapSurface
eglCreatePlatformWindowSurface
eglCreateSync
eglCreateWindowSurface
eglDestroyContext
eglDestroyImage
eglDestroySurface
eglDestroySync
eglGetConfigAttrib
eglGetConfigs
eglGetCurrentContext
eglGetCurrentDisplay
eglGetCurrentSurface
eglGetDisplay
eglGetError
eglGetPlatformDisplay
eglGetProcAddress
eglGetSyncAttrib
eglInitialize
eglMakeCurrent
eglQueryAPI
eglQueryContext
eglQueryString
eglQuerySurface
eglReleaseTexImage
eglReleaseThread
eglSurfaceAttrib
eglSwapBuffers
eglSwapInterval
eglTerminate
eglWaitClient
eglWaitGL
eglWaitNative
eglWaitSync
_fini
_init
EOF
done)
test ! -n "$FUNCS" || echo $FUNCS
test ! -n "$FUNCS"

View File

@@ -27,6 +27,7 @@ GALLIUM_TOP := $(call my-dir)
GALLIUM_COMMON_MK := $(GALLIUM_TOP)/Android.common.mk
SUBDIRS := auxiliary
SUBDIRS += auxiliary/pipe-loader
#
# Gallium drivers and their respective winsys

View File

@@ -67,3 +67,9 @@ if HAVE_DRISW
GALLIUM_PIPE_LOADER_WINSYS_LIBS += \
$(top_builddir)/src/gallium/winsys/sw/dri/libswdri.la
endif
if HAVE_DRISW_KMS
GALLIUM_PIPE_LOADER_WINSYS_LIBS += \
$(top_builddir)/src/gallium/winsys/sw/kms-dri/libswkmsdri.la \
$(LIBDRM_LIBS)
endif

View File

@@ -5,6 +5,7 @@ SUBDIRS =
##
SUBDIRS += auxiliary
SUBDIRS += auxiliary/pipe-loader
##
## Gallium pipe drivers and their respective winsys'
@@ -82,6 +83,11 @@ if HAVE_GALLIUM_VC4
SUBDIRS += drivers/vc4 winsys/vc4/drm
endif
## virgl
if HAVE_GALLIUM_VIRGL
SUBDIRS += drivers/virgl winsys/virgl/drm winsys/virgl/vtest
endif
## the sw winsys'
SUBDIRS += winsys/sw/null
@@ -93,7 +99,7 @@ if HAVE_DRISW
SUBDIRS += winsys/sw/dri
endif
if HAVE_DRI2
if HAVE_DRISW_KMS
SUBDIRS += winsys/sw/kms-dri
endif
@@ -115,7 +121,8 @@ EXTRA_DIST = \
## Gallium state trackers and their users (targets)
##
if HAVE_LOADER_GALLIUM
## XXX: Rename the conditional once we have a config switch for static/dynamic pipe-drivers
if HAVE_CLOVER
SUBDIRS += targets/pipe-loader
endif

View File

@@ -5,6 +5,7 @@ Import('env')
#
SConscript('auxiliary/SConscript')
SConscript('auxiliary/pipe-loader/SConscript')
#
# Drivers

View File

@@ -1,7 +1,3 @@
if HAVE_LOADER_GALLIUM
SUBDIRS := pipe-loader
endif
include Makefile.sources
include $(top_srcdir)/src/gallium/Automake.inc
@@ -66,15 +62,7 @@ COMMON_VL_CFLAGS = \
$(AM_CFLAGS) \
$(VL_CFLAGS) \
$(DRI2PROTO_CFLAGS) \
$(LIBDRM_CFLAGS) \
$(GALLIUM_PIPE_LOADER_DEFINES) \
-DPIPE_SEARCH_DIR=\"$(libdir)/gallium-pipe\"
if HAVE_GALLIUM_STATIC_TARGETS
COMMON_VL_CFLAGS += \
-DGALLIUM_STATIC_TARGETS=1
endif # HAVE_GALLIUM_STATIC_TARGETS
$(LIBDRM_CFLAGS)
noinst_LTLIBRARIES += libgalliumvl.la

View File

@@ -349,7 +349,8 @@ VL_SOURCES := \
# XXX: Nuke this as our dri targets no longer depend on VL.
VL_WINSYS_SOURCES := \
vl/vl_winsys_dri.c
vl/vl_winsys_dri.c \
vl/vl_winsys_drm.c
VL_STUB_SOURCES := \
vl/vl_stubs.c
@@ -378,7 +379,9 @@ GALLIVM_SOURCES := \
gallivm/lp_bld_flow.h \
gallivm/lp_bld_format_aos_array.c \
gallivm/lp_bld_format_aos.c \
gallivm/lp_bld_format_cached.c \
gallivm/lp_bld_format_float.c \
gallivm/lp_bld_format.c \
gallivm/lp_bld_format.h \
gallivm/lp_bld_format_soa.c \
gallivm/lp_bld_format_srgb.c \

View File

@@ -625,6 +625,7 @@ generate_vs(struct draw_llvm_variant *variant,
inputs,
outputs,
context_ptr,
NULL,
draw_sampler,
&llvm->draw->vs.vertex_shader->info,
NULL);
@@ -749,7 +750,8 @@ generate_fetch(struct gallivm_state *gallivm,
lp_float32_vec4_type(),
FALSE,
map_ptr,
zero, zero, zero);
zero, zero, zero,
NULL);
LLVMBuildStore(builder, val, temp_ptr);
}
lp_build_endif(&if_ctx);
@@ -2193,6 +2195,7 @@ draw_gs_llvm_generate(struct draw_llvm *llvm,
NULL,
outputs,
context_ptr,
NULL,
sampler,
&llvm->draw->gs.geometry_shader->info,
(const struct lp_build_tgsi_gs_iface *)&gs_iface);

View File

@@ -355,8 +355,9 @@ struct draw_vertex_info {
};
/* these flags are set if the primitive is a segment of a larger one */
#define DRAW_SPLIT_BEFORE 0x1
#define DRAW_SPLIT_AFTER 0x2
#define DRAW_SPLIT_BEFORE 0x1
#define DRAW_SPLIT_AFTER 0x2
#define DRAW_LINE_LOOP_AS_STRIP 0x4
struct draw_prim_info {
boolean linear;

View File

@@ -359,6 +359,16 @@ fetch_pipeline_generic(struct draw_pt_middle_end *middle,
}
static inline unsigned
prim_type(unsigned prim, unsigned flags)
{
if (flags & DRAW_LINE_LOOP_AS_STRIP)
return PIPE_PRIM_LINE_STRIP;
else
return prim;
}
static void
fetch_pipeline_run(struct draw_pt_middle_end *middle,
const unsigned *fetch_elts,
@@ -380,7 +390,7 @@ fetch_pipeline_run(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = draw_count;
prim_info.elts = draw_elts;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &draw_count;
@@ -408,7 +418,7 @@ fetch_pipeline_linear_run(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = count;
prim_info.elts = NULL;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &count;
@@ -439,7 +449,7 @@ fetch_pipeline_linear_run_elts(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = draw_count;
prim_info.elts = draw_elts;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &draw_count;

View File

@@ -473,6 +473,16 @@ llvm_pipeline_generic(struct draw_pt_middle_end *middle,
}
static inline unsigned
prim_type(unsigned prim, unsigned flags)
{
if (flags & DRAW_LINE_LOOP_AS_STRIP)
return PIPE_PRIM_LINE_STRIP;
else
return prim;
}
static void
llvm_middle_end_run(struct draw_pt_middle_end *middle,
const unsigned *fetch_elts,
@@ -494,7 +504,7 @@ llvm_middle_end_run(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = draw_count;
prim_info.elts = draw_elts;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &draw_count;
@@ -522,7 +532,7 @@ llvm_middle_end_linear_run(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = count;
prim_info.elts = NULL;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &count;
@@ -552,7 +562,7 @@ llvm_middle_end_linear_run_elts(struct draw_pt_middle_end *middle,
prim_info.start = 0;
prim_info.count = draw_count;
prim_info.elts = draw_elts;
prim_info.prim = fpme->input_prim;
prim_info.prim = prim_type(fpme->input_prim, prim_flags);
prim_info.flags = prim_flags;
prim_info.primitive_count = 1;
prim_info.primitive_lengths = &draw_count;

View File

@@ -249,6 +249,9 @@ vsplit_segment_loop_linear(struct vsplit_frontend *vsplit, unsigned flags,
assert(icount + !!close_loop <= vsplit->segment_size);
/* need to draw the sections of the line loop as line strips */
flags |= DRAW_LINE_LOOP_AS_STRIP;
if (close_loop) {
for (nr = 0; nr < icount; nr++)
vsplit->fetch_elts[nr] = istart + nr;

View File

@@ -0,0 +1,56 @@
/**************************************************************************
*
* Copyright 2010 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
* THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
**************************************************************************/
#include "lp_bld_format.h"
LLVMTypeRef
lp_build_format_cache_type(struct gallivm_state *gallivm)
{
LLVMTypeRef elem_types[LP_BUILD_FORMAT_CACHE_MEMBER_COUNT];
LLVMTypeRef s;
elem_types[LP_BUILD_FORMAT_CACHE_MEMBER_DATA] =
LLVMArrayType(LLVMInt32TypeInContext(gallivm->context),
LP_BUILD_FORMAT_CACHE_SIZE * 16);
elem_types[LP_BUILD_FORMAT_CACHE_MEMBER_TAGS] =
LLVMArrayType(LLVMInt64TypeInContext(gallivm->context),
LP_BUILD_FORMAT_CACHE_SIZE);
#if LP_BUILD_FORMAT_CACHE_DEBUG
elem_types[LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_TOTAL] =
LLVMInt64TypeInContext(gallivm->context);
elem_types[LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_MISS] =
LLVMInt64TypeInContext(gallivm->context);
#endif
s = LLVMStructTypeInContext(gallivm->context, elem_types,
LP_BUILD_FORMAT_CACHE_MEMBER_COUNT, 0);
return s;
}

View File

@@ -44,6 +44,45 @@ struct lp_type;
struct lp_build_context;
#define LP_BUILD_FORMAT_CACHE_DEBUG 0
/*
* Block cache
*
* Optional block cache to be used when unpacking big pixel blocks.
* Must be a power of 2
*/
#define LP_BUILD_FORMAT_CACHE_SIZE 128
/*
* Note: cache_data needs 16 byte alignment.
*/
struct lp_build_format_cache
{
PIPE_ALIGN_VAR(16) uint32_t cache_data[LP_BUILD_FORMAT_CACHE_SIZE][4][4];
uint64_t cache_tags[LP_BUILD_FORMAT_CACHE_SIZE];
#if LP_BUILD_FORMAT_CACHE_DEBUG
uint64_t cache_access_total;
uint64_t cache_access_miss;
#endif
};
enum {
LP_BUILD_FORMAT_CACHE_MEMBER_DATA = 0,
LP_BUILD_FORMAT_CACHE_MEMBER_TAGS,
#if LP_BUILD_FORMAT_CACHE_DEBUG
LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_TOTAL,
LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_MISS,
#endif
LP_BUILD_FORMAT_CACHE_MEMBER_COUNT
};
LLVMTypeRef
lp_build_format_cache_type(struct gallivm_state *gallivm);
/*
* AoS
*/
@@ -66,7 +105,8 @@ lp_build_fetch_rgba_aos(struct gallivm_state *gallivm,
LLVMValueRef base_ptr,
LLVMValueRef offset,
LLVMValueRef i,
LLVMValueRef j);
LLVMValueRef j,
LLVMValueRef cache);
LLVMValueRef
lp_build_fetch_rgba_aos_array(struct gallivm_state *gallivm,
@@ -107,13 +147,13 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
LLVMValueRef offsets,
LLVMValueRef i,
LLVMValueRef j,
LLVMValueRef cache,
LLVMValueRef rgba_out[4]);
/*
* YUV
*/
LLVMValueRef
lp_build_fetch_subsampled_rgba_aos(struct gallivm_state *gallivm,
const struct util_format_description *format_desc,
@@ -123,6 +163,18 @@ lp_build_fetch_subsampled_rgba_aos(struct gallivm_state *gallivm,
LLVMValueRef i,
LLVMValueRef j);
LLVMValueRef
lp_build_fetch_cached_texels(struct gallivm_state *gallivm,
const struct util_format_description *format_desc,
unsigned n,
LLVMValueRef base_ptr,
LLVMValueRef offset,
LLVMValueRef i,
LLVMValueRef j,
LLVMValueRef cache);
/*
* special float formats
*/

View File

@@ -370,7 +370,8 @@ lp_build_fetch_rgba_aos(struct gallivm_state *gallivm,
LLVMValueRef base_ptr,
LLVMValueRef offset,
LLVMValueRef i,
LLVMValueRef j)
LLVMValueRef j,
LLVMValueRef cache)
{
LLVMBuilderRef builder = gallivm->builder;
unsigned num_pixels = type.length / 4;
@@ -502,6 +503,34 @@ lp_build_fetch_rgba_aos(struct gallivm_state *gallivm,
return tmp;
}
/*
* s3tc rgb formats
*/
if (format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC && cache) {
struct lp_type tmp_type;
LLVMValueRef tmp;
memset(&tmp_type, 0, sizeof tmp_type);
tmp_type.width = 8;
tmp_type.length = num_pixels * 4;
tmp_type.norm = TRUE;
tmp = lp_build_fetch_cached_texels(gallivm,
format_desc,
num_pixels,
base_ptr,
offset,
i, j,
cache);
lp_build_conv(gallivm,
tmp_type, type,
&tmp, 1, &tmp, 1);
return tmp;
}
/*
* Fallback to util_format_description::fetch_rgba_8unorm().
*/

View File

@@ -0,0 +1,374 @@
/**************************************************************************
*
* Copyright 2015 VMware, Inc.
* All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sub license, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
* IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
* ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
**************************************************************************/
#include "lp_bld_format.h"
#include "lp_bld_type.h"
#include "lp_bld_struct.h"
#include "lp_bld_const.h"
#include "lp_bld_flow.h"
#include "lp_bld_swizzle.h"
#include "util/u_math.h"
/**
* @file
* Complex block-compression based formats are handled here by using a cache,
* so re-decoding of every pixel is not required.
* Especially for bilinear filtering, texel reuse is very high hence even
* a small cache helps.
* The elements in the cache are the decoded blocks - currently things
* are restricted to formats which are 4x4 block based, and the decoded
* texels must fit into 4x8 bits.
* The cache is direct mapped so hitrates aren't all that great and cache
* thrashing could happen.
*
* @author Roland Scheidegger <sroland@vmware.com>
*/
#if LP_BUILD_FORMAT_CACHE_DEBUG
static void
update_cache_access(struct gallivm_state *gallivm,
LLVMValueRef ptr,
unsigned count,
unsigned index)
{
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef member_ptr, cache_access;
assert(index == LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_TOTAL ||
index == LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_MISS);
member_ptr = lp_build_struct_get_ptr(gallivm, ptr, index, "");
cache_access = LLVMBuildLoad(builder, member_ptr, "cache_access");
cache_access = LLVMBuildAdd(builder, cache_access,
LLVMConstInt(LLVMInt64TypeInContext(gallivm->context),
count, 0), "");
LLVMBuildStore(builder, cache_access, member_ptr);
}
#endif
static void
store_cached_block(struct gallivm_state *gallivm,
LLVMValueRef *col,
LLVMValueRef tag_value,
LLVMValueRef hash_index,
LLVMValueRef cache)
{
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef ptr, indices[3];
LLVMTypeRef type_ptr4x32;
unsigned count;
type_ptr4x32 = LLVMPointerType(LLVMVectorType(LLVMInt32TypeInContext(gallivm->context), 4), 0);
indices[0] = lp_build_const_int32(gallivm, 0);
indices[1] = lp_build_const_int32(gallivm, LP_BUILD_FORMAT_CACHE_MEMBER_TAGS);
indices[2] = hash_index;
ptr = LLVMBuildGEP(builder, cache, indices, Elements(indices), "");
LLVMBuildStore(builder, tag_value, ptr);
indices[1] = lp_build_const_int32(gallivm, LP_BUILD_FORMAT_CACHE_MEMBER_DATA);
hash_index = LLVMBuildMul(builder, hash_index,
lp_build_const_int32(gallivm, 16), "");
for (count = 0; count < 4; count++) {
indices[2] = hash_index;
ptr = LLVMBuildGEP(builder, cache, indices, Elements(indices), "");
ptr = LLVMBuildBitCast(builder, ptr, type_ptr4x32, "");
LLVMBuildStore(builder, col[count], ptr);
hash_index = LLVMBuildAdd(builder, hash_index,
lp_build_const_int32(gallivm, 4), "");
}
}
static LLVMValueRef
lookup_cached_pixel(struct gallivm_state *gallivm,
LLVMValueRef ptr,
LLVMValueRef index)
{
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef member_ptr, indices[3];
indices[0] = lp_build_const_int32(gallivm, 0);
indices[1] = lp_build_const_int32(gallivm, LP_BUILD_FORMAT_CACHE_MEMBER_DATA);
indices[2] = index;
member_ptr = LLVMBuildGEP(builder, ptr, indices, Elements(indices), "");
return LLVMBuildLoad(builder, member_ptr, "cache_data");
}
static LLVMValueRef
lookup_tag_data(struct gallivm_state *gallivm,
LLVMValueRef ptr,
LLVMValueRef index)
{
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef member_ptr, indices[3];
indices[0] = lp_build_const_int32(gallivm, 0);
indices[1] = lp_build_const_int32(gallivm, LP_BUILD_FORMAT_CACHE_MEMBER_TAGS);
indices[2] = index;
member_ptr = LLVMBuildGEP(builder, ptr, indices, Elements(indices), "");
return LLVMBuildLoad(builder, member_ptr, "tag_data");
}
static void
update_cached_block(struct gallivm_state *gallivm,
const struct util_format_description *format_desc,
LLVMValueRef ptr_addr,
LLVMValueRef hash_index,
LLVMValueRef cache)
{
LLVMBuilderRef builder = gallivm->builder;
LLVMTypeRef i8t = LLVMInt8TypeInContext(gallivm->context);
LLVMTypeRef pi8t = LLVMPointerType(i8t, 0);
LLVMTypeRef i32t = LLVMInt32TypeInContext(gallivm->context);
LLVMTypeRef i32x4 = LLVMVectorType(LLVMInt32TypeInContext(gallivm->context), 4);
LLVMValueRef function;
LLVMValueRef tag_value, tmp_ptr;
LLVMValueRef col[4];
unsigned i, j;
/*
* Use format_desc->fetch_rgba_8unorm() for each pixel in the block.
* This doesn't actually make any sense whatsoever, someone would need
* to write a function doing this for all pixels in a block (either as
* an external c function or with generated code). Don't ask.
*/
{
/*
* Function to call looks like:
* fetch(uint8_t *dst, const uint8_t *src, unsigned i, unsigned j)
*/
LLVMTypeRef ret_type;
LLVMTypeRef arg_types[4];
LLVMTypeRef function_type;
assert(format_desc->fetch_rgba_8unorm);
ret_type = LLVMVoidTypeInContext(gallivm->context);
arg_types[0] = pi8t;
arg_types[1] = pi8t;
arg_types[2] = i32t;
arg_types[3] = i32t;
function_type = LLVMFunctionType(ret_type, arg_types,
Elements(arg_types), 0);
/* make const pointer for the C fetch_rgba_8unorm function */
function = lp_build_const_int_pointer(gallivm,
func_to_pointer((func_pointer) format_desc->fetch_rgba_8unorm));
/* cast the callee pointer to the function's type */
function = LLVMBuildBitCast(builder, function,
LLVMPointerType(function_type, 0),
"cast callee");
}
tmp_ptr = lp_build_array_alloca(gallivm, i32x4,
lp_build_const_int32(gallivm, 16),
"tmp_decode_store");
tmp_ptr = LLVMBuildBitCast(builder, tmp_ptr, pi8t, "");
/*
* Invoke format_desc->fetch_rgba_8unorm() for each pixel.
* This is going to be really really slow.
* Note: the block store format is actually
* x0y0x0y1x0y2x0y3 x1y0x1y1x1y2x1y3 ...
*/
for (i = 0; i < 4; ++i) {
for (j = 0; j < 4; ++j) {
LLVMValueRef args[4];
LLVMValueRef dst_offset = lp_build_const_int32(gallivm, (i * 4 + j) * 4);
/*
* Note we actually supply a pointer to the start of the block,
* not the start of the texture.
*/
args[0] = LLVMBuildGEP(gallivm->builder, tmp_ptr, &dst_offset, 1, "");
args[1] = ptr_addr;
args[2] = LLVMConstInt(i32t, i, 0);
args[3] = LLVMConstInt(i32t, j, 0);
LLVMBuildCall(builder, function, args, Elements(args), "");
}
}
/* Finally store the block - pointless mem copy + update tag. */
tmp_ptr = LLVMBuildBitCast(builder, tmp_ptr, LLVMPointerType(i32x4, 0), "");
for (i = 0; i < 4; ++i) {
LLVMValueRef tmp_offset = lp_build_const_int32(gallivm, i);
LLVMValueRef ptr = LLVMBuildGEP(gallivm->builder, tmp_ptr, &tmp_offset, 1, "");
col[i] = LLVMBuildLoad(builder, ptr, "");
}
tag_value = LLVMBuildPtrToInt(gallivm->builder, ptr_addr,
LLVMInt64TypeInContext(gallivm->context), "");
store_cached_block(gallivm, col, tag_value, hash_index, cache);
}
/*
* Do a cached lookup.
*
* Returns (vectors of) 4x8 rgba aos value
*/
LLVMValueRef
lp_build_fetch_cached_texels(struct gallivm_state *gallivm,
const struct util_format_description *format_desc,
unsigned n,
LLVMValueRef base_ptr,
LLVMValueRef offset,
LLVMValueRef i,
LLVMValueRef j,
LLVMValueRef cache)
{
LLVMBuilderRef builder = gallivm->builder;
unsigned count, low_bit, log2size;
LLVMValueRef color, offset_stored, addr, ptr_addrtrunc, tmp;
LLVMValueRef ij_index, hash_index, hash_mask, block_index;
LLVMTypeRef i8t = LLVMInt8TypeInContext(gallivm->context);
LLVMTypeRef i32t = LLVMInt32TypeInContext(gallivm->context);
LLVMTypeRef i64t = LLVMInt64TypeInContext(gallivm->context);
struct lp_type type;
struct lp_build_context bld32;
memset(&type, 0, sizeof type);
type.width = 32;
type.length = n;
assert(format_desc->block.width == 4);
assert(format_desc->block.height == 4);
lp_build_context_init(&bld32, gallivm, type);
/*
* compute hash - we use direct mapped cache, the hash function could
* be better but it needs to be simple
* per-element:
* compare offset with offset stored at tag (hash)
* if not equal decode/store block, update tag
* extract color from cache
* assemble result vector
*/
/* TODO: not ideal with 32bit pointers... */
low_bit = util_logbase2(format_desc->block.bits / 8);
log2size = util_logbase2(LP_BUILD_FORMAT_CACHE_SIZE);
addr = LLVMBuildPtrToInt(builder, base_ptr, i64t, "");
ptr_addrtrunc = LLVMBuildPtrToInt(builder, base_ptr, i32t, "");
ptr_addrtrunc = lp_build_broadcast_scalar(&bld32, ptr_addrtrunc);
/* For the hash function, first mask off the unused lowest bits. Then just
do some xor with address bits - only use lower 32bits */
ptr_addrtrunc = LLVMBuildAdd(builder, offset, ptr_addrtrunc, "");
ptr_addrtrunc = LLVMBuildLShr(builder, ptr_addrtrunc,
lp_build_const_int_vec(gallivm, type, low_bit), "");
/* This only really makes sense for size 64,128,256 */
hash_index = ptr_addrtrunc;
ptr_addrtrunc = LLVMBuildLShr(builder, ptr_addrtrunc,
lp_build_const_int_vec(gallivm, type, 2*log2size), "");
hash_index = LLVMBuildXor(builder, ptr_addrtrunc, hash_index, "");
tmp = LLVMBuildLShr(builder, hash_index,
lp_build_const_int_vec(gallivm, type, log2size), "");
hash_index = LLVMBuildXor(builder, hash_index, tmp, "");
hash_mask = lp_build_const_int_vec(gallivm, type, LP_BUILD_FORMAT_CACHE_SIZE - 1);
hash_index = LLVMBuildAnd(builder, hash_index, hash_mask, "");
ij_index = LLVMBuildShl(builder, i, lp_build_const_int_vec(gallivm, type, 2), "");
ij_index = LLVMBuildAdd(builder, ij_index, j, "");
block_index = LLVMBuildShl(builder, hash_index,
lp_build_const_int_vec(gallivm, type, 4), "");
block_index = LLVMBuildAdd(builder, ij_index, block_index, "");
if (n > 1) {
color = LLVMGetUndef(LLVMVectorType(i32t, n));
for (count = 0; count < n; count++) {
LLVMValueRef index, cond, colorx;
LLVMValueRef block_indexx, hash_indexx, addrx, offsetx, ptr_addrx;
struct lp_build_if_state if_ctx;
index = lp_build_const_int32(gallivm, count);
offsetx = LLVMBuildExtractElement(builder, offset, index, "");
addrx = LLVMBuildZExt(builder, offsetx, i64t, "");
addrx = LLVMBuildAdd(builder, addrx, addr, "");
block_indexx = LLVMBuildExtractElement(builder, block_index, index, "");
hash_indexx = LLVMBuildLShr(builder, block_indexx,
lp_build_const_int32(gallivm, 4), "");
offset_stored = lookup_tag_data(gallivm, cache, hash_indexx);
cond = LLVMBuildICmp(builder, LLVMIntNE, offset_stored, addrx, "");
lp_build_if(&if_ctx, gallivm, cond);
{
ptr_addrx = LLVMBuildIntToPtr(builder, addrx,
LLVMPointerType(i8t, 0), "");
update_cached_block(gallivm, format_desc, ptr_addrx, hash_indexx, cache);
#if LP_BUILD_FORMAT_CACHE_DEBUG
update_cache_access(gallivm, cache, 1,
LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_MISS);
#endif
}
lp_build_endif(&if_ctx);
colorx = lookup_cached_pixel(gallivm, cache, block_indexx);
color = LLVMBuildInsertElement(builder, color, colorx,
lp_build_const_int32(gallivm, count), "");
}
}
else {
LLVMValueRef cond;
struct lp_build_if_state if_ctx;
tmp = LLVMBuildZExt(builder, offset, i64t, "");
addr = LLVMBuildAdd(builder, tmp, addr, "");
offset_stored = lookup_tag_data(gallivm, cache, hash_index);
cond = LLVMBuildICmp(builder, LLVMIntNE, offset_stored, addr, "");
lp_build_if(&if_ctx, gallivm, cond);
{
tmp = LLVMBuildIntToPtr(builder, addr, LLVMPointerType(i8t, 0), "");
update_cached_block(gallivm, format_desc, tmp, hash_index, cache);
#if LP_BUILD_FORMAT_CACHE_DEBUG
update_cache_access(gallivm, cache, 1,
LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_MISS);
#endif
}
lp_build_endif(&if_ctx);
color = lookup_cached_pixel(gallivm, cache, block_index);
}
#if LP_BUILD_FORMAT_CACHE_DEBUG
update_cache_access(gallivm, cache, n,
LP_BUILD_FORMAT_CACHE_MEMBER_ACCESS_TOTAL);
#endif
return LLVMBuildBitCast(builder, color, LLVMVectorType(i8t, n * 4), "");
}

View File

@@ -346,6 +346,7 @@ lp_build_rgba8_to_fi32_soa(struct gallivm_state *gallivm,
* \param i, j the sub-block pixel coordinates. For non-compressed formats
* these will always be (0,0). For compressed formats, i will
* be in [0, block_width-1] and j will be in [0, block_height-1].
* \param cache optional value pointing to a lp_build_format_cache structure
*/
void
lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
@@ -355,6 +356,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
LLVMValueRef offset,
LLVMValueRef i,
LLVMValueRef j,
LLVMValueRef cache,
LLVMValueRef rgba_out[4])
{
LLVMBuilderRef builder = gallivm->builder;
@@ -473,7 +475,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
tmp_type.norm = TRUE;
tmp = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type,
TRUE, base_ptr, offset, i, j);
TRUE, base_ptr, offset, i, j, cache);
lp_build_rgba8_to_fi32_soa(gallivm,
type,
@@ -483,6 +485,39 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
return;
}
if (format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC &&
/* non-srgb case is already handled above */
format_desc->colorspace == UTIL_FORMAT_COLORSPACE_SRGB &&
type.floating && type.width == 32 &&
(type.length == 1 || (type.length % 4 == 0)) &&
cache) {
const struct util_format_description *format_decompressed;
const struct util_format_description *flinear_desc;
LLVMValueRef packed;
flinear_desc = util_format_description(util_format_linear(format_desc->format));
packed = lp_build_fetch_cached_texels(gallivm,
flinear_desc,
type.length,
base_ptr,
offset,
i, j,
cache);
packed = LLVMBuildBitCast(builder, packed,
lp_build_int_vec_type(gallivm, type), "");
/*
* The values are now packed so they match ordinary srgb RGBA8 format,
* hence need to use matching format for unpack.
*/
format_decompressed = util_format_description(PIPE_FORMAT_R8G8B8A8_SRGB);
lp_build_unpack_rgba_soa(gallivm,
format_decompressed,
type,
packed, rgba_out);
return;
}
/*
* Fallback to calling lp_build_fetch_rgba_aos for each pixel.
*
@@ -524,7 +559,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm,
/* Get a single float[4]={R,G,B,A} pixel */
tmp = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type,
TRUE, base_ptr, offset_elem,
i_elem, j_elem);
i_elem, j_elem, cache);
/*
* Insert the AoS tmp value channels into the SoA result vectors at

View File

@@ -427,6 +427,7 @@ lp_build_init(void)
*/
util_cpu_caps.has_avx = 0;
util_cpu_caps.has_avx2 = 0;
util_cpu_caps.has_f16c = 0;
}
#ifdef PIPE_ARCH_PPC_64
@@ -458,7 +459,9 @@ lp_build_init(void)
util_cpu_caps.has_sse3 = 0;
util_cpu_caps.has_ssse3 = 0;
util_cpu_caps.has_sse4_1 = 0;
util_cpu_caps.has_sse4_2 = 0;
util_cpu_caps.has_avx = 0;
util_cpu_caps.has_avx2 = 0;
util_cpu_caps.has_f16c = 0;
#endif

View File

@@ -137,6 +137,8 @@ gallivm_get_shader_param(enum pipe_shader_cap param)
case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
return 0;
case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT:
return 32;
}
/* if we get here, we missed a shader cap above (and should have seen
* a compiler warning.)

View File

@@ -497,20 +497,57 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
#endif
}
llvm::SmallVector<std::string, 1> MAttrs;
if (util_cpu_caps.has_avx) {
/*
* AVX feature is not automatically detected from CPUID by the X86 target
* yet, because the old (yet default) JIT engine is not capable of
* emitting the opcodes. On newer llvm versions it is and at least some
* versions (tested with 3.3) will emit avx opcodes without this anyway.
*/
MAttrs.push_back("+avx");
if (util_cpu_caps.has_f16c) {
MAttrs.push_back("+f16c");
}
builder.setMAttrs(MAttrs);
llvm::SmallVector<std::string, 16> MAttrs;
#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
/*
* We need to unset attributes because sometimes LLVM mistakenly assumes
* certain features are present given the processor name.
*
* https://bugs.freedesktop.org/show_bug.cgi?id=92214
* http://llvm.org/PR25021
* http://llvm.org/PR19429
* http://llvm.org/PR16721
*/
MAttrs.push_back(util_cpu_caps.has_sse ? "+sse" : "-sse" );
MAttrs.push_back(util_cpu_caps.has_sse2 ? "+sse2" : "-sse2" );
MAttrs.push_back(util_cpu_caps.has_sse3 ? "+sse3" : "-sse3" );
MAttrs.push_back(util_cpu_caps.has_ssse3 ? "+ssse3" : "-ssse3" );
#if HAVE_LLVM >= 0x0304
MAttrs.push_back(util_cpu_caps.has_sse4_1 ? "+sse4.1" : "-sse4.1");
#else
MAttrs.push_back(util_cpu_caps.has_sse4_1 ? "+sse41" : "-sse41" );
#endif
#if HAVE_LLVM >= 0x0304
MAttrs.push_back(util_cpu_caps.has_sse4_2 ? "+sse4.2" : "-sse4.2");
#else
MAttrs.push_back(util_cpu_caps.has_sse4_2 ? "+sse42" : "-sse42" );
#endif
/*
* AVX feature is not automatically detected from CPUID by the X86 target
* yet, because the old (yet default) JIT engine is not capable of
* emitting the opcodes. On newer llvm versions it is and at least some
* versions (tested with 3.3) will emit avx opcodes without this anyway.
*/
MAttrs.push_back(util_cpu_caps.has_avx ? "+avx" : "-avx");
MAttrs.push_back(util_cpu_caps.has_f16c ? "+f16c" : "-f16c");
MAttrs.push_back(util_cpu_caps.has_avx2 ? "+avx2" : "-avx2");
#endif
#if defined(PIPE_ARCH_PPC)
MAttrs.push_back(util_cpu_caps.has_altivec ? "+altivec" : "-altivec");
#if HAVE_LLVM >= 0x0304
/*
* Make sure VSX instructions are disabled
* See LLVM bug https://llvm.org/bugs/show_bug.cgi?id=25503#c7
*/
if (util_cpu_caps.has_altivec) {
MAttrs.push_back("-vsx");
}
#endif
#endif
builder.setMAttrs(MAttrs);
#if HAVE_LLVM >= 0x0305
StringRef MCPU = llvm::sys::getHostCPUName();

View File

@@ -99,6 +99,7 @@ struct lp_sampler_params
unsigned sampler_index;
unsigned sample_key;
LLVMValueRef context_ptr;
LLVMValueRef thread_data_ptr;
const LLVMValueRef *coords;
const LLVMValueRef *offsets;
LLVMValueRef lod;
@@ -267,6 +268,17 @@ struct lp_sampler_dynamic_state
struct gallivm_state *gallivm,
LLVMValueRef context_ptr,
unsigned sampler_unit);
/**
* Obtain texture cache (returns ptr to lp_build_format_cache).
*
* It's optional: no caching will be done if it's NULL.
*/
LLVMValueRef
(*cache_ptr)(const struct lp_sampler_dynamic_state *state,
struct gallivm_state *gallivm,
LLVMValueRef thread_data_ptr,
unsigned unit);
};
@@ -356,6 +368,7 @@ struct lp_build_sample_context
LLVMValueRef img_stride_array;
LLVMValueRef base_ptr;
LLVMValueRef mip_offsets;
LLVMValueRef cache;
/** Integer vector with texture width, height, depth */
LLVMValueRef int_size;

View File

@@ -593,7 +593,8 @@ lp_build_sample_fetch_image_nearest(struct lp_build_sample_context *bld,
TRUE,
data_ptr, offset,
x_subcoord,
y_subcoord);
y_subcoord,
bld->cache);
}
*colors = rgba8;
@@ -933,7 +934,8 @@ lp_build_sample_fetch_image_linear(struct lp_build_sample_context *bld,
TRUE,
data_ptr, offset[k][j][i],
x_subcoord[i],
y_subcoord[j]);
y_subcoord[j],
bld->cache);
}
neighbors[k][j][i] = rgba8;

View File

@@ -161,6 +161,7 @@ lp_build_sample_texel_soa(struct lp_build_sample_context *bld,
bld->texel_type,
data_ptr, offset,
i, j,
bld->cache,
texel_out);
/*
@@ -405,16 +406,17 @@ lp_build_sample_wrap_linear(struct lp_build_sample_context *bld,
break;
case PIPE_TEX_WRAP_MIRROR_REPEAT:
if (offset) {
offset = lp_build_int_to_float(coord_bld, offset);
offset = lp_build_div(coord_bld, offset, length_f);
coord = lp_build_add(coord_bld, coord, offset);
}
/* compute mirror function */
coord = lp_build_coord_mirror(bld, coord);
/* scale coord to length */
coord = lp_build_mul(coord_bld, coord, length_f);
coord = lp_build_sub(coord_bld, coord, half);
if (offset) {
offset = lp_build_int_to_float(coord_bld, offset);
coord = lp_build_add(coord_bld, coord, offset);
}
/* convert to int, compute lerp weight */
lp_build_ifloor_fract(coord_bld, coord, &coord0, &weight);
@@ -567,12 +569,13 @@ lp_build_sample_wrap_nearest(struct lp_build_sample_context *bld,
coord = lp_build_mul(coord_bld, coord, length_f);
}
if (offset) {
offset = lp_build_int_to_float(coord_bld, offset);
coord = lp_build_add(coord_bld, coord, offset);
}
/* floor */
/* use itrunc instead since we clamp to 0 anyway */
icoord = lp_build_itrunc(coord_bld, coord);
if (offset) {
icoord = lp_build_add(int_coord_bld, icoord, offset);
}
/* clamp to [0, length - 1]. */
icoord = lp_build_clamp(int_coord_bld, icoord, int_coord_bld->zero,
@@ -2387,6 +2390,7 @@ lp_build_fetch_texel(struct lp_build_sample_context *bld,
bld->texel_type,
bld->base_ptr, offset,
i, j,
bld->cache,
colors_out);
if (out_of_bound_ret_zero) {
@@ -2440,6 +2444,7 @@ lp_build_sample_soa_code(struct gallivm_state *gallivm,
unsigned texture_index,
unsigned sampler_index,
LLVMValueRef context_ptr,
LLVMValueRef thread_data_ptr,
const LLVMValueRef *coords,
const LLVMValueRef *offsets,
const struct lp_derivatives *derivs, /* optional */
@@ -2586,6 +2591,10 @@ lp_build_sample_soa_code(struct gallivm_state *gallivm,
derived_sampler_state.wrap_s = PIPE_TEX_WRAP_CLAMP_TO_EDGE;
derived_sampler_state.wrap_t = PIPE_TEX_WRAP_CLAMP_TO_EDGE;
}
/*
* We could force CLAMP to CLAMP_TO_EDGE here if min/mag filter is nearest,
* so AoS path could be used. Not sure it's worth the trouble...
*/
min_img_filter = derived_sampler_state.min_img_filter;
mag_img_filter = derived_sampler_state.mag_img_filter;
@@ -2701,6 +2710,11 @@ lp_build_sample_soa_code(struct gallivm_state *gallivm,
context_ptr, texture_index);
/* Note that mip_offsets is an array[level] of offsets to texture images */
if (dynamic_state->cache_ptr && thread_data_ptr) {
bld.cache = dynamic_state->cache_ptr(dynamic_state, gallivm,
thread_data_ptr, texture_index);
}
/* width, height, depth as single int vector */
if (dims <= 1) {
bld.int_size = tex_width;
@@ -2877,6 +2891,7 @@ lp_build_sample_soa_code(struct gallivm_state *gallivm,
bld4.base_ptr = bld.base_ptr;
bld4.mip_offsets = bld.mip_offsets;
bld4.int_size = bld.int_size;
bld4.cache = bld.cache;
bld4.vector_width = lp_type_width(type4);
@@ -3075,12 +3090,14 @@ lp_build_sample_gen_func(struct gallivm_state *gallivm,
LLVMValueRef offsets[3] = { NULL };
LLVMValueRef lod = NULL;
LLVMValueRef context_ptr;
LLVMValueRef thread_data_ptr = NULL;
LLVMValueRef texel_out[4];
struct lp_derivatives derivs;
struct lp_derivatives *deriv_ptr = NULL;
unsigned num_param = 0;
unsigned i, num_coords, num_derivs, num_offsets, layer;
enum lp_sampler_lod_control lod_control;
boolean need_cache = FALSE;
lod_control = (sample_key & LP_SAMPLER_LOD_CONTROL_MASK) >>
LP_SAMPLER_LOD_CONTROL_SHIFT;
@@ -3088,8 +3105,19 @@ lp_build_sample_gen_func(struct gallivm_state *gallivm,
get_target_info(static_texture_state->target,
&num_coords, &num_derivs, &num_offsets, &layer);
if (dynamic_state->cache_ptr) {
const struct util_format_description *format_desc;
format_desc = util_format_description(static_texture_state->format);
if (format_desc && format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC) {
need_cache = TRUE;
}
}
/* "unpack" arguments */
context_ptr = LLVMGetParam(function, num_param++);
if (need_cache) {
thread_data_ptr = LLVMGetParam(function, num_param++);
}
for (i = 0; i < num_coords; i++) {
coords[i] = LLVMGetParam(function, num_param++);
}
@@ -3140,6 +3168,7 @@ lp_build_sample_gen_func(struct gallivm_state *gallivm,
texture_index,
sampler_index,
context_ptr,
thread_data_ptr,
coords,
offsets,
deriv_ptr,
@@ -3183,6 +3212,7 @@ lp_build_sample_soa_func(struct gallivm_state *gallivm,
const LLVMValueRef *offsets = params->offsets;
const struct lp_derivatives *derivs = params->derivs;
enum lp_sampler_lod_control lod_control;
boolean need_cache = FALSE;
lod_control = (sample_key & LP_SAMPLER_LOD_CONTROL_MASK) >>
LP_SAMPLER_LOD_CONTROL_SHIFT;
@@ -3190,6 +3220,17 @@ lp_build_sample_soa_func(struct gallivm_state *gallivm,
get_target_info(static_texture_state->target,
&num_coords, &num_derivs, &num_offsets, &layer);
if (dynamic_state->cache_ptr) {
const struct util_format_description *format_desc;
format_desc = util_format_description(static_texture_state->format);
if (format_desc && format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC) {
/*
* This is not 100% correct, if we have cache but the
* util_format_s3tc_prefer is true the cache won't get used
* regardless (could hook up the block decode there...) */
need_cache = TRUE;
}
}
/*
* texture function matches are found by name.
* Thus the name has to include both the texture and sampler unit
@@ -3215,6 +3256,9 @@ lp_build_sample_soa_func(struct gallivm_state *gallivm,
*/
arg_types[num_param++] = LLVMTypeOf(params->context_ptr);
if (need_cache) {
arg_types[num_param++] = LLVMTypeOf(params->thread_data_ptr);
}
for (i = 0; i < num_coords; i++) {
arg_types[num_param++] = LLVMTypeOf(coords[0]);
assert(LLVMTypeOf(coords[0]) == LLVMTypeOf(coords[i]));
@@ -3274,6 +3318,9 @@ lp_build_sample_soa_func(struct gallivm_state *gallivm,
num_args = 0;
args[num_args++] = params->context_ptr;
if (need_cache) {
args[num_args++] = params->thread_data_ptr;
}
for (i = 0; i < num_coords; i++) {
args[num_args++] = coords[i];
}
@@ -3378,6 +3425,7 @@ lp_build_sample_soa(const struct lp_static_texture_state *static_texture_state,
params->texture_index,
params->sampler_index,
params->context_ptr,
params->thread_data_ptr,
params->coords,
params->offsets,
params->derivs,

View File

@@ -129,7 +129,8 @@ lp_build_emit_llvm_unary(
unsigned tgsi_opcode,
LLVMValueRef arg0)
{
struct lp_build_emit_data emit_data;
struct lp_build_emit_data emit_data = {{0}};
emit_data.info = tgsi_get_opcode_info(tgsi_opcode);
emit_data.arg_count = 1;
emit_data.args[0] = arg0;
return lp_build_emit_llvm(bld_base, tgsi_opcode, &emit_data);
@@ -142,7 +143,8 @@ lp_build_emit_llvm_binary(
LLVMValueRef arg0,
LLVMValueRef arg1)
{
struct lp_build_emit_data emit_data;
struct lp_build_emit_data emit_data = {{0}};
emit_data.info = tgsi_get_opcode_info(tgsi_opcode);
emit_data.arg_count = 2;
emit_data.args[0] = arg0;
emit_data.args[1] = arg1;
@@ -157,7 +159,8 @@ lp_build_emit_llvm_ternary(
LLVMValueRef arg1,
LLVMValueRef arg2)
{
struct lp_build_emit_data emit_data;
struct lp_build_emit_data emit_data = {{0}};
emit_data.info = tgsi_get_opcode_info(tgsi_opcode);
emit_data.arg_count = 3;
emit_data.args[0] = arg0;
emit_data.args[1] = arg1;

View File

@@ -230,6 +230,7 @@ lp_build_tgsi_soa(struct gallivm_state *gallivm,
const LLVMValueRef (*inputs)[4],
LLVMValueRef (*outputs)[4],
LLVMValueRef context_ptr,
LLVMValueRef thread_data_ptr,
struct lp_build_sampler_soa *sampler,
const struct tgsi_shader_info *info,
const struct lp_build_tgsi_gs_iface *gs_iface);
@@ -447,6 +448,7 @@ struct lp_build_tgsi_soa_context
const LLVMValueRef (*inputs)[TGSI_NUM_CHANNELS];
LLVMValueRef (*outputs)[TGSI_NUM_CHANNELS];
LLVMValueRef context_ptr;
LLVMValueRef thread_data_ptr;
const struct lp_build_sampler_soa *sampler;

View File

@@ -538,12 +538,19 @@ lrp_emit(
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
{
LLVMValueRef tmp;
tmp = lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_SUB,
emit_data->args[1],
emit_data->args[2]);
emit_data->output[emit_data->chan] = lp_build_emit_llvm_ternary(bld_base,
TGSI_OPCODE_MAD, emit_data->args[0], tmp, emit_data->args[2]);
struct lp_build_context *bld = &bld_base->base;
LLVMValueRef inv, a, b;
/* This uses the correct version: (1 - t)*a + t*b
*
* An alternative version is "a + t*(b-a)". The problem is this version
* doesn't return "b" for t = 1, because "a + (b-a)" isn't equal to "b"
* because of the floating-point rounding.
*/
inv = lp_build_sub(bld, bld_base->base.one, emit_data->args[0]);
a = lp_build_mul(bld, emit_data->args[1], emit_data->args[0]);
b = lp_build_mul(bld, emit_data->args[2], inv);
emit_data->output[emit_data->chan] = lp_build_add(bld, a, b);
}
/* TGSI_OPCODE_MAD */

View File

@@ -2321,6 +2321,7 @@ emit_tex( struct lp_build_tgsi_soa_context *bld,
params.texture_index = unit;
params.sampler_index = unit;
params.context_ptr = bld->context_ptr;
params.thread_data_ptr = bld->thread_data_ptr;
params.coords = coords;
params.offsets = offsets;
params.lod = lod;
@@ -2488,6 +2489,7 @@ emit_sample(struct lp_build_tgsi_soa_context *bld,
params.texture_index = texture_unit;
params.sampler_index = sampler_unit;
params.context_ptr = bld->context_ptr;
params.thread_data_ptr = bld->thread_data_ptr;
params.coords = coords;
params.offsets = offsets;
params.lod = lod;
@@ -2606,8 +2608,14 @@ emit_fetch_texels( struct lp_build_tgsi_soa_context *bld,
params.type = bld->bld_base.base.type;
params.sample_key = sample_key;
params.texture_index = unit;
params.sampler_index = unit;
/*
* sampler not actually used, set to 0 so it won't exceed PIPE_MAX_SAMPLERS
* and trigger some assertions with d3d10 where the sampler view number
* can exceed this.
*/
params.sampler_index = 0;
params.context_ptr = bld->context_ptr;
params.thread_data_ptr = bld->thread_data_ptr;
params.coords = coords;
params.offsets = offsets;
params.derivs = NULL;
@@ -3858,6 +3866,7 @@ lp_build_tgsi_soa(struct gallivm_state *gallivm,
const LLVMValueRef (*inputs)[TGSI_NUM_CHANNELS],
LLVMValueRef (*outputs)[TGSI_NUM_CHANNELS],
LLVMValueRef context_ptr,
LLVMValueRef thread_data_ptr,
struct lp_build_sampler_soa *sampler,
const struct tgsi_shader_info *info,
const struct lp_build_tgsi_gs_iface *gs_iface)
@@ -3893,6 +3902,7 @@ lp_build_tgsi_soa(struct gallivm_state *gallivm,
bld.bld_base.info = info;
bld.indirect_files = info->indirect_files;
bld.context_ptr = context_ptr;
bld.thread_data_ptr = thread_data_ptr;
/*
* If the number of temporaries is rather large then we just

View File

@@ -33,6 +33,7 @@
* Set GALLIUM_HUD=help for more info.
*/
#include <signal.h>
#include <stdio.h>
#include "hud/hud_context.h"
@@ -51,12 +52,15 @@
#include "tgsi/tgsi_text.h"
#include "tgsi/tgsi_dump.h"
/* Control the visibility of all HUD contexts */
static boolean huds_visible = TRUE;
struct hud_context {
struct pipe_context *pipe;
struct cso_context *cso;
struct u_upload_mgr *uploader;
struct hud_batch_query_context *batch_query;
struct list_head pane_list;
/* states */
@@ -95,6 +99,13 @@ struct hud_context {
} text, bg, whitelines;
};
#ifdef PIPE_OS_UNIX
static void
signal_visible_handler(int sig, siginfo_t *siginfo, void *context)
{
huds_visible = !huds_visible;
}
#endif
static void
hud_draw_colored_prims(struct hud_context *hud, unsigned prim,
@@ -441,6 +452,9 @@ hud_draw(struct hud_context *hud, struct pipe_resource *tex)
struct hud_pane *pane;
struct hud_graph *gr;
if (!huds_visible)
return;
hud->fb_width = tex->width0;
hud->fb_height = tex->height0;
hud->constants.two_div_fb_width = 2.0f / hud->fb_width;
@@ -510,6 +524,8 @@ hud_draw(struct hud_context *hud, struct pipe_resource *tex)
hud_alloc_vertices(hud, &hud->text, 4 * 512, 4 * sizeof(float));
/* prepare all graphs */
hud_batch_query_update(hud->batch_query);
LIST_FOR_EACH_ENTRY(pane, &hud->pane_list, head) {
LIST_FOR_EACH_ENTRY(gr, &pane->graph_list, head) {
gr->query_new_value(gr);
@@ -903,17 +919,21 @@ hud_parse_env_var(struct hud_context *hud, const char *env)
}
else if (strcmp(name, "samples-passed") == 0 &&
has_occlusion_query(hud->pipe->screen)) {
hud_pipe_query_install(pane, hud->pipe, "samples-passed",
hud_pipe_query_install(&hud->batch_query, pane, hud->pipe,
"samples-passed",
PIPE_QUERY_OCCLUSION_COUNTER, 0, 0,
PIPE_DRIVER_QUERY_TYPE_UINT64,
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE);
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE,
0);
}
else if (strcmp(name, "primitives-generated") == 0 &&
has_streamout(hud->pipe->screen)) {
hud_pipe_query_install(pane, hud->pipe, "primitives-generated",
hud_pipe_query_install(&hud->batch_query, pane, hud->pipe,
"primitives-generated",
PIPE_QUERY_PRIMITIVES_GENERATED, 0, 0,
PIPE_DRIVER_QUERY_TYPE_UINT64,
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE);
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE,
0);
}
else {
boolean processed = FALSE;
@@ -938,17 +958,19 @@ hud_parse_env_var(struct hud_context *hud, const char *env)
if (strcmp(name, pipeline_statistics_names[i]) == 0)
break;
if (i < Elements(pipeline_statistics_names)) {
hud_pipe_query_install(pane, hud->pipe, name,
hud_pipe_query_install(&hud->batch_query, pane, hud->pipe, name,
PIPE_QUERY_PIPELINE_STATISTICS, i,
0, PIPE_DRIVER_QUERY_TYPE_UINT64,
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE);
PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE,
0);
processed = TRUE;
}
}
/* driver queries */
if (!processed) {
if (!hud_driver_query_install(pane, hud->pipe, name)){
if (!hud_driver_query_install(&hud->batch_query, pane, hud->pipe,
name)) {
fprintf(stderr, "gallium_hud: unknown driver query '%s'\n", name);
}
}
@@ -987,6 +1009,9 @@ hud_parse_env_var(struct hud_context *hud, const char *env)
case ',':
env++;
if (!pane)
break;
y += height + hud->font.glyph_height * (pane->num_graphs + 2);
height = 100;
@@ -1122,6 +1147,12 @@ hud_create(struct pipe_context *pipe, struct cso_context *cso)
struct pipe_sampler_view view_templ;
unsigned i;
const char *env = debug_get_option("GALLIUM_HUD", NULL);
unsigned signo = debug_get_num_option("GALLIUM_HUD_TOGGLE_SIGNAL", 0);
#ifdef PIPE_OS_UNIX
static boolean sig_handled = FALSE;
struct sigaction action = {};
#endif
huds_visible = debug_get_bool_option("GALLIUM_HUD_VISIBLE", TRUE);
if (!env || !*env)
return NULL;
@@ -1264,6 +1295,22 @@ hud_create(struct pipe_context *pipe, struct cso_context *cso)
LIST_INITHEAD(&hud->pane_list);
/* setup sig handler once for all hud contexts */
#ifdef PIPE_OS_UNIX
if (!sig_handled && signo != 0) {
action.sa_sigaction = &signal_visible_handler;
action.sa_flags = SA_SIGINFO;
if (signo >= NSIG)
fprintf(stderr, "gallium_hud: invalid signal %u\n", signo);
else if (sigaction(signo, &action, NULL) < 0)
fprintf(stderr, "gallium_hud: unable to set handler for signal %u\n", signo);
fflush(stderr);
sig_handled = TRUE;
}
#endif
hud_parse_env_var(hud, env);
return hud;
}
@@ -1284,6 +1331,7 @@ hud_destroy(struct hud_context *hud)
FREE(pane);
}
hud_batch_query_cleanup(&hud->batch_query);
pipe->delete_fs_state(pipe, hud->fs_color);
pipe->delete_fs_state(pipe, hud->fs_text);
pipe->delete_vs_state(pipe, hud->vs);

View File

@@ -33,6 +33,58 @@
#include "util/u_memory.h"
#include <stdio.h>
#include <inttypes.h>
#ifdef PIPE_OS_WINDOWS
#include <windows.h>
#endif
#ifdef PIPE_OS_WINDOWS
static inline uint64_t
filetime_to_scalar(FILETIME ft)
{
ULARGE_INTEGER uli;
uli.LowPart = ft.dwLowDateTime;
uli.HighPart = ft.dwHighDateTime;
return uli.QuadPart;
}
static boolean
get_cpu_stats(unsigned cpu_index, uint64_t *busy_time, uint64_t *total_time)
{
SYSTEM_INFO sysInfo;
FILETIME ftNow, ftCreation, ftExit, ftKernel, ftUser;
GetSystemInfo(&sysInfo);
assert(sysInfo.dwNumberOfProcessors >= 1);
if (cpu_index != ALL_CPUS && cpu_index >= sysInfo.dwNumberOfProcessors) {
/* Tell hud_get_num_cpus there are only this many CPUs. */
return FALSE;
}
/* Get accumulated user and sys time for all threads */
if (!GetProcessTimes(GetCurrentProcess(), &ftCreation, &ftExit,
&ftKernel, &ftUser))
return FALSE;
GetSystemTimeAsFileTime(&ftNow);
*busy_time = filetime_to_scalar(ftUser) + filetime_to_scalar(ftKernel);
*total_time = filetime_to_scalar(ftNow) - filetime_to_scalar(ftCreation);
/* busy_time already has the time accross all cpus.
* XXX: if we want 100% to mean one CPU, 200% two cpus, eliminate the
* following line.
*/
*total_time *= sysInfo.dwNumberOfProcessors;
/* XXX: we ignore cpu_index, i.e, we assume that the individual CPU usage
* and the system usage are one and the same.
*/
return TRUE;
}
#else
static boolean
get_cpu_stats(unsigned cpu_index, uint64_t *busy_time, uint64_t *total_time)
@@ -81,6 +133,8 @@ get_cpu_stats(unsigned cpu_index, uint64_t *busy_time, uint64_t *total_time)
fclose(f);
return FALSE;
}
#endif
struct cpu_info {
unsigned cpu_index;

View File

@@ -34,13 +34,164 @@
#include "hud/hud_private.h"
#include "pipe/p_screen.h"
#include "os/os_time.h"
#include "util/u_math.h"
#include "util/u_memory.h"
#include <stdio.h>
// Must be a power of two
#define NUM_QUERIES 8
struct hud_batch_query_context {
struct pipe_context *pipe;
unsigned num_query_types;
unsigned allocated_query_types;
unsigned *query_types;
boolean failed;
struct pipe_query *query[NUM_QUERIES];
union pipe_query_result *result[NUM_QUERIES];
unsigned head, pending, results;
};
void
hud_batch_query_update(struct hud_batch_query_context *bq)
{
struct pipe_context *pipe;
if (!bq || bq->failed)
return;
pipe = bq->pipe;
if (bq->query[bq->head])
pipe->end_query(pipe, bq->query[bq->head]);
bq->results = 0;
while (bq->pending) {
unsigned idx = (bq->head - bq->pending + 1) % NUM_QUERIES;
struct pipe_query *query = bq->query[idx];
if (!bq->result[idx])
bq->result[idx] = MALLOC(sizeof(bq->result[idx]->batch[0]) *
bq->num_query_types);
if (!bq->result[idx]) {
fprintf(stderr, "gallium_hud: out of memory.\n");
bq->failed = TRUE;
return;
}
if (!pipe->get_query_result(pipe, query, FALSE, bq->result[idx]))
break;
++bq->results;
--bq->pending;
}
bq->head = (bq->head + 1) % NUM_QUERIES;
if (bq->pending == NUM_QUERIES) {
fprintf(stderr,
"gallium_hud: all queries busy after %i frames, dropping data.\n",
NUM_QUERIES);
assert(bq->query[bq->head]);
pipe->destroy_query(bq->pipe, bq->query[bq->head]);
bq->query[bq->head] = NULL;
}
++bq->pending;
if (!bq->query[bq->head]) {
bq->query[bq->head] = pipe->create_batch_query(pipe,
bq->num_query_types,
bq->query_types);
if (!bq->query[bq->head]) {
fprintf(stderr,
"gallium_hud: create_batch_query failed. You may have "
"selected too many or incompatible queries.\n");
bq->failed = TRUE;
return;
}
}
if (!pipe->begin_query(pipe, bq->query[bq->head])) {
fprintf(stderr,
"gallium_hud: could not begin batch query. You may have "
"selected too many or incompatible queries.\n");
bq->failed = TRUE;
}
}
static boolean
batch_query_add(struct hud_batch_query_context **pbq,
struct pipe_context *pipe, unsigned query_type,
unsigned *result_index)
{
struct hud_batch_query_context *bq = *pbq;
unsigned i;
if (!bq) {
bq = CALLOC_STRUCT(hud_batch_query_context);
if (!bq)
return false;
bq->pipe = pipe;
*pbq = bq;
}
for (i = 0; i < bq->num_query_types; ++i) {
if (bq->query_types[i] == query_type) {
*result_index = i;
return true;
}
}
if (bq->num_query_types == bq->allocated_query_types) {
unsigned new_alloc = MAX2(16, bq->allocated_query_types * 2);
unsigned *new_query_types
= REALLOC(bq->query_types,
bq->allocated_query_types * sizeof(unsigned),
new_alloc * sizeof(unsigned));
if (!new_query_types)
return false;
bq->query_types = new_query_types;
bq->allocated_query_types = new_alloc;
}
bq->query_types[bq->num_query_types] = query_type;
*result_index = bq->num_query_types++;
return true;
}
void
hud_batch_query_cleanup(struct hud_batch_query_context **pbq)
{
struct hud_batch_query_context *bq = *pbq;
unsigned idx;
if (!bq)
return;
*pbq = NULL;
if (bq->query[bq->head] && !bq->failed)
bq->pipe->end_query(bq->pipe, bq->query[bq->head]);
for (idx = 0; idx < NUM_QUERIES; ++idx) {
if (bq->query[idx])
bq->pipe->destroy_query(bq->pipe, bq->query[idx]);
FREE(bq->result[idx]);
}
FREE(bq->query_types);
FREE(bq);
}
struct query_info {
struct pipe_context *pipe;
struct hud_batch_query_context *batch;
unsigned query_type;
unsigned result_index; /* unit depends on query_type */
enum pipe_driver_query_result_type result_type;
@@ -48,7 +199,6 @@ struct query_info {
/* Ring of queries. If a query is busy, we use another slot. */
struct pipe_query *query[NUM_QUERIES];
unsigned head, tail;
unsigned num_queries;
uint64_t last_time;
uint64_t results_cumulative;
@@ -56,11 +206,26 @@ struct query_info {
};
static void
query_new_value(struct hud_graph *gr)
query_new_value_batch(struct query_info *info)
{
struct hud_batch_query_context *bq = info->batch;
unsigned result_index = info->result_index;
unsigned idx = (bq->head - bq->pending) % NUM_QUERIES;
unsigned results = bq->results;
while (results) {
info->results_cumulative += bq->result[idx]->batch[result_index].u64;
++info->num_results;
--results;
idx = (idx - 1) % NUM_QUERIES;
}
}
static void
query_new_value_normal(struct query_info *info)
{
struct query_info *info = gr->query_data;
struct pipe_context *pipe = info->pipe;
uint64_t now = os_time_get();
if (info->last_time) {
if (info->query[info->head])
@@ -107,30 +272,9 @@ query_new_value(struct hud_graph *gr)
break;
}
}
if (info->num_results && info->last_time + gr->pane->period <= now) {
uint64_t value;
switch (info->result_type) {
default:
case PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE:
value = info->results_cumulative / info->num_results;
break;
case PIPE_DRIVER_QUERY_RESULT_TYPE_CUMULATIVE:
value = info->results_cumulative;
break;
}
hud_graph_add_value(gr, value);
info->last_time = now;
info->results_cumulative = 0;
info->num_results = 0;
}
}
else {
/* initialize */
info->last_time = now;
info->query[info->head] = pipe->create_query(pipe, info->query_type, 0);
}
@@ -138,12 +282,50 @@ query_new_value(struct hud_graph *gr)
pipe->begin_query(pipe, info->query[info->head]);
}
static void
query_new_value(struct hud_graph *gr)
{
struct query_info *info = gr->query_data;
uint64_t now = os_time_get();
if (info->batch) {
query_new_value_batch(info);
} else {
query_new_value_normal(info);
}
if (!info->last_time) {
info->last_time = now;
return;
}
if (info->num_results && info->last_time + gr->pane->period <= now) {
uint64_t value;
switch (info->result_type) {
default:
case PIPE_DRIVER_QUERY_RESULT_TYPE_AVERAGE:
value = info->results_cumulative / info->num_results;
break;
case PIPE_DRIVER_QUERY_RESULT_TYPE_CUMULATIVE:
value = info->results_cumulative;
break;
}
hud_graph_add_value(gr, value);
info->last_time = now;
info->results_cumulative = 0;
info->num_results = 0;
}
}
static void
free_query_info(void *ptr)
{
struct query_info *info = ptr;
if (info->last_time) {
if (!info->batch && info->last_time) {
struct pipe_context *pipe = info->pipe;
int i;
@@ -159,11 +341,13 @@ free_query_info(void *ptr)
}
void
hud_pipe_query_install(struct hud_pane *pane, struct pipe_context *pipe,
hud_pipe_query_install(struct hud_batch_query_context **pbq,
struct hud_pane *pane, struct pipe_context *pipe,
const char *name, unsigned query_type,
unsigned result_index,
uint64_t max_value, enum pipe_driver_query_type type,
enum pipe_driver_query_result_type result_type)
enum pipe_driver_query_result_type result_type,
unsigned flags)
{
struct hud_graph *gr;
struct query_info *info;
@@ -175,28 +359,40 @@ hud_pipe_query_install(struct hud_pane *pane, struct pipe_context *pipe,
strncpy(gr->name, name, sizeof(gr->name));
gr->name[sizeof(gr->name) - 1] = '\0';
gr->query_data = CALLOC_STRUCT(query_info);
if (!gr->query_data) {
FREE(gr);
return;
}
if (!gr->query_data)
goto fail_gr;
gr->query_new_value = query_new_value;
gr->free_query_data = free_query_info;
info = gr->query_data;
info->pipe = pipe;
info->query_type = query_type;
info->result_index = result_index;
info->result_type = result_type;
if (flags & PIPE_DRIVER_QUERY_FLAG_BATCH) {
if (!batch_query_add(pbq, pipe, query_type, &info->result_index))
goto fail_info;
info->batch = *pbq;
} else {
info->query_type = query_type;
info->result_index = result_index;
}
hud_pane_add_graph(pane, gr);
if (pane->max_value < max_value)
hud_pane_set_max_value(pane, max_value);
pane->type = type;
return;
fail_info:
FREE(info);
fail_gr:
FREE(gr);
}
boolean
hud_driver_query_install(struct hud_pane *pane, struct pipe_context *pipe,
hud_driver_query_install(struct hud_batch_query_context **pbq,
struct hud_pane *pane, struct pipe_context *pipe,
const char *name)
{
struct pipe_screen *screen = pipe->screen;
@@ -220,8 +416,9 @@ hud_driver_query_install(struct hud_pane *pane, struct pipe_context *pipe,
if (!found)
return FALSE;
hud_pipe_query_install(pane, pipe, query.name, query.query_type, 0,
query.max_value.u64, query.type, query.result_type);
hud_pipe_query_install(pbq, pane, pipe, query.name, query.query_type, 0,
query.max_value.u64, query.type, query.result_type,
query.flags);
return TRUE;
}

View File

@@ -80,19 +80,26 @@ void hud_pane_set_max_value(struct hud_pane *pane, uint64_t value);
void hud_graph_add_value(struct hud_graph *gr, uint64_t value);
/* graphs/queries */
struct hud_batch_query_context;
#define ALL_CPUS ~0 /* optionally set as cpu_index */
int hud_get_num_cpus(void);
void hud_fps_graph_install(struct hud_pane *pane);
void hud_cpu_graph_install(struct hud_pane *pane, unsigned cpu_index);
void hud_pipe_query_install(struct hud_pane *pane, struct pipe_context *pipe,
void hud_pipe_query_install(struct hud_batch_query_context **pbq,
struct hud_pane *pane, struct pipe_context *pipe,
const char *name, unsigned query_type,
unsigned result_index,
uint64_t max_value,
enum pipe_driver_query_type type,
enum pipe_driver_query_result_type result_type);
boolean hud_driver_query_install(struct hud_pane *pane,
enum pipe_driver_query_result_type result_type,
unsigned flags);
boolean hud_driver_query_install(struct hud_batch_query_context **pbq,
struct hud_pane *pane,
struct pipe_context *pipe, const char *name);
void hud_batch_query_update(struct hud_batch_query_context *bq);
void hud_batch_query_cleanup(struct hud_batch_query_context **pbq);
#endif

View File

@@ -68,17 +68,18 @@ static void translate_memcpy_uint( const void *in,
* \param out_nr returns number of new vertices
* \param out_translate returns the translation function to use by the caller
*/
int u_index_translator( unsigned hw_mask,
unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned in_pv,
unsigned out_pv,
unsigned prim_restart,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate )
enum indices_mode
u_index_translator(unsigned hw_mask,
unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned in_pv,
unsigned out_pv,
unsigned prim_restart,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate)
{
unsigned in_idx;
unsigned out_idx;
@@ -204,17 +205,17 @@ int u_index_translator( unsigned hw_mask,
* \param out_nr returns new number of vertices to draw
* \param out_generate returns pointer to the generator function
*/
int u_index_generator( unsigned hw_mask,
unsigned prim,
unsigned start,
unsigned nr,
unsigned in_pv,
unsigned out_pv,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate )
enum indices_mode
u_index_generator(unsigned hw_mask,
unsigned prim,
unsigned start,
unsigned nr,
unsigned in_pv,
unsigned out_pv,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate)
{
unsigned out_idx;

View File

@@ -67,66 +67,68 @@ typedef void (*u_generate_func)( unsigned start,
/* Return codes describe the translate/generate operation. Caller may
* be able to reuse translated indices under some circumstances.
*/
#define U_TRANSLATE_ERROR -1
#define U_TRANSLATE_NORMAL 1
#define U_TRANSLATE_MEMCPY 2
#define U_GENERATE_LINEAR 3
#define U_GENERATE_REUSABLE 4
#define U_GENERATE_ONE_OFF 5
enum indices_mode {
U_TRANSLATE_ERROR = -1,
U_TRANSLATE_NORMAL = 1,
U_TRANSLATE_MEMCPY = 2,
U_GENERATE_LINEAR = 3,
U_GENERATE_REUSABLE= 4,
U_GENERATE_ONE_OFF = 5,
};
void u_index_init( void );
int u_index_translator( unsigned hw_mask,
unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned in_pv, /* API */
unsigned out_pv, /* hardware */
unsigned prim_restart,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate );
enum indices_mode
u_index_translator(unsigned hw_mask,
unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned in_pv, /* API */
unsigned out_pv, /* hardware */
unsigned prim_restart,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate);
/* Note that even when generating it is necessary to know what the
* API's PV is, as the indices generated will depend on whether it is
* the same as hardware or not, and in the case of triangle strips,
* whether it is first or last.
*/
int u_index_generator( unsigned hw_mask,
unsigned prim,
unsigned start,
unsigned nr,
unsigned in_pv, /* API */
unsigned out_pv, /* hardware */
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate );
enum indices_mode
u_index_generator(unsigned hw_mask,
unsigned prim,
unsigned start,
unsigned nr,
unsigned in_pv, /* API */
unsigned out_pv, /* hardware */
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate);
void u_unfilled_init( void );
int u_unfilled_translator( unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate );
int u_unfilled_generator( unsigned prim,
unsigned start,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate );
enum indices_mode
u_unfilled_translator(unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate);
enum indices_mode
u_unfilled_generator(unsigned prim,
unsigned start,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate);
#endif

View File

@@ -111,14 +111,15 @@ static unsigned nr_lines( unsigned prim,
int u_unfilled_translator( unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate )
enum indices_mode
u_unfilled_translator(unsigned prim,
unsigned in_index_size,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_translate_func *out_translate)
{
unsigned in_idx;
unsigned out_idx;
@@ -170,14 +171,15 @@ int u_unfilled_translator( unsigned prim,
* different front/back fill modes, that can be handled with the
* 'draw' module.
*/
int u_unfilled_generator( unsigned prim,
unsigned start,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate )
enum indices_mode
u_unfilled_generator(unsigned prim,
unsigned start,
unsigned nr,
unsigned unfilled_mode,
unsigned *out_prim,
unsigned *out_index_size,
unsigned *out_nr,
u_generate_func *out_generate)
{
unsigned out_idx;

View File

@@ -1239,6 +1239,11 @@ ttn_tex(struct ttn_compile *c, nir_alu_dest dest, nir_ssa_def **src)
op = nir_texop_tex;
num_srcs = 1;
break;
case TGSI_OPCODE_TEX2:
op = nir_texop_tex;
num_srcs = 1;
samp = 2;
break;
case TGSI_OPCODE_TXP:
op = nir_texop_tex;
num_srcs = 2;
@@ -1394,10 +1399,12 @@ ttn_tex(struct ttn_compile *c, nir_alu_dest dest, nir_ssa_def **src)
}
if (instr->is_shadow) {
if (instr->coord_components < 3)
instr->src[src_number].src = nir_src_for_ssa(ttn_channel(b, src[0], Z));
else
if (instr->coord_components == 4)
instr->src[src_number].src = nir_src_for_ssa(ttn_channel(b, src[1], X));
else if (instr->coord_components == 3)
instr->src[src_number].src = nir_src_for_ssa(ttn_channel(b, src[0], W));
else
instr->src[src_number].src = nir_src_for_ssa(ttn_channel(b, src[0], Z));
instr->src[src_number].src_type = nir_tex_src_comparitor;
src_number++;
@@ -1803,6 +1810,7 @@ ttn_emit_instruction(struct ttn_compile *c)
case TGSI_OPCODE_TXL:
case TGSI_OPCODE_TXB:
case TGSI_OPCODE_TXD:
case TGSI_OPCODE_TEX2:
case TGSI_OPCODE_TXL2:
case TGSI_OPCODE_TXB2:
case TGSI_OPCODE_TXQ_LZ:

View File

@@ -54,37 +54,48 @@ boolean
os_get_process_name(char *procname, size_t size)
{
const char *name;
/* First, check if the GALLIUM_PROCESS_NAME env var is set to
* override the normal process name query.
*/
name = os_get_option("GALLIUM_PROCESS_NAME");
if (!name) {
/* do normal query */
#if defined(PIPE_SUBSYSTEM_WINDOWS_USER)
char szProcessPath[MAX_PATH];
char *lpProcessName;
char *lpProcessExt;
char szProcessPath[MAX_PATH];
char *lpProcessName;
char *lpProcessExt;
GetModuleFileNameA(NULL, szProcessPath, Elements(szProcessPath));
GetModuleFileNameA(NULL, szProcessPath, Elements(szProcessPath));
lpProcessName = strrchr(szProcessPath, '\\');
lpProcessName = lpProcessName ? lpProcessName + 1 : szProcessPath;
lpProcessName = strrchr(szProcessPath, '\\');
lpProcessName = lpProcessName ? lpProcessName + 1 : szProcessPath;
lpProcessExt = strrchr(lpProcessName, '.');
if (lpProcessExt) {
*lpProcessExt = '\0';
}
lpProcessExt = strrchr(lpProcessName, '.');
if (lpProcessExt) {
*lpProcessExt = '\0';
}
name = lpProcessName;
name = lpProcessName;
#elif defined(__GLIBC__) || defined(__CYGWIN__)
name = program_invocation_short_name;
name = program_invocation_short_name;
#elif defined(PIPE_OS_BSD) || defined(PIPE_OS_APPLE)
/* *BSD and OS X */
name = getprogname();
/* *BSD and OS X */
name = getprogname();
#elif defined(PIPE_OS_HAIKU)
image_info info;
get_image_info(B_CURRENT_TEAM, &info);
name = info.name;
image_info info;
get_image_info(B_CURRENT_TEAM, &info);
name = info.name;
#else
#warning unexpected platform in os_process.c
return FALSE;
return FALSE;
#endif
}
assert(size > 0);
assert(procname);

View File

@@ -0,0 +1,49 @@
# Mesa 3-D graphics library
#
# Copyright (C) 2015 Emil Velikov <emil.l.velikov@gmail.com>
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
# NOTE: Currently we build only a 'static' pipe-loader
LOCAL_PATH := $(call my-dir)
# get COMMON_SOURCES and DRM_SOURCES
include $(LOCAL_PATH)/Makefile.sources
include $(CLEAR_VARS)
LOCAL_CFLAGS := \
-DHAVE_PIPE_LOADER_DRI \
-DDROP_PIPE_LOADER_MISC \
-DGALLIUM_STATIC_TARGETS
LOCAL_SRC_FILES := $(COMMON_SOURCES)
LOCAL_MODULE := libmesa_pipe_loader
ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
LOCAL_CFLAGS += -DHAVE_LIBDRM
LOCAL_SRC_FILES += $(DRM_SOURCES)
LOCAL_SHARED_LIBRARIES := libdrm
LOCAL_STATIC_LIBRARIES := libmesa_loader
endif
include $(GALLIUM_COMMON_MK)
include $(BUILD_STATIC_LIBRARY)

View File

@@ -9,20 +9,40 @@ AM_CFLAGS = \
$(GALLIUM_CFLAGS) \
$(VISIBILITY_CFLAGS)
noinst_LTLIBRARIES = libpipe_loader.la
noinst_LTLIBRARIES = \
libpipe_loader_static.la \
libpipe_loader_dynamic.la
libpipe_loader_la_SOURCES = \
libpipe_loader_static_la_CFLAGS = \
$(AM_CFLAGS) \
-DGALLIUM_STATIC_TARGETS=1
libpipe_loader_dynamic_la_CFLAGS = \
$(AM_CFLAGS) \
-DPIPE_SEARCH_DIR=\"$(libdir)/gallium-pipe\"
libpipe_loader_static_la_SOURCES = \
$(COMMON_SOURCES)
if HAVE_DRM_LOADER_GALLIUM
libpipe_loader_dynamic_la_SOURCES = \
$(COMMON_SOURCES)
if HAVE_LIBDRM
AM_CFLAGS += \
$(LIBDRM_CFLAGS)
libpipe_loader_la_SOURCES += \
libpipe_loader_static_la_SOURCES += \
$(DRM_SOURCES)
libpipe_loader_la_LIBADD = \
$(top_builddir)/src/loader/libloader.la
libpipe_loader_dynamic_la_SOURCES += \
$(DRM_SOURCES)
endif
libpipe_loader_static_la_LIBADD = \
$(top_builddir)/src/loader/libloader.la
libpipe_loader_dynamic_la_LIBADD = \
$(top_builddir)/src/loader/libloader.la
EXTRA_DIST = SConscript

View File

@@ -0,0 +1,34 @@
Import('*')
env = env.Clone()
env.MSVC2008Compat()
env.Append(CPPPATH = [
'#/src/loader',
'#/src/gallium/winsys',
])
env.Append(CPPDEFINES = [
('HAVE_PIPE_LOADER_DRI', '1'),
('DROP_PIPE_LOADER_MISC', '1'),
('GALLIUM_STATIC_TARGETS', '1'),
])
source = env.ParseSourceList('Makefile.sources', 'COMMON_SOURCES')
#if HAVE_LIBDRM
source += env.ParseSourceList('Makefile.sources', 'DRM_SOURCES')
env.PkgUseModules('DRM')
env.Append(LIBS = [libloader])
#endif
pipe_loader = env.ConvenienceLibrary(
target = 'pipe_loader',
source = source,
)
env.Alias('pipe_loader', pipe_loader)
Export('pipe_loader')

View File

@@ -35,7 +35,7 @@
#define MODULE_PREFIX "pipe_"
static int (*backends[])(struct pipe_loader_device **, int) = {
#ifdef HAVE_PIPE_LOADER_DRM
#ifdef HAVE_LIBDRM
&pipe_loader_drm_probe,
#endif
&pipe_loader_sw_probe
@@ -69,10 +69,9 @@ pipe_loader_configuration(struct pipe_loader_device *dev,
}
struct pipe_screen *
pipe_loader_create_screen(struct pipe_loader_device *dev,
const char *library_paths)
pipe_loader_create_screen(struct pipe_loader_device *dev)
{
return dev->ops->create_screen(dev, library_paths);
return dev->ops->create_screen(dev);
}
struct util_dl_library *

View File

@@ -82,13 +82,9 @@ pipe_loader_probe(struct pipe_loader_device **devs, int ndev);
* Create a pipe_screen for the specified device.
*
* \param dev Device the screen will be created for.
* \param library_paths Colon-separated list of filesystem paths that
* will be used to look for the pipe driver
* module that handles this device.
*/
struct pipe_screen *
pipe_loader_create_screen(struct pipe_loader_device *dev,
const char *library_paths);
pipe_loader_create_screen(struct pipe_loader_device *dev);
/**
* Query the configuration parameters for the specified device.
@@ -112,8 +108,6 @@ pipe_loader_configuration(struct pipe_loader_device *dev,
void
pipe_loader_release(struct pipe_loader_device **devs, int ndev);
#ifdef HAVE_PIPE_LOADER_DRI
/**
* Initialize sw dri device give the drisw_loader_funcs.
*
@@ -125,7 +119,15 @@ bool
pipe_loader_sw_probe_dri(struct pipe_loader_device **devs,
struct drisw_loader_funcs *drisw_lf);
#endif
/**
* Initialize a kms backed sw device given an fd.
*
* This function is platform-specific.
*
* \sa pipe_loader_probe
*/
bool
pipe_loader_sw_probe_kms(struct pipe_loader_device **devs, int fd);
/**
* Initialize a null sw device.
@@ -158,8 +160,6 @@ boolean
pipe_loader_sw_probe_wrapped(struct pipe_loader_device **dev,
struct pipe_screen *screen);
#ifdef HAVE_PIPE_LOADER_DRM
/**
* Get a list of known DRM devices.
*
@@ -180,8 +180,6 @@ pipe_loader_drm_probe(struct pipe_loader_device **devs, int ndev);
bool
pipe_loader_drm_probe_fd(struct pipe_loader_device **dev, int fd);
#endif
#ifdef __cplusplus
}
#endif

View File

@@ -36,6 +36,7 @@
#include <unistd.h>
#include "loader.h"
#include "target-helpers/drm_helper_public.h"
#include "state_tracker/drm_driver.h"
#include "pipe_loader_priv.h"
@@ -50,13 +51,123 @@
struct pipe_loader_drm_device {
struct pipe_loader_device base;
const struct drm_driver_descriptor *dd;
#ifndef GALLIUM_STATIC_TARGETS
struct util_dl_library *lib;
#endif
int fd;
};
#define pipe_loader_drm_device(dev) ((struct pipe_loader_drm_device *)dev)
static struct pipe_loader_ops pipe_loader_drm_ops;
static const struct pipe_loader_ops pipe_loader_drm_ops;
#ifdef GALLIUM_STATIC_TARGETS
static const struct drm_conf_ret throttle_ret = {
DRM_CONF_INT,
{2},
};
static const struct drm_conf_ret share_fd_ret = {
DRM_CONF_BOOL,
{true},
};
static inline const struct drm_conf_ret *
configuration_query(enum drm_conf conf)
{
switch (conf) {
case DRM_CONF_THROTTLE:
return &throttle_ret;
case DRM_CONF_SHARE_FD:
return &share_fd_ret;
default:
break;
}
return NULL;
}
static const struct drm_driver_descriptor driver_descriptors[] = {
{
.name = "i915",
.driver_name = "i915",
.create_screen = pipe_i915_create_screen,
.configuration = configuration_query,
},
#ifdef USE_VC4_SIMULATOR
/* VC4 simulator and ILO (i965) are mutually exclusive (error at
* configure). As the latter is unconditionally added, keep this one above
* it.
*/
{
.name = "i965",
.driver_name = "vc4",
.create_screen = pipe_vc4_create_screen,
.configuration = configuration_query,
},
#endif
{
.name = "i965",
.driver_name = "i915",
.create_screen = pipe_ilo_create_screen,
.configuration = configuration_query,
},
{
.name = "nouveau",
.driver_name = "nouveau",
.create_screen = pipe_nouveau_create_screen,
.configuration = configuration_query,
},
{
.name = "r300",
.driver_name = "radeon",
.create_screen = pipe_r300_create_screen,
.configuration = configuration_query,
},
{
.name = "r600",
.driver_name = "radeon",
.create_screen = pipe_r600_create_screen,
.configuration = configuration_query,
},
{
.name = "radeonsi",
.driver_name = "radeon",
.create_screen = pipe_radeonsi_create_screen,
.configuration = configuration_query,
},
{
.name = "vmwgfx",
.driver_name = "vmwgfx",
.create_screen = pipe_vmwgfx_create_screen,
.configuration = configuration_query,
},
{
.name = "kgsl",
.driver_name = "freedreno",
.create_screen = pipe_freedreno_create_screen,
.configuration = configuration_query,
},
{
.name = "msm",
.driver_name = "freedreno",
.create_screen = pipe_freedreno_create_screen,
.configuration = configuration_query,
},
{
.name = "virtio_gpu",
.driver_name = "virtio-gpu",
.create_screen = pipe_virgl_create_screen,
.configuration = configuration_query,
},
{
.name = "vc4",
.driver_name = "vc4",
.create_screen = pipe_vc4_create_screen,
.configuration = configuration_query,
},
};
#endif
bool
pipe_loader_drm_probe_fd(struct pipe_loader_device **dev, int fd)
@@ -81,10 +192,36 @@ pipe_loader_drm_probe_fd(struct pipe_loader_device **dev, int fd)
if (!ddev->base.driver_name)
goto fail;
#ifdef GALLIUM_STATIC_TARGETS
for (int i = 0; i < ARRAY_SIZE(driver_descriptors); i++) {
if (strcmp(driver_descriptors[i].name, ddev->base.driver_name) == 0) {
ddev->dd = &driver_descriptors[i];
break;
}
}
if (!ddev->dd)
goto fail;
#else
ddev->lib = pipe_loader_find_module(&ddev->base, PIPE_SEARCH_DIR);
if (!ddev->lib)
goto fail;
ddev->dd = (const struct drm_driver_descriptor *)
util_dl_get_proc_address(ddev->lib, "driver_descriptor");
/* sanity check on the name */
if (!ddev->dd || strcmp(ddev->dd->name, ddev->base.driver_name) != 0)
goto fail;
#endif
*dev = &ddev->base;
return true;
fail:
#ifndef GALLIUM_STATIC_TARGETS
if (ddev->lib)
util_dl_close(ddev->lib);
#endif
FREE(ddev);
return false;
}
@@ -105,8 +242,9 @@ pipe_loader_drm_probe(struct pipe_loader_device **devs, int ndev)
for (i = DRM_RENDER_NODE_MIN_MINOR, j = 0;
i <= DRM_RENDER_NODE_MAX_MINOR; i++) {
fd = open_drm_render_node_minor(i);
struct pipe_loader_device *dev;
fd = open_drm_render_node_minor(i);
if (fd < 0)
continue;
@@ -132,8 +270,10 @@ pipe_loader_drm_release(struct pipe_loader_device **dev)
{
struct pipe_loader_drm_device *ddev = pipe_loader_drm_device(*dev);
#ifndef GALLIUM_STATIC_TARGETS
if (ddev->lib)
util_dl_close(ddev->lib);
#endif
close(ddev->fd);
FREE(ddev->base.driver_name);
@@ -146,47 +286,22 @@ pipe_loader_drm_configuration(struct pipe_loader_device *dev,
enum drm_conf conf)
{
struct pipe_loader_drm_device *ddev = pipe_loader_drm_device(dev);
const struct drm_driver_descriptor *dd;
if (!ddev->lib)
if (!ddev->dd->configuration)
return NULL;
dd = (const struct drm_driver_descriptor *)
util_dl_get_proc_address(ddev->lib, "driver_descriptor");
/* sanity check on the name */
if (!dd || strcmp(dd->name, ddev->base.driver_name) != 0)
return NULL;
if (!dd->configuration)
return NULL;
return dd->configuration(conf);
return ddev->dd->configuration(conf);
}
static struct pipe_screen *
pipe_loader_drm_create_screen(struct pipe_loader_device *dev,
const char *library_paths)
pipe_loader_drm_create_screen(struct pipe_loader_device *dev)
{
struct pipe_loader_drm_device *ddev = pipe_loader_drm_device(dev);
const struct drm_driver_descriptor *dd;
if (!ddev->lib)
ddev->lib = pipe_loader_find_module(dev, library_paths);
if (!ddev->lib)
return NULL;
dd = (const struct drm_driver_descriptor *)
util_dl_get_proc_address(ddev->lib, "driver_descriptor");
/* sanity check on the name */
if (!dd || strcmp(dd->name, ddev->base.driver_name) != 0)
return NULL;
return dd->create_screen(ddev->fd);
return ddev->dd->create_screen(ddev->fd);
}
static struct pipe_loader_ops pipe_loader_drm_ops = {
static const struct pipe_loader_ops pipe_loader_drm_ops = {
.create_screen = pipe_loader_drm_create_screen,
.configuration = pipe_loader_drm_configuration,
.release = pipe_loader_drm_release

View File

@@ -31,8 +31,7 @@
#include "pipe_loader.h"
struct pipe_loader_ops {
struct pipe_screen *(*create_screen)(struct pipe_loader_device *dev,
const char *library_paths);
struct pipe_screen *(*create_screen)(struct pipe_loader_device *dev);
const struct drm_conf_ret *(*configuration)(struct pipe_loader_device *dev,
enum drm_conf conf);

View File

@@ -30,45 +30,161 @@
#include "util/u_memory.h"
#include "util/u_dl.h"
#include "sw/dri/dri_sw_winsys.h"
#include "sw/kms-dri/kms_dri_sw_winsys.h"
#include "sw/null/null_sw_winsys.h"
#include "sw/wrapper/wrapper_sw_winsys.h"
#include "target-helpers/inline_sw_helper.h"
#include "target-helpers/sw_helper_public.h"
#include "state_tracker/drisw_api.h"
#include "state_tracker/sw_driver.h"
#include "state_tracker/sw_winsys.h"
struct pipe_loader_sw_device {
struct pipe_loader_device base;
const struct sw_driver_descriptor *dd;
#ifndef GALLIUM_STATIC_TARGETS
struct util_dl_library *lib;
#endif
struct sw_winsys *ws;
};
#define pipe_loader_sw_device(dev) ((struct pipe_loader_sw_device *)dev)
static struct pipe_loader_ops pipe_loader_sw_ops;
static const struct pipe_loader_ops pipe_loader_sw_ops;
static struct sw_winsys *(*backends[])() = {
null_sw_create
#ifdef GALLIUM_STATIC_TARGETS
static const struct sw_driver_descriptor driver_descriptors = {
.create_screen = sw_screen_create,
.winsys = {
#ifdef HAVE_PIPE_LOADER_DRI
{
.name = "dri",
.create_winsys = dri_create_sw_winsys,
},
#endif
#ifdef HAVE_PIPE_LOADER_KMS
{
.name = "kms_dri",
.create_winsys = kms_dri_create_winsys,
},
#endif
/**
* XXX: Do not include these two for non autotools builds.
* They don't have neither opencl nor nine, where these are used.
*/
#ifndef DROP_PIPE_LOADER_MISC
{
.name = "null",
.create_winsys = null_sw_create,
},
{
.name = "wrapped",
.create_winsys = wrapper_sw_winsys_wrap_pipe_screen,
},
#endif
{ 0 },
}
};
#endif
static bool
pipe_loader_sw_probe_init_common(struct pipe_loader_sw_device *sdev)
{
sdev->base.type = PIPE_LOADER_DEVICE_SOFTWARE;
sdev->base.driver_name = "swrast";
sdev->base.ops = &pipe_loader_sw_ops;
#ifdef GALLIUM_STATIC_TARGETS
sdev->dd = &driver_descriptors;
if (!sdev->dd)
return false;
#else
sdev->lib = pipe_loader_find_module(&sdev->base, PIPE_SEARCH_DIR);
if (!sdev->lib)
return false;
sdev->dd = (const struct sw_driver_descriptor *)
util_dl_get_proc_address(sdev->lib, "swrast_driver_descriptor");
if (!sdev->dd){
util_dl_close(sdev->lib);
sdev->lib = NULL;
return false;
}
#endif
return true;
}
static void
pipe_loader_sw_probe_teardown_common(struct pipe_loader_sw_device *sdev)
{
#ifndef GALLIUM_STATIC_TARGETS
if (sdev->lib)
util_dl_close(sdev->lib);
#endif
}
#ifdef HAVE_PIPE_LOADER_DRI
bool
pipe_loader_sw_probe_dri(struct pipe_loader_device **devs, struct drisw_loader_funcs *drisw_lf)
{
struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device);
int i;
if (!sdev)
return false;
sdev->base.type = PIPE_LOADER_DEVICE_SOFTWARE;
sdev->base.driver_name = "swrast";
sdev->base.ops = &pipe_loader_sw_ops;
sdev->ws = dri_create_sw_winsys(drisw_lf);
if (!sdev->ws) {
FREE(sdev);
return false;
}
*devs = &sdev->base;
if (!pipe_loader_sw_probe_init_common(sdev))
goto fail;
for (i = 0; sdev->dd->winsys[i].name; i++) {
if (strcmp(sdev->dd->winsys[i].name, "dri") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys(drisw_lf);
break;
}
}
if (!sdev->ws)
goto fail;
*devs = &sdev->base;
return true;
fail:
pipe_loader_sw_probe_teardown_common(sdev);
FREE(sdev);
return false;
}
#endif
#ifdef HAVE_PIPE_LOADER_KMS
bool
pipe_loader_sw_probe_kms(struct pipe_loader_device **devs, int fd)
{
struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device);
int i;
if (!sdev)
return false;
if (!pipe_loader_sw_probe_init_common(sdev))
goto fail;
for (i = 0; sdev->dd->winsys[i].name; i++) {
if (strcmp(sdev->dd->winsys[i].name, "kms_dri") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys(fd);
break;
}
}
if (!sdev->ws)
goto fail;
*devs = &sdev->base;
return true;
fail:
pipe_loader_sw_probe_teardown_common(sdev);
FREE(sdev);
return false;
}
#endif
@@ -76,38 +192,40 @@ bool
pipe_loader_sw_probe_null(struct pipe_loader_device **devs)
{
struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device);
int i;
if (!sdev)
return false;
sdev->base.type = PIPE_LOADER_DEVICE_SOFTWARE;
sdev->base.driver_name = "swrast";
sdev->base.ops = &pipe_loader_sw_ops;
sdev->ws = null_sw_create();
if (!sdev->ws) {
FREE(sdev);
return false;
}
*devs = &sdev->base;
if (!pipe_loader_sw_probe_init_common(sdev))
goto fail;
for (i = 0; sdev->dd->winsys[i].name; i++) {
if (strcmp(sdev->dd->winsys[i].name, "null") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys();
break;
}
}
if (!sdev->ws)
goto fail;
*devs = &sdev->base;
return true;
fail:
pipe_loader_sw_probe_teardown_common(sdev);
FREE(sdev);
return false;
}
int
pipe_loader_sw_probe(struct pipe_loader_device **devs, int ndev)
{
int i;
int i = 1;
for (i = 0; i < Elements(backends); i++) {
if (i < ndev) {
struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device);
/* TODO: handle CALLOC_STRUCT failure */
sdev->base.type = PIPE_LOADER_DEVICE_SOFTWARE;
sdev->base.driver_name = "swrast";
sdev->base.ops = &pipe_loader_sw_ops;
sdev->ws = backends[i]();
devs[i] = &sdev->base;
if (i <= ndev) {
if (!pipe_loader_sw_probe_null(devs)) {
i--;
}
}
@@ -119,21 +237,30 @@ pipe_loader_sw_probe_wrapped(struct pipe_loader_device **dev,
struct pipe_screen *screen)
{
struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device);
int i;
if (!sdev)
return false;
sdev->base.type = PIPE_LOADER_DEVICE_SOFTWARE;
sdev->base.driver_name = "swrast";
sdev->base.ops = &pipe_loader_sw_ops;
sdev->ws = wrapper_sw_winsys_wrap_pipe_screen(screen);
if (!pipe_loader_sw_probe_init_common(sdev))
goto fail;
if (!sdev->ws) {
FREE(sdev);
return false;
for (i = 0; sdev->dd->winsys[i].name; i++) {
if (strcmp(sdev->dd->winsys[i].name, "wrapped") == 0) {
sdev->ws = sdev->dd->winsys[i].create_winsys(screen);
break;
}
}
if (!sdev->ws)
goto fail;
*dev = &sdev->base;
return true;
fail:
pipe_loader_sw_probe_teardown_common(sdev);
FREE(sdev);
return false;
}
static void
@@ -141,8 +268,10 @@ pipe_loader_sw_release(struct pipe_loader_device **dev)
{
struct pipe_loader_sw_device *sdev = pipe_loader_sw_device(*dev);
#ifndef GALLIUM_STATIC_TARGETS
if (sdev->lib)
util_dl_close(sdev->lib);
#endif
FREE(sdev);
*dev = NULL;
@@ -156,28 +285,19 @@ pipe_loader_sw_configuration(struct pipe_loader_device *dev,
}
static struct pipe_screen *
pipe_loader_sw_create_screen(struct pipe_loader_device *dev,
const char *library_paths)
pipe_loader_sw_create_screen(struct pipe_loader_device *dev)
{
struct pipe_loader_sw_device *sdev = pipe_loader_sw_device(dev);
struct pipe_screen *(*init)(struct sw_winsys *);
struct pipe_screen *screen;
if (!sdev->lib)
sdev->lib = pipe_loader_find_module(dev, library_paths);
if (!sdev->lib)
return NULL;
screen = sdev->dd->create_screen(sdev->ws);
if (!screen)
sdev->ws->destroy(sdev->ws);
init = (void *)util_dl_get_proc_address(sdev->lib, "swrast_create_screen");
if (!init){
util_dl_close(sdev->lib);
sdev->lib = NULL;
return NULL;
}
return init(sdev->ws);
return screen;
}
static struct pipe_loader_ops pipe_loader_sw_ops = {
static const struct pipe_loader_ops pipe_loader_sw_ops = {
.create_screen = pipe_loader_sw_create_screen,
.configuration = pipe_loader_sw_configuration,
.release = pipe_loader_sw_release

View File

@@ -0,0 +1,275 @@
#ifndef DRM_HELPER_H
#define DRM_HELPER_H
#include <stdio.h>
#include "target-helpers/inline_debug_helper.h"
#include "target-helpers/drm_helper_public.h"
#ifdef GALLIUM_I915
#include "i915/drm/i915_drm_public.h"
#include "i915/i915_public.h"
struct pipe_screen *
pipe_i915_create_screen(int fd)
{
struct i915_winsys *iws;
struct pipe_screen *screen;
iws = i915_drm_winsys_create(fd);
if (!iws)
return NULL;
screen = i915_screen_create(iws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_i915_create_screen(int fd)
{
fprintf(stderr, "i915g: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_ILO
#include "intel/drm/intel_drm_public.h"
#include "ilo/ilo_public.h"
struct pipe_screen *
pipe_ilo_create_screen(int fd)
{
struct intel_winsys *iws;
struct pipe_screen *screen;
iws = intel_winsys_create_for_fd(fd);
if (!iws)
return NULL;
screen = ilo_screen_create(iws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_ilo_create_screen(int fd)
{
fprintf(stderr, "ilo: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_NOUVEAU
#include "nouveau/drm/nouveau_drm_public.h"
struct pipe_screen *
pipe_nouveau_create_screen(int fd)
{
struct pipe_screen *screen;
screen = nouveau_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_nouveau_create_screen(int fd)
{
fprintf(stderr, "nouveau: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_R300
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "r300/r300_public.h"
struct pipe_screen *
pipe_r300_create_screen(int fd)
{
struct radeon_winsys *rw;
rw = radeon_drm_winsys_create(fd, r300_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#else
struct pipe_screen *
pipe_r300_create_screen(int fd)
{
fprintf(stderr, "r300: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_R600
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "r600/r600_public.h"
struct pipe_screen *
pipe_r600_create_screen(int fd)
{
struct radeon_winsys *rw;
rw = radeon_drm_winsys_create(fd, r600_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#else
struct pipe_screen *
pipe_r600_create_screen(int fd)
{
fprintf(stderr, "r600: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_RADEONSI
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "amdgpu/drm/amdgpu_public.h"
#include "radeonsi/si_public.h"
struct pipe_screen *
pipe_radeonsi_create_screen(int fd)
{
struct radeon_winsys *rw;
/* First, try amdgpu. */
rw = amdgpu_winsys_create(fd, radeonsi_screen_create);
if (!rw)
rw = radeon_drm_winsys_create(fd, radeonsi_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#else
struct pipe_screen *
pipe_radeonsi_create_screen(int fd)
{
fprintf(stderr, "radeonsi: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_VMWGFX
#include "svga/drm/svga_drm_public.h"
#include "svga/svga_public.h"
struct pipe_screen *
pipe_vmwgfx_create_screen(int fd)
{
struct svga_winsys_screen *sws;
struct pipe_screen *screen;
sws = svga_drm_winsys_screen_create(fd);
if (!sws)
return NULL;
screen = svga_screen_create(sws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_vmwgfx_create_screen(int fd)
{
fprintf(stderr, "svga: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_FREEDRENO
#include "freedreno/drm/freedreno_drm_public.h"
struct pipe_screen *
pipe_freedreno_create_screen(int fd)
{
struct pipe_screen *screen;
screen = fd_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_freedreno_create_screen(int fd)
{
fprintf(stderr, "freedreno: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_VIRGL
#include "virgl/drm/virgl_drm_public.h"
#include "virgl/virgl_public.h"
struct pipe_screen *
pipe_virgl_create_screen(int fd)
{
struct virgl_winsys *vws;
struct pipe_screen *screen;
vws = virgl_drm_winsys_create(fd);
if (!vws)
return NULL;
screen = virgl_create_screen(vws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_virgl_create_screen(int fd)
{
fprintf(stderr, "virgl: driver missing\n");
return NULL;
}
#endif
#ifdef GALLIUM_VC4
#include "vc4/drm/vc4_drm_public.h"
struct pipe_screen *
pipe_vc4_create_screen(int fd)
{
struct pipe_screen *screen;
screen = vc4_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#else
struct pipe_screen *
pipe_vc4_create_screen(int fd)
{
fprintf(stderr, "vc4: driver missing\n");
return NULL;
}
#endif
#endif /* DRM_HELPER_H */

View File

@@ -0,0 +1,37 @@
#ifndef _DRM_HELPER_PUBLIC_H
#define _DRM_HELPER_PUBLIC_H
struct pipe_screen;
struct pipe_screen *
pipe_i915_create_screen(int fd);
struct pipe_screen *
pipe_ilo_create_screen(int fd);
struct pipe_screen *
pipe_nouveau_create_screen(int fd);
struct pipe_screen *
pipe_r300_create_screen(int fd);
struct pipe_screen *
pipe_r600_create_screen(int fd);
struct pipe_screen *
pipe_radeonsi_create_screen(int fd);
struct pipe_screen *
pipe_vmwgfx_create_screen(int fd);
struct pipe_screen *
pipe_freedreno_create_screen(int fd);
struct pipe_screen *
pipe_virgl_create_screen(int fd);
struct pipe_screen *
pipe_vc4_create_screen(int fd);
#endif /* _DRM_HELPER_PUBLIC_H */

View File

@@ -1,489 +0,0 @@
#ifndef INLINE_DRM_HELPER_H
#define INLINE_DRM_HELPER_H
#include "state_tracker/drm_driver.h"
#include "target-helpers/inline_debug_helper.h"
#include "loader.h"
#if defined(DRI_TARGET)
#include "dri_screen.h"
#endif
#if GALLIUM_SOFTPIPE
#include "target-helpers/inline_sw_helper.h"
#include "sw/kms-dri/kms_dri_sw_winsys.h"
#endif
#if GALLIUM_I915
#include "i915/drm/i915_drm_public.h"
#include "i915/i915_public.h"
#endif
#if GALLIUM_ILO
#include "intel/drm/intel_drm_public.h"
#include "ilo/ilo_public.h"
#endif
#if GALLIUM_NOUVEAU
#include "nouveau/drm/nouveau_drm_public.h"
#endif
#if GALLIUM_R300
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "r300/r300_public.h"
#endif
#if GALLIUM_R600
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "r600/r600_public.h"
#endif
#if GALLIUM_RADEONSI
#include "radeon/radeon_winsys.h"
#include "radeon/drm/radeon_drm_public.h"
#include "amdgpu/drm/amdgpu_public.h"
#include "radeonsi/si_public.h"
#endif
#if GALLIUM_VMWGFX
#include "svga/drm/svga_drm_public.h"
#include "svga/svga_public.h"
#endif
#if GALLIUM_FREEDRENO
#include "freedreno/drm/freedreno_drm_public.h"
#endif
#if GALLIUM_VC4
#include "vc4/drm/vc4_drm_public.h"
#endif
static char* driver_name = NULL;
/* XXX: We need to teardown the winsys if *screen_create() fails. */
#if defined(GALLIUM_SOFTPIPE)
#if defined(DRI_TARGET)
#if defined(HAVE_LIBDRM)
const __DRIextension **__driDriverGetExtensions_kms_swrast(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_kms_swrast(void)
{
globalDriverAPI = &dri_kms_driver_api;
return galliumdrm_driver_extensions;
}
struct pipe_screen *
kms_swrast_create_screen(int fd)
{
struct sw_winsys *sws;
struct pipe_screen *screen;
sws = kms_dri_create_winsys(fd);
if (!sws)
return NULL;
screen = sw_screen_create(sws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#endif
#endif
#if defined(GALLIUM_I915)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_i915(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_i915(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_i915_create_screen(int fd)
{
struct i915_winsys *iws;
struct pipe_screen *screen;
iws = i915_drm_winsys_create(fd);
if (!iws)
return NULL;
screen = i915_screen_create(iws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#if defined(GALLIUM_ILO)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_i965(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_i965(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_ilo_create_screen(int fd)
{
struct intel_winsys *iws;
struct pipe_screen *screen;
iws = intel_winsys_create_for_fd(fd);
if (!iws)
return NULL;
screen = ilo_screen_create(iws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#if defined(GALLIUM_NOUVEAU)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_nouveau(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_nouveau(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_nouveau_create_screen(int fd)
{
struct pipe_screen *screen;
screen = nouveau_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#if defined(GALLIUM_R300)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_r300(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_r300(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_r300_create_screen(int fd)
{
struct radeon_winsys *rw;
rw = radeon_drm_winsys_create(fd, r300_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#endif
#if defined(GALLIUM_R600)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_r600(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_r600(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_r600_create_screen(int fd)
{
struct radeon_winsys *rw;
rw = radeon_drm_winsys_create(fd, r600_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#endif
#if defined(GALLIUM_RADEONSI)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_radeonsi(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_radeonsi(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_radeonsi_create_screen(int fd)
{
struct radeon_winsys *rw;
/* First, try amdgpu. */
rw = amdgpu_winsys_create(fd, radeonsi_screen_create);
if (!rw)
rw = radeon_drm_winsys_create(fd, radeonsi_screen_create);
return rw ? debug_screen_wrap(rw->screen) : NULL;
}
#endif
#if defined(GALLIUM_VMWGFX)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_vmwgfx(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_vmwgfx(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_vmwgfx_create_screen(int fd)
{
struct svga_winsys_screen *sws;
struct pipe_screen *screen;
sws = svga_drm_winsys_screen_create(fd);
if (!sws)
return NULL;
screen = svga_screen_create(sws);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#if defined(GALLIUM_FREEDRENO)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_msm(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_msm(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
const __DRIextension **__driDriverGetExtensions_kgsl(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_kgsl(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
static struct pipe_screen *
pipe_freedreno_create_screen(int fd)
{
struct pipe_screen *screen;
screen = fd_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
#if defined(GALLIUM_VC4)
#if defined(DRI_TARGET)
const __DRIextension **__driDriverGetExtensions_vc4(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_vc4(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#if defined(USE_VC4_SIMULATOR)
const __DRIextension **__driDriverGetExtensions_i965(void);
/**
* When building using the simulator (on x86), we advertise ourselves as the
* i965 driver so that you can just make a directory with a link from
* i965_dri.so to the built vc4_dri.so, and point LIBGL_DRIVERS_PATH to that
* on your i965-using host to run the driver under simulation.
*
* This is, of course, incompatible with building with the ilo driver, but you
* shouldn't be building that anyway.
*/
PUBLIC const __DRIextension **__driDriverGetExtensions_i965(void)
{
globalDriverAPI = &galliumdrm_driver_api;
return galliumdrm_driver_extensions;
}
#endif
#endif
static struct pipe_screen *
pipe_vc4_create_screen(int fd)
{
struct pipe_screen *screen;
screen = vc4_drm_screen_create(fd);
return screen ? debug_screen_wrap(screen) : NULL;
}
#endif
inline struct pipe_screen *
dd_create_screen(int fd)
{
driver_name = loader_get_driver_for_fd(fd, _LOADER_GALLIUM);
if (!driver_name)
return NULL;
#if defined(GALLIUM_I915)
if (strcmp(driver_name, "i915") == 0)
return pipe_i915_create_screen(fd);
else
#endif
#if defined(GALLIUM_ILO)
if (strcmp(driver_name, "i965") == 0)
return pipe_ilo_create_screen(fd);
else
#endif
#if defined(GALLIUM_NOUVEAU)
if (strcmp(driver_name, "nouveau") == 0)
return pipe_nouveau_create_screen(fd);
else
#endif
#if defined(GALLIUM_R300)
if (strcmp(driver_name, "r300") == 0)
return pipe_r300_create_screen(fd);
else
#endif
#if defined(GALLIUM_R600)
if (strcmp(driver_name, "r600") == 0)
return pipe_r600_create_screen(fd);
else
#endif
#if defined(GALLIUM_RADEONSI)
if (strcmp(driver_name, "radeonsi") == 0)
return pipe_radeonsi_create_screen(fd);
else
#endif
#if defined(GALLIUM_VMWGFX)
if (strcmp(driver_name, "vmwgfx") == 0)
return pipe_vmwgfx_create_screen(fd);
else
#endif
#if defined(GALLIUM_FREEDRENO)
if ((strcmp(driver_name, "kgsl") == 0) || (strcmp(driver_name, "msm") == 0))
return pipe_freedreno_create_screen(fd);
else
#endif
#if defined(GALLIUM_VC4)
if (strcmp(driver_name, "vc4") == 0)
return pipe_vc4_create_screen(fd);
else
#if defined(USE_VC4_SIMULATOR)
if (strcmp(driver_name, "i965") == 0)
return pipe_vc4_create_screen(fd);
else
#endif
#endif
return NULL;
}
inline const char *
dd_driver_name(void)
{
return driver_name;
}
static const struct drm_conf_ret throttle_ret = {
DRM_CONF_INT,
{2},
};
static const struct drm_conf_ret share_fd_ret = {
DRM_CONF_BOOL,
{true},
};
static inline const struct drm_conf_ret *
configuration_query(enum drm_conf conf)
{
switch (conf) {
case DRM_CONF_THROTTLE:
return &throttle_ret;
case DRM_CONF_SHARE_FD:
return &share_fd_ret;
default:
break;
}
return NULL;
}
inline const struct drm_conf_ret *
dd_configuration(enum drm_conf conf)
{
if (!driver_name)
return NULL;
#if defined(GALLIUM_I915)
if (strcmp(driver_name, "i915") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_ILO)
if (strcmp(driver_name, "i965") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_NOUVEAU)
if (strcmp(driver_name, "nouveau") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_R300)
if (strcmp(driver_name, "r300") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_R600)
if (strcmp(driver_name, "r600") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_RADEONSI)
if (strcmp(driver_name, "radeonsi") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_VMWGFX)
if (strcmp(driver_name, "vmwgfx") == 0)
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_FREEDRENO)
if ((strcmp(driver_name, "kgsl") == 0) || (strcmp(driver_name, "msm") == 0))
return configuration_query(conf);
else
#endif
#if defined(GALLIUM_VC4)
if (strcmp(driver_name, "vc4") == 0)
return configuration_query(conf);
else
#if defined(USE_VC4_SIMULATOR)
if (strcmp(driver_name, "i965") == 0)
return configuration_query(conf);
else
#endif
#endif
return NULL;
}
#endif /* INLINE_DRM_HELPER_H */

View File

@@ -19,6 +19,10 @@
#include "llvmpipe/lp_public.h"
#endif
#ifdef GALLIUM_VIRGL
#include "virgl/virgl_public.h"
#include "virgl/vtest/virgl_vtest_public.h"
#endif
static inline struct pipe_screen *
sw_screen_create_named(struct sw_winsys *winsys, const char *driver)
@@ -30,6 +34,14 @@ sw_screen_create_named(struct sw_winsys *winsys, const char *driver)
screen = llvmpipe_create_screen(winsys);
#endif
#if defined(GALLIUM_VIRGL)
if (screen == NULL && strcmp(driver, "virpipe") == 0) {
struct virgl_winsys *vws;
vws = virgl_vtest_winsys_wrap(winsys);
screen = virgl_create_screen(vws);
}
#endif
#if defined(GALLIUM_SOFTPIPE)
if (screen == NULL)
screen = softpipe_create_screen(winsys);
@@ -57,69 +69,4 @@ sw_screen_create(struct sw_winsys *winsys)
return sw_screen_create_named(winsys, driver);
}
#if defined(GALLIUM_SOFTPIPE)
#if defined(DRI_TARGET)
#include "target-helpers/inline_debug_helper.h"
#include "sw/dri/dri_sw_winsys.h"
#include "dri_screen.h"
const __DRIextension **__driDriverGetExtensions_swrast(void);
PUBLIC const __DRIextension **__driDriverGetExtensions_swrast(void)
{
globalDriverAPI = &galliumsw_driver_api;
return galliumsw_driver_extensions;
}
inline struct pipe_screen *
drisw_create_screen(struct drisw_loader_funcs *lf)
{
struct sw_winsys *winsys = NULL;
struct pipe_screen *screen = NULL;
winsys = dri_create_sw_winsys(lf);
if (winsys == NULL)
return NULL;
screen = sw_screen_create(winsys);
if (screen == NULL) {
winsys->destroy(winsys);
return NULL;
}
screen = debug_screen_wrap(screen);
return screen;
}
#endif // DRI_TARGET
#if defined(NINE_TARGET)
#include "sw/wrapper/wrapper_sw_winsys.h"
#include "target-helpers/inline_debug_helper.h"
extern struct pipe_screen *ninesw_create_screen(struct pipe_screen *screen);
inline struct pipe_screen *
ninesw_create_screen(struct pipe_screen *pscreen)
{
struct sw_winsys *winsys = NULL;
struct pipe_screen *screen = NULL;
winsys = wrapper_sw_winsys_wrap_pipe_screen(pscreen);
if (winsys == NULL)
return NULL;
screen = sw_screen_create(winsys);
if (screen == NULL) {
winsys->destroy(winsys);
return NULL;
}
screen = debug_screen_wrap(screen);
return screen;
}
#endif // NINE_TARGET
#endif // GALLIUM_SOFTPIPE
#endif

View File

@@ -0,0 +1,73 @@
#ifndef SW_HELPER_H
#define SW_HELPER_H
#include "pipe/p_compiler.h"
#include "util/u_debug.h"
#include "target-helpers/sw_helper_public.h"
#include "state_tracker/sw_winsys.h"
/* Helper function to choose and instantiate one of the software rasterizers:
* llvmpipe, softpipe.
*/
#ifdef GALLIUM_SOFTPIPE
#include "softpipe/sp_public.h"
#endif
#ifdef GALLIUM_LLVMPIPE
#include "llvmpipe/lp_public.h"
#endif
#ifdef GALLIUM_VIRGL
#include "virgl/virgl_public.h"
#include "virgl/vtest/virgl_vtest_public.h"
#endif
static inline struct pipe_screen *
sw_screen_create_named(struct sw_winsys *winsys, const char *driver)
{
struct pipe_screen *screen = NULL;
#if defined(GALLIUM_LLVMPIPE)
if (screen == NULL && strcmp(driver, "llvmpipe") == 0)
screen = llvmpipe_create_screen(winsys);
#endif
#if defined(GALLIUM_VIRGL)
if (screen == NULL && strcmp(driver, "virpipe") == 0) {
struct virgl_winsys *vws;
vws = virgl_vtest_winsys_wrap(winsys);
screen = virgl_create_screen(vws);
}
#endif
#if defined(GALLIUM_SOFTPIPE)
if (screen == NULL)
screen = softpipe_create_screen(winsys);
#endif
return screen;
}
struct pipe_screen *
sw_screen_create(struct sw_winsys *winsys)
{
const char *default_driver;
const char *driver;
#if defined(GALLIUM_LLVMPIPE)
default_driver = "llvmpipe";
#elif defined(GALLIUM_SOFTPIPE)
default_driver = "softpipe";
#else
default_driver = "";
#endif
driver = debug_get_option("GALLIUM_DRIVER", default_driver);
return sw_screen_create_named(winsys, driver);
}
#endif

View File

@@ -0,0 +1,10 @@
#ifndef _SW_HELPER_PUBLIC_H
#define _SW_HELPER_PUBLIC_H
struct pipe_screen;
struct sw_winsys;
struct pipe_screen *
sw_screen_create(struct sw_winsys *winsys);
#endif /* _SW_HELPER_PUBLIC_H */

View File

@@ -29,6 +29,7 @@
#include "util/u_string.h"
#include "util/u_math.h"
#include "util/u_memory.h"
#include "util/u_math.h"
#include "tgsi_dump.h"
#include "tgsi_info.h"
#include "tgsi_iterate.h"
@@ -43,6 +44,8 @@ struct dump_ctx
{
struct tgsi_iterate_context iter;
boolean dump_float_as_hex;
uint instno;
uint immno;
int indent;
@@ -88,6 +91,7 @@ dump_enum(
#define SID(I) ctx->dump_printf( ctx, "%d", I )
#define FLT(F) ctx->dump_printf( ctx, "%10.4f", F )
#define DBL(D) ctx->dump_printf( ctx, "%10.8f", D )
#define HFLT(F) ctx->dump_printf( ctx, "0x%08x", fui((F)) )
#define ENM(E,ENUMS) dump_enum( ctx, E, ENUMS, sizeof( ENUMS ) / sizeof( *ENUMS ) )
const char *
@@ -251,7 +255,10 @@ dump_imm_data(struct tgsi_iterate_context *iter,
break;
}
case TGSI_IMM_FLOAT32:
FLT( data[i].Float );
if (ctx->dump_float_as_hex)
HFLT( data[i].Float );
else
FLT( data[i].Float );
break;
case TGSI_IMM_UINT32:
UID(data[i].Uint);
@@ -648,6 +655,7 @@ tgsi_dump_instruction(
ctx.indent = 0;
ctx.dump_printf = dump_ctx_printf;
ctx.indentation = 0;
ctx.file = NULL;
iter_instruction( &ctx.iter, (struct tgsi_full_instruction *)inst );
}
@@ -681,6 +689,11 @@ tgsi_dump_to_file(const struct tgsi_token *tokens, uint flags, FILE *file)
ctx.indentation = 0;
ctx.file = file;
if (flags & TGSI_DUMP_FLOAT_AS_HEX)
ctx.dump_float_as_hex = TRUE;
else
ctx.dump_float_as_hex = FALSE;
tgsi_iterate_shader( tokens, &ctx.iter );
}
@@ -696,6 +709,7 @@ struct str_dump_ctx
char *str;
char *ptr;
int left;
bool nospace;
};
static void
@@ -718,10 +732,11 @@ str_dump_ctx_printf(struct dump_ctx *ctx, const char *format, ...)
sctx->ptr += written;
sctx->left -= written;
}
}
} else
sctx->nospace = true;
}
void
bool
tgsi_dump_str(
const struct tgsi_token *tokens,
uint flags,
@@ -748,8 +763,16 @@ tgsi_dump_str(
ctx.str[0] = 0;
ctx.ptr = str;
ctx.left = (int)size;
ctx.nospace = false;
if (flags & TGSI_DUMP_FLOAT_AS_HEX)
ctx.base.dump_float_as_hex = TRUE;
else
ctx.base.dump_float_as_hex = FALSE;
tgsi_iterate_shader( tokens, &ctx.base.iter );
return !ctx.nospace;
}
void
@@ -772,6 +795,7 @@ tgsi_dump_instruction_str(
ctx.str[0] = 0;
ctx.ptr = str;
ctx.left = (int)size;
ctx.nospace = false;
iter_instruction( &ctx.base.iter, (struct tgsi_full_instruction *)inst );
}

View File

@@ -38,7 +38,9 @@
extern "C" {
#endif
void
#define TGSI_DUMP_FLOAT_AS_HEX (1 << 0)
bool
tgsi_dump_str(
const struct tgsi_token *tokens,
uint flags,

View File

@@ -474,6 +474,8 @@ tgsi_exec_get_shader_param(enum pipe_shader_cap param)
case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
return 0;
case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT:
return 32;
}
/* if we get here, we missed a shader cap above (and should have seen
* a compiler warning.)

View File

@@ -365,23 +365,14 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
info->output_semantic_index[reg] = (ubyte) semIndex;
info->num_outputs++;
if (semName == TGSI_SEMANTIC_COLOR)
info->colors_written |= 1 << semIndex;
if (procType == TGSI_PROCESSOR_VERTEX ||
procType == TGSI_PROCESSOR_GEOMETRY ||
procType == TGSI_PROCESSOR_TESS_CTRL ||
procType == TGSI_PROCESSOR_TESS_EVAL) {
if (semName == TGSI_SEMANTIC_CLIPDIST) {
info->num_written_clipdistance +=
util_bitcount(fulldecl->Declaration.UsageMask);
info->clipdist_writemask |=
fulldecl->Declaration.UsageMask << (semIndex*4);
}
else if (semName == TGSI_SEMANTIC_CULLDIST) {
info->num_written_culldistance +=
util_bitcount(fulldecl->Declaration.UsageMask);
info->culldist_writemask |=
fulldecl->Declaration.UsageMask << (semIndex*4);
}
else if (semName == TGSI_SEMANTIC_VIEWPORT_INDEX) {
if (semName == TGSI_SEMANTIC_VIEWPORT_INDEX) {
info->writes_viewport_index = TRUE;
}
else if (semName == TGSI_SEMANTIC_LAYER) {
@@ -432,9 +423,21 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
const struct tgsi_full_property *fullprop
= &parse.FullToken.FullProperty;
unsigned name = fullprop->Property.PropertyName;
unsigned value = fullprop->u[0].Data;
assert(name < Elements(info->properties));
info->properties[name] = fullprop->u[0].Data;
info->properties[name] = value;
switch (name) {
case TGSI_PROPERTY_NUM_CLIPDIST_ENABLED:
info->num_written_clipdistance = value;
info->clipdist_writemask |= (1 << value) - 1;
break;
case TGSI_PROPERTY_NUM_CULLDIST_ENABLED:
info->num_written_culldistance = value;
info->culldist_writemask |= (1 << value) - 1;
break;
}
}
break;

View File

@@ -77,6 +77,7 @@ struct tgsi_shader_info
uint opcode_count[TGSI_OPCODE_LAST]; /**< opcode histogram */
ubyte colors_written;
boolean reads_position; /**< does fragment shader read position? */
boolean reads_z; /**< does fragment shader read depth? */
boolean writes_z; /**< does fragment shader write Z value? */

View File

@@ -95,6 +95,7 @@ const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] =
"TESSOUTER",
"TESSINNER",
"VERTICESIN",
"HELPER_INVOCATION",
};
const char *tgsi_texture_names[TGSI_TEXTURE_COUNT] =
@@ -137,6 +138,8 @@ const char *tgsi_property_names[TGSI_PROPERTY_COUNT] =
"TES_SPACING",
"TES_VERTEX_ORDER_CW",
"TES_POINT_MODE",
"NUM_CLIPDIST_ENABLED",
"NUM_CULLDIST_ENABLED",
};
const char *tgsi_return_type_names[TGSI_RETURN_TYPE_COUNT] =

View File

@@ -195,8 +195,15 @@ static boolean parse_float( const char **pcur, float *val )
boolean integral_part = FALSE;
boolean fractional_part = FALSE;
*val = (float) atof( cur );
if (*cur == '0' && *(cur + 1) == 'x') {
union fi fi;
fi.ui = strtoul(cur, NULL, 16);
*val = fi.f;
cur += 10;
goto out;
}
*val = (float) atof( cur );
if (*cur == '-' || *cur == '+')
cur++;
if (is_digit( cur )) {
@@ -228,6 +235,8 @@ static boolean parse_float( const char **pcur, float *val )
else
return FALSE;
}
out:
*pcur = cur;
return TRUE;
}

View File

@@ -35,6 +35,7 @@
#include "tgsi/tgsi_dump.h"
#include "tgsi/tgsi_sanity.h"
#include "util/u_debug.h"
#include "util/u_inlines.h"
#include "util/u_memory.h"
#include "util/u_math.h"
#include "util/u_bitmask.h"
@@ -1830,29 +1831,6 @@ void ureg_free_tokens( const struct tgsi_token *tokens )
}
static inline unsigned
pipe_shader_from_tgsi_processor(unsigned processor)
{
switch (processor) {
case TGSI_PROCESSOR_VERTEX:
return PIPE_SHADER_VERTEX;
case TGSI_PROCESSOR_TESS_CTRL:
return PIPE_SHADER_TESS_CTRL;
case TGSI_PROCESSOR_TESS_EVAL:
return PIPE_SHADER_TESS_EVAL;
case TGSI_PROCESSOR_GEOMETRY:
return PIPE_SHADER_GEOMETRY;
case TGSI_PROCESSOR_FRAGMENT:
return PIPE_SHADER_FRAGMENT;
case TGSI_PROCESSOR_COMPUTE:
return PIPE_SHADER_COMPUTE;
default:
assert(0);
return PIPE_SHADER_VERTEX;
}
}
struct ureg_program *
ureg_create(unsigned processor)
{
@@ -1872,7 +1850,7 @@ ureg_create_with_screen(unsigned processor, struct pipe_screen *screen)
ureg->supports_any_inout_decl_range =
screen &&
screen->get_shader_param(screen,
pipe_shader_from_tgsi_processor(processor),
util_pipe_shader_from_tgsi_processor(processor),
PIPE_SHADER_CAP_TGSI_ANY_INOUT_DECL_RANGE) != 0;
for (i = 0; i < Elements(ureg->properties); i++)

View File

@@ -70,7 +70,7 @@ struct blitter_context_priv
/* Constant state objects. */
/* Vertex shaders. */
void *vs; /**< Vertex shader which passes {pos, generic} to the output.*/
void *vs_pos_only; /**< Vertex shader which passes pos to the output.*/
void *vs_pos_only[4]; /**< Vertex shader which passes pos to the output.*/
void *vs_layered; /**< Vertex shader which sets LAYER = INSTANCEID. */
/* Fragment shaders. */
@@ -325,27 +325,29 @@ struct blitter_context *util_blitter_create(struct pipe_context *pipe)
return &ctx->base;
}
static void bind_vs_pos_only(struct blitter_context_priv *ctx)
static void bind_vs_pos_only(struct blitter_context_priv *ctx,
unsigned num_so_channels)
{
struct pipe_context *pipe = ctx->base.pipe;
int index = num_so_channels ? num_so_channels - 1 : 0;
if (!ctx->vs_pos_only) {
if (!ctx->vs_pos_only[index]) {
struct pipe_stream_output_info so;
const uint semantic_names[] = { TGSI_SEMANTIC_POSITION };
const uint semantic_indices[] = { 0 };
memset(&so, 0, sizeof(so));
so.num_outputs = 1;
so.output[0].num_components = 1;
so.stride[0] = 1;
so.output[0].num_components = num_so_channels;
so.stride[0] = num_so_channels;
ctx->vs_pos_only =
ctx->vs_pos_only[index] =
util_make_vertex_passthrough_shader_with_so(pipe, 1, semantic_names,
semantic_indices, FALSE,
&so);
}
pipe->bind_vs_state(pipe, ctx->vs_pos_only);
pipe->bind_vs_state(pipe, ctx->vs_pos_only[index]);
}
static void bind_vs_passthrough(struct blitter_context_priv *ctx)
@@ -441,8 +443,9 @@ void util_blitter_destroy(struct blitter_context *blitter)
pipe->delete_rasterizer_state(pipe, ctx->rs_discard_state);
if (ctx->vs)
pipe->delete_vs_state(pipe, ctx->vs);
if (ctx->vs_pos_only)
pipe->delete_vs_state(pipe, ctx->vs_pos_only);
for (i = 0; i < 4; i++)
if (ctx->vs_pos_only[i])
pipe->delete_vs_state(pipe, ctx->vs_pos_only[i]);
if (ctx->vs_layered)
pipe->delete_vs_state(pipe, ctx->vs_layered);
pipe->delete_vertex_elements_state(pipe, ctx->velem_state);
@@ -2036,7 +2039,7 @@ void util_blitter_copy_buffer(struct blitter_context *blitter,
pipe->set_vertex_buffers(pipe, ctx->base.vb_slot, 1, &vb);
pipe->bind_vertex_elements_state(pipe, ctx->velem_state_readbuf[0]);
bind_vs_pos_only(ctx);
bind_vs_pos_only(ctx, 1);
if (ctx->has_geometry_shader)
pipe->bind_gs_state(pipe, NULL);
if (ctx->has_tessellation) {
@@ -2103,7 +2106,7 @@ void util_blitter_clear_buffer(struct blitter_context *blitter,
pipe->set_vertex_buffers(pipe, ctx->base.vb_slot, 1, &vb);
pipe->bind_vertex_elements_state(pipe,
ctx->velem_state_readbuf[num_channels-1]);
bind_vs_pos_only(ctx);
bind_vs_pos_only(ctx, num_channels);
if (ctx->has_geometry_shader)
pipe->bind_gs_state(pipe, NULL);
if (ctx->has_tessellation) {

View File

@@ -70,6 +70,20 @@ void _debug_vprintf(const char *format, va_list ap)
#endif
}
void
_pipe_debug_message(
struct pipe_debug_callback *cb,
unsigned *id,
enum pipe_debug_type type,
const char *fmt, ...)
{
va_list args;
va_start(args, fmt);
if (cb && cb->debug_message)
cb->debug_message(cb->data, id, type, fmt, args);
va_end(args);
}
void
debug_disable_error_message_boxes(void)
@@ -276,7 +290,7 @@ debug_get_flags_option(const char *name,
for (; flags->name; ++flags)
namealign = MAX2(namealign, strlen(flags->name));
for (flags = orig; flags->name; ++flags)
_debug_printf("| %*s [0x%0*"PRIu64"]%s%s\n", namealign, flags->name,
_debug_printf("| %*s [0x%0*"PRIx64"]%s%s\n", namealign, flags->name,
(int)sizeof(uint64_t)*CHAR_BIT/4, flags->value,
flags->desc ? " " : "", flags->desc ? flags->desc : "");
}
@@ -291,9 +305,9 @@ debug_get_flags_option(const char *name,
if (debug_get_option_should_print()) {
if (str) {
debug_printf("%s: %s = 0x%"PRIu64" (%s)\n", __FUNCTION__, name, result, str);
debug_printf("%s: %s = 0x%"PRIx64" (%s)\n", __FUNCTION__, name, result, str);
} else {
debug_printf("%s: %s = 0x%"PRIu64"\n", __FUNCTION__, name, result);
debug_printf("%s: %s = 0x%"PRIx64"\n", __FUNCTION__, name, result);
}
}

View File

@@ -42,6 +42,7 @@
#include "os/os_misc.h"
#include "pipe/p_format.h"
#include "pipe/p_defines.h"
#ifdef __cplusplus
@@ -262,6 +263,25 @@ void _debug_assert_fail(const char *expr,
_debug_printf("error: %s\n", __msg)
#endif
/**
* Output a debug log message to the debug info callback.
*/
#define pipe_debug_message(cb, type, fmt, ...) do { \
static unsigned id = 0; \
_pipe_debug_message(cb, &id, \
PIPE_DEBUG_TYPE_ ## type, \
fmt, __VA_ARGS__); \
} while (0)
struct pipe_debug_callback;
void
_pipe_debug_message(
struct pipe_debug_callback *cb,
unsigned *id,
enum pipe_debug_type type,
const char *fmt, ...) _util_printf_format(4, 5);
/**
* Used by debug_dump_enum and debug_dump_flags to describe symbols.

View File

@@ -45,7 +45,7 @@ struct util_dl_library *
util_dl_open(const char *filename)
{
#if defined(PIPE_OS_UNIX)
return (struct util_dl_library *)dlopen(filename, RTLD_LAZY | RTLD_GLOBAL);
return (struct util_dl_library *)dlopen(filename, RTLD_LAZY | RTLD_LOCAL);
#elif defined(PIPE_OS_WINDOWS)
return (struct util_dl_library *)LoadLibraryA(filename);
#else

View File

@@ -169,6 +169,25 @@ util_format_is_snorm(enum pipe_format format)
desc->channel[i].normalized;
}
boolean
util_format_is_snorm8(enum pipe_format format)
{
const struct util_format_description *desc = util_format_description(format);
int i;
if (desc->is_mixed)
return FALSE;
i = util_format_get_first_non_void_channel(format);
if (i == -1)
return FALSE;
return desc->channel[i].type == UTIL_FORMAT_TYPE_SIGNED &&
!desc->channel[i].pure_integer &&
desc->channel[i].normalized &&
desc->channel[i].size == 8;
}
boolean
util_format_is_luminance_alpha(enum pipe_format format)
{

View File

@@ -686,6 +686,9 @@ util_format_is_pure_uint(enum pipe_format format);
boolean
util_format_is_snorm(enum pipe_format format);
boolean
util_format_is_snorm8(enum pipe_format format);
/**
* Check if the src format can be blitted to the destination format with
* a simple memcpy. For example, blitting from RGBA to RGBx is OK, but not

View File

@@ -81,7 +81,13 @@ void util_set_vertex_buffers_count(struct pipe_vertex_buffer *dst,
const struct pipe_vertex_buffer *src,
unsigned start_slot, unsigned count)
{
uint32_t enabled_buffers = (1ull << *dst_count) - 1;
unsigned i;
uint32_t enabled_buffers = 0;
for (i = 0; i < *dst_count; i++) {
if (dst[i].buffer || dst[i].user_buffer)
enabled_buffers |= (1ull << i);
}
util_set_vertex_buffers_mask(dst, &enabled_buffers, src, start_slot,
count);

View File

@@ -651,6 +651,28 @@ util_max_layer(const struct pipe_resource *r, unsigned level)
}
}
static inline unsigned
util_pipe_shader_from_tgsi_processor(unsigned processor)
{
switch (processor) {
case TGSI_PROCESSOR_VERTEX:
return PIPE_SHADER_VERTEX;
case TGSI_PROCESSOR_TESS_CTRL:
return PIPE_SHADER_TESS_CTRL;
case TGSI_PROCESSOR_TESS_EVAL:
return PIPE_SHADER_TESS_EVAL;
case TGSI_PROCESSOR_GEOMETRY:
return PIPE_SHADER_GEOMETRY;
case TGSI_PROCESSOR_FRAGMENT:
return PIPE_SHADER_FRAGMENT;
case TGSI_PROCESSOR_COMPUTE:
return PIPE_SHADER_COMPUTE;
default:
assert(0);
return PIPE_SHADER_VERTEX;
}
}
#ifdef __cplusplus
}
#endif

View File

@@ -450,6 +450,43 @@ null_constant_buffer(struct pipe_context *ctx)
util_report_result(pass);
}
static void
null_fragment_shader(struct pipe_context *ctx)
{
struct cso_context *cso;
struct pipe_resource *cb;
void *vs;
struct pipe_rasterizer_state rs = {0};
struct pipe_query *query;
union pipe_query_result qresult;
cso = cso_create_context(ctx);
cb = util_create_texture2d(ctx->screen, 256, 256,
PIPE_FORMAT_R8G8B8A8_UNORM);
util_set_common_states_and_clear(cso, ctx, cb);
/* No rasterization. */
rs.rasterizer_discard = 1;
cso_set_rasterizer(cso, &rs);
vs = util_set_passthrough_vertex_shader(cso, ctx, false);
query = ctx->create_query(ctx, PIPE_QUERY_PRIMITIVES_GENERATED, 0);
ctx->begin_query(ctx, query);
util_draw_fullscreen_quad(cso);
ctx->end_query(ctx, query);
ctx->get_query_result(ctx, query, true, &qresult);
/* Cleanup. */
cso_destroy_context(cso);
ctx->delete_vs_state(ctx, vs);
ctx->destroy_query(ctx, query);
pipe_resource_reference(&cb, NULL);
/* Check PRIMITIVES_GENERATED. */
util_report_result(qresult.u64 == 2);
}
/**
* Run all tests. This should be run with a clean context after
* context_create.
@@ -459,6 +496,7 @@ util_run_tests(struct pipe_screen *screen)
{
struct pipe_context *ctx = screen->context_create(screen, NULL, 0);
null_fragment_shader(ctx);
tgsi_vs_window_space_position(ctx);
null_sampler_view(ctx, TGSI_TEXTURE_2D);
null_sampler_view(ctx, TGSI_TEXTURE_BUFFER);

View File

@@ -544,6 +544,7 @@ u_vbuf_translate_find_free_vb_slots(struct u_vbuf *mgr,
index = ffs(unused_vb_mask) - 1;
fallback_vbs[type] = index;
unused_vb_mask &= ~(1 << index);
/*printf("found slot=%i for type=%i\n", index, type);*/
}
}
@@ -997,26 +998,30 @@ u_vbuf_upload_buffers(struct u_vbuf *mgr,
return PIPE_OK;
}
static boolean u_vbuf_need_minmax_index(struct u_vbuf *mgr)
static boolean u_vbuf_need_minmax_index(const struct u_vbuf *mgr)
{
/* See if there are any per-vertex attribs which will be uploaded or
* translated. Use bitmasks to get the info instead of looping over vertex
* elements. */
return (mgr->ve->used_vb_mask &
((mgr->user_vb_mask | mgr->incompatible_vb_mask |
((mgr->user_vb_mask |
mgr->incompatible_vb_mask |
mgr->ve->incompatible_vb_mask_any) &
mgr->ve->noninstance_vb_mask_any & mgr->nonzero_stride_vb_mask)) != 0;
mgr->ve->noninstance_vb_mask_any &
mgr->nonzero_stride_vb_mask)) != 0;
}
static boolean u_vbuf_mapping_vertex_buffer_blocks(struct u_vbuf *mgr)
static boolean u_vbuf_mapping_vertex_buffer_blocks(const struct u_vbuf *mgr)
{
/* Return true if there are hw buffers which don't need to be translated.
*
* We could query whether each buffer is busy, but that would
* be way more costly than this. */
return (mgr->ve->used_vb_mask &
(~mgr->user_vb_mask & ~mgr->incompatible_vb_mask &
mgr->ve->compatible_vb_mask_all & mgr->ve->noninstance_vb_mask_any &
(~mgr->user_vb_mask &
~mgr->incompatible_vb_mask &
mgr->ve->compatible_vb_mask_all &
mgr->ve->noninstance_vb_mask_any &
mgr->nonzero_stride_vb_mask)) != 0;
}

View File

@@ -62,6 +62,18 @@ const enum pipe_format const_resource_formats_VUYA[3] = {
PIPE_FORMAT_NONE
};
const enum pipe_format const_resource_formats_YUVX[3] = {
PIPE_FORMAT_R8G8B8X8_UNORM,
PIPE_FORMAT_NONE,
PIPE_FORMAT_NONE
};
const enum pipe_format const_resource_formats_VUYX[3] = {
PIPE_FORMAT_B8G8R8X8_UNORM,
PIPE_FORMAT_NONE,
PIPE_FORMAT_NONE
};
const enum pipe_format const_resource_formats_YUYV[3] = {
PIPE_FORMAT_R8G8_R8B8_UNORM,
PIPE_FORMAT_NONE,
@@ -102,6 +114,12 @@ vl_video_buffer_formats(struct pipe_screen *screen, enum pipe_format format)
case PIPE_FORMAT_B8G8R8A8_UNORM:
return const_resource_formats_VUYA;
case PIPE_FORMAT_R8G8B8X8_UNORM:
return const_resource_formats_YUVX;
case PIPE_FORMAT_B8G8R8X8_UNORM:
return const_resource_formats_VUYX;
case PIPE_FORMAT_YUYV:
return const_resource_formats_YUYV;

Some files were not shown because too many files have changed in this diff Show More