Compare commits

...

4194 Commits

Author SHA1 Message Date
Dylan Baker
02fcd9d803 docs/relnotes/19.2.8: Add SHA256 sum 2019-12-18 11:23:15 -08:00
Dylan Baker
34896d2299 VERSION: bump for 19.2.8 2019-12-18 11:02:09 -08:00
Dylan Baker
1743dec475 docs: add relnotes for 19.2.8 2019-12-18 11:01:53 -08:00
Lionel Landwerlin
6979d19fcb mesa: avoid triggering assert in implementation
When tearing down a GL context with an active performance query, the
implementation can be confused by a query marked active when it's
being deleted.

This shouldn't happen in the implementation because the context will
already be idle.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2235
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3115>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3115>
(cherry picked from commit 2c8742ed85)
2019-12-17 09:10:49 -08:00
Gert Wollny
f42f9bbcd6 virgl: Increase the shader transfer buffer by doubling the size
With only linearly increasing the size of the shader transfer buffer
the transfer of very large shaders may fail, so with each attempt double
the size of the buffer.

CTS:
  dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.48
  for VTK-GL-CTS b5dcfb9c5 and newer

virglrenderer bug:
  https://gitlab.freedesktop.org/virgl/virglrenderer/issues/150

Fixes: a8987b88ff
    virgl: add driver for virtio-gpu 3D (v2)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3121>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3121>
(cherry picked from commit cffa7bb990)
2019-12-17 09:08:49 -08:00
Dylan Baker
4244f4af88 cherry-ignore: Update for 19.2.8 2019-12-16 16:09:30 -08:00
Bas Nieuwenhuizen
ed9c1f7f42 amd/common: Fix tcCompatible degradation on Stoney.
addrlib sometimes returns smaller sizes for tcCompat as it does
not seem to take into account the depth+stencil matching config
gymnastics with tcCompat.

This fixes
dEQP-VK.pipeline.render_to_image.core.2d_array.huge.height.r8g8b8a8_unorm_d32_sfloat_s8_uint

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>
(cherry picked from commit e197fb1c2f)
Conflicts resolved by Dylan Baker

Conflicts:
	src/amd/common/ac_surface.c
2019-12-16 16:09:30 -08:00
Iván Briano
5d2d6442ff anv: Export filter_minmax support only when it's really supported
Fixes: bea4d4c78c ("anv: add VK_EXT_sampler_filter_minmax support")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071>
(cherry picked from commit 0fd93b9589)
2019-12-16 15:16:50 -08:00
Bas Nieuwenhuizen
e8913e62a6 amd/common: Always use addrlib for HTILE tc-compat.
Even without depth+stencil addrlib can (correctly!) decide to
disable tc compatible HTILE.

One example is 8x sampling with 32-bit depth on Stoney. The row size
on Stoney is 1024, while the tile size is 2048, which results in
tile splits which are not supported with tc-compat.

On Stoney, this fixes
dEQP-VK.glsl.builtin_var.fragdepth.*_list_d32_sfloat_multisample_8

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>
(cherry picked from commit b53856aca3)
2019-12-16 15:16:50 -08:00
Lionel Landwerlin
448b90a4a0 anv: fix fence underlying primitive checks
We appear to have got lucky that the only type of temporary fence
payload we could have was a syncobj and that would only happen when
the type of the permanent payload was also a syncobj.

This code was broken if that assumption changed and it did in commit
f9a3d9738b.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit 52bc235f2a)
2019-12-16 15:16:49 -08:00
Kenneth Graunke
06b97d1e34 iris: Default to X-tiling for scanout buffers without modifiers
Neither Mutter nor KWin's wayland compositors appear to use modifiers.
In the non-modifier case, iris was still trying to use Y-tiling for
scan-out surfaces, leading to this error:

(gnome-shell:7247): mutter-WARNING **: 09:23:47.787: meta_drm_buffer_gbm_new failed: drmModeAddFB failed: Invalid argument

We now fall back to the historical X-tiling for scanout buffers, which
ought to work everyone, at lower performance.  To regain that, we need
to ensure modifiers are actually supported in environments people use.

Fixes: fbf3124771 ("iris: Rework tiling/modifiers handling")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit dcb4230e5e)
2019-12-16 15:16:49 -08:00
Jason Ekstrand
9925871a1a anv: Don't leak when set_tiling fails
Fixes: a44744e01d "anv: Require a dedicated allocation for..."
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 0a36fafa95)
Conflicts resolved by Dylan Baker

Conflicts:
	src/intel/vulkan/anv_device.c
2019-12-16 15:16:49 -08:00
Dylan Baker
325ef15f26 meson/broadcom: libbroadcom_cle also needs zlib
Fixes: 1ae8018a6a
       ("meson: Add support for the vc4 driver.")
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit d0eebda990)
2019-12-11 13:19:24 -08:00
Dylan Baker
5e100b6ba7 meson/broadcom: libbroadcom_cle needs expat headers
Fixes: 1ae8018a6a
       ("meson: Add support for the vc4 driver.")
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 85a9698ac3)
2019-12-11 13:19:17 -08:00
Alyssa Rosenzweig
47c8b41f8f gallium/util: Support POLYGON in u_stream_outputs_for_vertices
u_decomposed_prims_for_vertices cannot support POLYGON, but POLYGON is
trivial to support as a special case directly (since we have the number
of vertices directly).

Fixes aborts in Panfrost in apps using GL_POLYGON.

Fixes: e881aa8c12 ("gallium/util: Add u_stream_outputs_for_vertices helper")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Revewied-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit a37822f5f7)
2019-12-10 09:09:05 -08:00
Jason Ekstrand
af0d38bfde anv: Re-emit all compute state on pipeline switch
It's a very odd case to hit in the real world.  However, there are some
CTS tests which switch back and forth between dispatch and clear without
changing the pipeline.

Fixes: bc612536eb "anv: Emit a dummy MEDIA_VFE_STATE before switching..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit 0f60aa4037)
2019-12-10 09:08:59 -08:00
Nanley Chery
f3507690f8 gallium: Store the image format in winsys_handle
This format will be used to properly handle planar images with modifiers
in iris.

Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 51ee8fff9b)
2019-12-10 09:08:41 -08:00
Nanley Chery
96b8f42611 gallium/dri2: Fix creation of multi-planar modifier images
The commit noted below assumed and enforced that DRM_MOD_INVALID was the
only valid modifier for multi-planar imported images. Due to that, it
required that modifier on multi-planar images to:

   1. Allow multiple planes.
   2. Perform YUV format lowering and extent adjustments.
   3. Use buffer_index to correctly map the given planes.

Fix these issues by removing or updating the code built on that
assumption.

Fixes: 2066966c10 ("gallium/dri2: Support creating multi-planar modifier images")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d5c857837a)
2019-12-10 09:08:35 -08:00
Timothy Arceri
0a70ed2aa3 glsl/nir: iterate the system values list when adding varyings
Iterate the system values list when adding varyings to the program
resource list in the NIR linker. This is needed to avoid CTS
regressions when using the NIR to build the GLSL resource list in
an upcoming series. Presumably it also fixes a bug with the current
ARB_gl_spirv support.

Fixes: ffdb44d3a0 ("nir/linker: Add inputs/outputs to the program resource list")

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
(cherry picked from commit 1abca2b3c8)
2019-12-10 09:08:29 -08:00
Rob Clark
3742910ba2 nir/lower_clip: Fix incorrect driver loc for clipdist outputs
Somehow adjusting maxloc based on existing outputs got lost, resulting
in the clipdist varying clobbering the position varying.  Causing a
shader that had no position output in freedreno/ir3, which triggers GPU
hangs in neverball.

Fixes: d0f746b645 ("nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs.")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 372ed42d22)
2019-12-04 14:44:25 -08:00
Lionel Landwerlin
21a4be582c intel/perf: fix improper pointer access
This expression was unused by the macro, probably why it didn't
register in the compilation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit ddacd3d43b)
2019-12-04 14:44:04 -08:00
Lionel Landwerlin
272c4f2711 intel/perf: simplify the processing of OA reports
This is a more accurate description of what happens in processing the
OA reports.

Previously we only had a somewhat difficult to parse state machine
tracking the context ID.

What we really only need to do to decide if the delta between 2
reports (r0 & r1) should be accumulated in the query result is :

   * whether the r0 is tagged with the context ID relevant to us

   * if r0 is not tagged with our context ID and r1 is: does r0 have a
     invalid context id? If not then we're in a case where i915 has
     resubmitted the same context for execution through the execlist
     submission port

v2: Update comment (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 8c0b058263)
2019-12-04 14:43:58 -08:00
Lionel Landwerlin
d4bb049e98 intel/perf: take into account that reports read can be fairly old
If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minute). So
consider that if the difference is greater than half that wraparound
period, we're probably dealing with an old report and make the caller
aware it should read more reports when they're available.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b364e920bf)
2019-12-04 14:43:51 -08:00
Lionel Landwerlin
21df9767ad intel/perf: set read buffer len to 0 to identify empty buffer
We always add an empty buffer in the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.

We were using an 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn that led to
not reading enough reports and resulted in deltas added to our counter
values which should have been discarded because those would be flagged
for a different context.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9d0a5c817c)
2019-12-04 14:43:38 -08:00
Lionel Landwerlin
82177cdc1d intel/perf: fix invalid hw_id in query results
Accumulation happens between 2 reports, it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and that we have a valid value
to put in there.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 41b54b5faf ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit acea59dbf8)
2019-12-04 14:43:29 -08:00
Dylan Baker
e5b4f23475 docs: Add SHA256 sums for 19.2.7 2019-12-04 14:36:13 -08:00
Dylan Baker
65d255cd1e VERSION: bump version for 19.2.7 2019-12-04 13:48:11 -08:00
Dylan Baker
d8e767ede8 docs: Add release notes for 19.2.7 2019-12-04 13:47:44 -08:00
Rhys Perry
4a0199b6e4 radv: set writes_memory for global memory stores/atomics
Fixes: 13ab63bb62 ('radv: Implement VK_EXT_buffer_device_address.')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 35fab1ba33)
2019-12-04 13:43:32 -08:00
Samuel Pitoiset
3ed8c94244 radv: fix compute pipeline keys when optimizations are disabled
If an app first creates a compute pipeline with
VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT set, then re-compile it
without that flag, the driver should re-compile the compute shader.
Otherwise, it will return the unoptimized one.

Fixes: ce188813bf ("radv: add initial support for VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 9ab27647ff)
2019-12-04 13:43:32 -08:00
Samuel Pitoiset
5c98b36577 radv/gfx10: fix implementation of exclusive scans
This implementation is loosely based on ROCm.
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl

This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive* on GFX10.

Fixes: 227c29a80d ("amd/common/gfx10: implement scan & reduce operations")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit c9aa843961)
Conflicts resolved by Dylan Baker
2019-12-04 13:43:32 -08:00
Samuel Pitoiset
a3869c14c0 radv: fix enabling sample shading with SampleID/SamplePosition
When a fragment shader includes an input variable decorated with
SampleId or SamplePosition, sample shading should be enabled
because minSampleShadingFactor is expected to be 1.0.

Cc: 19.2, 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 86a5fbfd4a)
Conflicts resolved by Dylan Baker
2019-12-04 13:43:32 -08:00
Yevhenii Kolesnikov
bda6890f58 meson: Fix linkage of libgallium_nine with libgalliumvl
Do not link libgallium_nine with libgalliumvl_stub if it's already
linked with libgalliumvl. Linking with stub leads to "duplicate
symbol" errors.

Fixes: 6b4c7047d5
       ("meson: build gallium nine state_tracker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2040

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 9af22ccddc)
Conflicts resolved by Dylan Baker
2019-12-04 13:43:32 -08:00
Jason Ekstrand
2bf47550ce anv: Set up SBE_SWIZ properly for gl_Viewport
gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit b1f37688ba)
2019-12-04 13:43:32 -08:00
Jonathan Gray
b4e83559cb i965: update Makefile.sources for perf changes
brw_performance_query_metrics.h was removed in
134e750e16 and
brw_performance_query.h was removed in
8ae6667992

remove reference to these files from Makefile.sources

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Fixes: 134e750e16 ("i965: extract performance query metrics")
Fixes: 8ae6667992 ("intel/perf: move query_object into perf")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 34dda0ca65)
2019-12-04 13:43:32 -08:00
Boris Brezillon
a39d364af3 gallium: Fix the ->set_damage_region() implementation
BACK_LEFT attachment can be outdated when the user calls
KHR_partial_update() (->lastStamp != ->texture_stamp), leading to a
damage region update on the wrong pipe_resource object.
Let's delay the ->set_damage_region() call until the attachments are
updated when we're in that case.

Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 492ffbed63 ("st/dri2: Implement DRI2bufferDamageExtension")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b196e1a8cf)
2019-12-04 13:43:32 -08:00
Jonathan Gray
c166435dc2 winsys/amdgpu: avoid double simple_mtx_unlock()
pthread_mutex_unlock() when unlocked is documented by posix as
being undefined behaviour.  On OpenBSD pthread_mutex_unlock() will call
abort(3) if this happens.

This occurs in amdgpu_winsys_create() after
cb446dc0fa
winsys/amdgpu: Add amdgpu_screen_winsys

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 3fe3bde4f2)
2019-12-04 13:43:32 -08:00
Bas Nieuwenhuizen
52a8d43a24 radv: Unify max_descriptor_set_size.
They were out of sync. Besides syncing, lets ensure they never diverge
again.

Fixes: 8d2654a419 "radv: Support VK_EXT_inline_uniform_block."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4cde0e04e3)
2019-12-04 13:43:32 -08:00
Bas Nieuwenhuizen
2e379b0a65 radv: Allocate cmdbuffer space for buffer marker write.
Fixes: 946193ae00 "radv: add support for VK_AMD_buffer_marker"
Reviewed-by:  Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 25bc9102d8)
2019-12-04 13:43:32 -08:00
Zebediah Figura
336f59f8ba Revert "draw: revert using correct order for prim decomposition."
This reverts commit f97b731c82.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/250

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit a3c8bc10aa)
2019-12-04 13:43:32 -08:00
Ian Romanick
38c8af9e2a intel/fs: Disable conditional discard optimization on Gen4 and Gen5
The CMP instruction on Gen4 and Gen5 generates one bit (the LSB) of
valid data and 31 bits of junk.  Results of comparisons that are used as
Boolean values need to have a fixup applied to generate the proper 0/~0
values.

Calling fs_visitor::nir_emit_alu with need_dest=false prevents the fixup
code from being generated.  This results in a sequence like:

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
(+f0.1) or.z.f0.1(16) null<1>UD g4<8,8,1>UD     g8<8,8,1>UD

instead of

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
        or(16) g4<1>UD g4<8,8,1>UD     g8<8,8,1>UD
(+f0.1) and.z.f0.1(16) null<1>UD g4<8,8,1>UD     1UD

I examined a couple of the shaders hurt by this change, and ALL of them
would have been affected by this bug. :(

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1836
Fixes: 0ba9497e66 ("intel/fs: Improve discard_if code generation")

Iron Lake
total instructions in shared programs: 8122757 -> 8122957 (<.01%)
instructions in affected programs: 8307 -> 8507 (2.41%)
helped: 0
HURT: 100
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.67% x̄: 2.81% x̃: 2.76%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.58% 3.03%
Instructions are HURT.

total cycles in shared programs: 188510100 -> 188510376 (<.01%)
cycles in affected programs: 76018 -> 76294 (0.36%)
helped: 0
HURT: 55
HURT stats (abs)   min: 2 max: 12 x̄: 5.02 x̃: 4
HURT stats (rel)   min: 0.07% max: 3.75% x̄: 0.86% x̃: 0.56%
95% mean confidence interval for cycles value: 4.33 5.71
95% mean confidence interval for cycles %-change: 0.60% 1.12%
Cycles are HURT.

GM45
total instructions in shared programs: 4994403 -> 4994503 (<.01%)
instructions in affected programs: 4212 -> 4312 (2.37%)
helped: 0
HURT: 50
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.25% x̄: 2.76% x̃: 2.72%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.45% 3.07%
Instructions are HURT.

total cycles in shared programs: 128928750 -> 128928982 (<.01%)
cycles in affected programs: 67442 -> 67674 (0.34%)
helped: 0
HURT: 47
HURT stats (abs)   min: 2 max: 12 x̄: 4.94 x̃: 4
HURT stats (rel)   min: 0.09% max: 3.75% x̄: 0.75% x̃: 0.53%
95% mean confidence interval for cycles value: 4.19 5.68
95% mean confidence interval for cycles %-change: 0.50% 1.00%
Cycles are HURT.

(cherry picked from commit e51eda99df)
2019-12-04 13:43:32 -08:00
Dylan Baker
5836dd66e0 VERSION: bumpre to 19.2.6 2019-11-21 16:04:47 -08:00
Dylan Baker
264d1187df docs: Add release notes for 19.2.6 2019-11-21 16:04:11 -08:00
Dylan Baker
aa620fdf8e meson: generate .pc files for gles and gles2 with old glvnd
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1921
2019-11-21 17:28:23 +00:00
Yevhenii Kolesnikov
05d5784ea6 glsl: Enable textureSize for samplerExternalOES
From OES_EGL_image_external_essl3

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1901

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
2019-11-21 09:26:26 -08:00
Dave Airlie
b1f505463d llvmpipe/ppc: fix if/ifdef confusion in backport.
Fixes: 32aba91c07 (llvmpipe: use ppc64le/ppc64 Large code model for JIT-compiled shaders)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2131
2019-11-21 09:26:26 -08:00
Hyunjun Ko
c2488d810b freedreno/ir3: fix printing output registers of FS.
Fixes: cea39af2fb ("freedreno/ir3: Generalize ir3_shader_disasm()")

Reviewed-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit d0f38394b1)
2019-11-21 09:26:26 -08:00
Alejandro Piñeiro
e594e4cefd v3d: adds an extra MOV for any sig.ld*
Specifically when we are in non-uniform control flow, as we would need
to set the condition for the last instruction. If (for example) a
image atomic load stores directly their value on a NIR register,
last_inst would be a nop, and would fail when set the condition.

Fixes piglit test:
spec/glsl-es-3.10/execution/cs-ssbo-atomic-if-else-2.shader_test

Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")

v2: (Changes suggested by Eric Anholt)
   * Cover all sig.ld* signals, not just ldunif and ldtmu, as all of
     them have the same restriction.
   * Update comment explaining why we add a MOV in that case
   * Tweak commit message.

v3:
   * Drop extra set of parens (Eric)
   * Add missing ld signal to is_ld_signal to fix shader-db regression.

Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit b4bc59e37e)
2019-11-21 09:26:26 -08:00
Jose Maria Casanova Crespo
8f526ee7cd v3d: Fix predication with atomic image operations
Fixes dEQP test:
dEQP-GLES31.functional.synchronization.inter_call.with_memory_barrier.image_atomic_multiple_interleaved_write_read

Fixes piglit test:
spec/glsl-es-3.10/execution/cs-image-atomic-if-else.shader_test

Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit d983055184)
2019-11-21 09:26:26 -08:00
Eric Engestrom
be8a46d064 vulkan: delete typo'd header
Two files exist in that directory:
- vulkan_xlib_randr.h
- vulkan_xlib_xrandr.h

Both were imported in 205c271562 ("vulkan: Update the XML and
headers to 1.1.70") with identical contents (ie. the
VK_EXT_acquire_xlib_display extension), but the former was never
included anywhere and can't be found upstream [1], while the latter is
included in vulkan.h and found upstream.

[1] https://github.com/KhronosGroup/Vulkan-Headers/tree/master/include/vulkan

Fixes: 205c271562 ("vulkan: Update the XML and headers to 1.1.70")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 344859c32d)
2019-11-21 09:26:26 -08:00
Dylan Baker
4c86eda2c2 docs/relnotes/19.2.5: Add SHA256 sum 2019-11-20 09:18:11 -08:00
Dylan Baker
f418e9231a VERSION: bump for 19.2.5 2019-11-20 08:54:30 -08:00
Dylan Baker
9e0a0d2ebb docs: Add relnotes for 19.2.5 2019-11-20 08:54:10 -08:00
Jason Ekstrand
e10851ff34 anv: Stop bounds-checking pushed UBOs
The bounds checking is actually less safe than just pushing the data.
If the bounds checking actually ever kicks in and it's not on the last
UBO push range, then the shrinking will cause all subsequent ranges to
be pushed to the wrong place in the GRF.  One of the behaviors we
definitely don't want is for OOB UBO access to result in completely
unrelated UBOs returning garbage values.  It's safer to just push the
UBOs as-requested.  If we're really concerned about robustness, we can
emit shader code to do bounds checking which should be stupid cheap (a
CMP followed by SEL).

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-20 08:24:09 -08:00
Brian Paul
023ddb01b5 Call shmget() with permission 0600 instead of 0777
A security advisory (TALOS-2019-0857/CVE-2019-5068) found that
creating shared memory regions with permission mode 0777 could allow
any user to access that memory.  Several Mesa drivers use shared-
memory XImages to implement back buffers for improved performance.

This path changes the shmget() calls to use 0600 (user r/w).

Tested with legacy Xlib driver and llvmpipe.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 02c3dad0f3)
2019-11-20 08:24:09 -08:00
Danylo Piliaiev
3199172eaa i965: Unify CC_STATE and BLEND_STATE atoms on Haswell as a workaround
Re-emitting 3DSTATE_CC_STATE_POINTERS after emitting
3DSTATE_BLEND_STATE_POINTERS fixes the shadow flickering in
SuperTuxCart and Tropico 6 which was seen only on Haswell.
The reason for this is unknown and fix was found empirically.

The closest mention in PRM is that it should improve performance.
From the HSW PRM, volume 2b, page 823 (3DSTATE_BLEND_STATE_POINTERS):
 "When the BLEND_STATE pointer changes but not the CC_STATE pointer,
  driver needs to force a CC_STATE pointer change to improve
  blend performance in pixel backend."

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1834
Fixes: eca4a654 ("i965: Disable dual source blending when shader doesn't support it on gen8+")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 6f17fe0606)
2019-11-20 08:24:09 -08:00
Ben Crocker
ae071434e9 llvmpipe: use ppc64le/ppc64 Large code model for JIT-compiled shaders
Large programs, e.g. gnome-shell and firefox, may tax the
addressability of the Medium code model once a (potentially unbounded)
number of dynamically generated JIT-compiled shader programs are
linked in and relocated.  Yet the default code model as of LLVM 8 is
Medium or even Small.

The cost of changing from Medium to Large is negligible:
- an additional 8-byte pointer stored immediately before the shader entrypoint;
- change an add-immediate (addis) instruction to a load (ld).

Testing with WebGL Conformance
(https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html)
yields clean runs with this change (and crashes without it).

Testing with glxgears shows no detectable performance difference.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1753327, 1753789, 1543572, 1747110, and 1582226

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/223

Co-authored by: Nemanja Ivanovic <nemanjai@ca.ibm.com>, Tom Stellard <tstellar@redhat.com>

CC: mesa-stable@lists.freedesktop.org

Signed-off-by: Ben Crocker <bcrocker@redhat.com>
(cherry picked from commit 9c3be6d21f)
Conflicts resolved Dylan (PIPE_ARCH -> UTIL_ARCH rename)
2019-11-20 08:24:09 -08:00
Pierre-Eric Pelloux-Prayer
60c299c542 radeonsi: fix shader disk cache key
Use unsigned values otherwise signed extension will produce a 64 bits value where
the 32 left-most bits are 1.

Fixes: 2afeed3010 ("radeonsi: tell the shader disk cache what IR is used")
2019-11-20 08:24:09 -08:00
Pierre-Eric Pelloux-Prayer
b5b09acb74 radeonsi: tell the shader disk cache what IR is used
Until 8bef4df196 the IR (TGSI or NIR) was used in disk_cache driver_flags.
This commit restores this features to avoid crashing when switching from
one IR to the other.

As radeonsi's default is TGSI, I used "driver_flags & 0x8000000 = 0" for TGSI
to keep the same driver_flags.

Fixes: 8bef4df196 ("radeonsi: add si_debug_options for convenient adding/removing of options")

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-20 08:24:09 -08:00
Pierre-Eric Pelloux-Prayer
38bd621f0d radeonsi: disable sdma for gfx10
Disable sdma on gfx10 until all timeouts bugs are fixed.

See:
    https://gitlab.freedesktop.org/mesa/mesa/issues/1907
    https://bugs.freedesktop.org/show_bug.cgi?id=111481

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-11-20 08:24:09 -08:00
Marek Olšák
0e7e56aa2f tgsi_to_nir: handle PIPE_FORMAT_NONE in image opcodes
radeonsi doesn't use the format and internal shaders don't set it.

Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
(cherry picked from commit f704fb7f0b)
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2112
2019-11-20 08:23:55 -08:00
Marek Olšák
2353a63a58 tgsi_to_nir: fix masked out image loads
This caused a failure in NIR validation.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 3906fce88b)
2019-11-19 16:56:01 -08:00
Illia Iorin
856c7eddf7 mesa/main: Ignore filter state for MS texture completeness
After the discussion in
https://github.com/KhronosGroup/OpenGL-API/issues/45
the section 8.17 (texture completeness) of the OpenGL 4.6 core profile
was changed to explicitly say that multisample texture completeness
ignores filter state of the texture.

"Using the preceding definitions, a texture is complete unless any of the
 following conditions hold true:
   ...
  - The minification filter requires a mipmap (is neither NEAREST nor LINEAR),
    the texture is not multisample, and the texture is not mipmap complete.
  - The texture is not multisample; either the magnification filter is not
    NEAREST, or the minification filter is neither NEAREST nor NEAREST_-
    MIPMAP_NEAREST; and any of
    – The internal format of the texture is integer (see table 8.12).
    – The internal format is STENCIL_INDEX.
    – The internal format is DEPTH_STENCIL, and the value of DEPTH_-
      STENCIL_TEXTURE_MODE for the texture is STENCIL_INDEX."

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Illia Iorin <illia.iorin@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 6b672e342a)
2019-11-19 16:56:00 -08:00
Ian Romanick
4c82d426bd nir/algebraic: Mark other comparison exact when removing a == a
This prevents some additional optimizations that would change the
original result.  This includes things like (b < a && b < c) => b <
min(a, c) and !(a < b) => b >= a.  Both of these optimizations were
specifically observed in the piglit tests added in piglit!160.

This was discovered while investigating
https://gitlab.freedesktop.org/mesa/mesa/issues/1958.  However, the
problem in that issue was Chrome or Angle is replacing calls to isnan()
with some stuff that we (correctly) optimize to false.  If they had left
the calls to isnan() alone, everything would have just worked.

No shader-db changes on any Intel platform.

I also tried marking the comparison generated by the isnan() function
precise.  The precise marker "infects" every computation involved in
calculating the parameter to the isnan() function, and this severely
hurt all of the (few) shaders in shader-db that use isnan().

I also considered adding a new ir_unop_isnan opcode that would implement
the functionality.  During GLSL IR-to-NIR translation, the resulting
comparison operation would be marked exact (and the samething would need
to happen in SPIR-V translation).

This approach taken by this patch seemed easier, but we may want to do
the ir_unop_isnan thing anyway.

Fixes: d55835b8bd ("nir/algebraic: Add optimizations for "a == a && a CMP b"")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
(cherry picked from commit 9be4a422a0)
2019-11-19 16:56:00 -08:00
Ian Romanick
d8a37880b5 nir/algebraic: Add the ability to mark a replacement as exact
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
(cherry picked from commit ea19f2fb68)
2019-11-19 16:56:00 -08:00
Paulo Zanoni
bd2f6150ca intel/compiler: fix nir_op_{i,u}*32 on ICL
On ICL we have the src1 restriction which is applied through
fix_byte_src() and potentially changes the type of the operands from 8
to 32 bits. When this change happens, we fall into the "else if
(bit_size < 32)" case and miscompute src_type because it takes into
consideration bit_size (8) instead of the adjusted size of temp_op
(32). This results in the shader reading unused memory, giving us
mostly failures, but occasional passes due to whatever was already in
the registers we were reading.

This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as:
    dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2

This can also be verified by simply changing fix_byte_src() to apply
on all platforms.

Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1 on ICL")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit eb6352162d)
2019-11-19 16:56:00 -08:00
Marek Olšák
9ffb0176e2 st/mesa: fix Sanctuary and Tropics by disabling ARB_gpu_shader5 for them
They use the "sample" keyword as a variable name.

Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit e00791c552)
2019-11-19 16:56:00 -08:00
Lionel Landwerlin
a390cf739f anv/wsi: signal the semaphore in the acquireNextImage
We seem to have forgotten about the semaphore in the
acquireNextImageInfo.

v2: Signal semaphore/fence regardless of presentation status (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit edc6606d4e)
2019-11-19 16:56:00 -08:00
Lionel Landwerlin
a993dc20b6 anv: remove list items on batch fini
This doesn't seem to fix anything because those destroy() calls happen
right before the command buffer object & its list of batch_bo is also
destroyed. Still looks a bit cleaner.

v2: Found a second occurence

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Fixes: 26ba0ad54d ("vk: Re-name command buffer implementation files")
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 935f8f0e56)
2019-11-19 16:56:00 -08:00
Lionel Landwerlin
2281258a7b anv: invalidate file descriptor of semaphore sync fd at vkQueueSubmit
We always close the in_fence at the end the anv_cmd_buffer_execbuf()
so when we take it from the semaphore, let's not forget to invalidate
it.

Note that the code leaks the fence_in if we get any error before
reaching the close(). Let's fix that in another patch or better,
rewrite the whole thing!

v2: drop redundant fd = -1 (Jason)

v3: Update commit message (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 048f0690ee)
2019-11-19 16:56:00 -08:00
Dylan Baker
a89d4090bc cherry-ignore: Update for 19.2.4 cycle 2019-11-19 16:56:00 -08:00
Eric Engestrom
cb835d281b egl: fix _EGL_NATIVE_PLATFORM fallback
When the X11 or Haiku platforms were compiled in, they would bypass the
`_EGL_NATIVE_PLATFORM` fallback by always returning themselves instead.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 86d3a346f1)
2019-11-13 11:10:13 -08:00
Caio Marcelo de Oliveira Filho
6e98a923cb spirv: Don't leak GS initialization to other stages
The stage specific fields of shader_info are in an union.  We've
likely been lucky that this value was either overwritten or ignored by
other stages.  The recent change in shader_info layout in commit
84a1a2578d ("compiler: pack shader_info from 160 bytes to 96 bytes")
made this issue visible.

Fixes: cf2257069c ("nir/spirv: Set a default number of invocations for geometry shaders")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 087ecd9ca5)
2019-11-13 11:10:13 -08:00
Lepton Wu
2b89e68a2e gallium: dri2: Use index as plane number.
This fix wrong color when playing video under Android + virgl
configuration.

Fixes: 2decad495f ("gallium/dri2: Support images with multiple planes for modifiers")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Lepton Wu <lepton@chromium.org>
(cherry picked from commit 5a40e153fd)
2019-11-13 11:10:13 -08:00
Dylan Baker
9cbffed5d0 docs: Add SHA256 sum for for 19.2.4 2019-11-13 11:09:32 -08:00
Dylan Baker
6f1b8e554f VERSION: bump to 19.2.4 2019-11-13 10:38:55 -08:00
Dylan Baker
cebe2b12f5 docs: Add release notes for 19.2.4 2019-11-13 10:38:40 -08:00
Lionel Landwerlin
330d31f63b mesa: check framebuffer completeness only after state update
The change made in 88d665830f ("mesa: check draw buffer completeness
on glClearBufferfi/glClearBufferiv") correctly updated the state prior
to checking the framebuffer completeness on glClearBufferiv but not in
glClearBufferfi.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: 88d665830f ("mesa: check draw buffer completeness on glClearBufferfi/glClearBufferiv")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2072
(cherry picked from commit f93bb90302)
2019-11-13 10:34:47 -08:00
Dylan Baker
019dbc4b3b Bump version to 19.2.3 2019-11-06 08:19:59 -08:00
Dylan Baker
2e4da07a3e docs: add release notes for 19.2.3 2019-11-06 08:19:42 -08:00
Dylan Baker
85608bad46 meson: Add dep_glvnd to egl deps when building with glvnd
Otherwise if glvnd is not installed systemwide, but only in a prefix,
it's headers wont be found. This happens because if it's headers are in
/usr/include/ then another dependence will provide the necessary -I
arguments and compilation will work.

Fixes: 035ec7a2bb
       ("meson: Add support for EGL glvnd")
Acked-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 5d085ad052)
2019-11-05 08:48:03 -08:00
Paulo Zanoni
a2b3911a47 intel/compiler: remove the operand restriction for src1 on GLK
Commit 5847de6e9a implemented a restriction that applies to ICL, but
wrongly marked it as also applying to GLK. Reviewers or MR !1125
pointed this, and the commit history shows removal of GLK to parts of
the patch, but it turns there was still a left-over GLK check in the
code.

This code was breaking some of the i8vec2 tests on GLK, for example:
  dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2

Removing the GLK check solves the issue for GLK. I don't see a reason
on why implementing this restriction would actually break GLK, so
there's still more to investigate here since this bug may be affecting
ICL+, but let's apply the real GLK fix while we analyze and discuss
the other possible issues.

Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1
on ICL")
BSpec: 3017
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit b57383a944)
2019-11-05 08:48:03 -08:00
Kenneth Graunke
a74300e4cf iris: Fix "Force Zero RTA Index Enable" setting again
In 2ca0d913ea, we began updating cso_fb->layers to the actual layer
count, rather than 0.  This fixed cases where we were setting "Force
Zero RTA Index Enable" even when doing layered rendering.  Sadly, it
also broke the check entirely: cso_fb->layers is now 1 for non-layered
cases, but the Force Zero RTA Index check was still comparing for 0.

Fixes: 2ca0d913ea ("iris: Fix framebuffer layer count")
(cherry picked from commit fc7b748086)
2019-11-05 08:46:55 -08:00
Dylan Baker
c1f1a1056f nir: correct use of identity check in python
Python has the identity operator `is`, and the equality operator `==`.
Using `is` with strings sometimes works in CPython due to optimizations
(they have some kind of cache), but it may not always work.

Fixes: 96c4b135e3
       ("nir/algebraic: Don't put quotes around floating point literals")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 717606f9f3)
2019-11-05 08:46:48 -08:00
Jon Turney
13768f3714 Fix timespec_from_nsec test for 32-bit time_t
Since struct timespec's tv_sec member is of type time_t, adjust the
expected value to allow for the truncation which will occur with 32-bit
time_t.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit dd1dba80b9)
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2043
2019-11-04 08:17:58 -08:00
Lionel Landwerlin
aab604398d mesa: check draw buffer completeness on glClearBufferfi/glClearBufferiv
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 88d665830f)
2019-11-04 08:05:19 -08:00
Dylan Baker
5fbc187720 cherry-ignore: update for 19.2.3 cycle 2019-11-04 08:05:19 -08:00
Jason Ekstrand
aaba438f80 anv/tests: Zero-initialize instances
Some of the tests were actually relying on some of those uninitialized
bits to be non-zero.  In particular, a couple want use_softpin = true.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 9076e9f375)
2019-11-04 08:02:36 -08:00
Jason Ekstrand
3de8c6e8fb anv: Fix a potential BO handle leak
Fixes: 731c4adcf9 "anv/allocator: Add support for non-userptr"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit bb257e1852)
2019-11-04 08:02:36 -08:00
Pierre-Eric Pelloux-Prayer
6d96ee8507 mesa: enable msaa in clear_with_quad if needed
If the DrawBuffer sample count is > 1 and msaa is enabled we must also
enable msaa when clearing it.

Fixes: ea5b7de138 ("radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1991

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Witold Baryluk <witold.baryluk@gmail.com>
(cherry picked from commit 8a723282e3)
2019-11-04 08:02:36 -08:00
Bas Nieuwenhuizen
4575aa2a30 anv: Remove _mesa_locale_init/fini calls.
The resulting locale is not used for Vulkan, and it is not reference
counted, giving issues when multiple instances are created.

CC: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 3e86d553a4)
2019-11-04 08:02:36 -08:00
Bas Nieuwenhuizen
ff9237e8ea turnip: Remove _mesa_locale_init/fini calls.
The resulting locale is not used for Vulkan, and it is not reference
counted, giving issues when multiple instances are created.

CC: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 72f858fc07)
2019-11-04 08:02:36 -08:00
Bas Nieuwenhuizen
a839a023c5 radv: Remove _mesa_locale_init/fini calls.
The resulting locale is not used for Vulkan, and it is not reference
counted, giving issues when multiple instances are created.

CC: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 344ba56b0f)
2019-11-04 08:02:36 -08:00
Bas Nieuwenhuizen
63ff2a51d3 radv: Fix timeout handling in syncobj wait.
libdrm returns -errno instead of directly the ioctl ret of -1.

Fixes: 1c3cda7d27 "radv: Add syncobj signal/reset/wait to winsys."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ec770085c2)
2019-11-04 08:02:36 -08:00
Ilia Mirkin
43234ba02a nv50/ir: mark STORE destination inputs as used
Observed an issue when looking at the code generatedy by the
image-vertex-attrib-input-output piglit test. Even though the test
itself worked fine (due to TIC 0 being used for the image), this needs
to be fixed.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 1b9d1e13d8)
2019-11-04 08:02:35 -08:00
Jonathan Marek
49e105048b etnaviv: fix depth bias
Fixes remaining failures in these deqp tests (tested on GC3000/GC7000L):
dEQP-GLES2.functional.polygon_offset.*

Fixes: 6c3c05dc ("etnaviv: fix polygon offset")

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit 7b524e1acb)
2019-10-31 08:43:30 -07:00
Sagar Ghuge
db4f7eae8b intel/blorp: Assign correct view while clearing depth stencil
We never saw any failures regarding this typo but it's good to assign
correct stencil view while constructing blorp_params.

Fixes: 0cabf93b80 "intel/blorp: Add an entrypoint for clearing depth and stencil"

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
(cherry picked from commit ce208be2d8)
2019-10-30 10:55:34 -07:00
Caio Marcelo de Oliveira Filho
5290062a3f anv: Fix output of INTEL_DEBUG=bat for chained batches
The anv_batch_bo contents are linked one to another, and when printing
we have to start with the first of those.  Since in `u_vector` new
elements are added to the head, to get the first element we need the
vector's tail.

Fixes: 32ffd90002 ("anv: add support for INTEL_DEBUG=bat")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit e2155158e9)
2019-10-29 08:16:07 -07:00
Nanley Chery
7affc43e81 iris: Disallow incomplete resource creation
If a modifier specifies an aux, it must be created.

Fixes: 75a3947af4 ("iris/resource: Fall back to no aux if creation fails")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d298740a1c)
2019-10-29 08:16:03 -07:00
Nanley Chery
8cb1070ea3 iris: Don't leak the resource for unsupported modifier
Make sure the res struct is free'd before returning.

Fixes: 2dce0e94a3 ("iris: Initial commit of a new 'iris' driver for Intel Gen8+ GPUs.")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit f2fc5dece9)
2019-10-29 08:15:58 -07:00
Nanley Chery
88ca3cf11d iris: Clear ::has_hiz when disabling aux
Fixes: 2cddc953cd ("iris: some initial HiZ bits")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 6cd9731d96)
2019-10-29 08:15:52 -07:00
Nanley Chery
73a9f31b57 intel/blorp: Disable depth testing for slow depth clears
We'll start doing slow depth clears more often on HIZ_CCS buffers in a
future commit. Reduce the performance impact by making them use less
bandwidth.

From the Depth Test section of the BSpec:

   This function is enabled by the Depth Test Enable state variable. If
   enabled, the pixel's ("source") depth value is first computed. After
   computation the pixel's depth value is clamped to the range defined
   by Minimum Depth and Maximum Depth in the selected CC_VIEWPORT state.
   Then the current ("destination") depth buffer value for this pixel is
   read.

and from the Depth Buffer Updates section of the BSpec:

   If depth testing is disabled or the depth test passed, the incoming
   pixel's depth value is written to the Depth Buffer.

Taken together, it's clear that depth testing isn't necessary to perform
a depth buffer clear. Mark Janes and I analyzed this patch with
frameretrace and a depthrange piglit test. I disabled HiZ to ensure we'd
get slow depth clears. We've observed the bandwidth consumption by the
depth buffer access to be cut ~50% on BDW and SKL during depth clears.
On a more graphically intensive workload, the Shadowmapping Sascha
benchmark, I took the average of 3 runs on a BDW with a display
resolution of about 1920x1200 (minus some desktop environment
decorations). I measured a 22.61% FPS improvement when HiZ is disabled.

v2. The BSpec doesn't mandate this behavior, update comment accordingly.
    (Ken)

Fixes: bc4bb5a7e3 ("intel/blorp: Emit more complete DEPTH_STENCIL state")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d5fb9cccdc)
2019-10-29 08:15:45 -07:00
Nanley Chery
d8847c2f28 anv: Properly allocate aux-tracking space for CCS_E
add_aux_state_tracking_buffer() actually checks the aux usage when
determining how many dwords to allocate for state tracking. Move the
function call to the point after the CCS_E aux usage is assigned.

Fixes: de3be61801 ("anv/cmd_buffer: Rework aux tracking")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d0fcc2dd50)
2019-10-29 08:15:40 -07:00
Danylo Piliaiev
aa51bd5aa1 glsl: Initialize all fields of ir_variable in constructor
Better be safe, even if we could technically avoid this for
some fields.

Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1999
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Tested-by: Witold Baryluk <witold.baryluk@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 8818e0df74)
2019-10-28 08:32:19 -07:00
Tapani Pälli
5378782434 i965: setup sized internalformat for MESA_FORMAT_R10G10B10A2_UNORM
Commit d2b60e433e introduced restrictions (as per GLES spec) on the
internal format. We need to setup a sized format for the texture image
so framebuffers created with that are considered complete.

This change fixes following Android CTS test in AHardwareBufferNativeTests
category:

   SingleLayer_ColorTest_GpuColorOutputAndSampledImage_R10G10B10A2_UNORM

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: d2b60e433e ("mesa/main: R10G10B10_(A2) formats are not color renderable in ES")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 412badd059)
2019-10-28 08:32:12 -07:00
Dylan Baker
2936df10df bin/gen_release_notes.py: Add a warning if new features are introduced in a point release
Fixes: 86079447da
       ("scripts: Add a gen_release_notes.py script")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit 8a4541aae2)
2019-10-28 08:32:06 -07:00
Dylan Baker
1d86a89733 bin/gen_release_notes.py: html escape all external data
All of these (bug titles, patch titles, features, and people's names)
can contain characters that are not valid html. Just escape everything
for safety.

Fixes: 86079447da
       ("scripts: Add a gen_release_notes.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit b153785370)
2019-10-28 08:31:58 -07:00
Dylan Baker
05605ad196 bin/post_release.py: Add .html to hrefs
oops.

Fixes: 3226b12a09
       ("release: Add an update_release_calendar.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit 7e4b87f987)
2019-10-28 08:31:53 -07:00
Dylan Baker
a884f9d5e9 bin/post_version.py: white space fixes
Fixes: 3226b12a09
       ("release: Add an update_release_calendar.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit 5eef803625)
2019-10-28 08:31:46 -07:00
Dylan Baker
e9a1f47123 bin/post_version.py: Pass version as an argument
I made a bad assumption; I assumed this would be run in the release
branch. But we don't do that, we run in the master branch. As a result
we need to pass the version as an argument.

Fixes: 3226b12a09
       ("release: Add an update_release_calendar.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit abf9e7ac7b)
2019-10-28 08:31:41 -07:00
Dylan Baker
fa47f0f5cd bin/gen_release_notes.py: Return "None" if there are no new features
Which is very likely .Z > 0 releases.

Fixes: 86079447da
       ("scripts: Add a gen_release_notes.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit c6d41e7f0b)
2019-10-28 08:31:35 -07:00
Dylan Baker
701457466c bin/gen_release_notes.py: strip '#' from gitlab bugs
If they use the `Fixes: #1` form.

Fixes: 86079447da
       ("scripts: Add a gen_release_notes.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit df3d4ad82d)
2019-10-28 08:31:30 -07:00
Dylan Baker
58a37997a3 bin/gen_release_notes.py: fix conditional of bugfix
Previously this would result in the .0 warning be generated for .z > 0
and the .z == 0 would get the other message.

Fixes: 86079447da
       ("scripts: Add a gen_release_notes.py script")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit 69f540c017)
2019-10-28 08:31:24 -07:00
Illia Iorin
cab0e230ec Revert "mesa/main: Fix multisample texture initialize"
This reverts commit a113a42e73.

Per https://github.com/KhronosGroup/OpenGL-API/issues/45 it
was a wrong way to fix the issue.

Signed-off-by: Illia Iorin <illia.iorin@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 71d4ece366)
2019-10-28 08:31:19 -07:00
Jon Turney
87db4b9376 rbug: Fix use of alloca() without #include "c99_alloca.h"
[12/60] Compiling C object 'src/gallium/auxiliary/eb820e8@@gallium@sta/rbug_rbug_texture.c.o'.
FAILED: src/gallium/auxiliary/eb820e8@@gallium@sta/rbug_rbug_texture.c.o
[...]
../src/gallium/auxiliary/rbug/rbug_texture.c: In function 'rbug_send_texture_info_reply':
../src/gallium/auxiliary/rbug/rbug_texture.c:302:21: error: implicit declaration of function 'alloca'; did you mean 'malloc'? [-Werror=implicit-function-declaration]
  uint32_t *height = alloca(sizeof(uint32_t) * height_len);
                     ^~~~~~
                     malloc
../src/gallium/auxiliary/rbug/rbug_texture.c:302:21: warning: initialization makes pointer from integer without a cast [-Wint-conversion]
../src/gallium/auxiliary/rbug/rbug_texture.c:303:20: warning: initialization makes pointer from integer without a cast [-Wint-conversion]
  uint32_t *depth = alloca(sizeof(uint32_t) * height_len);
                    ^~~~~~
cc1: some warnings being treated as errors

Include c99_alloca.h to portably make the alloca() prototype available.

See also: 498d9d0f, adfb9c5c, fc8139b1

Fixes: 6174cba7 ("rbug: fix transmitted texture sizes")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit 2649609ac5)
2019-10-25 09:06:25 -07:00
Thomas Hellstrom
5f6f349bcf winsys/svga: Limit the maximum DMA hardware buffer size
The kernel total GMR/DMA size is limited, but it's definitely possible for the
kernel to allow a larger buffer allocation to succeed, but command
submission using that buffer as a GMR would fail typically causing an
application crash.

So have the winsys limit the size of GMR/DMA buffers. The pipe driver will
then resort to allocating smaller buffers and perform the DMA transfer in
multiple bands, also allowing for the pre-flush mechanism to kick in.

This avoids the related application crashes.

Fixes: e7843273fa ("winsys/svga: Update to vmwgfx kernel module 2.1")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 91146c0796)
2019-10-25 09:06:25 -07:00
Thomas Hellstrom
dacfad708f svga: Fix banded DMA upload unmap
Even with banded DMA uploads, st->hwbuf is always non-NULL, but when we've
allocated a software buffer to hold the full upload, unmapping of the
hardware buffer has already been done before
svga_texture_transfer_unmap_dma(), and the code was performing an unmap of
an already mapped buffer.

Fix this by testing for software buffer not present.

Fixes: a9c4a861d5 ("svga: refactor svga_texture_transfer_map/unmap functions")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 00db976905)
2019-10-25 09:06:25 -07:00
Marek Olšák
340849f311 util/u_queue: skip util_queue_finish if num_threads is 0
This fixes a deadlock in pthread_barrier_destroy.

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit c2efd2cbfb)
2019-10-25 09:06:25 -07:00
Samuel Pitoiset
e2b416e8ae radv: fix vkUpdateDescriptorSets with inline uniform blocks
descriptorCount is the number of bytes into the descriptor, so
it shouldn't be used as an index. srcArrayElement/dstArrayElement
specify the starting byte offset within the binding to copy from/to.

This fixes new CTS tests:
dEQP-VK.binding_model.descriptor_copy.*.inline_uniform_block_*
dEQP-VK.binding_model.descriptor_copy.*.mix_3
dEQP-VK.binding_model.descriptor_copy.*.mix_array1

Fixes: 8d2654a419 ("radv: Support VK_EXT_inline_uniform_block.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 7562a2cbe3)
2019-10-25 09:06:25 -07:00
Samuel Pitoiset
a352e1d2f7 radv/gfx10: fix 3D images
GFX10 does act like GFX9 actually.

This fixes
dEQP-VK.glsl.texture_functions.query.texturesize.*sampler3d_*.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 9c92a21fe5)
2019-10-25 09:06:25 -07:00
Samuel Pitoiset
e8d4a75d9a radv: do not emit rbplus if attachments are undefined
Fixes some crashes with dEQP-VK.geometry.layered.*.secondary_cmd_buffer
on Raven and other chips that allow rbplus.

This just prevents a crash and rbplus probaby needs more work.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 956d825ed8)
2019-10-25 09:06:25 -07:00
Samuel Pitoiset
2bd9eb83b4 radv: do not create meta pipelines with 16 samples
The driver only supports up to 8 samples, so it's useless to
create more pipelines than needed.

This fixes a conditional jump reported by Valgrind on GFX10:

==194282== Conditional jump or move depends on uninitialised value(s)
==194282==    at 0xDBF925A: radv_gfx10_compute_bin_size (radv_pipeline.c:3242)
==194282==    by 0xDBF95A6: radv_pipeline_generate_binning_state (radv_pipeline.c:3334)
==194282==    by 0xDBFC1A0: radv_pipeline_generate_pm4 (radv_pipeline.c:4440)
==194282==    by 0xDBFD15E: radv_pipeline_init (radv_pipeline.c:4764)
==194282==    by 0xDBFD23E: radv_graphics_pipeline_create (radv_pipeline.c:4788)
==194282==    by 0xDBB95A3: create_pipeline (radv_meta_clear.c:114)
==194282==    by 0xDBB9AC5: create_color_pipeline (radv_meta_clear.c:297)
==194282==    by 0xDBBCF05: radv_device_init_meta_clear_state (radv_meta_clear.c:1277)
==194282==    by 0xDB9ACD9: radv_device_init_meta (radv_meta.c:363)
==194282==    by 0xDB7FE3A: radv_CreateDevice (radv_device.c:2080

This is caused by an out of bound access of 'fmask_array' (ie. index
is 4 as for 16 samples).

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit f4ab58c1a0)
2019-10-25 09:06:25 -07:00
Lionel Landwerlin
f0cad4c038 anv: fix unwind of vkCreateDevice fail
We're skipping the context destruction in some cases which is the
grand scheme of thing is not that important because closing device->fd
will destroy the associated context as well.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Fixes: b30e01aef5 ("anv: fix memory leak on device destroy")
(cherry picked from commit 0dfa643feb)
2019-10-25 09:06:25 -07:00
Dylan Baker
c5f5ce1e37 Bump version for 19.2.2 release 2019-10-23 09:06:39 -07:00
Dylan Baker
db2797251d docs: Add release notes for 19.2.2 2019-10-23 09:06:39 -07:00
Samuel Pitoiset
56f0434232 radv: fix updating bound fast ds clear values with different aspects
On GFX9, the driver is able to do an optimized fast depth/stencil
clear with only one aspect (ie. clear the stencil part of a
depth/stencil image). When this happens, the driver should only
update the clear values of the given aspect.

Note that it's currently only supported on GFX9 but I have some
local patches that extend this optimized path for other gens.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1967
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit a13320370e)
2019-10-22 08:51:44 -07:00
Bas Nieuwenhuizen
ad56aaef4f radv: Fix single stage constant flush with merged shaders.
e.g. a VERTEX only flush with tess on Vega should look at the TCS
to see which bits are needed.

CC: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1953
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit fd21ee8b52)
Conflicts resolved by Dylan Baker

Conflicts:
	src/amd/vulkan/radv_cmd_buffer.c
2019-10-22 08:51:15 -07:00
Lepton Wu
35b900310b egl/android: Remove our own reference to buffers.
We currently doesn't maintain it correctly and the buffer gets leaked if
surface is destroyed before calling swapping buffers.

From Android frameworks/native/libs/nativewindow/include/system/window.h:

  The window holds a reference to the buffer between dequeueBuffer and
  either queueBuffer or cancelBuffer, so clients only need their own
  reference if they might use the buffer after queueing or canceling it.

v2: Remove our own reference.

Fixes: 0212db3504 ("egl/android: Cancel any outstanding ANativeBuffer in surface destructor")

Reviewed-by: Chia-I Wu <olvaffe@gmail.com> (v1)
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
(cherry picked from commit f4ba31ff50)
2019-10-21 08:59:25 -07:00
Lionel Landwerlin
cb0215a6fb anv: fix memory leak on device destroy
v2: handle vma destruction if vkCreateDevice fails (Jordan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1959
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit b30e01aef5)
2019-10-21 08:59:18 -07:00
Lionel Landwerlin
425fbe2902 anv: fix vkUpdateDescriptorSets with inline uniform blocks
With inline uniform blocks descriptor, the meaning of descriptorCount
is a number of bytes to copy into the descriptor. Don't try to use
that size as an index into the descriptor table.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 43f40dc7cb ("anv: Implement VK_EXT_inline_uniform_block")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1195
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 3f8f52b241)
2019-10-21 08:59:11 -07:00
Lucas Stach
62f9ba1bf2 rbug: unwrap index buffer resource
All resources passed to the drivers below rbug need to be unwrapped before
being passed down. We missed to do this for the index buffer resource when
this was made part of the draw_info structure.

Fixes: 330d0607ed (gallium: remove pipe_index_buffer and set_index_buffer)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
(cherry picked from commit a75eb888e0)
2019-10-18 09:56:11 -07:00
Lucas Stach
306e82acb6 rbug: fix transmitted texture sizes
The rbug wire format defines the texture size parameters to be uint32_t sized
and uses memcpy to move the function parameters to the message structure.
This caused totally wrong transmitted texture sizes since the height and depth
paramterds have been changed to uint16_t in the gallium API. Fix this by doing
an explicit conversion to the correct representation before packing into the
wire message.

Fixes: e6428092f5 (gallium: decrease the size of pipe_resource - 64 -> 48 bytes)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
(cherry picked from commit 6174cba748)
2019-10-18 09:56:07 -07:00
Ian Romanick
ef906b4636 intel/vec4: Don't try both sources as immediates for DPH
DPH isn't actually commutative, so this doesn't work.  If the immediate
in src0 would be a VF candidate, we could do better. *shrug*

No shader-db changes on any Intel platform.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b04beaf41d ("intel/vec4: Try both sources as candidates for being immediates")
(cherry picked from commit 92252219d3)
2019-10-18 09:56:03 -07:00
Ian Romanick
e958b35a40 nir/search: Fix possible NULL dereference in is_fsign
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 09705747d7 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")
(cherry picked from commit 050e4e28bf)
2019-10-18 09:55:58 -07:00
Roland Scheidegger
16af8e9772 gallivm: Fix saturated signed psub/padd intrinsics on llvm 8
LLVM 8 did remove both the signed and unsigned sse2/avx intrinsics in
the end, and provide arch-independent llvm intrinsics instead.
Fixes a crash when using snorm framebuffers (tested with piglit
arb_color_buffer_float-render GL_RGBA8_SNORM -auto).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 045f05a2f6)
Conflicts resolved by Dylan Baker
2019-10-17 09:29:01 -07:00
Samuel Pitoiset
01e31f8cab radv: fix DCC fast clear code for intensity formats (correctly)
Previous fix was pretty bogus.

This fixes a rendering regression with Nier (minimap too large).

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1943
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1952
Fixes: ea92273cea ("radv: fix DCC fast clear code for intensity formats")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit c644644c65)
2019-10-17 09:12:02 -07:00
Eric Engestrom
4d6fcddc65 util/u_atomic: fix return type of p_atomic_{inc,dec}_return() and p_atomic_{cmp,}xchg()
We're trying to cast the return type to the type of the var, but instead
we were casting `sizeof(*v)`.

Fixes: 6df72e970c ("util: Make u_atomic.h typeless.")
Fixes: 0a7f17cf5b ("util/u_atomic: add p_atomic_xchg")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit aaab70035a)
2019-10-17 09:11:21 -07:00
Pierre-Eric Pelloux-Prayer
5694c20188 mesa: fix invalid target error handling for teximage
This commit moves the target check before using _mesa_get_current_tex_object
to fix a "Mesa implementation error: bad target in _mesa_get_current_tex_object()"
error.

Fixes: 9dd1f7cec0 ("mesa: pass gl_texture_object as arg to not depend on state")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 16233797f4)
2019-10-17 09:11:13 -07:00
James Xiong
9a929cfef2 iris: finish aux import on get_param
A buffer and its aux are imported separately, if the aux import is
not completed yet when resource_get_param is called, merge the
separate aux a.k.a the 2nd image into the main image.

Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")

Signed-off-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit fd235484fe)
2019-10-17 09:11:06 -07:00
Lionel Landwerlin
ae4f569232 etnaviv: remove variable from global namespace
Found out by accident this was clashing with another driver.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 701e0ac077)
2019-10-17 09:11:00 -07:00
Alan Coopersmith
9d13289ad4 intel/common: include unistd.h for ioctl() prototype on Solaris
Fixes build errors of:
In file included from ../src/intel/vulkan/anv_private.h:48,
                 from ../src/intel/vulkan/genX_blorp_exec.c:26:
../src/intel/common/gen_gem.h: In function ‘gen_ioctl’:
../src/intel/common/gen_gem.h:68:15: error: implicit declaration of function ‘ioctl’ [-Werror=implicit-function-declaration]
   68 |         ret = ioctl(fd, request, arg);
      |               ^~~~~
In file included from ../include/c11/threads_posix.h:35,
                 from ../include/c11/threads.h:66,
                 from ../src/mesa/main/mtypes.h:39,
                 from ../src/intel/compiler/brw_compiler.h:30,
                 from ../src/intel/vulkan/anv_private.h:51,
                 from ../src/intel/vulkan/genX_blorp_exec.c:26:
/usr/include/unistd.h: At top level:
/usr/include/unistd.h:471:12: error: conflicting types for ‘ioctl’
  471 | extern int ioctl(int, int, ...);
      |            ^~~~~
/usr/include/unistd.h:471:1: note: a parameter list with an ellipsis can’t match an empty parameter name list declaration
  471 | extern int ioctl(int, int, ...);
      | ^~~~~~
In file included from ../src/intel/vulkan/anv_private.h:48,
                 from ../src/intel/vulkan/genX_blorp_exec.c:26:
../src/intel/common/gen_gem.h:68:15: note: previous implicit declaration of ‘ioctl’ was here
   68 |         ret = ioctl(fd, request, arg);
      |               ^~~~~

Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 6804b8e1ff)
2019-10-17 09:09:03 -07:00
Alan Coopersmith
9b49a4ea12 meson: recognize "sunos" as the system name for Solaris
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit d8a9420f6f)
Minor conflicts resolved by Dylan Baker
2019-10-17 09:08:50 -07:00
Alan Coopersmith
55a04df479 util: Solaris has linux-style pthread_setname_np
Fixes: dcf9d91a ("util: Handle differences in pthread_setname_np")

Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 7040795a69)
2019-10-17 09:08:10 -07:00
Alan Coopersmith
45aa00da9f util: Workaround lack of flock on Solaris
v2: Replace autoconf check for flock() with meson check

Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit b3028a9fb8)
Minor conflicts resolved by Dylan Baker
2019-10-17 09:07:40 -07:00
Alan Coopersmith
a1e6d1fb30 util: Make Solaris implemention of p_atomic_add work with gcc
gcc is very particular about where you place the (void) cast
The previous placement made it error out with:

In file included from disk_cache.c:40:0:
../../src/util/u_atomic.h:203:29: error: void value not ignored as it ought to be
 #define p_atomic_add(v, i) ((void)         \
                              ^
disk_cache.c:658:4: note: in expansion of macro ‘p_atomic_add’
    p_atomic_add(cache->size, size);
    ^

Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit a56c3e3a47)
2019-10-17 09:07:01 -07:00
Alan Coopersmith
2380173433 c99_compat.h: Don't try to use 'restrict' in C++ code
Fixes build failures on Solaris in C++ files using gcc:

../src/util/u_math.h:628:41: error: expected ‘,’ or ‘...’ before ‘dest’
  628 | util_memcpy_cpu_to_le32(void * restrict dest, const void * restrict src, size_t n)
      |                                         ^~~~
../src/util/u_math.h: In function ‘void* util_memcpy_cpu_to_le32(void*)’:
../src/util/u_math.h:641:18: error: ‘dest’ was not declared in this scope
  641 |    return memcpy(dest, src, n);
      |                  ^~~~
../src/util/u_math.h:641:24: error: ‘src’ was not declared in this scope
  641 |    return memcpy(dest, src, n);
      |                        ^~~
../src/util/u_math.h:641:29: error: ‘n’ was not declared in this scope; did you mean ‘yn’?
  641 |    return memcpy(dest, src, n);
      |                             ^
      |                             yn

Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit ddde652e70)
2019-10-17 09:07:00 -07:00
Samuel Pitoiset
45ebe99a88 Revert "radv: do not emit PKT3_CONTEXT_CONTROL with AMDGPU 3.6.0+"
This reverts commit 2ca8629fa9.

This was initially ported from RadeonSI, but in the meantime it has
been reverted because it might hang. Be conservative and re-introduce
this packet emission.

Unfortunately this doesn't fix anything known.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 4a3bdc6d22)
2019-10-15 10:05:50 -07:00
Lucas Stach
c1ca1602dd etnaviv: fix vertex buffer state emission for single stream GPUs
GPUs with a single supported vertex stream must use the single state
address to program the stream.

Fixes: 3d09bb390a (etnaviv: GC7000: State changes for HALTI3..5)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
(cherry picked from commit ce23bc9283)
2019-10-15 10:05:26 -07:00
Kenneth Graunke
91960ae890 iris: Implement the Gen < 9 tessellation quads workaround
Fixes several CTS tests:
- KHR-GL46.tessellation_shader.vertex.vertex_spacing
- KHR-GL46.tessellation_shader.tessellation_shader_point_mode.points_verification

Fixes: 823609b1a3 ("iris/WIP: add broadwell support")
(cherry picked from commit ac7af7c500)
2019-10-14 11:16:04 -07:00
Samuel Pitoiset
59e56bf05d radv: fix DCC fast clear code for intensity formats
This fixes a rendering issue with DiRT 4 on GFX10. Only GFX10 was
affected because intensity formats are different.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1923
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit ea92273cea)
2019-10-14 11:16:00 -07:00
Timothy Arceri
6f30614d73 glsl: fix crash compiling bindless samplers inside unnamed UBOs
The check to see if we were dealing with a buffer block was
too late and only worked for named UBOs.

Fixes: f32b01ca43 "glsl/linker: remove ubo explicit binding handling"

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1900
(cherry picked from commit 1294f01e06)
2019-10-14 11:15:56 -07:00
Bas Nieuwenhuizen
b87edab8a2 nir/dead_cf: Remove dead control flow after infinite loops.
And after discard-only loops. Otherwise we end up with dead code
which confuses nir_repair_ssa into adding a whole bunch of uses
of undefined. However, for derefs, we sometimes always expect to
get a variable instead of undefined.

Fixes dEQP-VK.graphicsfuzz.write-red-in-loop-nest on radv.

Fixes: c832820ce9 "nir/dead_cf: Repair SSA if the pass makes progress"
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1928
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
(cherry picked from commit 6da3bf2600)
2019-10-11 11:09:52 -07:00
Eric Engestrom
8355658fa8 meson: skip installation of GLVND-provided headers
Fixes: 93df862b6a ("meson: re-add incorrect pkg-config files with GLVND for backward compatibility")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1846
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 34ba363ab0)
2019-10-11 11:09:47 -07:00
Eric Engestrom
089aa74d57 meson: split Mesa headers as a separate installation
Fixes: 93df862b6a ("meson: re-add incorrect pkg-config files with GLVND for backward compatibility")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 1a7e9652c4)
2019-10-11 11:09:41 -07:00
Eric Engestrom
7f0d0ab83d meson: split headers one per line
Fixes: 93df862b6a ("meson: re-add incorrect pkg-config files with GLVND for backward compatibility")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit daae003f47)
2019-10-11 11:09:37 -07:00
Eric Engestrom
7b70f2ec47 meson: move a couple of include installs around
Preparation for a later commit.

Fixes: 93df862b6a ("meson: re-add incorrect pkg-config files with GLVND for backward compatibility")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit b9a5fb1f05)
2019-10-11 11:09:32 -07:00
Eric Engestrom
f446e56d30 meson: rename glvnd_missing_pc_files to not glvnd_has_headers_and_pc_files
This reflects better what is provided by glvnd or not.

Fixes: 93df862b6a ("meson: re-add incorrect pkg-config files with GLVND for backward compatibility")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit b57fa7ca49)
2019-10-11 11:09:26 -07:00
Eric Engestrom
767965b6fa GL: drop symbols mangling support
SCons and Meson have never supported that feature, and Autotools was
deleted over 6 months ago and no-one complained yet, so it's pretty
obvious nobody cares about it.

Fixes: 95aefc94a9 ("Delete autotools")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit a0829cf23b)
2019-10-11 11:09:22 -07:00
Bas Nieuwenhuizen
3deb4fa226 radv: Disallow sparse shared images.
Since we really cannot share them ever.

Also remove an unused switch.

Fixes: b70829708a "radv: Implement VK_KHR_external_memory"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 53b1372571)
2019-10-11 11:09:16 -07:00
Connor Abbott
0056943e69 nir/sink: Don't sink load_ubo to outside of its defining loop
Previously, this could have made the resource divergent in code like
that which is genereated by nir_lower_non_uniform_access.

Fixes: da8ed68a ('nir: replace nir_move_load_const() with nir_opt_sink()')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 5ac32b2954)
2019-10-10 09:09:35 -07:00
Connor Abbott
d14d70de2f nir/sink: Rewrite loop handling logic
Previously, for code like:
loop {
    loop {
        a = load_ubo()
    }
    use(a)
}
adjust_block_for_loops() would return the block before the first loop.
Now we compute the range of allowed blocks and then walk the dominance
tree directly, guaranteeing directly that we always choose a block that
dominates all the uses and is dominated by the definition.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit af9296b8c0)
2019-10-10 09:09:27 -07:00
Alejandro Piñeiro
0beee2f723 v3d: take into account prim_counts_offset
Specifically when reading the primitive counters.

This fixed ~700 CTS tests using this pattern:
dEQP-GLES3.functional.transform_feedback.*

when run after tests like
dEQP-GLES3.functional.prerequisite.read_pixels on the same
caselist. When run individually those tests were passing because
prim_counts_offset was zero.

Fixes: 0f2d1dfe65 ("v3d: use the GPU to
       record primitives written to transform feedback")

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit fa41a51891)
2019-10-10 09:05:34 -07:00
Samuel Pitoiset
c48dc6ad5f radv: bump minTexelBufferOffsetAlignment to 4
The spec has probably been misinterpreted during RADV bringup.

This fixes GPU hangs with dEQP-VK.binding_model.*offset_nonzero*.

Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 030e67fac3)
2019-10-10 09:04:55 -07:00
Samuel Pitoiset
03df69d6a1 drirc: enable vk_x11_override_min_image_count for DOOM
DOOM fails to handle more images than expected when the adaptative
sync mode is enabled.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1902
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit ad96c4987c)
2019-10-10 09:04:55 -07:00
Clément Guérin
47bc45ba1a radeonsi: enable zerovram for Rocket League
Fixes corruption on game startup.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1888

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit 5afbe87d21)
2019-10-10 09:04:55 -07:00
Kenneth Graunke
aa89c0a2bd iris: Properly unreference extra VBOs for draw parameters
bound_vertex_buffers doesn't include extra draw parameters buffers.
Tracking this correctly is kind of complicated, and iris_destroy_state
isn't exactly in a hot path, so just loop over all VBO bindings.

Fixes: 4122665dd9 (iris: Enable ARB_shader_draw_parameters support)
Reported-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
(cherry picked from commit face221283)
2019-10-10 09:04:55 -07:00
Dylan Baker
3e15620451 docs: Add SHA256 sum for 19.2.1 2019-10-09 10:19:16 -07:00
Dylan Baker
877417918f Bump version for 19.2.1 2019-10-09 09:47:07 -07:00
Dylan Baker
e19dc53aae docs: Add relnotes for 19.2.1 2019-10-09 09:47:07 -07:00
Dylan Baker
a06f8341d8 bin: delete unused releasing scripts
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit 974e3ad004)
2019-10-09 09:46:04 -07:00
Dylan Baker
4f38287970 release: Add an update_release_calendar.py script
This script is for updating post version bump.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit 3226b12a09)
2019-10-09 09:46:04 -07:00
Dylan Baker
a4348e9594 scripts: Mesa 19.2.x only implements GL 4.5 2019-10-09 09:46:04 -07:00
Dylan Baker
61b3371acc scripts: Add a gen_release_notes.py script
This script is responsible for generating an entire page in the
docs/relnotes/ directory. It includes a template for the page, and uses
mako to fill in the necessary bits. It is designed to be purely fire and
forget, calculating previous versions, shortlogs, bug fixes, and dates.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
(cherry picked from commit 86079447da)
2019-10-09 09:15:28 -07:00
Eric Engestrom
5a0361ce09 meson: add missing idep_nir_headers in iris_gen_libs
Fixes: 4929f020c3 ("iris: better SBE")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 731097c747)
2019-10-08 09:18:31 -07:00
Tapani Pälli
06b8b29a0a anv/android: fix images created with external format support
This fixes a case where user first creates image and then later binds it
with memory created from AHW buffer.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit e4a826b2c8)
2019-10-08 09:18:22 -07:00
Prodea Alexandru-Liviu
9052318565 scons/MSYS2-MinGW-W64: Fix build options defaults
Signed-off-by: Prodea Alexandru-Liviu <liviuprodea@yahoo.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Cc: <mesa-stable@lists.freedesktop.org>

When building in a MSYS2 Mingw-w64 environment Mesa3D sets wrong default build options which inevitably lead to build failure.

(cherry picked from commit 6309c31fd8)
2019-10-07 10:49:04 -07:00
Lionel Landwerlin
85193e808a intel/isl: Set null surface format to R32_UINT
It appears we never had a test in piglit or deqp sampling from a null
surface...

It turns out this triggers a hang on IVB only. Updating the null
surface format to R32_UINT fixes the hang on ivb and doesn't affect
other platforms, so set it by default for all platforms.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1872
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit c445d6f66e)
2019-10-07 10:48:14 -07:00
Lionel Landwerlin
34d738ff2e intel: fix subslice computation from topology data
We're missing the offset of the slice in the subslice mask...

This worked for most platforms that don't have first slice fused off
because we would reread the same mask from slice0 again and again...

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c1900f5b0f ("intel: devinfo: add helper functions to fill fusing masks values")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1869
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit d36763b2a4)
2019-10-07 10:48:14 -07:00
Andres Gomez
c289ac9a22 egl: Remove the 565 pbuffer-only EGL config under X11.
The CTS finally has agreed to drop the requirement for a
565-no-depth-no-stencil config for ES 3.0. Hence we can now remove the
code to satisfy this requirement using a pbuffer-only visual with
whatever other buffers the driver happens to have given us.

This reverts commit 82607f8a90,
commit 6ad31c4ff3 and
commit dacb11a585.

v2:
  - Reference the VK-GL-CTS issue (Eric E.).

v3:
  - Don't revert
    fc21394bc4 ("egl: Quiet warning about front buffer rendering for pixmaps/pbuffers")
    (Kenneth).

References: VK-GL-CTS issue 1601.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Andres Gomez <agomez@igalia.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 02c265be9d)
Conflicts resolved by Dylan Baker
2019-10-07 10:48:14 -07:00
Dylan Baker
bad5e64da8 meson: Only error building gallium video without libdrm when the platform is drm
Fixes: 3b265f61f5
       ("meson: gallium media state trackers require libdrm with x11")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1878
Tested-by: Vinson Lee <vlee@freedesktop.org>
(cherry picked from commit 1481d05409)
2019-10-07 10:48:14 -07:00
Dylan Baker
52cf623955 .cherry-ignore: Update for 19.2.1 cycle 2019-10-07 10:48:14 -07:00
Bas Nieuwenhuizen
78a05b8cbb radv: Fix condition for skipping the continue CS.
We need the continue CS for referencing the tess/GDS/sample position BOs.

Fixes: 46e52df34d "radv: add tessellation ring allocation support. (v2)"
Fixes: e1dc3ab753 "radv/gfx10: allocate GDS/OA buffer objects for NGG streamout"
Fixes: 1171b304f3 "radv: overhaul fragment shader sample positions."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 8ad3d8b178)
2019-10-04 15:54:22 +02:00
Lionel Landwerlin
676471a092 intel: fix topology query
i915 will report ENODEV on generations prior to Haswell because there
is no point in reporting values on those. This is prior any fusing
could happen on parts with identical PCI ids.

This query call was previously only triggered on generations that
support performance queries, which happens to match generation for
which i915 reports topology, but the commit pointed below started
using it on all generations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1860
Cc: <mesa-stable@lists.freedesktop.org>
Fixes: 96e1c945f2 ("i965: Move device info initialization to common code")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
(cherry picked from commit 1c6fdbc83c)
2019-10-03 09:56:36 -07:00
Marek Olšák
0b97377f58 radeonsi/gfx10: fix corruption for chips with harvested TCCs
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 235ebe9163)
2019-10-01 15:29:22 -07:00
Marek Olšák
33eecbcc9b ac: add radeon_info::tcc_harvested
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 8cbe83445b)
Conflicts resolved by Dylan Baker

Conflicts:
	src/amd/common/ac_gpu_info.h
2019-10-01 15:28:53 -07:00
Lionel Landwerlin
2bbe4c69c8 mesa: don't forget to clear _Layer field on texture unit
On the Android Antutu benchmark we ran into an assert in ISL where the
(base layer + num layers) > total layers. It turns out the core of
mesa forgot to clear the _Layer variable, potentially leaving an
inconsistent value.

v2: Pull setting u->_Layer out of the conditional blocks (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 2208d79dde)
2019-10-01 12:30:07 -07:00
Ken Mays
db56bc2c52 haiku: fix Mesa build
1. The hgl.c file is a read-only file versus read-write.
Ref: src/gallium/state_trackers/hgl/hgl.c

2.  I've included the Haiku-specific patches I used to get a successful
build of Mesa 19.1.7 on Haiku using the meson/ninja build procedure.
Shows "[764/764] linking target ... libswpipe.so" at build completion.

v2:
Remove autotools files (Eric)

v3:
Update the patch

Reported-by: Ken Mays <kmays2000@gmail.com>
Tested-by: Ken Mays <kmays2000@gmail.com>
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Alexander von Gluck IV <kallisti5@unixzen.com>
(cherry picked from commit 4943c89d6d)
2019-10-01 12:30:02 -07:00
Kenneth Graunke
4169d86913 iris: Fix iris_rebind_buffer() for VBOs with non-zero offsets.
We can't just check for the BO base address, we need to check for the
full address including any offset we may have applied.  When updating
the address, we need to include the offset again.

Fixes: 5ad0c88dbe ("iris: Replace buffer backing storage and rebind to update addresses.")
(cherry picked from commit 309924c3c9)
2019-10-01 12:29:58 -07:00
Dylan Baker
680e18c159 meson: gallium media state trackers require libdrm with x11
v2: - update copyright year in all changed files
    - rebase on master

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 3b265f61f5)
2019-10-01 12:29:54 -07:00
Kenneth Graunke
d4dab05a09 iris: Disable CCS_E for 32-bit floating point textures.
A while back, Michael Larabel noticed that Paraview's Wavelet Volume
case runs significantly slower on iris than i965.  It turns out this
is because we enable CCS_E for 32-bit floating point formats, while
i965 disables it, with an oblique comment saying that we benchmarked
it (on what exactly?) and determined that it was a loss.

Paraview uses both R32_FLOAT and R32G32B32A32_FLOAT, and I observed
large framerate drops when enabling CCS_E for either format.  However,
several other benchmarks (Aztec Ruins, many Synmark cases) use 16-bit
floating point formats, with no apparent ill effects.

So, disable compression for 32-bit float formats for now, but leave it
enabled for 16-bit float formats as they seem to be working fine.

Improves performance in Paraview's Wavelet Volume test by 62% on a
Skylake GT4e.

Fixes: 3cfc6a207b ("iris: Fill out res->aux.possible_usages")
(cherry picked from commit a0a93763fb)
2019-10-01 12:29:51 -07:00
Marek Olšák
f5cccfe0e6 ac: fix num_good_cu_per_sh for harvested chips
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit b7c2f7c5a6)
2019-10-01 12:29:45 -07:00
Marek Olšák
0a2285b1d4 ac: fix incorrect vram_size reported by the kernel
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 7d97013294)
2019-10-01 12:29:15 -07:00
Marek Olšák
8f95245068 radeonsi/gfx10: fix L2 cache rinse programming
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 3c0938bece)
2019-10-01 12:29:11 -07:00
pal1000
a413b55157 scons/windows: Support build with LLVM 9.
As X86AsmPrinter component is gone, LLVMX86AsmPrinter got replaced
with LLVMRemarks, LLVMBitstreamReader and LLVMDebugInfoDWARF.

Tests done with llvm-config on both LLVM 8 and 9 indicate that
mcjit, bitwriter and x86asmprinter fully fit inside engine component.

On other platforms and with meson build mcdisassembler was used to replace
X86AsmPrinter but mcdisassembler also fully fits inside engine component
for LLVM>=8 according to same tests.

v2: Avoid duplicating code related to Mingw pthreads.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>

On 19.1 this patch does not apply cleanly without 88eb2a1f

(cherry picked from commit bcb4dfb14b)
2019-09-30 09:10:21 -07:00
Michel Zou
5a027b6201 scons: add py3 support
SCons 3.1 has moved to python 3, requiring this fix
to continue supporting scons builds.

Closes: #944
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 3f92d17894)
2019-09-30 09:10:16 -07:00
Mauro Rossi
769a18d1f3 android: compiler/nir: build nir_divergence_analysis.c
Prerequisite to avoid following radv linking error happening with aco

FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/vulkan.radv_intermediates/LINKED/vulkan.radv.so
...
external/mesa/src/amd/compiler/aco_instruction_selection_setup.cpp:178:
error: undefined reference to 'nir_divergence_analysis'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)

Fixes: df86c5f ("nir: add divergence analysis pass.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
(cherry picked from commit 268fb10e9c)
2019-09-30 09:10:11 -07:00
Andrii Simiklit
cb2649768f glsl: disallow incompatible matrices multiplication
glsl 4.4 spec section '5.9 expressions':
"The operator is multiply (*), where both operands are matrices or one operand is a vector and the
 other a matrix. A right vector operand is treated as a column vector and a left vector operand as a
 row vector. In all these cases, it is required that the number of columns of the left operand is equal
 to the number of rows of the right operand. Then, the multiply (*) operation does a linear
 algebraic multiply, yielding an object that has the same number of rows as the left operand and the
 same number of columns as the right operand. Section 5.10 “Vector and Matrix Operations”
 explains in more detail how vectors and matrices are operated on."

This fix disallows a multiplication of incompatible matrices like:
mat4x3(..) * mat4x3(..)
mat4x2(..) * mat4x2(..)
mat3x2(..) * mat3x2(..)
....

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111664
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
(cherry picked from commit b32bb888c7)
2019-09-30 09:10:07 -07:00
Jason Ekstrand
e6edeebd15 intel/fs: Fix fs_inst::flags_read for ANY/ALL predicates
Without this, we were DCEing flag writes because we didn't think their
results were used because we didn't understand that an ANY32 predicate
actually read all the flags.

Fixes: df1aec763e "i965/fs: Define methods to calculate the flag..."
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 6c858b9a91)
2019-09-30 09:10:02 -07:00
Dylan Baker
2dbf10ba3d meson: Link xvmc with libxv
Prior to xvmc 1.0.12 libxvmc incorrectly required libxv, but that was
fixed. This results in compilation failures for the gallium xvmc tracker
and tools. This patch fixes that by explicitly linking to libxv.

Fixes: 22a817af8a
       ("meson: build gallium xvmc state tracker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1844
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit e456a053c3)
2019-09-30 09:09:57 -07:00
Dylan Baker
daeb959c91 meson: Try finding libxvmcw via pkg-config before using find_library
This fixes cross compiling issues, because pkg-config is less likely to
get the wrong libs.

v2: - Fix typo in comment

Fixes: 22a817af8a
       ("meson: build gallium xvmc state tracker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/939
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit 8c5c21d7e3)
2019-09-30 09:09:52 -07:00
Andreas Gottschling
db1ed17ac4 drisw: Fix shared memory leak on drawable resize
XDestroyImage will mark the segment as to-be-destroyed, but it will
persist until we detach it, and we weren't doing so.

Cc: mesa-stable@lists.freedesktop.org
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/121
Reviewed-by: Adam Jackson <ajax@redhat.com>
(cherry picked from commit c5a2ccec5e)
2019-09-30 09:09:47 -07:00
pal1000
dc0995669d scons: Fix MSYS2 Mingw-w64 build.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

This patch is based on 28e3f85e09/mingw-w64-mesa/link-ole32.patch but with tweaks to avoid MSVC build break when applied.

v2: Create Mingw platform alias pointing to windows host platform define to avoid spurious crosscompilation;

v3: Fix obviously wrong compiler flags for swr driver;

v4: Update original patch URL because it has been relocated;

v5: Don't bother patching autools stuff as it's not used by MSYS2 Mingw-w64 build and it's days are numbered anyway;

v6: After Mingw posix flag fix in 295851eb things are far simpler as we don't need more linking of uuid, ole32, version and shell32 than what is already in place.
(cherry picked from commit ffb0d3a25c)
2019-09-30 09:09:00 -07:00
Michel Dänzer
434ab094c0 radeonsi: fix VAAPI segfault due to various bugs
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111236
(cherry picked from commit 67d930d64b)
2019-09-26 14:40:54 -07:00
Tapani Pälli
e35a7a0238 iris: disable aux on first get_param if not created with aux
This moves the fix from commit 361f3d19f1 to happen in get_param
(used now instead of get_handle by st/dri). This fixes artifacts
seen with Xorg and CCS_E.

Fixes: fc12fd05f5 "iris: Implement pipe_screen::resource_get_param"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit f4d9169204)
2019-09-26 08:47:50 -07:00
Marek Olšák
95d87a897b gallium: extend resource_get_param to be as capable as resource_get_handle
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d307aa56f9)
2019-09-26 08:47:50 -07:00
Lionel Landwerlin
c4b70fef71 intel: use proper label for Comet Lake skus
Fixes: 82f6a746e8 ("intel: Add support for Comet Lake")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 813f3460e7)
2019-09-26 08:47:50 -07:00
Ian Romanick
bc6cc94d5a nir/range-analysis: Bail if the types don't match
Some shaders are hurt by this change because now a
load_const(0x00000000) is not recognized as eq_zero when loaded as a
float.  This behavior is restored in a later patch (nir/range-analysis:
Use types to provide better ranges from bcsel and mov).

v2: Add a comment about reinterpretation of int/uint/bool.  Suggested by
Caio.  Rewrite condition the check for types being float versus checking
for types not being all the things that aren't float.

Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16327543 -> 16328255 (<.01%)
instructions in affected programs: 55928 -> 56640 (1.27%)
helped: 0
HURT: 208
HURT stats (abs)   min: 1 max: 16 x̄: 3.42 x̃: 3
HURT stats (rel)   min: 0.33% max: 6.74% x̄: 1.31% x̃: 1.12%
95% mean confidence interval for instructions value: 3.06 3.79
95% mean confidence interval for instructions %-change: 1.17% 1.46%
Instructions are HURT.

total cycles in shared programs: 363682759 -> 363683977 (<.01%)
cycles in affected programs: 325758 -> 326976 (0.37%)
helped: 44
HURT: 133
helped stats (abs) min: 1 max: 179 x̄: 33.61 x̃: 5
helped stats (rel) min: 0.06% max: 14.21% x̄: 2.47% x̃: 0.29%
HURT stats (abs)   min: 1 max: 157 x̄: 20.28 x̃: 14
HURT stats (rel)   min: 0.07% max: 14.44% x̄: 1.42% x̃: 0.73%
95% mean confidence interval for cycles value: 0.38 13.39
95% mean confidence interval for cycles %-change: -0.06% 0.96%
Inconclusive result (%-change mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10787433 -> 10787443 (<.01%)
instructions in affected programs: 1842 -> 1852 (0.54%)
helped: 0
HURT: 10
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.33% max: 1.85% x̄: 0.73% x̃: 0.49%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.36% 1.10%
Instructions are HURT.

total cycles in shared programs: 153724543 -> 153724563 (<.01%)
cycles in affected programs: 8407 -> 8427 (0.24%)
helped: 1
HURT: 3
helped stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18
helped stats (rel) min: 0.98% max: 0.98% x̄: 0.98% x̃: 0.98%
HURT stats (abs)   min: 4 max: 18 x̄: 12.67 x̃: 16
HURT stats (rel)   min: 0.21% max: 0.75% x̄: 0.56% x̃: 0.72%
95% mean confidence interval for cycles value: -21.31 31.31
95% mean confidence interval for cycles %-change: -1.11% 1.46%
Inconclusive result (value mean confidence interval includes 0).

No shader-db changes on Iron Lake or GM45.

(cherry picked from commit 018d2b524a)
2019-09-26 08:47:50 -07:00
Lionel Landwerlin
6273d4d4ed anv: gem-stubs: return a valid fd got anv_gem_userptr()
Fixes invalid close(-1) in the unit tests.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit da2d67fc3b)
2019-09-26 08:47:50 -07:00
Danylo Piliaiev
960ab3e465 st/nine: Ignore D3DSIO_RET if it is the last instruction in a shader
RET as a last instruction could be safely ignored.
Remove it to prevent crashes/warnings in case underlying driver
doesn't implement arbitrary returns.

A better way would be to remove the RET after the whole shader
is parsed which will handle a possible case when the last RET is
followed by a comment.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
(cherry picked from commit 2d8f77db83)
2019-09-26 08:47:50 -07:00
Dylan Baker
41b57b8b73 meson: fix logic for generating .pc files with old glvnd
We want to generate PC files for non-glvnd builds and for builds with
old glvnd, but the current logic doesn't do that, it builds them
unconditionally, and for GLES it builds the shared libraries, which is
also not what we want. This does not generate .pc files for gles1 or
gles2. Which it we weren't doing before either, making this not a
regression but a return to status-quo.o

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1838
Fixes: 93df862b6a
       ("meson: re-add incorrect pkg-config files with GLVND for backward compatibility")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit fafd20f67d)
2019-09-26 08:47:50 -07:00
Erik Faye-Lund
109137ee7b glsl: correct bitcast-helpers
Without this, we'll incorrectly round off huge values to the nearest
representable double instead of keeping it at the exact value  as
we're supposed to.

Found by inspecting compiler-warnings.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 85faf5082f ("glsl: Add 64-bit integer support for constant expressions")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 88f909eb37)
2019-09-26 08:47:50 -07:00
Rhys Perry
f4faf5cbd7 nir/opt_remove_phis: handle phis with no sources
This can happen with loops with unreachable exits which are later
optimized away.

Fixes assertion in dEQP-VK.graphicsfuzz.unreachable-loops with RADV.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 12372d60ff)
2019-09-26 08:47:50 -07:00
Marek Olšák
b29682c290 gallium/vl: don't set PIPE_HANDLE_USAGE_EXPLICIT_FLUSH
because vl doesn't call flush_resource and I wasn't able to find
all places where flush_resource needs to be called.

This fixes corrupted / unflushed surfaces with fullscreen videos on Raven.

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit f52afdf672)
2019-09-26 08:47:50 -07:00
Connor Abbott
1b67e47c0c nir/opt_large_constants: Handle store writemasks
This fixes some piglit tests on radeonsi NIR where a varying is
initialized to a constant array in the vertex shader. Varying packing
after nir_lower_io_to_temporaries creates writemasked stores which
persist after pulling the constant initialization down into the fragment
shader.

While we're here, rewrite handle_constant_store() to do the loop over
components outside the switch, so that we don't have to duplicate the
writemask checking for every bitsize.

Fixes: 1235850522 ("nir: Add a large constants optimization pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit 270fe55256)
2019-09-26 08:47:50 -07:00
Eric Engestrom
0c56cb50c7 meson: drop -Wno-foo bug workaround for Meson < 0.46
This was a workaround for a bug in Meson that was fixed in 0.46 [1].

[1] https://github.com/mesonbuild/meson/pull/2284

Fixes: f7b6a8d12f ("meson: bump required version to 0.46")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 3fd0afd5e3)
2019-09-26 08:47:50 -07:00
Eric Engestrom
40d592473e radv: fix s/load/store/ copy-paste typo
Fixes: cdc6efddf9 ("radv: implement all depth/stencil resolve modes using graphics")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 30f639c181)
2019-09-26 08:47:50 -07:00
Stephen Barber
2947b89369 nouveau: add idep_nir_headers as dep for libnouveau
Fixes a compilation error when building libnouveau:

In file included from ../src/gallium/drivers/nouveau/nv50/nv50_program.c:25:
../src/compiler/nir/nir.h:1115:10: fatal error: nir_intrinsics.h: No such file or directory
 #include "nir_intrinsics.h"
           ^~~~~~~~~~~~~~~~~~
           compilation terminated.

Fixes: f014ae3c7c ("nouveau: add support for nir")
Signed-off-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
(cherry picked from commit 8c3ace6991)
2019-09-26 08:47:50 -07:00
Dylan Baker
0c47b502c2 Bump version for 19.2.0 final 2019-09-25 09:56:31 -07:00
Dylan Baker
8ea440fa3a docs: Add release notes for 19.2.0 2019-09-25 09:55:33 -07:00
Eric Engestrom
e59e9cd58a meson: re-add incorrect pkg-config files with GLVND for backward compatibility
This is a bit counter-intuitive, but the issue is that GLVND is broken
in versions <= 1.1.1, so we need to keep wrongly providing these files
to cover up their mistake, otherwise the rest of the world ends up
broken.

Suggested-by: Dylan Baker <dylan@pnwbakers.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 93df862b6a)
2019-09-25 09:30:31 -07:00
Mauro Rossi
0d781fe4b8 android: anv: libmesa_vulkan_common: add libmesa_util static dependency
Change needed to fix the following building error:

In file included from external/mesa/src/intel/vulkan/anv_device.c:43:
external/mesa/src/util/xmlpool.h:115:10: fatal error: 'xmlpool/options.h' file not found
         ^~~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: 4dcb1ff ("anv: add support for driconf")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit ae5ac26dfa)
2019-09-24 08:25:33 -07:00
Mauro Rossi
ba8b282ae4 android: mesa: revert "Enable asm unconditionally"
This patch partially reverts 20294dc ("mesa: Enable asm unconditionally, ...")

Android makefile build logic needs to disable assembler optimization
in 32bit builds to avoid text relocations for libglapi.so shared

Fixes the following build error with Android x86 32bit target:

[  0% 4/477] target SharedLib: libglapi (out/target/product/x86/obj/SHARED_LIBRARIES/libglapi_intermediates/LINKED/libglapi.so)
FAILED: out/target/product/x86/obj/SHARED_LIBRARIES/libglapi_intermediates/LINKED/libglapi.so
...
prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/x86_64-linux-android/bin/ld: warning: shared library text segment is not shareable
prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/x86_64-linux-android/bin/ld: error: treating warnings as errors
clang-6.0: error: linker command failed with exit code 1 (use -v to see invocation)

Fixes: 20294dc ("mesa: Enable asm unconditionally, now that gen_matypes is gone.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 7a6e7803a7)
2019-09-24 08:25:27 -07:00
Erik Faye-Lund
f7e25ae6d6 util: fix SSE-version needed for double opcodes
This code generates CVTSD2SI, which requires SSE2. So let's fix the
required SSE-version.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 5de29ae (util: try to use SSE instructions with MSVC and 32-bit gcc)
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit 2ade1c5cf7)
2019-09-24 08:25:01 -07:00
Bas Nieuwenhuizen
5037ffb646 radv: Add workaround for hang in The Surge 2.
Released today and hangs on RADV. We don't have the root cause yet,
but this should unblock people playing the game.

No drirc because the radv debugflags are not usable from drirc and
I want this backported.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 780182f0a0)
2019-09-24 08:24:22 -07:00
Juan A. Suarez Romero
c90dc1232d bin/get-pick-list.sh: sha1 commits can be smaller than 8 chars
The script only handles commits with "Fixes: <sha1>" where <sha1> is
equal or great than 8 chars. But <sha1> can be smaller, like 7 chars.

This commit relax the restriction to handle <sha1> 4 or more chars.

Fixes: 533fead423 ("bin/get-pick-list.sh: tweak the commit sha matching pattern")

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit b3c25e6f99)
2019-09-24 08:24:17 -07:00
Kenneth Graunke
a1b6212710 intel: Increase Gen11 compute shader scratch IDs to 64.
From the MEDIA_VFE_STATE docs:

   "Starting with this configuration, the Maximum Number of Threads must
    be set to (#EU * 8) for GPGPU dispatches.

    Although there are only 7 threads per EU in the configuration, the
    FFTID is calculated as if there are 8 threads per EU, which in turn
    requires a larger amount of Scratch Space to be allocated by the
    driver."

It's pretty clear that we need to increase this for scratch address
calculations, because the FFTID has a certain bit-pattern.  The quote
above seems to indicate that we should increase the actual thread count
programmed in MEDIA_VFE_STATE as well, but we think the intention is to
only bump the scratch space.

Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8.

Fixes: 5ac804bd9a ("intel: Add a preliminary device for Ice Lake")
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit b9e93db208)
2019-09-24 08:23:56 -07:00
Marek Olšák
9f94cd7140 ac/addrlib: fix chip identification for Vega10, Arcturus, Raven2, Renoir
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit 48742de601)
2019-09-24 08:23:50 -07:00
Marek Olšák
5e289248a4 amd: add more PCI IDs for Navi14
trivial and urgent

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 65b698136c)
2019-09-24 08:23:45 -07:00
Jason Ekstrand
dc7129777f nir/repair_ssa: Replace the unreachable check with the phi builder
In a3268599f3, I attempted to fix nir_repair_ssa for unreachable
blocks.  However, that commit missed the possibility that the use is in
a block which, itself, is unreachable.  In this case, we can end up in
an infinite loop trying to replace a def with itself.  Even though a
no-op replacement is a fine operation, it keeps extending the end of the
uses list as we're walking it.  Instead of explicitly checking for the
group of conditions, just check if the phi builder gives us a different
def.  That's guaranteed to be 100% reliable and, while it lacks symmetry
with the is_valid checks, should be more reliable.

Fixes: a3268599 "nir/repair_ssa: Repair dominance for unreachable..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit d63162cff0)
2019-09-23 11:11:59 -07:00
Hal Gentz
6485fd8362 gallium/osmesa: Fix the inability to set no context as current.
Currently there is no way to make no context current w/gallium + osmesa.
The non-gallium version of osmesa does this if the context and buffer
passed to `OSMesaMakeCurrent` are both null. This small change makes it
so that this is also the case with the gallium version.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 57c894334e)
2019-09-23 11:11:59 -07:00
Ian Romanick
2d30c0fc59 nir/algebraic: Do not apply late DPH optimization in vertex processing stages
Some shaders do not use 'invariant' in vertex and (possibly) geometry
shader stages on some outputs that are intended to be invariant.  For
various reasons, this optimization may not be fully applied in all
shaders used for different rendering passes of the same geometry.  This
can result in Z-fighting artifacts (at best).  For now, disable this
optimization in these stages.

In tessellation stages applications seem to use 'precise' when
necessary, so allow the optimization in those stages.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490
Fixes: 09705747d7 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16194726 -> 16344745 (0.93%)
instructions in affected programs: 2855172 -> 3005191 (5.25%)
helped: 6
HURT: 20279
helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1
helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44%
HURT stats (abs)   min: 1 max: 32 x̄: 7.40 x̃: 7
HURT stats (rel)   min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56%
95% mean confidence interval for instructions value: 7.34 7.45
95% mean confidence interval for instructions %-change: 8.48% 8.67%
Instructions are HURT.

total cycles in shared programs: 364471296 -> 365014683 (0.15%)
cycles in affected programs: 32421530 -> 32964917 (1.68%)
helped: 2925
HURT: 16144
helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5
helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15%
HURT stats (abs)   min: 1 max: 18471 x̄: 36.99 x̃: 15
HURT stats (rel)   min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87%
95% mean confidence interval for cycles value: 21.58 35.41
95% mean confidence interval for cycles %-change: 4.36% 4.52%
Cycles are HURT.

(cherry picked from commit 92f70df8c3)
2019-09-23 11:11:59 -07:00
Adam Jackson
ac9ef75a25 docs: Update bug report URLs for the gitlab migration
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 5b5c5bf833)
2019-09-23 11:11:59 -07:00
Andres Gomez
75836b70c7 docs: Add the maximum implemented Vulkan API version in 19.2 rel notes
Currently, Vulkan 1.1.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
(cherry picked from commit 41b0e0d7e0)
2019-09-23 11:11:59 -07:00
Arcady Goldmints-Orlov
0b54e1d176 anv: fix descriptor limits on gen8
Later generations support bindless for samplers, images, and buffers and
thus per-stage descriptors are not limited by the binding table size.
However, gen8 doesn't support bindless images and thus needs to report a
lower per-stage limit so that all combinations of descriptors that fit
within the advertised limits are reported as supported by
vkGetDescriptorSetLayoutSupport.

Fixes test dEQP-VK.api.maintenance3_check.descriptor_set
Fixes: 79fb0d27f3 ("anv: Implement SSBOs bindings with GPU addresses in the descriptor BO")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 5ec5fecc26)
2019-09-23 11:11:59 -07:00
Tapani Pälli
00eaba7761 egl: check for NULL value like eglGetSyncAttribKHR does
Commit d1e1563bb6 added a NULL check for eglGetSyncAttribKHR
but eglGetSyncAttrib does not do this. Patch adds same check to
happen with eglGetSyncAttrib.

Fixes crashes in (when exposing EGL 1.5):
   dEQP-EGL.functional.fence_sync.invalid.get_invalid_value

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 99cbec0a5f)
2019-09-23 11:11:59 -07:00
Paulo Zanoni
770e77dcd1 intel/fs: fix SHADER_OPCODE_CLUSTER_BROADCAST for SIMD32
The current code can create functions with a width of 32, which is not
supported by our hardware. Add some code to simplify how we express
what we want and prevent such cases.

For some unknown reason, all the tests I could run seem to work even
with these unsupported MOVs.

Fixes: b0858c1cc6 "intel/fs: Add a couple of simple helper opcodes"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit 8e614c7a29)
2019-09-23 11:11:59 -07:00
Eric Engestrom
0c29a15a09 gl: drop incorrect pkg-config file for glvnd
Akin to 1a25980c46 ("egl: drop incorrect pkg-config file for
glvnd") and b01524fff0 ("meson: don't build libGLES*.so with
GLVND") , removes a pkg-config file that shouldn't have been there in
the first place, but was needed because of that GLVND bug.

Now that the glvnd bug has been fixed, it was apparent that this gl.pc
pkg-config file was forgotten to be removed, so let's do just that :)

Suggested-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit a1de3011f3)
2019-09-23 11:11:59 -07:00
Samuel Pitoiset
401de43891 radv/gfx10: fix VK_KHR_pipeline_executable_properties with NGG GS
No GS copy shader if a pipeline enables NGG GS.

This fixes
dEQP-VK.pipeline.executable_properties.graphics.*geometry_stage*.

Fixes: 86864eedd2 ("radv: Implement radv_GetPipelineExecutablePropertiesKHR.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 99c186fbbe)
2019-09-23 11:11:59 -07:00
Dylan Baker
e27644cc2e bin/get-pick-list: use --oneline=pretty instead of --oneline
--oneline shortens hashes, while --oneline=pretty doesn't, otherwise
they are the same. Having full hashes is convenient as that is the
format that the bin/.cherry-ignore script requires to work correctly.
2019-09-23 11:11:55 -07:00
Dylan Baker
b1ea3dcc1f rehardcode from origin/master to upstream/master 2019-09-23 11:11:50 -07:00
Dylan Baker
6ee069f94a cherry-ignore: Add patches 2019-09-23 11:08:24 -07:00
Samuel Pitoiset
d083220879 radv: fix loading 64-bit GS inputs
We have to load 2 32-bit integer and to cast correctly.

This fixes crashes with gs-double-interpolator.vk_shader_test.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111734
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 68820007fd)
2019-09-18 09:38:42 -07:00
Bas Nieuwenhuizen
9bf80c2b53 tu: Set up glsl types.
Addresses this assert:

deqp-vk: ../mesa-freedreno-9999/src/compiler/glsl_types.cpp:1244: static const glsl_type *glsl_type::get_interface_instance(const glsl_struct_field *, unsigned int, enum glsl_interface_packing, bool, const char *): Assertion `glsl_type_users > 0' failed.

running dEQP-VK.api.smoke.triangle .

Fixes: 624789e370 "compiler/glsl: handle case where we have multiple users for types"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 7999e10cab)
2019-09-18 09:38:42 -07:00
Haihao Xiang
fa59ef37ed i965: support AYUV/XYUV for external import only
Fixes: 89785e2d56 ("i965: add support for sampling from AYUV")
Fixes: 7cab8d3661 ("i965: Add support for sampling from XYUV images")
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 8a9b81ab9d)
2019-09-18 09:38:42 -07:00
Samuel Iglesias Gonsálvez
0ff13c291b intel/nir: do not apply the fsin and fcos trig workarounds for consts
If we have fsin or fcos trigonometric operations with constant values
as inputs, we will multiply the result by 0.99997 in
brw_nir_apply_trig_workarounds, making the result wrong.

Adjusting the rules so they do not apply to const values we let a
later constant fold to deal with it.

v2:
- Do not early constant fold but only apply the trig workaround for
  non constants (Caio).
- Add fixes tag to commit log (Caio).

Fixes: bfd17c76c1 "i965: Port INTEL_PRECISE_TRIG=1 to NIR."
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit 3c474f8513)
2019-09-18 09:38:42 -07:00
Dylan Baker
71fafc13b9 Bump version for 19.2.0-rc4 2019-09-18 09:10:19 -07:00
Marek Olšák
e2bb51a99b radeonsi: add Navi12 PCI ID
trivial and urgent

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 83f195414a)
2019-09-18 09:09:54 -07:00
Tapani Pälli
2e9a37cf1c iris: close screen fd on iris_destroy_screen
Otherwise it never gets closed, this fixes errors seen with deqp-egl
where we end up opening 1024 files.

Fixes: 2dce0e94 ("iris: Initial commit of a new 'iris' driver for Intel Gen8+ GPUs.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 631255387f)
2019-09-18 09:09:54 -07:00
Rhys Perry
b94ab6f5e3 radv: always emit a position export in gs copy shaders
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: f8d0337299 ('radv: add multiple streams support for the GS copy shader')
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit ffabcbba60)
2019-09-18 09:09:54 -07:00
Lionel Landwerlin
b2a09536ff drirc: include unreal engine version 0 to 23
This was meant to include up to version 23.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0616b7ac90 ("vulkan: add vk_x11_strict_image_count option")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111522
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit dcf13fbac9)
2019-09-18 09:09:54 -07:00
Lionel Landwerlin
3aeddc1f2f util/xmlconfig: fix regexp compile failure check
This is embarrasing...

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 04dc6074cf ("driconfig: add a new engine name/version parameter")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 10206ba17b)
2019-09-18 09:09:54 -07:00
Sergii Romantsov
d5fe3f73fc nir/large_constants: more careful data copying
A filed of nir_variable.location may be equel to -1.
That may cause copying to invalid address of list-node,
making some internal fields corrupted.

Patch fixes segfault during freeing context due to
corrupted address of ralloc_header.destructor.

v2: copy data if var is constant (Connor Abbott)

CC: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b6d4753568 (nir/large_constants: De-duplicate constants)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111676
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
(cherry picked from commit c7b2a2fd36)
2019-09-18 09:09:54 -07:00
Lionel Landwerlin
5650308d08 vulkan: add vk_x11_strict_image_count option
This option strictly allocate the minImageCount given by the
application at swapchain creation.

This works around application that do not deal with the fact that the
implementation allocates more images than the minimum specified.

v2: Add values in default drirc (Bas)

v3: specify engine name/version (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111522
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0616b7ac90)
2019-09-18 09:09:54 -07:00
Lionel Landwerlin
fbd96932d6 driconfig: add a new engine name/version parameter
Vulkan applications can register with the following structure :

typedef struct VkApplicationInfo {
    VkStructureType    sType;
    const void*        pNext;
    const char*        pApplicationName;
    uint32_t           applicationVersion;
    const char*        pEngineName;
    uint32_t           engineVersion;
    uint32_t           apiVersion;
} VkApplicationInfo;

This enables the Vulkan implementations to apply workarounds based off
matching this description.

Here we add a new parameter for matching the driconfig options with
the following :

    <device driver="anv">
        <application engine_name_match="MyOwnEngine.*" engine_versions="10:12,40:42">
            <option name="blaaah" value="true" />
        </application>
    </device>

v2: switch engine name match to use regexps

v3: Verify that the regexec returns REG_NOMATCH for match failure (Eric)

v4: Add missing bit that went to the following commit (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 04dc6074cf)
2019-09-18 09:09:54 -07:00
Lionel Landwerlin
82fc77b521 radv: store engine name
We'll use this later for a new driconfig matching parameter.

v2: Avoid leak in device creation error case (Bas)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6d5f11ab34)
2019-09-18 09:09:54 -07:00
Kenneth Graunke
78c34c3bfb iris: Initialize ice->state.prim_mode to an invalid value
It was calloc'd to 0 which is PIPE_PRIM_POINTS, which means that we
fail to notice an initial primitive of points being new, and fail at
updating the "primitive is points or lines" field.

We do not need to reset this on device loss because we're tracking
the last primitive mode sent to us on the CPU via draw_vbo, not the
last primitive mode sent to the GPU.

Fixes several tests:
- dEQP-GLES3.functional.clipping.point.wide_point_clip
- dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_center
- dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_corner

Fixes: dcfca0af7c ("iris: Set XY Clipping correctly.")
(cherry picked from commit c9fb704f72)
2019-09-18 09:09:54 -07:00
Samuel Pitoiset
cd0402c582 radv: fix allocating number of user sgprs if streamout is used
streamout_buffers is assigned after that function, so the previous
fix was completely wrong. This probably fix something when streamout
buffers and push constants are used/inlined in the same shader.

Fixes: 378e2d2414 ("radv: fix computing number of user SGPRs for streamout buffers")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 8137df3a46)
[Juan A. Suarez: fix the structure usage]
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-09-18 09:09:54 -07:00
Iago Toral Quiroga
110dc21ed3 v3d: make sure we have enough space in the CL for the primitive counts packet
Fixes: 0f2d1dfe65 ("v3d: use the GPU to record primitives written to transform feedback")

Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit b9a07eed00)
2019-09-18 09:09:54 -07:00
Jason Ekstrand
cfcb96da38 intel/fs: Handle UNDEF in split_virtual_grfs
When the UNDEF instruction was added, we didn't do anything special in
split_virtual_grfs.  This mean that anything with an UNDEF wasn't
getting split which causes problems for the compiler.  Among other
things, it makes RA harder because things are in bigger chunks.  It also
meant that dvec4s weren't getting split which means that they are larger
than the maximum register size.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 14959202 -> 14960035 (<.01%)
    instructions in affected programs: 96197 -> 97030 (0.87%)
    helped: 140
    HURT: 128
    helped stats (abs) min: 1 max: 17 x̄: 1.62 x̃: 1
    helped stats (rel) min: 0.09% max: 6.15% x̄: 0.65% x̃: 0.45%
    HURT stats (abs)   min: 1 max: 825 x̄: 8.28 x̃: 1
    HURT stats (rel)   min: 0.13% max: 139.83% x̄: 1.70% x̃: 0.50%
    95% mean confidence interval for instructions value: -2.96 9.18
    95% mean confidence interval for instructions %-change: -0.56% 1.51%
    Inconclusive result (value mean confidence interval includes 0).

    total loops in shared programs: 4372 -> 4372 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 352646771 -> 352840997 (0.06%)
    cycles in affected programs: 218600800 -> 218795026 (0.09%)
    helped: 21167
    HURT: 21411
    helped stats (abs) min: 1 max: 2924 x̄: 36.89 x̃: 10
    helped stats (rel) min: <.01% max: 41.90% x̄: 2.97% x̃: 0.98%
    HURT stats (abs)   min: 1 max: 26027 x̄: 45.54 x̃: 10
    HURT stats (rel)   min: <.01% max: 324.46% x̄: 3.88% x̃: 1.06%
    95% mean confidence interval for cycles value: 2.87 6.26
    95% mean confidence interval for cycles %-change: 0.40% 0.55%
    Cycles are HURT.

    total spills in shared programs: 8840 -> 8953 (1.28%)
    spills in affected programs: 126 -> 239 (89.68%)
    helped: 1
    HURT: 2

    total fills in shared programs: 21782 -> 21914 (0.61%)
    fills in affected programs: 431 -> 563 (30.63%)
    helped: 1
    HURT: 3

    LOST:   0
    GAINED: 5

Shader-db results on Haswell:

    total instructions in shared programs: 13320918 -> 13320769 (<.01%)
    instructions in affected programs: 40998 -> 40849 (-0.36%)
    helped: 146
    HURT: 56
    helped stats (abs) min: 1 max: 8 x̄: 2.73 x̃: 2
    helped stats (rel) min: 0.16% max: 8.60% x̄: 2.52% x̃: 2.22%
    HURT stats (abs)   min: 2 max: 23 x̄: 4.45 x̃: 4
    HURT stats (rel)   min: 0.21% max: 10.26% x̄: 6.83% x̃: 10.26%
    95% mean confidence interval for instructions value: -1.26 -0.21
    95% mean confidence interval for instructions %-change: -0.62% 0.77%
    Inconclusive result (%-change mean confidence interval includes 0).

    total loops in shared programs: 4373 -> 4373 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 374518258 -> 374384193 (-0.04%)
    cycles in affected programs: 231101954 -> 230967889 (-0.06%)
    helped: 21427
    HURT: 19438
    helped stats (abs) min: 1 max: 2035 x̄: 31.09 x̃: 8
    helped stats (rel) min: <.01% max: 40.95% x̄: 2.42% x̃: 0.86%
    HURT stats (abs)   min: 1 max: 20875 x̄: 27.38 x̃: 8
    HURT stats (rel)   min: <.01% max: 59.09% x̄: 2.49% x̃: 0.80%
    95% mean confidence interval for cycles value: -4.49 -2.07
    95% mean confidence interval for cycles %-change: -0.14% -0.04%
    Cycles are helped.

    total spills in shared programs: 23406 -> 23411 (0.02%)
    spills in affected programs: 3 -> 8 (166.67%)
    helped: 0
    HURT: 2

    total fills in shared programs: 34845 -> 34850 (0.01%)
    fills in affected programs: 3 -> 8 (166.67%)
    helped: 0
    HURT: 2

    LOST:   0
    GAINED: 0

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111566
Fixes: f4ef34f207 "intel/fs: Add an UNDEF instruction to avoid..."
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit acfa2340e6)
2019-09-18 09:09:54 -07:00
Dylan Baker
b0b63d9593 add patches to be ignored 2019-09-18 09:09:54 -07:00
Danylo Piliaiev
5eba28227c tgsi_to_nir: Translate TGSI_INTERPOLATE_COLOR as INTERP_MODE_NONE
Translating TGSI_INTERPOLATE_COLOR as INTERP_MODE_SMOOTH made
it for drivers impossible to have flatshaded color inputs.

Translate it to INTERP_MODE_NONE which drivers interpret as
smooth or flat depending on flatshading state.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111467

Fixes: 770faf54 ("tgsi_to_nir: Improve interpolation modes.")

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 175c32e9bd)
2019-09-18 09:09:54 -07:00
Dylan Baker
f9ef2b4375 meson: don't generate file into subdirs
This is unsupported by meson and may become a hard error in the future.

Fixes: 5adfc8602c
       ("lima/ppir: move sin/cos input scaling into NIR")
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
(cherry picked from commit 52cf2d05a7)
2019-09-18 09:09:54 -07:00
Kenneth Graunke
b85d10e14b gallium: Fix util_format_get_depth_only
This is a pipe format, not a boolean.

Fixes: 5849e0612c ("gallium/auxiliary: Add util_format_get_depth_only() helper.")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit c6d40b5182)
2019-09-18 09:09:54 -07:00
Caio Marcelo de Oliveira Filho
7882268cc9 glsl/nir: Avoid overflow when setting max_uniform_location
Don't use the UNMAPPED_UNIFORM_LOC (-1) to set the unsigned
max_uniform_location.  Those unmapped uniforms don't have to be
accounted at this point.

Fixes: 7a9e5cdfbb ("nir/linker: Add gl_nir_link_uniforms()")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
(cherry picked from commit 4f33f96c45)
2019-09-18 09:09:54 -07:00
Kenneth Graunke
4eabbc04f2 iris: Fix constant buffer sizes for non-UBOs
Since the system value refactor, we've accidentally only been setting
cbuf->buffer_size in the UBO case, and not in the uploaded-constants
case.  We use cbuf->buffer_size to fill out the SURFACE_STATE entry,
so it needs to be initialized in both cases.

Fixes: 3b6d787e40 ("iris: move sysvals to their own constant buffer")
(cherry picked from commit 077a1952cc)
2019-09-18 09:09:54 -07:00
Danylo Piliaiev
97792279e4 nir/loop_analyze: Treat do{}while(false) loops as 0 iterations
Loops like:

block block_0:
vec1 32 ssa_2 = load_const (0x00000020)
vec1 32 ssa_3 = load_const (0x00000001)
loop {
    vec1 32 ssa_7 = phi block_0: ssa_3, block_4: ssa_9
    vec1 1 ssa_8 = ige ssa_2, ssa_7
    if ssa_8 {
        break
    } else {
    }
    vec1 32 ssa_9 = iadd ssa_7, ssa_1
}

Were treated as having more than 1 iteration and after unrolling
produced wrong results, however such loop will exit during
the first iteration if not unrolled.

So we check if loop will actually loop.

Fixes tests/shaders/glsl-fs-loop-while-false-02.shader_test

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit e71fc7f238)
2019-09-18 09:09:54 -07:00
Dylan Baker
3771534b2f Bump version for rc3 2019-09-11 09:05:32 -07:00
Marek Olšák
481d82b65b radeonsi/gfx10: fix wave occupancy computations
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit d95afd8b9e)
2019-09-10 09:51:57 -07:00
Marek Olšák
732950bf36 radeonsi/gfx10: don't call gfx10_destroy_query with compute-only contexts
This fixes a crash.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit 28adf0d00c)
2019-09-10 09:51:57 -07:00
Lepton Wu
637e02c1b1 virgl: Fix pipe_resource leaks under multi-sample.
Fixes: 900a80f9e4 ("virgl: virgl_transfer should own its virgl_resource")

Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
(cherry picked from commit 263136fb5d)
2019-09-10 09:51:57 -07:00
Timur Kristóf
e4bb2ceb78 st/nine: Properly initialize GLSL types for NIR shaders.
NIR shaders use GLSL types (note: these live outside libglsl), and
nine needs to properly initialize these just like the other state
trackers. This fixes an assertion failure when TTN is used.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
2019-09-09 18:01:13 +00:00
Bas Nieuwenhuizen
d858388bfc Revert "ac/nir: Lower large indirect variables to scratch"
This reverts commit 74470baebb.

This change introduces some significant performance regressions. We
are fixing those on master, but the follow up work is large enough
not to backport to 19.2 .

Fixes: 74470baebb "ac/nir: Lower large indirect variables to scratch"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111576
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-09 17:32:38 +00:00
Jason Ekstrand
7635b74acf nir/dead_cf: Repair SSA if the pass makes progress
The dead_cf pass calls into the CF manipulation helpers which attempt to
keep NIR's SSA form sane.  However, when the only break is removed from
a loop, dominance gets messed up anyway because the CF SSA clean-up code
only looks at phis and doesn't consider the case of code becoming
unreachable.  One solution to this would be to put the loop into LCSSA
form before we modify any of its contents.  Another (and the approach
taken by this pass) is to just run the repair_ssa pass afterwards
because the CF manipulation helpers are smart enough to keep all the
use/def stuff sane; they just don't always preserve dominance
properties.

While we're here, we clean up some bogus indentation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111405
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111069
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit c832820ce9)
2019-09-09 09:14:48 -07:00
Jason Ekstrand
2012c4a75c nir/repair_ssa: Insert deref casts when needed
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit 1005272a2b)
2019-09-09 09:14:43 -07:00
Jason Ekstrand
3f56fa8fe9 nir/repair_ssa: Repair dominance for unreachable blocks
NIR currently assumes that unreachable blocks are trivially dominated by
everything.  However, when considering well-formed SSA, there is no path
from any block to an unreachable block.  Therefore, we can break any
use-def chains where the use is in an unreachable block.  This removes
any dependencies on code created by uses in unreachable blocks and lets
DCE do a better job of cleaning it up.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit a3268599f3)
2019-09-09 09:14:38 -07:00
Jason Ekstrand
2c6e34ac93 nir: Add a block_is_unreachable helper
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit f81a2623d8)
2019-09-09 09:14:34 -07:00
Jason Ekstrand
23bc3a401d nir: Don't infinitely recurse in lower_ssa_defs_to_regs_block
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit 517142252f)
2019-09-09 09:13:30 -07:00
Jason Ekstrand
8889cc1241 nir: Handle complex derefs in nir_split_array_vars
We already bail and don't split the vars but we were passing a NULL to
_mesa_hash_table_search which is not allowed.

Fixes: f1cb3348f1 "nir/split_vars: Properly bail in the presence of ..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit 37cdb7fc44)
2019-09-09 09:13:23 -07:00
Eric Engestrom
89776bb48c drirc: override minImageCount=2 for gfxbench
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110765
Fixes: 4689e98fe8 ("vulkan/wsi: Set X11 minImageCount to 3.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 27339fe9a7)
2019-09-09 09:13:19 -07:00
Eric Engestrom
55a10da818 radv: add support for vk_x11_override_min_image_count
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 5eb7d48b58)
2019-09-09 09:13:14 -07:00
Eric Engestrom
c9b1f07e55 amd: move adaptive sync to performance section, as it is defined in xmlpool
Fixes: 3844ed8d44 ("radv: Add adaptive_sync driconfig option and enable it by default.")
Fixes: e260493f2a ("radeonsi: Enable adaptive_sync by default for radeon")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 4ad99ee961)
2019-09-09 09:13:09 -07:00
Eric Engestrom
9d22b0dc90 anv: add support for vk_x11_override_min_image_count
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 037b5b567f)
2019-09-09 09:13:04 -07:00
Eric Engestrom
cb548d566d wsi: add minImageCount override
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit a72cdd00ab)
2019-09-09 09:12:46 -07:00
Eric Engestrom
172cff6357 anv: add support for driconf
No option is supported yet, this is just the boilerplate.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 4dcb1fff19)
2019-09-09 09:12:41 -07:00
Jason Ekstrand
c1114994f3 anv: Bump maxComputeWorkgroupSize
Fixes: 9a129510f5 "anv: Bump maxComputeWorkgroupInvocations"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111552
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 3b1a7e5333)
2019-09-09 09:12:36 -07:00
Caio Marcelo de Oliveira Filho
1a99cfef28 nir/lower_explicit_io: Handle 1 bit loads and stores
Load a 32-bit value then convert to 1-bit.  Convert 1-bit to 32-bit
value, then Store it.

These cases started to appear when we changed Anvil to use derefs for
shared memory.

v2: Use `bit_size` in a couple of places we were missing.  (Jason)
    Reassign `value` instead of `src[0]`.  (Jason)

Fixes: 024a46a407 ("anv: use derefs for shared memory access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit c0c55bd84f)
2019-09-06 08:33:07 -07:00
Jonathan Marek
f42a4300aa freedreno/a2xx: ir2: fix lowering of instructions after float lowering
Some instructions generated by int/bool float lowering need to be lowered
by opt_algebraic.

Fixes: 43dbd7d6

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 3516a90ab4)
2019-09-06 08:30:40 -07:00
Sergii Romantsov
0e6ccd180c intel/dri: finish proper glthread
KWin was able to get NULL-context in the call
intelUnbindContext. But a call _mesa_glthread_finish
is not resistent to such case.
Case can be catched with steps:
	1. Create both glx and egl contexts
	2. Make glx as current
	3. Make egl as current
	4. Reset glx context
	5. Make egl as current

Solution adds proper finishing of glthread-context
(context will be taken from the requested dri-context
for unbinding, but not from the saved current context).

Piglit-test: https://gitlab.freedesktop.org/mesa/piglit/merge_requests/87

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110814
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111271
Fixes: dca36d5516 (i965: Implement threaded GL support)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 1dce75c183)
2019-09-06 08:30:35 -07:00
Dave Stevenson
5e72987777 broadcom/v3d: Allow importing linear BOs with arbitrary offset/stride.
Equivalent of 0c1dd9dee "broadcom/vc4: Allow importing linear BOs with
arbitrary offset/stride." for v3d.

Allows YUV buffers with a single buffer and plane offsets to be
passed in.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>

Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 873b092e91)
2019-09-05 15:16:51 -07:00
Connor Abbott
95145376d1 radv: Call nir_propagate_invariant()
Without this, invariant qualifiers don't do anything. Together with a
fix to the game, this fixes flickering in No Man's Sky.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 3f5b541fc8)
2019-09-05 08:44:57 -07:00
Samuel Pitoiset
3ef013f0e9 nir: do not assume that the result of fexp2(a) is always an integral
It's only correct when 'a' is an integral greater or equal to 0.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111493
Fixes: 5544b2cbbd ("nir/algebraic: Use value range analysis to eliminate useless unary ops")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 966a455bb9)
(conflicts resolved by Dylan Baker)
2019-09-04 16:25:49 -07:00
Dylan Baker
b84cbdfd4d nir: Add is_not_negative helper function
This was taken from 636da12433, and is
needed by the next patch.
2019-09-04 16:25:49 -07:00
Hal Gentz
f180b04d65 glx: Fix SEGV due to dereferencing a NULL ptr from XCB-GLX.
When run in optirun, applications that linked to `libGLX.so` and then
proceeded to querying Mesa for extension strings caused a SEGV in Mesa.

`glXQueryExtensionsString` was calling a chain of functions that
eventually led to `__glXQueryServerString`. This function would call
`xcb_glx_query_server_string` then `xcb_glx_query_server_string_reply`.
The latter for some unknown reason returned `NULL`. Passing this `NULL`
to `xcb_glx_query_server_string_string_length` would cause a SEGV as the
function tried to dereference it.

The reason behind the function returning `NULL` is yet to be determined,
however, simply checking that the ptr is not `NULL` resolves this. A
similar check has been added to `__glXGetString` for completeness sake,
although not immediately necessary.

In addition to that, we stumbled into a similar problem in
`AllocAndFetchScreenConfigs` which tries to access the configs to free
them if `__glXQueryServerString` fails. This, of course, SEGVs, because the
configs are yet to have been allocated. Simply continuing past the configs
if their config ptrs are `NULL` resolves this. We also switch to `calloc`
to make sure that the config ptrs are `NULL` by default, and not some
uninitialized value.

Cc: mesa-stable@lists.freedesktop.org
Fixes: 24b8a8cfe8 "glx: implement __glXGetString, hide __glXGetStringFromServer"
Fixes: cb3610e37c "Import the GLX client side library, formerly from xc/lib/GL/glx. Build it "
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
(cherry picked from commit 1591d1fee5)
2019-09-04 16:15:49 -07:00
Kenneth Graunke
45b22fb873 iris: Report correct number of planes for planar images
We were only handling the modifiers case and not counting the number of
planes in actual planar images.

Fixes Piglit's ext_image_dma_buf_import-export.

Fixes: fc12fd05f5 ("iris: Implement pipe_screen::resource_get_param")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111509
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
(cherry picked from commit f8887909c6)
2019-09-04 16:15:41 -07:00
Eric Engestrom
ffa1f33ec0 nir: fix memleak in error path
Fixes: 2cf59861a8 ("nir: Add partial redundancy elimination for compares")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 7659c6197f)
2019-09-04 16:15:36 -07:00
Eric Engestrom
0bbf6eab52 freedreno/drm-shim: fix mem leak
Fixes: 494ecef6b4 ("freedreno: Add support for drm-shim.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit c4969b0a25)
2019-09-04 16:15:29 -07:00
Eric Engestrom
ef0ccde381 anv: fix format string in error message
Fixes: 9775894f10 ("anv: Move size check from anv_bo_cache_import() to caller (v2)")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 7abf65aedc)
2019-09-04 16:15:23 -07:00
Eric Engestrom
ba255cdd50 util/os_file: fix double-close()
Fixes: 955c63d364 ("util/os_file: resize buffer to what was actually needed")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 1667360f7d)
2019-09-04 16:14:52 -07:00
Eric Engestrom
1a4e7d7293 egl: fix deadlock in malloc error path
Fixes: cb0980e69a ("egl: move alloc & init out of _eglBuiltInDriver{DRI2,Haiku}")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 43d470404c)
2019-09-04 16:14:41 -07:00
Eric Engestrom
1cfc906898 ttn: fix 64-bit shift on 32-bit 1
Fixes: 4d0b2c7aaa ("ttn: Update shader->info as we generate code.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit 3afe9d798a)
2019-09-04 16:14:31 -07:00
Lionel Landwerlin
0a83226dc6 vulkan/overlay: bounce image back to present layout
Once we write the overlay to an image to be presented, we must not
forget to put it back into present layout.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111401
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
(cherry picked from commit 320b0f66c2)
2019-09-04 16:14:26 -07:00
Lionel Landwerlin
26cd407481 egl: fix platform selection
Add missing "device" platform

v2: Add the missing platform (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Jean Hertel <jean.hertel@hotmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111529
Fixes: d6edccee8d ("egl: add EGL_platform_device support")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 6775a52400)
2019-09-04 16:14:01 -07:00
Erik Faye-Lund
451ddeb429 gallium/auxiliary/indices: consistently apply start only to input
The majority of these only apply the start argument to the input, but a
few of them also does for the output-array. util_primconvert, the only
user of this argument expects this pass a non-zero start-argument does
not expect this to be applied to the output; if it is, it will write
outside of allocated memory, leading to VRAM corruption.

The reason this doesn't seem to have been noticed before, is that no
driver currently use util_primconvert to convert a primitive-type to
itself, which is the cases where this was broken. But for Zink, this
will no longer be true, because we need to eliminate the use of 8-bit
index-buffers.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 28f3f8d413 ("gallium/auxiliary/indices: add start param")
Reviewed-by: Rob Clark <robdclark@chromium.org>
(cherry picked from commit 52af1427c6)
2019-09-04 16:13:57 -07:00
Vinson Lee
6ac1d9b46e travis: Fail build if any command in if statement fails.
Travis is checking the exit code of the entire if statement.

Fixes: 64ffc289be ("travis: add MacOS Scons build")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
(cherry picked from commit 029b07b2ad)
2019-09-04 16:13:52 -07:00
Vinson Lee
529db83e7b swr: Fix build with llvm-9.0 again.
Commit 6f7306c029 ("swr/rast: Refactor memory API between rasterizer
core and swr") unintentionally removed changes for llvm-9.0.

Fixes: 6f7306c029 ("swr/rast: Refactor memory API between rasterizer core and swr")
Fixes: 5dd9ad1570 ("swr/rasterizer: Better implementation of scatter")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
(cherry picked from commit 3664a6600e)
2019-09-04 16:13:45 -07:00
Dylan Baker
7f9b49218f bump version to 19.2-rc2 2019-09-04 14:22:26 -07:00
Pierre-Eric Pelloux-Prayer
efa4aee99d glsl: replace 'x + (-x)' with constant 0
This fixes a hang in shadertoy for radeonsi where a buffer was initialized with:

   value -= value

with value being undefined.
In this case LLVM replace the operation with an assignment to NaN.

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111241
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 47cc660d9c)
2019-09-04 11:57:07 -07:00
Thong Thai
ec74c76a1a Revert "radeonsi: don't emit PKT3_CONTEXT_CONTROL on amdgpu"
This reverts commit 5a2e65be89.

Even though CONTEXT_CONTROL is emitted by the kernel, CONTEXT_CONTROL
still needs to be emitted by the UMD, or else the driver will hang

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 2a3a560407)
2019-09-04 11:57:03 -07:00
Ian Romanick
6934bc4f08 nir/range-analysis: Handle constants in nir_op_mov just like nir_op_bcsel
I discovered this while looking at a shader that was hurt by some other
work I'm doing.  When I examined the changes, I was confused that one
instance of a comparison that was used in a discard_if was (incorrectly)
eliminated, while another instance used by a bcsel was (correctly) not
eliminated.  I had to use NIR_PRINT=true to see exactly where things
when wrong.

A bunch of shaders in Goat Simulator, Dungeon Defenders, Sanctum 2, and
Strike Suit Zero were impacted.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16280659 -> 16281075 (<.01%)
instructions in affected programs: 21042 -> 21458 (1.98%)
helped: 0
HURT: 136
HURT stats (abs)   min: 1 max: 9 x̄: 3.06 x̃: 3
HURT stats (rel)   min: 1.16% max: 6.12% x̄: 2.23% x̃: 2.03%
95% mean confidence interval for instructions value: 2.93 3.19
95% mean confidence interval for instructions %-change: 2.08% 2.37%
Instructions are HURT.

total cycles in shared programs: 367168270 -> 367170313 (<.01%)
cycles in affected programs: 172020 -> 174063 (1.19%)
helped: 14
HURT: 111
helped stats (abs) min: 2 max: 80 x̄: 21.21 x̃: 9
helped stats (rel) min: 0.10% max: 4.47% x̄: 1.35% x̃: 0.79%
HURT stats (abs)   min: 2 max: 584 x̄: 21.08 x̃: 5
HURT stats (rel)   min: 0.12% max: 17.28% x̄: 1.55% x̃: 0.40%
95% mean confidence interval for cycles value: 5.41 27.28
95% mean confidence interval for cycles %-change: 0.64% 1.81%
Cycles are HURT.

(cherry picked from commit 7dba7df5e5)
2019-09-04 11:56:59 -07:00
Ian Romanick
6aa7a10370 nir/range-analysis: Fix incorrect fadd range result for (ne_zero, ne_zero)
Found by inspection.  I tried really, really hard to make a test case
that would trigger this problem, but I was unsuccesful.  It's very hard
to get an instruction to produce a ne_zero result without ne_zero
sources.  The most plausible way is using bcsel.  That proves
problematic because bcsel interprets its sources as integers, so it
cannot currently be used to "clean" values for floating point
instructions.

No shader-db changes on any Intel platform.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
(cherry picked from commit 0b4782fccd)
2019-09-04 11:56:54 -07:00
Ian Romanick
97e44d6817 nir/range-analysis: Adjust result range of multiplication to account for flush-to-zero
Fixes piglit tests (new in piglit!110):

    - fs-underflow-fma-compare-zero.shader_test
    - fs-underflow-mul-compare-zero.shader_test

v2: Add back part of comment accidentally deleted.  Noticed by
Caio. Remove is_not_zero function as it is no longer used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: fa116ce357 ("nir/range-analysis: Range tracking for ffma and flrp")
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

All Gen7+ platforms** had similar results. (Ice Lake shown)
total instructions in shared programs: 16278465 -> 16279492 (<.01%)
instructions in affected programs: 16765 -> 17792 (6.13%)
helped: 0
HURT: 23
HURT stats (abs)   min: 7 max: 275 x̄: 44.65 x̃: 8
HURT stats (rel)   min: 1.15% max: 17.51% x̄: 4.23% x̃: 1.62%
95% mean confidence interval for instructions value: 9.57 79.74
95% mean confidence interval for instructions %-change: 1.85% 6.61%
Instructions are HURT.

total cycles in shared programs: 367135159 -> 367154270 (<.01%)
cycles in affected programs: 279306 -> 298417 (6.84%)
helped: 0
HURT: 23
HURT stats (abs)   min: 13 max: 6029 x̄: 830.91 x̃: 54
HURT stats (rel)   min: 0.17% max: 45.67% x̄: 7.33% x̃: 0.49%
95% mean confidence interval for cycles value: 100.89 1560.94
95% mean confidence interval for cycles %-change: 0.94% 13.71%
Cycles are HURT.

total spills in shared programs: 8870 -> 8869 (-0.01%)
spills in affected programs: 19 -> 18 (-5.26%)
helped: 1
HURT: 0

total fills in shared programs: 21904 -> 21901 (-0.01%)
fills in affected programs: 81 -> 78 (-3.70%)
helped: 1
HURT: 0

LOST:   0
GAINED: 1

** On Broadwell, a shader was hurt for spills / fills instead of
   helped.

No changes on any earlier platforms.

(cherry picked from commit ef2e235252)
2019-09-04 11:56:43 -07:00
Ian Romanick
a8525d7751 nir/range-analysis: Adjust result range of exp2 to account for flush-to-zero
Fixes piglit tests (new in piglit!110):

    - fs-underflow-exp2-compare-zero.shader_test

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Most of the shaders affected are, unsurprisingly, in Unigine Heaven.

All Gen6+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16278207 -> 16278465 (<.01%)
instructions in affected programs: 11374 -> 11632 (2.27%)
helped: 0
HURT: 58
HURT stats (abs)   min: 2 max: 13 x̄: 4.45 x̃: 4
HURT stats (rel)   min: 0.54% max: 4.11% x̄: 2.42% x̃: 2.82%
95% mean confidence interval for instructions value: 3.77 5.13
95% mean confidence interval for instructions %-change: 2.19% 2.64%
Instructions are HURT.

total cycles in shared programs: 367134284 -> 367135159 (<.01%)
cycles in affected programs: 81207 -> 82082 (1.08%)
helped: 17
HURT: 36
helped stats (abs) min: 6 max: 356 x̄: 90.35 x̃: 6
helped stats (rel) min: 0.69% max: 21.45% x̄: 5.71% x̃: 0.78%
HURT stats (abs)   min: 4 max: 235 x̄: 66.97 x̃: 16
HURT stats (rel)   min: 0.35% max: 27.58% x̄: 5.34% x̃: 1.09%
95% mean confidence interval for cycles value: -20.36 53.38
95% mean confidence interval for cycles %-change: -1.08% 4.67%
Inconclusive result (value mean confidence interval includes 0).

No changes on any earlier platforms.

(cherry picked from commit 33ad2bab4b)
2019-09-04 11:56:39 -07:00
Ian Romanick
2971f079e1 nir/algebraic: Mark some value range analysis-based optimizations imprecise
This didn't fix bug #111308, but it was found will trying to find the
actual cause of that bug.

Fixes piglit tests (new in piglit!110):

    - fs-fract-of-NaN.shader_test
    - fs-lt-nan-tautology.shader_test
    - fs-ge-nan-tautology.shader_test

No shader-db changes on any Intel platform.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: b77070e293 ("nir/algebraic: Use value range analysis to eliminate tautological compares")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit ccb236d1bc)
2019-09-04 11:56:34 -07:00
Kenneth Graunke
da03ddf677 iris: Fix partial fast clear checks to account for miplevel.
We enabled fast clears at level > 0, but didn't minify the dimensions
when comparing the box size, so we always thought it was a partial
clear and as a result never actually enabled any.

This eliminates some slow clears in Civilization VI, but they are mostly
during initialization and not the main rendering.

Thanks to Dan Walsh for noticing we had too many slow clears.

Fixes: 393f659ed8 ("iris: Enable fast clears on other miplevels and layers than 0.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit 30b9ed92ea)
2019-09-04 11:56:30 -07:00
Ian Romanick
753ea83477 intel/compiler: Request bitfield_reverse lowering on pre-Gen7 hardware
See the previous commit for the explanation of the Fixes tag.

Hurts 21 shaders in shader-db.  All of the hurt shaders are in Unreal
Engine 4 tech demos.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 7afa26d4e3 ("nir: Add lowering for nir_op_bitfield_reverse.")
(cherry picked from commit b418269d7d)
2019-09-04 11:56:12 -07:00
Ian Romanick
91fa24a686 nir/algrbraic: Don't optimize open-coded bitfield reverse when lowering is enabled
This caused a problem on Sandybridge where an open-coded
bitfieldReverse() function could be optimized to a
nir_op_bitfield_reverse that would generate an unsupported BFREV
instruction in the backend.  This was encountered in some Unreal4 tech
demos in shader-db.  The bug was not previously noticed because we don't
actually try to run those demos on Sandybridge.

The fixes tag is a bit a lie.  The actual bug was introduced about
26,000 commits earlier in 371c4b3c48 ("nir: Recognize open-coded
bitfield_reverse.").  Without the NIR lowering pass, the flag needed to
avoid the optimization does not exist.  Hopefully nobody will care to
fix this on an earlier Mesa release.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 7afa26d4e3 ("nir: Add lowering for nir_op_bitfield_reverse.")
(cherry picked from commit d3fd1c761a)
2019-09-04 11:56:08 -07:00
Kenneth Graunke
c96de002b7 mesa: Fix _mesa_float_to_unorm() on 32-bit systems.
This fixes the following CTS test on 32-bit systems:
GTF-GL46.gtf30.GL3Tests.packed_depth_stencil.packed_depth_stencil_init

It does glGetTexImage of a 16-bit SNORM image, requesting 32-bit UNORM
data.  In get_tex_rgba_uncompressed, we round trip through float to
handle image transfer ops for clamping.  _mesa_format_convert does:

   _mesa_float_to_unorm(0.571428597f, 32)

which translated to:

   _mesa_lroundevenf(0.571428597f * 0xffffffffu)

which produced different results on 64-bit and 32-bit systems:

   64-bit: result = 0x92492500
   32-bit: result = 0x80000000

This is because the size of "long" varies between the two systems, and
0x92492500 is too large to fit in a signed 32-bit integer.  To fix this,
we switch to the new _mesa_i64roundevenf function which always does the
64-bit operation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104395
Fixes: 594fc0f859 ("mesa: Replace F_TO_I() with _mesa_lroundevenf().")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit e18cd5452a)
2019-09-04 11:56:04 -07:00
Kenneth Graunke
07ac4269a5 util: Add a _mesa_i64roundevenf() helper.
This always returns a int64_t, translating to _mesa_lroundevenf on
systems where long is 64-bit, and llrintf where "long long" is needed.

Fixes: 594fc0f859 ("mesa: Replace F_TO_I() with _mesa_lroundevenf().")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit b59914e179)
2019-09-04 11:55:50 -07:00
Marek Olšák
c2aad5dc4d radeonsi: fix scratch buffer WAVESIZE setting leading to corruption
Cc: 19.2 19.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit 360cf3c4b0)
2019-09-04 11:55:44 -07:00
Marek Olšák
952fd55015 radeonsi: unbind blend/DSA/rasterizer state correctly in delete functions
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111414

Fixes: b758eed9c3 ("radeonsi: make sure that blend state != NULL and remove all NULL checking")

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
(cherry picked from commit f95a28d361)
2019-09-04 11:55:40 -07:00
Dave Airlie
9433241cc9 gallivm: fix atomic compare-and-swap
Not sure how I missed this before, but compswap was hitting an
assert here as it is it's own special case.

Fixes: b5ac381d8f ("gallivm: add buffer operations to the tgsi->llvm conversion.")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit 1eda49cc3d)
2019-09-04 11:55:35 -07:00
Paulo Zanoni
6e07e58ef6 intel/fs: grab fail_msg from v32 instead of v16 when v32->run_cs fails
Looks like a copy/paste error. This patch prevents a segfault when
running the following on BDW:

    INTEL_DEBUG=no8,no16,do32 ./deqp-vk -n \
        dEQP-VK.subgroups.arithmetic.compute.subgroupmin_dvec4

For the curious, the message we're getting is:

    CS compile failed: Failure to register allocate.  Reduce number
    of live scalar values to avoid this.

Fixes: 864737ce6c ("i965/fs: Build 32-wide compute shader when needed.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit 848d5e444a)
2019-09-04 11:55:31 -07:00
Kenneth Graunke
7160c70f0f isl: Don't set UnormPathInColorPipe for integer surfaces.
This fixes dEQP-GLES3.functional.texture.specification subtests on iris:

- texsubimage3d_depth.depth24_stencil8_2d_array
- texsubimage3d_depth.depth32f_stencil8_2d_array
- texsubimage3d_depth.depth_component32f_2d_array
- texsubimage3d_depth.depth_component24_2d_array
- texstorage2d.format.depth24_stencil8_2d
- texstorage2d.format.depth32f_stencil8_2d
- texstorage2d.format.depth_component24_2d
- texstorage2d.format.depth_component32f_2d
- texstorage3d.format.depth24_stencil8_2d_array
- texstorage3d.format.depth32f_stencil8_2d_array
- texstorage3d.format.depth_component24_2d_array
- texstorage3d.format.depth_component32f_2d_array

Here, something appears to be going wrong with having this bit set
during blorp_copy operations for texture upload, which override the
format to R8G8B8A8_UINT.

AFAICT this bit should have no effect for integer surfaces, as it has
to do with blending, and integer blending is not a thing.  So it should
be harmless to disable it.

The Windows driver appears to be setting this bit universally, so
I am unclear why we would need to.  Perhaps they simply haven't run
into this issue.

Fixes: f741de236b ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 2e1be771e4)
2019-09-04 11:55:22 -07:00
Kenneth Graunke
b871874de7 isl: Drop UnormPathInColorPipe for buffer surfaces.
Jason suggested I remove this in review, and he's right.  AFAICT this
affects blending, and that just isn't going to happen on buffers.

Fixes: f741de236b ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 1b090f065e)
2019-09-04 11:55:17 -07:00
Samuel Pitoiset
80514527e5 radv: fix getting the index type size for uint8_t
16-bit and 32-bit values match hardware values but 8-bit doesn't.

This fixes dEQP-VK.pipeline.input_assembly.* with 8-bit index.

Fixes: 372c3dcfdb ("radv: implement VK_EXT_index_type_uint8")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
(cherry picked from commit 89671ef205)
2019-09-04 11:55:09 -07:00
Dave Airlie
7d8eee2bdb virgl: fix format conversion for recent gallium changes.
The virgl formats are fixed in time snapshots of the gallium ones,
we just need to provide a translation table between them when
we enter the hardware.

This fixes a regression since Eric renumbered the gallium table.

Fixes: c45c33a5a2 (gallium: Remove manual defining of PIPE_FORMAT enum values.)
Bugzilla: https://bugs.freedesktop.org/111454

v1 by Dave Airlie <airlied@redhat.com>
v2: virgl: Add a number of formats to the table that are used, e.g. for vertex
    attributes
v3: cover some more missing formats from a piglit run

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit bba4d2f442)
2019-09-04 11:55:04 -07:00
Alex Smith
6ea07af9c1 radv: Change memory type order for GPUs without dedicated VRAM
Put the uncached GTT type at a higher index than the visible VRAM type,
rather than having GTT first.

When we don't have dedicated VRAM, we don't have a non-visible VRAM
type, and the property flags for GTT and visible VRAM are identical.
According to the spec, for types with identical flags, we should give
the one with better performance a lower index.

Previously, apps which follow the spec guidance for choosing a memory
type would have picked the GTT type in preference to visible VRAM (all
Feral games will do this), and end up with lower performance.

On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
other (Feral) games and saw similar improvement on those as well.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
(Bas: CCing this to 19.2-rc due to high impact and limited complexity)
(cherry picked from commit fe0ec41c4d)
2019-09-04 11:55:00 -07:00
Rafael Antognolli
1ec895b4a7 anv: Only re-emit non-dynamic state that has changed.
On commit f6e7de41d7, we started emitting 3DSTATE_LINE_STIPPLE as part
of the non-dynamic state. That gets re-emitted every time we bind a new
VkPipeline. But that instruction is non-pipelined, and it caused a perf
regression of about 9-10% on Dota2.

This commit makes anv_dynamic_state_copy() return a mask with only the
state that has changed when copying it. 3DSTATE_LINE_STIPPLE won't be
emitted anymore unless it has changed, fixing the problem above.

v2: Improve commit message and add documentation about skipped checks
(Jason)

Fixes: f6e7de41d7 ("anv: Implement VK_EXT_line_rasterization")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 2b7ba9f239)
2019-09-04 11:54:49 -07:00
Lionel Landwerlin
7ff682a12c util: fix compilation on macos
timespec_get() is not available on macos, we need to pull in the
include/c11/threads_posix.h helper.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103674
Fixes: e2d761de03 ("util: drop final reference to p_compiler.h")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 9d3fc737af)
2019-09-04 11:54:45 -07:00
Andres Rodriguez
9fff4192bc radv: additional query fixes
Make sure we read the updated data from the gpu in cases where WAIT_BIT
is not set.

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit a410823b3e)
2019-09-04 11:54:34 -07:00
Kenneth Graunke
5c1362581a iris: Fix large timeout handling in rel2abs()
...by copying the implementation of anv_get_absolute_timeout().

Appears to fix a CTS test with 32-bit builds:
GTF-GL46.gtf32.GL3Tests.sync.sync_functionality_clientwaitsync_flush

Fixes: f459c56be6 ("iris: Add fence support using drm_syncobj")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit 7ee7b0ecbc)
2019-09-04 11:54:29 -07:00
Samuel Pitoiset
649040ed8d radv/gfx10: do not use NGG with NAVI14
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit a4e6e59db8)
2019-09-04 11:54:25 -07:00
Samuel Pitoiset
7200ed1399 radv/gfx10: don't initialize VGT_INSTANCE_STEP_RATE_0
Only gfx9 and older use it to get InstanceID in VGPR1.
Ported from RadeonSI.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 0813c27d8d)
2019-09-04 11:54:20 -07:00
Tapani Pälli
966a2bdc99 egl: reset blob cache set/get functions on terminate
Fixes errors seen with eglSetBlobCacheFuncsANDROID on Android when
running dEQP that terminates and reinitializes a display.

Fixes: 6f5b57093b "egl: add support for EGL_ANDROID_blob_cache"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 3e03a3fc53)
2019-09-04 11:54:16 -07:00
Kenneth Graunke
dff3ab5c04 iris: Avoid unnecessary resolves on transfer maps
We were always resolving the buffer as if we were accessing it via
CPU maps, which don't understand any auxiliary surfaces.  But we often
copy to a temporary using BLORP, which understands compression just
fine.  So we can avoid the resolve, and accelerate the copy as well.

Fixes: 9d1334d2a0 ("iris: Use copy_region and staging resources to avoid transfer stalls")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit 2d79925034)
2019-09-04 11:54:10 -07:00
Kenneth Graunke
d78f39eba0 iris: Drop copy format hacks from copy region based transfer path.
This doesn't work for compressed formats, as the source texture and
temporary texture would have different block sizes.  (Forcing the driver
to always take the GPU path would expose the bug.)  Instead, just use
the source format for the temporary, and let blorp_copy deal with
overrides.

The one case where we can't do this is ASTC, because isl won't let us
create a linear ASTC surface.  Fall back to the CPU paths there for now.

Fixes: 9d1334d2a0 ("iris: Use copy_region and staging resources to avoid transfer stalls")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit 136629a1e3)
2019-09-04 11:54:05 -07:00
Kenneth Graunke
1be5f26cfb iris: Update fast clear colors on Gen9 with direct immediate writes.
Gen11 stores the fast clear color in an "indirect clear buffer", as
a packed pixel value.  Gen9 hardware stores it as a float or integer
value, which is interpreted via the format.  We were trying to store
that in a buffer, for similarity with Icelake, and MI_COPY_MEM_MEM
it from there to the actual SURFACE_STATE bytes where it's stored.

This unfortunately doesn't work for blorp_copy(), which does bit-for-bit
copies, and overrides the format to a CCS-compatible UINT format.  This
causes the clear color to be interpreted in the overridden format.

Normally, we provide the clear color on the CPU, and blorp_blit.c:2611
converts it to a packed pixel value in the original format, then unpacks
it in the overridden format, so the clear color we use expands to the
bits we originally desired.

However, BLORP doesn't support this pack/unpack with an indirect clear
buffer, as it would need to do the math on the GPU.  On Gen11+, it isn't
necessary, as the hardware does the right thing.

This patch changes Gen9 to stop using an indirect clear buffer and
simply do PIPE_CONTROLs with post-sync write immediate operations
to store the new color over the surface states for regular drawing.
BLORP continues streaming out surface states, and handles fast clear
colors on the CPU.

Fixes: 53c484ba8a ("iris: blorp using resolve hooks")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit 1cd13ccee7)
2019-09-04 11:52:53 -07:00
Kenneth Graunke
14588c0727 iris: Fix broken aux.possible/sampler_usages bitmask handling
For renderable surfaces, we allocate SURFACE_STATEs for each bit in
res->aux.possible_usages.  Sampler views use res->aux.sampler_usages.

When pinning buffers, we call surf_state_offset_for_aux() to calculate
the offset to the desired surface state.  surf_state_offset_for_aux()
took an aux_modes parameter, which should be one of those two fields.
However...it was not using that parameter.  It always used the broader
res->aux.possible_usages field directly.

One of the callers, update_clear_value(), was passing incorrect masks
for this parameter.  It iterated through the bits in order, using
u_bit_scan(), which destructively modifies the mask.  So each time we
called it, the count of bits before our selected mode was 0, which would
cause us to always update the SURFACE_STATE for ISL_AUX_USAGE_NONE,
rather than updating each in turn.  This was hidden by the earlier bug
where surf_state_offset_for_aux() ignored the parameter.

Fixes: 7339660e80 ("iris: Add aux.sampler_usages.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit 117a0368b0)
2019-09-04 11:52:46 -07:00
Kenneth Graunke
973d58e9b3 iris: Replace devinfo->gen with GEN_GEN
This is genxml, we can compile out this code.

Fixes: 2660667284 ("iris/gen8: Re-emit the SURFACE_STATE if the clear color changed.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit f6c44549ee)
2019-09-04 11:52:41 -07:00
Alyssa Rosenzweig
58acce6dd9 pan/midgard: Fix writeout combining
shader-db regression in the scheduler.

Fixes: dff4986b1a ("pan/midgard: Emit store_output branch just-in-time")

total bundles in shared programs: 2055 -> 2019 (-1.75%)
bundles in affected programs: 1055 -> 1019 (-3.41%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 20.00% x̄: 6.71% x̃: 5.16%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -8.45% -4.97%
Bundles are helped.

total quadwords in shared programs: 3444 -> 3408 (-1.05%)
quadwords in affected programs: 1897 -> 1861 (-1.90%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 14.29% x̄: 3.97% x̃: 2.99%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -5.08% -2.86%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
(cherry picked from commit 272ce6f5a7)
2019-09-04 11:52:36 -07:00
Bas Nieuwenhuizen
bd0300f8ef radv: Disable NGG for geometry shaders.
A bunch of remaining issues including some that affect users.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111248
Fixes: ee21bd7440 "radv/gfx10: implement NGG support (VS only)"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit c037fe5ad1)
2019-09-04 11:52:30 -07:00
Lionel Landwerlin
4385e6cf02 util/timespec: use unsigned 64 bit integers for nsec values
We added this utility for vulkan where all timeouts are given as
uint64_t values. We can switch from signed to unsigned as this is the
only user and if we ever deal with signed integers somewhere else
we'll have to be careful to use the corresponding
timespec_(add|sub)_msec and always pass absolute values.

v2: Forgot to drop the test calling add_nsec() with a negative number

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Fixes: d2d70c3bb5 ("util: add a timespec helper")
Acked-by: Daniel Stone <daniels@collabora.com>
(cherry picked from commit 5833f43305)
2019-09-04 11:52:26 -07:00
Tapani Pälli
18511e3f5b iris/android: fix build and link with libmesa_intel_perf
Fixes: 0fd4359733 "iris/perf: implement routines to return counter info"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 728ebcdec2)
2019-09-04 11:52:09 -07:00
Samuel Pitoiset
7c615873e5 ac: fix exclusive scans on GFX8-GFX9
This fixes a regression introduced with scan&reduce operations
on GFX10. Note that some subgroups CTS still fail on GFX10 but
I assume it's a different issue.

This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive*.

Fixes: 227c29a80d "amd/common/gfx10: implement scan & reduce operations"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 2d9f401a83)
2019-09-04 11:52:02 -07:00
Tapani Pälli
6af303f6fc util: fix os_create_anonymous_file on android
Commit fixes current crashes with Vulkan applications on Android.

Fixes: c0376a1234 "util: add anon_file.h for all memfd/temp file usage"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
(cherry picked from commit ce8fd042a5)
2019-09-04 11:51:55 -07:00
Kenneth Graunke
844fbc5c42 gallium/noop: Implement resource_get_param
v2: Pass through to oscreen rather than faking it (review from Marek).

Fixes: 0346b70083 ("gallium/screen: Add pipe_screen::resource_get_param")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit bc844d92ce)
2019-09-04 11:51:50 -07:00
Kenneth Graunke
813ed8629e gallium/rbug: Wrap resource_get_param if available
Fixes: 0346b70083 ("gallium/screen: Add pipe_screen::resource_get_param")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f02d1a0b75)
2019-09-04 11:51:44 -07:00
Kenneth Graunke
6e6f137a4e gallium/trace: Wrap resource_get_param if available
Fixes: 0346b70083 ("gallium/screen: Add pipe_screen::resource_get_param")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit c43a44791b)
2019-09-04 11:51:39 -07:00
Kenneth Graunke
07760c1c9e gallium/ddebug: Wrap resource_get_param if available
Fixes: 0346b70083 ("gallium/screen: Add pipe_screen::resource_get_param")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 0e6b573ae5)
2019-09-04 11:51:31 -07:00
Jose Maria Casanova Crespo
0504bff354 mesa: recover target_check before get_current_tex_objects
At compressed_tex_sub_image we only can obtain the tex_object after
compressed_subtexture_target_check is validated for TEX_MODE_CURRENT.
So if the target is wrong the error is raised to the user.

This completes the fix for the regression introduced on "mesa: refactor
compressed_tex_sub_image function" of the pending failing tests:

dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage3d
dEQP-GLES31.functional.debug.negative_coverage.get_error.texture.compressedtexsubimage3d

v2: Fix warning that texObj might be used uninitialized (Gert Wollny)

Fixes: 7df233d68d ("mesa: refactor compressed_tex_sub_image function")
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
(cherry picked from commit 74a7e3ed3b)
2019-09-04 11:51:25 -07:00
Samuel Pitoiset
637a9cbd3b radv: force enable VK_AMD_shader_ballot for Wolfenstein Youngblood
This gives a nice boost, +20% at this time on my Vega 56. Shader
ballot should be enabled by default at some point but it reduces
performance a bit (-6%) with Wolfeinstein II. Enable it only for
Youngblood at the moment, like what we did for Talos in the past.

As a bonus point, it gets rid of some minor artifacts that only
happens when ballot is disabled for some reasons.

Cc: 19.2 <mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit a6ad9e8ccf)
2019-09-04 11:51:15 -07:00
Samuel Pitoiset
690f050608 radv: add a new debug option called RADV_DEBUG=noshaderballot
Shader ballot will be enabled by default for Wolfenstein
Youngblood. This follows what we did for sisched.

Cc: 19.2 <mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit f202ac27a9)
2019-09-04 11:50:59 -07:00
Samuel Pitoiset
3ab1368c4f radv: allow to enable VK_AMD_shader_ballot only on GFX8+
Scans aren't implemented on SI/CIK.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit e73d863a66)
2019-09-04 11:50:53 -07:00
Danylo Piliaiev
71daf2ef67 nir/loop_unroll: Prepare loop for unrolling in wrapper_unroll
Without loop_prepare_for_unroll loops are losing phis.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111411
Fixes: 5db98195 "nir: add loop unroll support for wrapper loops"
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
(cherry picked from commit 84b3ef6a96)
2019-09-04 11:50:48 -07:00
Bas Nieuwenhuizen
614def1a89 radv: Emit VGT_GS_ONCHIP_CNTL for tess on GFX10.
Otherwise hangs are possible. This register was already set for
GS and NGG.

Fixes: 5eaed7ecfc "radv/gfx10: enable support for NAVI10, NAVI12 and NAVI14"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit e04761d0f9)
2019-09-04 11:50:42 -07:00
Bas Nieuwenhuizen
55334521f7 radv: Use correct vgpr_comp_cnt for VS if both prim_id and instance_id are needed.
Should take the max of the 2.

Fixes: ea337c8b7e "radv/gfx10: fix VS input VGPRs with the legacy path"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 2e763f7c87)
2019-09-04 11:50:37 -07:00
Ilia Mirkin
8ee40f6b63 gallium/vl: use compute preference for all multimedia, not just blit
The compute paths in vl are a bit AMD-specific. For example, they (on
nouveau), try to use a BGRX8 image format, which is not supported.
Fixing all this is probably possible, but since the compute paths aren't
in any way better, it's difficult to care.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Fixes: 9364d66cb7 (gallium/auxiliary/vl: Add video compositor compute shader render)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 958390a9bf)
2019-09-04 11:50:32 -07:00
Marek Olšák
25de459644 radeonsi: consolidate determining VGPR_COMP_CNT for API VS 2019-08-27 16:10:40 -04:00
Marek Olšák
5d7754017c radeonsi/gfx10: set PA_CL_VS_OUT_CNTL with CONTEXT_REG_RMW to fix edge flags
We need two different values of the register, one for NGG and one for
legacy, in order to fix edge flags for the legacy pipeline.

Passing the ngg flag to emit_clip_regs would be too complicated,
so CONTEXT_REG_RMW is used for partial register updates.
2019-08-27 16:10:40 -04:00
Marek Olšák
d23bf14d44 radeonsi/gfx10: remove incorrect ngg/pos_writes_edgeflag variables
It varies depending on si_shader_key::as_ngg.
2019-08-27 16:10:40 -04:00
Marek Olšák
514eb1587e radeonsi: add PKT3_CONTEXT_REG_RMW 2019-08-27 16:10:40 -04:00
Marek Olšák
a935da7cef winsys/amdgpu+radeon: process AMD_DEBUG in addition to R600_DEBUG 2019-08-27 16:10:40 -04:00
Marek Olšák
b9330a6189 radeonsi/gfx10: add AMD_DEBUG=nongg 2019-08-27 16:10:40 -04:00
Marek Olšák
0207c318e0 radeonsi/gfx10: finish up Navi14, add PCI ID 2019-08-27 16:10:40 -04:00
Marek Olšák
e09d469622 radeonsi/gfx10: always use the legacy pipeline for streamout
The best way to prevent GDS hangs is not to use GDS.
2019-08-27 16:10:40 -04:00
Marek Olšák
f208b04dba radeonsi/gfx10: don't initialize VGT_INSTANCE_STEP_RATE_0
Only gfx9 and older use it to get InstanceID in VGPR1.
2019-08-27 16:10:40 -04:00
Marek Olšák
c0716446a4 radeonsi/gfx10: fix InstanceID for legacy VS+GS 2019-08-27 16:10:40 -04:00
Marek Olšák
4d3097f36a radeonsi/gfx10: add as_ngg variant for VS as ES to select Wave32/64
Legacy GS only works with Wave64.
2019-08-27 16:10:40 -04:00
Marek Olšák
a3a266807e radeonsi/gfx10: create the GS copy shader if using legacy streamout 2019-08-27 16:10:40 -04:00
Marek Olšák
beea2dee8a radeonsi/gfx10: fix the PRIMITIVES_GENERATED query if using legacy streamout 2019-08-27 16:10:40 -04:00
Marek Olšák
78c603ebf5 radeonsi/gfx10: fix tessellation for the legacy pipeline
ported from PAL
2019-08-27 16:10:40 -04:00
Marek Olšák
3dec21a8aa radeonsi: move some global shader cache flags to per-binary flags 2019-08-27 16:10:40 -04:00
Marek Olšák
6e07ac3343 radeonsi/gfx10: fix the legacy pipeline by storing as_ngg in the shader cache
It could load an NGG shader when we want a legacy shader and vice versa.
2019-08-27 16:10:40 -04:00
Emil Velikov
c0b9399d9d Update version to 19.2.0-rc1
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2019-08-20 23:18:37 +01:00
Erico Nunes
71fb721ca5 lima/ppir: use ra_get_best_spill_node to select spill node
ra_get_best_spill_node is what other users of the mesa register
allocator use.
Switching to it now also fixes an infinite loop issue with ppir regalloc
with the ppir control flow patchset, and also provides a small gain over
the previous herusitic on number of spilled nodes testing with
shader-db.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-20 21:16:02 +00:00
Eric Anholt
c1dc84e71d tgsi: Remove unused tgsi_check_soa_dependencies().
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2019-08-20 13:31:13 -07:00
Eric Anholt
4ebe6b2e72 tgsi: Drop the SSE2 constants setup that's been dead code since 2011.
The SSE2 executor was removed in 4eb3225b38 ("Remove tgsi_sse2.")

Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2019-08-20 13:31:13 -07:00
Eric Anholt
98c58355d3 tgsi: drop a stale comment
This was fixed in 912ed84f83 ("tgsi: move to using vector for system
values.")

Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2019-08-20 13:31:13 -07:00
Eric Anholt
553cd82d64 gitlab-ci: Enable the GLES2/3 CTS on softpipe.
The GLES2 CTS takes about 8 minutes of total runtime (at parallel 4 is
~2 minutes in the test stage if runners are free), while GLES3 takes
about 25.  Since the GLES3 run is pretty expensive, just do a cheap
touch test of 1 out of every 10 tests in the test list on MRs, until
we can get the runtime down.

v2: Drop the full run for now until we can bring runtime down or bring
    up a dedicated mesa runner.

Reviewed-by: Eric Engestrom <eric@engestrom.ch> (v1)
Reviewed-By: Gert Wollny <gert.wollny@collabora.com> (v1)
2019-08-20 13:31:13 -07:00
Jose Maria Casanova Crespo
6c904773fe mesa: reverse no_error on compressed_tex_sub_image for TEX_MODE_CURRENT
This fixes the regression introduced on "mesa: refactor
compressed_tex_sub_image function" that started to crash
KHR-GLES2.texture_3d.compressed_texture.negative_compressed_tex_sub_image

Fixes: 7df233d68d ("mesa: refactor compressed_tex_sub_image function")
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-20 20:45:21 +01:00
Adam Jackson
b283919398 glx: Eliminate glx_config::{rgb,float,colorIndex}Mode
These are redundant with glx_config::renderType, let's just use that
consistently.
2019-08-20 14:05:07 -04:00
Adam Jackson
74ca87e4bc glx: Remove unused glx_config::pixmapMode
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-20 14:05:03 -04:00
Adam Jackson
35fc7bdf0e glx: convert glx_config_create_list to one big calloc
Simpler, less failure prone, less malloc overhead, what's not to like.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-20 14:05:01 -04:00
Adam Jackson
97d58eabcc glx: convert a malloc+memset to calloc
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-20 14:04:59 -04:00
Adam Jackson
cabd09c9e7 glx: Fix parameter documentation of glx_config_create_list
'minimum_size' is not, in fact, an argument to this function.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-20 14:04:56 -04:00
Arcady Goldmints-Orlov
3835535537 anv: inline uniforms blocks don't count toward descriptor set limits
In a descriptor set inline uniform blocks don't use up any bindings.
However, the presence of any inline uniform blocks doed require the
use of the descriptor buffer, which takes up one binding.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-20 16:48:45 +00:00
Daniel Schürmann
df86c5ffb3 nir: add divergence analysis pass.
This pass expects the shader to be in LCSSA form.
The algorithm is based on 'The Simple Divergence Analysis' from
Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira.
Divergence Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-20 17:40:13 +02:00
Rhys Perry
7b07034931 nir/subgroups: Lower clustered reductions with cluster_size >= subgroup_size into reductions
The behavior for reductions with cluster_size >= subgroup_size is implementation defined.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-20 17:40:10 +02:00
Rhys Perry
911a1dfad2 nir/lcssa: allow to create LCSSA phis for loop-invariant booleans
ACO depends on LCSSA phis for divergent booleans to work correctly.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-20 17:40:05 +02:00
Daniel Schürmann
9c40ad49d5 nir/lcssa: Skip loop invariant variables when converting to LCSSA.
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-20 17:40:01 +02:00
Rhys Perry
8a6cfaa15a nir: make nir_to_lcssa() a general NIR pass.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-20 17:39:54 +02:00
Daniel Schürmann
204846ad06 nir/lcssa: handle deref instructions properly
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 414148cdc1 "nir: Support deref instructions in loop_analyze"
2019-08-20 17:39:52 +02:00
Jose Maria Casanova Crespo
7c56a68c8b tgsi_to_nir: only update TGSI properties of the current shader stage
The implementation introduced in "tgsi_to_nir: be careful about not
losing any TGSI properties silently (v2)" updates all the TGSI properties,
but it didn't take into account that the shader_info structure uses a union
to store the different attributes for each shader stage.

Now we only update the attributes if they affect current shader stage,
avoiding to overwrite members of the union that should be overwritten.
This has created hundreds of regressions in v3d.

For example the TGSI_PROPERTY_VS_BLIT_SGPRS_AMD was overwritting the
same position used by TGSI_PROPERY_CS_FIXED_BLOCK_DEPTH.

Fixes: e300365197 ("tgsi_to_nir: be careful about not losing any TGSI properties silently (v2)")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-20 10:30:21 +00:00
Samuel Pitoiset
83a63a5b12 radv/gfx10: do not emit PA_SC_TILE_STEERING_OVERRIDE twice
CLEAR_STATE emits it for us.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-20 12:13:44 +02:00
Samuel Pitoiset
2ca8629fa9 radv: do not emit PKT3_CONTEXT_CONTROL with AMDGPU 3.6.0+
It's emitted by the kernel.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-20 12:13:41 +02:00
Gert Wollny
6a09405368 mesa/program: Take ARB_framebuffers_no_attachments into account in wpos correction
If a drawbuffer is an fbo without an attachment then its 'Height' will be zero,
and we have to take its 'DefaultGeometry.Height' into account.

Fixes on softpipe (with the exception of tests that use multisample):
  dEQP-GLES31.functional.fbo.no_attachments.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-20 10:04:24 +02:00
Sagar Ghuge
fe0e9db797 iris: Enable non coherent framebuffer fetch on broadwell
v2: Use GEN_GEN in iris_state (Kenneth Graunke)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-20 00:50:58 -07:00
Sagar Ghuge
57ce422e20 iris: Free resource if failed to allocate surface state
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-20 00:50:55 -07:00
Sagar Ghuge
02244bc515 iris: Pass isl_surf to fill_surface_state
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-20 00:50:45 -07:00
Sagar Ghuge
638a157e02 iris: Add infrastructure to support non coherent framebuffer fetch
Create separate SURFACE_STATE for render target read in order to support
non coherent framebuffer fetch on broadwell.

Also we need to resolve framebuffer in order to support CCS_D.

v2: Add outputs_read check (Kenneth Graunke)

v3: 1) Import Curro's comment from get_isl_surf
    2) Rename get_isl_surf method
    3) Clean up allocation in case of failure

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-20 00:50:44 -07:00
Sagar Ghuge
61c0637afb iris: Add helper functions to get tile offset
All helper functions are ported from i965 driver.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-20 00:50:43 -07:00
Sagar Ghuge
7e816991cc iris: Add helper function to get isl dim layout
v2: Add missing space (Caio)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-20 00:50:41 -07:00
Sagar Ghuge
58471e20d2 iris: Add render target read entry in binding table
This will be used in next patches for supporting non coherent
framebuffer fetch on Broadwell.

v2: Fix comment (Kenneth Graunke)

v3: 1) Fix a few nits (Caio)
    2) Add comment (Caio)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-20 00:50:31 -07:00
Kai Wasserbäch
1abe87383e build: Bump C++ standard requirement to C++14 to fix FTBFS with LLVM 10
When building Mesa against a recent LLVM 10 with C++11, the build fails
if the AMD common code is built as well due to "std::index_sequence"
being undeclared.

LLVM requires a minimum of C++14.

Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
2019-08-20 05:39:19 +00:00
Rob Herring
d0ec5d38f6 panfrost: Add madvise support to BO cache
The kernel now supports madvise ioctl to indicate which BOs can be freed
when there is memory pressure. Mark BOs purgeable when they are in the
BO cache. The BOs must also be munmapped when they are in the cache or
they cannot be purged.

We could optimize avoiding the madvise ioctl on older kernels once the
driver version bump lands, but probably not worth it given the other
driver features also being added.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2019-08-19 19:33:20 -05:00
Rob Herring
c45c2d7960 panfrost: Sync UAPI header from kernel
Sync the panfrost_drm.h UAPI header with the latest from the kernel.
This adds madvise ioctl and GPU feature params.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2019-08-19 19:33:20 -05:00
Pierre-Eric Pelloux-Prayer
0f07d18e48 mesa: add ext_dsa GetMultiTexLevelParameterEXT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-19 18:50:08 -04:00
Pierre-Eric Pelloux-Prayer
e8c5dc9c24 mesa: add EXT_dsa glCompressedMultiTex* functions display list support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-19 18:50:07 -04:00
Pierre-Eric Pelloux-Prayer
1cb8e12717 mesa: add EXT_dsa glCompressedMultiTex* functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-19 18:50:05 -04:00
Pierre-Eric Pelloux-Prayer
a886025ef5 mesa: add EXT_dsa glCompressedTex* functions display list support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-19 18:50:03 -04:00
Pierre-Eric Pelloux-Prayer
8c76221886 mesa: add EXT_dsa glCompressedTexture(Sub)Image1D/2D/3D functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-19 18:49:57 -04:00
Pierre-Eric Pelloux-Prayer
7df233d68d mesa: refactor compressed_tex_sub_image function
Combine compressed_tex_sub_image, compressed_tex_sub_image_error and
compressed_tex_sub_image_no_error in a single function.

The added "enum tex_mode mode" parameter allows to implement the
DSA / non-DSA variants and their error/no_error combination.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-19 18:49:43 -04:00
Bas Nieuwenhuizen
6c5d983865 radv: Add Renoir support.
Took the freedom to enable dfsm even though I don't have benchmark
results yet, but it seems Raven-like.

Rest is from radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-19 22:34:11 +00:00
Marek Olšák
223b3174bd radeonsi/nir: always lower ballot masks as 64-bit, codegen handles it
This fixes KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks.

This solution is better, because the IR isn't dependent on wave32.
2019-08-19 17:23:38 -04:00
Marek Olšák
5d37194d43 radeonsi: remove the unsafemath debug option
unlikely to be used in the future

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-19 17:23:38 -04:00
Marek Olšák
5586411de4 radeonsi/nir: fix counting shader inputs & outputs 2019-08-19 17:23:38 -04:00
Marek Olšák
452cb7055f radeonsi/nir: fix assertion in si_nir_load_sampler_desc 2019-08-19 17:23:38 -04:00
Marek Olšák
1f8a661748 radeonsi: clean up si_llvm_context_set_tgsi
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-19 17:23:38 -04:00
Marek Olšák
43f8b5642b radeonsi: allocate and resize global_buffers as needed
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-19 17:23:38 -04:00
Marek Olšák
c315cb509d radeonsi/gfx10: don't set PA_SC_TILE_STEERING_OVERRIDE if CLEAR_STATE sets it
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-19 17:23:38 -04:00
Marek Olšák
5a2e65be89 radeonsi: don't emit PKT3_CONTEXT_CONTROL on amdgpu
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-19 17:23:38 -04:00
Marek Olšák
8d0d753bd0 radeonsi: fix an assertion failure: assert(!res->b.is_shared)
This only appears to happen on Raven2.

Possible way to reproduce:

resource_get_handle(WINSYS_HANDLE_TYPE_KMS) --> sets is_shared = true
resource_get_handle(WINSYS_HANDLE_TYPE_DMABUF) --> fail

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
2019-08-19 17:23:38 -04:00
Marek Olšák
bdcbac9459 radeonsi: handle the use_ngg_streamout flag in si_update_ngg 2019-08-19 17:23:38 -04:00
Marek Olšák
a6b3ca1c70 radeonsi: move the tess factor ring size assertion to a place where it matters
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-19 17:23:38 -04:00
Marek Olšák
21217efdfe ac/nir: set image=true when loading FMASK for images
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-19 17:23:38 -04:00
Christian Gmeiner
f52b9218ff etnaviv: rs: add support for 64bpp clears
Starting with HALTI2 the RS supports 64bpp clears.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <philipp.zabel@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-08-19 22:36:45 +02:00
Christian GMEINER
7492685b1b etnaviv: update headers from rnndb
Update to etna_viv commit c51353e.

Signed-off-by: Christian GMEINER <christian.GMEINER@bachmann.info>
Reviewed-by: Philipp Zabel <philipp.zabel@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-08-19 22:36:45 +02:00
Eric Anholt
1395503424 swrast: Make the fetch funcs table sparse.
This shrinks the table, avoids needing to update the table with NULL
entries on every MESA_FORMAT addition, and removes a surprising,
non-unit-tested format number ordering dependency.

Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2019-08-19 11:48:03 -07:00
Eric Anholt
c45c33a5a2 gallium: Remove manual defining of PIPE_FORMAT enum values.
Now that SVGA doesn't have a table that has to be in PIPE_FORMAT
order, we can let the enums have whatever values they naturally would
without worrying about holes.

Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2019-08-19 11:48:01 -07:00
Eric Anholt
84db6ba740 svga: Drop unsupported formats from the format table.
Now that we're using the array initializers, we don't need to manually
fill out all these stub entries.

Produced with "sed -i '/.*INVALID.*INVALID.*INVALID/d'
src/gallium/drivers/svga/svga_format.c"

Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2019-08-19 11:43:02 -07:00
Eric Anholt
ef37da52c0 svga: Remove duplication in the format table.
By using the [ ] = {} array initializer syntax, we no longer need the
entries to be listed in PIPE_FORMAT_* value order.  This means that
people adding new gallium formats don't need to cargo-cult changes to
this driver or regress that non-unit-tested requirement.

While I'm here, drop the lines for formats that no longer exist (the
numbered ones in the table).

Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2019-08-19 11:42:55 -07:00
Eric Anholt
42efa789b5 svga: Factor out the format conversion table entry lookup.
Seemed like a sensible cleanup, while I was looking at whether I could
make the table sparse.

To make the svga table not require fixups on every new gallium format,
we may want to change how it's populated.

Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2019-08-19 11:42:36 -07:00
Jason Ekstrand
5167e94f23 nir: Add more source types to nir_tex_instr_src_type
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 17:03:34 +00:00
Alyssa Rosenzweig
2bb4dc4054 pan/midgard: Compute liveness per-block
Rather than using a regalloc based on live internals, computed hastily
with repeated invocations of a forward-analysis pass, we switch to
compute liveness information on a per-block basis.

Within a given basic block, we compute liveness backwards with a
linear-time algorithm; for common shaders, this may help RA terminate
quicker.

Across blocks, we use a work list (really a work set) and check if we're
making progress. This isn't terribly efficient, but it gets the job
done. Point is, we get the live_in/live_out for each block.

From there, it's simple to rerun the linear-time update algorithm to
compute the interference graph.

The benefit of this technique is the ability to ignore "gaps" in
liveness across intermediate blocks that are never executed. On simple
shaders like the loops in glmark, this results in a minor reduction in
register pressure. The motivation was a complex shader in Krita that
failed register allocation due to an unfortunate interaction between
texture pipeline registers and control flow. This shader now compiles
successfully.

total instructions in shared programs: 3439 -> 3438 (-0.03%)
instructions in affected programs: 22 -> 21 (-4.55%)
helped: 1
HURT: 0

total bundles in shared programs: 2077 -> 2076 (-0.05%)
bundles in affected programs: 12 -> 11 (-8.33%)
helped: 1
HURT: 0

total quadwords in shared programs: 3457 -> 3456 (-0.03%)
quadwords in affected programs: 20 -> 19 (-5.00%)
helped: 1
HURT: 0

total registers in shared programs: 341 -> 338 (-0.88%)
registers in affected programs: 9 -> 6 (-33.33%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 33.33% max: 33.33% x̄: 33.33% x̃: 33.33%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
24c91bb54b pan/midgard: Analyze load/store for swizzle propagation
If there's a nontrivial swizzle fed into an extra (shortened) argument,
we bail on copyprop. No glmark changes (since it doesn't use fancy
texturing/loads).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
9ae4d3653e pan/midgard: Treat cubemaps "stores" as loads
It's always been ambiguous which they are, but their primary register is
their output, not their input; therefore, they are loads.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
20dd482668 pan/midgard: Clamp cubemap swizzle to XYXX
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
2788721cc4 pan/midgard: Clamp st_vary swizzle by number of components
Same issue with liveness analysis. If we store out a vec3, we should not
reference the .w component.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
edc8e41566 pan/midgard: Use type-appropriate swizzle for texture coordinate
The texture coordinate for a 2D texture could be a vec2 or a vec3,
depending if it's an array texture or not. If it's vec2 (non-array
texture), we should not reference the z component; otherwise, liveness
analysis will get very confused when z is never written.

v2: Fix typo (Ilia).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
2bcb3d9226 pan/midgard: Set mask for lowered read-hazard moves
If we need to lower a move for a read from a vec2 texture coordinate, we
shouldn't write zw, even incidentally.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
739e09c297 pan/midgard: Fix texw lowering with complex control flow
Fixes shaders with control flow like:

   out = 0;

   if (A) {
      if (B)
         out = texture(A, ...)
   } else {
      out = texture(B, ...)
   }

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
6f1c8c148d pan/midgard: Add mir_rewrite_index_dst_single helper
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
d68019ad1f pan/midgard: Print predecessors in MIR
Just as a sanity check.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
e3a418fe86 pan/midgard: Index blocks for printing
Better than having pointers flying about.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
2f92479ffc pan/midgard: Add mir_foreach_src
This is repeated often enough.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
84580c6dbc pan/midgard: Add mir_foreach_instr_in_block_rev
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
c8c4471a92 pan/midgard: Add mir_foreach_successor helper
Now we should be able to walk the control-flow graph naturally.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
b8e526c520 pan/midgard: Add mir_foreach_predecessor utility
It's ugly, but c'est la vie.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
b4b2e111f8 pan/midgard: Link exit block
The exit block has been 'dangling' in the successors graph, so let's
ensure it's linked in.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
07c960cac0 pan/midgard: Add mir_exit_block helper
The exit block is gauranteed to be empty, signaling the end of the
program.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
aeeeef1242 pan/midgard: Maintain block predecessor set
While we already compute the successors array, for backwards data flow
analysis, it is useful to walk the control flow graph backwards based on
predecessors, so let's compute that information as well.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
4fa09329c1 pan/midgard: Use ralloc on ctx/blocks
This will allow us to get some level of automatic memory management.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig
b59b1793b8 pan/midgard: Shrink successors[] to 2 length
A block can't have more.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 08:32:17 -07:00
Roman Stratiienko
fdd6151612 nir: Add missing dependency in Android.nir.gen.mk
Fixes incremental build with Android

Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-08-19 09:53:18 +03:00
Erico Nunes
99d5bdcfa5 meson: build lima tools as part of 'all' tools
This is primarily so that this build gets tested in CI and we don't
break it again.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-18 22:27:55 +02:00
Connor Abbott
c550d367a7 ac/nir: Fix store_scratch with a non-full writemask
By adding one more helper to ac_llvm_build, we can also easily keep
vector stores together.

Fixes the
tests/spec/glsl-1.30/execution/fs-large-local-array-vec4.shader_test
piglit test.

Fixes: 74470baebb ("ac/nir: Lower large indirect variables to scratch")
Reviewed-by: Marek Olšák <marek.olsak@amd.com
2019-08-18 15:15:45 +02:00
Vasily Khoruzhick
0e394cda0d glsl/standalone: init shader stage in init_gl_program()
Otherwise lima standalone compiler fails when trying to compile fragment
shader with:

lima_compiler: ../src/compiler/nir/nir.c:55: nir_shader_create: Assertion `si->stage == stage' failed

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-17 11:14:40 -07:00
Jason Ekstrand
16edd02bfa iris: Only request an input mask if the shader needs it
Fixes: aebca3961b "iris: Fix handling of SIMD32 fragment shaders"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-16 19:59:42 -05:00
Xiong, James
dcad15ff54 gallium: add back YVU support
PIPE_FORMAT_YV12 is not handled so switching to PIPE_FORMAT_IYUV and
adding back YVU support.

Signed-off-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-16 13:24:49 -07:00
Erico Nunes
7a51abab42 lima: actually wait for bo in lima_bo_wait
PIPE_TIMEOUT_INFINITE is unsigned and gets assigned to signed fields
where it ends up as -1. When this reaches the kernel as a timeout it
gets translated as no timeout, which cause the waiting functions to
return immediately and not actually wait for a completion.

This seems to cause unstable results with lima where even piglit tests
randomly fail.

Handle this by setting the signed max value in case of infinite timeout.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-16 16:31:29 +02:00
Rhys Perry
0a790c3019 nir/algebraic: add a few masking-before-unpack optimizations
Helps some Dawn of War 3 and F1 2017 shaders with ACO:
Totals from affected shaders:
SGPRS: 2136 -> 2128 (-0.37 %)
VGPRS: 1624 -> 1628 (0.25 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 168068 -> 164332 (-2.22 %) bytes
LDS: 44 -> 44 (0.00 %) blocks
Max Waves: 222 -> 221 (-0.45 %)
Wait states: 0 -> 0 (0.00 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-16 12:13:01 +01:00
Vasily Khoruzhick
861c2b8d31 lima: fix compilation of standalone compiler
Fixes: e0aeee946004("lima: add summary report for shader-db")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-15 16:59:51 -07:00
Bas Nieuwenhuizen
b9fb90e6d3 Revert "radv/gfx10: Enable DCC for storage images."
Quite useless without DCC for LAYOUT_GENERAL.

Fixes: b4dad3afaa Revert "radv: Do not decompress on LAYOUT_GENERAL."
Acked-by: Dave Airlie <airlied@redhat.com>
2019-08-16 01:22:54 +02:00
Bas Nieuwenhuizen
b4dad3afaa Revert "radv: Do not decompress on LAYOUT_GENERAL."
Causes issues with a bunch of games with DXVK.

Fixes: 50add1b33a "radv: Do not decompress on LAYOUT_GENERAL."
Acked-by: Dave Airlie <airlied@redhat.com>
2019-08-16 01:22:35 +02:00
Dave Airlie
f3af7886fe mesa: add support for CET to x86/x86-64 asm files.
Control-flow enforcement technology is a new instructions on x86
processors to denote where indirect jumps can land. Gcc auto adds
the instruction (which encodes as a NOP on older CPUs) to entrypoints
but assembler files need manual adding. This adds it to all the
entry points in the mesa x86/x86-64 assembler files.

This will only happen if mesa is built with the -fcf-protection flag
to gcc as some distros are wanting to do.

Acked-by: Eric Anholt <eric@anholt.net>
2019-08-16 09:00:35 +10:00
Alyssa Rosenzweig
78eda70892 pan/bifrost: Manually constant fold register class
Fixes errors for some people building Mesa:

../src/panfrost/bifrost/bifrost_sched.c:32:31: error: initializer
element is not constant
 const unsigned max_vec2_reg = max_primary_reg / 2;

../src/panfrost/bifrost/bifrost_sched.c:33:31: error: initializer
element is not constant
 const unsigned max_vec3_reg = max_primary_reg / 4; // XXX: Do we need
to align vec3 to vec4 boundary?

../src/panfrost/bifrost/bifrost_sched.c:34:31: error: initializer
element is not constant
 const unsigned max_vec4_reg = max_primary_reg / 4;

../src/panfrost/bifrost/bifrost_sched.c:35:32: error: initializer
element is not constant
 const unsigned max_registers = max_primary_reg +

../src/panfrost/bifrost/bifrost_sched.c:40:28: error: initializer
element is not constant
 const unsigned vec2_base = primary_base + max_primary_reg;

../src/panfrost/bifrost/bifrost_sched.c:41:28: error: initializer
element is not constant
 const unsigned vec3_base = vec2_base + max_vec2_reg;

../src/panfrost/bifrost/bifrost_sched.c:42:28: error: initializer
element is not constant
 const unsigned vec4_base = vec3_base + max_vec3_reg;

../src/panfrost/bifrost/bifrost_sched.c:43:27: error: initializer
element is not constant
 const unsigned vec4_end = vec4_base + max_vec4_reg;

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-15 19:06:35 +00:00
Erik Faye-Lund
18ab42644b gallium/util: widen type before multiplication
This method returns size_t, but the multiplication multiplies two
integers, leading to overflow rather than type widening.

Noticed by compiling with MSVC, which emits a warning.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-15 20:23:53 +02:00
Erik Faye-Lund
0091f62978 mesa: avoid warning on Windows
On Windows, p_atomic_inc_return returns an unsigned long long rather
than the type the pointer refers to, so let's make sure we cast the
result to the right type. Otherwise, we'll trigger a warning about
the wrong format-string for the type.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-15 20:23:49 +02:00
Erik Faye-Lund
544b088616 win32: unify strcasecmp definitions
There was two incompatible definitions of strcasecmp, which lead to a
compiler warning. Let's clean this up by only leaving one of them, and
using that one all the time.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-15 20:23:44 +02:00
Erik Faye-Lund
ecd312be96 mesa/main: avoid warning when casting offset to pointer
This generates a warning on some 64-bit systems, so let's cast to a
properly sized integer first.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-15 20:23:39 +02:00
Erik Faye-Lund
c646cd4bac nir: avoid warning when casting bogus pointer
This intentionally-bogus pointer generates a warning on some 64-bit
systems, so let's cast to a properly-sized integer first.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-15 20:23:35 +02:00
Erik Faye-Lund
b355eef920 glsl: fixup u64-warning
Similarly to the unsigned-version, we need to first cast the result to a
suiting integer before negating the number, otherwise we'll trigger a
warning.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-15 20:23:13 +02:00
Kenneth Graunke
f741de236b isl: Enable Unorm Path in Color Pipe
Improves performance on my Icelake 8x8 locked to 700Mhz.  For example,
some GfxBench5 subtests have the following results:

- [i965] gl_manhattan: ................ 7.01119% +/- 0.180971% (n=5)
- [i965] gl_4 (Car Chase):              4.24351% +/- 0.175622% (n=5)
- [i965] gl_blending:  ................ 3.36327% +/- 0.180267% (n=5)
- [i965] gl_5_normal (Aztec Ruins):     1.67962% +/- 0.243534% (n=10)
- [iris] gl_manhattan: ................ 3.92357% +/- 0.073965% (n=25)
- [iris] gl_4 (Car Chase):              2.17746% +/- 0.0826858% (n=5)
- [iris] gl_blending:  ................ 2.79599% +/- 0.803652% (n=15)
- [iris] gl_5_normal (Aztec Ruins):     1.30930% +/- 0.106523% (n=25)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-15 10:39:09 -07:00
Rafael Antognolli
ceeaf93c8e anv: Properly initialize device->slice_hash.
When subslices_delta == 0 and we take the early return,
device->slice_hash is not initialized on GEN11. It then causes a
segfault when going through anv_DestroyDevice, if compiled with
valgrind.

Fixes: 7bc022b4bb ("anv/gen11: Emit SLICE_HASH_TABLE when pipes are
                    unbalanced.)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-15 09:42:48 -07:00
Danylo Piliaiev
72354d43d4 intel/compiler: Fix resource leak in error path
CID: 1452261

Fixes: 04a99515 "intel/compiler: add ability to override shader's assembly"

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-08-15 08:17:36 +00:00
Alyssa Rosenzweig
44a6c38bd6 panfrost: Implement native RECT textures
We started honouring the normalized_coords flag in the texture
descriptor, but a bisection revealed that broke RECT textures -- since
we were *also* lowering them in the shader. So just remove the
shader-based lowering, use native RECT textures, and enjoy the nominal
reduction in complexity and performance boost.

Fixes: 3e47a1181b ("panfrost: Add MALI_SAMP_NORM_COORDS flag")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:57:42 -07:00
Alyssa Rosenzweig
6fe4822cca panfrost: Add R10G10B10A2_SSCALED vertex format
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig
e823a47f02 pan/midgard: Disassemble UBO index explicitly
It's a bit of a special case but that's fine.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig
3d54ed2488 pan/midgard: Account for unaligned UBOs when promoting uniforms
We only know how to promote aligned accesses, although theoretically we
should be able to promote unaligned to swizzles in the future. Check
this.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig
03350eb8b8 pan/midgard: Add mir_ubo_shift helper
Different UBO reads have different shift requirements.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig
cf3bb10f51 pan/midgard: Address emit_ubo_read offset in bytes
We'll want to be smarter about unaligned reads, so let's get this code
all in one place.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig
65e6cb4eb0 pan/midgard: Wire writemask into UBO reads
Helps the disassembly be clearer and maybe regalloc be smarter.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig
ec2f0b580f pan/midgard: Identify UBO/SSBO op symmetry
It's the same thing, just shifted.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig
375d4c2c74 panfrost: Extend blending to MRT
Our hardware supports independent (per-RT) blending, but we need to
route those settings through from Gallium.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:42:40 -07:00
Alyssa Rosenzweig
dff4986b1a pan/midgard: Emit store_output branch just-in-time
We'll need multiple branches for MRT, so we can't defer. Also, we need
to track dependencies to ensure r0 is set to the correct value for each
store_output.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:42:40 -07:00
Alyssa Rosenzweig
2fc44c4dc8 pan/midgard: Add dont_eliminate flag
We need to treat fragment writes specially.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:42:40 -07:00
Alyssa Rosenzweig
6ed3843224 pan/mfbd: Stuff in RT count
Fixes DATA_INVALID_FAULTs with multiple render targets.

We do always allocate space for 4 cbufs just to keep things sane. This
may not be strictly necessary.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:42:40 -07:00
Alyssa Rosenzweig
716be7862e pan/decode: Dump FBD tagged pointer
Turns out the rt count is stuffed in here..

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:42:40 -07:00
Alyssa Rosenzweig
358372b256 pan/decode: Decode invalid access type upon fault
We don't have a good way to confirm this, but it parallels the kernel
definitons for MMU faults nicely.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:42:39 -07:00
Alyssa Rosenzweig
f5cc5ef404 pan/decode: Fix duplicate heap_end property
This was supposed to read heap_start. It's the same value but still,
better get this right.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:42:39 -07:00
Alyssa Rosenzweig
b78e04c17b panfrost: Note "MFBD preload disable" bit
It's a chicken bit, as far as I can tell. Buck buck.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 16:39:57 -07:00
Alyssa Rosenzweig
64720d1e9e pan/bifrost: Link in compiler
We enable the standalone compiler, build the new files, and let it
blast.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig
b93fa7d232 pan/bifrost: Check in remainder of the Bifrost compiler
What it says on the tin.

Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig
0e126aa0f0 pan/bifrost: Add bifrost_print.c/h
IR printers.

Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig
d8d8b08fe5 pan/bifrost: Style format the disassembler
$ astyle *.c *.h --style=linux -s8

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig
fca491c0e1 pan/bifrost: Stub out standalone compiler
We don't actually have a standalone compiler in-tree yet, but let's get
prepared for when we do.

Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig
62bbc23da5 pan/bifrost: Sync disassembler with Ryan's tree
The disassembler was updated to move common code with the compiler into
a shared header. Additional, some new ops and control registers relating
to rounding were added.

Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig
b73cbd6880 panfrost: Remove standalone pandecode tool
Now that panwrap has gained the ability to trace directly without
dumping to the filesystem, there's no need to lug around this tool.

I can assure you nobody will miss it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig
6f4d796911 pan/midgard: Fix disassembly termination condition
Fixes: 863bdd1f8d ("pan/midgard: Break, not return, in disassembler")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig
de2efd5ea7 panfrost: Ensure we upload at least 1 blend RT
Otherwise we'll get memory junk.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig
54438267c3 panfrost: Zero tripipe on initialize
I don't think the hardware cares, but this adds a lot of noise to traces
that we would rather not need to look at.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig
1ab6290746 pan/midgard: Improve disassembler robustness
Some memory corruption / etc issues let to an accidental "fuzzing" of
the disassembler ;) This uncovered some issues leading to a disassembler
hang, so let's fix that.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig
9c4c7211a3 pan/decode: Split public.h out
We want a defined ABI for tracing; this set of functions should be as
small as strictly necessary to minimize panwrap shenanigans.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig
4f03728fb7 pan/decode: Prefer uint64_t to mali_ptr
This removes an unwanted dependency on panfrost-job.h

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig
6c84a2665c pan/midgard: Allocate spill_slot once
Multiple spill moves share a single spill slot. Issue found in Krita.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 14:58:34 -07:00
Alyssa Rosenzweig
2a9031ea44 pan/midgard: Use hint on midgard_instruction for spill_move
This allows us to have multiple spill moves, whereas otherwise for N
spill moves, the first N-1 would be clobbered. Issue found in Krita.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 14:58:34 -07:00
Alyssa Rosenzweig
3e6f2e7aba panfrost: Remove panfrost_add_dependency asserts
It doesn't... make a ton of sense to need to assert and this routine is
hotter than you might expect. Doesn't matter for release builds, of
course.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 14:58:34 -07:00
Marek Olšák
aafc95ceb6 radeonsi: add support for Renoir
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-14 17:31:04 -04:00
Eric Engestrom
a3d6024199 meson: add nir tests to the compiler/nir test suite
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-14 22:17:06 +01:00
Eric Engestrom
d0916edfcb EGL: sync headers with Khronos
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-14 21:48:23 +01:00
Christian Gmeiner
2c4fe6af78 relnotes: Add new ext on etnaviv for 19.2.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-08-14 21:47:35 +02:00
Christian Gmeiner
17200bb67a etnaviv: fix weird indentation
Fixes: 797a2e4fd0 ("etnaviv: update logic to determine uniform limits")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-08-14 21:29:48 +02:00
Ian Romanick
0e6581b87d nir/algebraic: Reassociate shift-by-constant of shift-by-constant
v2: After some review discussion with Alyssa, the replacements now
correct account for cases where (b+c) >= bitsize.

v3: Use a temporary to simplify the Python code quite a bit.  Suggested
by Jason.

Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16251155 -> 16249576 (<.01%)
instructions in affected programs: 232627 -> 231048 (-0.68%)
helped: 547
HURT: 1
helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for instructions value: -3.12 -2.65
95% mean confidence interval for instructions %-change: -1.20% -1.06%
Instructions are helped.

total cycles in shared programs: 365924392 -> 365372103 (-0.15%)
cycles in affected programs: 59207053 -> 58654764 (-0.93%)
helped: 497
HURT: 34
helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16
helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82%
HURT stats (abs)   min: 2 max: 424 x̄: 101.03 x̃: 63
HURT stats (rel)   min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06%
95% mean confidence interval for cycles value: -1426.41 -653.77
95% mean confidence interval for cycles %-change: -1.66% -1.15%
Cycles are helped.

total spills in shared programs: 8870 -> 8871 (0.01%)
spills in affected programs: 104 -> 105 (0.96%)
helped: 0
HURT: 1

Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11956236 -> 11955635 (<.01%)
instructions in affected programs: 94110 -> 93509 (-0.64%)
helped: 106
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4
helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76%
95% mean confidence interval for instructions value: -6.62 -4.72
95% mean confidence interval for instructions %-change: -2.27% -1.64%
Instructions are helped.

total cycles in shared programs: 179296340 -> 178788044 (-0.28%)
cycles in affected programs: 51009603 -> 50501307 (-1.00%)
helped: 82
HURT: 7
helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16
helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11%
HURT stats (abs)   min: 2 max: 8 x̄: 3.14 x̃: 2
HURT stats (rel)   min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10%
95% mean confidence interval for cycles value: -7649.38 -3773.00
95% mean confidence interval for cycles %-change: -2.71% -1.99%
Cycles are helped.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-14 11:15:37 -07:00
Ian Romanick
73aaeac0a3 nir/algebraic: Reassociate add-and-shift to be shift-and-add
A common thing in many shaders:

    uniform vs { vec4 bones[...]; };

    ...

    x = some_calculation(bones[i + 0]);
    y = some_calculation(bones[i + 1]);
    z = some_calculation(bones[i + 2]);

This turns into stuff like

    vec1 32 ssa_12 = iadd ssa_11, ssa_0
    vec1 32 ssa_13 = ishl ssa_12, ssa_3
    vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
    vec1 32 ssa_15 = iadd ssa_11, ssa_1
    vec1 32 ssa_16 = ishl ssa_15, ssa_3
    vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
    vec1 32 ssa_18 = iadd ssa_11, ssa_2
    vec1 32 ssa_19 = ishl ssa_18, ssa_3
    vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)

By reassociating the shift and the add, we can reduce this to

    vec1 32 ssa_12 = ishl ssa_11, ssa_3
    vec1 32 ssa_13 = iadd ssa_12, ssa_0
    vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
    vec1 32 ssa_16 = iadd ssa_12, ssa_1
    vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
    vec1 32 ssa_19 = iadd ssa_12, ssa_2
    vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)

v2: Add some commentary from Rhys Perry's nearly identical patch.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16277758 -> 16250704 (-0.17%)
instructions in affected programs: 1440284 -> 1413230 (-1.88%)
helped: 4920
HURT: 6
helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4
helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79%
HURT stats (abs)   min: 1 max: 12 x̄: 4.50 x̃: 3
HURT stats (rel)   min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55%
95% mean confidence interval for instructions value: -5.67 -5.31
95% mean confidence interval for instructions %-change: -2.26% -2.16%
Instructions are helped.

total cycles in shared programs: 367118526 -> 365895358 (-0.33%)
cycles in affected programs: 93504145 -> 92280977 (-1.31%)
helped: 2754
HURT: 1269
helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16
helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12%
HURT stats (abs)   min: 1 max: 1500 x̄: 35.85 x̃: 9
HURT stats (rel)   min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75%
95% mean confidence interval for cycles value: -387.31 -220.78
95% mean confidence interval for cycles %-change: -2.11% -1.68%
Cycles are helped.

LOST:   1
GAINED: 1

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-14 11:15:32 -07:00
Andrii Simiklit
ff2225cf88 nir/find_array_copies: Reject copies with mismatched lengths
copy_deref for wildcard dereferences requires the same
arrays lengths otherwise it leads to a crash in optimizations
like 'nir_opt_copy_prop_vars' because these optimizations expect
'copy_deref' just for arrays with the same lengths.

v2: check was moved to 'try_match_deref' to fix aoa cases
                 (Jason Ekstrand <jason@jlekstrand.net>)
v3: -fixed comment
    -the condition merged with other one
                 (Jason Ekstrand <jason@jlekstrand.net>)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111286
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
2019-08-14 18:11:31 +00:00
Alyssa Rosenzweig
c4a4f3db5a pan/midgard: Prefix blobber-db output for grepping
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 10:31:09 -07:00
Alyssa Rosenzweig
5f0f9e1333 pan/midgard: Implement blobber-db
We wire through some shader-db-style stats on the current shader in the
disassemble so we can get a quick estimate of shader complexity from a
trace.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Rob Clark <robdclark@chromium.org>
2019-08-14 10:31:09 -07:00
Alyssa Rosenzweig
863bdd1f8d pan/midgard: Break, not return, in disassembler
We'll want to dump some stats after the shader, and I refuse to use one
teensy little goto.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-14 10:31:09 -07:00
Ian Romanick
f2965fde9b nir/range-analysis: Fail gracefully on non-SSA sources
Tested-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-14 09:02:38 -07:00
Christian Gmeiner
1290cc3e27 etnaviv: split destroy_shader
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-08-14 15:10:07 +02:00
Christian Gmeiner
f90b23b8c4 etnaviv: split link_shader
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-08-14 15:10:07 +02:00
Christian Gmeiner
0765a1dd0e etnaviv: split dump_shader
Also this adds the missing impl for etna_dump_shader_nir(..).

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-08-14 15:10:07 +02:00
Christian Gmeiner
a36d04daa1 etnaviv: mv etnaviv_compiler.c etnaviv_compiler_tgsi.c
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-08-14 15:10:07 +02:00
Christian Gmeiner
b2da8a8357 etnaviv: correct PIPE_SHADER_CAP_MAX_CONST_BUFFER_SIZE handling
Have a correct answer to GL_MAX_FRAGMENT_UNIFORM_VECTORS and
GL_MAX_VERTEX_UNIFORM_VECTORS.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach l.stach@pengutronix.de
2019-08-14 12:29:56 +02:00
Christian Gmeiner
797a2e4fd0 etnaviv: update logic to determine uniform limits
Taken 1:1 from the header file.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach l.stach@pengutronix.de
2019-08-14 12:29:56 +02:00
Christian Gmeiner
45cb5eee5d etnaviv: put uniform limit determination into own function
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach l.stach@pengutronix.de
2019-08-14 12:29:56 +02:00
Marek Vasut
8f97262cdd etnaviv: Use reentrant screen lock around flush
The flush callback may be called on the same pipe context, and thus
the same stream, from two different threads of execution. However,
etna_cmd_stream_flush{,2}() must not be called on the same stream
from two different threads of execution as that would mess up the
etna_bo refcounting and likely have other ugly side effects.

Fix this by using a reentrant screen lock around the flush callback.

Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2019-08-14 10:36:36 +02:00
Marek Vasut
6bb4b6d078 etnaviv: Add valgrind support
Add Valgrind support for etnaviv to track BO leaks.

Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2019-08-14 10:36:20 +02:00
Marek Vasut
cf92074277 etnaviv: Use hash table to track BO indexes
Use hash table instead of ad-hoc arrays.

Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2019-08-14 10:36:04 +02:00
Marek Vasut
23f5f126d5 etnaviv: Fix double-free in etna_bo_cache_free()
The following situation can happen in a multithreaded OpenGL application.
A BO is submitted from etna_cmd_stream #1 with flags set for read.
A BO is submitted from etna_cmd_stream #2 with flags set for write.
This triggers a flush on stream #1 and clears the BO's current_stream
pointer. If at this point, stream #2 attempts to queue BO again, which
does happen, the BO will be added to the submit list twice. The Linux
kernel driver correctly detects this and warns about it with "BO at
index %u already on submit list" kernel message.

However, when cleaning the BO cache in etna_bo_cache_free(), the BO
which was submitted twice will also be free()d twice, this triggering
a glibc double free detector.

The fix is easy, even if the BO does not have current_stream set,
iterate over current streams' list of BOs before adding the BO to it
and verify that the BO is not yet there.

Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2019-08-14 10:35:48 +02:00
Roman Stratiienko
1ea95e37cc kmsro: Add missing definitions to Android.mk
Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com>
Reviewed-by: Rob Herring robh@kernel.org
2019-08-14 07:39:53 +00:00
Gert Wollny
742d3c918f softpipe: Add support for ARB_derivative_control
Enables and passes piglits:

spec/ARB_drivative_control/
        dfdx-coarse
        dfdx-dfdy
        dfdx-fine
        dfdy-coarse
        dfdy-fine

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-08-14 07:03:15 +00:00
Vasily Khoruzhick
b579af77f3 lima/ppir: print srcs and dests in ppir_node_print_prog()
Now we have an accessors for ppir src, so it's possible to easily
print all srcs and dests while dumping ppir representation.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-13 22:44:07 -07:00
Vasily Khoruzhick
6920710af5 lima/ppir: use src accessors in ppir regalloc
Get rid of most switch/case by using src accessors

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-13 22:44:07 -07:00
Vasily Khoruzhick
a5e7c12ced lima/ppir: add ppir_node to ppir_src
We'll need it if we want to walk through node sources

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-13 22:43:58 -07:00
Vasily Khoruzhick
afa64a2105 lima/ppir: introduce accessors for ppir_node sources
Sometimes we need to walk through ppir_node sources, common
accessor for all node types will simplify code a lot.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-13 22:38:07 -07:00
Jordan Justen
0f5be81edd iris: Expose aux buffer as 2nd plane w/modifiers
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-13 15:20:47 -07:00
Jordan Justen
246eebba4a iris: Export and import surfaces with modifiers that have aux data
The DRI interface for modifiers with aux data treats the aux data as a
separate plane of the main surface.

When the dri layer requests the plane associated with the aux data, we
save the required information into the dri aux plane image.

Later when the image is used, the dri plane image will be available in
the pipe_resource structure's `next` field. Therefore in iris, we
reconstruct the aux setup from this separate dri plane image when the
image is used.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-13 15:20:47 -07:00
Kenneth Graunke
99c8eb997d iris: Do proper format checks for Y+CCS modifier support
We need to ensure that the DRI image format supports CCS.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-08-13 15:20:47 -07:00
Jordan Justen
51f941c20c iris: Create single bo for surfaces with modifiers and aux data
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-13 15:20:47 -07:00
Jordan Justen
2c7b577e13 iris: Split iris_resource_alloc_aux to enable aux modifiers
Reworks:

 * If the aux-state is not ISL_AUX_STATE_AUX_INVALID, then use memset
   even when memset_value is zero. The hiz buffer initial aux-state
   will be set to invalid, and therefore we can skip the memset. But,
   for CCS it will be set to ISL_AUX_STATE_PASS_THROUGH, and therefore
   the aux data must be cleared to 0 with the memset. Previously we
   would use BO_ALLOC_ZEROED with the CCS aux data, so this memset
   wasn't required. Now, the CCS aux data may be part of the main
   surface. We prefer to not use BO_ALLOC_ZEROED excessively, so the
   memset is needed for the CCS case. (Nanley)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-13 15:20:46 -07:00
Jordan Justen
aad36dfd16 iris: Add aux offset into hiz_address
This is not currently required because the hiz buffer is in a separate
buffer, and therefore the offset is 0. If we combine the aux buffer
with the main surface buffer, then the hiz offset may become non-zero.

Suggested-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-13 15:20:39 -07:00
Marek Olšák
f5e1f9ccef tgsi_to_nir: add assertions for max varying slots
Nine uses GENERIC slots > 31.

Trivial.
2019-08-13 18:15:53 -04:00
Marek Olšák
fad962eddc tgsi_to_nir: expand vec3 system values to vec4
for nir_intrinsic_load_work_group_id

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 18:15:53 -04:00
Marek Olšák
88a511bd42 tgsi_to_nir: fix incorrect number of image src1 components
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 18:15:53 -04:00
Mauro Rossi
37841f52b2 i965/gen11: fix genX_bits.h include path
Instead of "genX_bits.h" use "genxml/genX_bits.h"
as already done in other similar cases

Besides being more correct, it also fixes building error in Android.

Fixes: f0d2923 ("i965/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-08-13 23:58:25 +02:00
Alyssa Rosenzweig
0c56330361 panfrost: Workaround bug in partial update implementation
We can't intersect with empty regions.

Fixes: 65ae86b854 ("panfrost: Add support for KHR_partial_update()")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-13 11:13:48 -07:00
Eric Anholt
46daaca55e gitlab-ci: Run the GLES2 CTS on llvmpipe.
This is the start of doing CTS tests on merges to Mesa master.  We use
the surfaceless platform so that we don't need to bother bringing up
weston or X11.  The surface size is kept low to reduce runtime, but
this comes at the cost of many rendering tests skipping due to
too-small render targets (as we see the impact of Mesa on the shared
runner pool, we can reevaluate this and what set of CTS tests we want
to run).

We split the job up across 4 runners (each at 4 llvmpipe threads), so
that the job can load-balance across our shared runners and finish
sooner (since dEQP is very single-thread-performance bound).

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-13 10:30:01 -07:00
Eric Anholt
ab49873b44 gitlab-ci: Switch the meson-main build type to debugoptimized.
Now that we're running the drivers we build, building with
optimization is important for keeping our runtime down.  Shaves about
4 minutes of runtime off of GLES2 CTS of llvmpipe at 64x64.

v2: Only switch meson-main until we enable CTS for other builds
    on request by Michel.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-13 10:30:01 -07:00
Eric Anholt
9605749f99 gitlab-ci: Set the prefix to ./install instead of the DESTDIR.
If we don't set DESTDIR, then the DEFAULT_DRIVER_DIR built into the
libraries is correct and we don't need to use LIBGL_DRIVERS_PATH and
friends for CI usage.  Incidentally, this moves our installed paths
from /builds/anholt/mesa/install/usr/local/lib (for example) to
/builds/anholt/mesa/install/lib for simplicity.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-13 10:30:01 -07:00
Eric Anholt
f417ced5cc gitlab-ci: Build the CTS in the debian build image.
This will let us reuse the image for test runs.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-13 10:30:01 -07:00
Eric Anholt
86ae3c2186 surfaceless: Fix swrast-path segfault when loader doesn't know driver name.
If we're hitting the swrast fallback path here, it's probably because
we stumbled across a KMS-only device (such as the ASpeed that some of
our CI runners have) that will then return a NULL driver_name.  Don't
crash in that case.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-13 10:30:01 -07:00
Eric Anholt
6a8d39dccd surfaceless: Fix swrast path.
We get a getDrawableInfo() call in the MakeCurrent path, which
platform_device was handling correctly by returning the pbuffer's
width/height but platform_surfaceless segfaulted for.  Reuse
platform_device's implementation.

Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-08-13 10:29:34 -07:00
Eric Anholt
030aa6e184 gitlab-ci: Move around which builds cover which swrast.
I want to enable CI of llvmpipe out of the meson-main build.  So, kick
classic swrast/osmesa to meson-i386, then promote llvmpipe to
meson-main (along with nine, now that classic osmesa isn't keeping it
out of there).

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-13 10:29:34 -07:00
Eric Anholt
b816edcbf4 meson: Don't require DRI classic swrast for OSMesa.
OSMesa doesn't care about this build option, it links against
src/mesa/swrast regardless.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2019-08-13 10:29:34 -07:00
Alyssa Rosenzweig
29cfd154e3 panfrost: Implement transform feedback
Midgard has no hardware support for transform feedback, so we simulate
it in software. Lucky us.

What Midgard does do is write out vertex shader outputs to main memory
unconditonally. Fragment shaders read varyings back from main memory;
there's no on-chip storage for varyings. Whether this was a reasonable
design is a question I will not be engaging in this commit message.

What that does mean is that, in some sense, Midgard *always* does
transform feedback uncondtionally, and there's no way to turn off
transform feedback. Normally, we would allocate some scratch memory
every frame to store the varyings in an arbitrary format (interleaved
for simplicity), and then feed that scratch to the fragment shader and
discard when the rendering completes.

The only difference now is that sometimes, for some buffers, we use a BO
provided to us by Gallium and a format provided by Gallium, instead of
allocating the memory and choosing the format ourselves. This has some
limitations -- in particular, it only works at vec4 granularity, so a
corresponding GLSL linkage patch is needed to correctly implement
transform feedback for non-vec4 types. Nevertheless, given the hardware
already works in this admittedly-bizarre fashion, transform feedback is
"free". Or, at least, it's no more expensive than any other rendering.

Specifically not implemented is dynamically-sized transform feedback
(i.e. with geometry/tesselation shaders).

Spoiler alert: Midgard has no support for geometry *or* tessellation
shaders, despite advertising support. They get compiled to *massive*
compute shaders. How's that for checkbox compliance?

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-08-13 09:43:41 -07:00
Alyssa Rosenzweig
7c29588c07 panfrost: Increment offsets[] per draw
We have to maintain the internal offset ourselves. Per v3d.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-08-13 09:43:39 -07:00
Alyssa Rosenzweig
e7a05a601e panfrost: Fixup stream out information per variant
We could probably get away with doing this once per pipe_shader_state
but let's not jump down that rabbit hole quite yet.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-08-13 09:43:32 -07:00
Alyssa Rosenzweig
5b0a1a4e49 panfrost: Route outputs_written through the compiler
It's there in shader_info, but we need to access it from pan_context.c

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-08-13 09:43:17 -07:00
Alyssa Rosenzweig
f714eab882 panfrost: Import stream out utility from iris
We'll need this in a moment. Ken's implementation, lightly edited for
Panfrost.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-08-13 09:43:14 -07:00
Alyssa Rosenzweig
9b2514d6c6 panfrost: Flush when using transform feedback
This is a huge hack to workaround incomplete BO flushing logic, but it's
enough for the dEQP transform feedback tests, and doing the resource
management to get this right is out-of-scope for this patch series.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-08-13 09:43:11 -07:00
Alyssa Rosenzweig
4b0001c42d panfrost: Set PIPE_CAP_TGSI_TEXCOORD
It doesn't really make sense, since we don't have special texture
coordinate varyings, but it'll make some code simpler for XFB and it
doesn't hurt us, even if I lose a bit of my soul setting it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-08-13 09:43:09 -07:00
Alyssa Rosenzweig
72fc06df9c panfrost: Wire up statistics for primitives
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN should now be handled.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-08-13 09:43:04 -07:00
Alyssa Rosenzweig
7c224c1008 panfrost: Implement callbacks for PRIMITIVES queries
We're just going to compute them in the driver but let's get the
structures setup to handle them. Implementation from v3d.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-08-13 09:42:48 -07:00
Rob Clark
72d086fc36 freedreno/a6xx: move SSBO/image consts to IBO stateobj
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
ab01ab4d4f freedreno/a6xx: move VS driverparams to it's own stateobj
If driver-params are required, we really should emit it on every draw
for correctness.  And if not required, we should emit a DISABLE so that
un-applied state updates from previous draws don't corrupt the const
state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
882d53d8e3 freedreno/ir3+a6xx: same VBO state for draw/binning
Worth ~+20% on gl_driver2

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
4b82d1bbb7 freedreno/a6xx: add fd_emit_take_group()
Which takes ownership of the stateobj.  Useful for streaming state-
objs, to avoid an extra ref/unref

Worth ~5% at gl_driver2

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
4a188e4215 freedreno/ir3: track # of driver params
To avoid emitting unneeded const state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
7f1e3391c6 freedreno/a6xx: move immediates to program stateobj
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
f0b91730a1 freedreno/a6xx: stop using ir3_emit_{vs,fs}_consts()
Should be no functional change.  Next step is to re-arrange various
const state into different stateobjs.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
53667a43c4 freedreno/ir3: push ctx further up call chain
Move more of the code to deal just w/ screen, without requiring ctx.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
4080dfb8af freedreno/ir3: move ring_wfi() further up call chain
Hoist them out of code-paths that will eventually be called directly for
various a6xx+ const related stateobjs.

This ends up duplicating one constlen check in ir3_emit_vs_consts(), to
avoid what could otherwise be an unnecessary WFI on older gens.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
c6fab232c8 freedreno/all: move more emit helpers to screen
framebuffer_barrier() still depends on the ctx, but the rest can move to
screen.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
684f4b5843 freedreno/a3xx-a6xx+ir3: move emit_const* to screen
These don't need to be in context, and we'll need them in screen in a
later patch.  Plus it's a good cleanup.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
566f2281c5 freedreno/a6xx: add fd6_emit_init_screen()
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
e89255b0a5 freedreno/a5xx: add fd5_emit_init_screen()
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:25 -07:00
Rob Clark
d256e3f34a freedreno/a3xx: add fd3_emit_init_screen()
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:25 -07:00
Rob Clark
b9d3f39728 freedreno/a2xx: add fd2_emit_init_screen()
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:08:07 -07:00
Rob Clark
ec0ec641d8 freedreno/a4xx: add fd4_emit_init_screen()
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:08:07 -07:00
Rob Clark
2f94de2372 freedreno/a2xx: call fd2_emit_ib() directly from fd2
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:08:07 -07:00
Rob Clark
eb45422c5f freedreno/a5xx: call fd5_emit_ib() directly from fd5
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:08:07 -07:00
Rob Clark
50e15e1c6f freedreno/a4xx: call fd4_emit_ib() directly from fd4
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:08:07 -07:00
Rob Clark
4326eeac97 freedreno/a3xx: call fd3_emit_ib() directly from fd3
No reason for the indirection when called from a3xx specific code.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:08:07 -07:00
Rob Clark
32014afa44 freedreno/ir3: move VS driver-param emit
Move DP emit to it's own function.  No functional change, just code
motion to prepare for splitting up const state into multiple state-
objs on a6xx.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:08:07 -07:00
Rob Clark
5722149bf1 freedreno/ir3: drop unneeded ir3_ra() args
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:08:07 -07:00
Boris Brezillon
65ae86b854 panfrost: Add support for KHR_partial_update()
Implement ->set_damage_region() region to support partial updates.

This is a dummy implementation in that it does not try to merge
damage rects. It also does not deal with distinct regions and instead
pick the largest quad as the only damage rect and generate up to 4
reload rects out of it (the left/right/top/bottom regions surrounding
the biggest damage rect).

We also do not try to reduce the number of draws by passing all quad
vertices to the blit request (would require extending u_blitter)

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-13 14:41:10 +02:00
Daniel Stone
492ffbed63 st/dri2: Implement DRI2bufferDamageExtension
Add a pipe_screen->set_damage_region() hook to propagate
set-damage-region requests to the driver, it's then up to the driver to
decide what to do with this piece of information.

If the hook is left unassigned, the buffer-damage extension is
considered unsupported.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-13 14:40:45 +02:00
Harish Krupo
a4a8ebe156 egl/dri: Use __DRI2_BUFFER_DAMAGE extension for KHR_partial_update
Use the DRI2 interface callback to pass the damage rects to
the driver.

Signed-off-by: Harish Krupo <harishkrupo@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-13 14:40:31 +02:00
Daniel Stone
bd08a83b09 dri_interface: add DRI2_BufferDamage interface
Add a new DRI2_BufferDamage interface to support the
EGL_KHR_partial_update extension, informing the driver of an overriding
scissor region for a particular drawable.

Based on a commit originally authored by:
Harish Krupo <harish.krupo.kps@intel.com>
renamed extension, retargeted at DRI drawable instead of context,
rewritten description

Signed-off-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-13 14:40:14 +02:00
Harish Krupo
b4345da876 egl/android: Delete set_damage_region from egl dri vtbl
The intension of the KHR_partial_update was not to send the damage back
to the platform but to send the damage to the driver to ensure that the
following rendering could be restricted to those regions.
This patch removes the set_damage_region from the egl_dri vtbl and all
the platfrom_*.c files.
Then upcomming patches add a new dri2 interface for the drivers to
implement

Signed-off-by: Harish Krupo <harishkrupo@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-13 14:39:38 +02:00
Jordan Justen
fc12fd05f5 iris: Implement pipe_screen::resource_get_param
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-13 01:12:30 -07:00
Jordan Justen
3198c5b7bf gallium/dri2: Use pipe_screen::resource_get_param in image queries
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-08-13 01:12:29 -07:00
Jordan Justen
2decad495f gallium/dri2: Support images with multiple planes for modifiers
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-08-13 01:12:29 -07:00
Jordan Justen
6e749a6b2b gallium/dri2: Refactor image property queries
This refactor will let us more easily use
pipe_screen::resource_get_param as an alternative to
pipe_screen::resource_get_handle.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-08-13 01:12:29 -07:00
Jordan Justen
c5c2365455 state_tracker/winsys_handle: Add plane input field
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-08-13 01:12:29 -07:00
Jordan Justen
2066966c10 gallium/dri2: Support creating multi-planar modifier images
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-08-13 01:12:29 -07:00
Jordan Justen
fe06655e86 gallium/dri2: Implement dri2ImageExtension.queryDmaBufFormatModifierAttribs
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-08-13 01:12:29 -07:00
Jordan Justen
0346b70083 gallium/screen: Add pipe_screen::resource_get_param
This function retrieves individual parameters selected by enum
pipe_resource_param. It can be used as a more direct alternative to
pipe_screen::resource_get_handle.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-08-13 01:12:24 -07:00
Iago Toral Quiroga
2353f7f7ef vc4: clamp gl_PointSize to a minimum of 1.0
The OpenGL ES spec requires that the value of gl_PointSize is clamped
to an implementation-dependent range matching what is advertised by
GL_ALIASED_POINT_SIZE_RANGE. For VC4 this is [1.0, 512.0], but the
hardware won't clamp to the minimum side of the range and won't render
points with a size strictly smaller than 1.0 either, so we need to
clamp manually. For points larger than the maximum size of the range
the hardware clamps automatically.

Fixes piglit test:
spec/!opengl 2.0/vs-point_size-zero

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 09:44:54 +02:00
Iago Toral Quiroga
3539bd63dd v3d: clamp gl_PointSize to a minimum of 1.0
The OpenGL ES spec requires that the value of gl_PointSize is clamped
to an implementation-dependent range matching what is advertised by
GL_ALIASED_POINT_SIZE_RANGE. For V3D this is [1.0, 512.0], but the
hardware won't clamp to the minimum side of the range and won't render
points with a size strictly smaller than 1.0 either, so we need to
clamp manually. For points larger than the maximum size of the range
the hardware clamps automatically.

Fixes piglit test:
spec/!opengl 2.0/vs-point_size-zero

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 09:44:54 +02:00
Iago Toral Quiroga
48f5c34301 nir: add a pass to clamp gl_PointSize to a range
The OpenGL and OpenGL ES specs require that implementations clamp the
value of gl_PointSize to an implementation-depedent range. This pass
is useful for any GPU hardware that doesn't do this automatically
for either one or both sides of the range, such as V3D.

v2:
 - Turn into a generic NIR pass (Eric).
 - Make the pass work before lower I/O so we can use the deref variable
   to inspect if we are writing to gl_PointSize (Eric).
 - Make the pass take the range to clamp as parameter and allow it
   to clamp to both sides of the range or just one side.
 - Make the pass report progress.

v3:
 - Fix copyright header (Eric)
 - use fmin/fmax instead of bcsel to clamp (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 09:44:12 +02:00
Iago Toral Quiroga
62e0ca3064 v3d: line length style fixes
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 08:38:19 +02:00
Iago Toral Quiroga
99e9809cab v3d: honor the write mask on store operations
v2:
  - Fix incremental update of the const offset when we need to emit a sequence
    with more than one write because of the writemask.
  - Do not move the tmu write emission to a separate helper.

v3:
  - Get the store writemask before the loop, use ffs to get the first component
    to write and clear writemask bits as we process the components (Eric).
  - Simplified the code that figured out the number of components for the TMU
    config based on the number of tmu writes for stores and atomics.

v4:
  - Code clean-ups (Eric).

Fixes:
KHR-GLES31.core.shader_image_load_store.advanced-cast-cs
KHR-GLES31.core.shader_image_load_store.advanced-cast-fs
KHR-GLES31.core.shader_storage_buffer_object.advanced-switchBuffers-cs
KHR-GLES31.core.shader_storage_buffer_object.advanced-switchPrograms-cs
KHR-GLES31.core.shader_storage_buffer_object.basic-operations-case1-cs

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 08:38:19 +02:00
Iago Toral Quiroga
3d65d2a488 v3d: refactor ntq_emit_tmu_general() slightly
When we implement write masks on store operations we might need to
emit multiple write sequences for a given store intrinsic. To make
that easier, let's split the emission of the tmud instructions to
their own block after we are done with the code that only needs to
run once no matter how many write sequences we need to emit.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 08:38:19 +02:00
Iago Toral Quiroga
b594796f1b v3d: do not automatically flush current job for SSBOs and shader images
If the current job has a sequence of draw calls involving SSBOs and/or
shader images, we would flush the job in between each draw call.
With this change, we won't flush the current job and we rely on the
application inserting correct barriers by issuing glMemoryBarrier()
when needed.

v2 (Eric):
 - When mapping a buffer for writing, we always need to flush.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 08:25:15 +02:00
Iago Toral Quiroga
f1cf1153e8 v3d: only process glMemoryBarrier() for SSBOs and images
PIPE_BARRIER_UPDATE is defined as:
PIPE_BARRIER_UPDATE_BUFFER | PIPE_BARRIER_UPDATE_TEXTURE

Which means we were flushing for any flags other than these two, but
this was intended to only flush for ssbos and images.

Actually, the driver automatically flushes jobs as we need, including
writes/reads involving SSBOs and images, so we don't really need to
flush anything when the program emits a barrier. However, this may
lead to excessive flushing in some cases, so we will soon change this
to avoid atutomatic flushing of the current job for SSBOs and images,
meaning that we will rely on the application to emit correct memory
barriers for these that we should make sure to process here.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 08:25:15 +02:00
Iago Toral Quiroga
f1559ca922 v3d: fix flushing of SSBOs and shader images
If the current draw call includes SSBOs, then we must flush any jobs
that are writing to the same SSBOs (so that our SSBOs reads are correct),
as well as jobs reading from the same SSBO (so that our SSBO writes don't
stomp previous SSBO reads).

The exact same logic applies to shader images. In this case we were already
flushing previous writes, but we should also flush previous reads.

Note that We don't need to call v3d_flush_jobs_reading_resource() and
v3d_flush_jobs_writing_resource() separately though, since flushing
jobs that read a resource also flushes those writing to it.

Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 08:25:15 +02:00
Caio Marcelo de Oliveira Filho
1021abab07 intel/tools: Fix aub_file initialization in intel_dump_gpu
The `device` can be set earlier either by a command line or a by
intercepting an ioctl call to get the I915_PARAM_CHIPSET_ID done by
the application early.  In both cases `aub_file` and `devinfo` would
not be initialized.

Fix by splitting the conditions

- `device == 0`: use the FD to get both device and devinfo.
- Or `devinfo.gen == 0`: use `device` to initialize it.

And separatedly, initialize aub_file the first time it is needed.

Fixes: d594d2a052 ("intel/tools: use device info initializer")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12 19:18:26 -07:00
Rafael Antognolli
f0d29238df i965/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.
If the pixel pipes have a different number of subslices, emit a slice
hashing table that will ensure proper workload distribution.

v2: Set Mask field to 0xffff for workaround (Ken).
2019-08-12 16:19:08 -07:00
Rafael Antognolli
7bc022b4bb anv/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.
If the pixel pipes have a different number of subslices, emit a slice
hashing table that will ensure proper workload distribution.

v2: Don't need to set the mask - it's mbo (Ken).
2019-08-12 16:19:08 -07:00
Rafael Antognolli
a1a499e7fe iris/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.
If the pixel pipes have a different number of subslices, emit a slice
hashing table that will ensure proper workload distribution.

v2: Don't need to set the mask - it's mbo (Ken).
v3: Don't keep a reference to the resource used for emitting the table
(Ken).
2019-08-12 16:19:08 -07:00
Rafael Antognolli
ad513fd386 intel: Get information about pixel pipes subslices.
v2: Use 1 instead of 1UL (Ken).
2019-08-12 16:19:08 -07:00
Rafael Antognolli
32344dc581 intel/gen_decoder: Decode SLICE_HASH_TABLE. 2019-08-12 16:19:08 -07:00
Rafael Antognolli
e1cb71c182 intel/genxml: Update 3D_MODE and add SLICE_HASH_TABLE.
Add these fields and the 3DSTATE_SLICE_TABLE_STATE_POINTERS instruction
so we can properly configure the slice and subslice hashing on ICL+

v2: Make 'Mask' field a mbo (Ken).
2019-08-12 16:19:08 -07:00
Jason Ekstrand
d787a2d05e anv: Implement VK_KHR_pipeline_executable_properties
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
67cb55ad11 anv: Add a ralloc context to anv_pipeline
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
fec4bdff40 anv: Force a full re-compile when CAPTURE_INTERNAL_REPRESENTATION_TEXT is set
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
651fbbf9b8 anv/pipeline: Split setting up per-stage keys into its own loop
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
78f3dfb4a2 anv: Record shader compile stats in the pipeline cache
We're going to want these to be available regardless of caching.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
2af380d20f anv/pipeline: Stash generated code in the pipeline stage
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
8d3cbd0393 intel/fs: Add SLM size to brw_cs_prog_data
We don't need it for state setup but it's a useful statistic we want to
pass on to developers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
134607760a intel/compiler: Fill a compiler statistics struct
This commit is all annoying plumbing work which just adds support for a
new brw_compile_stats struct.  This struct provides a binary driver
readable form of the same statistics we dump out to stderr when we
INTEL_DEBUG is set with a shader stage.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Khaled Emara
2720ad5fd9 freedreno: disable tiling for cubemaps
Tiling doesn't work quite well with cubemaps.
Revert to linear textures, until it's fixed.
2019-08-12 22:30:54 +00:00
Khaled Emara
0ae16fb565 freedreno: add tiling parameters for 2D/2DArray/3D 2019-08-12 22:30:54 +00:00
Khaled Emara
aeaba3e4a6 freedreno: simplified slices setup for a3xx
a3xx doesn't support ASTC and layout_first always returns false
2019-08-12 22:30:54 +00:00
Khaled Emara
e11a239e8c freedreno: enable tiled textures for debug builds 2019-08-12 22:30:54 +00:00
Paulo Zanoni
866bb775de intel/fs: add 64 bit integer multiplication lowering
While NIR's lower_imul64() solves the case of 64 bit integer multiplications
generated early, we don't have a way to lower such instructions when they are
generated by our own backend, such as the scan/reduce intrinsics. We'll need
this soon, so implement it now.

An easy way to test this is to simply disable nir_lower_imul64 to let
those operations reach the backend.

v2:
  - Fix Q/UQ copy/paste errors (Caio).
  - Transform an 'if' into 'else if' (Caio).
  - Add an extra comment to clarify the need for 64b = 32b * 32b
    (Caio).
  - Make private functions private (Caio).
v3:
  - Remove ambiguity with 'b' and 'd' variables (Caio).
  - Allocate potentially less regs for the dwords (Caio).

Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Matt Turner <matt.turner@intel.com>
Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
Paulo Zanoni
9217cf3b5e intel/compiler: invert the logic of lower_integer_multiplication()
Invert the logic of how progress is handled: remove the continue
statements and mark progress inside the places where it actually
happens.

We're going to add a new lowering that also looks for BRW_OPCODE_MUL,
so inverting the logic here makes the resulting code much easier to
follow.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
Paulo Zanoni
6ba4717924 intel/compiler: don't instantiate a builder for each instruction
Don't instantiate a builder for each instruction during
lower_integer_multiplication(). Instantiate one only when needed.

On the other hand, these unneeded builders don't seem to cost much to
init, so I don't expect any significant difference in performance:
this is mostly about code organization.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
Paulo Zanoni
75b3868dcc intel/compiler: extract subfunctions of lower_integer_multiplication()
The lower_integer_multiplication() function is already a little too
big. I want to add more to it, so let's reorganize the existing code
first. Let's start with just extracting the current code to
subfunctions. Later we'll change them a little more.

v2: Make private functions private (Caio).
v3: Fix typo (Caio).

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
Rhys Perry
7740149852 nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Rhys Perry
da8ed68aca nir: replace nir_move_load_const() with nir_opt_sink()
This is mostly the same as nir_move_load_const() but can also move
undef instructions, comparisons and some intrinsics (being careful with
loops).

v2: actually delete nir_move_load_const.c
v3: fix nir_opt_sink() usage in freedreno
v3: update Makefile.sources
v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def
v4: handle if uses
v4: fix handling of nested loops
v5: re-write adjust_block_for_loops
v5: re-write setting of use_block for if uses

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Francisco Jerez
c2fe7a0fb8 anv/gen9: Optimize slice and subslice load balancing behavior.
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.  According to Jason, improves Aztec Ruins
performance by 2.7%.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)

v2: Undo CPU performance micro-optimization done in i965 and iris due
    to lack of data justifying it on anv.  Use
    cmd_buffer_apply_pipe_flushes wrapper instead of emitting pipe
    control command directly.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-12 14:40:21 -07:00
Andreas Baierl
1c45541c7f lima/ppir: Add fddx and fddy
Lower fddx and fddy and set the right bits in codegen.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
2019-08-12 23:20:04 +02:00
Bas Nieuwenhuizen
f1da129220 radv: Enable VK_KHR_pipeline_executable_properties.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
afad67cd7a radv: Implement radv_GetPipelineExecutableStatisticsKHR.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
35302f0189 radv: Implement radv_GetPipelineExecutableInternalRepresentationsKHR.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
86864eedd2 radv: Implement radv_GetPipelineExecutablePropertiesKHR.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
8874af8ef4 radv: Keep shader info when needed.
This allows enabling the shader info keeping on a per shader basis.
Also disables the cache on a per shader basis.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
e8a256eb54 radv: Add VK_KHR_pipeline_executable_properties in disabled state.
So we can add the functions.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
5444d3e0c2 radv: Use string for nir dumping.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Allows us to easily dump all nir shaders for combined variants in
vega and simplifies ownership.
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
739a2880f5 radv: Get max workgroup size without nir.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen
290ca0c4dd radv: Add utility function to calculate max waves.
Not AC because a lot of it is data extraction out of radv structs.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Francisco Jerez
026773397b iris/gen9: Optimize slice and subslice load balancing behavior.
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12 13:17:58 -07:00
Francisco Jerez
03cba9f5d9 intel/genxml: Add GT_MODE hashing defs for Gen9.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12 13:17:58 -07:00
Francisco Jerez
9406b3a5c1 i965/gen9: Optimize slice and subslice load balancing behavior.
The default pixel hashing mode settings used for slice and subslice
load balancing are far from optimal under certain conditions (see the
comments below for the gory details).  The top-of-the-line GT4 parts
suffer from a particularly severe performance problem currently due to
a subslice load balancing issue.  Fixing this seems to improve
graphics performance across the board for most of the benchmarks in my
test set, up to ~20% in some cases, e.g. from SKL GT4:

unigine/valley:                3.44% ±0.11%
gfxbench/gl_manhattan31:       3.99% ±0.13%
gputest/pixmark_piano:         7.95% ±0.33%
synmark/OglTexFilterAniso:    15.22% ±0.07%
synmark/OglTexMem128:         22.26% ±0.06%

Lower-end platforms are also affected by some subslice load imbalance
to a lesser degree, especially during CCS resolve and fast clear
operations, which are handled specially here due to rasterization
ocurring in reduced CCS coordinates, which changes the semantics of
the pixel hashing mode settings.

No regressions seen during my tests on some SKL, KBL and BXT
configurations.  Additional benchmark reports welcome on any Gen9
platforms (that includes anything with Skylake, Broxton, Kabylake,
Geminilake, Coffeelake, Whiskey Lake, Comet Lake or Amber Lake in your
renderer string).

P.S.: A similar problem is likely to be present on other non-Gen9
      platforms, especially for CCS resolve and fast clear operations.
      Will follow-up with additional patches fixing the hashing mode
      for those once I have enough performance data to justify it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12 13:17:58 -07:00
Alyssa Rosenzweig
b1965831e4 pan/midgard: Handle 64-bit address in mir_mask_of_read_components
This is a bit of a hack, but it'll hold us over until we have 64-bit
support wired through.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig
41e68094f8 pan/midgard: Allocate separate spill indices for lowered moves
This helps RA be slightly more reasonable.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig
14b5b9ac38 pan/midgard: Extend liveness analysis to trinary ops
Fixes RA fails with multiple indirect SSBO writes.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig
c690b37d76 pan/midgard: Fix load/store pairing
This used a delicate hack to try to find indirect inputs and skip them
as candidates for pairing. Let's use a better criterion -- no sources --
and pair based on that.

We could do better, but that would require more complex data flow
analysis than we're interested in doing here.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig
15954ab6ca pan/midgard: Implement nir_intrinsic_load_num_work_groups
Just a sysval to route through.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig
7229af794b pan/midgard: Implement some compute builtins
We implement gl_WorkGroupID and gl_LocalInvocationID, which map to
ld_compute_id with special sources.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig
2b4e579585 pan/midgard: Rename ld_global_id -> ld_compute_id
It's used for more general loads within a compute shader.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig
a5059f2cba pan/midgard: Handle partial writes in liveness analysis
This allows liveness analysis within a loop to be more fine grained,
fixing RA failures with partial spilled movs within a loop, as well as
enabling a slight reduction of register pressure more generally:

total registers in shared programs: 350 -> 347 (-0.86%)
registers in affected programs: 12 -> 9 (-25.00%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
e333bf606f pan/midgard: Dump "no spill"?
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
cc3df917d3 pan/midgard: Absorb nonexistance sources
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
0a7cc239bd pan/midgard: Pretty-print destinations
They're not "sources" but they follow the same conventions.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
ba8ec19a64 pan/midgard: Pretty-print units
Since we are seeing some use of MIR post-scheduling, let's get this
printed right.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
73f54f286a pan/midgard: Print mask in dumped MIR
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
2ec4f9a74b pan/midgard: Add no_spill flag
Hint for the RA to avoid infinite spilling loops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
7090971f2f pan/midgard: Generalize mir_mask_of_read_components
This now works for load/store and texture instructions as well as ALU.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
419ddd63b0 pan/midgard: Implement SSBO access
Just laying the groundwork. Reads and writes should be supported (both
direct and indirect, either int or float, vec1/2/3/4), but no bounds
checking is done at the moment.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig
a8639b91b5 pan/midgard: Pipe uniform mask through when spilling
This is a corner case that happens a lot with SSBOs. Basically, if we
only read a few components of a uniform, we need to only spill a few
components or otherwise we try to spill what we spilled and RA hangs.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:43:00 -07:00
Alyssa Rosenzweig
63e240dd05 pan/midgard: Clamp sysval component count
We don't want to load a 128-bit sysval when 64-bits will do. Fixes RA
failures with SSBO indirect writes.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:42:59 -07:00
Alyssa Rosenzweig
e7ac46be7a pan/midgard: Pass uploaded midgard_instruction through
We want to edit it after emission in some cases.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:42:59 -07:00
Alyssa Rosenzweig
fa68740187 pan/midgard: Allow sysval destination override
Sometimes a sysval is used to facilitate an instruction but is not the
instruction itself.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:42:59 -07:00
Alyssa Rosenzweig
60d80157d1 panfrost: Force flush every compute job
This is of course suboptimal for performance, forcing each
glDispatchCompute call to be submitted separately to the kernel and
finish to completion. However, for the initial bring-up of compute jobs,
this simplifies quite a bit.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:42:59 -07:00
Alyssa Rosenzweig
2efa025b05 panfrost: Add SSBO system value
For each SSBO index we get from Gallium/NIR, we need two pieces of
information in the shader:

1. The address of the SSBO in GPU memory. Within the shader, we'll be
accessing it with raw memory load/store, so we need the actual address,
not just an index.

2. The size of the SSBO. This is not strictly necessary, but at some
point, we may like to do bounds checking on SSBO accesses.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-12 12:42:59 -07:00
Alyssa Rosenzweig
e881aa8c12 gallium/util: Add u_stream_outputs_for_vertices helper
This u_prim.h helper determines the number of outputs for stream output,
given a particular primitive type and a vertex count. This is useful for
statically calculating sizes of stream output buffers (i.e. when there
is no geometry/tessellation shader in use).

This helper will be used in Panfrost's transform feedback
implementation, as you can probably guess since why else would I be
submitting it....

See also dEQP's getTransformFeedbackOutputCount routine.

v2: Simplify definition using new helpers, which also extends to non-ES2
primitive types (Eric).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 12:22:54 -07:00
Marek Olšák
8ce4f9bbc3 radeonsi: remove the always_nir option
tgsi_to_nir is no longer optional if NIR is enabled.
2019-08-12 14:52:17 -04:00
Marek Olšák
4e545f934f radeonsi/nir: implement default tess level system values
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12 14:52:17 -04:00
Marek Olšák
9c7746ceae compiler: add SYSTEM_VALUE_TESS_LEVEL_OUTER/INNER_DEFAULT
TCS system values for internal passthru TCS, needed by radeonsi NIR support

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12 14:52:17 -04:00
Marek Olšák
5167ca27fa gallium: add TGSI_SEMANTIC_DEFAULT_OUTER/INNER_LEVEL
for radeonsi NIR support.
2019-08-12 14:52:17 -04:00
Marek Olšák
f8d4198998 tgsi_to_nir: handle tess level inner/outer varyings
for internal radeonsi shaders

Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12 14:52:17 -04:00
Marek Olšák
8ac2583cd8 tgsi_to_nir: add support for the stencil FS output
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12 14:52:17 -04:00
Marek Olšák
f3f1d0dfd0 tgsi_to_nir: add support for TEX_LZ
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 14:52:17 -04:00
Marek Olšák
1b881852bc compiler: add SYSTEM_VALUE_USER_DATA_AMD
for internal radeonsi shaders
2019-08-12 14:52:17 -04:00
Marek Olšák
f0ccc5457a compiler: add shader_info.cs.user_data_components_amd 2019-08-12 14:52:17 -04:00
Marek Olšák
155789c8e7 tgsi_to_nir: add basic compute shader support
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12 14:52:17 -04:00
Marek Olšák
5a0adfd9f0 tgsi_to_nir: add support for LOAD & STORE with SSBOs and images
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12 14:52:17 -04:00
Marek Olšák
0b121cb89a tgsi_to_nir: make setup_texture_info reusable
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 14:52:17 -04:00
Marek Olšák
70fd85172b tgsi_to_nir: add support for TXF_LZ
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12 14:52:17 -04:00
Marek Olšák
028dbd35ba compiler: add shader_info.vs.blit_sgprs_amd
for internal radeonsi shaders
2019-08-12 14:52:17 -04:00
Marek Olšák
e300365197 tgsi_to_nir: be careful about not losing any TGSI properties silently (v2)
v2: squash with Timur Kristof's commit

Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12 14:52:17 -04:00
Marek Olšák
8b6814211a tgsi/scan: don't set GS_INVOCATIONS for all shader stages
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12 14:52:17 -04:00
Marek Olšák
9fb2fd0b43 compiler: add ACCESS_STREAM_CACHE_POLICY
radeonsi will use this.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12 14:52:17 -04:00
Marek Olšák
902dd50cf0 gallium: add AMD-specific compute TGSI enums
for tgsi_to_nir
2019-08-12 14:52:17 -04:00
Marek Olšák
6a2bdb8d01 gallium: add TGSI_PROPERTY_VS_BLIT_SGPRS_AMD for tgsi_to_nir
needed by radeonsi NIR support
2019-08-12 14:52:17 -04:00
Marek Olšák
d1ad4fda31 st/mesa: don't allocate mipmapped texture for NEAREST_MIPMAP_LINEAR
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-08-12 14:52:17 -04:00
Kenneth Graunke
5180a222c0 glsl: Optimize the SoftFP64 shader when first creating it.
By optimizing the shader before inlining, we avoid having to redo this
work for each inlined copy of a function.  It should also reduce the
memory consumption a bit.

This cuts the KHR-GL46.arrays_of_arrays_gl.SubroutineFunctionCalls2
runtime by 25% on my Icelake.  That test compiles many shaders, which
contain large types (dmat4) and division (expensive operations).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-12 10:42:32 -07:00
Christian Gmeiner
914ecc9384 etnaviv: fix compile warnings in release build
[27/31] Compiling C object 'src/gallium/drivers/etnaviv/df32d18@@etnaviv@sta/etnaviv_compiler_nir.c.o'.
In file included from ../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir.c:552:
../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir_emit.h: In function 'ra_assign':
../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir_emit.h:903:9: warning: unused variable 'ok' [-Wunused-variable]
    bool ok = ra_allocate(g);
         ^~
../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir.c: In function 'etna_compile_shader_nir':
../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir.c:663:9: warning: unused variable 'ok' [-Wunused-variable]
    bool ok = emit_shader(c->nir, &options, &v->num_temps, &num_consts);
         ^~

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-08-12 16:58:13 +00:00
Bas Nieuwenhuizen
e040c1b274 radv: Do not setup attachments without a framebuffer.
Test that found this: dEQP-VK.geometry.layered.1d_array.secondary_cmd_buffer

Fixes: 49e6c2fb78 "radv: Store color/depth surface info in attachment info instead of framebuffer."
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 17:19:24 +02:00
Jason Ekstrand
14c96a6300 anv: Implement VK_EXT_subgroup_size_control version 2
The version bump adds a proper features struct.

Fixes: d10de25309 "anv: Implement VK_EXT_subgroup_size_control"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-12 14:56:33 +00:00
Jason Ekstrand
8aef89cc2d vulkan: Update the XML and headers to 1.1.119
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 14:56:33 +00:00
Bas Nieuwenhuizen
d062bec48d radv: Hash Wave32 settings in shader key.
Can result in different shaders.

Fixes: 8a86908e9a "radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 13:32:18 +00:00
Bas Nieuwenhuizen
ba8d3c362b radv: Properly use Wave64 for non-NGG GS and copy shader.
Fixes: 8a86908e9a "radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 13:32:18 +00:00
Bas Nieuwenhuizen
035406ecf7 radv: Put wave size in shader options/info.
Instead of having the three values everywhere. This is also more
future proof if we want the driver to make those decisions eventually.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 13:32:18 +00:00
Bas Nieuwenhuizen
71621e877f relnotes: Make entries for radv more consistent.
Always use 'on' as for the rest of the drivers.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 13:29:27 +00:00
Bas Nieuwenhuizen
38961729a8 relnotes: Add new exts on radv for 19.2.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 13:29:27 +00:00
Tapani Pälli
d4b574f26a iris: reorder arguments as expected by the function
CID: 1452262
Fixes: b4c54894bb "iris: Handle vertex shader with window space position"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
2019-08-12 13:08:26 +03:00
Tapani Pälli
590ba15d6e iris/android: move iris_query.c to 'per gen' LIBIRIS_SRC_FILES
Fixes Iris build on Android.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-08-12 10:06:36 +03:00
Kenneth Graunke
0f3768bc5d iris: Free query on error path
CID: 1452276
2019-08-11 14:04:31 -07:00
Kenneth Graunke
661be3fef9 iris: Add missing 'break'
We don't want to fall through to unreachable().

CID: 1452277
2019-08-11 14:04:31 -07:00
Caio Marcelo de Oliveira Filho
5ed4e31c08 spirv: Drop lower_workgroup_access_to_offsets
Intel drivers are not using this anymore, and turnip still don't have
Compute Shaders, so won't make a difference to stop using this option.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@chromium.org>
2019-08-10 22:15:35 -07:00
Caio Marcelo de Oliveira Filho
925e9142bd i965/spirv: Lower shared memory later
Instead of asking spirv_to_nir to lower the workgroup (shared memory)
to offsets, keep them as derefs longer, then lower it later on.

Because Workgroup memory doesn't have explicit offsets, we need to set
those using nir_lower_vars_to_explicit_types before calling the I/O
lowering pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-10 22:15:35 -07:00
Danylo Piliaiev
61d6be84f3 i965: Use force_compat_profile driconf option
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-10 11:39:29 -07:00
Eric Engestrom
d7eb40962b i965: fix mem leak in error path
Fixes: 8ae6667992 ("intel/perf: move query_object into perf")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-08-10 12:14:56 +01:00
Eric Engestrom
1c82fa0a92 gitlab-ci: simplify $CROSS option
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-08-10 12:11:28 +01:00
Kenneth Graunke
f1dba99639 iris: minor restyling 2019-08-10 00:16:45 -07:00
Mark Janes
9c597514d4 iris/query: enable amd performance monitors
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-09 19:28:34 -07:00
Mark Janes
469af7fdc9 iris/perf: get monitor results
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-09 19:28:32 -07:00
Mark Janes
1cb4fc184f iris/perf: add begin/end hooks
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-09 19:28:24 -07:00
Mark Janes
8c4c346665 iris/perf: add delete query
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-09 19:28:17 -07:00
Mark Janes
aca42759ff iris/perf: implement iris_create_monitor_object
This is the first call that provides the iris context to the monitor
implementation.  On the first call, use the iris context to initialize
the monitor context.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-09 19:28:14 -07:00
Mark Janes
0fd4359733 iris/perf: implement routines to return counter info
With this commit, Iris will report that AMD_performance_monitor is
supported, and will allow the caller to query the available metrics.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-09 19:28:03 -07:00
Eric Engestrom
e4aa0fc63a anv: add missing break
Fixes: f6e7de41d7 ("anv: Implement VK_EXT_line_rasterization")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-09 23:34:31 +01:00
Lionel Landwerlin
e2d761de03 util: drop final reference to p_compiler.h
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-09 22:59:43 +03:00
Lionel Landwerlin
85bf1dc2de util: os_misc: drop p_compiler.h include
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-09 22:59:43 +03:00
Lionel Landwerlin
c44c3948c7 util: u_math: drop p_compiler.h include
This file was moved from gallium so drop depending on gallium headers.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-09 22:59:43 +03:00
Lionel Landwerlin
8818db8f2c vc4: prepare for p_compiler.h dependency removal
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-09 22:59:43 +03:00
Lionel Landwerlin
8a884a25c5 amd: prepare dropping include of p_compiler.h
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-09 22:59:43 +03:00
Lionel Landwerlin
a233a3a74e mesa: be consistent on GL_TRUE/GL_FALSE & TRUE/FALSE
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-09 22:59:43 +03:00
Lionel Landwerlin
8f4dea20fc mesa: drop some p_compiler.h types
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-09 22:50:29 +03:00
Lionel Landwerlin
7abac7a8bf mesa: add stddef include in preparation for dropping p_compiler.h
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-09 22:50:17 +03:00
Lionel Landwerlin
6637395073 panfrost: prepare for p_compiler.h dependency removal
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-09 22:50:03 +03:00
Lionel Landwerlin
351c2ad157 i965: don't use p_compiler.h types
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-09 22:49:48 +03:00
Eric Engestrom
9be5ce1d73 gitlab-ci: generate meson cross-files earlier
Suggested-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-09 20:07:50 +01:00
Alyssa Rosenzweig
9bc99e60a8 panfrost: Assign varying buffers dynamically
Rather than hardcoding certain varying buffer indices "by convention",
work it out at draw time. This added flexibility is needed for
futureproofing and will be enable streamout.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-09 11:53:21 -07:00
Alyssa Rosenzweig
46dae9ef58 panfrost: Assign indices at draw-time
This will allow us to shuffle buffers.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-09 11:53:21 -07:00
Alyssa Rosenzweig
af6d3f7cb5 panfrost: Break out pan_varyings.c
This code is fairly self-contained, so let's factor it out of the giant
pan_context.c monster.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-09 11:53:21 -07:00
Alyssa Rosenzweig
4dba493fd7 panfrost: Enable PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS
Just as easy/hard as the rest of XFB.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-09 11:53:21 -07:00
Alyssa Rosenzweig
5ff7973560 panfrost: Import streamout data structures
Pretty much copypasted from v3d to jumpstart us.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-09 11:53:21 -07:00
Alyssa Rosenzweig
c82672c9c1 pan/midgard: Account for swizzle/mask in st_vary
Register allocation for varying stores is a bit different, since the
instructions ignore the writemask (varyings are normalized
packed/vectorized..)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-09 11:50:45 -07:00
Alyssa Rosenzweig
5ad83015cd pan/decode: Resolve crash with NULL attr/varyings
This case needs more investigation, but this was found with geometry
shaders.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-09 11:50:45 -07:00
Krzysztof Raszkowski
c0ab268f9c gallium/swr: Fix glClear when it's used with glEnable/glDisable GL_SCISSOR_TEST
When GL_SCISSOR_TEST is enabled glClear is handled by state tracker
and there is no need to do this in gallium driver.

Reviewed-by: Alok Hota alok.hota@intel.com
2019-08-09 18:56:13 +02:00
Gurchetan Singh
d6f8ce1c96 util: Revert "util: added missing headers in anon-file"
This reverts commit c73988300f.

Reason: Made a fix for this, then saw @eric's change
        ("util/anon_file: add missing"), but some sequence of events
        I don't really remember caused this to get merged. So revert ;-)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-09 09:13:45 -07:00
Marek Vasut
bb47bedc85 etnaviv: Remove etna_bo_from_handle() prototype
Remove etna_bo_from_handle() as there are no known users.

Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2019-08-09 17:21:55 +02:00
Lionel Landwerlin
cefb4341b7 anv: drop unused code
We stopped using this when we moved to Jason's mi_builder.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-09 17:01:38 +03:00
Christian Gmeiner
889e752965 etnaviv: fix typo
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-08-09 13:08:20 +00:00
Christian Gmeiner
de5070ea8d etnaviv: add gpu_supports_texture_target(..)
Currently I am seeing a handful of the following debug message:
translate_texture_target:495: Unhandled texture target: 0

PIPE_BUFFER is not handled in translate_texture_target(..) which makes
sense as it is used to translate from PIPE_XXX to GPU specific value
during etna_create_sampler_view_state(..).

To fix this problem introduce gpu_supports_texture_target(..) which just
checks if the texture target is supported.

Fixes: dfe048058f ("etnaviv: support 3D and 2D array textures")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-08-09 13:08:20 +00:00
Jon Turney
0141b7c6b2 util: Cygwin has linux-style pthread_setname_np
Fixes: dcf9d91a ("util: Handle differences in pthread_setname_np")
2019-08-09 12:46:43 +00:00
Tapani Pälli
5e38db0c47 anv/android: disable shared representable image support explicitly
Android 9 loader conditionally advertises VK_KHR_shared_presentable_image
extension based on this property and it looks like it does not
initialize the struct before query.

Pragmas are added to ignore warnings with Android specific structure
types in same manner as commit 8d386e6eef  did.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-09 08:53:54 +03:00
Vasily Khoruzhick
39a90749af lima: introduce a struct describing texture descriptor
Use a struct with bitfields to construct texture descriptor
instead of poking bits in array of uint32_t. It improves code
readability and makes it easier to experiment with unknown fields.

Also fix mipmapping while we're at it - Utgard can have up to 13
levels, but 64 bytes is enough only for 10. Calculate descriptor
size dynamically to account extra levels if we need them.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-08 19:17:20 -07:00
Vasily Khoruzhick
edf008c04e lima: add texel format table
Introduce a table for supported texel formats and use it to check
whether format is supported and for converting pipe format to lima
texel format.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-08 19:17:20 -07:00
Gurchetan Singh
c73988300f util: added missing headers in anon-file
Otherwise I get:

../src/util/anon_file.c: In function ‘create_tmpfile_cloexec’:
../src/util/anon_file.c:75:9: error: implicit declaration of function ‘mkostemp’
[-Werror=implicit-function-declaration]
    fd = mkostemp(tmpname, O_CLOEXEC);
         ^~~~~~~~

../src/util/anon_file.c:133:7: error: implicit declaration of function ‘asprintf’
[-Werror=implicit-function-declaration]
       asprintf(&name, "%s/mesa-shared-%s-XXXXXX", path, debug_name);
       ^~~~~~~~
../src/util/anon_file.c:141:4: error: implicit declaration of function ‘free’
[-Werror=implicit-function-declaration]
    free(name)

Fixes: c0376a ("util: add anon_file.h for all memfd/temp file usage")
2019-08-08 16:21:57 -07:00
Gurchetan Singh
42759dc986 virgl: check scanout mask
Otherwise, virgl will report renderable or texturable formats as
also scan-out formats.

v2: drop host feature check (@kusma)

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-08-08 16:21:57 -07:00
Gurchetan Singh
3da029ac1a virgl: fixup_readback_format --> fixup_formats
This function is generalizable.

Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-08-08 16:21:57 -07:00
Gurchetan Singh
bf0ca99ec7 virgl: access caps in a less verbose way in virgl_is_format_supported
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-08-08 16:21:57 -07:00
Alyssa Rosenzweig
5a898e2a65 pan/midgard: Disassemble load/store barrel shift
Arm assembly intensifies.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-08 15:49:12 -07:00
Eric Engestrom
525a917c6c util/anon_file: const string param
Fixes: c0376a1234 ("util: add anon_file.h for all memfd/temp file usage")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Eric Anholt <eric@anholt.net>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
2019-08-08 22:02:54 +01:00
Eric Engestrom
8a028b0df2 util/anon_file: drop unused #include
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Eric Anholt <eric@anholt.net>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
2019-08-08 22:02:54 +01:00
Eric Engestrom
60af7f5a81 util/anon_file: add missing #include
Fixes: c0376a1234 ("util: add anon_file.h for all memfd/temp file usage")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Eric Anholt <eric@anholt.net>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
2019-08-08 22:02:54 +01:00
Greg V
ac1561088d intel/perf: use MAJOR_IN_SYSMACROS/MAJOR_IN_MKDEV
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: 134e750e16 ("i965: extract performance query metrics")
2019-08-08 21:44:33 +01:00
Greg V
0233372581 util: fix cpuset support on FreeBSD
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-08 21:44:33 +01:00
Greg V
c00ee00031 i965/tiled_memcpy: avoid creating bswap32 if it exists as a macro (e.g. on FreeBSD)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-08 21:44:33 +01:00
Greg V
7b520dc74f anv: add MAP_POPULATE fallback define for portability
FreeBSD does not have MAP_POPULATE

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-08 21:44:33 +01:00
Greg V
2be3f16600 anv: remove unused Linux-specific include
Fixes: 4201cc2dd3 ("anv: Implement VK_KHX_external_semaphore_fd")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-08 21:44:33 +01:00
Greg V
c0dc5c1859 meson: define ETIME to ETIMEDOUT if not present
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-08 21:44:33 +01:00
Roman Stratiienko
28061e0ab0 lima: Fix Android.mk
1. Update LOCAL_SRC_FILES according to commit
54434fe670 ("lima/gpir: Rework the scheduler").

2. Add libpanfrost_shared.a dependency.

3. Generate lima_nir_algebraic.c with Android.mk
Fixes Android build error introduced by commit 5adfc8602c
("lima/ppir: move sin/cos input scaling into NIR")

Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>
2019-08-08 17:47:22 +00:00
Roman Stratiienko
26a01a6797 Add libpanfrost_shared to Android build
1. Add missing directory to ./Android.mk
2. Fix ./src/panfrost/Android.shared.mk

Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com>
Reviewed-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>
2019-08-08 17:47:22 +00:00
Rhys Perry
c52c54a746 anv,i965,iris: deduplicate setting of total_shared
v5: add patch

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 12:10:39 -05:00
Rhys Perry
024a46a407 anv: use derefs for shared memory access
vkpipeline-db for my Skylake GPU:
total instructions in shared programs: 8847602 -> 8847896 (<.01%)
instructions in affected programs: 10165 -> 10459 (2.89%)
helped: 8
HURT: 2

total cycles in shared programs: 1606273555 -> 1606251634 (<.01%)
cycles in affected programs: 2201803 -> 2179882 (-1.00%)
helped: 7
HURT: 3

The shaders with more instructions is due to a loop over a shared array
in Three Kingdoms being unrolled (and creating a lot of nested ifs). Not sure
if that's good or bad.

One of the shaders with worse cycles is only worse by 0.04% and the other
two are the shaders with loops unrolled.

v2: add patch
v4: don't set spirv_options.shared_addr_format
v4: move comment concerning the shared address format used and NULL
v4: add vkpipeline-db results
v5: rename to nir_lower_vars_to_explicit_types
v5: move setting of total_shared to outside brw_compile_cs
v6: set shared_addr_format
v6: formatting changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (v5)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 12:10:39 -05:00
Rhys Perry
fd73ed1bd7 nir: add nir_lower_to_explicit()
v2: use glsl_type_size_align_func
v2: move get_explicit_type() to glsl_types.cpp/nir_types.cpp
v2: use align() instead of util_align_npot()
v2: pack arrays a bit tighter
v2: rename mem_* to field_*
v2: don't attempt to handle when struct offsets are already set
v2: use column_type() instead of recreating it
v2: use a branch instead of |= in nir_lower_to_explicit_impl()
v2: assign locations to variables and update shared_size and num_shared
v2: allow the pass to be used with nir_var_{shader_temp,function_temp}
v4: rebase
v5: add TODO
v5: small formatting changes
v5: remove incorrect assert in get_explicit_type()
v5: rename to nir_lower_vars_to_explicit_types
v5: correctly update progress when only variables are updated
v5: rename get_explicit_type() to get_explicit_shared_type()
v5: add comment explaining how get_explicit_shared_type() is different
v5: update cast strides
v6: update progress when lowering nir_var_function_temp variables
v6: formatting changes
v6: add more detailed documentation comment for get_explicit_shared_type
v6: rename get_explicit_shared_type to get_explicit_type_for_size_align
v7: fix comment in nir_lower_vars_to_explicit_types_impl()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (v5)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 12:10:39 -05:00
Rhys Perry
8bd2e138f5 nir/lower_explicit_io: add nir_var_mem_shared support
v2: require nir_address_format_32bit_offset instead
v3: don't call nir_intrinsic_set_access() for shared atomics

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 12:10:39 -05:00
Erik Faye-Lund
1e21bb4123 mesa: avoid warning on Windows
On Windows, p_atomic_inc_return returns an unsigned long long rather
than the type the pointer refers to, so let's make sure we cast the
result to the right type. Otherwise, we'll trigger a warning about
the wrong format-string for the type.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
2019-08-08 18:20:29 +02:00
Erik Faye-Lund
e0a740c633 mesa/main: cast away constness
This avoids a warning about implicitly casting away the constness of the
pointer.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
2019-08-08 18:20:29 +02:00
Erik Faye-Lund
75097114d9 spirv: fixup signature
This avoids a warning on some compiler, complaining about implicitly
casting the function-pointer.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: d482a8f "spirv: Update the OpenCL.std.h header"
Acked-by: Eric Engestrom <eric@engestrom.ch>
2019-08-08 18:20:29 +02:00
Lucas Stach
68c24b09c2 etnaviv: remember data offset into BO
Imported resources might not start at offset 0 into the buffer object.
Make sure to remember the offset that is provided with the handle on
import.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-08-08 16:11:34 +02:00
Danylo Piliaiev
b8842bc312 i965: Emit a dummy MEDIA_VFE_STATE before switching from GPGPU to 3D
There is an object-level  preemption workaround which requires this.
However, even without object-level preemption, we seem to have issues
with geometry flickering when 3D and compute are combined in the same
batch and this appears to fix it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110395
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2019-08-08 13:39:15 +00:00
Bas Nieuwenhuizen
23a9d20997 radv: Avoid VEGA/RAVEN scissor bug in binning.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-08 14:08:21 +02:00
Bas Nieuwenhuizen
4a3f987afd radv: Avoid binning RAVEN hangs.
Mirroring radeonsi.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-08 14:08:21 +02:00
Bas Nieuwenhuizen
66ecc3eac8 radv: Fix off by one for S_028C48_MAX_ALLOC_COUNT.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-08 14:08:21 +02:00
Jan Zielinski
207026d29e swr/rasterizer: modernize thread TLB
Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-08-08 12:33:21 +02:00
Jan Zielinski
387599a661 swr/rasterizer: Refactor events collection mechanism
Several improvements and cleanups in events and statstics mechanisms

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-08-08 11:15:07 +02:00
Jan Zielinski
ff75c35846 swr/rasterizer: improvements in simdlib
1. fix build issues with MSVC 2019 compiler

The MSVC 2019 compiler seems to have an issue with optimized code-gen
when using the _mm256_and_si256() intrinsic.
Only disable use of integer vpand on buggy versions MSVC 2019.
Otherwise allow use of integer vpand intrinsic.

2. Remove unused vec/matrix functionality

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-08-08 10:53:47 +02:00
Jan Zielinski
b55a93fdd4 swr/rasterizer: Events are now grouped and enabled by knobs
All events are now grouped as follows:

-Framework (i.e. ThreadStart) [always ON]
-Api (i.e. SwrSync) [always ON]
-Pipeline [default ON]
-Shader [default ON]
-SWTag [default OFF]
-Memory [default OFF]

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-08-08 10:33:25 +02:00
Jan Zielinski
982d99490f swr/rasterizer: do not mark tiles dirty until actually rendered
Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-08-08 10:16:20 +02:00
Jan Zielinski
4f04f260d9 swr/rasterizer: enable size accumulation in mem stats
Small refactoring is also performed

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-08-08 10:16:20 +02:00
Jan Zielinski
365ad367f1 swr/rasterizer: enable using AOS vertex data format
Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-08-08 10:16:20 +02:00
Iago Toral Quiroga
fb9f7872e7 v3d: handle wait requirement when retrieving query results correctly
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-08 08:36:52 +02:00
Iago Toral Quiroga
0f2d1dfe65 v3d: use the GPU to record primitives written to transform feedback
We can use the PRIMITIVE_COUNTS_FEEDBACK packet to write various primitive
counts to a buffer, including the number of primives written to transform
feedback buffers, which will handle buffer overflow correctly.

There are a couple of caveats with this:

Primitive counters are reset when we emit a 'Tile Binning Mode Configuration'
packet, which can happen in the middle of a primitives query, so we need to
read the buffer when we submit a job and accumulate the counts in the context
so we don't lose them.

We also need to do the same when we switch primitive type during transform
feedback so we can compute the correct number of recorded vertices from
the number of primitives. This is necessary so we can provide an accurate
vertex count for draw from transform feedback.

v2:
 - When computing the number of vertices for a primitive, pass in the base
   primitive, since that is what the hardware will count.
 - No need to update primitive counts when switching primitive types if
   the base primitives are the same.
 - Log perf warning when mapping the primitive counts BO for readback (Eric).
 - Only emit the primitive counts packet once at job end (Eric).
 - Use u_upload mechanism for the primitive counts buffer (Eric).
 - Use the XML to generate indices into the primitive counters buffer (Eric).

Fixes piglit tests:
spec/ext_transform_feedback/overflow-edge-cases
spec/ext_transform_feedback/query-primitives_written-bufferrange
spec/ext_transform_feedback/query-primitives_written-bufferrange-discard
spec/ext_transform_feedback/change-size base-shrink
spec/ext_transform_feedback/change-size base-grow
spec/ext_transform_feedback/change-size offset-shrink
spec/ext_transform_feedback/change-size offset-grow
spec/ext_transform_feedback/change-size range-shrink
spec/ext_transform_feedback/change-size range-grow
spec/ext_transform_feedback/intervening-read prims-written

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-08 08:36:52 +02:00
Iago Toral Quiroga
cf8986bce0 gallium/util: add a helper to compute vertex count from primitive count
v2:
  - Only compute vertex counts for base primitives.
  - Add a unit test (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-08 08:36:52 +02:00
Iago Toral Quiroga
9eb8699e0f v3d: be more explicit about the query types supported
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-08 08:36:52 +02:00
Iago Toral Quiroga
9b316ab57a v3d: generate packet unpack functions
These were not being compiled because of the lack of __gen_unpack_address.

v2:
 - Shift raw address correctly (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-08 08:36:52 +02:00
Iago Toral Quiroga
5ffb8b1716 v3d: add header guards in v3d_packet_helpers.h
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-08 08:36:52 +02:00
Tomeu Vizoso
e7eac8a1e8 panfrost: Print errors from kernel
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-08 07:42:52 +02:00
Tomeu Vizoso
7c8434889d panfrost: Mark buffers as PANFROST_BO_HEAP
What we call GROWABLE in Mesa corresponds to the HEAP BO flag in the
kernel. These buffers cannot be memory mapped in the CPU side at the
moment, so make sure they are also marked INVISIBLE.

This allows us to allocate a big heap upfront (16MB) without actually
reserving space unless it's needed.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-08 07:42:52 +02:00
Tomeu Vizoso
19afd41e65 panfrost: Mark BOs as NOEXEC
Unless a BO has the EXECUTABLE flag, mark it as NOEXEC.

v2: - Rework version detection (Alyssa).

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-08 07:42:52 +02:00
Tomeu Vizoso
9398932c2d panfrost: Take into account flags when looking up in the BO cache
This will be useful right now so we avoid retrieving a non-executable
buffer when a executable one is needed.

As we support more flags, this logic will need to be extended to
consider the different trade-offs to be made when matching BO
specifications to BOs in the cache.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-08 07:42:52 +02:00
Tomeu Vizoso
950b5fc596 panfrost: Allocate shaders in their own BOs
Instead of all shaders being stored in a single BO, have each shader in
its own.

This removes the need for a 16MB allocation per context, and allows us
to place transient blend shaders in BOs marked as executable (before
they were allocated in the transient pool, which shouldn't be
executable).

v2: - Store compiled blend shaders in a malloc'ed buffer, to avoid
      reading from GPU-accessible memory when patching (Alyssa).
    - Free struct panfrost_blend_shader (Alyssa).
    - Give the job a reference to regular shaders when emitting
      (Alyssa).

v3: - Split out the allocation flags change (Rob).

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-08 07:42:52 +02:00
Tomeu Vizoso
5804d75b9c util/hash_table: Fix hashing in clears on 32-bit
Some hash functions (eg. key_u64_hash) will attempt to dereference the
key, causing an invalid access when passed DELETED_KEY_VALUE (0x1) or
FREED_KEY_VALUE (0x0).

When in 32-bit arch a 64-bit key value doesn't fit into a pointer, so
hash_table_u64 internally use a pointer to a struct containing the
64-bit key value.

Fix _mesa_hash_table_u64_clear() to handle the 32-bit case by creating a
temporary hash_key_u64 to pass to the hash function.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-08-08 07:42:52 +02:00
Tapani Pälli
aba57b11ee anv: support GetSwapchainGrallocUsage2ANDROID for Android
New function supports gralloc1 usage flags that get set separately
for producer and consumer. As we still need to support old method too,
let's share common code and use android_convertGralloc0To1Usage helper.
Bump the VK_ANDROID_native_buffer version to indicate support for the
new call.

Changes were tested on Android Celadon P with Basemark GPU and various
Sascha Willems Vulkan demos.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 05:08:01 +00:00
Mark Janes
51c3ab618b st/mesa: eliminate unnecessary redirection
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
61c54a8878 intel/perf: fix debug typo
Misspelling was seen with INTEL_DEBUG=perfmon.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
2df1ab4d48 intel/perf: make gen_perf_query_object private
Encapsulate the details of this structure within the perf implemenation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
deea3798b6 intel/perf: make perf context private
Encapsulate the details of this data structure.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
1f4f421ce0 intel/perf: print debug information
INTEL_DEBUG=perfmon will iterate over the perf queries, printing
information about the state of each query.  Some of this information
will be private to intel/perf, and needs to a dump routine that can be
called from i965.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
a663c8c26e intel/perf: make internal methods private
Now that all references from i965 have been moved to perf, we can make
internal methods private again.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
be8b466cff intel/perf: make oa_sample_buffers private
All references to this data structure have been moved inside the perf
subsystem.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
f2a049b4e3 intel/perf: expose method to create query
By encapsulating this implementation within perf, we can eventually
make struct gen_perf_ctx private.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
9f5c160d82 intel/perf: move initialization of pipeline statistics metrics to gen_perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
9f84efb452 intel/perf: move get_query_data into gen_perf
This refactor moves several helper functions for get_query_data as
well:

 - accumulate_oa_reports
 - read_gt_frequency
 - get_pipeline_stats_data
 - get_oa_counter_data

Functions which are no longer referenced in brw_performance_query.c
have been removed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
73eccdc4a5 intel/perf: move delete_query to gen_perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
8c9eac1234 intel/perf: move is_query_ready to gen_perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
a9be292722 intel/perf: move wait_query to perf
The following methods have duplicate implementation of read_oa_samples_until in
brw_performance_query.c:

 - read_oa_samples_for_query
 - read_oa_samples_until

They ar still referenced by other methods in the file and will be
removed on the subsequent commit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
3c8ed58486 intel/perf: create a vtable entry for bo_busy
Iris and i965 variants of this method need to be called by perf
routines.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
6fed756388 intel/perf: create a vtable entry for bo_wait_rendering
Iris and i965 variants of this method need to be called by perf
routines.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
511bb15d4b intel/perf: create a vtable entry for batch_references
Iris and i965 variants of this method need to be called by perf
routines.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
3ecb23092e intel/perf: refactor gen_perf_end_query into gen_perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
018f9b81e5 intel/perf: refactor gen_perf_begin_query into gen_perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
52d3db9ab6 intel/perf: move perf-related state into gen_perf_context
To move more operations into intel/perf, several state items are
needed.  Save references to that state in the perf_ctxt, rather than
passing them in for every operation.

This commit includes an initializer for gen_perf_context, to set those
references and also encapsulate the initialization of the sample
buffer state.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
df18acee78 intel/perf: create a vtable entries for buffer object map/unmap
These operations are needed to refactor subsequent methods into perf

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
a330d759c5 intel/perf: move client reference counts into perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
4d0d4aa1b5 intel/perf: move open_perf into perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
79ded7cc8f intel/perf: move close_perf into perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
f57c8a6dc1 intel/perf: create a vtable entry for emit_mi_flush
This method is needed to move subsequent methods into perf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
52f7a0bff7 intel/perf: use temporary pointers to simplify access to perf state
Most accesses to perf state were made through repeated dereferences of
brw_context members.  Prefering temporary variables of perf_ctx and
perf_cfg has the following advantages:

 - more concise implementation
 - easier refactor when moving subsequent methods to perf

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
a157f5acb1 intel/perf: move snapshot_statistics_registers into perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
8ae6667992 intel/perf: move query_object into perf
Query objects can now be encapsulated within the perf subsystem.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
7e890ed476 intel/perf: create a vtable entry for store_register_mem64
This method is needed to move subsequent methods into perf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
4b2c885207 intel/perf: move free_sample_bufs into perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
2f712d21b9 intel/perf: move reap_old_sample_buffers into perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
31758bd36c intel/perf: move get_free_sample_buf into perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
e08a69b7f4 intel/perf: move the perf context into perf
The "context" that is necessary to submit and process perf commands to
the hardware was previously present in the brw_context.perfquery
struct.  This commit moves it into perf and provides a more
understandable name.

The intention is for this struct to be private, when all methods that
access it are migrated into perf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
fb622054f7 intel/perf: move get_metric_id to perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
b14e15e26a intel/perf: move oa_sample_buf structure to perf
oa_sample_buf holds the data provided by the kernel that will be
collated into performance metrics.  Since this functionality will be
implemented in perf, the struct needs to be defined there.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
e091f33990 intel/perf: enumerate query-based metrics in perf
Iris and i965 both need to enumerate the available metrics, so these
routines must be located in perf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
2446f5cfd8 intel/perf: move perf-related constants to common location
The perf subsystem needs several macro definitions that were
duplicated in Iris and i965 headers.  Place these macros within perf,
if the perf implementation contains the only references to the values.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
67675a5802 intel/perf: create a vtable entry for capture_frequency_stat_register
In preparation for calling both Iris and i965 implementions from perf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
ae3fac851d intel/perf: create a vtable entry for batchbuffer_flush
In preparation for calling both Iris and i965 implementions from perf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
a921b215dd intel/perf: create a vtable entry for emit_report_count
In preparation for calling both Iris and i965 implementions from perf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
9a2a2e8bea intel/perf: create a vtable entry for bo_unreference
In preparation for calling both Iris and i965 implementions from perf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
439d5a3eff intel/perf: create a vtable for low-level driver functions
Performance metrics collections requires several actions (eg bo_map())
that have different implementations for Iris and i965.  The perf
subsystem needs a vtable for each of these actions, so it can invoke
the corresponding implementation for each driver.

The first call to be added to the table is bo_alloc.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
ea66484e86 intel/perf: use common ioctl wrapper
There were multiple ioctl-wrapper functions, so a common
implementation was put in gen_gem.h.   With a common implementation,
perf no longer needs the caller to configure one for it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Mark Janes
07d3bd5c46 intel/perf: rename gen_perf to gen_perf_config
This structure contains the configurations of the metrics for the
current platform, and the settings needed for the perf subsystem to
query that configuration from the device.  This data is available
without a rendering context, and needed to support MDAPI metrics for
Vulkan.

A gen_perf_context struct will be added later, which holds additional
state from the rendering context necessary for metric data
collection.  The gen_perf struct needs a more precise name to reduce
confusion.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:55 -07:00
Ilia Mirkin
9ff8da0e50 nvc0: fix program dumping, use _debug_printf
This debug situation is unforunate. debug_printf only does something
with DEBUG set, but in practice all that needs to be moved to !NDEBUG.
For now, use _debug_printf which always prints. However the whole
function is guarded by !NDEBUG.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-08-07 22:32:02 -04:00
Ilia Mirkin
f6af104340 nvc0: add support for ATOMC_WRAP TGSI operations
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-08-07 22:32:02 -04:00
Ilia Mirkin
a2bb7b26a1 gallium: redefine ATOMINC_WRAP to be more hardware-friendly
Both AMD and NVIDIA hardware define it this way. Instead of replicating
the logic everywhere, just fix it up in one place.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-07 22:31:56 -04:00
Ilia Mirkin
582c86346d st/mesa: relax EXT_shader_image_load_store enable
There's no reason to bring format-less load requirement into this
extension. It requires a size to be provided, and a compatible format is
computed from the size + data type. For example

  layout(size1x32) uniform iimage1D image;

becomes

  DCL IMAGE[0], 1D, PIPE_FORMAT_R32_SINT, WR

whereas PIPE_CAP_IMAGE_LOAD_FORMATTED is designed to allow
PIPE_FORMAT_NONE to be provided as a format and still enable LOAD
operations to be performed.

So the shader has all the information it needs about the format.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-07 22:31:38 -04:00
Mark Janes
a29bc3a3ad i965/perf: restore mdapi statistics query metrics
Registration of mdapi metrics based on statistics query registers was
inadvertently removed in the commit that checks for OA kernel support.

The statistics queries are not dependent on OA.

Fixes: 96e1c945f2 ("i965: Move device info initialization to common code")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-07 17:20:04 -07:00
Greg V
c0376a1234 util: add anon_file.h for all memfd/temp file usage
Move the Weston os_create_anonymous_file code from egl/wayland into util,
add support for Linux memfd and FreeBSD SHM_ANON,
use that code in anv/aubinator instead of explicit memfd calls for portability.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-07 22:57:55 +00:00
Pierre-Eric Pelloux-Prayer
519bebdb40 radeonsi: limit DPBB context_states_per_bin batches when using gfx9 workaround
It seems that using 'context_states_per_bin = 1' for DPBB fixes the reported issue.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110214

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-07 18:45:24 -04:00
Pierre-Eric Pelloux-Prayer
120d0ef937 radeonsi: reduce DPBB persistent_states_per_bin value for APUs
Fixes some reported GPU hangs on RAVEN.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111231

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-07 18:45:22 -04:00
Pierre-Eric Pelloux-Prayer
6bda9ca062 radeonsi: fix typo in DPBB register field
Also only set FLUSH_ON_BINNING_TRANSITION for GPU families that needs it (matches
what si_emit_dpbb_disable is doing).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-07 18:45:20 -04:00
Pierre-Eric Pelloux-Prayer
90bded140e radeonsi: fix S_028C48_MAX_ALLOC_COUNT value
This field uses "value minus 1" encoding.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-07 18:45:09 -04:00
Christian Gmeiner
323cda475b etnaviv: drop struct etna_3d_state
Also drop #if 0 code block.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <philipp.zabel@gmail.com>
2019-08-07 22:12:00 +02:00
Yevhenii Kolesnikov
0325860e90 mesa: Use _mesa_delete_transform_feedback_object in drivers
Function _mesa_delete_transform_feedback_object called from within
drivers once driver-specific clean-up has been done. Brings into
conformity with how other GL objects are handled.

CC: Eric Anholt <eric@anholt.net>
CC: Kenneth Graunke <kenneth@whitecape.org>

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-07 17:25:22 +00:00
Yevhenii Kolesnikov
4f767ded6e mesa: use _mesa_delete_query in drivers
Now drivers can call _mesa_delete_query once driver-specific
clean-up has been done. Brings into conformity with how other GL
objects are handled.

CC: Eric Anholt <eric@anholt.net>
CC: Kenneth Graunke <kenneth@whitecape.org>

Suggested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-07 17:25:22 +00:00
Juan A. Suarez Romero
4619535ab7 docs: update calendar, add news item and link release notes for 19.1.4
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-08-07 18:51:32 +02:00
Juan A. Suarez Romero
a19d43ebd5 docs: add sha256 checksums for 19.1.4
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 7fcb69a33c)
2019-08-07 18:49:25 +02:00
Juan A. Suarez Romero
8484fafc78 docs: add release notes for 19.1.4
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit b84ffa028d)
2019-08-07 18:49:23 +02:00
Bas Nieuwenhuizen
5a26f528cb meson,i965: Link with android deps when building for android.
The DBG marco in brw_blorp.c ends up calling an android log function:

error: undefined reference to '__android_log_print'

v2: On suggestion from Lionel, hang the Android dependency onto a new
    libintel_common dependency.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-07 15:34:46 +02:00
Erik Faye-Lund
da9e2958ec gallium/dump: add missing query-type to short-list
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 3f6b3d9db7 ("gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-07 12:03:24 +00:00
Erik Faye-Lund
70a93922db gallium/dump: add missing query-type to short-list
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: a677799e51 ("gallium: add PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE
                     and corresponding cap")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-07 12:03:24 +00:00
Eric Engestrom
32ce010951 gitlab-ci: don't install autotools deps
These could've been deleted a long time ago, but apparent we forgot.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2019-08-07 10:18:25 +01:00
Eric Engestrom
5b10ddf358 util: fix mem leak of program path
Fixes: 759b940389 ("util: Get program name based on path when possible")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-08-07 08:42:42 +01:00
Eric Engestrom
991137144a meson: build intel-ui tools as part of all tools
Reported-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111289
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-07 08:19:31 +01:00
Eric Engestrom
c32ebfe003 gitlab-ci: add gtk3 dev files for -D tools=intel-ui
We also need to update wayland-protocols and libXrandr (and randrproto),
as they are too old for gdk3 (which gtk3 depends on).

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-07 08:19:30 +01:00
Jan Vesely
6b8269d0bb clover: Fix build after clang r367864
v2: Drop special case of llvm-9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Aaron Watry <awatry@gmail.com>
2019-08-06 23:33:55 -04:00
Timothy Arceri
d81e11332b mesa: remove super old TODOs from shaderapi.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-07 13:31:40 +10:00
John Stultz
fcfa2d1447 mesa: freedreno: Android.registers.mk: Fix up register xml.h file generation
The current Androdi.registers.mk file causes build failures that
look like:
 FAILED:
 external/mesa3d/src/freedreno/Android.registers.mk:49: error: implicit rules are obsolete: out/target/product/linaro_db845c/gen/STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/%.xml.h

Caused by the following Android build rule change:
https://android.googlesource.com/platform/build/+/HEAD/Changes.md#implicit_rules

I tried to replace this with something similar to the static
pattern suggested in the URL above, but ended up getting all the
xml.h files generated using only the first a2xx.xml source file.

So I've fallen back to explicitly defining the make rules for
each.

Additionally, we needed to provide the proper
LOCAL_EXPORT_C_INCLUDE_DIRS and add the defined static library
to the components that depend on the register headers.

Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-08-07 02:18:38 +00:00
John Stultz
96baf052b2 mesa: Add ir3/ir3_nir_imul.c generation to Android.mk
With current master we're seeing build failures with AOSP:
  error: undefined symbol: ir3_nir_lower_imul

This is due to the ir3_nir_imul.c file not being generated
in the Android.mk files.

This patch simply adds it to the Android build, after which
thigns build and book ok on db410c.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-08-07 02:18:19 +00:00
Rohan Garg
16edd56fcc panfrost: Take into account a index_bias for glDrawElementsBaseVertex calls
Midgard does not accept a index_bias directly and relies instead on a
bias correction offset (offset_bias_correction) in order to calculate
the unbiased vertex index.

We need to make sure we adjust offset_start and vertex_count in order to
take into account the index_bias as required by a
glDrawElementsBaseVertex call and then supply a additional
offset_bias_correction to the hardware.

Signed-off-by: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-06 17:18:19 -07:00
Bas Nieuwenhuizen
4bb17c08ae radv/gfx10: Enable DCC for storage images.
v2: Hide it behind a perftest flag.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen
3a5950f501 radv: Add device argument for dcc compression check.
Because it is about to be generation dependent.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen
8c63ffe54d radv: Disable compression for compute DCC decompress store.
Previously we relied on stores not using DCC but that is going to
change, so disable compression explicitly.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen
216a9d8871 radv: Add extra struct to image view creation.
For extra args. Unlike image creation, I'm not embedding the vk
struct in there, so all the inline structs can be kept.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen
50add1b33a radv: Do not decompress on LAYOUT_GENERAL.
We handle render loops properly now and STORAGE still disables
DCC/TC-compat HTILE in general.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen
66131ceb8b radv: Pass through render loop detection to internal layout decisions.
And do nothing with it yet.

Everything outside a renderpass has no render loop.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen
a171a6663d radv: Add render loop detection in renderpass.
VK spec 7.3:

"Applications must ensure that all accesses to memory that backs
image subresources used as attachments in a given renderpass instance
either happen-before the load operations for those attachments, or
happen-after the store operations for those attachments."

So the only renderloops we can have is with input attachments. Detect
these.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-07 02:13:07 +02:00
Timothy Arceri
a5b9394b87 drirc: Add vendor workaround for Divinity: Original Sin EE
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93551
2019-08-07 10:12:49 +10:00
Timothy Arceri
dca119f12c mesa/gallium: add dric option to allow overriding GL vendor string
Will be used in the following patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93551
2019-08-07 10:12:49 +10:00
Marek Olšák
c95e2a1c6b relnotes/19.2: document EXT_texture_dhadow_lod 2019-08-06 20:10:15 -04:00
Bas Nieuwenhuizen
04c6feb12c radv: Fix config reg assert.
Using the wrong bounds

Fixes: "219d6939df8 radv: add more assertions to make sure packets are correctly emitted"
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-07 08:58:23 +10:00
Marek Olšák
16577f5002 tgsi_to_nir: add a few needed double opcodes
for internal radeonsi shaders

v2 (Connor):
- Split out prep work from adding opcodes, and rewrite the former

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 18:03:26 -04:00
Marek Olšák
2207daf549 tgsi_to_nir: implement a few needed 64-bit integer opcodes
for internal radeonsi shaders

v2 (Connor):
- Split this out from the prep work, and rework the former
- Add support for U64SNE

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 18:03:24 -04:00
Connor Abbott
37f6350c1d ttn: Prepare for 64-bit sources and destinations
v2: Properly handle 32->64 bit conversions

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 18:03:22 -04:00
Connor Abbott
4b10949482 ttn: Use 1-bit NIR comparison opcodes
We shouldn't be using the versions that output a 32-bit boolean, since
nir_opt_algebraic won't optimize them as well. Drivers will lower these
to the 32-bit versions after optimizing, if appropriate. Also, this will
make implementing 64-bit comparisons easier.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 18:03:19 -04:00
Connor Abbott
e7fd90e8ef nir/builder: Add nir_b2i
Same as nir_b2f but for integers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 18:03:10 -04:00
Pierre-Eric Pelloux-Prayer
f84c9ad17a radeonsi: enable EXT_shader_image_load_store
This depends on LLVM 10 because this needs https://reviews.llvm.org/D65283

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:41:07 -04:00
Pierre-Eric Pelloux-Prayer
25fff591c1 radeonsi: add support for nir atomic_inc_wrap/atomic_dec_wrap
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:41:06 -04:00
Pierre-Eric Pelloux-Prayer
8789248541 radeonsi: add support for tgsi ATOMDEC_WRAP / ATOMINC_WRAP opcodes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:41:04 -04:00
Pierre-Eric Pelloux-Prayer
704a6b5948 ac: add ac_atomic_inc_wrap / ac_atomic_dec_wrap support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:41:03 -04:00
Pierre-Eric Pelloux-Prayer
a9ec718652 nir: add atomic_inc_wrap/atomic_dec_wrap image intrinsics
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:41:02 -04:00
Pierre-Eric Pelloux-Prayer
fc0a2e5d01 glsl: add EXT_shader_image_load_store new image functions
This extension has 2 functions that are missing from the ARB versions:
- imageAtomicIncWrap
- imageAtomicDecWrap

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:41:00 -04:00
Pierre-Eric Pelloux-Prayer
70a47fb032 glsl: add EXT_shader_image_load_store keywords to lexer
All of them already existed for ARB_shader_image_load_store.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:40:58 -04:00
Pierre-Eric Pelloux-Prayer
cfba168b6c glsl: add size qualifiers from EXT_shader_image_load_store
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:40:56 -04:00
Pierre-Eric Pelloux-Prayer
cd45d09226 glsl: handle differences between ARB/EXT versions of shader_image_load_store
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:40:55 -04:00
Pierre-Eric Pelloux-Prayer
5db28b0cf7 mesa: add EXT_shader_image_load_store glBindImageTextureEXT function
The implementation is almost identical to glBindImageTexture except for error
checking.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:40:53 -04:00
Pierre-Eric Pelloux-Prayer
71e619a825 glapi: add EXT_shader_image_load_store
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:40:52 -04:00
Pierre-Eric Pelloux-Prayer
91924453ee gallium: add PIPE_CAP_TGSI_ATOMINC_WRAP to indicate support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:40:51 -04:00
Pierre-Eric Pelloux-Prayer
8b6bfed3d2 tgsi: add ATOMICINC_WRAP/ATOMICDEC_WRAP opcode
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:40:34 -04:00
Marek Olšák
1d8a71af57 radeonsi/gfx10: enable all CUs for GS if NGG is never used
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:09:03 -04:00
Marek Olšák
91227a1e17 radeonsi/gfx10: add global use_ngg and use_ngg_streamout flags
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:09:02 -04:00
Marek Olšák
f064b530f6 radeonsi/gfx10: remove an obsolete VGT_REUSE_OFF workaround
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:09:01 -04:00
Marek Olšák
37dd8ebcf7 radeonsi/gfx10: disable LATE_ALLOC_GS on Navi14
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:59 -04:00
Marek Olšák
c5a6ecf61a radeonsi/gfx10: implement a bug workaround for GE_PC_ALLOC
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:58 -04:00
Marek Olšák
8f8c28767e radeonsi/gfx10: implement a bug workaround for NGG -> legacy transitions
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:57 -04:00
Marek Olšák
cb9d95623b radeonsi/gfx10: implement a GE bug workaround
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:56 -04:00
Marek Olšák
e08b0d7ac4 radeonsi/gfx10: set GE_CNTL for tessellation correctly
to match PAL

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:54 -04:00
Marek Olšák
71b53020b7 radeonsi/gfx10: simplify NGG code in si_update_shaders
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:53 -04:00
Marek Olšák
a232f5e07c radeonsi/gfx10: fix input VGPRs for legacy VS
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:51 -04:00
Marek Olšák
8d90157d49 radeonsi: make sure that rasterizer state != NULL and remove all NULL checking
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:39 -04:00
Marek Olšák
8b8819e88a radeonsi: make sure that DSA state != NULL and remove all NULL checking
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:39 -04:00
Marek Olšák
b758eed9c3 radeonsi: make sure that blend state != NULL and remove all NULL checking
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:39 -04:00
Marek Olšák
8b68511ebc radeonsi: DCC MSAA blending bug - include logic op, limit to Navi14 and older
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:50 -04:00
Marek Olšák
e69c1c8b8f radeonsi: determine accurately whether logic op is enabled
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:48 -04:00
Marek Olšák
b38f5eb17a radeonsi: skip draw calls with 0-sized index buffers
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:39 -04:00
Marek Olšák
e777720173 radeonsi/nir: lower PS inputs before scanning the shader
Lowering PS inputs can eliminate some of them, which messes up
persp/linear barycentric coord usage info.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:46 -04:00
Marek Olšák
f818d9ae3c radeonsi/nir: handle key.mono.u.ps.interpolate_at_sample_force_center
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:39 -04:00
Marek Olšák
b3eed3cff9 radeonsi: add missing prints into si_dump_shader_key
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:15 -04:00
Marek Olšák
6b3ee86989 radeonsi: disable SDMA image copies on dGPUs to fix corruption in games
Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:08 -04:00
Pierre-Eric Pelloux-Prayer
0556932f4a mesa: add EXT_dsa glMultiTexCoordPointerEXT function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:03:22 -04:00
Pierre-Eric Pelloux-Prayer
e364ddece3 mesa: add EXT_dsa glMultiTexGen* functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:03:21 -04:00
Pierre-Eric Pelloux-Prayer
e8e0de6a8f mesa: add EXT_dsa glCopyMultiTexImage* and glCopyMultiTexSubImage*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:03:19 -04:00
Pierre-Eric Pelloux-Prayer
f28d9ab1a3 mesa: add EXT_dsa glGetMultiTexParameteriv/fvEXT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:03:18 -04:00
Pierre-Eric Pelloux-Prayer
989c375852 mesa: add EXT_dsa glMultiTexSubImage1D/2D/3DEXT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:03:16 -04:00
Pierre-Eric Pelloux-Prayer
aac6578732 mesa: add EXT_dsa glMultiTexImage1D/2D/3DEXT + glGetMultiTexImageEXT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:03:15 -04:00
Pierre-Eric Pelloux-Prayer
885dbe2e84 mesa: add glBindMultiTextureEXT display list support
Fixes: 0972b0b059 ("mesa: add support for glBindMultiTextureEXT")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:03:13 -04:00
Pierre-Eric Pelloux-Prayer
d9e26c3483 mesa: add EXT_dsa glMultiTexParameter* functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:03:12 -04:00
Pierre-Eric Pelloux-Prayer
e04f95057f mesa: add EXT_dsa (Get)MultiTexEnv functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:03:10 -04:00
Pierre-Eric Pelloux-Prayer
04b8e50bb8 mesa: add _mesa_(get)texenvi(f)v_indexed helpers
They are exactly like _mesa_GetTexEnvfv/_mesa_GetTexEnviv except they take
a GLuint texunit parameter instead of relying of ctx->Texture.CurrentUnit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:03:08 -04:00
Pierre-Eric Pelloux-Prayer
0e595326c4 mesa: add new helper _mesa_get_texobj_by_target_and_texunit
Based on the 'static get_texobj_by_target' function from texparam.c,
but extended to also take the texunit as a parameter.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:03:06 -04:00
Pierre-Eric Pelloux-Prayer
58030d2b3d mesa: replace _mesa_get_current_fixedfunc_tex_unit with _mesa_get_fixedfunc_tex_unit
The new function implements the same feature but doesn't depend
on ctx->Texture.CurrentUnit.
This change allows to use it from indexed functions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:02:52 -04:00
Danylo Piliaiev
b4c54894bb iris: Handle vertex shader with window space position
Iris advertises support for PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION
so let's actually implement it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110657

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-06 20:25:35 +00:00
Erico Nunes
b783f9f77e lima: fix pipe_debug_callback warnings
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-06 20:29:53 +02:00
Vasily Khoruzhick
5adfc8602c lima/ppir: move sin/cos input scaling into NIR
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-06 17:49:22 +00:00
Antia Puentes
954224b714 nir/spirv: Fix gl_BaseVertex for non-indexed draws for OpenGL
Lowers BaseVertex to the correct system value for OpenGL.

v2: use options->environment rather than adding a new flag to
    spirv_to_nir_options

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-06 09:11:27 -07:00
Kenneth Graunke
382f92a814 iris: Increase BATCH_SZ to 64kB
This seems to improve performance by roughly ~1% across the board.
Thanks to Rafael Antognolli and Dan Walsh for their help tuning.
2019-08-06 09:09:26 -07:00
Bas Nieuwenhuizen
2af00b1fdd ac/nir: Use correct cast for readfirstlane and ptrs.
Fixes: 028ce527 "radv: Add non-uniform indexing lowering."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-06 15:48:50 +00:00
Bas Nieuwenhuizen
2301b2e029 radv: Do non-uniform lowering before bool lowering.
Since it can introduce comparisons.

Fixes: 028ce52739 "radv: Add non-uniform indexing lowering."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-06 15:48:50 +00:00
Jonathan Marek
dfe048058f etnaviv: support 3D and 2D array textures
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-08-06 10:37:36 -04:00
Jonathan Marek
3508f2fb18 etnaviv: fix 3d texture upload
Fix uploading of 3D textures and 2D array textures:

* Remove asserts in BLT and RS checking z
* Use box->z/box->depth in etna_copy_resource_box and CPU tile/untile
* Track mip level depth and use it in etna_copy_resource

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-08-06 10:37:36 -04:00
Jonathan Marek
ed7a27719a etnaviv: add alternative NIR compiler
enable with ETNA_MESA_DEBUG=nir

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-08-06 10:33:17 -04:00
Jonathan Marek
ee1ed59458 etnaviv: prep for UBOs
Allow UBO relocs and only emitting uniforms that are actually used.

GC7000Lite has no address register, so upload uniforms to a UBO object to
LOAD from.

I removed the code to check for changes to individual uniforms and just
reupload to entire uniform state when the state is dirty. I think there
was very limited benefit to it and it isn't compatible with relocs.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-08-06 10:33:17 -04:00
Jonathan Marek
ca58c1120e etnaviv: disasm: add dual16 bits, immediate decoding, and some opcodes
Also use structs from etnaviv_asm since they hold the same information.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-08-06 10:33:17 -04:00
Jonathan Marek
e9a5181ad6 etnaviv: asm: new features
* Dual16 bits
* Halti5 disable multiple uniform src
* write_mask compose
* Halti2+ immediates

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-08-06 10:33:17 -04:00
Jonathan Marek
98e59f0a0a etnaviv: update headers from rnndb
Update to etna_viv commit f38ba2d.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-08-06 10:33:17 -04:00
Erico Nunes
e0aeee9460 lima: add summary report for shader-db
Very basic summary, loops and gpir spills:fills are not updated yet and
are only there to comply with the strings to shader-db report.py regex.

For now it can be used to analyze the impact of changes in instruction
count in both gpir and ppir.

The LIMA_DEBUG=shaderdb setting can be useful to output stats on
applications other than shader-db.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-06 15:43:31 +02:00
Erico Nunes
9e41a514a8 lima: add support for debug callback
This adds support for glDebugMessageCallback which is required to
support shader-db reports.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-06 15:43:26 +02:00
Tomeu Vizoso
67f4e1e787 panfrost/ci: Remove two tests from list of failures
These tests have been fixed by:

b514f41183 ("glcpp: use pre-expansion line number for __LINE__")

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-08-06 15:19:43 +02:00
Jon Turney
84fae8e649 st/dri: Move dri2_format_mapping table and it's accessors from dri2.c to dri_helpers.c
8af1990a exposed dri2_get_mapping_by_fourcc() in dri_helpers.h, so it
could be used by dri_get_egl_image(), but didn't move it.  This breaks
the build in the with_dri=false case (e.g. when building for a target
which doesn't have libdrm, so swrast is only dri driver built)
2019-08-06 12:21:56 +00:00
Jonathan Marek
b514f41183 glcpp: use pre-expansion line number for __LINE__
Fixes the following deqp tests:
dEQP-GLES2.functional.shaders.preprocessor.predefined_macros.line_2_*

It don't see the spec requiring this, but it seems to be better, as the
clang preprocessor for example has this behavior.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-06 11:27:04 +00:00
Jason Ekstrand
bc612536eb anv: Emit a dummy MEDIA_VFE_STATE before switching from GPGPU to 3D
There is an object-level  preemption workaround which requires this.
However, even without object-level preemption, we seem to have issues
with geometry flickering when 3D and compute are combined in the same
batch and this appears to fix it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109630
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111267
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-06 05:46:28 +00:00
Ian Romanick
5544b2cbbd nir/algebraic: Use value range analysis to eliminate useless unary ops
Sandy Bridge is the big winner because it lies at something of a
crossroads.  It supports a fairly high OpenGL version, and it still has
the old style math box.  The high OpenGL version means a lot more
shaders can run on it.  The old style math box means extra moves are
necessary to resolve source modifiers on operands to complex math
instructions like COS, SQRT, and RCP.

v2: Remove a couple patterns that are now redundant.

All Gen7+ platforms had similar results.  (Ice Lake shown)
total instructions in shared programs: 16282006 -> 16278207 (-0.02%)
instructions in affected programs: 174555 -> 170756 (-2.18%)
helped: 661
HURT: 0
helped stats (abs) min: 1 max: 36 x̄: 5.75 x̃: 3
helped stats (rel) min: 0.06% max: 23.68% x̄: 2.81% x̃: 1.94%
95% mean confidence interval for instructions value: -6.16 -5.34
95% mean confidence interval for instructions %-change: -3.02% -2.60%
Instructions are helped.

total cycles in shared programs: 367168597 -> 367134284 (<.01%)
cycles in affected programs: 1105276 -> 1070963 (-3.10%)
helped: 460
HURT: 150
helped stats (abs) min: 1 max: 568 x̄: 96.60 x̃: 82
helped stats (rel) min: 0.02% max: 32.50% x̄: 7.99% x̃: 4.27%
HURT stats (abs)   min: 1 max: 901 x̄: 67.49 x̃: 39
HURT stats (rel)   min: 0.07% max: 20.00% x̄: 4.90% x̃: 4.22%
95% mean confidence interval for cycles value: -65.68 -46.82
95% mean confidence interval for cycles %-change: -5.59% -4.05%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10824272 -> 10802557 (-0.20%)
instructions in affected programs: 1237988 -> 1216273 (-1.75%)
helped: 8199
HURT: 0
helped stats (abs) min: 1 max: 41 x̄: 2.65 x̃: 2
helped stats (rel) min: 0.12% max: 20.00% x̄: 2.04% x̃: 1.73%
95% mean confidence interval for instructions value: -2.70 -2.59
95% mean confidence interval for instructions %-change: -2.07% -2.00%
Instructions are helped.

total cycles in shared programs: 154009894 -> 153843598 (-0.11%)
cycles in affected programs: 10650486 -> 10484190 (-1.56%)
helped: 4973
HURT: 1533
helped stats (abs) min: 1 max: 3904 x̄: 40.20 x̃: 20
helped stats (rel) min: 0.02% max: 41.72% x̄: 2.63% x̃: 1.67%
HURT stats (abs)   min: 1 max: 453 x̄: 21.94 x̃: 8
HURT stats (rel)   min: 0.02% max: 41.91% x̄: 1.54% x̃: 0.58%
95% mean confidence interval for cycles value: -28.02 -23.10
95% mean confidence interval for cycles %-change: -1.74% -1.56%
Cycles are helped.

LOST:   0
GAINED: 2

GM45 and Iron Lake had similar results. (Iron Lake shown)
total instructions in shared programs: 8135196 -> 8134888 (<.01%)
instructions in affected programs: 31920 -> 31612 (-0.96%)
helped: 169
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 1.82 x̃: 2
helped stats (rel) min: 0.43% max: 3.23% x̄: 1.23% x̃: 1.16%
95% mean confidence interval for instructions value: -2.01 -1.64
95% mean confidence interval for instructions %-change: -1.32% -1.15%
Instructions are helped.

total cycles in shared programs: 188575724 -> 188574092 (<.01%)
cycles in affected programs: 406840 -> 405208 (-0.40%)
helped: 169
HURT: 0
helped stats (abs) min: 4 max: 72 x̄: 9.66 x̃: 10
helped stats (rel) min: 0.07% max: 2.16% x̄: 0.57% x̃: 0.47%
95% mean confidence interval for cycles value: -10.72 -8.59
95% mean confidence interval for cycles %-change: -0.63% -0.50%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:14 -07:00
Ian Romanick
8d14380971 nir/algebraic: Use value range analysis to convert fmin to fsat
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16297320 -> 16282006 (-0.09%)
instructions in affected programs: 2434498 -> 2419184 (-0.63%)
helped: 8091
HURT: 1
helped stats (abs) min: 1 max: 51 x̄: 1.89 x̃: 2
helped stats (rel) min: 0.04% max: 14.29% x̄: 0.98% x̃: 0.95%
HURT stats (abs)   min: 7 max: 7 x̄: 7.00 x̃: 7
HURT stats (rel)   min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28%
95% mean confidence interval for instructions value: -1.94 -1.85
95% mean confidence interval for instructions %-change: -0.99% -0.96%
Instructions are helped.

total cycles in shared programs: 367221624 -> 367168597 (-0.01%)
cycles in affected programs: 126409635 -> 126356608 (-0.04%)
helped: 5612
HURT: 1023
helped stats (abs) min: 1 max: 2332 x̄: 31.11 x̃: 16
helped stats (rel) min: <.01% max: 30.31% x̄: 1.69% x̃: 1.42%
HURT stats (abs)   min: 1 max: 2372 x̄: 118.84 x̃: 16
HURT stats (rel)   min: <.01% max: 46.98% x̄: 1.46% x̃: 0.35%
95% mean confidence interval for cycles value: -11.52 -4.46
95% mean confidence interval for cycles %-change: -1.26% -1.14%
Cycles are helped.

total spills in shared programs: 8868 -> 8870 (0.02%)
spills in affected programs: 28 -> 30 (7.14%)
helped: 0
HURT: 1

total fills in shared programs: 21903 -> 21904 (<.01%)
fills in affected programs: 42 -> 43 (2.38%)
helped: 0
HURT: 1

Haswell
total instructions in shared programs: 13353925 -> 13338728 (-0.11%)
instructions in affected programs: 2265850 -> 2250653 (-0.67%)
helped: 8127
HURT: 5
helped stats (abs) min: 1 max: 51 x̄: 1.88 x̃: 2
helped stats (rel) min: 0.04% max: 20.00% x̄: 1.13% x̃: 1.07%
HURT stats (abs)   min: 5 max: 16 x̄: 9.00 x̃: 6
HURT stats (rel)   min: 0.19% max: 0.52% x̄: 0.35% x̃: 0.28%
95% mean confidence interval for instructions value: -1.91 -1.83
95% mean confidence interval for instructions %-change: -1.15% -1.11%
Instructions are helped.

total cycles in shared programs: 375535444 -> 375536343 (<.01%)
cycles in affected programs: 131206582 -> 131207481 (<.01%)
helped: 5590
HURT: 1055
helped stats (abs) min: 1 max: 2844 x̄: 34.15 x̃: 16
helped stats (rel) min: <.01% max: 21.57% x̄: 2.08% x̃: 1.60%
HURT stats (abs)   min: 1 max: 2487 x̄: 181.78 x̃: 21
HURT stats (rel)   min: <.01% max: 40.66% x̄: 1.96% x̃: 0.37%
95% mean confidence interval for cycles value: -4.74 5.01
95% mean confidence interval for cycles %-change: -1.51% -1.37%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 23401 -> 23407 (0.03%)
spills in affected programs: 248 -> 254 (2.42%)
helped: 2
HURT: 5

total fills in shared programs: 34850 -> 34845 (-0.01%)
fills in affected programs: 383 -> 378 (-1.31%)
helped: 2
HURT: 5

Ivy Bridge
total instructions in shared programs: 11975423 -> 11968117 (-0.06%)
instructions in affected programs: 845703 -> 838397 (-0.86%)
helped: 4071
HURT: 0
helped stats (abs) min: 1 max: 51 x̄: 1.79 x̃: 1
helped stats (rel) min: 0.08% max: 8.21% x̄: 1.04% x̃: 0.93%
95% mean confidence interval for instructions value: -1.87 -1.71
95% mean confidence interval for instructions %-change: -1.06% -1.02%
Instructions are helped.

total cycles in shared programs: 179674318 -> 179635552 (-0.02%)
cycles in affected programs: 5100065 -> 5061299 (-0.76%)
helped: 2650
HURT: 611
helped stats (abs) min: 1 max: 900 x̄: 21.85 x̃: 16
helped stats (rel) min: <.01% max: 21.55% x̄: 2.39% x̃: 1.40%
HURT stats (abs)   min: 1 max: 1841 x̄: 31.33 x̃: 6
HURT stats (rel)   min: <.01% max: 58.71% x̄: 1.64% x̃: 0.37%
95% mean confidence interval for cycles value: -14.14 -9.64
95% mean confidence interval for cycles %-change: -1.75% -1.52%
Cycles are helped.

LOST:   3
GAINED: 7

Sandy Bridge
total instructions in shared programs: 10828844 -> 10824272 (-0.04%)
instructions in affected programs: 525678 -> 521106 (-0.87%)
helped: 2386
HURT: 0
helped stats (abs) min: 1 max: 51 x̄: 1.92 x̃: 2
helped stats (rel) min: 0.11% max: 7.96% x̄: 1.05% x̃: 0.94%
95% mean confidence interval for instructions value: -2.04 -1.80
95% mean confidence interval for instructions %-change: -1.08% -1.03%
Instructions are helped.

total cycles in shared programs: 154024591 -> 154009894 (<.01%)
cycles in affected programs: 4005766 -> 3991069 (-0.37%)
helped: 1245
HURT: 506
helped stats (abs) min: 1 max: 585 x̄: 21.07 x̃: 16
helped stats (rel) min: 0.02% max: 11.57% x̄: 1.98% x̃: 0.83%
HURT stats (abs)   min: 1 max: 639 x̄: 22.81 x̃: 6
HURT stats (rel)   min: 0.01% max: 26.21% x̄: 1.07% x̃: 0.26%
95% mean confidence interval for cycles value: -10.57 -6.21
95% mean confidence interval for cycles %-change: -1.23% -0.97%
Cycles are helped.

GM45 and Iron Lake had similar results. (Iron Lake shown)
total instructions in shared programs: 8137248 -> 8135196 (-0.03%)
instructions in affected programs: 148322 -> 146270 (-1.38%)
helped: 992
HURT: 0
helped stats (abs) min: 1 max: 32 x̄: 2.07 x̃: 2
helped stats (rel) min: 0.41% max: 9.73% x̄: 1.74% x̃: 1.51%
95% mean confidence interval for instructions value: -2.16 -1.98
95% mean confidence interval for instructions %-change: -1.80% -1.67%
Instructions are helped.

total cycles in shared programs: 188583424 -> 188575724 (<.01%)
cycles in affected programs: 4409620 -> 4401920 (-0.17%)
helped: 956
HURT: 6
helped stats (abs) min: 2 max: 168 x̄: 8.09 x̃: 8
helped stats (rel) min: 0.04% max: 6.76% x̄: 0.27% x̃: 0.18%
HURT stats (abs)   min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.10% max: 0.10% x̄: 0.10% x̃: 0.10%
95% mean confidence interval for cycles value: -8.41 -7.60
95% mean confidence interval for cycles %-change: -0.29% -0.25%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:14 -07:00
Ian Romanick
b77070e293 nir/algebraic: Use value range analysis to eliminate tautological compares
It's only one application on one platform (Haswell) that's affected,
but spills and fills increase quite dramatically. :(

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16320850 -> 16297320 (-0.14%)
instructions in affected programs: 448012 -> 424482 (-5.25%)
helped: 1938
HURT: 0
helped stats (abs) min: 2 max: 264 x̄: 12.14 x̃: 10
helped stats (rel) min: 0.35% max: 43.75% x̄: 5.85% x̃: 5.38%
95% mean confidence interval for instructions value: -12.80 -11.48
95% mean confidence interval for instructions %-change: -5.99% -5.72%
Instructions are helped.

total cycles in shared programs: 367496943 -> 367221624 (-0.07%)
cycles in affected programs: 8557232 -> 8281913 (-3.22%)
helped: 1907
HURT: 26
helped stats (abs) min: 4 max: 12802 x̄: 147.21 x̃: 48
helped stats (rel) min: 0.03% max: 75.85% x̄: 5.55% x̃: 3.94%
HURT stats (abs)   min: 4 max: 1870 x̄: 208.23 x̃: 20
HURT stats (rel)   min: 0.16% max: 32.11% x̄: 8.31% x̃: 0.79%
95% mean confidence interval for cycles value: -165.38 -119.48
95% mean confidence interval for cycles %-change: -5.68% -5.04%
Cycles are helped.

LOST:   1
GAINED: 0

Haswell
total instructions in shared programs: 13374211 -> 13353925 (-0.15%)
instructions in affected programs: 349868 -> 329582 (-5.80%)
helped: 1669
HURT: 1
helped stats (abs) min: 1 max: 264 x̄: 12.57 x̃: 10
helped stats (rel) min: 0.12% max: 46.81% x̄: 6.86% x̃: 6.49%
HURT stats (abs)   min: 700 max: 700 x̄: 700.00 x̃: 700
HURT stats (rel)   min: 64.34% max: 64.34% x̄: 64.34% x̃: 64.34%
95% mean confidence interval for instructions value: -13.25 -11.04
95% mean confidence interval for instructions %-change: -7.01% -6.63%
Instructions are helped.

total cycles in shared programs: 375763544 -> 375535444 (-0.06%)
cycles in affected programs: 6932686 -> 6704586 (-3.29%)
helped: 1622
HURT: 48
helped stats (abs) min: 2 max: 12229 x̄: 148.31 x̃: 68
helped stats (rel) min: 0.06% max: 74.03% x̄: 5.94% x̃: 4.12%
HURT stats (abs)   min: 3 max: 7451 x̄: 259.44 x̃: 41
HURT stats (rel)   min: 0.05% max: 54.99% x̄: 8.52% x̃: 2.88%
95% mean confidence interval for cycles value: -159.86 -113.31
95% mean confidence interval for cycles %-change: -5.86% -5.18%
Cycles are helped.

total spills in shared programs: 23258 -> 23401 (0.61%)
spills in affected programs: 54 -> 197 (264.81%)
helped: 4
HURT: 2

total fills in shared programs: 34775 -> 34850 (0.22%)
fills in affected programs: 52 -> 127 (144.23%)
helped: 4
HURT: 1

LOST:   5
GAINED: 0

Ivy Bridge
total instructions in shared programs: 11996051 -> 11977964 (-0.15%)
instructions in affected programs: 346679 -> 328592 (-5.22%)
helped: 1508
HURT: 0
helped stats (abs) min: 2 max: 198 x̄: 11.99 x̃: 10
helped stats (rel) min: 0.26% max: 19.83% x̄: 5.73% x̃: 5.43%
95% mean confidence interval for instructions value: -12.65 -11.34
95% mean confidence interval for instructions %-change: -5.86% -5.60%
Instructions are helped.

total cycles in shared programs: 179891389 -> 179691339 (-0.11%)
cycles in affected programs: 7869479 -> 7669429 (-2.54%)
helped: 1485
HURT: 23
helped stats (abs) min: 1 max: 12615 x̄: 136.16 x̃: 54
helped stats (rel) min: 0.02% max: 71.84% x̄: 4.69% x̃: 3.49%
HURT stats (abs)   min: 1 max: 403 x̄: 93.48 x̃: 6
HURT stats (rel)   min: 0.04% max: 34.01% x̄: 8.68% x̃: 0.81%
95% mean confidence interval for cycles value: -154.59 -110.73
95% mean confidence interval for cycles %-change: -4.79% -4.19%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10829247 -> 10828844 (<.01%)
instructions in affected programs: 21258 -> 20855 (-1.90%)
helped: 88
HURT: 0
helped stats (abs) min: 2 max: 17 x̄: 4.58 x̃: 5
helped stats (rel) min: 0.52% max: 3.92% x̄: 2.05% x̃: 2.21%
95% mean confidence interval for instructions value: -5.03 -4.13
95% mean confidence interval for instructions %-change: -2.21% -1.89%
Instructions are helped.

total cycles in shared programs: 154035437 -> 154024591 (<.01%)
cycles in affected programs: 430176 -> 419330 (-2.52%)
helped: 78
HURT: 10
helped stats (abs) min: 2 max: 4649 x̄: 143.06 x̃: 32
helped stats (rel) min: 0.05% max: 6.02% x̄: 2.03% x̃: 1.07%
HURT stats (abs)   min: 3 max: 265 x̄: 31.30 x̃: 6
HURT stats (rel)   min: 0.10% max: 8.67% x̄: 1.03% x̃: 0.21%
95% mean confidence interval for cycles value: -232.53 -13.97
95% mean confidence interval for cycles %-change: -2.13% -1.23%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8137402 -> 8137248 (<.01%)
instructions in affected programs: 2280 -> 2126 (-6.75%)
helped: 10
HURT: 0
helped stats (abs) min: 12 max: 19 x̄: 15.40 x̃: 15
helped stats (rel) min: 3.90% max: 11.73% x̄: 7.19% x̃: 6.95%
95% mean confidence interval for instructions value: -17.69 -13.11
95% mean confidence interval for instructions %-change: -8.99% -5.39%
Instructions are helped.

total cycles in shared programs: 188538716 -> 188583424 (0.02%)
cycles in affected programs: 69326 -> 114034 (64.49%)
helped: 0
HURT: 10
HURT stats (abs)   min: 2068 max: 7686 x̄: 4470.80 x̃: 4870
HURT stats (rel)   min: 27.20% max: 173.66% x̄: 69.55% x̃: 59.41%
95% mean confidence interval for cycles value: 2830.86 6110.74
95% mean confidence interval for cycles %-change: 39.18% 99.91%
Cycles are HURT.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick
96fcb3f95b nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements
This just eliminates tautological / contradictory compares that are used
for bcsel and other non-if-statement cases.  If-statements are not
affected because removing flow control can cause the i965 instrution
scheduler to create some very long live ranges resulting in unncessary
spilling.  This causes some shaders to fall of a performance cliff.

Since many small if-statements are already flattened to bcsel, this
optimization covers more than 68% of the possible cases (2417 shaders
helped for instructions on Skylake vs. 3554).

v2: Reorder and add whitespace to make the relationship between the
patterns more obvious.  Suggested by Caio.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16333474 -> 16322028 (-0.07%)
instructions in affected programs: 438559 -> 427113 (-2.61%)
helped: 1765
HURT: 0
helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4
helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82%
95% mean confidence interval for instructions value: -6.87 -6.10
95% mean confidence interval for instructions %-change: -4.30% -3.84%
Instructions are helped.

total cycles in shared programs: 367608554 -> 367511103 (-0.03%)
cycles in affected programs: 8368829 -> 8271378 (-1.16%)
helped: 1541
HURT: 129
helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39
helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17%
HURT stats (abs)   min: 1 max: 973 x̄: 42.25 x̃: 10
HURT stats (rel)   min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60%
95% mean confidence interval for cycles value: -64.90 -51.81
95% mean confidence interval for cycles %-change: -3.89% -3.36%
Cycles are helped.

total spills in shared programs: 8867 -> 8868 (0.01%)
spills in affected programs: 18 -> 19 (5.56%)
helped: 0
HURT: 1

total fills in shared programs: 21900 -> 21903 (0.01%)
fills in affected programs: 78 -> 81 (3.85%)
helped: 0
HURT: 1

All Gen6 and earlier platforms had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10829877 -> 10829247 (<.01%)
instructions in affected programs: 30240 -> 29610 (-2.08%)
helped: 177
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3
helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94%
95% mean confidence interval for instructions value: -3.93 -3.18
95% mean confidence interval for instructions %-change: -3.04% -2.32%
Instructions are helped.

total cycles in shared programs: 154036580 -> 154035437 (<.01%)
cycles in affected programs: 352402 -> 351259 (-0.32%)
helped: 96
HURT: 28
helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6
helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46%
HURT stats (abs)   min: 1 max: 117 x̄: 9.68 x̃: 4
HURT stats (rel)   min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23%
95% mean confidence interval for cycles value: -13.40 -5.03
95% mean confidence interval for cycles %-change: -1.62% -0.53%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick
fa116ce357 nir/range-analysis: Range tracking for ffma and flrp
A similar technique could be used for fmin3, fmax3, and fmid3.

This could be squashed with the previous commit.  I kept it separate to
ease review.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick
586602c5d9 nir/range-analysis: Range tracking for bcsel
This could be squashed with the previous commit.  I kept it separate to
ease review.

v2: Add some missing cases.  Use nir_src_is_const helper.  Both
suggested by Caio.  Use a table for mapping source ranges to a result
range.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick
3009cbed50 nir/range-analysis: Tighten the range of fsat based on the range of its source
This could be squashed with the previous commit.  I kept it separate to
ease review.

v2: Use a switch statement and add more comments.  Both suggested by
Caio.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick
405de7ccb6 nir/range-analysis: Rudimentary value range analysis pass
Most integer operations are omitted because dealing with integer
overflow is hard.  There are a few things that could be smarter if there
was a small amount more tracking of ranges of integer types (i.e.,
operands are Boolean, operand values fit in 16 bits, etc.).

The changes to nir_search_helpers.h are included in this patch to
simplify reordering the changes to nir_opt_algebraic.py.

v2: Memoize range analysis results.  Without this, some shaders appear
to get stuck in infinite loops.

v3: Rebase on many months of Mesa changes, including 1-bit Boolean
changes.

v4: Rebase on "nir: Drop imov/fmov in favor of one mov instruction".

v5: Use nir_alu_srcs_equal for detecting (a*a).  Previously just the SSA
value was compared, and this incorrectly matched (a.x*a.y).

v6: Many code improvements including (but not limited to) better names,
more comments, and better use of helper functions.  All suggested by
Caio.  Rework the handling of several opcodes to use a table for mapping
source ranges to a result range.  This change fixed a bug that caused
fmax(gt_zero, ge_zero) to be incorrectly recognized as ge_zero.
Slightly tighten the range of fmul by recognizing that x*x is gt_zero if
x is gt_zero.  Add similar handling for -x*x.

v7: Use _______ in the tables as an alias for unknown.  Suggested by
Caio.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick
d24edb4b8c nir/algebraic: Simplify some comparisons like a+constant < constant
v2: Remove unsafe integer versions of the optimizations.  This change
had no effect on shader-db results.  Suggested by Caio.

All Gen6+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16333713 -> 16332631 (<.01%)
instructions in affected programs: 258112 -> 257030 (-0.42%)
helped: 1275
HURT: 407
helped stats (abs) min: 1 max: 7 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.20% max: 8.33% x̄: 1.33% x̃: 0.86%
HURT stats (abs)   min: 1 max: 2 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.11% max: 2.94% x̄: 0.98% x̃: 0.98%
95% mean confidence interval for instructions value: -0.70 -0.59
95% mean confidence interval for instructions %-change: -0.84% -0.70%
Instructions are helped.

total cycles in shared programs: 367596791 -> 367601268 (<.01%)
cycles in affected programs: 3420062 -> 3424539 (0.13%)
helped: 1553
HURT: 783
helped stats (abs) min: 1 max: 742 x̄: 24.36 x̃: 6
helped stats (rel) min: 0.05% max: 21.12% x̄: 1.47% x̃: 0.65%
HURT stats (abs)   min: 1 max: 557 x̄: 54.04 x̃: 14
HURT stats (rel)   min: 0.01% max: 33.66% x̄: 3.36% x̃: 1.43%
95% mean confidence interval for cycles value: -1.60 5.43
95% mean confidence interval for cycles %-change: -0.03% 0.33%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 2

Iron Lake
total instructions in shared programs: 8137992 -> 8137874 (<.01%)
instructions in affected programs: 17501 -> 17383 (-0.67%)
helped: 104
HURT: 2
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.25% max: 2.63% x̄: 0.87% x̃: 0.72%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45%
95% mean confidence interval for instructions value: -1.22 -1.00
95% mean confidence interval for instructions %-change: -0.94% -0.76%
Instructions are helped.

total cycles in shared programs: 188540038 -> 188539650 (<.01%)
cycles in affected programs: 704574 -> 704186 (-0.06%)
helped: 125
HURT: 84
helped stats (abs) min: 2 max: 96 x̄: 6.45 x̃: 4
helped stats (rel) min: <.01% max: 3.47% x̄: 0.42% x̃: 0.25%
HURT stats (abs)   min: 2 max: 58 x̄: 4.98 x̃: 4
HURT stats (rel)   min: 0.01% max: 2.75% x̄: 0.36% x̃: 0.33%
95% mean confidence interval for cycles value: -3.20 -0.52
95% mean confidence interval for cycles %-change: -0.19% -0.03%
Cycles are helped.

GM45
total instructions in shared programs: 5008889 -> 5008830 (<.01%)
instructions in affected programs: 8824 -> 8765 (-0.67%)
helped: 52
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.25% max: 2.38% x̄: 0.86% x̃: 0.72%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45%
95% mean confidence interval for instructions value: -1.27 -0.95
95% mean confidence interval for instructions %-change: -0.96% -0.71%
Instructions are helped.

total cycles in shared programs: 128969426 -> 128969128 (<.01%)
cycles in affected programs: 399798 -> 399500 (-0.07%)
helped: 74
HURT: 30
helped stats (abs) min: 2 max: 22 x̄: 6.76 x̃: 6
helped stats (rel) min: <.01% max: 1.83% x̄: 0.46% x̃: 0.29%
HURT stats (abs)   min: 2 max: 58 x̄: 6.73 x̃: 6
HURT stats (rel)   min: 0.06% max: 2.75% x̄: 0.42% x̃: 0.21%
95% mean confidence interval for cycles value: -4.60 -1.14
95% mean confidence interval for cycles %-change: -0.32% -0.08%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick
7c64cbf49d nir/algebraic: Recognize (a < 0 || 0 < b) as min(a, -b) < 0
Similar to commit 97e6c1b9 and f5cf74d8ba.

First apply 0 < b => -b < 0 to get (a < 0 || -b < 0), then apply some
pre-existing rules to get min(a, -b) < 0.

v2: Substantially update the comment explaining the use of is_used_once
and the duplication of patterns.  Suggested by Caio.  Also, while flt
and fge are not commutative, ior and iand are.  Half of the original
patterns were redundant, so delete them.  As alternate justification for
deleting them, fmin(a, -b) < 0 <=> 0 < fmax(-a, b).  Proof left as an
exercise for the reader.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16333789 -> 16333713 (<.01%)
instructions in affected programs: 11424 -> 11348 (-0.67%)
helped: 32
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 2
helped stats (rel) min: 0.20% max: 1.67% x̄: 0.76% x̃: 0.69%
95% mean confidence interval for instructions value: -3.03 -1.72
95% mean confidence interval for instructions %-change: -0.89% -0.62%
Instructions are helped.

total cycles in shared programs: 367598295 -> 367596791 (<.01%)
cycles in affected programs: 141414 -> 139910 (-1.06%)
helped: 23
HURT: 6
helped stats (abs) min: 3 max: 386 x̄: 72.52 x̃: 20
helped stats (rel) min: 0.15% max: 4.86% x̄: 1.01% x̃: 0.76%
HURT stats (abs)   min: 4 max: 88 x̄: 27.33 x̃: 12
HURT stats (rel)   min: 0.22% max: 3.95% x̄: 1.08% x̃: 0.59%
95% mean confidence interval for cycles value: -93.51 -10.21
95% mean confidence interval for cycles %-change: -1.10% -0.05%
Cycles are helped.

total instructions in shared programs: 10830836 -> 10830779 (<.01%)
instructions in affected programs: 6895 -> 6838 (-0.83%)
helped: 12
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 4.75 x̃: 1
helped stats (rel) min: 0.14% max: 1.61% x̄: 0.65% x̃: 0.33%
95% mean confidence interval for instructions value: -8.46 -1.04
95% mean confidence interval for instructions %-change: -1.03% -0.27%
Instructions are helped.

total cycles in shared programs: 154028477 -> 154032740 (<.01%)
cycles in affected programs: 178433 -> 182696 (2.39%)
helped: 3
HURT: 9
helped stats (abs) min: 3 max: 20 x̄: 11.00 x̃: 10
helped stats (rel) min: 0.07% max: 0.20% x̄: 0.12% x̃: 0.09%
HURT stats (abs)   min: 27 max: 1415 x̄: 477.33 x̃: 262
HURT stats (rel)   min: 0.22% max: 6.45% x̄: 2.49% x̃: 1.76%
95% mean confidence interval for cycles value: 28.68 681.82
95% mean confidence interval for cycles %-change: 0.37% 3.30%
Cycles are HURT.

Iron Lake
total instructions in shared programs: 8137966 -> 8137992 (<.01%)
instructions in affected programs: 3281 -> 3307 (0.79%)
helped: 0
HURT: 6
HURT stats (abs)   min: 3 max: 7 x̄: 4.33 x̃: 3
HURT stats (rel)   min: 0.63% max: 1.01% x̄: 0.76% x̃: 0.64%
95% mean confidence interval for instructions value: 2.17 6.50
95% mean confidence interval for instructions %-change: 0.56% 0.96%
Instructions are HURT.

total cycles in shared programs: 188539386 -> 188540038 (<.01%)
cycles in affected programs: 103826 -> 104478 (0.63%)
helped: 0
HURT: 7
HURT stats (abs)   min: 16 max: 218 x̄: 93.14 x̃: 80
HURT stats (rel)   min: 0.14% max: 0.95% x̄: 0.53% x̃: 0.46%
95% mean confidence interval for cycles value: 10.26 176.02
95% mean confidence interval for cycles %-change: 0.24% 0.81%
Cycles are HURT.

GM45
total instructions in shared programs: 5008876 -> 5008889 (<.01%)
instructions in affected programs: 1645 -> 1658 (0.79%)
helped: 0
HURT: 3
HURT stats (abs)   min: 3 max: 7 x̄: 4.33 x̃: 3
HURT stats (rel)   min: 0.63% max: 1.00% x̄: 0.76% x̃: 0.63%

total cycles in shared programs: 128968950 -> 128969426 (<.01%)
cycles in affected programs: 64854 -> 65330 (0.73%)
helped: 0
HURT: 4
HURT stats (abs)   min: 18 max: 218 x̄: 119.00 x̃: 120
HURT stats (rel)   min: 0.14% max: 0.95% x̄: 0.60% x̃: 0.66%
95% mean confidence interval for cycles value: -62.92 300.92
95% mean confidence interval for cycles %-change: -0.05% 1.26%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick
92b75c126b nir/algebraic: Replace checks that a value is between (or not) [0, 1]
v2: Add an extra line to one of the proofs.  Suggested by Caio.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16329772 -> 16329427 (<.01%)
instructions in affected programs: 41980 -> 41635 (-0.82%)
helped: 110
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 3.14 x̃: 2
helped stats (rel) min: 0.19% max: 5.56% x̄: 1.12% x̃: 0.94%
95% mean confidence interval for instructions value: -4.10 -2.17
95% mean confidence interval for instructions %-change: -1.28% -0.96%
Instructions are helped.

total cycles in shared programs: 367551273 -> 367549979 (<.01%)
cycles in affected programs: 492462 -> 491168 (-0.26%)
helped: 76
HURT: 25
helped stats (abs) min: 1 max: 400 x̄: 42.86 x̃: 12
helped stats (rel) min: 0.06% max: 10.72% x̄: 1.23% x̃: 0.75%
HURT stats (abs)   min: 2 max: 730 x̄: 78.52 x̃: 16
HURT stats (rel)   min: 0.17% max: 6.89% x̄: 2.08% x̃: 1.23%
95% mean confidence interval for cycles value: -37.79 12.16
95% mean confidence interval for cycles %-change: -0.90% 0.07%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 2

Sandy Bridge
total instructions in shared programs: 10831115 -> 10830836 (<.01%)
instructions in affected programs: 37830 -> 37551 (-0.74%)
helped: 70
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 3.99 x̃: 2
helped stats (rel) min: 0.33% max: 7.14% x̄: 1.21% x̃: 0.97%
95% mean confidence interval for instructions value: -5.47 -2.50
95% mean confidence interval for instructions %-change: -1.49% -0.92%
Instructions are helped.

total cycles in shared programs: 154029323 -> 154028477 (<.01%)
cycles in affected programs: 247909 -> 247063 (-0.34%)
helped: 52
HURT: 6
helped stats (abs) min: 2 max: 254 x̄: 25.81 x̃: 4
helped stats (rel) min: 0.07% max: 4.39% x̄: 0.81% x̃: 0.19%
HURT stats (abs)   min: 4 max: 403 x̄: 82.67 x̃: 8
HURT stats (rel)   min: 0.18% max: 1.60% x̄: 0.71% x̃: 0.53%
95% mean confidence interval for cycles value: -34.83 5.65
95% mean confidence interval for cycles %-change: -0.98% -0.32%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake
total instructions in shared programs: 8138007 -> 8137966 (<.01%)
instructions in affected programs: 4060 -> 4019 (-1.01%)
helped: 31
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.32 x̃: 1
helped stats (rel) min: 0.68% max: 8.33% x̄: 1.45% x̃: 0.90%
95% mean confidence interval for instructions value: -1.50 -1.15
95% mean confidence interval for instructions %-change: -2.11% -0.79%
Instructions are helped.

total cycles in shared programs: 188539492 -> 188539386 (<.01%)
cycles in affected programs: 26280 -> 26174 (-0.40%)
helped: 25
HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 4.24 x̃: 4
helped stats (rel) min: 0.08% max: 2.11% x̄: 0.54% x̃: 0.50%
95% mean confidence interval for cycles value: -5.08 -3.40
95% mean confidence interval for cycles %-change: -0.70% -0.37%
Cycles are helped.

GM45
total instructions in shared programs: 5008897 -> 5008876 (<.01%)
instructions in affected programs: 2096 -> 2075 (-1.00%)
helped: 16
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.31 x̃: 1
helped stats (rel) min: 0.68% max: 7.69% x̄: 1.41% x̃: 0.89%
95% mean confidence interval for instructions value: -1.57 -1.06
95% mean confidence interval for instructions %-change: -2.32% -0.49%
Instructions are helped.

total cycles in shared programs: 128969020 -> 128968950 (<.01%)
cycles in affected programs: 18490 -> 18420 (-0.38%)
helped: 15
HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 4.67 x̃: 4
helped stats (rel) min: 0.08% max: 2.11% x̄: 0.51% x̃: 0.48%
95% mean confidence interval for cycles value: -6.03 -3.30
95% mean confidence interval for cycles %-change: -0.78% -0.24%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Jonathan Marek
a44b4200f3 tgsi_to_nir: fix nir_gather_ssa_types for TGSI->NIR shaders
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
2019-08-05 22:09:47 -04:00
Jason Ekstrand
f6e7de41d7 anv: Implement VK_EXT_line_rasterization
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-06 02:05:28 +00:00
Jason Ekstrand
f03512f90b genxml: Rename 3DSTATE_SF::Anti-Aliasing Enable
This makes it consistent with the new name when it's moved to
3DSTATE_RASTER.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-06 02:05:28 +00:00
Jason Ekstrand
abf9e10488 anv: Use dirty bits for dynamic state tracking
Previously, we assumed that the dirty bit was always 1 << VK_DYNAMIC_*
and this assumption is about to be false.  Extensions which define new
VK_DYNAMIC_* enums won't be nice and tightly packed which this really
requires.  Instead, add functions to don the conversions and rework the
bits a bit.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-06 02:05:28 +00:00
Jason Ekstrand
aa13f75f01 anv: Advertise the right line width range on gen9 and CHV
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-06 02:05:28 +00:00
Alyssa Rosenzweig
77295b1fdc meson: Add panfrost to the --auto list
Look ma, we're a real driver now! I was waiting until Panfrost
stabilises a bit for this, but now that 19.2 is almost here, let's make
us official :)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-08-05 17:42:05 -07:00
Erico Nunes
360bda0b1d lima/ppir: enable lower_vector_cmp to lower fall_equal
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-05 23:36:46 +02:00
Erico Nunes
9e8f8dbcd1 lima: re-run nir_opt_algebraic after int lowering
nir_lower_int_to_float is currently only meant to run once, and some ops
must be lowered after being converted from int ops to be implementable,
so re-run nir_opt_algebraic after lowering ints to floats.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-05 23:36:35 +02:00
Alyssa Rosenzweig
3db4949197 pan/midgard: Extend SSA concurrency checks to other args
No glmark changes, but this seems like a good idea.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-05 11:22:49 -07:00
Alyssa Rosenzweig
2869758355 pan/midgard: Rewrite bidirectionally when eliminating moves
Symptom: the sky is black in SuperTuxKart (flashbacks to SMB/NES
emulation intensify).

Essentially, what happened is a fixed (special) move to r0 was
eliminated but scheduling did not factor this in, so
can_run_concurrent_ssa returned true even when there was a logical data
dependency that needed to be resolved.

Fixes: 20771ede1c ("pan/midgard: Add post-RA move elimination")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-05 10:58:39 -07:00
Danylo Piliaiev
04a9951580 intel/compiler: add ability to override shader's assembly
When dumping shader's assembly with INTEL_DEBUG=vs,tcs,...
sha1 of the resulting assembly is also printed, having environment
variable INTEL_SHADER_ASM_READ_PATH present driver will try to
load a "%sha1%.bin" file from the path and substitute current
assembly with the one from the file.

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-05 17:19:09 +00:00
Danylo Piliaiev
430823c96b intel/tools: add binary output type to i965_asm
Add '-t,--type' command line option to specify the output type
which can be 'bin', 'c_literal' or 'hex'.

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-08-05 17:19:09 +00:00
Alyssa Rosenzweig
1f8b653acb panfrost: Add app blacklist
In preparation for an initial 19.2 release, add a blacklist for apps
known to be buggy under Panfrost to protect users. Panfrost is NOT a
conformant implementation at this time.

Distros: please do not revert this patch. If blacklisted apps are run
using Panfrost, dragons will bite you. Thanks :)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-08-05 16:04:47 +00:00
Kenneth Graunke
64b73b770b iris: Fix bad external BO hash table and zombie list interactions
A while ago, we started deferring GEM object closure and VMA release
until buffers were idle.  This had some unforeseen interactions with
external buffers.

We keep imported buffers in hash tables, so if we have repeated imports
of the same GEM object, we map those to the same iris_bo structure.
This is critical for several reasons.  Unfortunately, we broke this
assumption.  When freeing a non-idle external buffer, we would drop it
from the hash tables, then move it to the zombie list.  If someone
reimported the same GEM object, we would not find it in the hash tables,
and go ahead and make a second iris_bo for that GEM object.  But the old
iris_bo would still be in the zombie list, and so we would eventually
call GEM_CLOSE on it - closing a BO that should have still been live.

To work around this, we defer removing a BO from the hash tables until
it's actually fully closed.  This has the strange effect that an
external BO may be on the zombie list, and yet be resurrected before
it can be properly cleaned up.  In this case, we remove it from the
list so it won't be freed.

Fixes severe instability in Weston, which was hitting EINVALs and
ENOENTs from execbuf2, due to batches referring to a GEM object that
had been closed, or at least had its VMA torched.

Fixes: 457a55716e ("iris: Defer closing and freeing VMA until buffers are idle.")
2019-08-05 08:53:41 -07:00
Kenneth Graunke
48e5a99d86 iris/bufmgr: Move iris_bo_reference into hash_find_bo, rename it
Everybody importing an external buffer was looking it up in the hash
table, then referencing it.  We can just do that in the helper instead,
which also gives us a convenient spot to stash extra code shortly.
2019-08-05 08:53:07 -07:00
Ahmad Fatoum
4f75ea57c2 gallium: add stm DRM entry point
The STM32MP157 features a Vivante GC400 GPU supported by etnaviv.
Add a DRM entry point for the STM display controller, so mesa
can be used with it.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-05 14:53:31 +00:00
Eric Engestrom
c251e2e662 gitlab-ci: don't remove a package we don't install anymore
Fixes: 85dace1c0b ("gitlab-ci: remove software-properties-common")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-08-05 15:43:26 +01:00
Andrii Simiklit
dc471f2ef8 etnaviv: fix a null pointer dereference
This issue was found by cppcheck

Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2019-08-05 15:31:43 +02:00
Connor Abbott
74470baebb ac/nir: Lower large indirect variables to scratch
results from radeonsi NIR:

Totals from affected shaders:
SGPRS: 704 -> 464 (-34.09 %)
VGPRS: 2056 -> 672 (-67.32 %)
Spilled SGPRs: 24 -> 0 (-100.00 %)
Spilled VGPRs: 28406 -> 0 (-100.00 %)
Private memory VGPRs: 0 -> 3182 (0.00 %)
Scratch size: 1064 -> 3228 (203.38 %) dwords per thread
Code Size: 935260 -> 40180 (-95.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 28 -> 70 (150.00 %)
Wait states: 0 -> 0 (0.00 %)

results from radv:

Totals from affected shaders:
SGPRS: 80 -> 48 (-40.00 %)
VGPRS: 204 -> 108 (-47.06 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 256 (0.00 %) dwords per thread
Code Size: 15792 -> 9504 (-39.82 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1 -> 2 (100.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-05 11:45:18 +02:00
Timothy Arceri
3c9144f9e5 drirc: Add discard workaround for Divinity: Original Sin EE
This adds an additional work around for the game to fix the blocky
shadows as reported in bug 105282

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105282
2019-08-05 15:35:00 +10:00
Erico Nunes
486b33558a lima/ppir: simplify load uni/temp op lowering and scheduling
The load uniform/temporary operations output only to a pipeline
register, which must be consumed by another op in the same instruction
later.
The current implementation delays the decision of who will consume this
result to until the scheduling step. If the consumer node is not able to
use the pipeline register, a mov node may have to be created, during the
scheduler step.

As part of the ppir scheduler simplification, and now that the ppir
scheduler supports pipeline register dependencies, this can be
simplified by always creating a single mov node outputting to a normal
register that can be used directly by all consumers.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-04 13:38:19 +02:00
Erico Nunes
fd29c4d6c5 lima/ppir: simplify select op lowering and scheduling
The select operation relies on the select condition coming from the
result of the the alu scalar mult slot, in the same instruction.
The current implementation creates a mov node to be the predecessor of
select, and then relies on an exception during scheduling to ensure that
both ops are inserted in the same instruction.

Now that the ppir scheduler supports pipeline register dependencies,
this can be simplified by making the mov explicitly output to the fmul
pipeline register, and the scheduler can place it without an exception.

Since the select condition can only be placed in the scalar mult slot,
differently than a regular mov, define a separate op for it.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-04 13:38:18 +02:00
Erico Nunes
eb82637c2f lima/ppir: support pipeline registers in scheduler
The ppir scheduler grew to be rather complicated and containing many
exceptions as it also has to take care of inserting additional nodes
when it is mandatory for nodes to be in the same instruction.
As such, the lima lowering and scheduling process can be difficult to
understand and maintain.
The ppir lowering step created nodes hoping that the scheduler would
notice the exception and do the right thing.

This proposal adds a simple refactor to the scheduler so that it places
nodes with pipeline registers in the same instruction.
With the scheduler handling this in a general way, it is possible to
create same-instruction dependencies by using pipeline registers during
the lowering stage.
This is simpler to maintain because now we can make these dependencies
explicit in a single place (lowering), and we can drop exceptions from
scheduling.
Reducing the complexity of the scheduler is also useful as preparatory
work to support control flow in ppir.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-04 13:38:11 +02:00
Eric Engestrom
a1da8eccbe docs: fix "empty array" meson syntax
On recent versions of Meson (0.47+) these are synonymous, but we still
support older versions than that, so let's use the correct syntax to
avoid confusing users of old Meson versions.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-04 12:21:19 +01:00
Eric Engestrom
1361ab3c82 egl: drop unnecessary function deref
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2019-08-04 11:26:20 +01:00
Eric Engestrom
e7e3fd5c03 glx: drop unnecessary pointer deref for function calls
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2019-08-04 11:26:20 +01:00
Eric Engestrom
9668d7f539 introduce c11_compat.h to provide C11 things in C99
Right now, all it does is provide the new standard `static_assert()` name.

Fixes: fbf7c38da3 ("egl/wayland: use bitset.h for `formats` bit set")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Bhushan Shah <bshah@kde.org>
2019-08-04 11:14:25 +01:00
Eric Engestrom
64ffc289be travis: add MacOS Scons build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2019-08-04 11:11:32 +01:00
Eric Engestrom
8f1cdac793 symbols-check: fix nm invocation on MacOS
According to Mac OSX's man page [1], this is how we should get the list
of exported symbols:
  nm -g -P foo.dylib

-g to only show the exported symbols
-P to show it in a "portable" format, ie. readable by a script

Since this is supported by GNU nm as well, let's use that everywhere,
although some care needs to be taken as there are some differences in
the output.

[1] https://www.unix.com/man-page/osx/1/nm/

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-04 11:06:27 +01:00
Eric Engestrom
59f8809f3c symbols-check: discard platform symbols early
(as the comment there already claimed)

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-04 11:06:27 +01:00
Eric Engestrom
81b3d141b3 symbols-check: skip test if we can't get the symbols list
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-04 11:06:27 +01:00
Vasily Khoruzhick
c780af7771 lima/ppir: move alu vec to scalar lowering into NIR
Utgard PP is vec4, but some operations are scalar, utilize
NIR vec to scalar lowering pass and indicate operations that we
want to lower.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-04 02:17:12 +00:00
Jason Ekstrand
aebca3961b iris: Fix handling of SIMD32 fragment shaders
The brw_wm_prog_data_dispatch_grf_start_reg and _prog_offset helpers
read the _NPixelDispatchEnable fields from 3DSTATE_PS to figure out
which bits to pull out of the prog data and stuff where.  Therefore,
they need to be called with the final set of _NPixelDispatchEnable bits
after we've done the workaround for SIMD32 and 16x MSAA.  Otherwise, if
you end up with a somewhat odd combination of enables, the GRF start reg
and KSP data ends up in the wrong slots.  In particular, running
SIMD32-only is broken but several other combinations are as well.

Fixes: 5445c176e2 "iris: Disable SIMD32 when using a 16x MSAA..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-03 22:24:40 +00:00
Bas Nieuwenhuizen
9f37c9903b mesa: Rename GLX_USE_TLS to USE_ELF_TLS.
These days it is not GLX only and it does not work with all TLS
implementations.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-03 20:18:17 +02:00
Bas Nieuwenhuizen
d7ca1efc6c meson: Do not use GLX_USE_TLS on Android.
The asm code expects a specific kind of implementation, but Android
uses something different (emutls).

Turns out mesa has a fallback with pthread_getspecific, with an
optimizaiton if only a single thread is used. emutls also uses
getspecific, so lets just use the optimized mesa implementation.

Fixes: 20294dceeb "mesa: Enable asm unconditionally, now that gen_matypes is gone."
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-03 18:40:04 +02:00
Christian Gmeiner
2dd598c129 etnaviv: s/boolean/bool
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <philipp.zabel@gmail.com>
2019-08-03 12:32:28 +02:00
Andreas Baierl
5254e53deb lima/ppir: Add gl_FrontFace handling
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-08-03 08:04:12 +00:00
Jason Ekstrand
b62b0cfa71 intel/nir: Add 1-bit opcodes to brw_cmod_for_nir_comparison_op
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-03 00:35:48 +00:00
Jason Ekstrand
c02c3ff612 intel/nir: Add a common nir comparison -> cmod helper
We already had one in the vec4 code, we just had move it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-03 00:35:48 +00:00
Eric Engestrom
2fd30e3722 util: fix pointer type on NetBSD
NetBSD expects a `void *` argument [1] as the printf-style arguments to
the formatting string, so we need to cast the `const` away.

[1] https://netbsd.gw.com/cgi-bin/man-cgi?pthread_setname_np++NetBSD-current

Suggested-by: Kamil Rytarowski <n54@gmx.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-03 00:20:21 +00:00
Eric Engestrom
b558fa4dfe meson: remove unused field
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Eric Engestrom
9a07606b84 meson: replace last uses of libxmlconfig with idep_xmlconfig
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Eric Engestrom
178811d8f6 meson: drop unused dep_{thread,dl}
Unused as of last commit.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Eric Engestrom
d2d85b950d meson: replace libmesa_util with idep_mesautil
This automates the include_directories and dependencies tracking so that
all users of libmesa_util don't need to add them manually.

Next commit will remove the ones that were only added for that reason.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Alyssa Rosenzweig
8ddb38209d pan/midgard: Print texture outmod
I have no idea who thought this was a good idea.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 16:54:53 -07:00
Alyssa Rosenzweig
ad864a0bbb pan/midgard: Promote all 16 uniforms
Now that register spilling is in place, this is reasonable. It turns out
for some shaders, it's actually better to cap at 8 work registers and
extra >8 uniform reigsters and tolerate the spilling, since the extra
resulting threads make up for the spillage. So incidentally, the shader
that spills here is in -bterrain, which jumps from 19fps to 21fps as a
result of this change.

total instructions in shared programs: 3513 -> 3448 (-1.85%)
instructions in affected programs: 776 -> 711 (-8.38%)
helped: 20
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 3.25 x̃: 2
helped stats (rel) min: 3.57% max: 16.00% x̄: 8.37% x̃: 7.19%
95% mean confidence interval for instructions value: -4.28 -2.22
95% mean confidence interval for instructions %-change: -10.02% -6.73%
Instructions are helped.

total bundles in shared programs: 2067 -> 2024 (-2.08%)
bundles in affected programs: 515 -> 472 (-8.35%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 6 x̄: 2.37 x̃: 2
helped stats (rel) min: 2.13% max: 17.86% x̄: 10.19% x̃: 11.11%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 3.23% max: 3.23% x̄: 3.23% x̃: 3.23%
95% mean confidence interval for bundles value: -3.01 -1.29
95% mean confidence interval for bundles %-change: -12.13% -6.91%
Bundles are helped.

total quadwords in shared programs: 3468 -> 3426 (-1.21%)
quadwords in affected programs: 764 -> 722 (-5.50%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 5 x̄: 2.26 x̃: 2
helped stats (rel) min: 1.41% max: 12.50% x̄: 6.76% x̃: 7.14%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.08% max: 1.08% x̄: 1.08% x̃: 1.08%
95% mean confidence interval for quadwords value: -2.83 -1.37
95% mean confidence interval for quadwords %-change: -8.08% -4.65%
Quadwords are helped.

total registers in shared programs: 383 -> 360 (-6.01%)
registers in affected programs: 112 -> 89 (-20.54%)
helped: 19
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 1.21 x̃: 1
helped stats (rel) min: 12.50% max: 27.27% x̄: 20.63% x̃: 20.00%
95% mean confidence interval for registers value: -1.47 -0.95
95% mean confidence interval for registers %-change: -22.39% -18.87%
Registers are helped.

total threads in shared programs: 432 -> 451 (4.40%)
threads in affected programs: 19 -> 38 (100.00%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.73 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.41 2.04
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are [helped].

total loops in shared programs: 4 -> 4 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 0 -> 4
spills in affected programs: 0 -> 4
helped: 0
HURT: 2

total fills in shared programs: 0 -> 7
fills in affected programs: 0 -> 7
helped: 0
HURT: 2

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 16:52:21 -07:00
Alyssa Rosenzweig
e94239b9a4 pan/midgard: Break mir_spill_register into its function
No functional changes, just breaks out a megamonster function and fixes
the indentation.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 16:52:21 -07:00
Alyssa Rosenzweig
d4bcca19da pan/midgard: Switch sources to an array for trinary sources
We need three independent sources to support indirect SSBO writes (as
well as textures with both LOD/bias and offsets). Now is a good time to
make sources just an array so we don't have to rewrite a ton of code if
we ever needed a fourth source for some reason.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 16:48:54 -07:00
Alyssa Rosenzweig
513d02cfeb pan/midgard: Remove "r27-only" register class
As far as I know, there's no such thing as a load/store op that only
takes its argument in r27. We just need to set the appropriate arg_1
field in the RA to specify other registers if we want them.

To facilitate this, various RA-related changes are needed across the
compiler ; this should also fix indirect offsets which were implicitly
interpreted as "r27-only" despite not even passing through RA yet. One
ripple effect change is switching the move insertion point and adjusting
the liveness analysis accordingly, so while this was intended as a
purely functional change, there are some shader-db changes:

total instructions in shared programs: 3511 -> 3498 (-0.37%)
instructions in affected programs: 563 -> 550 (-2.31%)
helped: 12
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1
helped stats (rel) min: 0.93% max: 5.00% x̄: 2.58% x̃: 2.33%
95% mean confidence interval for instructions value: -1.27 -0.90
95% mean confidence interval for instructions %-change: -3.23% -1.93%
Instructions are helped.

total bundles in shared programs: 2067 -> 2067 (0.00%)
bundles in affected programs: 398 -> 398 (0.00%)
helped: 7
HURT: 4
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.54% max: 10.00% x̄: 5.04% x̃: 5.56%
HURT stats (abs)   min: 1 max: 2 x̄: 1.75 x̃: 2
HURT stats (rel)   min: 2.13% max: 4.26% x̄: 3.72% x̃: 4.26%
95% mean confidence interval for bundles value: -0.95 0.95
95% mean confidence interval for bundles %-change: -5.21% 1.50%
Inconclusive result (value mean confidence interval includes 0).

total quadwords in shared programs: 3464 -> 3454 (-0.29%)
quadwords in affected programs: 1199 -> 1189 (-0.83%)
helped: 18
HURT: 4
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.03% max: 5.26% x̄: 2.44% x̃: 1.79%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 2.56% max: 2.82% x̄: 2.63% x̃: 2.56%
95% mean confidence interval for quadwords value: -0.98 0.07
Inconclusive result (value mean confidence interval includes 0).

total registers in shared programs: 383 -> 373 (-2.61%)
registers in affected programs: 56 -> 46 (-17.86%)
helped: 12
HURT: 2
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 9.09% max: 33.33% x̄: 29.58% x̃: 33.33%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 20.00% max: 50.00% x̄: 35.00% x̃: 35.00%
95% mean confidence interval for registers value: -1.13 -0.29
95% mean confidence interval for registers %-change: -35.07% -5.63%
Registers are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig
5d9b7a8ddb pan/midgard: Handle get/set_swizzle for load/store arguments
Load/store's  main "argument 0" already has its swizzle handled
correctly (for stores, that is). But the tinier arguments, the compact
ones with a component select but not a full swizzle, those are not yet
handled. Let's do something about that!
2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig
9aeb726045 pan/midgard: Fix block successors
Rather than an ersatz thing that sort of looks like successors but is in
fact just the source order traversal with some backward jumps hacked in
for loops... construct an actual flow graph so we can do analysis
sanely.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig
1a116037d8 pan/midgard: Add helper to pack load/store registers
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig
e112d9d333 pan/midgard: Decode register/component in load/store argument
3-bits out of 8 down!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig
5a572f4b55 pan/midgard: Fix REGISTER_OFFSET
r27 isn't the special one, usually.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig
c908772ee4 pan/midgard: Split ld/st unknown to arg_1/arg_2 fields
The 16-bit field can be decomposed to two independent 8-bit fields, each
representing a single (additional) argument to the load/store op,
generally used for encoding registers. Addressable registers here are
substantially limited compared to the main register in a load/store op.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 14:20:02 -07:00
Bas Nieuwenhuizen
2d54fdb563 radv: Expose VK_KHR_imageless_framebuffer.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 22:35:25 +02:00
Bas Nieuwenhuizen
9475782eac radv: Implement VK_KHR_imageless_framebuffer.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 22:35:19 +02:00
Bas Nieuwenhuizen
a7041f3b4e radv: Store image view also outside framebuffer.
So we can use it with imageless framebuffers.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 22:19:16 +02:00
Bas Nieuwenhuizen
49e6c2fb78 radv: Store color/depth surface info in attachment info instead of framebuffer.
That way we can use it for imageless framebuffers.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 22:18:51 +02:00
Alyssa Rosenzweig
cd98d94516 panfrost: Allocate polygon lists on-demand
Rather than alloacting a huge (64MB) polygon list on context creation
and sharing it across framebuffers, we instead allocate polygon lists as
BOs (which consistently hit the cache) sized appropriately; for about a
month, we've known how to calculate the polygon list size so this has
only recently become possible.

The good news is we can render to truly massive framebuffers without
crashing and, more importantly, we eliminate the 64MB upfront overhead.
If a list that size isn't actually needed, it's not allocated.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-08-02 21:54:58 +02:00
Boris Brezillon
ed501c00cb panfrost: Handle the bo == NULL case in panfrost_bo_[un]reference()
Allows us to pass BOs without checking if they're NULL or not.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 21:54:58 +02:00
Boris Brezillon
12f72175f3 panfrost: Get rid of the skippable param in attach_vt_framebuffer()
The only user of this function always passes true.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 21:54:58 +02:00
Boris Brezillon
8227d284f7 panfrost: Don't emit a new FB desc when setting a new FB state
The FB desc will be emitted/attached on the first draw targetting
this new FB.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 21:54:58 +02:00
Boris Brezillon
95507a3dd4 panfrost: Bail out early when doing a wallpaper blit
The wallpaper blit is a bit special in that the operation is targetting
the current FB, but the u_blitter logic creates a new surface for it
which makes util_framebuffer_state_equal() return false. In that case
we don't want a new FB descriptor to be emitted/attached, so let's just
copy the new state into ctx->pipe_framebuffer and exit the function.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 21:54:58 +02:00
Boris Brezillon
8645afce4c panfrost: Bail out early when new and current FB states are equal
If the current FB matches the new one there's nothing to be done in
panfrost_set_framebuffer_state(). By bailing out early in that case we
avoid emitting new FB descriptors (the old ones are still valid).

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 21:54:58 +02:00
Boris Brezillon
17d6ee2bd1 panfrost: Delay FB descriptor allocation
No need to emit SFBD/MFBD at frame invalidation. They can be emitted
when the framebuffer is attached, which saves us a potential FB desc
re-allocation if a new FB is bound after the swap.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 21:54:58 +02:00
Boris Brezillon
b5ca1e5458 panfrost: Remove job from ctx->jobs at submission time
This guarantees that new draws targetting the same framebuffer will
get a new job instance.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 21:54:58 +02:00
Boris Brezillon
20b00e1ff2 panfrost: Make ctx->job useful
ctx->job is supposed to serve as a cache to avoid an hash table lookup
everytime we access the job attached to the currently bound FB, except
it was never assigned to anything but NULL.

Fix that by adding the missing assignment in panfrost_get_job_for_fbo().
Also add a missing NULL assignment in the ->set_framebuffer_state()
path.

While at it, add extra assert()s to make sure ctx->job is consistent.

Fixes: 59c9623d0a ("panfrost: Import job data structures from v3d")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 21:52:56 +02:00
Bas Nieuwenhuizen
72e7b7a00b ac/nir,radv: Optimize bounds check for 64 bit CAS.
When the application does not ask for robust buffer access.

Only implemented the check in radv.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 21:21:55 +02:00
Roland Scheidegger
74baeacafc gallivm: fix issue with AtomicCmpXchg wrapper on llvm 3.5-3.8
These versions still need wrapper but already have both success and
failure ordering.
(Compile tested on llvm 3.3, 3.7, 3.8.)

v2: don't duplicate whole function (suggested by Brian).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111102

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-08-02 20:16:17 +02:00
Matt Turner
dcf9d91a80 util: Handle differences in pthread_setname_np
There are a lot of unfortunate differences in the implementation of this
function. NetBSD and Mac OS X in particular require different arguments.

https://stackoverflow.com/questions/2369738/how-to-set-the-name-of-a-thread-in-linux-pthreads/7989973#7989973
provides for a good overview of the differences.

Fixes: 9c411e020d ("util: Drop preprocessor guards for glibc-2.12")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111264
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
[Eric: use DETECT_OS_* instead of PIPE_OS_*]
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-02 18:38:52 +01:00
Eric Engestrom
55eadf971a util/os_time: use detect_os.h to uncouple from gallium
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-02 18:38:52 +01:00
Eric Engestrom
bffa23313a util/u_debug: use detect_os.h
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-02 18:38:52 +01:00
Eric Engestrom
7f12a66ad5 util/os_misc: use detect_os.h to start uncoupling from gallium
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-02 18:38:52 +01:00
Eric Engestrom
87adc898b3 util/os_memory: use detect_os.h to uncouple it from gallium
While at it, remove p_compiler.h as well as it is unused.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-02 18:38:52 +01:00
Eric Engestrom
9a5148190a gallium: deduplicate os detection logic by using detect_os.h
This allows us to avoid having to rename all the PIPE_OS_* at once while
still making sure PIPE_OS_* and DETECT_OS_* are always in sync.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-02 18:38:52 +01:00
Eric Engestrom
8c52bca112 gallium/utils: drop PIPE_SUBSYSTEM_WINDOWS_USER
This is basically just an alias for PIPE_OS_WINDOWS.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-02 18:38:52 +01:00
Eric Engestrom
e740e7a6f0 scons: rename PIPE_SUBSYSTEM_EMBEDDED to EMBEDDED_DEVICE
It has nothing to do with the PIPE_SUBSYSTEM_* stuff from gallium.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-02 18:38:52 +01:00
Eric Engestrom
8c63348c94 gallium: remove never-used PIPE_SUBSYSTEM_DRI
PIPE_SUBSYSTEM_DRI was introduced in dacfef1589 ("gallium: New
configuration header.") 11 years ago, and was never used.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-02 18:38:52 +01:00
Eric Engestrom
bfb70032d4 util: fix typo in comment
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-02 18:38:52 +01:00
Eric Engestrom
362e9d8682 util: introduce detect_os.h
Mostly copied from src/gallium/include/pipe/p_config.h, so I kept its
copyright and authorship.

Other than the obvious rename, the big difference is that these are
always defined, to be used as `#if DETECT_OS_LINUX`.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-02 18:38:52 +01:00
Rob Clark
9d5beab441 freedreno/batch: fix dependency loop detection
We can have a scenario like:

  A -> B
  A -> C -> B

When adding the A->C dependency, it doesn't really matter that C depends
on something that A depends on, that isn't a necessary condition for a
dependency loop.

Instead what we want to know is that nothing C depends on, directly or
indirectly, depends on A.  We can detect this by recursively OR'ing the
dependents_mask of C and all it's dependencies.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-02 10:24:14 -07:00
Rob Clark
e1790c532a freedreno/a6xx: add missing flush/invalidates for blit
Various things we were missing for multiple blits in a single batch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-02 10:24:14 -07:00
Rob Clark
d8379da19e freedreno/a6xx: skip tiles with no geometry
If no clear, and no geometry according to VSC_STATE[pipe] we can skip
the tile entirely.  If there is a fast-clear, we can't skip restore
(clear) or resolve IBs, but we can still skip draw IB.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark
de3e130fc9 freedreno/a6xx: VSC overflow detection/handling
Check VSC_SIZE/VSC_SIZE2 regs from cmdstream to detect overflow, and
skip use of VSC visibility stream when overflow is detected, to avoid
GPU hangs.  This is done w/ introduction of some CP_REG_TEST/
CP_COND_REG_EXEC packet pairs.

In addition, eventually (after a frame or two) detect the condition and
resize the VSC buffers until overflow no longer happens.

Note that this significantly reduces the initial size of the VSC
buffers, backing out a previous hack to make them 16x larger than
what should be typically required (the previous "solution" for
VSC overflow).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark
401f532bea freedreno/a6xx: remove USE/IGNORE_VISIBILITY draw patching
Seems this isn't needed anymore on a6xx to control whether visibility
stream is used.  And it would be hard to deal with if it was, for
disabling use of VSC stream in draw pass.  So just remove it and
simplify things.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark
146d6e6463 freedreno/a6xx: cleanup "blit_mem"
Rename to "control_mem", and switch to using a struct to manage the
layout, rather than just ad-hoc hard-coded offsets.

For recovering from VSC stream overflow, we'll need to add more, but
best to clean it up first.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark
1cbb7f7601 freedreno: refresh tile debug
Fix some #ifdef'd bitrot, and get rid of #ifdef so it doesn't bitrot
again.

And add a prints for per-tile state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark
44f3c1cf01 freedreno: update registers
Pull in some updates of VSC regs

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark
c179ded9cb freedreno/gmem: small cleanup
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark
e2bb3e84ab freedreno/drm: convert ring_pool to child_pool
Worth another couple percent at driver2

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-02 10:24:14 -07:00
Rob Clark
9ac23794c9 freedreno/drm: remove idx_lock
Since it ends up contended, it is a bit of a bottleneck for workloads
with high driver overhead.  Worth nearly +10% at gfxbench driver2.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-02 10:24:14 -07:00
Rob Clark
e439f63467 freedreno/batch: always update last_fence
Not all flush paths come thru fd_context_flush(), so we should also set
last_fence in the batch flush path.  This avoids some no-op flushes just
to get a fence.  For example when pctx->flush_resource() triggers a
flush.

We should probably keep the last_fence update in fd_context_flush() as
well to handle deferred flush case.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark
c93eae7f10 freedreno: drop unused fd_fence_ref param
The pscreen param was just there to satisfy pipe_screen::fence_reference

But some of the internal uses passed NULL for screen.  Which is a bit
ugly.  Instead drop the param and add a shim function to plug into the
screen.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Alyssa Rosenzweig
1637a53890 pan/midgard: Print invert modifier
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig
62a5ee3bb4 pan/midgard: Flip conditionals
We would like to flip ops to have a constant in the second place to
enable inlining of the constant.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig
d066ca3575 pan/midgard: Add bitwise src/invert fusing
De Morgan's Laws and some special ops basically.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig
620c2717cf pan/midgard: Add .not propagation pass
Essentially .pos propagation but for bitwise.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig
b821e1b85e pan/midgard: Fuse invert into bitwise ops
We use the new invert flag to produce ops like inand.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 09:57:15 -07:00
Jonathan Marek
d8584c5cf2 freedreno: a2xx: implement texture tiling
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek
fb5c3db0ab freedreno: a2xx: use nir_lower_alu_to_scalar instead of lowering pass
nir_lower_alu_to_scalar can now be used to only lower certain ops, so we
don't need the custom pass. And we can lower fall_equal/fany_nequal with
lower_vector_cmp instead.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek
e652ca4e0b freedreno: a2xx: fix HW binning for batches with >256K vertices
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek
257957b026 freedreno: a2xx: fix fneg/fabs/fsat opcodes
Previously we would get a fmov with modifiers, but now that mov has no type
these opcodes need to be supported.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek
43dbd7d603 freedreno: a2xx: fix order of NIR opts
int_to_float needs to come after bool_to_float, and lower_to_source_mods
needs to come after both, since they don't deal wih source mods.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek
57e980a4fb freedreno: a2xx: fix non-etc1 cubemaps
Not sure how this happened, but apparently all cubemaps need swapped XY.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek
2e029acbe2 freedreno: a2xx: fix fast clear not being used for Z24X8 buffers
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Jonathan Marek
e25388c97b freedreno: align renderonly scanout buffers
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Eric Engestrom
6125c93e00 gitlab-ci: just build all the tools
This line was mistakenly added while there is already a `-D tools=all`
a few lines below.

Fixes: f60defa72d ("gitlab-ci: Add a shader-db run using v3d on drm-shim.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 16:41:19 +01:00
Sergii Romantsov
a86eccfb78 i965/clear: clear_value better precision
Test-case with depth-clear 0.5 and format
MESA_FORMAT_Z24_UNORM_X8_UINT fails due inconsistent
clear-value of 0.4999997.
Maybe its better to improve?

CC: Jason Ekstrand <jason.ekstrand@intel.com>
Fixes: 0ae9ce0f29 (i965/clear: Quantize the depth clear value based on the format)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111113
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-02 14:25:34 +00:00
Samuel Pitoiset
e8110e51c6 radv: fix image_has_{cmask,fmask}() helpers
The driver should now rely on cmask_offset because CMASK can be
disabled by the driver for some reasons (eg. mipmaps). Apply the
same change for FMASK, although it should be useless.

Fixes: ad1bc8621d ("radv: remove radv_get_image_fmask_info()")
Fixes: 10d08da52c ("radv/gfx10: add missing dcc_tile_swizzle tweak")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 14:00:50 +02:00
Samuel Pitoiset
ad1bc8621d radv: remove radv_get_image_fmask_info()
It's unnecessary to duplicate fields in another struct.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:46 +02:00
Samuel Pitoiset
10d08da52c radv/gfx10: add missing dcc_tile_swizzle tweak
Fixes: c90f46700d ("radv/gfx10: mask DCC tile swizzle by alignment")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:43 +02:00
Samuel Pitoiset
9c9745e8dd radv: remove radv_get_image_cmask_info()
It's unnecessary to duplicate fields in another struct.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:41 +02:00
Samuel Pitoiset
856487a280 radv: only account for tile_swizzle for color surfaces with DCC
It's 0 for depth surfaces with TC compat HTILE enabled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 13:34:39 +02:00
Bas Nieuwenhuizen
e1c5d8a364 radv: Enable VK_KHR_shader_atomic_int64
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 12:26:32 +02:00
Bas Nieuwenhuizen
a17f2206d3 ac/nir: Implement LLVM9 64-bit buffer compare & exchange.
LLVM 9 does not have a 64-bit buffer compswap intrinsic, so this
extracts the ptr, does a bound check and then uses a cmpxchg LLVM
instruction.

Not ideal, but the earliest release we're going to get a proper
intrinsic is LLVM 10.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 12:26:11 +02:00
Connor Abbott
73274c9ec2 Revert "ac/nir: handle negate modifier"
This reverts commit bfea7e4d29.
2019-08-02 11:14:50 +02:00
Connor Abbott
4a382d66ee Revert "ac/nir: handle abs modifier"
This reverts commit d3c80733cd.

These were only appearing due to memory corruption.
2019-08-02 11:14:08 +02:00
Timothy Arceri
06ec14d692 iris: bump compat profile support to 4.6
All of the current piglit compat profile tests pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-02 18:56:53 +10:00
Timothy Arceri
74f96b06d6 egl: fix OpenGL 3.1 context creation
>From the EGL_KHR_create_context spec:

   "* If OpenGL 3.1 is requested, the context returned may implement
       any of the following versions:

         * Version 3.1. The GL_ARB_compatibility extension may or may
           not be implemented, as determined by the implementation.
         * The core profile of version 3.2 or greater."

Fixes CTS tests:

    dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgb888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_no_depth_no_stencil
    dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_depth_stencil
    dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_depth_stencil

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-02 18:56:53 +10:00
Connor Abbott
f41516bdb5 nir/find_array_copies: Reject copies with mismatched type
When we detect a scalar/vector copy through load_deref/store_deref, we
have to be careful since those can bitcast an int to a float and
vice-versa even though copy_deref can't.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111251
Fixes: 156306e5e6 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-02 10:34:29 +02:00
Samuel Pitoiset
7368000868 radv: re-apply "Optimize rebinding the same descriptor set."
This makes it cheaper to just change the dynamic offsets with
the same descriptor sets.

This optimization has been reverted a while back because of
random GPU hangs on GFX9, no it looks fine, at least CTS no longer
hangs on GFX9 and it doesn't hang on GFX10 as well.

It fixes a performance problem with Wolfenstein Youngblood.

Suggested-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2019-08-02 09:56:55 +02:00
Samuel Pitoiset
96a5445559 radv/gfx10: use the correct target machine for Wave32
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:38 +02:00
Samuel Pitoiset
8a86908e9a radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders
It can be enabled with RADV_PERFTEST=gewave32.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:36 +02:00
Samuel Pitoiset
953bbacc23 radv/gfx10: add Wave32 support for fragment shaders
It can be enabled with RADV_PERFTEST=pswave32.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:34 +02:00
Kenneth Graunke
18c2e09dc7 gallium: Implement GL_EXT_shader_samples_identical via a new capability
This exposes the textureSamplesIdenticalEXT function in GLSL.

We enable it for iris and radeonsi, because their compilers already
have support for this.  Tested on Intel Kabylake and AMD Vega 64.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 23:38:54 -07:00
Kenneth Graunke
adcc0a8fdc intel/tools: Fix aubinator_viewer build.
This functions was recently renamed and not all callers were updated.

Fixes: 086c486a75 ("intel/device: rename gen_get_device_info")
2019-08-01 23:36:41 -07:00
Francisco Jerez
54fbc625ea intel/ir: Fix CFG corruption in opt_predicated_break().
Specifically the optimization of a conditional BREAK + WHILE sequence
into a conditional WHILE seems pretty broken.  The list of successors
of "earlier_block" (where the conditional BREAK was found) is emptied
and then re-created with the same edges for no apparent reason.  On
top of that the list of predecessors of the block immediately after
the WHILE loop is emptied, but only one of the original edges will be
added back, which means that potentially several blocks that still
have it on their list of successors won't be on its list of
predecessors anymore, causing all sorts of hilarity due to the
inconsistency in the control flow graph.

The solution is to remove the code that's removing valid edges from
the CFG.  cfg_t::remove_block() will already clean up after itself.
The assert in bblock_t::combine_with() also needs to be removed since
we will be merging a block with multiple children into the first one
of them.

Found the issue on a hardware enabling branch originally, but
apparently somebody reproduced the same problem independently on
master in the meantime.

Fixes: d13bcdb3a9 ("i965/fs: Extend predicated break pass to predicate WHILE.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111009
Cc: jiradet.jd@gmail.com
Cc: Sergii Romantsov <sergii.romantsov@globallogic.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Paul Chelombitko <qamonstergl@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-08-01 16:56:48 -07:00
Mark Janes
ddb59cd20e intel/device: make internal functions private
The device info initializer makes several fuctions internal:

  - handling of device override
  - updating topology from kernel information

The implementation file is slightly reordered due to the renamed
functions being static.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:40:03 -07:00
Mark Janes
086c486a75 intel/device: rename gen_get_device_info
Rename the original device info initialization routine so callers
don't mistakenly call the wrong one:

  gen_get_device_info_from_fd:

      Queries kernel for full device info, including topology
      details.

  gen_get_device_info_from_pci_id:

      Partially initializes device info based on PCI ID lookup, when
      the kernel is not available.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:56 -07:00
Mark Janes
d594d2a052 intel/tools: use device info initializer
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:54 -07:00
Mark Janes
e4a0070db4 anv: use initialization routine for gen_device_info
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:51 -07:00
Mark Janes
49465f1330 iris/screen: use initialization routine for gen_device_info
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:48 -07:00
Mark Janes
96e1c945f2 i965: Move device info initialization to common code
With perf queries, initializing the device info is much more complex
than just getting a PCI ID and calling gen_get_device_info.  This commit
adds a new gen_get_device_info_from_fd helper in common code which does
all of the requisite kernel queries to get device info including all of
the topology information.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:44 -07:00
Mark Janes
1186f6ea69 i965/perf: verify kernel support before registering OA metrics
When gen_device_info updates the topology in it's initializer, the
kernel queries will fail silently.  Iris and anv have minimum
kernel requirements that support the queries.  i965 must verify kernel
support before reporting OA metrics.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:39:41 -07:00
Mark Janes
7852fe5415 intel/common: provide common ioctl routine
i965 links against libdrm for drmIoctl, but anv and iris both
re-implement this routine to avoid the dependency.

intel/dev also needs an ioctl wrapper, so lets share the same
implementation everywhere.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-01 16:38:40 -07:00
Alyssa Rosenzweig
b40ba2db6c panfrost: Remove unused argument
A relic from when we didn't have an online compiler, hah.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
ff345d4a01 panfrost: Handle MESA_SHADER_COMPUTE in compile callback
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
73c40d6bbb pan/midgard: Use standard list traversal to find initial tag
Fixes a hang (and abort) on empty shaders, which you shouldn't have
anyway but better safe than sorry. DCE going on the fritz is no reason
to freeze the system.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
4647999327 panfrost: Use gl_shader_stage directly for compiles
No need to add a third set of enums to the mix.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
d9eb65c60c panfrost: Emit "draw" info for compute jobs
Important fields relating to shader state and UBOs are filled out from
this (misnomer) function.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
22a8f6de61 panfrost: Feed compute shaders into the compiler
The path for compute shader compiles resembles the graphic shader
compile path, although it is substantially simpler as we don't need any
shader keying.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
1b284628ef panfrost: Expose compute shaders as panfrost_shader_variants
Whether variants are packed by graphics or compute is irrelevant.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
8b53230d47 panfrost: Remove shader state *base
It is now unused.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
c228046b4b panfrost: Remove CSO dependency from shader_compile
We want this routine to be generic across graphics and compute, so let
the caller deal with the typing.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
428bed3bde panfrost: Generalize UBO upload for other shader stages
Now that everything is unified, this generalization is nice and easy.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
a34370e855 panfrost: Guard vertex upload by ctx->vertex != NULL
This is irrelevant for graphics but matters for compute workloads.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
3bfdb878aa panfrost: Generalize vertex shader upload
This allows us to reuse the same code path for compute.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
3b7224190e panfrost: Share gl_enables between VERTEX/COMPUTE
Catch-all for magic bits.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
871c02b12e panfrost: Invoke compute shader according to grid info
We already have helpers for packing invocations (due to its role in
instanced vertex shaders), so we can reuse this drop in for compute
shaders.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
748ccbc808 panfrost: Explain and include compute FBD
Squint at it hard enough and you realize it's the beginning of an
SFBD... I guess...

A compute shader with register spilling would be able to confirm this,
but we would expect to see the first field | 1 and an address splattered
later, setting up TLS.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
3113be3127 panfrost: Unify-driven cleanup
Again, now that stages are unified some logic goes away.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
ac6aa93f9e panfrost: Unify ctx->vs and ctx->fs
It's a little verbose, but this way we can support other shader stages
without too much contortion.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig
4b93152c29 panfrost: Flesh out launch_grid stub
It's still incomplette, but we're able to hook into launch_grid to
create a stub COMPUTE job.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:02 -07:00
Alyssa Rosenzweig
cd1be4605c panfrost: Cleanup via payload unification
Since these are now indexable, quite a bit of code cleans up.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:02 -07:00
Alyssa Rosenzweig
0da52015a1 panfrost: Unify payload_vertex/payload_tiler
Rather than disparate variables, let's use an array of payloads indexed
by the shader stage.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:02 -07:00
Alyssa Rosenzweig
902115f94f panfrost: Only wallpaper if we drew something
last_tiler.gpu may be NULL at flush time despite no clear and existing
jobs -- if we executed a compute-only workload.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:02 -07:00
Alyssa Rosenzweig
2d86828243 panfrost: Adjust shader CAPs to expose dEQP compute
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig
39fe9f5e2f panfrost: Expose NIR as our PIPE_SHADER_CAP_SUPPORTED_IRS
We *could* expose TGSI as well -- we pipe it through tgsi_to_nir for
Gallium-internal shaders anyway -- but we'd rather not.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig
1697760e05 panfrost: Copy freedreno's panfrost_get_compute_param
Values reported here aren't remotely correct, but it's a start to just
get the entrypoint stubbed out.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig
c8bc664447 panfrost: Expose COMPUTE-related caps for GLES3.1
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig
5a8b83ca0b panfrost: Stub out launch_grid
Just dumps some information about the invocation for later debug.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig
a8fc40aaf5 panfrost: Stub out compute CSO
Doesn't do anything, just gets the functions there.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig
e913986868 panfrost: Implement gl_FrontFacing
Interestingly, this requires no compiler changes. It's just exposed as a
special varying.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:15:03 -07:00
Alyssa Rosenzweig
f3e15122d4 panfrost: Add support for decoding gl_FrontFacing
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:15:03 -07:00
Alyssa Rosenzweig
9e66ff3ea9 pan/decode: Use max varying index as varying buffer count
This allows us to decode asymmetric varyings correctly, which occurs
with e.g. gl_FrontFacing.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-01 16:15:03 -07:00
Timothy Arceri
2afedfaf9a iris: add support for gl_ClipVertex in tess eval shaders
Required for OpenGL compat support.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-01 16:12:37 -07:00
Timothy Arceri
00b5bf2d72 iris: add support for gl_ClipVertex in geometry shaders
This will enable us to support the OpenGL compat profile.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-01 16:12:27 -07:00
Jason Ekstrand
70dc017aec nir: Stop whacking gl_FrontFacing to a system value
We have a cap bit for gallium and a GLSL compiler flag to control this.
Just trust what GLSL gives us and stop forcing it.  In order for this to
be safe, we have to advertise another cap in some of the gallium
drivers.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-01 21:59:37 +00:00
Alyssa Rosenzweig
4e736b88f3 panfrost: Implement panfrost_set_shader_buffers callback
Just copy over the passed SSBO for now.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-01 14:32:08 -07:00
Alyssa Rosenzweig
898a18ea89 gallium/util: Add util_set_shader_buffers_mask helper
Conceptually follows util_set_vertex_buffers_mask but for SSBOs.

v2: Fix missing ~ when clearing mask. Adjust mask behaviour to match
freedreno/v3d when buffer == NULL.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-01 14:31:56 -07:00
Jonathan Marek
3e33173200 kmsro: move entry points from etnaviv to kmsro
These drivers are kmsro drivers so they should be part of the kmsro #if

This fixes missing imx_drm driver when building with only freedreno+kmsro

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-01 16:31:51 -04:00
Emil Velikov
85dace1c0b gitlab-ci: remove software-properties-common
Currently we use the python package to manage repositories. At the same
time we also do that by hand - since it's a trivial echo to a file.

Stay consistent, remove the package and manage things manually.

Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-08-01 16:16:15 +00:00
Brian Paul
3307c85a7d st/mesa: fix MSVC compile breakage
Trivial.
2019-08-01 09:07:21 -06:00
Gert Wollny
9de00e74fe virgl: Enable depth_clamp by lowering if the host is new enough.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 05:58:53 +00:00
Gert Wollny
b2e92c45ce gallium: Make PIPE_CAP_DEPTH_CLIP_DISABLE a tri-state value and use it
Use value "2" to signal that lowering is needed and supported and enable
it accordingly.

v2: - Note in CAP description that this lowering currently requires TGSI
    - use "true" instead of GL_TRUE (both Erik)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 05:58:53 +00:00
Gert Wollny
616f320745 mesa/st: Signal state changes when depth_clamp is emulated
v1 implemented by Erik Faye-Lund <erik.faye-lund@collabora.com>
v2: - Add GS and TES
    - fix constants state update flags (Erik)
v3: don't update rasterizer when depth_clamp is lowered (Erik)
v4: Correct NewDepthClamp and also set flags for NewClipControl (Erik)
v5: Also set shader_has_one_variant property acording to possible
   depth_clamp lowering (Marek)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 05:58:53 +00:00
Gert Wollny
d004fcc04a mesa/st: Add depth clamping to rasterizer code
implemented by Erik Faye-Lund <erik.faye-lund@collabora.com>

v2: Use current depth range values for clamping (Erik)
v3: fix scons-win64 build

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 05:58:53 +00:00
Gert Wollny
57361d89fa mesa/st: Tie depth_clamp code into other shaders (GS and TES)
v2: Use file scope defined depth_range_state  in common
v3: - don't use the one_shader_variant property, as this is
      not correct (Marek)
    - also use tests on available shader stages to enable
      depth_clamp lowering
v4: Don't use key.st, use st directly (Marek)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 05:58:53 +00:00
Gert Wollny
d81ba38b02 mesa/st: Tie depth_clamp lowering into the FS
v1 implemented by Erik Faye-Lund <erik.faye-lund@collabora.com>
v2: Use different call for FS
v3: Use file scope defined depth_range_state

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 05:58:53 +00:00
Gert Wollny
fefb152067 mesa/st: Tie depth clamp lowering in to the VP code
v1: implemented by Erik Faye-Lund <erik.faye-lund@collabora.com>
v2: Add handling of the ARB_clip_control depth mode
v3: Move depth_range_state to file scope and remove training zeros (Erik)
v4: - don't use the one_shader_variant property, as this is not correct (Marek)
    - also use tests on available shader stages to enable depth_clamp lowering
V5: Don't use key.st, use st directly (Marek)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 05:58:53 +00:00
Erik Faye-Lund
b048d8bf8f mesa/st: add tgsi-lowering code for depth-clamp
This is a TGSI pass that lowers depth-clamping into shader-operations,
by replacing the depth-value with 0 (a z-coordinate of zero will always
pass the OpenGL depth test conditions), and using a dedicated varying to
interpolate the real depth-value instead. Finally we replace the
depth-output in the fragment shader.

v1 implemented by Erik Faye-Lund <erik.faye-lund@collabora.com>
v2: Add support for handling depth clip mode, and refactor code
v3: - Rename *_vs functions to *_last_vertex_stage (Erik)
    - Use 0.0 depth to avoid clipping (Erik)
v4: Fix inversion of bool value for clip control property

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 05:58:53 +00:00
Gert Wollny
78ba12f40f mesa/st: replace boolean declarations by bool
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-01 05:58:53 +00:00
Gert Wollny
7fb47195d8 Revert "softpipe: Don't draw when rasterizer_discard is set"
This was too aggressive and breaks TF (Ilia)

This reverts commit 4ee638cd78.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-08-01 05:57:41 +00:00
Eric Engestrom
a563bb9e28 docs: reword meson instructions
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-01 00:42:02 +01:00
Eric Engestrom
8a1e803643 travis: drop unnecessary Meson option for MacOS
Those are already their default values on MacOS.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-01 00:25:20 +01:00
Jason Ekstrand
b539157504 intel/vec4: Drop all of the 64-bit varying code
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 18:14:09 -05:00
Jason Ekstrand
d03ec807a4 intel/fs: Drop all of the 64-bit varying code
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 18:14:09 -05:00
Jason Ekstrand
942c759059 intel: Use NIR to lower 64-bit varying access
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 18:14:09 -05:00
Jason Ekstrand
078dcb7ccd nir/lower_io: Add an option to lower 64-bit varyings
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 18:14:09 -05:00
Jorge Natz
a63e82deb5 docs: Update Platforms and Drivers page with more comprehensive information.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-31 22:50:43 +00:00
Dave Airlie
7ad6ec80d9 nir: use common deref has indirect code in scratch lowering.
This doesn't seem to need it's own copy here.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-01 08:32:12 +10:00
Eric Engestrom
5d7bcac4e7 nir: remove explicit nir_intrinsic_index_flag values
These were left after a rebase and happen to make
NIR_INTRINSIC_SWIZZLE_MASK == NIR_INTRINSIC_SRC_ACCESS, which is how it
was noticed.

Fixes: 6f20643b47 ("nir: Allow qualifiers on copy_deref and image instructions")
Cc: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-31 23:28:20 +01:00
Yevhenii Kolesnikov
830a8e6c47 state_tracker: Free Labels for querry and tranform_feedback
Memory leaks were observed on iris with GL_KHR_debug.

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-31 22:16:42 +00:00
Kenneth Graunke
b61f17d362 iris: Skip emitting 3DSTATE_INDEX_BUFFER if possible
We were emitting 3DSTATE_INDEX_BUFFER on every indexed draw, even if
back-to-back draws referred to the same index buffer.  This improves
drawoverhead scores in the DrawElements cases by about 10%, by giving
us even more minimal batches.
2019-07-31 15:14:10 -07:00
Mike Blumenkrantz
8af1990ad7 st/dri: simplify dri_get_egl_image by reusing dri2_format_table
this makes dri2_get_mapping_by_fourcc accessible from dri_helpers.h
and does a direct lookup on the fourcc id to match the pipe format

v2 (Ken): Allow map to be NULL, use img->texture->format.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-31 15:11:15 -07:00
Erico Nunes
82bf5a8aac lima: enable lower_bitops in ppir
The mali pp doesn't support integers and some nir_algebraic
optimizations may result in ops that are not easily lowerable to floats,
so disable optimizations resulting in bitops.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-07-31 23:06:26 +02:00
Erico Nunes
b3676a6548 nir/algebraic: rename lower_bitshift to lower_bitops
Optimizations that insert bitshift or bitwise operations should not be
applied on GPUs that don't support integer operations.
The .lower_bitshift could be used to control the bitshift related ones,
but there was also one bitwise optimization uncovered.
Since only lima and freedreno use this option and the use case is that
no bit operations are wanted, let's rename it to .lower_bitops and use
it to control all bitops related optimizations.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-07-31 23:06:04 +02:00
Erico Nunes
99c956fb47 lima/ppir: lower fdot in nir_opt_algebraic
Now that we have fsum in nir, we can move fdot lowering there.
This helps reduce ppir complexity and enables the lowered ops to be part
of other nir optimizations in the optimization loop.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-31 21:35:58 +02:00
Erico Nunes
4a407df682 nir/algebraic: add new fsum ops and fdot lowering
The Mali400 pp doesn't implement fdot but has fsum3 and fsum4, which can
be used to optimize fdot lowering. fsum2 is not implemented and can be
further lowered to an add with the vector components.
Currently lima ppir handles this lowering internally, however this
happens in a very late stage and requires a big chunk of code compared
to a nir_opt_algebraic lowering.
By having fsum in nir, we can reduce ppir complexity and enable the
lowered ops to be part of other nir optimizations in the optimization
loop.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-31 21:35:58 +02:00
Erico Nunes
7f8ff686b7 lima/ppir: refactor texture code to simplify scheduler
The 'varying fetch' pp instruction deals only with coordinates, and
'texture fetch' deals only with the sampler index.
Previously it was not possible to clearly map ppir_op_load_coords and
ppir_op_load_texture to pp instructions as the source coordinates were
kept in the ppir_op_load_texture node, making this harder to maintain.
The refactor is made with the attempt to clearly map ppir_op_load_coords
to the 'varying fetch' and ppir_op_load_texture to the 'texture fetch'.
The coordinates are still temporarily kept in the ppir_op_load_texture
node as nir has both sampler and coordinates in a single instruction and
it is only possible to output one ppir node during emit. But now after
lowering, the sources are transferred to the (always) created
ppir_op_load_coords node, and it should be possible to directly map them
to their pp instructions from there onwards.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-31 21:22:41 +02:00
Erico Nunes
d2901de09e lima/ppir: lower texture projection
Lower texture projection in ppir using nir_lower_tex and nir_lower_tex.
This will insert a mul with the coordinate division before the load
varying.

Even though the lima pp supports projection in the load varying
instruction while loading the coordinates (from a register or a
varying), it requires that both the coordinates and projector be
components in a single register.
nir currently handles them in separate ssa, and attempting to merge them
manually may end up in worse code than just doing the coordinate
division manually. So for now let's just lower the projection to add
support for it in lima.
In the future, an optimization pass may be implemented in lima to ensure
that both coords and projector come in the same register, then this
lowering may be disabled and in this case lima may use the built-in
projection and save the mul instruction from lowering.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-31 21:22:41 +02:00
Vinson Lee
412e1b51fe scons: Fix random_r check.
Fixes: 597bddad47 ("scons: Test for random_r()")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 18:23:55 +00:00
Kenneth Graunke
3f9012839e Revert "st/dri: simplify dri_get_egl_image by reusing dri2_format_table"
This reverts commit c47af8b95f.  It causes
dEQP-EGL regressions.  (I think there is an easy fix, but we'll have it
go through review again.)
2019-07-31 11:06:32 -07:00
Alyssa Rosenzweig
91c4acedaf pan/midgard: Don't special case inline_constant
Another constant source of bugs. Ain't that special.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-31 10:59:19 -07:00
Alyssa Rosenzweig
29416a8599 pan/midgard: De-special-case branching
It's not that special.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-31 10:59:18 -07:00
Alyssa Rosenzweig
3e47a1181b panfrost: Add MALI_SAMP_NORM_COORDS flag
Corresponds to the normalized coordinates? flag on images in OpenCL and
evidently also shows up in GL, so let's wire it in.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-31 10:56:11 -07:00
Alyssa Rosenzweig
cf6cad3922 panfrost: Simplify filter_mode definition
It's just a bit field containing some flags; there's no need for all the
macro magic.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-31 10:56:11 -07:00
Alyssa Rosenzweig
160795429d pan/midgard: Shrink "compute FBD"
We still don't know what it is, but from a newer trace we now know it's
half the size we thought it was.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-31 10:56:11 -07:00
Alyssa Rosenzweig
194b49ee28 panfrost: Flip texture/sampler fields
We had them backwards in both the command stream and the Midgard stack.
In OpenGL ES 2.0, they're always the same, but in Vulkan/later-GL/CL
they diverge so we can fix this.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-31 10:56:11 -07:00
Alyssa Rosenzweig
a692126c93 panfrost: Add MALI_ATTR_IMAGE value
Images are implemented (in part) as special attributes, so include
support for decoding this.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-31 10:56:11 -07:00
Mike Blumenkrantz
c47af8b95f st/dri: simplify dri_get_egl_image by reusing dri2_format_table
this makes dri2_get_mapping_by_fourcc accessible from dri_helpers.h
and does a direct lookup on the fourcc id to match the pipe format

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-31 09:50:06 -07:00
Mike Blumenkrantz
7404833c2e gallium: add handling for YUV planar surfaces
st/dri:
this adds a table (similar to the one in i965) which provides
mappings for turning various planar formats into multiple sampler views.
whereas only NV12 and IYUV were supported, now many more formats are
supported here:
* P0XX
* YUV4XX
* YVU4XX
* AYUV
* XYUV
* YUYV
* UYVY

the table is used directly to handle image creation, simplifying
a lot of code and resolving related TODO/FIXME items where workarounds were
previously in place to manage NV12 and IYUV formats exclusively

st/mesa:
the changes here relate to setting up samplers for the planar formats.
this requires:
* checking for driver support for all the sampler formats
* creating the samplers with the corresponding formats and swizzling
* running nir_lower_tex with the appropriate options to trigger the lowering
  for each plane->sampler

fixes kwg/mesa#36

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-31 09:50:06 -07:00
Mike Blumenkrantz
338a29b08f gallium: add AYUV and XYUV formats
this only adds the PIPE_FORMAT members, not any direct handling for them

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-31 09:50:06 -07:00
Alyssa Rosenzweig
7f75b2b5af pan/midgard: Simplify discard logic
The "branch offset" is, in fact, ignored.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-31 09:39:16 -07:00
Alyssa Rosenzweig
27524d1462 pan/midgard: Add units for more instructions
For everything but freduce, we have some sense of what units the
instruction takes.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-31 09:39:16 -07:00
Alyssa Rosenzweig
64235b1ecc pan/midgard: Fix ball/bany opcode table
This were seriously messed up beyond all recognition. How we're passing
shaders.random.* is a mystery.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-31 09:39:16 -07:00
Alyssa Rosenzweig
13ee87c8b9 pan/midgard: Document branch combination LUT
This took way longer to figure out than it should have..

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-31 09:39:16 -07:00
Kenneth Graunke
2037478702 st/mesa: Skip scissor rect updates when scissor is entirely disabled.
If any scissor rectangles are enabled, then we need to set proper
scissor rectangles for all viewports.  But if the scissor test is
entirely disabled, then we can skip updating any scissor rectangles.

Without this step, we were updating the scissor rectangles based on
the current framebuffer size.  So if an app rendered to a variety of
render targets at different sizes, with scissor test disabled each
time, we'd still be continually updating the scissor rectangles,
even though it's not necessary.

In Civilization VI, this drops us from 310-350 set_scissor_state
calls per frame to 0, as it doesn't appear to use scissor testing.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-31 08:33:50 -07:00
Emil Velikov
72b97ad9b2 egl/drm: ensure the backing gbm is set before using it
Currently, if we error out before gbm_dri is set (say due to a different
name of the backing GBM implementation, or otherwise) the tear down will
trigger a NULL ptr deref and crash out.

Move the gbm_dri initialization as early as possible.

v2: Drop check in dri2_teardowm_drm (Eric)

Reported-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-31 14:18:12 +01:00
Eric Engestrom
4bf7e7b170 docs: update required meson version
Fixes: f7b6a8d12f ("meson: bump required version to 0.46")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-31 11:50:39 +01:00
Samuel Pitoiset
c66021069e radv/gfx10: implement a GE bug workaround
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-31 12:14:29 +02:00
Samuel Pitoiset
9a3fc7b6fa radv/gfx10: remove an obsolete VGT_REUSE_OFF workaround
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-31 12:14:29 +02:00
Samuel Pitoiset
bb8f25233a radv/gfx10: disable LATE_ALLOC_GS on Navi14
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-31 12:14:29 +02:00
Samuel Pitoiset
e041a74588 radv/gfx10: implement a bug workaround for GE_PC_ALLOC
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-31 12:14:29 +02:00
Samuel Pitoiset
0e1724af61 radv/gfx10: implement a bug workaround for NGG -> legacy transitions
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-31 12:14:29 +02:00
Samuel Pitoiset
29cca5f381 radv: skip draw calls with 0-sized index buffers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-31 12:14:29 +02:00
Eric Engestrom
fed6aa2fec autotools: delete leftover script wrapper
Randomly came across this file, which was likely only used by autotools
to pass arguments to the test.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-31 10:16:30 +01:00
Eric Engestrom
53b98b0185 virgl: make use of local variable
Otherwise that variable is only used in an assert() and would need an
ASSERTED to avoid the warning.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
20c89b060f mesa: add an ASSERTED
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
bbeb507543 compiler/nir: add an ASSERTED
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
7e2fe85a40 intel: add a couple of ASSERTED
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
abc226cf41 tree-wide: replace MAYBE_UNUSED with ASSERTED
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
ab9c76769a r600: replace MAYBE_UNUSED with specific #ifdef
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
745bae40ad gallium/aux: replace MAYBE_UNUSED with UNUSED
MAYBE_UNUSED is going away, so let's replace legitimate uses of it with
UNUSED, which the former aliased to so far anyway.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
513e67d2e4 mesa: replace MAYBE_UNUSED with UNUSED
MAYBE_UNUSED is going away, so let's replace legitimate uses of it with
UNUSED, which the former aliased to so far anyway.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
c8a453a770 v3d: replace MAYBE_UNUSED with UNUSED
MAYBE_UNUSED is going away, so let's replace legitimate uses of it with
UNUSED, which the former aliased to so far anyway.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
d470f1acce v3d: drop incorrect MAYBE_UNUSED
While at it, use that `screen` variable everywhere.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
84b8a50540 st/tests: drop incorrect MAYBE_UNUSED
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
aed15fa799 radv: drop incorrect MAYBE_UNUSED
`compressed` is clearly always used on the line right after.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
21196ec927 r600: move variable to proper scope
It helps show when it's actually used.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
5febd4d575 compiler: replace MAYBE_UNUSED with UNUSED
MAYBE_UNUSED is going away, so let's replace legitimate uses of it with
UNUSED, which the former aliased to so far anyway.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
bac5760e7b mesa: drop MAYBE_UNUSED var
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
e1dd6c2575 anv: drop MAYBE_UNUSED var
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
644cca65d3 i965: drop unused MAYBE_UNUSED function
Added in 1b85c605a6 but never used.

Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
7a3fb14609 i965: replace MAYBE_UNUSED with GEN condition
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
eee70e09bf intel: replace MAYBE_UNUSED with UNUSED
MAYBE_UNUSED is going away, so let's replace legitimate uses of it with
UNUSED, which the former aliased to so far anyway.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
e775b938b2 intel: drop incorrect MAYBE_UNUSED
All these are actually always used.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Eric Engestrom
14be04fb2b egl: replace MAYBE_UNUSED with UNUSED
MAYBE_UNUSED is going away, so let's replace legitimate uses of it with
UNUSED, which the former aliased to so far anyway.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 09:41:05 +01:00
Samuel Pitoiset
ea38565011 radv/gfx10: add Wave32 support for compute shaders
It can be enabled with RADV_PERFTEST=cswave32.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-31 09:35:04 +02:00
Kenneth Graunke
3a22a8bf49 iris: Skip repeated depth buffer disables.
Often times, the depth buffer is entirely disabled, but color render
targets change.  For example, GenerateMipmaps will change the color
render target for each miplevel, but there is no depth buffer.

In the Civilization VI benchmark, this drops the median number of
3DSTATE_DEPTH_BUFFER etc. packets emitted per frame from 472 to 34.
2019-07-30 19:47:41 -07:00
Marek Olšák
665989d98b radeonsi: release NIR in the right place to fix crashes 2019-07-30 22:06:23 -04:00
Marek Olšák
9ac7d0a0e2 radeonsi: fix packing of key.mono.u.ps 2019-07-30 22:06:23 -04:00
Marek Olšák
033c39a660 ac/nir: fix incorrect Phis if callbacks use control flow inside control flow 2019-07-30 22:06:23 -04:00
Marek Olšák
d3c80733cd ac/nir: handle abs modifier 2019-07-30 22:06:23 -04:00
Marek Olšák
efe2d8c5f9 ac: fix a memory leak in the error path of ac_build_type_name_for_intr 2019-07-30 22:06:23 -04:00
Marek Olšák
f6eca14f1b ac: allow control flow statements in NIR callbacks
This fixes a crash when compiling geometry shaders on radeonsi.
2019-07-30 22:06:23 -04:00
Marek Olšák
bfea7e4d29 ac/nir: handle negate modifier 2019-07-30 22:06:23 -04:00
Marek Olšák
33a8eab7a9 radeonsi: don't use lp_build_if for the prim discard compute shader 2019-07-30 22:06:23 -04:00
Marek Olšák
5562b6b067 radeonsi: don't use lp_build_if for the wrapping if block in the VS prolog 2019-07-30 22:06:23 -04:00
Marek Olšák
0ef4c1c04d radeonsi: don't use lp_build_if for the wrapping if block in merged shaders 2019-07-30 22:06:23 -04:00
Marek Olšák
6ec7d603f5 radeonsi: don't use lp_build_if (in most common places) 2019-07-30 22:06:23 -04:00
Marek Olšák
3406a57ff3 radeonsi: don't use lp_build_alloca 2019-07-30 22:06:23 -04:00
Marek Olšák
9234275320 radeonsi/nir: implement FBFETCH for KHR_blend_equation_advanced 2019-07-30 22:06:23 -04:00
Marek Olšák
925161c84c radeonsi/nir: set input_interpolate_loc for color inputs
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-07-30 22:06:23 -04:00
Marek Olšák
5787bbf90d radeonsi/nir: set tgsi_shader_info::num_memory_instructions 2019-07-30 22:06:23 -04:00
Marek Olšák
0993dbcbef radeonsi/nir: accurately set input_usage_mask for doubles (v2)
v2: fix doubles

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-30 22:06:23 -04:00
Marek Olšák
56e3c70b56 radeonsi/nir: accurately set output_usagemask (v2)
v2: fix doubles
2019-07-30 22:06:23 -04:00
Marek Olšák
37527f8a11 radeonsi/nir: accurately set reads_*_outputs for TCS 2019-07-30 22:06:23 -04:00
Marek Olšák
6697e42c3c radeonsi/nir: clean up gather_intrinsic_load_deref_input_info 2019-07-30 22:06:23 -04:00
Marek Olšák
5f16fdefdf radeonsi/nir: add an option to convert TGSI to NIR
Use at your own risk.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-30 22:06:23 -04:00
Marek Olšák
eb43559bb8 radeonsi/nir: clean up some nir_scan_shader code
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-30 22:06:23 -04:00
Marek Olšák
34dc6ed2a5 radeonsi/gfx10: disable DCC image stores
Uncompressed image stores are usually faster.

Also, the driver didn't set WRITE_COMPRESS_ENABLE, so I don't know
what the hw did for image stores.
2019-07-30 22:06:23 -04:00
Marek Olšák
17021efc74 radeonsi: adjust RB+ blend optimization settings
based on PAL
2019-07-30 22:06:23 -04:00
Marek Olšák
27ac9a3326 ac/surface: allow linear swizzle mode automatic selection on gfx9 & 10
let addrlib make the decision to get the same result as PAL.
2019-07-30 22:06:23 -04:00
Pierre-Eric Pelloux-Prayer
a0ac0e2653 mesa: add EXT_dsa indexed generic queries
Only GetPointerIndexedvEXT needs an implementation, the other functions are
aliases of existing functions.
2019-07-30 22:04:26 -04:00
Pierre-Eric Pelloux-Prayer
ef84d93f3d mesa: add EXT_dsa indexed texture commands functions
Added functions:
  - EnableClientStateIndexedEXT
  - DisableClientStateIndexedEXT
  - EnableClientStateiEXT
  - DisableClientStateiEXT

Implemented using the idiom provided by the spec:

        if (array == TEXTURE_COORD_ARRAY) {
          int savedClientActiveTexture;

          GetIntegerv(CLIENT_ACTIVE_TEXTURE, &savedClientActiveTexture);
          ClientActiveTexture(TEXTURE0+index);
          XXX(array);
          ClientActiveTexture(savedActiveTexture);
        } else {
          // Invalid enum
        }
2019-07-30 22:04:26 -04:00
Pierre-Eric Pelloux-Prayer
7534c536ca mesa: add EXT_dsa (Named)Framebuffer functions
These functions dont support display list as specified:

    Should the selector-free versions of various OpenGL 3.0 and
    EXT_framebuffer_object framebuffer object commands not be allowed
    in display lists [...]?

    RESOLVED:  Yes
2019-07-30 22:04:26 -04:00
Pierre-Eric Pelloux-Prayer
e26c6764f2 mesa: add EXT_dsa NamedBuffer functions 2019-07-30 22:04:26 -04:00
Jason Ekstrand
9265e9d11a i965/curbe: Look at SYSTEM_VALUE_FRAG_COORD instead of VARYING_SLOT_POS
When transitioning gl_FragCoord over to a system value, we missed one
instance of VARYING_SLOT_POS in i965.  As of this commit, i965 has no
references to VARYING_SLOT_POS.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111263
Fixes: 4bb6e6817e "intel: Use a system value for gl_FragCoord"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-30 19:21:09 -05:00
Jason Ekstrand
8fd2f2c276 intel/fs: Implement quad_swap_horizontal with a swizzle on gen7
This fixes dEQP-VK.subgroups.quad.compute.subgroupquadswaphorizontal_*
on all gen7 platforms.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-30 22:38:19 +00:00
Jason Ekstrand
499d760c6e intel/fs: Use ALIGN16 instructions for all derivatives on gen <= 7
The issue here was discovered by a set of Vulkan CTS tests:

    dEQP-VK.glsl.derivate.*.dynamic_*

These tests use ballot ops to construct a branch condition that takes
the same path for each 2x2 quad but may not be uniform across the whole
subgroup.  They then tests that derivatives work and give the correct
value even when executed inside such a branch.  Because the derivative
isn't executed in uniform control-flow and the values coming into the
derivative aren't smooth (or worse, linear), they nicely catch bugs that
aren't uncovered by simpler derivative tests.

Unfortunately, these tests require Vulkan and the equivalent GL test
would require the GL_ARB_shader_ballot extension which requires int64.
Because the requirements for these tests are so high, it's not easy to
test on older hardware and the bug is only proven to exist on gen7;
gen4-6 are a conjecture.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-30 22:38:19 +00:00
Eric Engestrom
bf8b5de6b9 scons+meson: suppress spammy build warning on MacOS
Originally introduced in c7f3657450 ("darwin: Suppress type
conversion warnings for GLhandleARB") to fix Bugzilla #66346 [1], this
workaround was never ported to Scons or Meson.

[1] https://bugs.freedesktop.org/66346

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2019-07-30 23:21:42 +01:00
Matt Turner
46a3ea06be i965/fs: Print the scheduler mode.
Line wrap some awfully long lines while we are here.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-30 14:35:43 -07:00
Matt Turner
dabb5d4bee i965/fs: Add a shader_stats struct.
It'll grow further, and we'd like to avoid adding an additional
parameter to fs_generator() for each new piece of data.

v2 (idr): Rebase on 17 months.  Track a visitor instead of a cfg.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-30 14:35:43 -07:00
Connor Abbott
11a49f289d lima/gp: Support exp2 and log2
log2 is tricky because there cannot be a move between complex1 and
postlog2. We can't guarantee that scheduling complex1 will succeed when
we schedule postlog2, so we try to schedule complex1 and if it fails we
back out by rewriting the postlog2 as a move and introducing a new
postlog2 so that we can try again later.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-30 23:01:15 +02:00
Connor Abbott
c2f48d8f32 lima/gpir: Always schedule complex2 and *_impl right after complex1
See https://gitlab.freedesktop.org/lima/mesa/issues/94 for the gory
details of why this is needed. For *_impl this is easy, since it never
increases register pressure and it goes in the complex slot hence it
never counts against max nodes. It's a bit more challenging for
complex2, since it does count against max nodes, so we need to change
the reservation logic to reserve an extra slot for complex2 when
scheduling complex1. This second part isn't strictly necessary yet, but
it will be for exp2.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-30 23:00:41 +02:00
Bas Nieuwenhuizen
2b53c49d2f radv: Fix descriptor set allocation failure.
Set all the handles to VK_NULL_HANDLE:

"If the creation of any of those descriptor sets fails, then the implementation
must destroy all successfully created descriptor set objects from this command,
set all entries of the pDescriptorSets array to VK_NULL_HANDLE and return the
error."

(Vulkan 1.1.117 Spec, section 13.2)

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-30 22:33:24 +02:00
Andres Rodriguez
2b71b4e793 radv: fix queries with WAIT_BIT returning VK_NOT_READY
When vkGetQueryPoolResults() is called with VK_QUERY_RESULT_WAIT_BIT
set, the driver is supposed to wait for the query to become available
before returning.

Currently, radv returns once the query is indeed ready, but it returns
VK_NOT_READY. It also fails to populate the results.

The problem is a missing volatile in the secondary check for query
availability. This patch removes the secondary check altogether since it
is redundant with the preceding loop.

This bug was found with an unreleased version of SteamVR.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-27 10:19:19 -04:00
Matt Turner
c9b86cf526 meson: Test for program_invocation_name
program_invocation_name and program_invocation_short_name are both GNU
extensions. I don't believe one can exist without the other, so only
check for program_invocation_name.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-30 11:49:09 -07:00
Matt Turner
597bddad47 scons: Test for random_r()
Suggested-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-30 11:49:09 -07:00
Matt Turner
c96407f37e meson: Test for random_r()
It's better to test for needed functions instead of using external
knowledge about presence in this or that C library.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-30 11:49:09 -07:00
Matt Turner
9cc4311d86 st/nine: Drop preprocessor guards for glibc-2.12
Same rationale as the previous patch, but additionally these checks just
seem entirely unnecessary. pthread_self() has been used in Mesa since at
least 1999.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-30 11:49:09 -07:00
Matt Turner
9c411e020d util: Drop preprocessor guards for glibc-2.12
glibc-2.12 was released in 2010. No one is building new Mesa against 9
year old glibc, and removing these checks allows the code to work on
other C libraries like musl.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-30 11:49:09 -07:00
Alyssa Rosenzweig
a3c59f9f00 pan/midgard: Nothing to see here, move along folks
Fixes: dee1e18fe4 ("pan/midgard: Cleanup ops table")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:49:13 -07:00
Lionel Landwerlin
7deb5ec0e8 spirv: don't discard access set by vtn_pointer_dereference
We can have a access flag already set here so just augment the
existing ones.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0fb61dfdeb ("spirv: propagate access qualifiers through ssa & pointer")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-30 17:43:59 +00:00
Sagar Ghuge
587a497529 iris: Enable EXT_texture_shadow_lod
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-30 10:42:20 -07:00
Sagar Ghuge
adb9e18348 gallium: Add PIPE_CAP_TEXTURE_SHADOW_LOD
v2: Line wrap to 80 char (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-30 10:42:20 -07:00
Sagar Ghuge
6e04bd5f13 i965: Enable EXT_texture_shadow_lod
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-30 10:42:20 -07:00
Paulo Zanoni
25b03526c4 glsl: Add builtin functions for EXT_texture_shadow_lod
With the help of Sagar, Ian and Ivan.

v2: Fix dependencies (Ian Romanick)

v3: 1) fix function name (Marek Olsak)
    2) Add check for extension enable (Marek Olsak)

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-30 10:42:20 -07:00
Paulo Zanoni
154c789ad5 glsl: Allow _textureCubeArrayShadow function to accept ir_texture_opcode
This will be used to support one of the function from
Ext_texture_shadow_lod specification.

With the help of Sagar, Ian and Ivan.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-30 10:42:20 -07:00
Paulo Zanoni
d80a74fb99 mesa: extension boilerplate for EXT_texture_shadow_lod
With the help of Sagar, Ian and Ivan.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-30 10:42:20 -07:00
Alyssa Rosenzweig
dee1e18fe4 pan/midgard: Cleanup ops table
Hopefully this should make a few ops make more sense. No functional
changes.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:35:22 -07:00
Alyssa Rosenzweig
834aeb1e52 pan/midgard: Extend copy-propagation to swizzles
We can compose them when we rewrite, which is.. more code.. but helps.

total instructions in shared programs: 3611 -> 3513 (-2.71%)
instructions in affected programs: 672 -> 574 (-14.58%)
helped: 11
HURT: 2
helped stats (abs) min: 2 max: 14 x̄: 9.09 x̃: 10
helped stats (rel) min: 5.71% max: 24.56% x̄: 17.99% x̃: 18.87%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.19% max: 2.08% x̄: 1.64% x̃: 1.64%
95% mean confidence interval for instructions value: -10.45 -4.62
95% mean confidence interval for instructions %-change: -20.07% -9.87%
Instructions are helped.

total bundles in shared programs: 2117 -> 2067 (-2.36%)
bundles in affected programs: 356 -> 306 (-14.04%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 4.55 x̃: 5
helped stats (rel) min: 4.55% max: 15.22% x̄: 13.63% x̃: 14.71%
95% mean confidence interval for bundles value: -5.64 -3.45
95% mean confidence interval for bundles %-change: -15.71% -11.55%
Bundles are helped.

total quadwords in shared programs: 3567 -> 3468 (-2.78%)
quadwords in affected programs: 695 -> 596 (-14.24%)
helped: 11
HURT: 1
helped stats (abs) min: 2 max: 14 x̄: 9.09 x̃: 10
helped stats (rel) min: 5.56% max: 21.88% x̄: 14.97% x̃: 15.15%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 2.38% max: 2.38% x̄: 2.38% x̃: 2.38%
95% mean confidence interval for quadwords value: -10.96 -5.54
95% mean confidence interval for quadwords %-change: -17.42% -9.63%
Quadwords are helped.

total registers in shared programs: 391 -> 383 (-2.05%)
registers in affected programs: 46 -> 38 (-17.39%)
helped: 9
HURT: 1
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 10.00% max: 10.00% x̄: 10.00% x̃: 10.00%
95% mean confidence interval for registers value: -1.25 -0.35
95% mean confidence interval for registers %-change: -29.42% -13.58%
Registers are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:35:10 -07:00
Alyssa Rosenzweig
c45487b770 pan/midgard: Extract simple source mod check
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:35:09 -07:00
Alyssa Rosenzweig
2d2abb08d0 pan/midgard: Lower texr/texw mixed registers
Conceptually, r28-r29 (as used for reading) and r28-r29 (as used for
writing) aren't registers at all, merely push/pull arrangements. So you
can't feed a texture result back into itself without explicitly moving
in the middle.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:01:20 -07:00
Alyssa Rosenzweig
2b248af43e pan/midgard: Always set .cont for derivatives in loops
We need to keep the helper invocations alive.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig
8f887329c0 pan/midgard: Implement derivatives
Implement the fdd* and fdd* opcodes in the Midgard compiler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig
982134d22e pan/midgard: Compose original texture swizzle in RA
Used for lowering derivatives.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig
79875a9a64 pan/midgard: Add new swizzles
Used for derivatives.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig
81e7782e30 pan/midgard: Add OP_IS_DERIVATIVE helper
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig
ae6aea0d98 pan/midgard: Add make_compiler_temp_reg helper
Corrollary to make_compiler_temp (for SSA).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig
30b15a830a pan/midgard: Move nir_*_src_index to compiler.h
These helpers are useful for code emission everywhere. Share the love!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig
c9498b3c5e pan/midgard: Disassemble unknown texture ops as hex
I'm not sure why I ever thought decimal was a good idea.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig
0714481894 pan/midgard: Add support for disassembling derivatives
They're just texture ops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-30 10:01:19 -07:00
Connor Abbott
a094928abc nir/find_array_copies: Use correct parent array length
instr->type is the type of the array element, not the type of the array
being dereferenced. Rather than fishing out the parent type, just use
parent->num_children which should be the length plus 1. While we're here
add another assert for the issue fixed by the previous commit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111251
Fixes: 156306e5e6 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-30 17:14:33 +02:00
Connor Abbott
7788992bc6 nir: Fix comparison for nir_deref_instr_is_known_out_of_bounds()
There was an off-by-one error.

Fixes: 156306e5e6 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-30 17:14:28 +02:00
Samuel Pitoiset
9d7ead6f9b radv/gfx10: only compile the GS copy shader on-demand
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-30 16:51:30 +02:00
Michel Dänzer
5229f27f06 gitlab-ci: Fix scons build directory path
Fixes: dd3d0b2897 "gitlab-ci: Only keep the build logs as artifacts."

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-30 16:18:50 +02:00
Jan Zielinski
4d2890e8f7 swr/rasterizer: Add memory tracking support
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-07-30 15:58:36 +02:00
Jan Zielinski
5dd9ad1570 swr/rasterizer: Better implementation of scatter
Added support for avx512 scatter instruction. Non-avx512 will
now call into a C function to do the scatter emulation.

This has better jit compile performance than
the previous approach of jitting scalar loops.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-07-30 13:39:19 +00:00
Jan Zielinski
ad9aff5528 swr/rasterizer: cleanups for tessellation
This commit introduces small fixes in preparation for tessellation
support.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-07-30 13:39:18 +00:00
Jan Zielinski
c5c05979f7 rasterizer/swr: move BucketMgr to SwrContext
This move gets us back to parity  with global manager
in that we can dump render context buckets now.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-07-30 13:39:18 +00:00
Alejandro Piñeiro
cda4c62893 v3d: take into account separate_stencil when checking if stencil should be cleared
In most cases this is not needed because the usual is that when a
separate stencil is written, the parent resource is also written.

This is needed if we have a separate stencil, no depth buffer, and the
source and destination is the same, as in that case the stencil can be
updated, but not the parent source (like if you are blitting only the
stencil buffer). On that situation, the following access to the
stencil buffer would clear the stencil buffer (so overwritting the
previous blitting) cleared because the parent source has
v3d_resource.writes to 0.

As far as I see, that situation only happens with the
GL_DEPTH32F_STENCIL8 format.

Note that one alternative would consider that if the separate_stencil
has been written, the parent should also be considered written (and
update its "writes" field accordingly). But I found this patch more
natural.

Fixes the following piglit tests:
   spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-blit
   spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-copypixels

the latter regressed when internally glCopyPixels implementation
started to use blitting. So:

Fixes: 131d40cfc9 ("st/mesa: accelerate glCopyPixels(STENCIL)")

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-30 12:05:23 +02:00
Daniel Schürmann
45638e14fb radv: Don't include radv_private.h from radv_shader.h
This patch decouples radv_shader.h from any LLVM dependency.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-30 10:29:11 +02:00
Rafael Antognolli
f27908152b i965/gen10: Remove unnecessary workaround.
In fact, the description of the workaround states that the mask field
doesn't work correctly on gen10, and we need to set it to 0xffff even we
we only want to update a single field:

 "The mask bits are not implemented properly on 3DSTATE_3D_MODE.  Driver
 must always program bits 31:16 of DW1 a value of 0xFFFF.   This means
 if it is only updating 1 field, it must update all the fields to the
 correct value."

So unless we want to change any of the fields of 3DSTATE_3D_MODE,
there's not need to emit. Additionally, it seems this workaround is not
required on gen11. And last but not least, this workaround is not
implemented on iris or anv, and it doesn't seem to be missed there.

So let's just remove the whole thing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-29 16:54:17 -07:00
Kenneth Graunke
44e713eddb iris: Fix SO offset to be 32-bit in DrawTransformFeedback handling
We accidentally started copying a full 64-bit value rather than copying
a 32-bit offset and zeroing the top 32-bits.  This caused us to compute
bogus vertex counts which could lead to GPU hangs in some cases.

Thanks to Clayton Craft for catching the regressions!

Fixes: 0e24d10ff5 ("iris: Use gen_mi_builder to handle CS ALU operations.")
2019-07-29 16:38:19 -07:00
Jason Ekstrand
4bb6e6817e intel: Use a system value for gl_FragCoord
It's kind-of an anomaly that the Intel drivers are still treating
gl_FragCoord as an input.  It also makes zero sense because we have to
special-case it in the back-end.

Because ANV is the only user of nir_lower_wpos_center, we go ahead and
just update it to look for nir_intrinsic_load_frag_coord as part of this
patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-29 23:30:26 +00:00
Jason Ekstrand
44268b1c72 glsl: Treat gl_FragCoord as a varying even when it's a system value
This fixes glsl-fcoord-invariant-pass.shader_test on drivers that set
GLSLFragCoordIsSysVal which includes radeonsi among others.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-29 23:30:26 +00:00
Jason Ekstrand
169d896df2 mesa/spirv: Set frag_coord_is_sysval to GLSLFragCoordIsSysVal
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-29 23:30:26 +00:00
Jason Ekstrand
e401303597 intel/fs: Remove calculate_urb_setup from fs_visitor
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-29 23:30:26 +00:00
Rob Clark
010d255656 freedreno/a6xx: fix MSAA resolve hangs
Seems like RB_BLIT_SCISSOR needs to be aligned to (minimum?) tile size.

Fixes intermittent GPU hangs triggered by some of the three.js samples
on https://threejs.org/

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-07-29 15:15:31 -07:00
Rob Clark
73cc2dc084 freedreno/ir3: fix for array/reg store vs meta instructions
fishgl.com has a shader which does roughly:

   foo = texture(...);
   if (bar)
      foo = texture(...);

after lowering phi webs to regs we end up w/ a vec4 reg (array).  But
since it was not an indirect access, we try to skip the extra mov.  This
results that the per-component fanout (split) meta instructions store
directly to the reg (array).  Which doesn't work out in RA.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-07-29 15:15:31 -07:00
Eric Engestrom
f7b6a8d12f meson: bump required version to 0.46
0.45 has a few annoying bugs (like the one in !358 [1]), and 0.46 is
well over a year old by now, so let's move to it.

[1] https://gitlab.freedesktop.org/mesa/mesa/merge_requests/358

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-07-29 23:07:30 +01:00
Leo Liu
8d7f2e2221 radeon/vcn/vp9: add Arcturus VP9 support
Arcturus CHIP enum is less than Navi10, since it's still gfx9,
but its VCN version belongs to VCN2.x

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:58 -04:00
Leo Liu
a439863918 radeon/vcn: add Arcturus decode support
different internal registers offset from previous HW

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:56 -04:00
Marek Olšák
7708540363 amd: add support for Arcturus
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:54 -04:00
Marek Olšák
417ab8ef6b radeonsi: add AMD_DEBUG=nogfx for testing
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:53 -04:00
Marek Olšák
19d04191c4 radeonsi: add support for compute-only chips
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:51 -04:00
Sonny Jiang
c82f338855 gallium/auxiliary/vl: add compute shaders for deint yuv
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:49 -04:00
Sonny Jiang
ef77a92bca gallium/auxiliary/vl: don't call gfx functions on compute-only chips
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:46 -04:00
James Zhu
b618b65c98 gallium/auxiliary/vl: add PIPE_CAP_GRAPHICS check for vl compositor
Init graphic shader Only when PIPE_CAP_GRAPHICS is true.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:42 -04:00
Marek Olšák
187cc07d05 gallium: create multimedia contexts as compute-only if graphics is unsupported
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:41 -04:00
Marek Olšák
ea7646dc13 gallium: add PIPE_CAP_GRAPHICS
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-29 17:52:39 -04:00
Samuel Pitoiset
372c3dcfdb radv: implement VK_EXT_index_type_uint8
Natively supported on VI+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-29 23:36:53 +02:00
Lionel Landwerlin
c6196f7025 anv: implement VK_EXT_index_type_uint8
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-29 21:26:07 +00:00
Lionel Landwerlin
0d3a532a33 vulkan: Bump headers to 1.1.117
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-29 21:26:07 +00:00
Lionel Landwerlin
161b5f00db include/vulkan: bump vk_android_native_buffer
Taken off https://android.googlesource.com/platform/frameworks/native/+/refs/tags/android-9.0.0_r45/vulkan/include/vulkan/vk_android_native_buffer.h

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-29 21:26:07 +00:00
Eric Engestrom
8486dbb066 intel/mi: only resolve to a temp register if source isn't in memory
aka. fix a s/||/&&/ typo

Fixes: 74063ee61a ("intel/mi: Add a new gen_mi_store_if() helper.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-29 13:35:42 -07:00
Eric Anholt
5596038e2f gitlab-ci: Enable freedreno shader-db runs.
Now that helgrind is less upset and I've completed many successful
full shader-db runs, we should be able to enable freedreno shader-db
runs for Mesa checkins on the tiny public shader-db.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-29 12:52:39 -07:00
Eric Anholt
3c46778b75 nir: Fix helgrind complaints about data race in trivial_swizzle init.
Even if the data race wasn't real (I'm not great at reasoning about
this), helgrind is a nice enough tool that keeping noise out of it is
probably worthwhile.  Besides, typing out the numbers keeps the data
in the read-only data section instead of emitting code to initialize
it every time.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-29 12:50:49 -07:00
Eric Anholt
91986fbbdb freedreno: Fix data race on making the shader's id.
The value is only used for IR3_DBG_DISASM, but it cleans up the
helgrind output.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-29 12:50:49 -07:00
Eric Anholt
6f0521b78c freedreno: Take a lock around shader variant creation.
Shaders are shared across contexts in gallium (part of making it so
that you get shader compile at link time, for shader-db and to reduce
compiles at draw time).  So, we need to protect from variant creation
for a shader from multiple threads at the same time.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-29 12:50:49 -07:00
Eric Anholt
6e3b220ad3 freedreno: Fix data races with allocating/freeing struct ir3.
There is a single ir3_compiler in the screen, and each context may be
compiling ir3 shaders, which call ir3_create.  ralloc doesn't do any
locking on its own, so eventually you can end up racing to break
ralloc's linked lists.

We really don't want struct ir3 to live as long as the compiler (maybe
struct ir3_shader's lifetime, if anything), so you'd better be freeing
it anyway.

Fixes: 8fe2076243 ("freedreno/ir3: convert over to ralloc")
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-29 12:50:49 -07:00
Eric Anholt
65aeeae670 freedreno: Fix helgrind complaint on shader-db key setup.
If the variable's going to be static, we shouldn't be memsetting it
from every thread and instead just have it in the data section.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-29 12:50:49 -07:00
Bas Nieuwenhuizen
aac492901a radv: Take variable descriptor counts into account for buffer entries.
Fixes: b5e04e9217 "radv: Support allocating variable size descriptor sets."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111019
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-29 20:42:53 +02:00
Jason Ekstrand
99d04a5bd6 anv: Don't claim support for 24 and 48-bit formats on IVB
Cc: mesa-stable@lists.freedesktop.org
2019-07-29 11:34:30 -05:00
Jason Ekstrand
7c1b39cf18 isl/formats: R8G8B8_UNORM_SRGB isn't supported on HSW
On Haswell, the format works but it doesn't properly do an sRGB decode.
It appears to act identically to R8G8B8_UNORM.  Only Vulkan uses this
format so this only affects Vulkan on HSW.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-07-29 11:34:18 -05:00
Alyssa Rosenzweig
463164b325 pan/midgard: Fix alpha test w.r.t new indexing
Fixes: 9beb3391b5 ("pan/midgard: Tag SSA/reg")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-29 08:31:03 -07:00
Gert Wollny
4ee638cd78 softpipe: Don't draw when rasterizer_discard is set
Fixes:
  dEQP-GLES3.functional.rasterizer_discard.basic.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.basic.write_stencil_points
  dEQP-GLES3.functional.rasterizer_discard.fbo.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.fbo.write_stencil_points
  dEQP-GLES3.functional.rasterizer_discard.scissor.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.scissor.write_stencil_points

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-29 15:47:34 +02:00
Gert Wollny
45ac0dfad4 softpipe: Fix cube arrays layer selection
To select the correct layer the z-coordinate must be rounded before it
is multiplied by six.

Fixes a number of tests out of
   dEQP-GLES31.functional.texture.filtering.cube_array.formats.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-29 15:47:34 +02:00
Lionel Landwerlin
6659d11ff0 vulkan/wsi/wayland: implement acquire timeout
v2: Eric's nits

v3: Reuse timespec utils (Daniel)
    Deal with ppoll being interrupted by a signal (Daniel)

v4: Remove unnecessary time check

v5: Deal with EAGAIN from wl_display_prepare_read_queue() (Daniel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Reviewed-by: Daniel Stone <daniels@collabora.com>
2019-07-29 13:11:36 +00:00
Lionel Landwerlin
d2d70c3bb5 util: add a timespec helper
Copied from Weston, upon Daniel's suggestion

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2019-07-29 13:11:36 +00:00
Eric Engestrom
ef57fb2350 intel: replace large stack buffer with heap allocation
For now, this keeps the "100 bytes" allocation; we can try to figure out
the correct size as a follow up.

Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-29 13:58:57 +01:00
Samuel Pitoiset
58ee973e87 radv/gfx10: do not use the fast depth or stencil clear bytes path
It causes issues on GFX10.

This fixes rendering issues with vkmark and Wreckfest at least.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
2019-07-29 14:47:13 +02:00
Samuel Pitoiset
4aa450193b ac: do not crash when the buffer data format is invalid
This might happen when a pipeline doesn't define the vertex input
state, so the buffer data format is 0 (aka INVALID).

This fixes crashes when compiling some shaders on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-29 13:19:32 +02:00
Rhys Perry
a9f58af454 ac/nir: fix txf_ms with an offset
Seems to fix some hair artifacts in Max Payne 3:
https://github.com/daniel-schuermann/mesa/issues/76

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: f4e499ec79 ('radv: add initial non-conformant radv vulkan driver')
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-29 11:50:13 +01:00
Connor Abbott
a69ab1b7d2 radv: Delete unused local variables in optimization loop
Totals from affected shaders:
SGPRS: 376 -> 376 (0.00 %)
VGPRS: 620 -> 560 (-9.68 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 292 -> 292 (0.00 %) dwords per thread
Code Size: 20024 -> 20144 (0.60 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 25 -> 25 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-29 11:37:46 +02:00
Connor Abbott
156306e5e6 nir/find_array_copies: Handle wildcards and overlapping copies
This commit rewrites opt_find_array_copies to be able to handle
an array copy sequence with other intervening operations in between. In
particular, this handles the case where we OpLoad an array of structs
and then OpStore it, which generates code like:

foo[0].a = bar[0].a
foo[0].b = bar[0].b
foo[1].a = bar[1].a
foo[1].b = bar[1].b
...

that wasn't recognized by the previous pass.

In order to correctly handle copying arrays of arrays, and in particular
to correctly handle copies involving wildcards, we need to use a tree
structure similar to lower_vars_to_ssa so that we can walk all the
partial array copies invalidated by a particular write, including
ones where one of the common indices is a wildcard. I actually think
that when factoring in the needed hashing/comparing code, a hash table
based approach wouldn't be a lot smaller anyways.

All of the changes come from tessellation control shaders in Strange
Brigade, where we're able to remove the DXVK-inserted copy at the
beginning of the shader. These are the result for radv:

Totals from affected shaders:
SGPRS: 4576 -> 4576 (0.00 %)
VGPRS: 13784 -> 5560 (-59.66 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 8696 -> 6876 (-20.93 %) dwords per thread
Code Size: 329940 -> 263268 (-20.21 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 330 -> 898 (172.12 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-29 11:36:25 +02:00
Connor Abbott
c6543efe7a nir: Print array deref indices as decimal
We print the size as decimal too, and using hex without a leading "0x"
was very confusing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-29 11:36:19 +02:00
Connor Abbott
6fc7384fd4 lima/gpir/sched: Handle more special ops in can_use_complex()
We were missing handling for a few other ops that rearrange their
sources somehow in codegen, namely complex2 and select.

This should fix spec@glsl-1.10@execution@built-in-functions@vs-asin-vec3
and possibly other random regressions from the new scheduler which were
supposed to be fixed in the commit right after.

Fixes: 54434fe670 ("lima/gpir: Rework the scheduler")
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-28 23:38:31 +02:00
Connor Abbott
af95f80a24 lima/gp: Clean up lima_program_optimize_vs_nir() a little
Remove an unnecessary nir_lower_regs_to_ssa as that should be done by
the state tracker, and add a missing DCE pass after running copy
propagation in order to remove the dead copies. This shouldn't fix
anything but the second part will reduce shader sizes.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-28 23:38:31 +02:00
Connor Abbott
d26d8c5617 lima/gpir/sched: Don't try to spill when something else has succeeded
In try_node(), we assume that the node we pick can still be scheduled
successfully after speculatively trying all the other nodes. Normally we
always undo every node after speculating it, so that when we finally
schedule best_node the scheduler state is exactly the same and it
succeeds. However, we also try to spill nodes, which can change the
state and in a corner case that can make scheduling best_node fail. In
particular, the following sequence of events happened with piglit
shaders@glsl-vs-if-nested: a partially-ready node N was spilled and a
register store node S, which is a use of N, was created and then later
the other uses of N were scheduled, so that S is now ready and N is
partially ready. First we try to schedule S and succeed, then we try to
schedule another node M, which fails, so we try to spill the remaining
uses of N. This succeeds, but scheduling M still fails so that best_node
is still S. However since one of the uses of N is one cycle ago, and
therefore we inserted a read dependent on S one cycle ago when spilling
N, S can no longer be scheduled as read-after-write latency is three
cycles.

While we could ad-hoc try to catch cases like this, or (the best option
but very complicated) treat the spill as speculative and roll it back if
we decide not to schedule the node, a simpler solution is to just
give up on spilling if we've already successfully speculatively
scheduled another node. We'd give up a few cases where we discover that
by spilling even harder we could schedule a more desirable node, but
that seems like it would be pretty rare in practice. With this we
guarantee that nothing has been touched after best_node was successfully
scheduled. We also cut down on pointless spilling, since if we already
scheduled a node it's unlikely that spilling harder will let us schedule
an even better node, and hence any spilling at this point is probably
useless.

While we're here, clean up the code around spilling by flattening the
two if's and getting rid of the second unnecessary check for INT_MIN.

Fixes: 54434fe670 ("lima/gpir: Rework the scheduler")
Acked-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2019-07-28 23:38:31 +02:00
Ilia Mirkin
de17922b8a nv50/ir: don't consider the main compute function as taking arguments
With OpenCL, kernels can take arguments and return values (?). However
in practice, there is no more TGSI compute implementation, and even if
there were, it would probably have named functions and no explicit main.

This improves RA considerably for compute shaders, since temps are not
kept around as return values.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-07-27 18:24:11 -04:00
Ilia Mirkin
3e468ff2fe nv50/ir: handle insn not being there for definition of CVT arg
This can happen if it's e.g. a uniform or a function argument.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111217
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
2019-07-27 18:24:11 -04:00
Ilia Mirkin
23dfff0669 nouveau: flip DEBUG -> !NDEBUG
The meson conversion chose to change the meaning of DEBUG to "used for
debugging" to be "used for expensive things for debugging", primarily
for nir_validate. Flip things over so that we get nice things with
optimizations enabled.

While we're at it, also kill off nouveau_statebuf.h which is unused (and
has a mention of DEBUG which is how I found it).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-07-27 18:24:11 -04:00
Ilia Mirkin
9f8ed5aa67 nvc0: allow a non-user buffer to be bound at position 0
Previously the code only handled it for positions 1 and up (as would be
for UBO's in GL). It's not a lot of trouble to handle this, and vl or
vdpau want this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
2019-07-27 18:24:11 -04:00
Ilia Mirkin
c52b057e00 nv50,nvc0: update sampler/view bind functions to accept NULL array
Apparently vl (or vdpau) wants to pass that in now. Handle it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
2019-07-27 18:24:11 -04:00
Ilia Mirkin
face27fdc5 gallium/vl: fix compute tgsi shaders to not process undefined components
This caused nouveau's function handling logic to think that the MAIN
function was due to receive external parameters, and cascaded some
failures after that. Instead avoid having the undefined components in
the first place.

Fixes: f6ac0b5d71 (gallium/auxiliary/vl: Add compute shader to support video compositor render)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111217
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-27 18:24:11 -04:00
Alyssa Rosenzweig
159abd527e pan/midgard: Introduce invert field
This will enable us to fuse inverts in various ways. Marginal hurt:

total instructions in shared programs: 3610 -> 3611 (0.03%)
instructions in affected programs: 67 -> 68 (1.49%)
helped: 0
HURT: 1

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 13:38:41 -07:00
Alyssa Rosenzweig
9beb3391b5 pan/midgard: Tag SSA/reg
Rather than putting registers after SSA in the MIR indexing, put them
side-by-side, shifted 1, using the bottom bit as the SSA/reg select.
This will allow us to generate SSA temps in the compiler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 13:38:41 -07:00
Boyuan Zhang
b0626c1f30 radeon/vcn: enable rate control for hevc encoding
Set cu_qp_delta_enable_flag on when rate control is enabled, and set it
off when rate control is disabled (e.g. constant qp).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org

V2: fix typo and add bugzilla info

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
2019-07-26 14:33:09 -04:00
Boyuan Zhang
5115c25bb8 radeon/uvd: enable rate control for hevc encoding
Set cu_qp_delta_enable_flag on when rate control is enabled, and set it
off when rate control is disabled (e.g. constant qp).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org

V2: fix typo and add bugzilla info

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
2019-07-26 14:33:09 -04:00
Boyuan Zhang
9aaf3aaf5d radeon/vcn: fix poc for hevc encode
MaxPicOrderCntLsb should be at least 16 according to the spec,
therefore add minimum value check.

Also use poc value passed from st instead of calculation
in slice header encoding.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org

V2: Fix typo

V3: Use MAX2 macro instead of coding. Also MaxPicOrderCntLsb
should be power of 2 according to spec.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
2019-07-26 14:33:09 -04:00
Boyuan Zhang
77cf700fa3 radeon/uvd: fix poc for hevc encode
MaxPicOrderCntLsb should be at least 16 according to the spec,
therefore add minimum value check.

Also use poc value passed from st instead of calculation
in slice header encoding.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org

V2: Fix typo

V3: Use MAX2 macro instead of coding. Also MaxPicOrderCntLsb
should be power of 2 according to spec.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
2019-07-26 14:33:09 -04:00
Sagar Ghuge
d5992ab134 nir: Optimize umod lowering
We don't have calculate final quotient in order to calculate unsigned
modulo result.  Once we are done with error correction we have partial
result which can be used to find out modulo operation result

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-26 11:19:23 -07:00
Alyssa Rosenzweig
f8c71d7632 pan/midgard: Improve scheduling
Make scalar scheduling onto vector units more aggressive (it can only
help while we schedule strictly in order). Also, allow imov on VLUT.

total bundles in shared programs: 2176 -> 2117 (-2.71%)
bundles in affected programs: 901 -> 842 (-6.55%)
helped: 24
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 2.46 x̃: 2
helped stats (rel) min: 2.08% max: 20.00% x̄: 8.68% x̃: 5.94%
95% mean confidence interval for bundles value: -3.93 -0.99
95% mean confidence interval for bundles %-change: -10.92% -6.45%
Bundles are helped.

total quadwords in shared programs: 3605 -> 3566 (-1.08%)
quadwords in affected programs: 1984 -> 1945 (-1.97%)
helped: 28
HURT: 5
helped stats (abs) min: 1 max: 3 x̄: 1.68 x̃: 2
helped stats (rel) min: 1.02% max: 14.29% x̄: 5.12% x̃: 2.94%
HURT stats (abs)   min: 1 max: 3 x̄: 1.60 x̃: 1
HURT stats (rel)   min: 0.57% max: 9.09% x̄: 6.40% x̃: 9.09%
95% mean confidence interval for quadwords value: -1.67 -0.69
95% mean confidence interval for quadwords %-change: -5.37% -1.37%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 10:28:46 -07:00
Alyssa Rosenzweig
94e281b9e0 pan/midgard: Specialize mod checking by type when checking constants
Fixes inlining of integer constants.

total quadwords in shared programs: 3585 -> 3568 (-0.47%)
quadwords in affected programs: 625 -> 608 (-2.72%)
helped: 13
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.31 x̃: 1
helped stats (rel) min: 1.27% max: 9.52% x̄: 3.84% x̃: 2.94%
95% mean confidence interval for quadwords value: -1.60 -1.02
95% mean confidence interval for quadwords %-change: -5.60% -2.07%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 09:47:40 -07:00
Alyssa Rosenzweig
e823d33e77 pan/midgard: Use more aggressive writeout criteria
We loosen the requirement of "no dependencies" to simply be "no
non-pipelined dependencies", so we check for what could be pipelined.

total bundles in shared programs: 2176 -> 2156 (-0.92%)
bundles in affected programs: 779 -> 759 (-2.57%)
helped: 20
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.33% max: 20.00% x̄: 6.47% x̃: 2.78%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -9.44% -3.50%
Bundles are helped.

total quadwords in shared programs: 3605 -> 3585 (-0.55%)
quadwords in affected programs: 1391 -> 1371 (-1.44%)
helped: 20
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 14.29% x̄: 3.84% x̃: 1.64%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -5.73% -1.94%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 09:47:40 -07:00
Alyssa Rosenzweig
c7fc5f3567 pan/midgard: Pipeline non-SSA registers
Rather than bailing if we see something that's not SSA, do out the
analysis to check if we can pipeline and do so if we can.

total registers in shared programs: 392 -> 391 (-0.26%)
registers in affected programs: 3 -> 2 (-33.33%)
helped: 1
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 09:40:10 -07:00
Alyssa Rosenzweig
79f0896491 pan/midgard: Add mir_mask_of_read_components helper
This facilitates analysis of vec4 registers (after going out-of-SSA).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 09:37:28 -07:00
Alyssa Rosenzweig
481447cb00 pan/midgard: Add mir_is_written_before helper
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 09:20:52 -07:00
Alyssa Rosenzweig
95732cc9ef pan/midgard: Obey fragment writeout criteria
Rather than always emitting an extra move for fragments, check the
actual criteria and emit accordingly. (This was lost during the RA
improvements at the end of May).

total bundles in shared programs: 2210 -> 2176 (-1.54%)
bundles in affected programs: 501 -> 467 (-6.79%)
helped: 34
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.59% max: 33.33% x̄: 13.13% x̃: 12.50%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -16.06% -10.21%
Bundles are helped.

total quadwords in shared programs: 3639 -> 3605 (-0.93%)
quadwords in affected programs: 795 -> 761 (-4.28%)
helped: 34
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.96% max: 33.33% x̄: 11.22% x̃: 8.33%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -14.31% -8.13%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:09 -07:00
Alyssa Rosenzweig
20771ede1c pan/midgard: Add post-RA move elimination
Think of this pass as register coalescing part 2. After RA runs, but
before scheduling, we scan for code of the form:

   mov rN, rN

and delete the move, since it's totally redundant. This pass helps
already, but it'd of course be much more effective paired with
register coalescing to encourage moves in general to end up in this
form. Nevertheless, even by itself:

total instructions in shared programs: 3665 -> 3613 (-1.42%)
instructions in affected programs: 2046 -> 1994 (-2.54%)
helped: 52
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 8.02% x̃: 4.00%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -10.26% -5.79%
Instructions are helped.

total bundles in shared programs: 2256 -> 2213 (-1.91%)
bundles in affected programs: 1154 -> 1111 (-3.73%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.33% max: 25.00% x̄: 9.10% x̃: 5.56%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -11.60% -6.60%
Bundles are helped.

total quadwords in shared programs: 3689 -> 3642 (-1.27%)
quadwords in affected programs: 2025 -> 1978 (-2.32%)
helped: 47
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 7.86% x̃: 3.85%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -10.30% -5.42%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:09 -07:00
Alyssa Rosenzweig
cb6dea6b4d pan/midgard: Share mir_nontrivial_outmod
To be used with redundant move elimination.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig
b6946d35c8 pan/midgard: Implement texture RA
total instructions in shared programs: 3916 -> 3665 (-6.41%)
instructions in affected programs: 1405 -> 1154 (-17.86%)
helped: 35
HURT: 0
helped stats (abs) min: 1 max: 21 x̄: 7.17 x̃: 3
helped stats (rel) min: 3.00% max: 28.57% x̄: 20.11% x̃: 21.74%
95% mean confidence interval for instructions value: -9.35 -4.99
95% mean confidence interval for instructions %-change: -22.75% -17.46%
Instructions are helped.

total bundles in shared programs: 2472 -> 2256 (-8.74%)
bundles in affected programs: 906 -> 690 (-23.84%)
helped: 32
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 6.75 x̃: 3
helped stats (rel) min: 5.56% max: 32.26% x̄: 20.83% x̃: 16.67%
95% mean confidence interval for bundles value: -9.09 -4.41
95% mean confidence interval for bundles %-change: -23.77% -17.89%
Bundles are helped.

total quadwords in shared programs: 3965 -> 3689 (-6.96%)
quadwords in affected programs: 1568 -> 1292 (-17.60%)
helped: 35
HURT: 0
helped stats (abs) min: 1 max: 21 x̄: 7.89 x̃: 3
helped stats (rel) min: 2.08% max: 28.57% x̄: 19.87% x̃: 20.00%
95% mean confidence interval for quadwords value: -10.38 -5.39
95% mean confidence interval for quadwords %-change: -22.57% -17.17%
Quadwords are helped.

total registers in shared programs: 411 -> 392 (-4.62%)
registers in affected programs: 76 -> 57 (-25.00%)
helped: 15
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.27 x̃: 1
helped stats (rel) min: 9.09% max: 50.00% x̄: 30.97% x̃: 33.33%
95% mean confidence interval for registers value: -1.52 -1.01
95% mean confidence interval for registers %-change: -39.12% -22.82%
Registers are helped.

total threads in shared programs: 426 -> 432 (1.41%)
threads in affected programs: 6 -> 12 (100.00%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig
13f61f24ea pan/midgard: Fix backwards blend color load
The source and destination were incorrectly flipped in the move, but
some details of our internal regalloc made this function anyway. Now
that we're changing the regalloc, we need to fix this to avoid
regressing blend shaders.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig
a99ecc2b2b pan/midgard: Fix scheduling mishap
We shouldn't try to schedule onto a vmul if the last unit was a smul;
that would force a break ("traveling back in time").

total bundles in shared programs: 2519 -> 2472 (-1.87%)
bundles in affected programs: 791 -> 744 (-5.94%)
helped: 20
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 2.35 x̃: 1
helped stats (rel) min: 1.52% max: 11.76% x̄: 7.94% x̃: 7.69%
95% mean confidence interval for bundles value: -3.47 -1.23
95% mean confidence interval for bundles %-change: -9.36% -6.51%
Bundles are helped.

total quadwords in shared programs: 4028 -> 3965 (-1.56%)
quadwords in affected programs: 1223 -> 1160 (-5.15%)
helped: 17
HURT: 0
helped stats (abs) min: 1 max: 17 x̄: 3.71 x̃: 2
helped stats (rel) min: 2.97% max: 10.64% x̄: 6.97% x̃: 7.14%
95% mean confidence interval for quadwords value: -5.71 -1.70
95% mean confidence interval for quadwords %-change: -8.03% -5.91%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig
e4038f9445 pan/midgard: Fix vector->scalar swizzles
The swizzle should be taken on the masked component, rather than
unconditionally X.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig
10324095d2 pan/midgard: Add dead move elimination pass
This is a special case of DCE designed to run after the out-of-ssa pass
to cleanup special register lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig
082485d663 pan/midgard: Move DCE into its own file
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig
f9e619fa82 pan/midgard: Add mir_rewrite_dst_tag helper
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig
b3cab85606 pan/midgard: Fix flipped register bias fields
We mixed up component_lo and full, which made it appear that we had
less freedom in RA than we actually do. Fix this to fix some
disassemblies as well as prepare for RA with the bias field.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig
be56840d5a pan/midgard: Update RA for cubemap coords
Following the RA work, we apply the same technique to eliminate the move
to r27 when loading cubemaps.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-26 08:37:08 -07:00
Eric Engestrom
d2de5b6ba2 anv+tu+radv: delete unusable dev_icd.json
As per previous commit, Meson doesn't support using uninstalled libs,
they're simply not ready until `ninja install` is ran, so delete them.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> # for anv
Reviewed-by: Eric Anholt <eric@anholt.net> # for tu
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> # for radv
2019-07-26 14:47:53 +00:00
Eric Engestrom
2605e9fe46 docs: fix intel_icd.json path
Meson doesn't support using uninstalled libs, they're simply not ready
until `ninja install` is ran, at which point one might as well use the
proper icd.json file in the install folder.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-26 14:47:53 +00:00
Bas Nieuwenhuizen
9653d80de1 vulkan/wsi/x11: Increase the effective min. images for mailbox.
We need 5 images:
1) CPU work
2) GPU work
3) idle
4) queued for flip
5) presenting

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-26 16:37:28 +02:00
Bas Nieuwenhuizen
5eae9bfbfc vulkan/wsi/x11: Wait for GPU work before present with mailbox.
Otherwise the wait only happens at flip time, which messes with
keeping idle buffers around if the GPU work makes the image miss
the next flip.

I decided not to use the wait fences as those are still xshm fences,
so that means we'd still have to wait in the application. Just doing
it before presenting makes things simpler.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-26 16:37:28 +02:00
Bas Nieuwenhuizen
cc6a72a002 vulkan/wsi/x11: Allow using thread present-only.
This allows doing a potential long blocking operation before present.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-26 16:37:28 +02:00
Bas Nieuwenhuizen
55da4e1ec2 vulkan/wsi: Use one fence per image.
Much easier to work with if we want to use them in the WS-specific
WSI implementation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-26 16:37:28 +02:00
Lionel Landwerlin
0fb61dfdeb spirv: propagate access qualifiers through ssa & pointer
Not only variables can be flagged as NonUniformEXT but also
expressions. We're currently ignoring it in an expression such as :

   imageLoad(data[nonuniformEXT(rIndex)], 0)

The associated SPIRV :

   OpDecorate %69 NonUniformEXT
   ...
   %69 = OpLoad %61 %68

This changes propagates access qualifiers through ssa & pointers so
that when it hits a OpLoad/OpStore style instructions, qualifiers are
not forgotten.

Fixes failure the following tests :

   dEQP-VK.descriptor_indexing.*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8ed583fe52 ("spirv: Handle the NonUniformEXT decoration")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-26 14:09:55 +00:00
Lionel Landwerlin
86b53770e1 spirv: wrap push ssa/pointer values
This refactor allows for common code to apply decoration on all
ssa/pointer values. In particular this will allow to propagage access
qualifiers.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-26 14:09:55 +00:00
Lionel Landwerlin
8c330728f3 nir: add access to image_deref intrinsics
SPIRV added the ability to access variables and have expressions non
dynamically uniform and because spirv_to_nir generates deref
instructions, we'll need to have that access there.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-26 14:09:55 +00:00
Yevhenii Kolesnikov
02ecf16a70 main: unreference ATIFragmentShader program before creating new one
Old program was overwritten without release of memory.

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-26 12:51:05 +00:00
Yevhenii Kolesnikov
fad848094f state_tracker: Add destroying routine for feedback and select stages
Fixes leaking memory on iris.

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-26 15:35:03 +03:00
Iago Toral Quiroga
1a99fc0fd0 v3d: fix glDrawTransformFeedback{Instanced}()
This needs to take the vertex count from the provided transform
feedback buffer.

v2:
 - don't take the vertex count from the underlying buffer, instead,
   take it from a v3d subclass of pipe_stream_output_target (Eric).

Fixes piglit tests:
spec/ext_transform_feedback2/draw-auto
spec/ext_transform_feedback2/draw-auto instanced

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-26 08:29:41 +02:00
Iago Toral Quiroga
47eb74ae00 v3d: subclass pipe_streamout_output_target to record TF vertices written
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-26 08:29:41 +02:00
Iago Toral Quiroga
39df568ca1 v3d: refactor v3d_tf_statistics_record slightly
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-26 08:29:41 +02:00
Alyssa Rosenzweig
2f9236096a Revert "panfrost: Don't DIY point size/coord fields"
This reverts commit 4508f43eed, which
broke a bunch of dEQP tests (e.g. in
dEQP-GLES2.functional.draw.draw_arrays.*)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 13:17:22 -07:00
Jason Ekstrand
295e5a17da anv: Disable transform feedback on gen7
It's totally implementable, it's just that the plumbing is a bit
different and we never hooked it up.  Don't advertise a broken feature.

Fixes: 36ee2fd61c "anv: Implement the basic form of VK_EXT_transform_feedback"
2019-07-25 14:58:14 -05:00
Pierre-Eric Pelloux-Prayer
cd02f60c1e mesa: Fix GetTextureImage error reporting, again
Iago Toral Quiroga fixed this in commit 94f740e3fc,
but it recently regressed in 0d8826f723.
Quoting Iago's original commit message for the fix:

GetTex*Image should return INVALID_ENUM if target is not valid, however,
GetTextureImage does not receive a target, and instead should return
INVALID_OPERATION if the effective target is not valid.  From the
OpenGL 4.6 core profile spec, section 8.11 Texture Queries:

   "An INVALID_OPERATION error is generated by GetTextureImage if the
    effective target is not one of TEXTURE_1D, TEXTURE_2D, TEXTURE_3D,
    TEXTURE_1D_ARRAY, TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY,
    TEXTURE_RECTANGLE, or TEXTURE_CUBE_MAP (for GetTextureImage only)."

Note that this differs from the original ARB_direct_state_access spec.

However, the EXT_direct_state_access version does take a target
parameter, so it should continue reporting INVALID_ENUM.

Fixes KHR-GL45.direct_state_access.textures_image_query_errors.

Fixes: 0d8826f723 ("mesa: refactor get_texture_image to remove duplicate code")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-25 18:43:40 +00:00
Kenneth Graunke
0e24d10ff5 iris: Use gen_mi_builder to handle CS ALU operations.
In a few cases, we switch to MI_MATH instead of MI_PREDICATE,
just because we were already doing math and it's easier to chain
together.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
fe08aa67a8 intel/mi: Add a unit test for gen_mi_store_if().
This tests that predicated stores work.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
74063ee61a intel/mi: Add a new gen_mi_store_if() helper.
This performs predicated MI_STORE_REGISTER_MEM commands, assuming that
the condition is already loaded into MI_PREDICATE_DATA.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
27b5817b6c intel/mi: Add gen_mi_nz() and gen_mi_z() helpers.
These provide comparisons against zero.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
4e16b838ba intel/mi: Add a gen_mi_ior() to go with gen_mi_iand()
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
79b8e3c260 intel/mi: Optimize away LOAD_REGISTER_REG from a register to itself
We might want to resolve something to be in a particular register,
so we can access it outside of the gen_mi framework...but it may already
be in that register, at which point there's no work to do.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
fe7ed6b057 iris: Make iris_query.c a genxml-compiled file.
This will let us use Jason's new MI-builder shortly.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
975f7e4a59 iris: Move iris_resolve_conditional_render to the vtable.
It's going to be in genxml code shortly.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
6c4c7b600d iris: Refactor genxml macros and inlines into iris_genx_macros.h.
This will let us put the genxml boilerplate in one place, before we
expand genxml to more files shortly.  Like i965/genX_boilerplate.h.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Kenneth Graunke
204a3bb816 iris: Make an iris_genx_protos.h header for prototypes.
This lets us specify the prototypes once, instead of cut and pasting
them per generation.  isl uses a similar approach (isl_genX_priv.h).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-25 18:42:55 +00:00
Marek Olšák
068093e84c radeonsi: fix DAL hang due to incorrect DCC offset on Raven
Set the correct relative offset.

Fixes: f8b6c5a "radeonsi: rewrite si_get_opaque_metadata, also for gfx10 support"
2019-07-25 14:09:11 -04:00
Jason Ekstrand
9d2aa67c47 anv: Disable subgroup arithmetic on gen7
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-25 16:43:16 +00:00
Eric Anholt
f60defa72d gitlab-ci: Add a shader-db run using v3d on drm-shim.
This provides significant compiler coverage during CI at a fairly low
cost in CPU time (~17s per thread for 4 threads on
gst-gitlab-htz-runner3).

I'm leaving wget in the docker image, as once this is in master I'm
planning on having an automatic shader-db comparison between master
and the branch included in the artifacts.  I also haven't done
freedreno yet, because it has some races when run in multithreaded
mode that I'm still tracking down.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-07-25 08:56:55 -07:00
Eric Anholt
dd3d0b2897 gitlab-ci: Only keep the build logs as artifacts.
On a build failure, we were tarring up the whole ccache directory,
build.ninja, build products, etc.  This was over 400MB compressed on a
recent early meson-main build failure, which fd.o then has to hang on
to for 4 weeks.  The build logs are probably the interesting part, are
potentially useful regardless ("how did CI's build flags differ from
mine?"), and are <500k uncompressed on my personal meson build.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2019-07-25 08:56:49 -07:00
Eric Anholt
f68b987387 gitlab-ci: Always set libdir to lib/
I introduced libdir for cross-builds so we could point at the
resulting drivers without per-arch dependencies, but I'd rather not
have to type x86_64-linux-whatever for non-cross-builds either.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-07-25 08:56:19 -07:00
Eric Anholt
494ecef6b4 freedreno: Add support for drm-shim.
I'm using this for shader-db analysis on x86_64 systems.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-25 08:56:19 -07:00
Eric Anholt
82bf1979d7 v3d: Introduce a DRM shim for calling out to the simulator.
The goal is to enable testing of parts of drivers without depending on any
particular kernel version or hardware being present.

Simply set LD_PRELOAD=$PREFIX/lib/libv3d_drm_shim.so in your environment,
and we'll fake a /dev/dri/renderD128 (or whatever the next available node
is) using v3dv3.  That node can then be used with the surfaceless or gbm
EGL platforms.

Acked-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-25 08:56:19 -07:00
Erik Faye-Lund
c5f1432296 glsl: report no function instead of empty candidate list
When generating the error message for a missing function error where
all available overloads were missing due to a too low GLSL version, we
used to report something like this:

---8<---
0:224(14): error: no matching function for call to
           `textureCubeLod(samplerCube, vec3, float)'; candidates are:
0:224(14): error: type mismatch
---8<---

This is a pretty confusing error message, and can throw people off when
debugging. So let's instead check if any overload is available before we
decide what to print. This allow us to report something like this
instead:

---8<---
0:224(14): error: no function with name 'textureCubeLod'
0:224(14): error: type mismatch
---8<---

This is arguably easier to understand for programmers, and doesn't send
you on a wild goose chase to figure out what argument is wrong just
because you stopped reading the message prematurely. I'm of course
referring to a friend, not me. For sure. I would never do that.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-25 17:20:10 +02:00
Bas Nieuwenhuizen
7e1fe81f56 radv: Set correct metadata size for GFX9+.
Without correct size, radeonsi assumes the metadata is incorrect,
which can and will cause issues.

Since the metadata is really incorrect without the size, let us
fix that.

Fixes: e43cc3e3af "radv/gfx9: handle GFX9 opaque metadata"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-25 17:07:53 +02:00
Arcady Goldmints-Orlov
832cedfdee anv: report HOST_ALLOCATION as supported for images
Report VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT as
supported for images. It was being shown supported for buffers, but not
images.

Fixes: 69cc6272fb ("anv: Implement VK_EXT_external_memory_host")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-25 09:01:26 -05:00
Samuel Pitoiset
7d11bf2155 radv/gfx10: fix intensity formats by setting ALPHA_IS_ON_MSB
This fixes
dEQP-VK.rasterization.primitive_size.points.point_size_*

This also fixes some black squares with the Sascha SSAO demo.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-25 15:48:24 +02:00
Samuel Pitoiset
6a504ab473 radv/gfx10: use L2 for DMA copy/fill operations
It's coherent and faster. GFX7-GFX9 should also support this but
for now only uses L2 for GFX10 because it's untested on previous gens.

This fixes dEQP-VK.memory.pipeline_barrier.transfer_*

This also fixes some missing geometry in Dawn Of War III because
VBOs weren't updated correctly.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-25 15:48:21 +02:00
Alyssa Rosenzweig
9ce75826cb pan/midgard: Optimize varying projection
We add a new opt pass fusing perspective projection with varyings. Minor
win..? We don't combine non-varying projections, since if we're too
agressive, the extra load/store traffic will hurt us so it's not really
a win in practice.

total instructions in shared programs: 3915 -> 3913 (-0.05%)
instructions in affected programs: 76 -> 74 (-2.63%)
helped: 1
HURT: 0

total bundles in shared programs: 2520 -> 2519 (-0.04%)
bundles in affected programs: 46 -> 45 (-2.17%)
helped: 1
HURT: 0

total quadwords in shared programs: 4027 -> 4025 (-0.05%)
quadwords in affected programs: 80 -> 78 (-2.50%)
helped: 1
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
f6438d1e15 pan/midgard: Add perspective projection recombine pass
We don't use it yet, since it's actually a shader-db regression. This is
primarily helpful as an intermediate step for attaching projection to
varyings.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
8ddb0eda42 pan/midgard: Force perspective ops to use vec4
It doesn't make sense to use them with anything less.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
b06951d343 pan/midgard: Add R27-only op handling
We use a special conflicting register class.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
f55a760d0c pan/midgard: Add OP_R27_ONLY helper
While load/store ops like st_vary can take an argument in either
r26/r27, ops like those for perspective projection must specifically
take their argument in r27.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
233c0faadd pan/midgard: Enable RA for st_vary
Now that all the piping is in place to do so without regressions, we
flip on automatic register allocation for varyings. Hooray!

total instructions in shared programs: 4025 -> 3915 (-2.73%)
instructions in affected programs: 1667 -> 1557 (-6.60%)
helped: 62
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 1.77 x̃: 2
helped stats (rel) min: 0.93% max: 20.00% x̄: 10.80% x̃: 10.64%
95% mean confidence interval for instructions value: -1.89 -1.66
95% mean confidence interval for instructions %-change: -12.50% -9.11%
Instructions are helped.

total bundles in shared programs: 2683 -> 2520 (-6.08%)
bundles in affected programs: 1066 -> 903 (-15.29%)
helped: 62
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.63 x̃: 3
helped stats (rel) min: 2.94% max: 42.86% x̄: 23.85% x̃: 22.50%
95% mean confidence interval for bundles value: -2.83 -2.43
95% mean confidence interval for bundles %-change: -27.73% -19.97%
Bundles are helped.

total quadwords in shared programs: 4192 -> 4027 (-3.94%)
quadwords in affected programs: 1584 -> 1419 (-10.42%)
helped: 62
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.66 x̃: 3
helped stats (rel) min: 1.85% max: 30.00% x̄: 16.49% x̃: 16.52%
95% mean confidence interval for quadwords value: -2.87 -2.46
95% mean confidence interval for quadwords %-change: -19.14% -13.84%
Quadwords are helped.

total registers in shared programs: 433 -> 411 (-5.08%)
registers in affected programs: 67 -> 45 (-32.84%)
helped: 23
HURT: 1
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 25.00% max: 50.00% x̄: 41.30% x̃: 50.00%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 14.29% max: 14.29% x̄: 14.29% x̃: 14.29%
95% mean confidence interval for registers value: -1.09 -0.74
95% mean confidence interval for registers %-change: -45.45% -32.52%
Registers are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
210dbe3fc1 pan/midgard: Remove check for class
Fixes classes defaulting to vec4 in some cases.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
8842db3a7d pan/midgard: Move uniforms to special registers
The load/store pipes can't take a uniform register in, so an explicit
move is necessary here.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
ae7acde91f pan/midgard: Emit st_vary registers in install_registers
Now that we have its registers handled normally like the rest of the IR.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
c3ad7500d2 pan/midgard: Add mir_lower_special_reads helper
Given the constraints on special registers, we add a helper for lowering
these by inserting moves (copies) where needed to satsify the ISA
constraints.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
e169301bd8 pan/midgard: Add emit_explicit_constant helper
We generalize the constant emission helper used in fragment writeout as
we'll also need it for vertex outputs.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
eedd6c1dd0 pan/midgard: Add mir_rewrite_index_src_tag
Specialized version of a rewrite that only rewrites a certain type of
instruction.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
5d5caf10af pan/midgard: Add class check
This ensures the rules for accessing special register classes are
satisfied. This is asserted as a prepass should have lowered offending
uses to something satisfying these rules. Special register classes are
*not* work registers and cannot be used for RMW operations; they are
essentially 1-way pipes straight into/from fixed-function logic in the
shader cores.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
91195bdff1 pan/midgard: Implement class spilling
We reuse the same register spilling mechanism as for work->memory to
spill special->work registers, e.g. to allow writing out more than 2
vec4 varyings (without better scheduling anyway).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
0f38f6466e pan/midgard: Extend liveness analysis to st_vary
These can consume sources now.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
dca0166ce1 pan/midgard: Implement load/store register classing
This does not yet support special->work spilling, nor does it support
multiclass breakup. These corner cases will be handled in succeeding
commits.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
839b80aa89 pan/midgard: Allocate special register classes
We'll want to also handle load/store and texture registers in our RA
loop.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
480b502443 pan/midgard: Move copy propagation into its own file
We also expose some utilities it uses as general MIR helpers.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig
b8caaa3000 pan/midgard: Add mir_simple_swizzle helper
Checks for x/xy/xyz/xyzw style swizzles (slightly more general but you
get the idea).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:21 -07:00
Alyssa Rosenzweig
63385a3fdb pan/midgard: Add mir_single_use helper
Helps as an optimization heuristic.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:37:21 -07:00
Alyssa Rosenzweig
5534fdb7bf panfrost: Compute I/O counts from shader_info
...rather than exposing it in the vendored compiler region.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:34:21 -07:00
Alyssa Rosenzweig
4508f43eed panfrost: Don't DIY point size/coord fields
Again, it's in shader_info for us!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:34:21 -07:00
Alyssa Rosenzweig
bab4f6c724 panfrost: Use nir_gather_info information about discards
No need to track this ourselves!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:34:21 -07:00
Alyssa Rosenzweig
48991c7a1f panfrost: Use NIR helper invocations info
We don't need to guesstimate this ourselves. This will help when we
bringup derivatives.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:34:21 -07:00
Alyssa Rosenzweig
fb2fe6e7bc panfrost/sfbd: Flesh out fragment job
We include a zsbuf attachment function based on how the corresponding
MFBD code works, as well as extending cbufs to mipmapped rendering while
we're at it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:34:21 -07:00
Alyssa Rosenzweig
e6802af8c3 panfrost: Disable tiled formats on SFBD systems
Just because we don't have the format codes to render to them yet.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:34:20 -07:00
Alyssa Rosenzweig
990e24469c panfrost: Move require_sfbd to screen
We'll need it to specialize resource creation by chip.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:34:20 -07:00
Alyssa Rosenzweig
a9c73e825a panfrost: Reserve, but do not upload, shader padding
Fixes invalid read errors reported by valgrind.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 06:34:20 -07:00
Alyssa Rosenzweig
b2a3ca6bd5 util/ra: Add a getter for a node class
Complements the existing getters and the setter for node class. To be
used in the Panfrost RA refactor.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-25 06:14:12 -07:00
Tomeu Vizoso
688d9b4fb7 panfrost/ci: Update kernel to 5.2
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-25 15:08:44 +02:00
Nicolas Dufresne
08f1cefecd egl: Also query modifiers when exporting DMABuf
This fixes eglExportDMABUFImageQueryMESA() so it will report the
modififers of the underlying image. Without this information,
re-importing will likely be broken as it is rare these days that no
modifiers are used.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Fixes: 8f7338f284 ("egl: add initial EGL_MESA_image_dma_buf_export v2.4")
2019-07-25 05:14:36 +00:00
Heinrich Fink
4886924262 mesa: Enable GL_MESA_framebuffer_flip_y for GL 4.3
Extend MESA_framebuffer_flip_y to be used with OpenGL versions 4.3 and
higher. OpenGL 4.3 adds FramebufferParameteri needed by this extension.

Reviewed-by: Fritz Koenig <frkoenig@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-25 04:47:38 +00:00
Alyssa Rosenzweig
31c9fcbd0f panfrost: Don't expose some atomic stuff even with dEQP
Fixes dEQP crashes.

Fixes: 2f93ecd654 ("panfrost: Fake CAPs for dEQP-GLES31")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-24 17:21:12 -07:00
Dave Airlie
16fcbb2eba gallium: fix windows build from params change.
This is why we can't have nice things. I'm sure there's someway
to do this with {0} but I really don't have time for that.

Fixes: 2631fd3b0b ("gallivm: rework lp_build_tgsi_soa to take a struct")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-25 10:02:22 +10:00
Jonathan Marek
97c8314c5f nir/algebraic: add scmp algebraic optimizations
When 'x' is the result of a scmp op:

x != 0.0 or x == 1.0: passthrough
x == 0.0 or x != 1.0: invert

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek
9be902097c nir/algebraic: add option to lower fall_equalN/fany_nequalN
Add generic lowerings for fall_equalN/fany_nequalN. These should be optimal
for vec4 backends that doesn't have any special instructions for it, as
long as they support saturate.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek
397375d3f3 nir/algebraic: add fdot2 optimizations
Add simple fdot2 optimizations that are missing.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek
1e089d0575 nir/algebraic: add option to lower fdph
For backends that don't have a 'fdph' instructions

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek
bc3b6168ba nir: replace lower_sincos with algebraic opt
This version has less ops for the same precision.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek
5a4e71c082 nir/algebraic: allow swizzle in nir_algebraic replace expression
This is to allow optimizations in nir_opt_algebraic not otherwise possible

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Rob Clark
b4f4768672 gallium/u_transfer_helper: fix assert in RGTC case
Previously we'd hit the unreachable() for uploading RGTC.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-24 21:11:06 +00:00
Yevhenii Kolesnikov
53730ab32c main: Free memory allocated for gl_bitmap_atlas structure
Structure itself wasn't freed during context tear-down, causing a
memory leak on iris.

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-07-24 15:31:26 -04:00
Daniel Schürmann
e272fdd508 nir,intel: lower if (cond) demote() to new intrinsic demote_if(cond)
This will effectively enable the optimization in anv.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-24 13:02:18 -05:00
Kenneth Graunke
517005b4cf i965: Use NIR to lower legacy userclipping.
This allows us to drop legacy userclip plane handling in both the vec4
and FS backends, and simplifies a few interfaces.

v2 (Jason Ekstrand):
 - Move brw_nir_lower_legacy_clipping to brw_nir_uniforms.cpp because
   it's i965-specific.
 - Handle adding the params in brw_nir_lower_legacy_clipping
 - Call brw_nir_lower_legacy_clipping from brw_codegen_vs_prog

Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-24 18:00:13 +00:00
Jason Ekstrand
d10de25309 anv: Implement VK_EXT_subgroup_size_control
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
bcef32d49b anv/pipeline: Plumb pipeline shader stage create flags
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
2a236c76f8 intel/compiler: Allow for required subgroup sizes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
4397eb91c1 intel/compiler: Allow for varying subgroup sizes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
799f0f7b28 nir/lower_subgroups: Properly lower masks when subgroup_size == 0
Instead of building a constant mask (which depends on knowing the
subgroup size), we build an expression.  Because the pass uses the
nir_shader_lower_instructions helper, subgroup lowering will be run on
any newly emitted instructions as well as the previously existing
instructions.  In particular, if the subgroup size is known, the newly
emitted subgroup_size intrinsic will get turned into a constant and a
later constant folding pass will clean it up.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
256e6c2d94 vulkan: Update the XML and headers to 1.1.116
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
c84b8eeeac intel/compiler: Be more conservative about subgroup sizes in GL
The rules for gl_SubgroupSize in Vulkan require that it be a constant
that can be queried through the API.  However, all GL requires is that
it's a uniform.  Instead of always claiming that the subgroup size in
the shader is 32 in GL like we have to do for Vulkan, claim 8 for
geometry stages, the maximum for fragment shaders, and the actual size
for compute.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
1981460af2 intel/compiler: Lower gl_SubgroupSize in postprocess_nir
Instead of lowering the subgroup size so early, wait until we have more
information.  In particular, we're going to want different subgroup
sizes from different stages depending on the API.  We also defer
lowering of subgroup masks because the ge/gt masks require the subgroup
size to generate a subgroup mask.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Jason Ekstrand
f62227f2b7 intel/nir: Make brw_nir_apply_sampler_key more generic
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-24 12:55:40 -05:00
Sagar Ghuge
87cef718e1 nir: Add lowering for nir_op_irem and nir_op_imod
Tested on Gen > 9.

v2: 1) Fix lowering
    2) Keep a consistent i/u order (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 10:33:09 -07:00
Yevhenii Kolesnikov
882fe09a74 main: Fix memleaks in mesa_use_program
Add freeing of SubroutineIndexes to the _mesa_free_shader_state.

Fixes: 4566aaaa5b ("mesa/subroutines: start adding per-context
subroutine index support (v1.1)")
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-24 16:54:21 +00:00
Andrii Simiklit
fa2fc68de1 intel/compiler: don't use a keyword struct for a class fs_reg
warning: struct 'fs_reg' was previously declared as a class
Fixes: e64be391 ("intel/compiler: generalize the combine constants pass")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
2019-07-24 13:26:42 +00:00
Qiang Yu
280dfa02fa lima/ppir: fix disassembler temp read/write print
temp read/write use negtive offset, and handle
alignment==1 case.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
2019-07-24 20:39:39 +08:00
Eric Engestrom
e7e31b18d6 gallium+mesa: fix tgsi_semantic array type
Fixes: ed23335a31 ("gallium: use enums in p_shader_tokens.h (v2)")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-24 09:33:29 +01:00
Eric Engestrom
f986741a91 util: fix no-op macro (bad number of arguments)
Fixes: b8e077daee ("util: no-op __builtin_types_compatible_p() for non-GCC compilers")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 09:13:58 +01:00
Samuel Pitoiset
4389e85dc9 radv/gfx10: enable VK_EXT_transform_feedback
When a pipeline uses transform feedback, the driver fallbacks to
the legacy path because NGG support for streamout is a non-trivial
amount of work.

AMDVLK also uses the legacy path for streamout, while RadeonSI
uses the new NGG path.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-24 08:23:37 +02:00
Samuel Pitoiset
a3a4fa1860 radv/gfx10: do not enable NGG if a pipeline uses XFB
NGG GS for streamout requires a bunch of work, so enable it with
the legacy path only for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-24 08:23:34 +02:00
Samuel Pitoiset
09abe571a2 radv/gfx10: emit streamout shader config
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-24 08:23:32 +02:00
Samuel Pitoiset
383c2e625a radv/gfx10: declare streamout user SGPRs
Required for legacy streamout.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-24 08:23:30 +02:00
Samuel Pitoiset
fd195d8085 radv/gfx10: update streamout descriptors
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-24 08:23:27 +02:00
Samuel Pitoiset
ea337c8b7e radv/gfx10: fix VS input VGPRs with the legacy path
For some reasons, InstanceID is VGPR3 although StepRate0 is set to 1.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-24 08:23:21 +02:00
Dave Airlie
2631fd3b0b gallivm: rework lp_build_tgsi_soa to take a struct
The parameters were getting messy and I have to add a few more
for compute shaders, so clean it up before proceeding.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-24 09:20:09 +10:00
Jason Ekstrand
9700e45463 nir/lower_io: Return SSA defs from helpers
I can't find a single place where nir_lower_io is called after going out
of SSA which is the only real reason why you wouldn't do this. Returning
SSA defs is more idiomatic and is required for the next commit.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-23 17:48:49 -05:00
Dylan Baker
7cf50af6f5 meson: allow building all glx without any drivers
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111016
Fixes: a47c525f32
       ("meson: build glx")
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-23 15:34:23 -07:00
Jan Zielinski
3d6cffffcf swr/rasterizer: Fix 3D resource copies.
Ensure constant attributes stay constant with barycentric interpolation.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-07-23 21:55:09 +02:00
Jan Zielinski
ec4a5f5e13 swr/rasterizer: Fix return type on SIMD8 version of Clamp and Normalize utility functions
Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-07-23 21:55:09 +02:00
Jan Zielinski
47cdb0ac27 swr/rasterizer: small formatting changes
Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-07-23 21:55:09 +02:00
Jan Zielinski
ccc6b4f96b swr/rasterizer: Adding support for unhandled clipEnable state
Clipping is not correctly handled by the rasterizer - fixing this.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-07-23 21:55:09 +02:00
Bas Nieuwenhuizen
e5b3f0a867 radv/gfx10: Enable binning.
Numbers for Talos:

gfx10 without binning: 77.0 77.7 77.2 77.6
gfx10 with binning: 82.3 82.0 82.7 82.4

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen
3268c806fb radv/gfx10: Implement bin size calculation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen
4b757697e9 radv/gfx9: Select between depth/color bins based on area.
Mirrors radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen
22f2f76789 radv: Generalize binning settings.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen
793cbf6161 radv/gfx10: Use new scan converter.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen
4058b354c5 radv: Set FLUSH_ON_BINNING_TRANSITION.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen
906fcfccfd radv: Use pbb_allow for framebuffer BREAK_BATCH.
Ported from radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-23 21:26:59 +02:00
Marek Olšák
264ab6ffcd radeonsi/nir: set tgsi_shader_info::uses_fbfetch for KHR_blend_equation_adv.
This doesn't implement the color buffer load.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-23 15:08:37 -04:00
Marek Olšák
45556731b6 tgsi/scan: add uses_fbfetch
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-23 15:08:37 -04:00
Marek Olšák
ee858871bd radeonsi: fail if importing a texture with incorrect last_level or samples
v2: don't fail if the texture comes from an incompatible driver.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> (v1)
2019-07-23 15:08:27 -04:00
Marek Olšák
f8b6c5a1a6 radeonsi: rewrite si_get_opaque_metadata, also for gfx10 support
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-23 15:03:51 -04:00
Marek Olšák
e718f8e713 radeonsi: simplify si_get_input_prim and remove incorrect TODO comment
u_vertices_per_prim(QUADS) is the same as TRIANGLES.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-23 15:03:49 -04:00
Marek Olšák
16392cc3f3 radeonsi/gfx10: fix and enable CLEAR_STATE
it was a driver bug.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-23 15:03:47 -04:00
Marek Olšák
ad642d5b3a radeonsi: stop using info.opcode_count[TGSI_OPCODE_INTERP_SAMPLE]
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-23 15:03:46 -04:00
Marek Olšák
6ac2146a98 ac/nir: implement nir_op_pack_{us}norm_2x16
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-23 15:03:44 -04:00
Pierre-Eric Pelloux-Prayer
079e5f73d7 mesa/st: rewrite src var when lowering tex_src_plane
The assign_extra_samplers() adds the needed extra samplers but they need
to be used in the nir_tex_instr.
Otherwise the plane information is simply lost and all nir_tex_instr use the
same sampler.
Here's an example of the bug:

NIR before st_nir_lower_tex_src_plane:
	vec1 32 ssa_8 = load_const (0x00000000 /* 0.000000 */)
	vec4 32 ssa_9 = tex ssa_0 (texture_deref), ssa_0 (sampler_deref), ssa_5 (coord), ssa_8 (plane)
	vec1 32 ssa_10 = load_const (0x00000001 /* 0.000000 */)
	vec4 32 ssa_11 = tex ssa_0 (texture_deref), ssa_0 (sampler_deref), ssa_5 (coord), ssa_10 (plane)

After:

	vec4 32 ssa_9 = tex ssa_0 (texture_deref), ssa_0 (sampler_deref), ssa_5 (coord)
	vec4 32 ssa_11 = tex ssa_0 (texture_deref), ssa_0 (sampler_deref), ssa_5 (coord)

This fixes the following piglit test for radeonsi + NIR:
  - ext_image_dma_buf_import-sample_nv12
  - ext_image_dma_buf_import-sample_yuv420
  - ext_image_dma_buf_import-sample_yvu420

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-07-23 15:00:43 -04:00
Pierre-Eric Pelloux-Prayer
e9cf8c1d30 u_blitter: add a msaa parameter to util_blitter_clear
Fixes: ea5b7de138 ("radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled")

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-23 14:42:20 -04:00
Pierre-Eric Pelloux-Prayer
d811446e6c u_blitter: enable msaa when dst num samples is > 1
Commit ea5b7de138 broke some piglit tests on radeonsi (Bonaire hardware).
This commit fixes half of the regression by enabling msaa if the dest surface has
more than 1 sample (instead of hardcoding it to false).

Fixes: ea5b7de138 ("radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled")

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-23 14:42:20 -04:00
Jason Ekstrand
ae392d73c9 nir/gather_info: Look for uses of helper invocations
The one obvious omission here is gl_HelperInvocation itself.  However,
the spec doesn't require that we generate then when gl_HelperInvocation
is used, it merely mandates that we report them if they are there.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-23 13:40:41 -05:00
Jason Ekstrand
41ab92a327 nir/gather_info: Move setting uses_64bit out of the switch
Otherwise, as we add things to the switch, we're going to forget and add
some 64-bit op at some point in the future and it'll stop getting
flagged.  There's no reason why we can't do the check for derivatives.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-23 13:40:41 -05:00
Jason Ekstrand
0e6cb481fa nir: Add a nir_tex_instr_has_implicit_derivatives helper
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-23 13:40:41 -05:00
Jason Ekstrand
7a98c7804c nir: Move nir_alu_instr_is_comparison to the ALU section
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-23 13:40:41 -05:00
Rafael Antognolli
1f4cbc9a06 intel/genxml: Add new test for subgroups.
Make sure that a <group> tag within another <group> tag work just fine.

v2: rename 'halfbyte' to 'byte' to match the size (Lionel).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-23 17:45:19 +00:00
Rafael Antognolli
fe5ae96d66 intel/genxml: Add basic infra for encoding/decoding unit tests.
Adding option to print quiet.

v2: Add license header.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-23 17:45:19 +00:00
Rafael Antognolli
e25ebe2ec9 intel/gen_decoder: Decode <group> inside <group>.
Now we can decode a <group> tag inside another <group> tag, and properly
print its indices and content.

v2: Use push/pop stack to fields, groups and iters (Lionel).
v3: Add assert(iter->level < DECODE_MAX_ARRAY_DEPTH) (Lionel).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-23 17:45:19 +00:00
Rafael Antognolli
f670c2e1ff intel/gen_decoder: Add the concept of array "levels".
We currently only support one level, which is the basic level of a
<group> tag.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-23 17:45:19 +00:00
Rafael Antognolli
618d054283 intel/gen_decoder: Add array field.
We currently use the group->next pointer to iterate through the <group>
tags. This change them to be a type of field, so we can descend into
them while iterating, and then go back to the original position. Will be
useful when we want to decode <group>'s inside <group>'s, and when there
are more <field>'s after a <group> tag.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-23 17:45:19 +00:00
Rafael Antognolli
21bdd51942 intel/gen_decoder: Rename internally "group" to "array".
A gen_group (group in most of the code) can be of several types:
   - instruction
   - struct
   - register
   - group (?!?)

The <group> tag actually represents an array of elements. So at least
in our code, lets call it an array to avoid confusion with gen_group.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-23 17:45:19 +00:00
Rafael Antognolli
69506cbb74 intel/gen_decoder: Add gen_spec_load_filename() function.
Refactor the code from gen_spec_load_from_path() into a separate
function, that can be used with a xml file that doesn't fit the genX.xml
filename format.

Will be used soon for implementing unit tests for gen_decoder.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-23 17:45:19 +00:00
Rafael Antognolli
1f2b22a6bd intel/gen_decoder: Fix parsing of small genxml file.
When using gen_spec_load_from path, only abort decoding if the read
length is 0. Previously, we were aborting if finding an EOF, even if
something was read from the file.

Also only kill the decoded file if no commands or structs were found,
and print a message in such case.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-23 17:45:19 +00:00
Guido Günther
85996567f5 kmsro: Extend to include mxsfb-drm
This allows using the LCDIF display controllers (with the mxsfb drm
modesetting driver) along with the Etnaviv render-only drivers. LCDIF is
found on i.MX SoCs.

Signed-off-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-23 17:12:10 +00:00
Sagar Ghuge
806e5a37ed anv: Implement VK_KHR_imageless_framebuffer
v2: Pass pointer instead of struct instance (Lionel)

v3: 1) Fix small nits (Jason)
    2) Add way to detect anv_framebuffer don't have attachments (Jason)
    3) Get rid of unncessary pNext chain walk (Jason)
    4) Keep framebuffer instance in anv_cmd_state (Jason)

v4: 1) Dump attachments from cmd_buffer (Jason)

v5: 1) Fix condition check and add assertion (Lionel)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-23 10:01:45 -07:00
Alyssa Rosenzweig
840b806d64 panfrost/midgard: Allocate registers once (per-screen)
This should save a lot of per-compile time by using the RA the way it's
actually supposed to be used.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-23 09:06:21 -07:00
Lionel Landwerlin
772a5f9814 anv: fix use of comma operator
This doesn't fix any bug at the moment because the next statement is
'true' which happens to be APIMODE_D3D, but if that changes it could.

The fixes tags is as far I could go but the error predates it (2016 is
probably far enough).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8db6f2e6eb ("anv/pipeline: Roll genX_pipeline_util.h into genX_pipeline.c")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-23 15:54:48 +00:00
Andrii Simiklit
79ab2c3e57 nir: use | instead of || operator
warning: use of logical '||' with constant operand
note: use '|' for a bitwise operation

Fixes: 758fdce9fe ("nir: Add some generic helpers for writing lowering passes")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
2019-07-23 18:08:58 +03:00
Arnaud Patard
397f9ba69f panfrost: Fix T6XX Support
While testing kmscube with mesa master, it turns out that kmscube is not
working anymore. After bisecting, commit
5a7688fdec is the culprit. A short trial
and error session allowed to find the removed bit of code making kmscube
working again.

This patch adds it back.

Fixes: 5a7688fde ("panfrost: Use 64-bit descriptors globally")

v2: Add comment pointing out this is magic. [Alyssa, trivial]

Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-23 08:04:42 -07:00
Alyssa Rosenzweig
83a1d5544a panfrost: Use correct definition for is_t6xx
Rather than anything "early Midgard", limit us specifically to T6XX, as
certain workarounds only apply to genuine T6XX, not T7XX.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-23 08:04:42 -07:00
Eric Engestrom
3acc4278ad nir: don't return void
Fixes: 14531d676b ("nir: make nir_const_value scalar")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-07-23 16:02:37 +01:00
Eric Engestrom
7797823afa util: fix asprintf() fallback
Fixes: 9607d499dc ("util: add asprintf() wrapper for MSVC")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-23 14:59:07 +00:00
Michel Dänzer
22c7738520 st/mesa: Try re-importing resource if necessary in st_vdpau_map_surface
This can be the case if the resource was obtained from
st_vdpau_output/video_surface_gallium.

st_vdpau_output/video_surface_dma_buf do a similar dance internally.

v2:
* Pass PIPE_HANDLE_USAGE_FRAMEBUFFER_WRITE instead of 0 for usage.

Bugzilla: https://bugs.freedesktop.org/111099
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> # v1
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-23 16:28:02 +02:00
Michel Dänzer
7499e7362d radeonsi: Allow PIPE_TEXTURE_2D_ARRAY in si_texture_from_handle
Needed for the following st/mesa fix.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-23 16:26:04 +02:00
Alyssa Rosenzweig
2f93ecd654 panfrost: Fake CAPs for dEQP-GLES31
We still have some big ticket items left on GLES 3.0, but it's often
helpful to be able to access higher dEQP levels for debugging features
that just don't quite match a particular API.

Plus, this opens up a whole slew of new features to poke at if boredom
overtakes, ahem.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-23 06:36:48 -07:00
Mark Menzynski
7493fbf032 nvc0/ir: Fix assert accessing null pointer
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111007
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111167

Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann<tobias.klausmann@freenet.de>
2019-07-23 15:08:25 +02:00
Samuel Pitoiset
d36af71f44 radv/gfx10: enable CLEAR_state
It actually works.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-23 14:15:55 +02:00
Juan A. Suarez Romero
c41545c2f5 docs: update calendar, add news item and link release notes for 19.1.3
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-07-23 11:20:00 +00:00
Juan A. Suarez Romero
3843c5f77a docs: add sha256 checksums for 19.1.3
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 33e57d0ace)
2019-07-23 11:18:31 +00:00
Juan A. Suarez Romero
fd965a3330 docs: add release notes for 19.1.3
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 09a1b2bdba)
2019-07-23 11:18:29 +00:00
Erico Nunes
65e6c42d27 lima/ppir: fix branch codegen register encode
The branch instruction has 6 bits per register operand which allows it
to specify a component in the register.
Fix codegen so that it outputs the right component, otherwise it always
outputs the x component.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-23 08:49:19 +00:00
Erico Nunes
a255b49593 lima/ppir: fix debug logs in regalloc
The macros already prepend "ppir: ", remove them from the actual strings
so it doesn't appear duplicated.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-23 08:24:19 +00:00
Erico Nunes
9254059dd8 lima/ppir: fix alignment on regalloc spilling loads
The spilling code spills entire vec4 registers regardless of the
components used by the spilled uses.
The inserted stores code force the 4 components, but these loads were
using a variable number of components, causing bugs on loading the
spilled registers.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-23 08:24:19 +00:00
Samuel Pitoiset
9343c93e34 radv: fix dumping disassembly with RADV_DEBUG=shaders
Fixes: a20a9d0c5e ("radv: dont store disasm string unless keep_shader_info flag set")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-23 10:22:29 +02:00
Eric Engestrom
b1c35fa6d6 st/nir: use asprintf() wrapper to fix MSVC issues
Fixes: 856e84083e ("mesa/st: add sampler uniforms")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-23 08:57:27 +01:00
Eric Engestrom
9607d499dc util: add asprintf() wrapper for MSVC
Fixes: 856e84083e ("mesa/st: add sampler uniforms")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-23 08:57:27 +01:00
Ilia Mirkin
affb2da0f8 gallium: remove boolean from state tracker APIs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-22 22:13:51 -04:00
Ilia Mirkin
0e30c6b8a7 gallium: switch boolean -> bool at the interface definitions
This is a relatively minimal change to adjust all the gallium interfaces
to use bool instead of boolean. I tried to avoid making unrelated
changes inside of drivers to flip boolean -> bool to reduce the risk of
regressions (the compiler will much more easily allow "dirty" values
inside a char-based boolean than a C99 _Bool).

This has been build-tested on amd64 with:

Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d
                 vc4 i915 svga virgl swr panfrost iris lima kmsro
Gallium st:      mesa xa xvmc xvmc vdpau va

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 22:13:51 -04:00
Dave Airlie
365f24705f st/nir: fix arb fragment stage conversion
The comment even justifies the wrongness wrongly.

We should be translating to pipe values properly here or else
fragment maps to tess ctrl.

Fixes: 3d7611e9a6 ("st/nir: use NIR for asm programs")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-23 11:00:53 +10:00
Marek Olšák
cb9eb1834d radeonsi: fix warning: ‘ret’ may be used uninitialized
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-22 20:57:44 -04:00
Marek Olšák
850619117e tgsi: fix warning: ‘interp’ may be used uninitialized
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-22 20:57:44 -04:00
Marek Olšák
f257ef2bbb gallivm: fix warning: ‘op’ may be used uninitialized
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-22 20:57:44 -04:00
Kenneth Graunke
7cdde962c5 iris: Support storage images that have matching typed formats for reads
Even if we don't directly support typed reads on a format, we can often
translate them to a reasonable matching format.  Advertise those too.
2019-07-22 17:30:13 -07:00
Kenneth Graunke
2f1c7fae9e iris: Stop advertising MSAA storage images by mistake
st_extensions.c sets const->MaxImageSamples (GL_MAX_IMAGE_SAMPLES) by
looping over [16, 15, .. 1x] MSAA modes, and RGBA/BGRA/ARGB/ABGR 8888
color formats, calling pipe->is_format_supported() for each, with
the usage set to PIPE_BIND_SHADER_IMAGE.  If any are supported, it
selects that number of samples.

We were checking if sample_count <= 1, which meant that we were getting
a value of 1x MSAA, rather than the expected 0x (feature doesn't exist).

But, only on Icelake because Gen11 adds support for typed read messages
for R8G8B8A8_UNORM.  The lack of typed read messages for these formats
was tricking the check on Gen9 to say no correctly.  This caused some
Icelake conformance failures, because we don't implement this feature.

Just check for sample_count == 0 instead.
2019-07-22 17:30:13 -07:00
Kenneth Graunke
82607f8a90 egl: Only expose 565 pbuffer configs if X can export them as DRI3 images
Glamor in xorg-server 1.20 cannot expose 16bpp pixmaps when running in
the usual 24bpp mode.  This meant our 565 pbuffer configs would
ultimately fail to create a backing pixmap, leading to crashes.

To hack around this, make a 16bpp pixmap and try and export it.
If it works, expose the configs.  Otherwise, just skip them.

This also disables them on DRI2.  These configs were only added to pass
conformance requirements, and I doubt anybody cares about testing out
565 pbuffer visuals on DRI2-only drivers.

v2: Don't leak the fds (caught by Eric Anholt)
v3: Don't free(fds), it's not malloc'd

Fixes: dacb11a585 ("egl: Add a 565 pbuffer-only EGL config under X11.")
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-22 16:58:09 -07:00
Kenneth Graunke
6ad31c4ff3 egl: Make the 565 pbuffer-only config single buffered.
In commit dacb11a585, Eric found the first
matching 565 pbuffer config, and stopped.  Our double-buffered configs
come first in the list, so we added that, making a pbuffer-only config
that claimed to be double buffered.  This doesn't make sense, since
pixmaps/pbuffers are fundamentally not double buffered.

When using that config, every call to eglCreatePbufferSurface would fail
with EGL_BAD_MATCH.  The call chain looks like this:

   - eglCreatePbufferSurface
   - dri3_create_pbuffer_surface
   - dri3_create_surface
   - dri2_get_dri_config

which eventually does:

   const bool double_buffer = surface_type == EGL_WINDOW_BIT;

and then fails to find a matching config, because it ends up looking
for a single-buffered config - and there aren't any.

To fix this, make the 565 pbuffer config single-buffered.  This fixes
at least 51 dEQP-EGL.* tests.

Fixes: dacb11a585 ("egl: Add a 565 pbuffer-only EGL config under X11.")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-22 16:58:09 -07:00
Kenneth Graunke
fc21394bc4 egl: Quiet warning about front buffer rendering for pixmaps/pbuffers
pbuffer configs cause a million of these warnings to trigger, but
when using pixmaps or buffers, there is only one surface, so this
warning doesn't make much sense.  Retain it for window surfaces for now.

Fixes: dacb11a585 ("egl: Add a 565 pbuffer-only EGL config under X11.")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-22 16:58:09 -07:00
Kenneth Graunke
78164a3a6c mesa: Fix ReadBuffers with pbuffers
pbuffers are internally single-buffered.  Marek fixed DrawBuffers to
handle this case, but we need to fix ReadBuffers too.  Otherwise,
pretty much every conformance test fails because glReadPixels breaks.

v2: Refactor the switch into a helper (suggested by Eric Anholt)

Fixes: 35294f2eca ("mesa: fix pbuffers because internally they are front buffers")
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-22 16:58:09 -07:00
Marek Olšák
c37df5feaa mesa: fix assertion failure in TexImage
Check the assertion after error checking.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111194

Fixes: 9dd1f7cec0 ("mesa: pass gl_texture_object as arg to not depend on state")
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-22 14:45:57 -07:00
Jason Ekstrand
5c5f11d1dd nir: Remove a bunch of large stack arrays
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-22 16:17:18 -05:00
Jason Ekstrand
fa63fad333 intel/fs: Stop stack allocating large arrays
Normally, we haven't worried too much about stack sizes as Linux tends
to be fairly friendly towards large stacks.  However, when running DXVK
apps under wine, we're suddenly subject to Windows' more stringent stack
limitations and can run out of space more easily.  In particular, some
of the shaders in Elite Dangerous: Horizons have quite a few registers
and the arrays in split_virtual_grfs are large enough to blow a 1 MiB
stack leading to crashes during shader compilation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108662
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2019-07-22 16:16:39 -05:00
Nataraj Deshpande
0661c357c6 egl/android: Update color_buffers querying for buffer age
color_buffers[] is currently hard coded to 3 for android which fails
in droid_window_dequeue_buffer when ANativeWindow creates color_buffers
>3 while querying buffer age during dEQP partial_update tests on chromeOS.

The patch removes static color_buffers[], queries for MIN_UNDEQUEUED_BUFFERS,
sets native window buffer count and allocates the correct number of
color_buffers as per android.

Fixes dEQP-EGL.functional.partial_update* tests on chromebooks with
enabling EGL_KHR_partial_update.

v2: update comment instead of removing (Eric Engestrom)
v3: change static array to dynamic allocated color_buffers
    querying MIN_UNDEQUEUED_BUFFERS (Chia-I Wu olv@chromium.org)

Fixes: 2acc69da8c "EGL/Android: Add EGL_EXT_buffer_age extension"
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-07-22 12:31:34 -07:00
Caio Marcelo de Oliveira Filho
0345aeeb40 intel/compiler: Use nir_opt_conditional_discard
anv vkpipeline-db results for SKL:

total instructions in shared programs: 3622461 -> 3611281 (-0.31%)
instructions in affected programs: 396452 -> 385272 (-2.82%)
helped: 2062
HURT: 1

total cycles in shared programs: 1458144669 -> 1458105320 (<.01%)
cycles in affected programs: 4171830 -> 4132481 (-0.94%)
helped: 1874
HURT: 180

total loops in shared programs: 2437 -> 2437 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 8745 -> 8748 (0.03%)
spills in affected programs: 8 -> 11 (37.50%)
helped: 1
HURT: 1

total fills in shared programs: 23392 -> 23395 (0.01%)
fills in affected programs: 8 -> 11 (37.50%)
helped: 1
HURT: 1

LOST:   0
GAINED: 1

No changes to shader-db on i965 or iris.  The glsl compiler already
does a similar optimization.

Improvement suggested by Daniel Schürmann.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-22 09:33:48 -07:00
Alyssa Rosenzweig
d07c846546 pan/decode: Disable magic divisor debugging
Memory corruption (for both legitimate and illegitimate reasons) causes
this to hang pantrace.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:34:26 -07:00
Alyssa Rosenzweig
e8dca7e1e1 pan/midgard: Report spills:fills to shader-db
Route this info through so we can track how we're doing on register
spilling.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
055aa9b1f4 panfrost/midgard: Reenable pipeline register creation
This was disabled to permit regression-free RA work. Now that the spill
code is in place, we can reenable, with some caveats about efficacy.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
f0d0061b18 panfrost/midgard: Report tls_size
Pipe through the number of bytes of spilled memory used from the
compiler into the main driver, where it will be used to allocate the
Thread Local Storage buffer.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
f1dcaa0df6 panfrost: Set initialized in more cases
Indirect linear writes were not being marked as initialized, causing the
back blit to be dropped, breaking the listed tests.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
9e3dc703ff panfrost/ci: Update expectations
We've fixed some shader tests.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
bc741599f2 panfrost/midgard: Promote to *move*, not rewrite for non-SSA
Fixes promoted uniform loads to registers.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
40abf11708 panfrost/midgard: Dump MIR of RA failure
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
a08e9511e3 pan/midgard; Dump successor graph when printing MIR
We just use the pointers of the midgard_block*, which is crude, but it
gets the point across and will help debug successor related issues.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
1aa556de2e pan/midgard: Remove debug statement
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
21510c253c panfrost/midgard: Implement register spilling
Now that we run RA in a loop, before each iteration after a failed
allocation we choose a spill node and spill it to Thread Local Storage
using st_int4/ld_int4 instructions (for spills and fills respectively).

This allows us to compile complex shaders that normally would not fit
within the 16 work register limits, although it comes at a fairly steep
performance penalty.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
533d65786f panfrost/midgard: Add mir_has_arg helper
Helps scan the MIR for uses of an index.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
076838ef0c panfrost/midgard: Check write-before-read in liveness analysis
If we write to an index before reading it, the old copy we're checking
liveness for isn't live in this block, even if it does get read later.
Fixes abnormally high register pressure in shaders with loops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
997f85c136 panfrost/midgard/disasm: Check for certain tag errors
Midgard bundles contain a tag, as well as a copy of the tag of the next
bundle to facilitate prefetch. Do some simple static analysis to detect
certain tag errors (particularly on shaders without branching).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
d168b08d62 pan/midgard: Add OP_IS_CSEL helper
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
1f297471a0 pan/midgard: Add mir_rewrite_index_src_single helper
Rather than rewriting an index away across the whole block, we expose
finer (per-instruction) granularity for rewrites.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
16c8c354d0 pan/midgard: Ignore inline_constant in liveness
It doesn't make any sense to look at it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
d155168e6c panfrost/midgard: Implement load/store scratch opcodes
These are used to load/store from Thread Local Storage, which is memory
allocated per-thread (corresponding to ctx->scratchpad in the command
stream) and used for register spilling.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
3bb780ecb9 pan/midg/disasm: Check for int varying ops
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
7e052d9332 pan/midgard: Remove "aliasing"
It was a crazy idea that didn't pan out. We're better served by a good
copyprop pass. It's also unused now.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
3174bc9972 panfrost: Promote uniform registers late
Rather than creating either a load or a uniform register read with a
fixed beginning offset, we always create a load and then promote to a
uniform register later. This will allow us to promote in a register
pressure aware manner.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig
aa03159120 pan/midgard: Call scheduler/RA in a loop
This will allow us to insert instructions as a result of register
allocation, permitting spilling to be implemented. As a side effect,
with the assert commented out this would fix a bunch of glamor crashes
(due to RA failures) so MATE becomes useable.

Ideally we'll have scheduling or RA actually sorted out before the
branch point but if not this gives us a one-line out to get X working...

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:33 -07:00
Alyssa Rosenzweig
1cabb8a706 pan/midgard: Remove custom register selection callback
What we have is equivalent to the default callback; let's use that.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-22 08:20:33 -07:00
Samuel Pitoiset
b5116d3cb7 radv: fix crash in vkCmdClearAttachments with unused attachment
depth_stencil_attachment and/or ds_resolve attachment can be NULL.

This fixes crashes with
dEQP-VK.renderpass.suballocation.unused_clear_attachments.*

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-22 14:25:54 +02:00
Sergii Romantsov
253be49402 i965: free object labels when deleting
Some leaks detected with GL_KHR_debug on i965.

CC: Timothy Arceri <t_arceri@yahoo.com.au>
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-22 12:39:32 +03:00
Samuel Pitoiset
915abbe932 radv/gfx10: update descriptors for inline uniform blocks
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-22 09:02:42 +02:00
Samuel Pitoiset
d76746c1ff radv/gfx10: emit the GS NGG prologue before the nested barrier
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-22 09:02:39 +02:00
Samuel Pitoiset
8c97a07967 radv/gfx10: do not allocate space for the ZPASS_DONE bug
GFX10 isn't affected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-22 09:02:35 +02:00
Samuel Pitoiset
1fb7bd046b radv/gfx10: do not set ELEMENT_SIZE for buffer descriptors
This field doesn't exist.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-22 09:02:31 +02:00
Samuel Pitoiset
1878090b68 radv: clean up fill_geom_tess_rings()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-22 09:02:28 +02:00
Samuel Pitoiset
e7c356866e radv: change a bunch of >= GFX9 to == GFX9
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-22 09:02:26 +02:00
Samuel Pitoiset
6049745b13 ac/nir: do not clamp shadow reference on GFX10
RadeonSI only uses Z32_FLOAT_CLAMP for upgraded depth textures
on GFX10 and RADV doesn't promotes Z16 or Z24.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-22 09:02:22 +02:00
Daniel Schürmann
64b7386ee8 radv: move nir_opt_conditional_discard out of optimization loop
This late optimization pass is only affected by nir_opt_if() and handles all cases
in a single pass. It's enough to call it once after the optimization loop.
No changes on vkpipeline-db.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-22 08:12:18 +02:00
Iago Toral Quiroga
dacaf7ec06 v3d: fill logicop_func in the fragment shader key when precompiling shaders
Since logicop_func 0 is PIPE_LOGIOP_CLEAR, we were trigger lowerinng
of logic ops on precompiled shaders, which we don't want to do. Also, this
had the side effect of making shader-db crash, as during this lowering we
would try to read the color format swizzle information from the fragment shader
key that we don't populate in precompiled shaders because right now we only
need it when logic operations are enabled.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-22 08:05:59 +02:00
Jose Maria Casanova Crespo
9bf0bdf776 v3d: Avoid scheduling an instruction that stalls waiting for SFU retval
If we detect that a scheduling candidate will stall because having a
register source that is the written by the SFU unit in the previous
instruction we reduce its priority so any non stalling operation would
be chosen.

The latency of SFU operations is defined as 2. So they would be scheduled
earlier if other candidates have the same priority.

Finally we won't merge instructions that stall to a previously chosen one.
As the result of the previous one would be waiting for an extra cycle.

Although shader-db result show that instruction are hurt with an increase
of 0.35% the sum of instructions + stalls is reduced a 0.52%. And
the total of sfu-stalls is reduced a 63.51%. It implies also a small
increase in the max-temps metric because of scheduling earlier SFU
operations.

total instructions in shared programs: 9102719 -> 9117851 (0.17%)
instructions in affected programs: 4324628 -> 4339760 (0.35%)
helped: 4162
HURT: 12128
helped stats (abs) min: 1 max: 10 x̄: 1.28 x̃: 1
helped stats (rel) min: 0.09% max: 4.76% x̄: 0.66% x̃: 0.51%
HURT stats (abs)   min: 1 max: 27 x̄: 1.69 x̃: 1
HURT stats (rel)   min: 0.05% max: 7.69% x̄: 0.87% x̃: 0.68%
95% mean confidence interval for instructions value: 0.90 0.96
95% mean confidence interval for instructions %-change: 0.47% 0.50%
Instructions are HURT.

total max-temps in shared programs: 1327728 -> 1327812 (<.01%)
max-temps in affected programs: 4730 -> 4814 (1.78%)
helped: 61
HURT: 134
helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1
helped stats (rel) min: 2.70% max: 13.33% x̄: 4.89% x̃: 4.17%
HURT stats (abs)   min: 1 max: 3 x̄: 1.12 x̃: 1
HURT stats (rel)   min: 1.54% max: 20.00% x̄: 6.10% x̃: 5.26%
95% mean confidence interval for max-temps value: 0.28 0.58
95% mean confidence interval for max-temps %-change: 1.80% 3.52%
Max-temps are HURT.

total sfu-stalls in shared programs: 99551 -> 36324 (-63.51%)
sfu-stalls in affected programs: 95029 -> 31802 (-66.53%)
helped: 25882
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 2
helped stats (rel) min: 5.26% max: 100.00% x̄: 79.86% x̃: 100.00%
95% mean confidence interval for sfu-stalls value: -2.47 -2.42
95% mean confidence interval for sfu-stalls %-change: -80.18% -79.54%
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 9202270 -> 9154175 (-0.52%)
inst-and-stalls in affected programs: 5618516 -> 5570421 (-0.86%)
helped: 22728
HURT: 855
helped stats (abs) min: 1 max: 31 x̄: 2.16 x̃: 1
helped stats (rel) min: 0.07% max: 16.67% x̄: 1.14% x̃: 0.92%
HURT stats (abs)   min: 1 max: 5 x̄: 1.25 x̃: 1
HURT stats (rel)   min: 0.12% max: 5.26% x̄: 1.24% x̃: 0.86%
95% mean confidence interval for inst-and-stalls value: -2.07 -2.01
95% mean confidence interval for inst-and-stalls %-change: -1.07% -1.05%
Inst-and-stalls are helped.

v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-22 03:00:50 +02:00
Jose Maria Casanova Crespo
c341ab7ffb v3d: add shader-db stat to count SFU stalls
SFU operations have a latency of 2 cicles, so if their results
are used in the following cycle to a SFU instruction, the GPU
stalls for an extra cycle until the result is available.

This adds the number of stalls to the shader-db debug mode and
sum of instruction + stalls to evaluate optimizations to schedule
instructions that avoid generating sfu-stalls.

v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-22 03:00:50 +02:00
Eric Engestrom
f7224014df radv: replace memset()+strcpy() with snprintf()
Just like the next line :)

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-21 10:38:17 +01:00
Eric Engestrom
29e8f15bdc radv: drop unnecessary memset() before snprintf()
snprintf() always terminates the string.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-21 10:38:17 +01:00
Bas Nieuwenhuizen
451f030c06 radv: Fix uninitialized warning.
For es_vgpr_comp_cnt.

Fixes: 795adbbadd "radv/gfx10: Add pipeline state support for tess."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-21 01:39:08 +02:00
Chia-I Wu
d31d25f634 virgl: fix a sync issue in virgl_buffer_transfer_extend
In virgl_buffer_transfer_extend, when no flush is needed, it tries
to extend a previously queued transfer instead if it can find one.
Comparing to virgl_resource_transfer_prepare, it fails to check if
the resource is busy.

The existence of a previously queued transfer normally implies that
the resource is not busy, maybe except for when the transfer is
PIPE_TRANSFER_UNSYNCHRONIZED.  Rather than burdening us with a
lengthy comment, and potential concerns over breaking it as the
transfer code evolves, this commit makes the valid_buffer_range
check the only condition to take the fast path.

In real world, we hit the fast path almost only because of the
valid_buffer_range check.  In micro benchmarks, the condition should
always be true, otherwise the benchmarks are not very representative
of meaningful workloads.  I think this fix is justified.

The recent change to PIPE_TRANSFER_MAP_DIRECTLY usage disables the
fast path.  This commit re-enables it as well.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-07-19 18:04:42 -07:00
Chia-I Wu
324c20304e virgl: rework virgl_transfer_queue_extend
Do not take a transfer and do the memcpy.  Add a _buffer suffix to
the function name to make it clear that it is only for buffers.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-07-19 18:04:37 -07:00
Chia-I Wu
2b8ad88078 virgl: fix virgl_buffer_transfer_extend
Without setting hw_res, virgl_transfer_queue_extend never finds a
match and always returns NULL.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-07-19 18:04:34 -07:00
Marek Olšák
bcabf75ab7 radeonsi: initialize scissor registers etc. without clear state
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:56 -04:00
Marek Olšák
47f41af06c radeonsi: return success from vi_dcc_clear_level to simplify callers
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:54 -04:00
Marek Olšák
7a764b963a radeonsi: fix compute-based culling regression in 1ce52c1e37
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:50 -04:00
Marek Olšák
c741bed6e8 radeonsi/gfx10: fix VGT_PRIMITIVE_TYPE programming
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
a0d330bedb radeonsi/gfx10: enable Wave32 for vertex, geometry, and tessellation shaders
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
1d82240f55 radeonsi/gfx10: add debug options to enable/disable Wave32
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
8f72f137ad radeonsi/gfx10: add as_ngg variant for TES as ES to select Wave32/64
Legacy GS has to use Wave64, so TES before GS has to use Wave64 too.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
88efb63caf radeonsi/gfx10: implement Wave32
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
54e6900ede radeonsi/gfx10: use 32-bit wavemasks for Wave32
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
81091a5183 ac: create the LLVM builder in ac_llvm_context_init
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
eb54b8c222 ac: create the LLVM module for Wave32 or Wave64 in ac_llvm_context_init
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
921c1d24d5 ac/rtld: add support for Wave32
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
73aa04e40d ac: add Wave32 LLVM target machine
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
9e467d111b ac: initial Wave32 support in LLVM build helpers
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
c35e926a81 radeonsi: assume that selector != NULL for compute shaders
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:48 -04:00
Marek Olšák
bf0f0697a1 radeonsi: remove what appears to be legacy compute code
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:47 -04:00
Marek Olšák
be67a275b5 radeonsi: remove si_program::use_code_object_v2
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:45 -04:00
Marek Olšák
fd92e65feb radeonsi: add si_shader_selector into si_compute
Now we can assume that shader->selector is always set.
This will simplify some code.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:43 -04:00
Marek Olšák
e2c8ff009e radeonsi: set threadgroup size to 0 for threadgroups with only 1 wave
This has no effect on Wave64.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:39 -04:00
Marek Olšák
a8a526c5cb radeonsi/gfx10: set as_ngg for GS prolog
as_ngg is required by Wave32.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
d3a80f2dda radeonsi/gfx10: remove the disable_ngg option
because legacy VS hangs.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
0f30223cf4 radeonsi/gfx10: combine hw edgeflags with user edgeflags for correct behavior
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
bfaca7259c radeonsi/gfx10: deduplicate code for esvert_lds_size
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
a6722285c2 radeonsi/gfx10: simplify a streamout loop in gfx10_emit_ngg_epilogue
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
2683347ba0 radeonsi/gfx10: don't use MALLOC for outputs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
1b4354dab9 radeonsi/gfx10: clean up ESGS ring size computation
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
37db9d2865 radeonsi/gfx10: fix unnecessary LDS overallocation for NGG GS
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
985a59e0d1 radeonsi/gfx10: don't compile the GS copy shader if it's 100% not needed
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
7f0ada3f3e radeonsi/gfx10: set GE_CTNL.PACKET_TO_ONE_PA for NGG
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
e08463ac22 radeonsi/gfx10: update a tunable max_es_verts_base for NGG
We have to fix the computation so as not to break quads.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
79d56e6a4a radeonsi/gfx10: implement ARB_post_depth_coverage
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
a57f0f8a6b radeonsi: fix leaked compute shader NIR
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:37 -04:00
Marek Olšák
98377d3450 radeonsi: save the enable_nir option in the shader cache correctly
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:35 -04:00
Marek Olšák
d227b91d2e radeonsi/gfx10: enable SDMA
no changes since gfx9 for buffers

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
47dee97329 ac: use llvm.amdgcn.writelane
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
39d0c68321 ac: fix shader clock on LLVM 9
Probably relevant commit:

commit dd32dc3f72ec99b1794d62c74d2beb3b60468d50
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Date:   Tue Jul 9 03:10:18 2019 +0000

    [AMDGPU] Always use s_memtime for readcyclecounter

    Differential Revision: https://reviews.llvm.org/D64369

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365431 91177308-0d34-0410-b5e6-96231b3b80d8

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Boyuan Zhang
26099bc35d radeon/vcn: adding engine type for new fw interface
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:33 -04:00
Marek Olšák
936e9fa951 radeonsi: use the correct buffer size in si_vid_clear_buffer
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Pierre-Eric Pelloux-Prayer
b1efc9d05f mesa: add EXT_dsa glEnabledIndexedEXT
The implementation uses _mesa_ActiveTexture to change the active texture unit and
then reset it.

It causes an unnecessary _NEW_TEXTURE_STATE but:
  - adding an index argument to _mesa_set_enable causes a lot of changes (~140 callers)
  - enable_texture (called by _mesa_set_enable) might cause a _NEW_TEXTURE_STATE
    anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-19 20:04:07 -04:00
Pierre-Eric Pelloux-Prayer
ff0cafc8f3 mesa: add EXT_dsa glGetTextureLevelParameter*vEXT functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-19 20:04:06 -04:00
Pierre-Eric Pelloux-Prayer
5fb9c9d628 mesa: add EXT_dsa gl(Copy)Texture(Sub)Image1D/2D/3DEXT functions
Added functions:
- glTextureImage1DEXT
- glTextureImage2DEXT
- glTextureImage3DEXT
- glTextureSubImage1DEXT
- glTextureSubImage3DEXT
- glCopyTextureImage1DEXT
- glCopyTextureImage2DEXT
- glCopyTextureSubImage1DEXT
- glCopyTextureSubImage2DEXT
- glCopyTextureSubImage3DEXT
- glGetTextureImageEXT

All but the last one can be compiled in a display list.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-19 20:04:03 -04:00
Pierre-Eric Pelloux-Prayer
f8ad95c45f mesa: move lookup_texture_ext_dsa up in teximage.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-19 20:04:01 -04:00
Pierre-Eric Pelloux-Prayer
9dd1f7cec0 mesa: pass gl_texture_object as arg to not depend on state
This will allow to use the same functions for EXT_dsa implementation.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-19 20:03:57 -04:00
Pierre-Eric Pelloux-Prayer
0d8826f723 mesa: refactor get_texture_image to remove duplicate code
Move shared code in a new function (_get_texture_image) and use it instead
of duplicating the same lines.
Will be also used by the EXT_dsa functions (GetTextureImageEXT and GetMultiTexImageEXT).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-19 20:03:40 -04:00
Jeremy Newton
666ea30017 pipe-loader: use radeonsi for MM if amdgpu dri is used
The amdgpu dri is used for the closed source AMD driver. Since this driver
does not implement multimedia, we fall back to radeonsi in mesa to do
multimedia. This corrects the dri driver name for when it is set to amdgpu.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-07-19 19:59:02 -04:00
Eric Engestrom
1a25980c46 egl: drop incorrect pkg-config file for glvnd
With b01524fff0 ("meson: don't build libGLES*.so with GLVND")
we dropped the incorrect pkg-config files for GLES*.

Since then, the glvnd issue of its missing files has become painfully
apparent, since it break the build for everyone using glvnd.

NVIDIA has had a fix for a few years now, but has yet to accept it:
https://github.com/NVIDIA/libglvnd/pull/86

Since the breakage is already there, let's clean up everything on our side
while we wait for NVIDIA to accept the fix.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-20 00:07:06 +01:00
Eric Engestrom
e8febd6cba docs: simplify Fixes: git command
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-19 22:24:28 +00:00
Eric Engestrom
0e34e1a0ce mesa/tests: add missing dep_thread
Fixes: f8c27c2775 ("state_tracker: Move the format test out to be an actual unit test.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-07-19 23:03:42 +01:00
Eric Engestrom
6f8b5872ab util: drop strncat(), strcmp(), strncmp(), snprintf() & vsnprintf() MSVC fallbacks
It would seem MSVC>=2015 is now C99-compliant wrt these functions:
strncat:   https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strncat-strncat-l-wcsncat-wcsncat-l-mbsncat-mbsncat-l?view=vs-2017
strcmp:    https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strcmp-wcscmp-mbscmp?view=vs-2017
strncmp:   https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strncmp-wcsncmp-mbsncmp-mbsncmp-l?view=vs-2017
snprintf:  https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/snprintf-snprintf-snprintf-l-snwprintf-snwprintf-l?view=vs-2017
vsnprintf: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/vsnprintf-vsnprintf-vsnprintf-l-vsnwprintf-vsnwprintf-l?view=vs-2017

Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
085c3abf27 util: use standard name for vsnprintf()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
dffeaa55dd util: use standard name for snprintf()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
00e23cd969 util: use standard name for vasprintf()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
59c2dd1b8c util: use standard name for sprintf()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
321d971b08 util: use standard name for strcmp()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
7abc739696 util: use standard name for strcasecmp()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
88ddb2e186 util: use standard name for strncmp()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
27b9eea557 util: use standard name for strncat()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
3ba199abd1 util: use standard name for strdup()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
09a8a39940 util: use standard name for strchrnul()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
367bb55c17 util: drop unused vsprintf() wrapper
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
e7db1806af util: drop unused strchr() wrapper
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Eric Engestrom
84e85035cf util: drop unused strstr() wrapper
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 22:39:38 +01:00
Jason Ekstrand
6301f80b84 nir: Only rematerialize comparisons with all SSA sources
Otherwise, you may end up moving a register read and that could result
in an incorrect shader.  This commit fixes a rendering issue in Elite:
Dangerous.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111152
Fixes: 3ee2e84c60 "nir: Rematerialize compare instructions"
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-19 19:45:36 +00:00
Daniel Schürmann
e352b4d650 spirv: Fix order of barriers in SpvOpControlBarrier
Semantically, the memory barrier has to come first to wait
for the completion of pending memory requests.
Afterwards, the workgroups can be synchronized.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-19 10:37:37 -07:00
Caio Marcelo de Oliveira Filho
4061a3f6c9 nir: use a switch when printing intrinsic indices
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-07-19 10:04:52 -07:00
Rhys Perry
e8644122ed nir/algebraic: mark a few comparison simplifications as precise
No vkpipeline-db changes found.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reveiewed-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-07-19 16:33:01 +00:00
Rhys Perry
79801b9d7d nir/algebraic: optimize contradictory iand operands
Some of these were found in a few GTAV, Rise of the Tomb Raider and
Shadow of the Tomb Raider shaders.

Results from vkpipeline-db run with ACO:
Totals from affected shaders:
SGPRS: 376 -> 376 (0.00 %)
VGPRS: 220 -> 220 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 13492 -> 11560 (-14.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 69 -> 69 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

v2: use False instead of 0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reveiewed-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-07-19 16:33:01 +00:00
Erico Nunes
32ced14bad lima/ppir: handle all node types in ppir_node_replace_child
ppir_node_replace_child is used by the const lowering routine in ppir.
All types need to be handled here, otherwise the src node is not updated
properly when one of the lowered nodes is a const, which results in, for
example, regalloc not assigning registers correctly.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-07-19 16:01:45 +00:00
Erico Nunes
2292f0c4b5 lima/ppir: branch regalloc fixes
The branch instruction has sources which must be handled in src handling
paths so that regalloc assigns registers to them properly.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-07-19 16:01:45 +00:00
Yevhenii Kolesnikov
32b72cbca5 main: Destroy static hash table
format_array_format_table has a static lifetime - it will be destroyed
by an atexit handler.

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-19 11:22:55 +03:00
Dave Airlie
248161123c radv: reset the window scissor with no clear state.
If we don't have clear state (which gfx10 doesn't currently)
we will fix to reset the scissor. AMDVLK will leave it set
to something else.

Marek also has this fix for radeonsi pending.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 11:00:44 +10:00
Dave Airlie
2ac2b98780 radv: fix crash in shader tracing.
Enabling tracing, and then having a vmfault, can leads to a segfault
before we print out the traces, as if a meta shader is executing
and we don't have the NIR for it.

Just pass the stage and give back a default.

Fixes: 9b9ccee4d6 ("radv: take LDS into account for compute shader occupancy stats")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 11:00:25 +10:00
Timothy Arceri
80c2c17e1e iris: change last_vue_stage() to look at uncompiled shaders
This allows us to find the last vue stage before we have compiled
the shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-19 09:25:47 +10:00
Timothy Arceri
30038dd5ec nir/lower_clip: add support for geometry shaders
This will be used to enabled compat profile support for geometry
shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-19 09:25:47 +10:00
Timothy Arceri
4b08bb4770 nir/lower_clip: add lower_clip_outputs() helper
This will be reused in the following patch to add support for clip
vertex lowering in geometry shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-19 09:25:47 +10:00
Timothy Arceri
a59926b3ca nir/lower_clip: add create_clipdist_vars() helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-19 09:25:47 +10:00
Timothy Arceri
e38b930876 nir/lower_clip: add a find_clipvertex_and_position_outputs() helper
This will allow code sharing in a following patch that adds support
for lowering in geometry shaders. It also allows us to exit early
if there is no lowering to do which allows a small code tidy up.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-19 09:25:47 +10:00
Alyssa Rosenzweig
0395b58c92 panfrost: Set rt_count
This doesn't quite work yet, but it illustrates how MRT is implemented
in the MFBD: rt_count is set appropriately based on the number of render
targets, while additional render target descriptors are appended on with
an index variable in them (not quite decoded since there's some aspects
we don't understand there, but conceptually this should be right).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
871ad7789f panfrost: Trace invisible BOs
Helps make the decode a little more readable (names instead of
addresses).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
17752bae8e panfrost/decode: Preserve empty tiler heap symmetry
If tiler_heap_end == tiler_heap_start, ensure it's printed the same
rather than one erroring out as hex.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
e797caa0dd panfrost: Zero polygon list body size for clears
There's no polygons, so you can't have any size to the polygon list,
although there is a minimal header.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
f475b79980 panfrost/mfbd: Unify depth-only with masked FBO path
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
629c7366a7 panfrost: Simplify set_framebuffer_state
Most of the ad hoc logic is already in Gallium.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
227c395c00 panfrost: Check for NULL surface in places
Fixes a bunch of NULL dereferences, although it does cause GPU faults of
course.

This is caused by color buffers masked out in MRT, which we'll
eventually have to solve the right way... one thing at a time.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
79b13b4376 panfrost: Expose 4 render targets
Hidden behind deqp flag as usual.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig
d56f92502e panfrost: Shrink tiler heap
128MB is excessive and 16MB is still plenty. Saves 112MB/context on
kernels without growable/heap support.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 15:25:16 -07:00
Caio Marcelo de Oliveira Filho
b6d4753568 nir/large_constants: De-duplicate constants
If a function has a constant and is called more than once, after
inlining we may end up with different variables representing the same
constant.  This commit look into the data and de-duplicate them.

The first pass now will collect the constant data in a per variable
buffer, then de-duplication happens (by sorting then linear walk), and
the second pass will use the data in var->data.location.

One side-effect of the current implementation is that constants will
be reordered.  If this turns out to be a problem is something that can
be fixed.

An alternative strategy considered was to perform this in a
per-function basis and then merge the results, the problem is that we
would have to fix up the offsets during the merge.  Given the data we
have, the current patch is good enough.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-18 12:24:24 -07:00
Caio Marcelo de Oliveira Filho
d9b67ad079 nir/large_constants: Use ralloc for var_infos
This will be used later on to allocate constant data for each
variable (and then deduplicate).  Also drop initializing found_read,
as it is already implicitly false in the literal.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-18 12:24:24 -07:00
Eric Anholt
0d8a4c67cf freedreno: Convert nir_lower_tg4_to_tex to the NIR lowering helper.
Cuts a bunch of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
56f4ede73d freedreno: Convert load_barycentric_at_sample to the NIR lowering helper.
Cuts out a ton of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
61098baf42 freedreno: Convert load_barycentric_at_offset to the NIR lowering helper.
Cuts out a ton of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
cdc359c58e v3d: Use nir_shader_lower_instructions() for txf_ms lowering.
Cuts out a bunch of boilerplate.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
251c64a53d nir: Allow internal changes to the instr in nir_shader_lower_instructions().
v3d's NIR txf_ms lowering wants to swizzle around the input coordinates in
NIR, but doesn't generate a new txf_ms instructions as replacement.  It's
pretty easy to allow that in nir_shader_lower_instructions, and it may be
common in lowering passes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-18 11:28:56 -07:00
Eric Anholt
c0640035fb vc4: Convert vc4_nir_lower_txf_ms to nir_shader_lower_instructions().
Cuts out a bunch of boilerplate.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
40e7609603 v3d: Fix assertion failures in debug builds.
nir_lower_io leaves around deref_var instructions after lowering away
deref intrinsics.  This ends up breaking validation after v3d_nir_lower_io
removes variables not actually being stored by the shader's
store_output()s.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-07-18 11:28:56 -07:00
Alyssa Rosenzweig
1bced0fad2 panfrost: Handle Z24 textures
Just use the Z32 code.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
f29c084960 panfrost/ci: Update expectations
We just fixed some stencil tests.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
fad76470d5 panfrost: Make scissor test more robust
See v3d implementation.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
5c554e235d panfrost: Use correct NO_DITHER field on MFBD
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
676b9339dd panfrost: Implement Z32F(_S8) support
Z32F uses a dediacted float path. Z32F_S8 uses separate stencil planes
in the hardware, lowered via u_transfer_helper.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
479185a1cd panfrost/decode: Don't disassemble NULL shaders
It is legal to load a shader from a NULL address, particularly when the
TILER job is used strictly for effects on the Z/S buffer with 0x0 color
mask. Don't crash the decoder in this case.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig
65d89097b8 panfrost: Copy stencil front to back if back disabled
When backside stenciling is disabled, backfacing primitives just do the
same thing as frontfacing primitives.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-18 10:42:43 -07:00
Jan Zielinski
6f7306c029 swr/rast: Refactor memory API between rasterizer core and swr
This commit cleans up API between the core of the rasterizer and swr.
Some formatting changes are also done.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-07-18 16:17:00 +02:00
Andreas Baierl
4627a0c4eb lima/ppir: Add gl_PointCoord handling
Treat gl_PointCoord as a system value and
add the necessary bits for correct codegen.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Andreas Baierl
3523233027 gallium: Add PIPE_CAP_TGSI_FS_POINT_IS_SYSVAL
This adds an option to treat gl_PointCoord as a system value.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Andreas Baierl
3349a60f6f nir/tgsi: Extend tgsi_to_nir.c to support gl_PointCoord as a system value.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Andreas Baierl
f5804f1768 nir: Add gl_PointCoord system value
gl_PointCoord handling needs some special bits set in lima/ppir code
generation. Treating gl_PointCoord as a system value makes it easier
to distinguish from a regular varying.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Andreas Baierl
24af57407c glsl: Optionally declare gl_PointCoord as a system value
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Connor Abbott
b178fdf486 lima/gp: Fix problem with complex moves
When writing the scheduler, we forgot that you can't read the complex
unit in certain sources because it gets overwritten to 0 or 1. Fixing
this turned out to be possible without giving up and reducing
GPIR_VALUE_REG_NUM to 10, although it was difficult in a way I didn't
expect. There can be at most 4 next-max nodes that can't have moves
scheduled in the complex slot, so it actually isn't a problem for
getting the number of next-max nodes at 5 or lower. However, it is a
problem for stores. If a given node is a next-max node whose move cannot
go in the complex slot *and* is used by a store that we decide to
schedule, we have to reserve one of the non-complex slots for a move
instead of all the slots, or we can wind up in a situation where only
the complex slot is free and we fail the move. This means that we have
to add another term to the reservation logic, for stores whose children
cannot be in the complex slot.

Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
54434fe670 lima/gpir: Rework the scheduler
Now, we do scheduling at the same time as value register allocation. The
ready list now acts similarly to the array of registers in
value_regalloc, keeping us from running out of slots. Before this, the
value register allocator wasn't aware of the scheduling constraints of
the actual machine, which meant that it sometimes chose the wrong false
dependencies to insert. Now, we assign value registers at the same time
as we actually schedule instructions, making its choices reflect reality
much better. It was also conservative in some cases where the new scheme
doesn't have to be. For example, in something like:

1 = ld_att
2 = ld_uni
3 = add 1, 2

It's possible that one of 1 and 2 can't be scheduled in the same
instruction as 3, meaning that a move needs to be inserted, so the value
register allocator needs to assume that this sequence requires two
registers. But when actually scheduling, we could discover that 1, 2,
and 3 can all be scheduled together, so that they only require one
register. The new scheduler speculatively inserts the instruction under
consideration, as well as all of its child load instructions, and then
counts the number of live value registers after all is said and done.
This lets us be more aggressive with scheduling when we're close to the
limit.

With the new scheduler, the kmscube vertex shader is now scheduled in 40
instructions, versus 66 before.

Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
12645e8714 lima/gp: Mark more add-only nodes as maybe-two-slot
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
16de3dd7a6 lima/gpir: Fix some bugs in instruction handling
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
cc78a42577 lima: Reintroduce the standalone compiler
I used this to test things without needing to have a device handy.

Acked-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:33:23 +02:00
Connor Abbott
4423552ff0 nir/lower_viewport: Check variable mode first
The location is unused for shader_temp and function_temp variables, and
due to the way we nir_lower_io_to_temproraries demotes shader_out
variables to shader_temp variables, it happened to equal
VARYING_SLOT_POS for the gl_Position temporary, which made this pass
fail with the offline compiler due to this coming before vars_to_ssa.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-18 14:21:41 +02:00
Samuel Pitoiset
6e5e4bf050 radv/gfx10: set BREAK_WAVE_AT_EOI if TES or GS enable the primitive ID
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-18 10:37:10 +02:00
Samuel Pitoiset
8c692ff512 radv/gfx10: move emitting VGT_PRIMITIVEID_EN into the NGG path
And do not emit VGT_GS_MODE which is unnecessary on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-18 10:36:38 +02:00
Samuel Pitoiset
8315dbe419 radv/gfx10: do not always execute a barrier before the second shader
With NGG, empty waves may still be required to export data.

This fixes dEQP-VK.ycbcr.format.*_unorm.geometry_*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-18 10:06:34 +02:00
Samuel Pitoiset
63d670e350 radv: fix VGT_GS_MODE if VS uses the primitive ID
Found by inspection.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-18 10:03:12 +02:00
Iago Toral Quiroga
c23fa1ca07 v3d: emit correct lowering for logic operations with MSAA render targets
v2:
 - Drop the writemask from the per-sample color intrinsic (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
93d05c1c1f v3d: handle nir_intrinsic_store_tlb_sample_color_v3d
v2:
 - Move handling of output intrinsics to ntq_emit_intrinsic() (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
50016d7718 nir: add a V3D-specific intrinsic for per-sample color writes
For per-sample color writes we need the output intrinsic to pack the
sample index, which is not provided with regular store_output intrinsics
unless we figured out a way to encode it into the base or the offset.

v2:
 - Drop the writemask (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
ba520b00c4 v3d: implement per-sample tlb color writes
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
b96c2219ca v3d: refactor the tlb color write code
We want to split the tlb specifier setup from the color writes, because when
we implement per-sample color writes we want to do the latter for all the
samples, but the former only once.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
fd3ec6f55d v3d: move tlb color write emission to a helper function
We will soon be adding per-sample color writes which means additional
complexity and more indentation (we will need another loop to emit
the writes for each individual sample), so this will help keeping
things simple and a bit more readable.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
0c9919710e v3d: implement per-sample tlb color reads
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Lionel Landwerlin
3adc32df92 anv: fix format mapping for depth/stencil formats
anv_format is supposed to have a pointer back to the associated
VkFormat, we were missed this for depth/stencil formats.

This doesn't fix anything afaict, but will be needed for future
changes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 465de47bad ("anv: associate vulkan formats with aspects")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-18 09:40:01 +03:00
Dave Airlie
a68f593a0e radv: put back VGT_FLUSH at ring init on gfx10
I can find no evidence that removing this is a good idea.

Fixes: 9b116173b6 ("radv: do not emit VGT_FLUSH on GFX10")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-18 16:24:44 +10:00
Gert Wollny
45951452aa softpipe: Clamp border colors when needed
unorm and snorm require that the border color values are clamped, so when
picking the sampler view copy/clamp the border color from the sampler and
use these adjusted values.

Fixes:

  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_compressed_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_snorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_srgb_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_unorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_compressed_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_snorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_srgb_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_color
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth
  dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth_uint_stencil_sample_depth

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:49:00 +02:00
Gert Wollny
230b99ce2f softpipe: set a lower minimum clamp value for texture coordinate border clamp
The value of -0.5f is not small enough to produce negative coordinates,
so lower the minimum clamp value to -1.0f. This fixes a number of tests
from
   dEQP-GLES31.functional.texture.border_clamp.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:47:23 +02:00
Gert Wollny
eae4c6df8d softpipe: Correct repeat-mirror evaluation
when mirroring the texture corrdinates the indices must be mirrored as
well and the half pixel shift must be applied in reverse.

Fixes a number of tests from:
  dEQP-GLES31.functional.texture.gather.offset.*
  dEQP-GLES31.functional.texture.gather.offsets.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:47:23 +02:00
Gert Wollny
fff624fca4 softpipe: Also mark textures as dirty when updating the framebuffer state
At this point all the draw caches are flushed to the old attached textures,
so the read caches of these textures will need to be updated too.

Fixes:
   dEQP-GLES3.functional.fbo.color.repeated_clear.sample.tex2d.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-18 05:33:59 +02:00
Jonathan Marek
08514a9721 etnaviv: set DITHER_MODE
This fixes a rendering glitch observed in SDL testscale test, where alpha
blending samples with value (1.0, 1.0, 1.0, 0.0) whitens the target instead
of having no effect.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:50 -04:00
Jonathan Marek
aaf0c47c76 etnaviv: update headers from rnndb
Update to etna_viv commit a16a418.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:50 -04:00
Jonathan Marek
76adf041f2 etnaviv: fix blend color on newer GPUs
Newer GPUs use the half float ALPHA_COLOR_EXT register.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:50 -04:00
Jonathan Marek
5f73726013 etnaviv: fix alpha blending cases
We need to check rgb_func/alpha_func when determining if blend or separate
alpha is required.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:35 -04:00
Jonathan Marek
6c3c05dc38 etnaviv: fix polygon offset
Dividing the fui result by 65535 is obviously wrong, and from testing, on
GC7000L at least there is no division by 65535.

Fixes dEQP-GLES2.functional.polygon_offset.fixed16_displacement_with_units

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-17 23:07:07 -04:00
Timothy Arceri
a20a9d0c5e radv: dont store disasm string unless keep_shader_info flag set
This fixes the memory use regression from bug 111107.

Fixes: 726a31df70 ("radv: Add the concept of radv shader binaries.")

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111107
2019-07-18 00:25:55 +00:00
Dave Airlie
82a2f10529 radv/gfx10: set the pgm rsrc3/4 regs using index sh reg set
This is ported from AMDVLK, it's probably not requires unless
we want to use "real time queues", but it might be nice to just have
in place.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-18 10:24:26 +10:00
Dave Airlie
de524b2c37 radv: use correct register setter for ngg hw addr
this shouldn't matter, but it's good to be correct.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-18 10:17:37 +10:00
Eric Anholt
9689407c54 freedreno/a6xx: Drop the WFI in the program update stateobj.
Rob Clark thinks this was likely a workaround for our const buffer
update bugs, and now that it's passing tests, we should be able to
drop it.

renderdoc-traces results:

traces/android/clashofclans.rdc:  +6.1% +/-   1.1%
traces/android/candycrush.rdc:    +5.2% +/-   1.6%

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-17 16:20:12 -07:00
Eric Anholt
2170822603 freedreno/a6xx: Drop the WFI in constant uploads.
Now that the bin vs render constlen is fixed, we can skip these waits.

Improves webgl aquarium performance at 10k fish from 27fps to 33.
Some highlights from renderdoc-traces:

traces/android/minecraft.rdc:             +17.1% +/-   3.4%
traces/glmark2/ideas-speed=duration.rdc:  +11.6% +/-   2.4%
traces/android/candycrush.rdc:            +5.4%  +/-   1.1%
traces/android/clashofclans.rdc:          +4.4%  +/-   1.3%

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-17 16:20:12 -07:00
Eric Anholt
85bbdaff6c freedreno: Assert that we don't exceed constlen.
We actually could go up to vs->constlen in the binning shader on a6xx,
but for sanity let's make sure that we're always under constlen.  This
would have caught the bug fixed in 572c76fd88 ("freedreno: Clamp UBO
uploads to the constlen decided by the shader.")

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-17 16:20:12 -07:00
Eric Anholt
bc50ecfa7a freedreno: Fix more constlen overflows.
Fixes constlen overflow in
dEQP-GLES31.functional.shaders.builtin_var.compute.num_work_groups and
dEQP-GLES31.functional.image_load_store.buffer.image_size.readonly_32
and probably others.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-17 16:20:12 -07:00
Eric Anholt
b9f7f3e497 freedreno: Drop stale comment about skipping uploads.
We already skip the upload if it's unused, due to the constlen >
offset check.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-17 16:20:12 -07:00
Lepton Wu
6109df58e4 virgl: Set meta data for textures from handle.
The set of meta data was removed by commit 8083464. It broke lots of
dEQP tests when running with pbuffer surface type.

Fixes: 8083464013 ("virgl: remove dead code")
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-07-17 16:17:48 -07:00
Bas Nieuwenhuizen
f1a8967344 radv: Only save the descriptor set if we have one.
After reset, if valid does not contain the relevant bit the descriptor
can be != NULL but still not be valid.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-18 00:49:43 +02:00
Lionel Landwerlin
ce4c5474af anv: report timestampComputeAndGraphics true
Spec says :

   "timestampComputeAndGraphics specifies support for timestamps on all
    graphics and compute queues. If this limit is set to VK_TRUE, all
    queues that advertise the VK_QUEUE_GRAPHICS_BIT or
    VK_QUEUE_COMPUTE_BIT in the VkQueueFamilyProperties::queueFlags
    support VkQueueFamilyProperties::timestampValidBits of at least 36."

On gen7+ this should be true (we only have 32bits of timestamp on
gen6 and below).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 802f00219a ("anv/device: Update features and limits")
Reported-by: Timothy Strelchun <timothy.strelchun@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-17 22:46:58 +00:00
Rafael Antognolli
393f659ed8 iris: Enable fast clears on other miplevels and layers than 0.
Until now we only supported fast clear colors on the first miplevel and
layer. The main reason for it is that we can't have different fast clear
values at different levels/layers, since the surface state only supports
one clear value.

We can, however, enable it if we make sure we only use the same value
for all levels/layers, and if one of them changes, we resolve all the
others. We already do that for depth fast clears so hopefully it will be
fine for color fast clears too.

v2: Add check for partial clear too (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-17 14:53:37 -07:00
Rafael Antognolli
8bbd4f32bf iris: Allow resolving clear color of CCS_D surfaces.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-17 14:53:16 -07:00
Kenneth Graunke
df4c2ec5e1 iris: Make iris_has_color_unresolved non-static
We want to use this in the transfer code and possibly for fast clears.
2019-07-17 13:43:04 -07:00
Andreas Bergmeier
f92290a8d9 broadcom: Move v3d_get_device_info to common
In common we can use implementation for Vulkan.
2019-07-17 20:02:34 +00:00
Caio Marcelo de Oliveira Filho
891a232214 nir/large_constants: Use dominance information to find more constants
Relax the restriction that all the writes need to be in the first
block: now accept variables that have all the writes in the same
block, and all the reads are dominated by that block.

This let the pass identify large constants that are local to a helper
function.  The writes will be at the place that the function is
inlined, possibly not in the first block (but still all in the same
block).

Results for vkpipeline-db in SKL:

total instructions in shared programs: 3624891 -> 3623145 (-0.05%)
instructions in affected programs: 79416 -> 77670 (-2.20%)
helped: 16
HURT: 0

total cycles in shared programs: 1458149667 -> 1458147273 (<.01%)
cycles in affected programs: 30154164 -> 30151770 (<.01%)
helped: 14
HURT: 2

total loops in shared programs: 2437 -> 2437 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 8813 -> 8745 (-0.77%)
spills in affected programs: 2894 -> 2826 (-2.35%)
helped: 8
HURT: 0

total fills in shared programs: 23470 -> 23392 (-0.33%)
fills in affected programs: 12248 -> 12170 (-0.64%)
helped: 6
HURT: 2

LOST:   0
GAINED: 0

Results for shader-db in SKL with Iris:

total instructions in shared programs: 15379442 -> 15379392 (<.01%)
instructions in affected programs: 837 -> 787 (-5.97%)
helped: 2
HURT: 2
helped stats (abs) min: 27 max: 27 x̄: 27.00 x̃: 27
helped stats (rel) min: 10.47% max: 10.67% x̄: 10.57% x̃: 10.57%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 1.23% max: 1.23% x̄: 1.23% x̃: 1.23%
95% mean confidence interval for instructions value: -39.14 14.14
95% mean confidence interval for instructions %-change: -15.51% 6.17%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 4880 -> 4880 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 370677237 -> 370676567 (<.01%)
cycles in affected programs: 17852 -> 17182 (-3.75%)
helped: 2
HURT: 1
helped stats (abs) min: 338 max: 356 x̄: 347.00 x̃: 347
helped stats (rel) min: 13.98% max: 14.64% x̄: 14.31% x̃: 14.31%
HURT stats (abs)   min: 24 max: 24 x̄: 24.00 x̃: 24
HURT stats (rel)   min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%

total spills in shared programs: 11772 -> 11772 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 24948 -> 24948 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-17 12:50:32 -07:00
Jason Ekstrand
7ceec21b76 intel/fs: Use a strided MOV instead of a conversion for load_* destinations
In many cases, the compiler can just copy-prop the strided MOV whereas
the conversion is a bit trickier.  This cuts 5% of the instructions off
of one particular Vulkan CTS test which does lots of load_ssbo.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-17 18:44:35 +00:00
Jason Ekstrand
812b341578 nir/algebraic: Optimize comparisons and up-casts
These seem like obvious enough optimizations in the world of multiple
integer bit sizes.  The only known thing which hits these at the moment
is some Vulkan CTS tests for 16-bit SSBO values which like to up-cast
and check for equality.  However, it's something that's bound to come up
as we start seeing more integers in shaders.

The optimizations of comparisons of casted values with constants are
something which we would ideally do with range analysis.  However,
lacking that, we can do it in opt_algebraic as long as one side is a
constant.

In dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13, this commit, along
with the previous commit, reduce the number of instructions emitted on
Skylake from 55328 to 44546, a reduction of 20%.

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-17 18:44:35 +00:00
Jason Ekstrand
e8505e982a nir/algebraic: Optimize comparing unpacked values
We could, in theory, add the same optimization for 64-bit unpack
operations but that's likely to fight with 64-bit integer lowering on
platforms which require it so it will require more infrastructure before
that will be a good idea.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-17 18:44:35 +00:00
Jason Ekstrand
9fed031e4e nir/algebraic: Print out the list of transforms in the C file
This helps greatly when debugging algebraic transform generators because
you can now actually see the output and verify that your transforms are
getting generated.

Acked-by: Matt Turner <mattst88@gmail.com>
2019-07-17 18:44:35 +00:00
Jason Ekstrand
68a4c796d5 intel/fs: Properly stride NULL replacement regs in DCE
This fixes some validation errors generated by certain D->W conversions
but is likely not a full solution.  Calculating an actual register
stride is a far more complex problem in general and should probably be
handled by the brw_fs_generator.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-17 18:44:35 +00:00
Eric Anholt
28a808a11b nir: Fix nir_lower_alu_to_scalar's instr filtering.
It was checking if the dest or src[0] SSA values were vectors, rather than
whether the ALU op was using the source as a vector resulting in a
nir_fdot4 making it through to vc4 and v3d:

vec1 32 ssa_6 = fdot4 ssa_4.xxxx, ssa_5

Fixes: c1cffa4249 ("nir/alu_to_scalar: Use the new NIR lowering framework")
v2: Use Jason's recommendation to look at input_sizes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-17 10:30:43 -07:00
Alyssa Rosenzweig
a301250ece panfrost: Merge varyings_mem into transient buffers
Theoretically we would like these split since varyings can have
specially optimized flags (no map, coherent local). For now, since
neither of these flags is particularly meaningful right now, merge them
together instead of special casing varyings_mem.

Saves upwards of 64MB of RAM per context.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-17 09:16:37 -07:00
Lionel Landwerlin
6f880f128f vulkan/wsi: update swapchain status on vkQueuePresent
With the following chain of events :

   vkQueuePresent()
   <- Surface resize
   vkQueuePresent()

We should be able to report SUBOPTIMAL or OUT_OF_DATE on the second
vkQueuePresent() call. Currently we only look at X11 events in the
vkAcquireNextImage() path so we're not able to report this.

This change checks the queue of events and process any available ones
to update the swapchain status.

v2: Be consistent about reporting the current error state of the
    swapchain (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111097
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-17 17:40:54 +03:00
Samuel Pitoiset
24b1b1f574 radv: add an option for disabling NGG on GFX10
Will be useful for testing the legacy path.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-17 15:43:36 +02:00
Erik Faye-Lund
d59c961af9 softpipe: pass stream-out targets to draw-module early
This is essensially a port of ed53e61bec from LLVMpipe to softpipe,
as it makes things a bit simpler and more performant.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2019-07-17 10:43:06 +00:00
Alejandro Piñeiro
5a84960072 spirv_extensions: i965: initialize SPIR-V extensions
v2: Rebase update after changes on previous patches.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-17 10:47:27 +02:00
Alejandro Piñeiro
6ed19dcf80 spirv_extensions: add spirv_supported_extensions on gl_constants
We can use it to get real values for ARB_spirv_extensions methods.

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-17 10:45:58 +02:00
Alejandro Piñeiro
f6da2a5508 spirv_extensions: define spirv_extensions_supported
Add a struct to maintain which SPIR-V extensions are supported, and an
utility method to initialize it based on
nir_spirv_supported_capabilities.

v2:
  * Fixing code style (Ian Romanick)
  * Adding a prefix (spirv) to fill_supported_spirv_extensions (Ian Romanick)

v3: rebase update (nir_spirv_supported_extensions renamed)

v4: include AMD_gcn_shader support

v5: move spirv_fill_supported_spirv_extensions to
    src/mesa/main/spirv_extensions.c

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-17 10:45:32 +02:00
Alejandro Piñeiro
06e5daf575 spirv_extensions: add list of extensions and to_string method
Ideally this should be generated somehow. One option would be gather
all the extension dependencies listed on the core grammar, but there
would be the possibility of not including some of the extensions.

Note that spirv-tools is doing it just slightly better, as it has a
hardcoded list of extensions manually took from the registry, that
they parse to get the enum and the to_string method (see
generate_grammar_tables.py).

v2:
  * Use a macro to improve readability. (Tapani Pälli)
  * Add unreachable on the switch, no default (Eric Engestrom)
  * No typedef enum (Ian Romanick)
  * Sort extensions names (Ian Romanick)
  * Don't add extensions unlikely to be supported by Mesa at any point
    (Ian Romanick)

v3: rebase update

v4: Include AMD_gcn_shader

v5: move spirv_extensions_to_string to src/mesa/main/spirv_extensions.c

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-17 10:44:33 +02:00
Alejandro Piñeiro
a622aad869 spirv_extensions: add GL_ARB_spirv_extensions boilerplate
v2:
  * Mention extension gap at gl_API.xml (Emil Velikov)
  * Bail with INVALID_ENUM if extension not available on getStringi (Emil Velikov)
  * Use EXTRA_EXT macro when defining the extension at
    get.c/get_hash_params.py (Emil Velikov)
  * Rename source files (spirvextensions.[ch] -> spirv_extensions.[ch]) (Ian)

v3:
  * Fix GL_PROGRAM_BINARY_FORMATS glGet query, broken by error on a
    previous rebase

v4:
   * Fix rebase conflicts on getstring.c after
     GL_SHADING_LANGUAGE_VERSION query was added

v5:
   * Remove src/mapi/glapi/gen/Makefile.am as it no longer exists in
     master

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-17 10:41:44 +02:00
Samuel Pitoiset
07ff367442 radv/gfx10: implement VK_EXT_post_depth_coverage
I did implement this extension a while ago but it didn't work
on pre GFX10 for some reasons. Now all CTS pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-17 08:32:39 +02:00
Samuel Pitoiset
ed53d2c4be radv/gfx10: disable the TC compat zrange workaround
Unnecessary.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-17 08:32:36 +02:00
Samuel Pitoiset
edf1af696f radv/gfx10: fallback to the legacy path if tess and extreme geometry
This is unsupported and hangs.

This fixes GPU hangs with
dEQP-VK.tessellation.geometry_interaction.limits.output_required_*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-17 08:32:33 +02:00
Samuel Pitoiset
ae4b1fc095 radv/gfx10: always build the GS copy shader but uses it on-demand
It should be possible to build it on-demand too but it requires
more work. On GFX10, the GS copy shader is required when tess
is enabled with extreme geometry.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-17 08:32:30 +02:00
Gert Wollny
9c611fb381 softpipe: Remove unused static function
Thanks to Eric Engestrom for pointing out that there was something wrong
with that function.

Fixes: 724a73509e
  softpipe: Prepare handling explicit gradients

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-17 04:52:27 +00:00
Caio Marcelo de Oliveira Filho
e2939dc5a1 spirv: Bail when we see CounterBuffer decoration
This decoration can be ignored, so we can just skip the next steps.
Otherwise we'd have to also handle it in apply_var_decoration.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-16 20:31:12 -07:00
Kenneth Graunke
1d5ee31553 iris: Drop copy and pasted iris_timebase_scale
Lionel moved brw_timebase_scale to gen_device_info_timebase_scale a few
months ago, so we should just use that, and not our own copy in iris.
2019-07-16 17:22:48 -07:00
Jason Ekstrand
6fb685fe4b nir/regs_to_ssa: Handle regs in phi sources properly
Sources of phi instructions act as if they occur at the very end of the
predecessor block not the block in which the phi lives.  In order to
handle them correctly, we have to skip phi sources on the normal
instruction walk and handle them as a separate walk over the successor
phis.  While registers in phi instructions is a bit of an oddity it can
happen when we temporarily go out-of-SSA for control-flow manipulations.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111075
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-16 23:28:03 +00:00
Jason Ekstrand
6394680f6b spirv: Add a warning for ArrayStride on arrays of blocks
It's disallowed according to the SPIR-V spec or at least I think that's
what the spec says.  It's in a section explicitly about explicit layout
of things in the StorageBuffer, Uniform, and PushConstant storage
classes so it's not 100% clear that it applies with other storage
classes.  However, it seems like it should apply in general and
violating it can trigger (fairly harmless) asserts in NIR.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-16 17:02:08 -05:00
Caio Marcelo de Oliveira Filho
f07f516c56 anv: Increase state allocation size limit to 2MB
When running on ICL the
dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 needs more than 1M for
the shader, so bump it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-16 14:17:52 -07:00
Yevhenii Kolesnikov
3853871ef8 meta: leaking of BO with DrawPixels
ctx->Unpack.BufferObj wasn't unreferenced.

Fixes: d492e7b017 (meta: Fix invalid PBO access from DrawPixels when
trying to just alloc.)
CC: Eric Anholt <eric@anholt.net>
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 20:06:56 +00:00
Eric Anholt
e8360a64e4 swrast: Move _mesa_format_pack_colormask() to the only caller.
This avoids needing format_pack to have access to the GLenum return
functions for mesa_format.  It seems like an odd function and unlikely
to be reused.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
4d23157a8b mesa: Give _mesa_format_get_color_encoding a clearer name.
It only returned one of two values.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
35e2d31ba4 mesa: Drop redundant checks for sRGB before sRGB to linear conversion.
_mesa_get_srgb_format_linear() just returns the original format if it
wasn't sRGB.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
ece03848c2 mesa: Fold _mesa_unpack_depth_stencil_row() into its only caller.
This was the last bit of gl.h usage in format packing.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
5956b46e16 mesa: Convert format_pack/unpack off of GL types.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
3e186af5e2 mesa: Port format_pack/unpack off of _mesa_problem().
unreachable() should be plenty of debug for these.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
c6a0a976a4 mesa: Mostly switch Mesa format info off of GL types other than GLenum.
I'm considering moving most of this code to src/util/, and I want that
code to not expose GL types in its interfaces.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
93a7651d8d mesa: Rename gl_pack typedefs to mesa_pack.
These are packing mesa formats, not a GL format/type.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
20ce56ad5b mesa: Rename gl_format_info to mesa_format_info.
It's about MESA_FORMATs, after all.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
f8c27c2775 state_tracker: Move the format test out to be an actual unit test.
We want errors in the table to show up as unit test failures in MRs.
Also keeps unit test code out of the built drivers.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
9eccae671e u_format: Remove pointless comments.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
628f55717b src/util: Switch _mesa_half_to_float() to u_half.h's version.
The two implementations differ across the entire input range only in
that u_half.h preserves mantissa bits for NaNs.  The u_half.h version
shaves 15% off of the text size of half_float.o.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Eric Anholt
bb5801ad98 u_half_test: Turn it into an actual unit test.
You could break the test and meson test wouldn't complain, since we
returned success either way.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-16 12:51:13 -07:00
Mauro Rossi
3630988b1d android: radv/gfx10: generate gfx10_format_table.h
This patch adds the missing building rules for Android,
to avoid following building errors:

In file included from external/mesa/src/amd/vulkan/radv_debug.c:35:
In file included from external/mesa/src/amd/vulkan/radv_debug.h:27:
external/mesa/src/amd/vulkan/radv_private.h:95:10:
fatal error: 'gfx10_format_table.h' file not found
         ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

In file included from external/mesa/src/amd/vulkan/radv_android.c:31:
external/mesa/src/amd/vulkan/radv_private.h:95:10:
fatal error: 'gfx10_format_table.h' file not found
         ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: 3dc5ec5d16 ("radv/gfx10: generate gfx10_format_table.h")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-16 21:31:24 +02:00
Rob Clark
856e84083e mesa/st: add sampler uniforms
Add sampler uniforms for the UV plane(s), so driver can count the
uniforms and get the correct sampler count.

Fixes lowered YUV on a6xx which actually wants to know # of samplers.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 18:14:44 +00:00
Rob Clark
a9f34b5631 egl/android: handle multi-fd native windows
We can hit multi-fd EGL_NATIVE_BUFFER_ANDROID case when the native
android buffer is YUV.  So we need to handle that.

Currently this went unnoticed because, even though we have two or
three fd's for YUV native android buffers, they all reference the
same backing buffer.  But we really shouldn't rely on that.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 18:14:44 +00:00
Jason Ekstrand
110669c85c st,i965: Stop looping on 64-bit lowering
Now that the 64-bit lowering passes do a complete lowering in one go, we
don't need to loop anymore.  We do, however, have to ensure that int64
lowering happens after double lowering because double lowering can
produce int64 ops.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
548da20b22 nir/lower_doubles: Handle fdiv and fsub directly
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
d7d35a9522 nir/lower_doubles: Use the new NIR lowering framework
One advantage of this is that we no longer need to run in a loop because
the new framework handles lowering instructions added by lowering.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
197a08dc69 nir/lower_doubles: Use "alu" for the nir_alu_instr
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
d65902c179 nir/lower_int64: Use the core NIR lowering framework
One advantage of this is that we no longer need to run in a loop because
the new framework handles lowering instructions added by lowering.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
c1cffa4249 nir/alu_to_scalar: Use the new NIR lowering framework
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
eb768b0a09 nir/alu_to_scalar: Use "alu" as the name for the nir_alu_instr
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
998d84fca5 nir/lower_system_values: Support lowering more intrinsics
Instead of only lowering system from variables, lower most to intrinsics
and let the lowering framework immediately lower the intrinsic.  This
will result in a bit more instruction churn but it means that NIR code
builders can just use intrinsics instead of everything having to go
through variables.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
ae8caaadee nir/lower_system_values: Drop the context-aware builder functions
Instead of having context-aware builder functions, just provide lowering
for the system value intrinsics and let nir_shader_lower_instructions
handle the recursion for us.  This makes everything a bit simpler and
means that the lowering can also be used if something comes in as a
system value intrinsic rather than a load_deref.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
58ffd7fbf6 nir/lower_system_values: Use the new generic NIR lowering helpers
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
ce3af830cb nir/lower_subgroups: Use the new generic NIR lowering helpers
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
758fdce9fe nir: Add some generic helpers for writing lowering passes
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
c74b98486a nir: Add a helper for fetching the SSA def from an instruction
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Tomeu Vizoso
75b53a159d pandecode: Add more addresses to trace
When debugging, we're given the fault_pointer unresolved, so it is
helpful to have more context in the decode.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-07-16 08:40:59 -07:00
Tomeu Vizoso
5a7688fdec panfrost: Use 64-bit descriptors globally
Midgard supports two modes of operation, 32-bit mode and 64-bit mode.
The GPU is natively 64-bit, but job descriptors can be submitted in
32-bit mode. Among other changes, 32-bit mode shortens pointer sizes to
use 32-bit pointers rather than the full 64-bit range.

The blob decides which mode to use based on the CPU bitness, so an armhf
system uses 32-bit descriptors and an aarch64 system uses 64-bit
descriptors. For a while, we mimicked this, bu inevitably this caused
the 32-bit support to lag behind as our reference platform is 64-bit.

To combat the code staleness, we traced an older GPU paired with a 64-bit
CPU (the Midgard T720 on-board the sunxi H64). From there, we could tell
which fields were really about hardware and which fields were simply
reflections of the descriptor bitness.

From there, we decided to remove support for 32-bit descriptors
entirely, using 64-bit descriptors unconditionally. There is minimal
performance penalty for this in practice, and it allows us to unify
these disparate code paths. This fixes:

   - T860 + armhf
   - T820 + armhf
   - T760 + aarch64

And will help bringup of 1st/2nd generation Midgard regardless of CPU.

[Work done by Tomeu. Commit message written by Alyssa.]

v2: Add comments preserving information about the old behaviour for
future reference. Fix a compiler warning. (Alyssa)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-16 08:40:59 -07:00
Jason Ekstrand
6a441151c2 anv: Account for dynamic stencil write disables in the PMA fix
In 6ce8592836 we started looking at the dynamic stencil state and
disabling stencil writes when the stencil mask is zero.  Unfortunately,
we never updated the PMA fix code accordingly so 3DSTATE_WM_DEPTH_STENCIL
and the PMA fix were getting out-of-sync causing hangs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109203
Fixes: 6ce8592836 "anv: Disable stencil writes when both write..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-16 15:12:45 +00:00
Alyssa Rosenzweig
5ad00fb3ed panfrost: Implement opportunistic AFBC
Rather than hardcoding a BO layout at creation-time, we implement the
ability to hint layouts at various points in a BO's lifetime,
potentially reallocating and switching layouts if it's heuristically
deemed useful to do so.

In this patch, we add a simple hinting implementation, opportunistically
compressing FBOs.

Support is hidden behind PAN_MESA_DEBUG=afbc as the implementation is
incomplete (software access to AFBC is unimplemented at the moment) and
therefore would regress significantly.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-16 07:21:08 -07:00
Alyssa Rosenzweig
d60994989e panfrost/mfbd: Zero out framebuffer_stride
We don't know what this is, so let's not pretend we do.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-16 07:19:29 -07:00
Alyssa Rosenzweig
e65e3cf596 panfrost: AFBC buffers must be cache-line aligned
Fixes a DATA_INVALID_FAULT when AFBC is paried with mipmapping.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-16 07:19:28 -07:00
Alyssa Rosenzweig
f7621a8c5f panfrost: Add Z/S and MRT BOs to the job
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-16 07:19:28 -07:00
Alyssa Rosenzweig
aaae6180bf panfrost: Set usage2 during draw, not CSO
It can change from a layout switch.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-16 07:19:28 -07:00
Sergii Romantsov
7417b43211 meta: memory leak of CopyPixels usage
Meta of CopyPixel generates a buffer object
but does not free it on cleanup.

Fixes: 37d11b13ce (meta: Don't pollute the buffer object namespace in _mesa_meta_setup_vertex_objects)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-16 13:48:47 +03:00
Samuel Pitoiset
afa102d65b radv: add radv_emit_streamout_{begin,end} helpers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-16 11:17:00 +02:00
Samuel Pitoiset
17464d205c radv: pass output values to radv_emit_stream_output()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-16 11:16:58 +02:00
Samuel Pitoiset
4dcdc4cdc5 radv: allow to select DST_SEL with RELEASE_MEM
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-16 11:16:57 +02:00
Samuel Pitoiset
3c6d6bd71f radv: allow to emit PS_DONE/CS_DONE with RELEASE_MEM
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-16 11:16:55 +02:00
Samuel Pitoiset
219dc1b25c radv: restore an assertion in handle_vs_outputs()
The NGG GS epilogue no longers call that function so the assertion
is just useless now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-16 11:16:53 +02:00
Samuel Pitoiset
68603b767f radv/gfx10: emit ES outputs of TES when it's not NGG
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-16 11:16:51 +02:00
Samuel Pitoiset
b0f7a6e981 radv: update LATE_ALLOC_VS.LIMIT
Mirror RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-16 10:10:22 +02:00
Samuel Pitoiset
27d91062a8 radv/gfx10: support pixel shaders without exports
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-16 10:10:21 +02:00
Samuel Pitoiset
1b2bfeaaaa radv: fix gathering clip/cull distance masks for GS
For NGG, the driver relies on the VS outinfo struct.

This fixes
dEQP-VK.clipping.user_defined.clip_*_vert_tess_geom_*

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-16 10:09:37 +02:00
Samuel Pitoiset
361d549f87 Revert "radv/gfx10: don't set array pitch field on images"
It introduces too many regressions.

This reverts commit 6d50dcd80f.
2019-07-16 09:37:56 +02:00
Iago Toral Quiroga
556c299430 v3d: flag dirty state when binding new sampler states
We emit code to saturate texture coordinates when using clamp wrapping
mode so if we don't flag the dirty state here we don't get to recompile
the shaders when the wrapping mode changes.

v2:
  - Do the same when setting sampler views (Eric)
  - Use a switch statement instead of an if ladder.
  - Swap the shader stage assertion with an unreachable.

Fixes:
spec/!opengl 1.1/texwrap 1d bordercolor/gl_rgba8, border color only
spec/!opengl 1.1/texwrap 1d proj bordercolor/gl_rgba8, projected, border color only
spec/!opengl 1.1/texwrap 2d bordercolor/gl_rgba8, border color only
spec/!opengl 1.1/texwrap 2d proj bordercolor/gl_rgba8, projected, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha12, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha16, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha4, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha8, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_intensity8, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance4_alpha4, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance6_alpha2, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance8, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance8_alpha8, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_r3_g3_b2, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb10, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb10_a2, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb4, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb5, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb5_a1, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb8, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgba4, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgba8, swizzled, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha12, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha16, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha4, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha8, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_intensity8, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance4_alpha4, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance6_alpha2, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance8, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance8_alpha8, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_r3_g3_b2, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb10, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb10_a2, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb4, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb5, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb5_a1, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb8, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_rgba4, border color only
spec/!opengl 1.1/texwrap formats bordercolor/gl_rgba8, border color only
spec/!opengl 1.2/texwrap 3d bordercolor/gl_rgba8, border color only
spec/!opengl 1.2/texwrap 3d proj bordercolor/gl_rgba8, projected, border color only
spec/arb_es2_compatibility/texwrap formats bordercolor-swizzled/gl_rgb565, swizzled, border color only
spec/arb_es2_compatibility/texwrap formats bordercolor/gl_rgb565, border color only
spec/arb_texture_compression/texwrap formats bordercolor-swizzled/gl_compressed_alpha, swizzled, border color only
spec/arb_texture_compression/texwrap formats bordercolor-swizzled/gl_compressed_luminance_alpha, swizzled, border color only
spec/arb_texture_compression/texwrap formats bordercolor-swizzled/gl_compressed_rgb, swizzled, border color only
spec/arb_texture_compression/texwrap formats bordercolor/gl_compressed_alpha, border color only
spec/arb_texture_compression/texwrap formats bordercolor/gl_compressed_luminance_alpha, border color only
spec/arb_texture_compression/texwrap formats bordercolor/gl_compressed_rgb, border color only
spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_alpha16f_arb, swizzled, border color only
spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_intensity16f_arb, swizzled, border color only
spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_luminance16f_arb, swizzled, border color only
spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_luminance_alpha16f_arb, swizzled, border color only
spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_rgb16f, swizzled, border color only
spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_rgba16f, swizzled, border color only
spec/arb_texture_float/texwrap formats bordercolor/gl_alpha16f_arb, border color only
spec/arb_texture_float/texwrap formats bordercolor/gl_intensity16f_arb, border color only
spec/arb_texture_float/texwrap formats bordercolor/gl_luminance16f_arb, border color only
spec/arb_texture_float/texwrap formats bordercolor/gl_luminance_alpha16f_arb, border color only
spec/arb_texture_float/texwrap formats bordercolor/gl_rgb16f, border color only
spec/arb_texture_float/texwrap formats bordercolor/gl_rgba16f, border color only
spec/arb_texture_rectangle/texwrap rect bordercolor/gl_rgba8, border color only
spec/arb_texture_rectangle/texwrap rect proj bordercolor/gl_rgba8, projected, border color only
spec/arb_texture_rg/texwrap formats bordercolor-swizzled/gl_r8, swizzled, border color only
spec/arb_texture_rg/texwrap formats bordercolor-swizzled/gl_rg8, swizzled, border color only
spec/arb_texture_rg/texwrap formats bordercolor/gl_r8, border color only
spec/arb_texture_rg/texwrap formats bordercolor/gl_rg8, border color only
spec/arb_texture_rg/texwrap formats-float bordercolor-swizzled/gl_r16f, swizzled, border color only
spec/arb_texture_rg/texwrap formats-float bordercolor-swizzled/gl_rg16f, swizzled, border color only
spec/arb_texture_rg/texwrap formats-float bordercolor/gl_r16f, border color only
spec/arb_texture_rg/texwrap formats-float bordercolor/gl_rg16f, border color only
spec/ext_packed_float/texwrap formats bordercolor-swizzled/gl_r11f_g11f_b10f, swizzled, border color only
spec/ext_packed_float/texwrap formats bordercolor/gl_r11f_g11f_b10f, border color only
spec/ext_texture_shared_exponent/texwrap formats bordercolor-swizzled/gl_rgb9_e5, swizzled, border color only
spec/ext_texture_shared_exponent/texwrap formats bordercolor/gl_rgb9_e5, border color only
spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_alpha8_snorm, swizzled, border color only
spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_intensity8_snorm, swizzled, border color only
spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_luminance8_alpha8_snorm, swizzled, border color only
spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_luminance8_snorm, swizzled, border color only
spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_r8_snorm, swizzled, border color only
spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_rg8_snorm, swizzled, border color only
spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_rgb8_snorm, swizzled, border color only
spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_rgba8_snorm, swizzled, border color only
spec/ext_texture_snorm/texwrap formats bordercolor/gl_alpha8_snorm, border color only
spec/ext_texture_snorm/texwrap formats bordercolor/gl_intensity8_snorm, border color only
spec/ext_texture_snorm/texwrap formats bordercolor/gl_luminance8_alpha8_snorm, border color only
spec/ext_texture_snorm/texwrap formats bordercolor/gl_luminance8_snorm, border color only
spec/ext_texture_snorm/texwrap formats bordercolor/gl_r8_snorm, border color only
spec/ext_texture_snorm/texwrap formats bordercolor/gl_rg8_snorm, border color only
spec/ext_texture_snorm/texwrap formats bordercolor/gl_rgb8_snorm, border color only
spec/ext_texture_snorm/texwrap formats bordercolor/gl_rgba8_snorm, border color only
spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_sluminance8, swizzled, border color only
spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_sluminance8_alpha8, swizzled, border color only
spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_srgb8, swizzled, border color only
spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_srgb8_alpha8, swizzled, border color only
spec/ext_texture_srgb/texwrap formats bordercolor/gl_sluminance8, border color only
spec/ext_texture_srgb/texwrap formats bordercolor/gl_sluminance8_alpha8, border color only
spec/ext_texture_srgb/texwrap formats bordercolor/gl_srgb8, border color only
spec/ext_texture_srgb/texwrap formats bordercolor/gl_srgb8_alpha8, border color only

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 08:13:28 +02:00
Samuel Pitoiset
994253b400 radv/gfx10: add missing conversions for 16-bit exports
This fixes
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_*

Found with RADV_DEBUG=checkir

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-16 08:12:34 +02:00
Samuel Pitoiset
d8844533af radv: remove unused code in radv_export_param()
It was hack for geometry shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-16 08:12:20 +02:00
Dave Airlie
6d50dcd80f radv/gfx10: don't set array pitch field on images
Setting this seems to be broken, amdvlk only sets it for quilted
textures which I'm not sure what those are.

Fixes dEQP-VK.glsl.texture_functions.query.texturesize*3d*

Fixes: bf11f1c3a4 ("radv/gfx10: add gfx10_make_texture_descriptor")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-16 10:41:27 +10:00
Vinson Lee
d1a55d9559 lima/ppir: Fix assert condition in ppir_codegen_encode_branch.
Fixes: af0de6b91c ("lima/ppir: implement discard and discard_if")
Reported-by: Coverity Scan
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-07-15 23:48:34 +00:00
Eric Anholt
82dc168f51 docs: Tell people how to easily generate the Fixes lines.
v2: Include '-s' to suppress the diff.
v3: use the git config command (Ken), use &lt; (Eric)

Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-15 16:29:31 -07:00
Caio Marcelo de Oliveira Filho
1210e8caaf spirv: Ignore ArrayStride for storage classes that should not use it
The stride was already overriden when using
lower_workgroup_access_to_offsets, so elaborate a bit the commentary
there.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-15 16:18:57 -07:00
Caio Marcelo de Oliveira Filho
026cfa1099 spirv: Fix stride calculation when lowering Workgroup to offsets
Use alignment to calculate the stride associated with the pointer
types.  That stride is used when the pointers are casted to arrays.

Note that size alone is not sufficient, e.g. struct { vec2 a; vec1 b;
} will have element an element size of 12 bytes, but the stride needs
to be 16 bytes to respect the 8 byte alignment.

Fixes: 050eb6389a "spirv: Ignore ArrayStride in OpPtrAccessChain for Workgroup"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-15 16:18:46 -07:00
Alyssa Rosenzweig
329799257b panfrost/ci: Blacklist flush finish tests
We don't implement batch splitting quite yet which is necessary for the
ludicrous number of draw calls these tests invoke. Blacklist them for
now.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:16:19 -07:00
Alyssa Rosenzweig
c1125d0935 panfrost: Don't leak oversized transient allocations
When we allocate them, we allocate with two references accidentally,
causing them to leak uncontrollably.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig
48f51e9dbb panfrost: Implement panfrost_bo_cache_evict_all
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig
f02278ae87 panfrost: Implement panfrost_bo_cache_get
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig
525e5dc4ed panfrost: Implement panfrost_bo_cache_put
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig
9034b5586c panfrost: Add pan_bucket helper
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig
eb398683d7 panfrost: Implement pan_bucket_index helper
We'll use this whenever we need to lookup a bucket.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig
270733fe6a panfrost: Add BO cache data structure
Linked list of panfrost_bo* nested inside an array of buckets.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig
f3464f7987 panfrost: Describe BO cache architecture
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig
f3b7e1ddc7 panfrost: Stub out panfrost_bo_cache_evict
This destructor will be used to legitimately free the BOs, now that a BO
free with cacheable=0 is only a "fake" free.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig
74ad5f89f8 panfrost: Stub out panfrost_bo_cache_put
..so we can intercept the BO free.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig
b5a28f61ae panfrost: Stub out panfrost_bo_cache_get
We will use this function to fetch cached BOs instead of freshly
allocating them.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:55 -07:00
Alyssa Rosenzweig
fea953e6c2 panfrost: Don't leak the blend CSO hash table
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:55 -07:00
Alyssa Rosenzweig
07a1f3d120 panfrost: Cleanup after scoreboarding
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:55 -07:00
Alyssa Rosenzweig
fae790ecfc panfrost: Allocate UBOs on the stack, not the heap
Saves a call to calloc (the maximum size is small and known at
compile-time) and fixes a leak.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 16:12:55 -07:00
Jason Ekstrand
0ba508d7a3 nir,intel: Add support for lowering 64-bit nir_opt_extract_*
We need this when doing full software 64-bit emulation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309
Fixes: cbad201c2b "nir/algebraic: Add missing 64-bit extract_[iu]8..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-15 16:08:37 -05:00
Jason Ekstrand
7a19e05e8c nir/opt_if: Clean up single-src phis in opt_if_loop_terminator
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111071
Fixes: 2a74296f24 "nir: add opt_if_loop_terminator()"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-15 19:58:51 +00:00
Pierre-Eric Pelloux-Prayer
ed98f8a63a radeonsi: verify buffer_offset value before using it
This buffer_ofset can come directly from the application (e.g: when using
glVertexAttribPointer) and can contain an invalid value.

st_atom_array already makes sure that if it's not negative so all that's left
is to verify that it's smaller that the buffer size.

Bugs related to this issue:

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105251#c52
Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109693
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-07-15 15:22:28 -04:00
Pierre-Eric Pelloux-Prayer
a9655f36fe st/mesa: verify that vertex buffer offset isn't negative
For drivers supporting PIPE_CAP_SIGNED_VERTEX_BUFFER_OFFSET the buffer_offset value
will be interpreted as an signed int.

An example of application code causing a negative offset:

            float b[] = { ... }; // 3 float for pos, 3 for color
            glBufferData(GL_ARRAY_BUFFER, ..., b, ...);
            glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float), 0);
            glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float), &b[3]);
                                                                                ^
                                                                    should be 3 * sizeof(float)

The offset is a ptr so when interpreted as a signed int it can be negative.

This commit adds a verification that (int) buffer_offset is not negative - this would
indicate an application bug. Since it's too late to emit a GL_INVALID_VALUE error,
we replace the negative offset by 0 and emit a debug message.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-07-15 15:22:25 -04:00
Marek Olšák
ce04fbf67c st/mesa: don't invalidate a buffer range that is mapped
This is needed to fix an issue with OpenGL when a buffer is mapped and
BufferSubData is called. In this case, we can't invalidate the buffer range.
2019-07-15 14:58:23 -04:00
Marek Olšák
fc4302d1df gallium: use MAP_DIRECTLY to mean supression of DISCARD in buffer_subdata
This is needed to fix an issue with OpenGL when a buffer is mapped and
BufferSubData is called. In this case, we can't invalidate the buffer range.
2019-07-15 14:58:23 -04:00
Kenneth Graunke
5e76c99923 iris: Better handle decoder base addresses
It can be useful to call the decoder on a single batch.  But, that batch
may not contain STATE_BASE_ADDRESS, at which point the decoder will have
no idea how to find any buffers.  We can initialize the two static bases
at the beginning of time, so it has them even if it never sees SBA.

Surface base address changes dynamically, possibly in the middle of a
batch.  So we update it at the start of each batch, making it always
start at the value we inherited from the previous one.  SBA commands
inside the batch can update it to a proper value.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-07-15 11:49:19 -07:00
Samuel Pitoiset
ed12be1b8f radv/gfx10: enable OC_LDS_EN for NGG GS if the ES stage is TES
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 20:05:21 +02:00
Bas Nieuwenhuizen
d4f0f1a6e2 anv: Add android dependencies on android.
Specifically needed for nativewindow for some VK_EXT_external_memory_android_hardware_buffers
functions, where we call into some AHardwareBuffer functions.

The legacy Android ext did not have us call into any Android function
at all and hence it was not noticed.

Fixes: 755c633b8d "anv: Fix vulkan build in meson."
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2019-07-15 15:23:43 +00:00
Alyssa Rosenzweig
0b83005807 panfrost: Advertise more depth/stencil formats
Fixes a regression in glmark's shadow/refract scenes.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 08:03:35 -07:00
Alyssa Rosenzweig
1aaf68d120 panfrost/mfbd: Add Z32 rendering support
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 08:03:35 -07:00
Alyssa Rosenzweig
f8e2219b08 panfrost: Fix blend_cso if nr_cbufs == 0
Fixes: 46396af1ec ("panfrost: Refactor blend infrastructure")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 08:03:35 -07:00
Alyssa Rosenzweig
318d641cd9 panfrost: Cleanup shader upload code
The old algorithm is still used (and the same issue -- namely, leaking
all shaders -- applies) but we're way more concise about it since we're
*only* using the routine for shaders nowadays; everything else is a
BO-proper or transient.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig
1ffca961ab panfrost: Remove all old allocators
With the new refactor, this all becomes dead code.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig
9981b6ef0f panfrost: Use transient memory for occlusion queries
These only last a frame anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig
594b47d917 panfrost: Remove bizarre hack
I don't think this is still necessary, and if it is, we'll have to
figure out how to fix it the right way.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig
d375d127a9 panfrost: Upload vertex descriptors to *transient* memory
It's not legal to reuse the vertex shader descriptor across frames now
that we patch it at draw-time, so upload to transient memory.

Ideally, we could be smarter about this such that subsequent draws with
the same vertex shader and same patched state would reuse the
descriptor, but for now, let's simply achieve correctness.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig
c6b59db5b4 panfrost: Delay resource mmaps
We use the new PAN_ALLOCATE_DELAY_MMAP flag to only map resources
on-demand, which should avoid mapping FBOs.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig
bd4986bafa panfrost: Cleanup PAN_ALLOCATE_*
While we're at it, prompted by a semantics issue around INVISIBLE, also
add a separate DELAY_MMAP flag.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig
2f783ede02 panfrost/drm: Don't mmap INVISIBLE buffers
On the new kernel, mmaping doesn't *hurt* per se, but it's still
wasteful for buffers explicitly marked as not needing an mmap.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-15 08:03:34 -07:00
Lionel Landwerlin
c9c8c2f7d7 anv: fix crash in vkCmdClearAttachments with unused attachment
anv_render_pass_compile() turns an unused attachment into a NULL
depth_stencil_attachment pointer so check that pointer before
accessing it.

Found with updates to existing CTS tests.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 208be8eafa ("anv: Make subpass::depth_stencil_attachment a pointer")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
2019-07-15 16:47:41 +03:00
Samuel Pitoiset
b650f3d197 radv/gfx10: export the PrimitiveID for ES stages (VS or TES)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 11:30:10 +02:00
Samuel Pitoiset
8175f6269b radv/gfx10: declare an external symbol for the ESGS ring
It will be used for stream output but for now only declares it
if VS and if the PrimitiveID needs to be exported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 11:30:08 +02:00
Samuel Pitoiset
f0a90eddb6 radv/gfx10: allocate ESGS ring space for exporting PrimitiveID
Only VS needs that. We shouldn't hardcode these values but
that's complicated to not do that for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 11:30:05 +02:00
Samuel Pitoiset
4478f14327 radv/gfx10: fix crash when emitting NGG GS prologue
ac_nir_context is initialized after the driver emits the NGG GS
prologue so it's likely to crash.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 08:51:53 +02:00
Vasily Khoruzhick
eb862c2365 lima/ppir: Fix branch codegen
"unknown_2" field is actually a size of instruction that branch
points to. If it's set to a smaller size than actual instruction
branch behavior is not defined (and it usually wedges the GPU).

Fix it by setting this field correctly.

Fixes: af0de6b91c ("lima/ppir: implement discard and discard_if")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-07-14 19:49:14 -07:00
Vasily Khoruzhick
8f0160ca24 lima/ppir: Fix assert condition in ppir_codegen_encode_discard
Fixes: af0de6b91c ("lima/ppir: implement discard and discard_if")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-07-14 19:48:55 -07:00
Jonathan Marek
4e102a6de7 etnaviv: fix incorrect varying interpolation
This corresponds to what the GC3000 blob does. The USED / UNUSED enums are
wrong, at least for GC2000/GC3000.

Without this the 3rd texture component is not interpolated correctly (flat?)
in the following test (and others):

dEQP-GLES2.functional.texture.mipmap.cube.generate.rgba8888_nicest

Strangely, when the texture is sampled from OpenGL it works correctly,
the problem only shows up for sampling by gallium/blitter. This fixes other
cube map tests which use util_blitter_blit.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-14 10:34:17 -04:00
Jonathan Marek
a9e78a44d1 etnaviv: reduce rs alignment requirement for two pixel pipes GPU
The rs alignment doesn't have to be multiplied by # of pixel pipes.

This works on GC2000 which doesn't have the SINGLE_BUFFER feature.

This fixes some cubemaps (NPOT / small mipmap levels) because aligning by 8
breaks the expected alignment of 4 for tiled format. We don't want to mess
with the alignment of tiled formats.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-14 10:34:17 -04:00
Jonathan Marek
2c393053bf etnaviv: fix nearest_linear / linear_nearest filtering on GC3000
The MIN filter is never used when not using mipmaps. This fixes that.

Interestingly, only GC3000 needs this (GC2000 works without this fix).

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-14 10:34:17 -04:00
Jonathan Marek
63efb6ec6c etnaviv: fix nearest filtering
ROUND_UV rounding breaks nearest filtering.

Enable it only when nearest filtering isn't used.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-14 10:34:17 -04:00
Bas Nieuwenhuizen
1f58b6ffef radv/gfx10: Fix DCC clears.
Looks like if the reg clear bit is set, the hwardware does not use the 0/1
clears for textures.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-14 07:30:04 +00:00
Vinson Lee
730ceeddb5 meson: Add dep_thread dependency.
Fix this build error on Ubuntu 18.04.

/usr/bin/ld: src/util/libmesa_util.a(u_cpu_detect.c.o): undefined reference to symbol 'pthread_once@@GLIBC_2.2.5'

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110663
Suggested-by: Eric Engestrom <eric@@engestrom.ch>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
2019-07-13 11:39:26 -07:00
Eric Anholt
11aa32a447 gitlab-ci: Build i386 and ARM drivers in surfaceless mode.
I don't particularly care about getting x86/ARM cross-build coverage
of all the window systems, but we do want to be building src/mesa/
(for x86 asm) and gallium drivers (for vc4 NEON asm).  I'm also hoping
to use these build products for testing freedreno on actual HW (which
we do using surfaceless).

This increases the docker image from 1.4G to 1.5G.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
2019-07-13 13:46:24 +00:00
Andreas Baierl
ce81c9a2e1 lima: Fix compiler warnings for unused functions.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-07-13 13:15:05 +00:00
Caio Marcelo de Oliveira Filho
09c4037dda anv: Fix pool allocator when first alloc needs to grow
When using softpin, the first allocation was not calculating the
padding and offset correctly for the case the first allocation needed
to grow.  We were missing initialize the state.end right after
expanding the pool for the first time.

This is not a problem for non-softpin since there we don't use
leftover padding so the ends would re-arrange incrementally.

This fixes running dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 in
SKL -- the test uses a shader larger than the initial size for the
instruction pool.

Fixes: dfc9ab2ccd "anv/allocator: Add padding information."
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-12 22:25:37 -07:00
Kenneth Graunke
aa13921079 mesa: Port errors.c to util/list.h instead of simple_list.
There is widespread consensus that simple_list should go away.
This patch converts one more use to the modern kernel-style list.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-12 21:58:40 -07:00
Jason Ekstrand
974fabe810 intel: Run the optimization loop before and after lowering int64
For bindless SSBO access, we have to do 64-bit address calculations.  On
ICL and above, we don't have 64-bit integer support so we have to lower
the address calculations to 32-bit arithmetic.  If we don't run the
optimization loop before lowering, we won't fold any of the address
chain calculations before lowering 64-bit arithmetic and they aren't
really foldable afterwards.  This cuts the size of the generated code in
the compute shader in dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 by
around 30%.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-13 02:59:28 +00:00
Alyssa Rosenzweig
7103baf01f panfrost/decode: Drop _replay prefix
We don't even support replay anymore; this is just wasting characters
and adding clutter.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 16:23:53 -07:00
Alyssa Rosenzweig
0d5abfdec5 panfrost/decode: Drop _name suffixes
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 16:23:53 -07:00
Alyssa Rosenzweig
0c1874adad panfrost/decode: Add MEMORY_PROP_DIR variant
This allows dumping memory properties directly without dereferencing an
address, allowing us to fix more -Waddress-of-packed-member warnings.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 16:23:52 -07:00
Alyssa Rosenzweig
9ffe061c5e panfrost/decode: Copy embedded structs before using
Fixes some, but not all, warnings from -Waddress-of-packed-member

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 16:23:52 -07:00
Alyssa Rosenzweig
23b230d72f panfrost/decode: Remove pandecode_decode_fbd_type
It is unused.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 16:23:52 -07:00
Alyssa Rosenzweig
9eea8423a0 panfrost/midgard: Use generic outmod type
It could be midgard_outmod_float or midgard_outmod_int; don't assume
it's one or the other. Fixes -Wenum-conversion warnings.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 16:23:52 -07:00
Alyssa Rosenzweig
e173d6b1b1 panfrost: Precompute scoreboard dependents
Mali job dependency graphs, at least for GLES3.0, have the special
property that a given node will only have at most a single dependent.
This allows us to efficiently precompute the dependent array and
replace an inner loop's O(N) search with an O(1) lookup, bringing the
algorithmic complexity of scoreboarding from O(N^2) to O(N).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 16:22:15 -07:00
Alyssa Rosenzweig
b68778e6de panfrost: Remove transient pool abstraction
Now that it has been totally replaced by the borrow mechanism, it is now
unused code.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig
ee32700f37 panfrost: Subdivide fixed-size transient slabs
The whole purpose of the transient memory model is to make subdivision
stupidly easy, so let's handle that.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig
37097b2f38 panfrost: Recycle fixed-size transient BOs
The usual case. We use the bitset to mark freedom and seize it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig
0f5ad9efcc panfrost: Bookkeep transient indices
The batch now temporarily possesses the transient buffer, so it'll need
to remember that to free it later.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig
00c9a1cb75 panfrost: Rewrite allocate_transient with new abstraction
We use a fixed size slab if we can, otherwise we create a dedicated
("oversized") BO and add that to the job. In the latter case we'll get
reference counting for free so we can forget about this corner case for
the rest of the series.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig
ba02cf0e75 panfrost: Add pan_bo_for_screen helper
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig
330cd057ad panfrost: Add panfrost_transient_bo array
We would like transient allocations to occur on the screen (borrowed by
the batch) rather than on the context. Add fields to track this.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 15:31:47 -07:00
Alyssa Rosenzweig
718ebfa225 panfrost: Don't upload vertex/tiler twice
The latter upload is correct, but the former upload is unassociated with
any particular FBO and therefore becomes orphaned. We do have to upload
at draw-time at the latest, if we haven't by then.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 15:31:47 -07:00
Alyssa Rosenzweig
085004cc2c panfrost/drm: Check allocation size is positive
Zero-sized allocations will fail with an unhelpful errno from the
kernel; check size explicitly in userspace before it gets that far.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 15:31:47 -07:00
Neil Roberts
8419621176 mesa/glspirv: Validate that compute shaders are not linked with other stages
The test is based on link_shaders().

For example, it allows the following test (when run on SPIR-V mode) to
pass:
   spec/arb_compute_shader/linker/mix_compute_and_non_compute.shader_test

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:42 +02:00
Neil Roberts
022e9ddd1a mesa/glspirv: Validate that there is a VS when there is a TCS, TES or GS
The shader combination tests are copied from link_shaders().

For example, it allows the following tests (when run on SPIR-V mode) to
pass:
   spec/arb_tessellation_shader/linker/no-vs
   spec/arb_tessellation_shader/linker/tcs-no-vs
   spec/arb_tessellation_shader/linker/tes-no-vs
   spec/glsl-1.50/linker/gs-without-vs

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Alejandro Piñeiro
e4210b93e4 i965: don't use disk cache with SPIR-V shaders
Right now we don't support disk cache for SPIR-V shaders (from
ARB_gl_spirv), so let's avoid writing the program data to or reading
it from the disk if any in-use shaders use SPIR-V.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Alejandro Piñeiro
bb3bbdfbbd glsl/shader_cache: handle SPIR-V shaders
Right now we don't have cache support for SPIR-V shaders (from
ARB_gl_spirv). Right now they are properly skipped because they fall
on the ff shader code path (no key, no name), but it would be better
to update current comments, and add some guards.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Arcady Goldmints-Orlov
637b168470 nir/linker: Initialize UniformDataDefaults when using SPIR-V
Allocate UniformDataDefaults and fill in the data defaults when
linking a SPIR-V program. Among other things, this allows program
serialization to work.

It allows the following piglit test (when run on SPIR-V mode) to pass:
  spec/arb_get_program_binary/execution/uniform-after-restore.shader_test

v2: use memcpy to initialize UniformDataDefaults

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Arcady Goldmints-Orlov
761b0fe95f glsl/serialize: Update write_program_resource_data() to handle NULL input and output variable names
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Arcady Goldmints-Orlov
c3122d2431 glsl/serialize: Handle NULL uniform name in write_uniforms()
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
0baa553fab mesa/main: Fix UBO/SSBO ACTIVE_VARIABLES query (ARB_gl_spirv)
When querying MAX_NUM_ACTIVE_VARIABLES, NUM_ACTIVE_VARIABLES and
ACTIVE_VARIABLES over SSBO and UBO interfaces, we filter the variables
which are active using the variable's name and looking for it in the
program resource list. If it is in the program resource list, the
variable will be considered active.

However due to ARB_gl_spirv where name reflection information is not
mandatory, we can use the UBO/SSBO binding and variable offset to
filter which variables which are active.

v2: use RESOURCE_UBO/UNI macros instead of direct castings, update
    comment (Alejandro)

v3: Change signature of _mesa_program_resource_find_active_variable
    to simplify calling it. Also, squash the fix for find_binding_offset
    for arrays of blocks (Arcady)

Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
161de77e0f mesa/shader_query: Fix LOCATION_INDEX query (ARB_gl_spirv)
When querying GL_LOCATION_INDEX using glGetProgramResourceiv
we already know the index of the resource, we do not need to find
it using the name, which is convenient for shaders coming from
SPIR-V binaries where names are optional.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
8818553f18 mesa/shaderapi: Fix TRANSFORM_FEEDBACK_VARYING program query
Fixes the program queries API (glGetProgramiv):
TRANSFORM_FEEDBACK_VARYINGS and TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH
in two cases:

  1. ARB_enhaced_layouts:

The queries were not working for GLSL shaders which specify the
varyings using enhanced layouts. We were returning the info as if the
varyings could only be specified using the API.

  2. ARB_gl_spirv:

TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH should return 1 if there is no
name reflection information available.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
8792abff9d mesa/uniforms: Fix GetUniformLocation (ARB_gl_spirv)
From the ARB_gl_spirv specification, glGetUniformLocation should
return -1 when no name reflection is available.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
96d6156678 mesa/shader_query: Fix NAME_LENGTH queries (ARB_gl_spirv)
For shaders constructed from SPIR-V binaries, it is possible that
no name reflection information is available. In that case,

 - glGetProgramInterfaceiv(.., pname=MAX_NAME_LENGTH, ..)
 - gletProgramResourceiv(.., props=NAME_LENGTH, ..)

should return 1.

Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Alejandro Piñeiro
3ebd60b491 mesa: Fix ACTIVE_*_MAX_LENGTH program queries (ARB_gl_spirv)
Since ARB_gl_spirv it is possible to miss a lot of name reflection
information, so it is needed to add NULL name checks for several
queries, and return a specific value on those cases. This commit add
them for ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH,
ACTIVE_ATTRIBUTE_MAX_LENGTH and ACTIVE_UNIFORM_MAX_LENGTH.

From ARB_gl_spirv spec:

   "If pname is ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH, the length of
    the longest active uniform block name, including the null
    terminator, is returned. If no active uniform blocks exist, zero
    is returned. If no name reflection information is available, one
    is returned.

    If pname is ACTIVE_ATTRIBUTE_MAX_LENGTH, the length of the longest
    active attribute name, including a null terminator, is returned.
    If no active attributes exist, zero is returned. If no name
    reflection information is available, one is returned.

    If pname is ACTIVE_UNIFORM_MAX_LENGTH, the length of the longest
    active uniform name, including a null terminator, is returned. If
    no active uniforms exist, zero is returned. If no name reflection
    information is available, one is returned."

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
cafc1a40d4 nir/types: Add glsl_type_is_unsized_array helper
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
bfc5e46746 nir/linker: Fill TOP_LEVEL_ARRAY_SIZE and STRIDE
From the ARB_program_interface_query specification:

    "For the property TOP_LEVEL_ARRAY_SIZE, a single integer
    identifying the number of active array elements of the top-level
    shader storage block member containing to the active variable is
    written to <params>.  If the top-level block member is not
    declared as an array, the value one is written to <params>.  If
    the top-level block member is an array with no declared size, the
    value zero is written to <params>."

    "For the property TOP_LEVEL_ARRAY_STRIDE, a single integer
    identifying the stride between array elements of the top-level
    shader storage block member containing the active variable is
    written to <params>.  For top-level block members declared as
    arrays, the value written is the difference, in basic machine
    units, between the offsets of the active variable for consecutive
    elements in the top-level array.  For top-level block members not
    declared as an array, zero is written to <params>."

v2: move top_level_array_size and stride into nir_link_uniforms_state
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
ae2ea5ec1f nir/linker: Compute the offset for non-trivial uniform types.
ARB_gl_spirv points that the offset must be explicit, however this is
true for 'root' types. For complex types, like struct members or
arrays of arraya, it needs to be computed.

We are not using the offset stored in the gl_buffer_variables during
the uniform blocks linking because currently we do not have a way to
relate a gl_buffer_variable with its corresponding gl_uniform_storage.
The GLSL path uses the name for that, but we can not rely on that
because names are optional in SPIR-V.

Notice that uniforms non-backed by a buffer object will have an offset
equal to -1, like in the GLSL path.

v2: add offset and var_is_in_block as per-variable state in
    nir_link_uniforms_state (Arcady)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
e15c663d8e nir/linker: Add atomic counters to the program resource list
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
e1464a1cf8 nir/linker: Add XFB resources to the program resource list
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
53087a89ac nir/linker: Add BUFFER_VARIABLEs to the prog resource list
v2: use link_util_should_add_buffer_variable() (Arcady)
Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
ffdb44d3a0 nir/linker: Add inputs/outputs to the program resource list
v2: added TODO comment hinting possible future refactoring of
    nir_build_program_resource_list and build_program_resource_list,
    to avoid code duplication (Alejandro, to explicitly reflect a
    valid concern from Timothy during the review).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-12 23:42:41 +02:00
Alejandro Piñeiro
691cee751a nir/linker: add ubo/ssbo to the program resource list
v2: "nir/linker: Use the stageref when adding UBO/SSBO resources"
     squashed on this one (Timothy)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-12 23:42:41 +02:00
Antia Puentes
a638971929 nir/linker: Fill the uniform's BLOCK_INDEX
Binding comparison is used to determine the block the uniform is part
of. Note that to do the binding comparison we need the information in
UniformBlocks[] and ShaderStorageBlocks[] to be available, so we have
to call gl_nir_link_uniform_blocks() before linking the uniforms.

v2: add missing break (Timothy)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-12 23:42:41 +02:00
Samuel Pitoiset
f239e22813 radv/gfx10: enable 1D textures
Mirror RadeonSI. This also fixes crashes in addrlib.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 18:25:45 +02:00
Andres Gomez
f4d2be03b1 intel/compiler: remove abandoned comments
c8665005: ("intel/compiler: Don't always require precise lowering of flrp")
forgot to remove some comments that didn't apply any more after the
change.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
2019-07-12 16:15:20 +00:00
Andres Gomez
9aadd5d688 nir/compiler: keep same bit size when lowering with flrp
This was probably not caught before because no supported test was
exercising the flrp lowering with other bit size different than 32.

With the arrival of VK_KHR_shader_float_controls we will have some of
those and, unless we keep the bit size, we will end with something
like:

../src/compiler/nir/nir_builder.h:420: nir_builder_alu_instr_finish_and_insert: Assertion `src_bit_size == bit_size' failed.

Fixes: 158370ed2a ("nir/flrp: Add new lowering pass for flrp instructions")
Fixes: ae02622d8f ("nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
2019-07-12 16:15:20 +00:00
Jason Ekstrand
16842b2391 anv: Properly compute image usage in CreateImageView
With separate stencil usage, we can't just grab the usage from the image
directly and have to consider the per-aspect usage instead.

Fixes: 1be38f9178 "anv:Use VK_EXT_separate_stencil_usage to avoid..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-12 16:13:48 +00:00
Samuel Pitoiset
b393b2ce95 radv/gfx10: emit DISABLE_CONSERVATIVE_ZPASS_COUNTS
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:12 +02:00
Samuel Pitoiset
8cc4e4a81e radv/gfx10: init more registers in the graphics preamble
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:12 +02:00
Samuel Pitoiset
e68b55f5e3 radv/gfx10: set HS/GS/CS.WGP_MODE
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:12 +02:00
Samuel Pitoiset
5d5e26230a radv/gfx10: emit GE_PC_ALLOC
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:11 +02:00
Samuel Pitoiset
df062afa03 radv/gfx10: enable vertex shaders without export parameters
GFX10 allows this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:11 +02:00
Samuel Pitoiset
3f76c0f47c radv/gfx10: launch 2 compute waves per CU before going onto the next CU
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:11 +02:00
Samuel Pitoiset
e631d65fc6 radv: use ac_get_compute_resource_limits()
No behaviour change.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:11 +02:00
Samuel Pitoiset
e510c5ee3b ac: import ac_get_compute_resource_limits() from RadeonSI
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:11 +02:00
Alyssa Rosenzweig
5f4f8aec74 panfrost: Initialize shift/extra_flags
Don't rely on them being preinitialized to zero; this can cause junk to
appear on the wire.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 07:38:37 -07:00
Alyssa Rosenzweig
6d8490f900 panfrost: Fix build warnings
A bunch of these are from asserts not being compiled in 32-bit mode
(once Erik's ASSERTABLE stuff is merged, we'll want to switch).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-12 07:38:37 -07:00
Samuel Pitoiset
37aefb2be1 radv/gfx10: invalidate everything in L2 when shaders read data
This includes metadata as well. On GFX10, we have to invalidate
the L2 metadata cache when shaders read DCC.

Note that we still have to implement GFX10 coherency by
introducing INV_L2_METATADA but for now just flush L2.

This fixes a corruption with DCC and Talos.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 14:08:12 +02:00
Samuel Pitoiset
4e38322dd8 radv/gfx10: fix wrong emission of GE_CNTL
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 12:15:08 +02:00
Samuel Pitoiset
219d6939df radv: add more assertions to make sure packets are correctly emitted
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 12:15:06 +02:00
Alejandro Piñeiro
85b78f96a6 v3d: use inc/dec tmu operation with image atomic sub/add of 1
This allows to remove a mov of 1/-1, as it is implicit with the
operation.

As with atomic inc/dec/add, usual shader-db set doesn't include any
GLES shader using it. So using as workaround vk-gl-cts shaders, we get
this:

total instructions in shared programs: 1217013 -> 1217006 (<.01%)
instructions in affected programs: 53 -> 46 (-13.21%)
helped: 2
HURT: 0

One of the helped shader went from 40 to 34 instructions.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:51:22 +02:00
Alejandro Piñeiro
2e22879115 v3d: refactor some code from v3d40_vir_emit_image_load_store
And moved to new auxiliar method v3d40_image_load_store_tmu_op,
equivalent to the nir_to_nir v3d_general_tmu_op, to clean-up a little.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:49:29 +02:00
Alejandro Piñeiro
934ce48db8 v3d: use inc/dec tmu operation with atomic sub/add of 1
Among other things, this avoid the need of loading 1/-1 constants (so
one less operation).

The removed comment suggest the option of adding support on NIR for
inc/dec. Intel just uses an auxiliar method to get which hw operation
is needed, so no lowering is needed. And at the same time, being so
small, seems unreasonable to try to add a general one on NIR
itself. It is more easy to just adapt the method here (that is what
the patch does right now).

It is worth to note that we are not getting any change on shader-db
stats because all those methods are used on the usual shader-db set
with shaders needing GLSL > 4.2. In general there aren't too many GLSL
ES 3.1 tests.

As an alternative, we captured the GLES3/GLSL31/GLS32 used on
vk-gl-cts, even if that is not a real life usage of shaders. With
those we get the following:

total instructions in shared programs: 1217022 -> 1217013 (<.01%)
instructions in affected programs: 117 -> 108 (-7.69%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1
helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09%
95% mean confidence interval for instructions value: -2.07 -0.93
95% mean confidence interval for instructions %-change: -10.54% -5.64%
Instructions are helped.

Note that the shaders helped are really low because most of the
vk-gl-cts tests using AtomicInc/Dec/Add are mostly used on compute
shaders. Although right now there is a branch around with CS support,
the usual is doing the stats against master.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:48:40 +02:00
Alejandro Piñeiro
3912a32a79 v3d: remove redefinition of tmu operations on nir_to_vir
They are already defined, although is a slightly different format on
the generated packet headers, so it was needed to change how it is
used on nir_to_vir.

In addition to allow to remove some duplicated headers, it will allow
to define just one get_op_for_atomic_add aux method later to support
using inc/dec instead of add of 1/-1.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:48:17 +02:00
Alejandro Piñeiro
c2ff38d2df v3d: tweak initial comment on pack generator script
As the files it mentions to use as reference has slightly different
names.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 11:48:09 +02:00
Yevhenii Kolesnikov
8c5692b696 glsl/link_varyings: Fix hash table leak
Hash tables were not destroyed at return.

v2: Use ralloc_context (Eric Anholt)

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-12 11:07:08 +03:00
Kenneth Graunke
712ac83033 iris: Simplify devinfo access in calculate_result_on_gpu()
We have devinfo, no need for screen->devinfo.
2019-07-12 00:33:19 -07:00
Iago Toral Quiroga
10d50f2904 v3d: remove unused definitions
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
8e50a9f6cf v3d: move implementation of some intrinsics to separate helpers
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
d69184204e v3d: emit correct lowering for logic ops with RGB10A2 render targets
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
7bf3676845 v3d: emit correct lowering for logic ops with integer render targets
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
e540775f0c v3d: add lowering for OpenGL logic operations
This implements support for OpenGL logic operations by emitting code to read
from the TLB if needed and blending the fragment output accordingly. It is
similar to VC4's blend lowering pass, but exclusive to logic operations, since
blending is otherwise supported in hardware.

The pass doesn't handle MSAA targets yet.

Fixes the following piglit tests:
spec/!opengl 1.0/gl-1.0-logicop/*
spec/!opengl 1.1/gl-1.1-xor
spec/!opengl 1.1/gl-1.1-xor-copypixels

It also fixes text cursor rendering in Libreoffice with the GTK+2 theme, which
is rendered via glamor using the XOR logic operation.

v2: fix checks for allowed variable location and maximum render target (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
7c1d708911 v3d: acquire scoreboard lock before first tlb read
Until now we have always been emitting our scoreboard locks on the last thread
switch to improve parallelism. We did this by emitting our last thread switch
right before our tlb writes at the very end of the program, where we know that
we are outside control flow.

Unfortunately, this strategy is not valid when we have tlb color reads too, as
these will happen before this point in the program and can happen inside
control flow.

To fix this we always emit a thread switch before the first tlb load and if we
see additional thread switches after that point, we change the strategy to lock
on the first thread switch.

v2: change the solution so it is expected to work in more scenarios (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
47d7c80dc7 v3d: implement tile buffer color read intrinsic
We will be emitting this intrinsic to signal TLB color loads when we implement
OpenGL logic operations, where we need to blend the fragment shader color
output with the existing color in the render target.

Per-sample TLB reads are not supported yet.

v2: fix the offset into the color_reads array (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
b0eec9e27d nir: add a new v3d-specific intrinsic for tile buffer color reads
This is intended to be used, for example, with OpenGL logic operations. It
takes a render target as source and a sample index in the base index for
MSAA color reads.

v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
6af1bdefa9 v3d: fix size of color_reads and sample_colors arrays
We need to scale the size of these arrays to consider up to
V3D_MAX_DRAW_BUFFERS render targets and 4 components per color.

v2: we want to store each color component separately, so scale by 4 too.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
0279ac6e51 v3d: add color formats and swizzles to the fragment shader key
We are going to need these very soon to emit correct reads from the tlb
to implement logic operations.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
d26b35ba44 v3d: add helpers to emit ldtlb and ldtlbu signals
The ldtlbu version will read an implicit uniform with the TLB read
specifier and should be used for the first read in a sequence
of TLB reads (unless the default configuration is valid, in which
case we can use ldtlb). The ldtlb version is used for any subsequent
TLB read in the sequence.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
aff8885cf9 v3d: handle tlb read dependency tracking as if they were writes
Tile buffer reads are emitted as ordered sequences and cannot be reordered.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
4793e2c888 v3d: instructions with the ldtlb and ldtlbu signals are tlb instructions
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
83a66e10de v3d: tlb loads cannot be removed
Loads from the tile buffer are emitted in ordered sequences so
we cannot eliminate or reorder any of them.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
08f4dc3adc v3d: the ldtlbu signal reads an implicit uniform
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Iago Toral Quiroga
271bc8acfb v3d: handle ldtlb and ldtlbu signals during disassembly
We already have code to print these signals but the early return in the code
that checks if any signals are present present was missing the checks for them,
so it would skip printing them unless they were paired with other signals.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Samuel Pitoiset
958ee4c21a radv: report shader stage name when dumping LLVM IR
For debugging purposes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset
2b6a089813 radv: tidy up radv_get_shader_name() and add NGG stages
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset
ffd6a979bf radv/gfx10: update OVERWRITE_COMBINER_{MRT_SHARING,WATERMARK}
DCC related, mirror RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com
2019-07-12 08:19:53 +02:00
Samuel Pitoiset
c6fa4de15d radv/gfx10: do not set alignment on the ngg_emit pointer
This is invalid and this fixes a crash in LLVM.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset
df0a23ad1e radv/gfx10: fix exporting clip/cull distances for GS
This fixes dEQP-VK.clipping.user_defined.clip_distance.*geom*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset
edcd2bc833 radv/gfx10: fix exporting the subpass view index for GS
This fixes dEQP-VK.multiview.*geometry*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:20 +02:00
Timothy Arceri
3043908ccb mesa: save/restore SSO flag when using ARB_get_program_binary
Without this the restored program will fail the pipeline validation
checks when we attempt to use an SSO program.

Fixes: c20fd744fe ("mesa: Add Mesa ARB_get_program_binary helper functions")

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111010
2019-07-12 09:26:53 +10:00
Alyssa Rosenzweig
fe783c5b0c pan/midgard: Correct component count clamping PSIZ
Kind of a funky corner case that does not (as far as I know) apply to
organic shaders from GLES but does pop up in generated shaders from the
fixed-function desktop pipeline.

Fixes: bb483a9166 ("panfrost: Clamp point size")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-11 13:30:55 -07:00
Alyssa Rosenzweig
c4e6d759dd panfrost: Remove unused display target field
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-11 12:48:25 -07:00
Alyssa Rosenzweig
6b9edd2451 panfrost/ci: Update expectations
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-11 12:48:25 -07:00
Samuel Pitoiset
a7b7e94085 radv: only enable the GS copy shader stage if GS is enabled
Ooops.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 21:44:44 +02:00
Eric Anholt
e1fe98cc7d freedreno: Add dependency on the xml build to the winsys.
The screen header includes the common xml, and otherwise we might race
to build before it's done.

Fixes: e03259974e ("freedreno: Generate headers from xml files")
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-11 12:01:01 -07:00
Kenneth Graunke
5445c176e2 iris: Disable SIMD32 when using a 16x MSAA framebuffer.
We weren't doing this documented workaround because it's sorta painful.
2019-07-11 11:34:21 -07:00
Ian Romanick
ef7b4fdf3f nir/algebraic: Recognize open-coded flrp(a, b, a)
No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.

v2: Remove flrp@64 cases.  Since Gen11 removes flrp@32, it seems
unlikely that we'll ever have a flrp@64.  Should that occur, the cases
can be added back.

All Gen6-Gen9 platforms had similar results. (Skylake shown)
total instructions in shared programs: 15041996 -> 15041184 (<.01%)
instructions in affected programs: 71776 -> 70964 (-1.13%)
helped: 312
HURT: 0
helped stats (abs) min: 2 max: 3 x̄: 2.60 x̃: 3
helped stats (rel) min: 0.36% max: 4.55% x̄: 1.75% x̃: 1.28%
95% mean confidence interval for instructions value: -2.66 -2.55
95% mean confidence interval for instructions %-change: -1.89% -1.61%
Instructions are helped.

total cycles in shared programs: 354303333 -> 354301807 (<.01%)
cycles in affected programs: 433742 -> 432216 (-0.35%)
helped: 206
HURT: 78
helped stats (abs) min: 2 max: 244 x̄: 21.02 x̃: 8
helped stats (rel) min: 0.06% max: 19.59% x̄: 1.72% x̃: 0.82%
HURT stats (abs)   min: 1 max: 220 x̄: 35.95 x̃: 10
HURT stats (rel)   min: 0.07% max: 30.48% x̄: 2.53% x̃: 0.56%
95% mean confidence interval for cycles value: -10.68 -0.06
95% mean confidence interval for cycles %-change: -0.99% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
0c2b3a7fc0 nir/algebraic: Rearrange 1-((1-a) * (1-b)) into flrp-friendly form
No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.

v2: Convert the pattern directly to flrp.  There were negligible
improvements on Gen4 and Gen5, and Gen11 was actually hurt.  I believe
the problem is this optimization conflicts with the (1-x)*y =>
ffma(-x, y, y) optimization on Gen11.

Skylake
total instructions in shared programs: 15046487 -> 15041996 (-0.03%)
instructions in affected programs: 194681 -> 190190 (-2.31%)
helped: 880
HURT: 20
helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4
helped stats (rel) min: 0.19% max: 36.36% x̄: 4.85% x̃: 3.33%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.11% max: 1.06% x̄: 0.28% x̃: 0.17%
95% mean confidence interval for instructions value: -5.25 -4.73
95% mean confidence interval for instructions %-change: -5.11% -4.36%
Instructions are helped.

total cycles in shared programs: 354340839 -> 354303333 (-0.01%)
cycles in affected programs: 1753622 -> 1716116 (-2.14%)
helped: 786
HURT: 182
helped stats (abs) min: 1 max: 1842 x̄: 56.52 x̃: 22
helped stats (rel) min: 0.03% max: 43.17% x̄: 3.90% x̃: 2.84%
HURT stats (abs)   min: 1 max: 440 x̄: 37.99 x̃: 9
HURT stats (rel)   min: 0.03% max: 29.37% x̄: 1.96% x̃: 0.32%
95% mean confidence interval for cycles value: -45.90 -31.59
95% mean confidence interval for cycles %-change: -3.09% -2.50%
Cycles are helped.

All Gen6-Gen8 platforms had similar results. (Broadwell shown)
total instructions in shared programs: 15055907 -> 15051466 (-0.03%)
instructions in affected programs: 196370 -> 191929 (-2.26%)
helped: 871
HURT: 26
helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4
helped stats (rel) min: 0.19% max: 36.36% x̄: 4.76% x̃: 3.27%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.11% max: 1.06% x̄: 0.24% x̃: 0.12%
95% mean confidence interval for instructions value: -5.21 -4.69
95% mean confidence interval for instructions %-change: -4.99% -4.24%
Instructions are helped.

total cycles in shared programs: 387729170 -> 387699745 (<.01%)
cycles in affected programs: 1816409 -> 1786984 (-1.62%)
helped: 788
HURT: 172
helped stats (abs) min: 1 max: 662 x̄: 47.29 x̃: 22
helped stats (rel) min: 0.03% max: 31.26% x̄: 3.55% x̃: 2.76%
HURT stats (abs)   min: 1 max: 404 x̄: 45.59 x̃: 14
HURT stats (rel)   min: 0.03% max: 22.92% x̄: 1.53% x̃: 0.43%
95% mean confidence interval for cycles value: -35.69 -25.61
95% mean confidence interval for cycles %-change: -2.88% -2.40%
Cycles are helped.

total fills in shared programs: 34712 -> 34710 (<.01%)
fills in affected programs: 7 -> 5 (-28.57%)
helped: 1
HURT: 0

LOST:   0
GAINED: 2

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
09705747d7 nir/algebraic: Reassociate fadd into fmul in DPH-like pattern
Moving the add to the other end of the sequence allows it to be fused
into an FMA.

Ice Lake
total instructions in shared programs: 17173074 -> 16933147 (-1.40%)
instructions in affected programs: 7938745 -> 7698818 (-3.02%)
helped: 35583
HURT: 90
helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6
helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45%
HURT stats (abs)   min: 1 max: 41 x̄: 2.46 x̃: 1
HURT stats (rel)   min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77%
95% mean confidence interval for instructions value: -6.80 -6.65
95% mean confidence interval for instructions %-change: -5.32% -5.22%
Instructions are helped.

total cycles in shared programs: 360881386 -> 359533568 (-0.37%)
cycles in affected programs: 189489144 -> 188141326 (-0.71%)
helped: 27250
HURT: 6707
helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16
helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35%
HURT stats (abs)   min: 1 max: 3507 x̄: 51.56 x̃: 14
HURT stats (rel)   min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27%
95% mean confidence interval for cycles value: -44.70 -34.68
95% mean confidence interval for cycles %-change: -2.75% -2.65%
Cycles are helped.

total spills in shared programs: 8943 -> 8829 (-1.27%)
spills in affected programs: 625 -> 511 (-18.24%)
helped: 6
HURT: 3

total fills in shared programs: 21815 -> 21719 (-0.44%)
fills in affected programs: 1653 -> 1557 (-5.81%)
helped: 7
HURT: 10

LOST:   11
GAINED: 3

Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 15271996 -> 15040882 (-1.51%)
instructions in affected programs: 7193699 -> 6962585 (-3.21%)
helped: 33985
HURT: 30
helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6
helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85%
HURT stats (abs)   min: 1 max: 41 x̄: 4.00 x̃: 3
HURT stats (rel)   min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72%
95% mean confidence interval for instructions value: -6.87 -6.72
95% mean confidence interval for instructions %-change: -5.59% -5.48%
Instructions are helped.

total cycles in shared programs: 355520785 -> 354253799 (-0.36%)
cycles in affected programs: 185869148 -> 184602162 (-0.68%)
helped: 25824
HURT: 6287
helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16
helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41%
HURT stats (abs)   min: 1 max: 3327 x̄: 51.76 x̃: 14
HURT stats (rel)   min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28%
95% mean confidence interval for cycles value: -44.70 -34.21
95% mean confidence interval for cycles %-change: -2.87% -2.76%
Cycles are helped.

total spills in shared programs: 8835 -> 8818 (-0.19%)
spills in affected programs: 613 -> 596 (-2.77%)
helped: 5
HURT: 2

total fills in shared programs: 21738 -> 21744 (0.03%)
fills in affected programs: 1348 -> 1354 (0.45%)
helped: 5
HURT: 11

LOST:   0
GAINED: 12

Haswell
total instructions in shared programs: 13447102 -> 13381508 (-0.49%)
instructions in affected programs: 3770735 -> 3705141 (-1.74%)
helped: 11999
HURT: 29
helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3
helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87%
HURT stats (abs)   min: 3 max: 750 x̄: 54.90 x̃: 3
HURT stats (rel)   min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82%
95% mean confidence interval for instructions value: -5.71 -5.19
95% mean confidence interval for instructions %-change: -2.39% -2.30%
Instructions are helped.

total cycles in shared programs: 376342236 -> 375690458 (-0.17%)
cycles in affected programs: 155699021 -> 155047243 (-0.42%)
helped: 8397
HURT: 2876
helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18
helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49%
HURT stats (abs)   min: 1 max: 15414 x̄: 94.15 x̃: 22
HURT stats (rel)   min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41%
95% mean confidence interval for cycles value: -67.64 -48.00
95% mean confidence interval for cycles %-change: -0.99% -0.74%
Cycles are helped.

total spills in shared programs: 23134 -> 23184 (0.22%)
spills in affected programs: 1675 -> 1725 (2.99%)
helped: 13
HURT: 11

total fills in shared programs: 34550 -> 34686 (0.39%)
fills in affected programs: 1421 -> 1557 (9.57%)
helped: 13
HURT: 11

LOST:   0
GAINED: 11

Ivy Bridge
total instructions in shared programs: 12019642 -> 11987285 (-0.27%)
instructions in affected programs: 1532236 -> 1499879 (-2.11%)
helped: 5522
HURT: 110
helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3
helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88%
HURT stats (abs)   min: 1 max: 750 x̄: 18.07 x̃: 3
HURT stats (rel)   min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15%
95% mean confidence interval for instructions value: -6.25 -5.24
95% mean confidence interval for instructions %-change: -2.43% -2.26%
Instructions are helped.

total cycles in shared programs: 180214667 -> 179761900 (-0.25%)
cycles in affected programs: 31448723 -> 30995956 (-1.44%)
helped: 7191
HURT: 2838
helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17
helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40%
HURT stats (abs)   min: 1 max: 15540 x̄: 64.63 x̃: 24
HURT stats (rel)   min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51%
95% mean confidence interval for cycles value: -53.34 -36.95
95% mean confidence interval for cycles %-change: -0.81% -0.53%
Cycles are helped.

total spills in shared programs: 3599 -> 3642 (1.19%)
spills in affected programs: 1180 -> 1223 (3.64%)
helped: 12
HURT: 2

total fills in shared programs: 4031 -> 4162 (3.25%)
fills in affected programs: 876 -> 1007 (14.95%)
helped: 12
HURT: 2

LOST:   6
GAINED: 5

Sandy Bridge
total instructions in shared programs: 10850686 -> 10822890 (-0.26%)
instructions in affected programs: 1247986 -> 1220190 (-2.23%)
helped: 4699
HURT: 102
helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3
helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88%
HURT stats (abs)   min: 1 max: 16 x̄: 4.70 x̃: 3
HURT stats (rel)   min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10%
95% mean confidence interval for instructions value: -6.10 -5.47
95% mean confidence interval for instructions %-change: -2.42% -2.30%
Instructions are helped.

total cycles in shared programs: 154044149 -> 153920095 (-0.08%)
cycles in affected programs: 26037392 -> 25913338 (-0.48%)
helped: 5974
HURT: 2521
helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16
helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84%
HURT stats (abs)   min: 1 max: 862 x̄: 34.73 x̃: 20
HURT stats (rel)   min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85%
95% mean confidence interval for cycles value: -16.31 -12.90
95% mean confidence interval for cycles %-change: -0.56% -0.45%
Cycles are helped.

total spills in shared programs: 2876 -> 2957 (2.82%)
spills in affected programs: 592 -> 673 (13.68%)
helped: 6
HURT: 35

total fills in shared programs: 3157 -> 3134 (-0.73%)
fills in affected programs: 402 -> 379 (-5.72%)
helped: 6
HURT: 0

LOST:   5
GAINED: 11

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
ff9f526de3 nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)
v2: Remove flrp@64 cases.  Since Gen11 removes flrp@32, it seems
unlikely that we'll ever have a flrp@64.  Should that occur, the cases
can be added back.

v3: Add a couple more patterns that just move the negation around.

No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.

Skylake
total instructions in shared programs: 15279687 -> 15256058 (-0.15%)
instructions in affected programs: 4344440 -> 4320811 (-0.54%)
helped: 23455
HURT: 18
helped stats (abs) min: 1 max: 21 x̄: 1.01 x̃: 1
helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65%
HURT stats (abs)   min: 1 max: 2 x̄: 1.06 x̃: 1
HURT stats (rel)   min: 0.13% max: 1.16% x̄: 0.43% x̃: 0.34%
95% mean confidence interval for instructions value: -1.01 -1.00
95% mean confidence interval for instructions %-change: -0.87% -0.85%
Instructions are helped.

total cycles in shared programs: 355593755 -> 355339981 (-0.07%)
cycles in affected programs: 162089552 -> 161835778 (-0.16%)
helped: 20467
HURT: 7158
helped stats (abs) min: 1 max: 2074 x̄: 29.00 x̃: 6
helped stats (rel) min: <.01% max: 35.71% x̄: 1.71% x̃: 0.58%
HURT stats (abs)   min: 1 max: 4814 x̄: 47.46 x̃: 11
HURT stats (rel)   min: <.01% max: 125.43% x̄: 2.88% x̃: 0.98%
95% mean confidence interval for cycles value: -10.39 -7.98
95% mean confidence interval for cycles %-change: -0.57% -0.47%
Cycles are helped.

total spills in shared programs: 8843 -> 8835 (-0.09%)
spills in affected programs: 190 -> 182 (-4.21%)
helped: 2
HURT: 0

total fills in shared programs: 21738 -> 21738 (0.00%)
fills in affected programs: 372 -> 372 (0.00%)
helped: 1
HURT: 1

LOST:   12
GAINED: 22

Broadwell
total instructions in shared programs: 15290523 -> 15266818 (-0.16%)
instructions in affected programs: 4314738 -> 4291033 (-0.55%)
helped: 23391
HURT: 11
helped stats (abs) min: 1 max: 119 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65%
HURT stats (abs)   min: 1 max: 189 x̄: 18.09 x̃: 1
HURT stats (rel)   min: 0.11% max: 5.39% x̄: 0.98% x̃: 0.50%
95% mean confidence interval for instructions value: -1.04 -0.99
95% mean confidence interval for instructions %-change: -0.87% -0.85%
Instructions are helped.

total cycles in shared programs: 388911660 -> 388830827 (-0.02%)
cycles in affected programs: 172903324 -> 172822491 (-0.05%)
helped: 15601
HURT: 13269
helped stats (abs) min: 1 max: 1986 x̄: 29.18 x̃: 6
helped stats (rel) min: <.01% max: 36.60% x̄: 1.74% x̃: 0.55%
HURT stats (abs)   min: 1 max: 14904 x̄: 28.21 x̃: 6
HURT stats (rel)   min: <.01% max: 102.58% x̄: 1.77% x̃: 0.60%
95% mean confidence interval for cycles value: -4.20 -1.40
95% mean confidence interval for cycles %-change: -0.17% -0.08%
Cycles are helped.

total spills in shared programs: 23110 -> 23069 (-0.18%)
spills in affected programs: 656 -> 615 (-6.25%)
helped: 3
HURT: 1

total fills in shared programs: 34399 -> 34398 (<.01%)
fills in affected programs: 905 -> 904 (-0.11%)
helped: 3
HURT: 1

LOST:   6
GAINED: 23

Haswell
total instructions in shared programs: 13465303 -> 13441142 (-0.18%)
instructions in affected programs: 3726999 -> 3702838 (-0.65%)
helped: 22139
HURT: 347
helped stats (abs) min: 1 max: 43 x̄: 1.11 x̃: 1
helped stats (rel) min: 0.03% max: 10.00% x̄: 1.01% x̃: 0.75%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.35% max: 11.11% x̄: 1.48% x̃: 1.12%
95% mean confidence interval for instructions value: -1.08 -1.07
95% mean confidence interval for instructions %-change: -0.99% -0.96%
Instructions are helped.

total cycles in shared programs: 376271308 -> 376273090 (<.01%)
cycles in affected programs: 167496811 -> 167498593 (<.01%)
helped: 13206
HURT: 13281
helped stats (abs) min: 1 max: 3864 x̄: 35.39 x̃: 8
helped stats (rel) min: <.01% max: 53.10% x̄: 2.31% x̃: 0.80%
HURT stats (abs)   min: 1 max: 3828 x̄: 35.32 x̃: 8
HURT stats (rel)   min: <.01% max: 117.85% x̄: 2.88% x̃: 0.61%
95% mean confidence interval for cycles value: -1.33 1.47
95% mean confidence interval for cycles %-change: 0.22% 0.36%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 23158 -> 23134 (-0.10%)
spills in affected programs: 24 -> 0
helped: 3
HURT: 0

total fills in shared programs: 34580 -> 34550 (-0.09%)
fills in affected programs: 30 -> 0
helped: 3
HURT: 0

LOST:   23
GAINED: 13

Ivy Bridge
total instructions in shared programs: 12034154 -> 12014301 (-0.16%)
instructions in affected programs: 3636209 -> 3616356 (-0.55%)
helped: 18771
HURT: 459
helped stats (abs) min: 1 max: 43 x̄: 1.08 x̃: 1
helped stats (rel) min: 0.03% max: 10.00% x̄: 0.91% x̃: 0.68%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.34% max: 8.33% x̄: 1.43% x̃: 1.11%
95% mean confidence interval for instructions value: -1.04 -1.02
95% mean confidence interval for instructions %-change: -0.86% -0.84%
Instructions are helped.

total cycles in shared programs: 180186960 -> 180175147 (<.01%)
cycles in affected programs: 44652745 -> 44640932 (-0.03%)
helped: 12979
HURT: 11033
helped stats (abs) min: 1 max: 5836 x̄: 32.88 x̃: 6
helped stats (rel) min: <.01% max: 53.10% x̄: 2.19% x̃: 0.74%
HURT stats (abs)   min: 1 max: 4811 x̄: 37.61 x̃: 9
HURT stats (rel)   min: <.01% max: 115.18% x̄: 2.99% x̃: 0.69%
95% mean confidence interval for cycles value: -2.29 1.31
95% mean confidence interval for cycles %-change: 0.11% 0.26%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 3623 -> 3599 (-0.66%)
spills in affected programs: 24 -> 0
helped: 3
HURT: 0

total fills in shared programs: 4061 -> 4031 (-0.74%)
fills in affected programs: 30 -> 0
helped: 3
HURT: 0

LOST:   17
GAINED: 18

Sandy Bridge
total instructions in shared programs: 10853968 -> 10834932 (-0.18%)
instructions in affected programs: 3769957 -> 3750921 (-0.50%)
helped: 17944
HURT: 204
helped stats (abs) min: 1 max: 3 x̄: 1.07 x̃: 1
helped stats (rel) min: 0.02% max: 10.00% x̄: 0.83% x̃: 0.60%
HURT stats (abs)   min: 1 max: 2 x̄: 1.01 x̃: 1
HURT stats (rel)   min: 0.31% max: 9.09% x̄: 1.83% x̃: 0.93%
95% mean confidence interval for instructions value: -1.05 -1.04
95% mean confidence interval for instructions %-change: -0.81% -0.78%
Instructions are helped.

total cycles in shared programs: 153894864 -> 153885988 (<.01%)
cycles in affected programs: 50643925 -> 50635049 (-0.02%)
helped: 9361
HURT: 10534
helped stats (abs) min: 1 max: 1966 x̄: 19.42 x̃: 4
helped stats (rel) min: <.01% max: 34.97% x̄: 0.90% x̃: 0.22%
HURT stats (abs)   min: 1 max: 1371 x̄: 16.42 x̃: 5
HURT stats (rel)   min: <.01% max: 55.10% x̄: 0.81% x̃: 0.27%
95% mean confidence interval for cycles value: -1.27 0.38
95% mean confidence interval for cycles %-change: -0.03% 0.04%
Inconclusive result (value mean confidence interval includes 0).

LOST:   6
GAINED: 24

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
1259f6d802 nir: intel/vec4: Add flag to disable some algebraic optimizations
A couple patches later in this series use the flag to avoid a few
thousand shader-db regresions on all vec4 platforms.

I'm not particularly enamored with the name of this flag.  However, I
suspect the Intel vec4 backend is the only backend that will benefit
from it.  Specifically, the cases where this helps are all cases where
we want to prevent nir_opt_algebraic from rearranging instructions to
create 3-source instructions, such as ffma and flrp, with additional
immediate value or uniform sources.

The earlier commit "intel/vec4: Try to emit a single load for multiple
3-src instruction operands" solves most of the problems caused by
additional immediate values, but the restrictions on register strides
that cause problems for uniforms and shader inputs persist.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
3a1fdca5ad intel/vec4: Try to emit immediate sources for MOV
Per the comment in vec4_visitor::nir_emit_load_const, further
improvement is possible in this area.  That case would be more
complicated as I think we'd want to check that all users of the
nir_load_const_instr result intended to use the value as float.

No shader-db changes on any Gen8+ platform as these platforms do not use
the vec4 backend.

v2: Massive rebase on eeebeb211f ("intel/vec4: Try emitting non-scalar
immediates").  This commit is about twice as helpful since b04beaf41d
("intel/vec4: Try both sources as candidates for being immediates").

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13478598 -> 13474068 (-0.03%)
instructions in affected programs: 589452 -> 584922 (-0.77%)
helped: 2773
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.63 x̃: 1
helped stats (rel) min: 0.16% max: 5.66% x̄: 0.96% x̃: 0.83%
95% mean confidence interval for instructions value: -1.67 -1.60
95% mean confidence interval for instructions %-change: -0.98% -0.94%
Instructions are helped.

total cycles in shared programs: 376386916 -> 376369392 (<.01%)
cycles in affected programs: 16871628 -> 16854104 (-0.10%)
helped: 2293
HURT: 523
helped stats (abs) min: 2 max: 812 x̄: 13.80 x̃: 2
helped stats (rel) min: <.01% max: 10.18% x̄: 1.02% x̃: 0.36%
HURT stats (abs)   min: 2 max: 316 x̄: 26.99 x̃: 14
HURT stats (rel)   min: <.01% max: 19.34% x̄: 2.15% x̃: 1.43%
95% mean confidence interval for cycles value: -7.87 -4.58
95% mean confidence interval for cycles %-change: -0.52% -0.34%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10860328 -> 10857675 (-0.02%)
instructions in affected programs: 335907 -> 333254 (-0.79%)
helped: 1639
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.10% max: 5.26% x̄: 0.86% x̃: 0.70%
95% mean confidence interval for instructions value: -1.67 -1.57
95% mean confidence interval for instructions %-change: -0.89% -0.84%
Instructions are helped.

total cycles in shared programs: 153942720 -> 153934120 (<.01%)
cycles in affected programs: 5604818 -> 5596218 (-0.15%)
helped: 1494
HURT: 97
helped stats (abs) min: 2 max: 256 x̄: 7.84 x̃: 2
helped stats (rel) min: 0.01% max: 6.62% x̄: 0.35% x̃: 0.18%
HURT stats (abs)   min: 2 max: 160 x̄: 32.02 x̃: 20
HURT stats (rel)   min: 0.02% max: 3.37% x̄: 0.88% x̃: 0.56%
95% mean confidence interval for cycles value: -6.45 -4.36
95% mean confidence interval for cycles %-change: -0.32% -0.23%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8139378 -> 8137267 (-0.03%)
instructions in affected programs: 265616 -> 263505 (-0.79%)
helped: 1148
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.84 x̃: 1
helped stats (rel) min: 0.22% max: 4.76% x̄: 0.87% x̃: 0.62%
95% mean confidence interval for instructions value: -1.90 -1.78
95% mean confidence interval for instructions %-change: -0.90% -0.83%
Instructions are helped.

total cycles in shared programs: 188541756 -> 188537540 (<.01%)
cycles in affected programs: 9807004 -> 9802788 (-0.04%)
helped: 1143
HURT: 4
helped stats (abs) min: 2 max: 10 x̄: 3.70 x̃: 2
helped stats (rel) min: <.01% max: 3.01% x̄: 0.13% x̃: 0.06%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%
95% mean confidence interval for cycles value: -3.80 -3.55
95% mean confidence interval for cycles %-change: -0.14% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
acd7796a07 intel/vec4: Try to emit a VF source in try_immediate_source
This commit is also a pre-requisite for the next commit.

No shader-db changes on any Gen8+ platform as these platforms do not use
the vec4 backend.

v2: Massive rebase on eeebeb211f ("intel/vec4: Try emitting non-scalar
immediates").  This change is a lot less helpful since that commit
landed (previously helped 1934 shaders on HSW) because, apparently, a
lot of the cases helped by that commit were things like vector loads of
{ 1.0, 1.0, 1.0 } that were also helped by this commit.

Haswell
total instructions in shared programs: 13480095 -> 13478598 (-0.01%)
instructions in affected programs: 229534 -> 228037 (-0.65%)
helped: 1006
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.49 x̃: 1
helped stats (rel) min: 0.04% max: 3.45% x̄: 1.11% x̃: 1.09%
95% mean confidence interval for instructions value: -1.54 -1.43
95% mean confidence interval for instructions %-change: -1.15% -1.07%
Instructions are helped.

total cycles in shared programs: 376385734 -> 376386916 (<.01%)
cycles in affected programs: 14101380 -> 14102562 (<.01%)
helped: 941
HURT: 56
helped stats (abs) min: 2 max: 322 x̄: 5.62 x̃: 2
helped stats (rel) min: <.01% max: 7.74% x̄: 0.51% x̃: 0.42%
HURT stats (abs)   min: 2 max: 618 x̄: 115.50 x̃: 32
HURT stats (rel)   min: 0.03% max: 4.62% x̄: 0.83% x̃: 0.44%
95% mean confidence interval for cycles value: -2.06 4.43
95% mean confidence interval for cycles %-change: -0.47% -0.39%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total instructions in shared programs: 12048004 -> 12046589 (-0.01%)
instructions in affected programs: 217072 -> 215657 (-0.65%)
helped: 934
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.51 x̃: 1
helped stats (rel) min: 0.04% max: 3.45% x̄: 1.14% x̃: 1.11%
95% mean confidence interval for instructions value: -1.57 -1.46
95% mean confidence interval for instructions %-change: -1.18% -1.10%
Instructions are helped.

total cycles in shared programs: 180285854 -> 180287608 (<.01%)
cycles in affected programs: 14103824 -> 14105578 (0.01%)
helped: 871
HURT: 53
helped stats (abs) min: 2 max: 322 x̄: 5.51 x̃: 2
helped stats (rel) min: <.01% max: 7.67% x̄: 0.50% x̃: 0.42%
HURT stats (abs)   min: 2 max: 618 x̄: 123.66 x̃: 32
HURT stats (rel)   min: 0.03% max: 4.47% x̄: 0.92% x̃: 0.46%
95% mean confidence interval for cycles value: -1.60 5.39
95% mean confidence interval for cycles %-change: -0.46% -0.37%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10861227 -> 10860328 (<.01%)
instructions in affected programs: 92969 -> 92070 (-0.97%)
helped: 624
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.11% max: 3.45% x̄: 1.05% x̃: 0.95%
95% mean confidence interval for instructions value: -1.52 -1.36
95% mean confidence interval for instructions %-change: -1.09% -1.01%
Instructions are helped.

total cycles in shared programs: 153944316 -> 153942720 (<.01%)
cycles in affected programs: 1640956 -> 1639360 (-0.10%)
helped: 601
HURT: 15
helped stats (abs) min: 2 max: 120 x̄: 3.56 x̃: 2
helped stats (rel) min: 0.02% max: 6.33% x̄: 0.18% x̃: 0.08%
HURT stats (abs)   min: 2 max: 72 x̄: 36.13 x̃: 36
HURT stats (rel)   min: 0.05% max: 3.84% x̄: 1.95% x̃: 2.00%
95% mean confidence interval for cycles value: -3.44 -1.74
95% mean confidence interval for cycles %-change: -0.18% -0.09%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8139924 -> 8139378 (<.01%)
instructions in affected programs: 69776 -> 69230 (-0.78%)
helped: 322
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.70 x̃: 1
helped stats (rel) min: 0.27% max: 3.23% x̄: 0.79% x̃: 0.54%
95% mean confidence interval for instructions value: -1.88 -1.51
95% mean confidence interval for instructions %-change: -0.85% -0.72%
Instructions are helped.

total cycles in shared programs: 188542864 -> 188541756 (<.01%)
cycles in affected programs: 3031532 -> 3030424 (-0.04%)
helped: 320
HURT: 0
helped stats (abs) min: 2 max: 20 x̄: 3.46 x̃: 2
helped stats (rel) min: <.01% max: 0.69% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -3.85 -3.07
95% mean confidence interval for cycles %-change: -0.06% -0.05%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
365b45d571 intel/vec4: Try to emit a single load for multiple 3-src instruction operands
If a 3-source instruction uses immediate values 1.0 and -1.0, just load
1.0 into a register.  Use the negation source modifier to get -1.0.
This has trivial impact now, but it prevents a few thousand regressions
on vec4 platforms with "nir/algebraic: Recognize open-coded flrp(-1, 1,
a) and flrp(1, -1, a)"

All Gen6 and Gen7 platforms had similar results. (Haswell shown)
total instructions in shared programs: 13487412 -> 13487406 (<.01%)
instructions in affected programs: 541 -> 535 (-1.11%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.36% max: 2.08% x̄: 1.65% x̃: 1.80%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.33% -0.97%
Instructions are helped.

total cycles in shared programs: 376402564 -> 376402500 (<.01%)
cycles in affected programs: 10348 -> 10284 (-0.62%)
helped: 10
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 7.00 x̃: 2
helped stats (rel) min: 0.13% max: 2.05% x̄: 0.89% x̃: 0.79%
HURT stats (abs)   min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.29% max: 0.29% x̄: 0.29% x̃: 0.29%
95% mean confidence interval for cycles value: -11.72 0.08
95% mean confidence interval for cycles %-change: -1.20% -0.36%
Inconclusive result (value mean confidence interval includes 0).

No shader-db changes on any other Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick
6f6bc842f6 intel/vec4: Refactor operand fixing for ffma and flrp
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Alyssa Rosenzweig
8305766e0e panfrost: Wire up GLES2-class polygon offset
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-11 09:40:47 -07:00
Alyssa Rosenzweig
7a36c72f5d pan/decode: Depth units/factor are identical to GL
I'm not sure why I thoughtt here was an off-by-one, other than maybe bad
data collection.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-11 09:40:47 -07:00
Christian Gmeiner
a7153ebcd3 etnaviv: remove dead translate_ts_sampler_format(..) declaration
Fixes: 66411521ea ("etnaviv: combine translate_ts_sampler_format/translate_msaa_format")

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-07-11 17:51:15 +02:00
Caio Marcelo de Oliveira Filho
b390ff3517 intel/fs: Add support for SLM fence in Gen11
Gen11 SLM is not on L3 anymore, so now the hardware has two separate
fences.  Add a way to control which fence types to use.

At this time, we don't have enough information in NIR to control the
visibility of the memory being fenced, so for now be conservative and
assume that fences will need a stall.  With more information later
we'll be able to reduce those.

Fixes Vulkan CTS tests in ICL:

    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp

The whole set of supported tests in dEQP-VK.memory_model.* group
should be passing in ICL now.

v2: Pass BTI around instead of having an enum.  (Jason)
    Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets
    transformed into two.  (Jason)
    List tests fixed.  (Lionel)

v3: For clarity, split the decision of which fences to emit from the
    emission code.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-11 08:29:32 -07:00
Tomeu Vizoso
838374b6dd Revert "panfrost/midgard: Use _safe iterator"
This reverts commit 812ce2ce9e.

We massively regress with the reverted patch. So in the meantime, take
it out.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-07-11 16:53:42 +02:00
Alyssa Rosenzweig
507e297431 panfrost: Don't lie about Z/S formats
Only Z24S8 is properly supported right now, so let's be careful. Fixes a
number of issues relating to improper Z/S handling. The most obvious is
depth buffers with incorrect strides, which manifests in truly bizarre
ways and can happen commonly with FBOs.

Fixes WebGL (Aquarium runs, etc).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-11 14:27:25 +00:00
Samuel Pitoiset
cd403a931f radv/gfx10: enable geometry shaders
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-11 15:46:02 +02:00
Bas Nieuwenhuizen
0a8ef756d3 radv/gfx10: Fix NGG GS output mask handlings for LDS indexing.
In emit_vertex we optimize storage if the output mask does not
have all bits set. Do the same in the epilogue so the indices
actually match up.

Fixes dEQP-VK.geometry.input.basic_primitive.points because it
outputs PSIZE with an output mask of 1, which cause the generic
attribute for the color to be loaded from the wrong indices.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 15:45:59 +02:00
Bas Nieuwenhuizen
f5982917ff radv/gfx10: Simplify output mask handling for NGG GS.
We only ever get in this function for a NGG GS proper.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 15:45:58 +02:00
Bas Nieuwenhuizen
7515f41c78 radv/gfx10: Do GS prologue outside of gs_threads if.
Mirror radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 15:45:56 +02:00
Samuel Pitoiset
5bbcb3f5bc radv/gfx10: implement support for GS as NGG
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-11 15:45:53 +02:00
Bas Nieuwenhuizen
7286865f6d radv/gfx10: Use correct ES shader for es_vgpr_comp_cnt for GS.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 15:45:51 +02:00
Bas Nieuwenhuizen
45b73b3aa9 radv/gfx10: Do not allocate a gs_copy_shader on gfx10.
Will use ngg for any gs anyway.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 15:45:47 +02:00
Samuel Pitoiset
ef5efb40f4 radv/gfx10: fix VGT_SHADER_STAGES_EN for GS as NGG
The driver shouldn't set the copy shader bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-11 15:45:43 +02:00
Samuel Pitoiset
8bc3ab6f0c radv/gfx10: fix number of GS invocations for NGG
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-11 15:45:40 +02:00
Tomeu Vizoso
812ce2ce9e panfrost/midgard: Use _safe iterator
Fixes this assertion:

../mesa/src/panfrost/midgard/midgard_schedule.c:507:schedule_block: Assertion `ins == __next && "use _safe iterator"' failed.
Trace/breakpoint trap

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-11 15:06:51 +02:00
Tomeu Vizoso
82ee48e5ef panfrost: Place the height value in the height field
In the mali_single_framebuffer descriptor.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

v2: Remove unwanted chunks
2019-07-11 15:06:47 +02:00
Samuel Pitoiset
022b1f4190 radv/gfx10: fix maximum number of mip levels for 3D images
The dimensions also have to be adjusted if the number of supported
mip levels is changed.

This fixes dEQP-VK.api.info.image_format_properties.3d.*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-11 14:44:47 +02:00
Samuel Pitoiset
f3dfdd4091 radv/gfx10: disable TC-compat HTILE for multisampled D32_SFLOAT format
For some reasons D32_SFLOAT is also affected on GFX10, it works
fine with previous generations.

This fixes some dEQP-VK.renderpass2.depth_stencil_resolve.*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-11 13:43:21 +02:00
Kenneth Graunke
a01770b9c8 iris: Fix key->input_vertices for 8_PATCH TCS mode.
We were failing to flag the program dirty when it changed.  Also, we
were unnecessarily setting key->input_vertices for SINGLE_PATCH mode,
which would reduce program cache hits.  Only set it if needed.
2019-07-11 01:18:24 -07:00
Kenneth Graunke
c58f52f0ef iris: Only set key->flat_shade if COL0/COL1 are written.
This was just laziness on my part, we already added similar checks in
the VS key handling.  Just need to do it here too.  Should improve cache
hits.
2019-07-11 00:12:50 -07:00
Kenneth Graunke
cb82d534a0 iris: Drop comment about var->data.binding not being set.
I refactored the sampler lowering passes a long time ago to ensure
that gl_nir_lower_samplers_as_deref is run and var->data.binding is set.
2019-07-11 00:12:00 -07:00
Kenneth Graunke
38f9954208 iris: Drop comments about missing NOS
These stages don't need NOS.  If they do, we can add it - the
infrastructure is there if we need it someday.
2019-07-11 00:12:00 -07:00
Kenneth Graunke
2bd1234a77 iris: Drop a TODO comment
This is literally implemented two lines above.
2019-07-11 00:12:00 -07:00
Neil Roberts
eae06b34ea glsl/builtin types: Set the precision on the depth range params
The members of gl_DepthRangeParameters are declared to be highp in
GLSL ES specs.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-11 08:04:54 +02:00
Neil Roberts
74d71dac20 glsl: Add a constructor for glsl_struct_field to specify the precision
Adds a third constructor to glsl_struct_field which has an extra
parameter to specify the precision.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-11 08:04:54 +02:00
Neil Roberts
014be60398 glsl: Add a macro for the default values for glsl_struct_field
There are two constructors for glsl_struct_field with different
parameters. Instead of repeating them for both constructors, this
patch adds a convenience macro. This will make it easier to add a
third constructor in a later patch.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-11 08:04:54 +02:00
Neil Roberts
ca6ee488e9 glsl/builtin_variables: Add a precision to the builtins
All of the builtin variables mentioned in the GLSL ES spec and the
extensions include a precision declaration which is different
depending on what the variable is used for. This patch makes it set
the corresponding precision when creating the variable. This will make
a difference once we start using the precision information for
optimisation. Previously all of the builtin variables ended up with a
precision of NONE.

v2: Made gl_PointSize and gl_FragCoord highp since GLSL ES 3.00. Fixed
    gl_MaxViewPorts to always be highp. (Eric Anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-11 08:04:54 +02:00
Kenneth Graunke
ce93bf1876 compiler: Save a single copy of the softfp64 shader in the context.
We were recompiling the softfp64 library of functions from GLSL to NIR
every time we compiled a shader that used fp64.  Worse, we were ralloc
stealing it to the GL context.  This meant that we'd accumulate lots of
copies for the lifetime of the context, which was a big space leak.

Instead, we can simply stash a single copy in the GL context, and use
it for subsequent compiles.  Having a single copy should be fine from
a memory context point of view: nir_inline_function_impl already clones
the necessary nir_function_impl's as it inlines.

KHR-GL45.enhanced_layouts.ssb_member_align_non_power_of_2 was previously
OOM'ing a system with 16GB of RAM when using softfp64.  Now it finishes
much more quickly and uses only ~200MB of RAM.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-07-10 22:14:36 -07:00
Timothy Arceri
ae4ccb67be radv: fix memory leak when restoring from cache
Fixes: 726a31df70 ("radv: Add the concept of radv shader binaries.")

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 10:44:29 +10:00
Kristian H. Kristensen
e03259974e freedreno: Generate headers from xml files
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Rob Clark <robdclark@gmail.com>
2019-07-10 22:05:02 +00:00
Samuel Pitoiset
51e2124a4b radv: switch to the new VS exports path
It will help for GS as NGG on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:37:02 +02:00
Samuel Pitoiset
f616d80a7a radv: set the slot_index correctly for VARYING_SLOT_CLIP_DIST1
For selecting a different SQ_EXP_POS target.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:36:59 +02:00
Samuel Pitoiset
c4ab33378a radv: add a new function for exporting VS outputs
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:36:57 +02:00
Samuel Pitoiset
ac0edc369c radv: implement new path for exporting generic varyings
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:36:55 +02:00
Samuel Pitoiset
0b368fc8c3 radv: use the generic export path for clip/cull distances
When they are exported to the next stage.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:36:52 +02:00
Samuel Pitoiset
f653e5c1d6 radv: remove an extra memcpy when exporting clip/cull distances
Cleanup.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:36:50 +02:00
Jason Ekstrand
14781e2122 intel/compiler: Add a "base class" for program keys
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data.  I'd like to add another thing or two and need a
place to put it.  This commit adds a new brw_base_prog_key struct which
contains those two common bits.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-10 19:35:55 +00:00
Jason Ekstrand
3a4667e502 i965/program_cache: Cast the key to char * before adding key_size
We're about to change the type of key to be brw_base_prog_key and that
will mean blindly adding the key size without a cast will lead to the
wrong calculation.  It's safer to cast to char * first anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-10 19:35:55 +00:00
Jason Ekstrand
bb14abed18 anv: Make the workaround BO a whole page
I'm not 100% sure how this ever worked because gem_create usually shoots
you if the BO size isn't page-aligned.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-10 19:35:23 +00:00
Jason Ekstrand
6a2ff217b8 anv: Set Stateless Data Port Access MOCS
This is the MOCS setting used for the A64 stateless messages which we
sometimes use for SSBO operations.

Fixes: 48ed2a7bb0 "anv: Implement VK_EXT_buffer_device_address"
Fixes: 79fb0d27f3 "anv: Implement SSBOs bindings with GPU addr..."
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-10 19:35:23 +00:00
Alyssa Rosenzweig
bb483a9166 panfrost: Clamp point size
It's not clear the hardware really has a maximum which confuses dEQP;
clamp to whatever we report as our maximum.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 11:30:00 -07:00
Alyssa Rosenzweig
7318b525a2 pan/decode: Auto style
$ astyle *.c *.h --style=linux -s8

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig
ec2a59cd7a panfrost: Move non-Gallium files outside of Gallium
In preparation for a Panfrost-based non-Gallium driver (maybe
Vulkan...?), hoist everything except for the Gallium driver into a
shared src/panfrost. Practically, that means the compilers, the headers,
and pandecode.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig
a2d0ea92ba panfrost: Style main Gallium driver
$ astyle *.c *.h --style=linux -s8

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig
e4bd6fbe51 panfrost/midgard: Apply code styling
$ astyle *.c *.h --style=linux -s8

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig
b4733b2b61 panfrost/nir: Apply NIR style
$ astyle *.c *.h --style=linux -s3

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig
c2c8983cf4 panfrost: Move midgard/nir* to nir folder
The reason for doing this is two-fold:

   1. These passes are likely to be shared with the Bifrost compiler
      Therefore, we don't want to restrict them to Midgard

   2. The coding style is different (NIR-style vs Panfrost-style)
      The NIR passes are candidates for moving upstream into
      compiler/nir, so don't block that off for stylistic reasons

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig
ef2d577769 panfrost: Typofix
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 09:45:16 -07:00
Alyssa Rosenzweig
31fc52a4e7 panfrost: Identify shared tiler structure
This is identical across SFBD/MFBD so pull it out to allow for better
code sharing.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 09:45:16 -07:00
Alyssa Rosenzweig
6eb99c78e2 panfrost/midgard: Drop unnecessary assert
Just use the #define instead.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-07-10 09:37:08 -07:00
Alyssa Rosenzweig
c1b109caec panfrost: Don't expose OES_standard_derivatives
This has not been implemented quite yet.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 09:36:03 -07:00
Erik Faye-Lund
39e7fbf24a gallium: get rid of PIPE_CAP_SM3
PIPE_CAP_SM3 has always been an odd one out of all our caps. While most
other caps are fine-grained and single-purpose, this cap encode several
features in one. And since OpenGL cares more about single features, it'd
be nice to get rid of this one.

As it turns, this is now relatively simple. We only really care about
three features using this cap, and those already got their own caps. So
we can remove it, and make sure all current drivers just give the same
response to all of them.

The only place we *really* care about SM3 is in nine, and there we can
instead just re-construct the information based on the finer-grained
caps. This avoids DX9 semantics from needlessly leaking into all of the
drivers, most of who doesn't care a whole lot about DX9 specifically.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 15:50:51 +02:00
Erik Faye-Lund
21de1bf24b gallium: give vertex-shader saturate its own cap
Shader Model 3.0 is a big promise to make to the state-tracker, and
for instance mobile hardware might support vertex-shader saturate but
not some of the other features of SM3. So let's give this its own cap
for simplicity.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-10 15:49:57 +02:00
Erik Faye-Lund
681fa03e8d gallium: give fragment-shader derivatives its own cap
Shader Model 3.0 is a big promise to make to the state-tracker, and
for instance mobile hardware might support fragment-shader derivatives
but not some of the other features of SM3. So let's give this its own
cap for simplicity.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-10 15:49:57 +02:00
Erik Faye-Lund
66ee6661e9 gallium: give fragment-shader texture-lod its own cap
Shader Model 3.0 is a big promise to make to the state-tracker, and
for instance mobile hardware might support texture lod but not some
of the other features of SM3. So let's give this its own cap for
simplicity.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-10 15:49:57 +02:00
Erik Faye-Lund
ffbd004686 mesa/st: drop needless has_shader_model3 boolean
This boolean is only consulted once during init, so there's nothing
much saved by storing this in the context. So let's just check directly
when we need it instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-10 15:49:57 +02:00
Alyssa Rosenzweig
af2949e928 panfrost: Fix copyright identifier in a few places
Oops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-07-10 06:47:15 -07:00
Alyssa Rosenzweig
629c516b76 panfrost: Bikeshed pan_screen.c comment
The asterisks were inherited from... softpipe, maybe?

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-07-10 06:47:13 -07:00
Alyssa Rosenzweig
2f7145a6de panfrost: Check GPU version before loading
Panfrost is known to only work on a select few CPU/GPU combinations at
the moment (tested system-on-chips: RK3288, RK3399, and S912). Whitelist
the combinations known to work and refuse to load on others where
nothing works yet to avoid user confusion.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-07-10 06:47:11 -07:00
Alyssa Rosenzweig
b5de423ac1 panfrost: Be more honest about PIPE_CAPs
A lot of the pan_screen.c code was cargoculted from other drivers. The
upshot is that we return true for a lot of PIPE_CAPs that we don't
actually support, resulting in us exposing way too many extensions that
we don't actually support. Be more careful.

Some CAPs we do need to fake to access higher dEQP versions (i.e. in
order to debug the features we're hiding behind the CAP). For these, we
hide the CAP behind a special PAN_MESA_DEBUG=deqp option to avoid
apps randomly using these in-development features.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-07-10 06:47:01 -07:00
Alyssa Rosenzweig
b69d5d6e19 panfrost/midgard: Hit missed scheduling opportunity
Don't try to schedule to vmul when that can't possible work (forcing a
bundle break). glmark:

total bundles in shared programs: 2700 -> 2683 (-0.63%)
bundles in affected programs: 695 -> 678 (-2.45%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.21 x̃: 1
helped stats (rel) min: 1.27% max: 7.69% x̄: 4.30% x̃: 4.77%
95% mean confidence interval for bundles value: -1.68 -0.75
95% mean confidence interval for bundles %-change: -5.63% -2.97%
Bundles are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig
2d739f6b59 panfrost/midgard: Include shader size for shader-db
It's easy to forget about, but shader size does matter for things like
i-cache, so let's include it in the analysis.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig
7ad6516f3b panfrost/midgard: Include loop count for shader-db
We have to emit it anyway for the report to be happy (with respect to
unrolling), so return an actual count rather than dummy numbers.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig
138e40d471 panfrost/midgard: Dump shader-db stats
All the kool kids are doing it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig
a2f1a06a5e panfrost/midgard: Flush undefineds to zero
Fixes a buggy dEQP test.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig
318e9933b1 panfrost/midgard: Specify channel count for broadcasting ops
bany/ball type ops read from all 4 channels even though they only write
to 1; specify this in the opcode table like we do for dot products.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig
a1a4dfa74b panfrost/midgard: Don't try to "alias" texture registers
It won't work. Just, stop it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:45:20 -07:00
Samuel Pitoiset
4cadf4309c radv: compute correct number of input vertices for NGG
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 15:17:08 +02:00
Samuel Pitoiset
3303bc8b74 radv: remove extra code for exporting LayerID to the next stage
Now that the output usage mask is set to 0x1 the LayerID is
correctly exported in the loop above.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 15:17:08 +02:00
Samuel Pitoiset
bd86ded027 radv: set the LayerId output usage mask if FS needs it
When the stage preceding FS doesn't export it the fragment shader
might read it, even if it's 0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 15:17:08 +02:00
Alyssa Rosenzweig
53d64753e1 panfrost: Update supported formats
Much of the format selection code was inherited from softpipe (!) of all
places, and a lot of it is accordingly cruft. Later if-elses were added
in random places to workaround missing formats at various points in
history. Clean up some of this.

Theoretically, any format we can texture from we can also render to. In
practice, there are a few corner cases that we need to disable
explicitly.

For one, we do have to restrict SCANOUT formats to workaround
buggy apps (in particular, dEQP which with --deqp-surface-type=window
under Weston will end up with RGB10_A2 and complain about low alpha
precision). Just be clearer about how/why.

Also, RGB5_A1 support is still broken; let's not worry about that quite
yet.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:09 -07:00
Alyssa Rosenzweig
ced132d203 panfrost/mfbd: Cleanup format code selection
Rather than have random variables flying around and a long if-else
chain, use a switch. They're literally *designed* for this.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:09 -07:00
Alyssa Rosenzweig
da5382c0d8 panfrost/midgard: Cleanup blend switch
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig
c0c709a13a panfrost/mfbd: Handle PIPE_FORMAT_B10G10R10A2_UNORM
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig
c58c5268da panfrost/midgard: Handle PIPE_FORMAT_B10G10R10A2_UNORM
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig
c2ee937cf2 panfrost: Implement ES3-format writeout
We add support for writing out (via a blend shader):

   - RGBA4
   - RGB10_A2_UNORM
   - RGB10_A2_UINT
   - RGB5_A1_UNORM
   - R11G11B10_FLOAT

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig
46396af1ec panfrost: Refactor blend infrastructure
We would like to permit keying blend shaders against the framebuffer
format, which requires some new blending abstractions.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig
c9af7701d1 panfrost/midgard: Use unsigned blend patch offset
We would like the offset field to be unsigned, letting 0 represent "no
offset" and positive represent an offset.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig
6def428f10 panfrost/midgard: Handle pure int formats
I'm not sure I'm totally comfortable with this, but conceptually neither
float nor pure-int formats require any format conversion, except size
conversion. Going from a shaderable format (fp32 or i16, for instance)
into a blendable format (fp16) is a separate question, one we can defer
momentarily while we're not interested in actually blending.

As an aside, I'd be fascinated by an integer-based blending
implementation.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig
280c777fd7 panfrost/mfbd: Handle pure int formats
We see that the render target itself turns out to be typeless
(surprise!)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig
7647e56c1f panfrost: Set rt_count_2 for bpp>4 formats
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:07 -07:00
Alyssa Rosenzweig
0c619210b2 panfrost/midgard: Implement preliminary float converters
We'll need some careful handling, but for now, get some baseline code
out for handling float formats in a blend shader.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:06 -07:00
Alyssa Rosenzweig
5849c85008 panfrost/midgard: Skip blend for REPLACE (shader)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:06 -07:00
Alyssa Rosenzweig
5e825f5cad panfrost: Handle "blend disabled" blend shaders
Normally, disabled blend can definitely be fixed-function'd away, but
if a blend shader is used merely for format conversion rather than
blending, this code path can be nevertheless hit.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig
27e0c8c15d panfrost: Route format through fixed-function blending
Not all framebuffer formats are supported by the fixed-function blender.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig
e7551c1bff panfrost: Pipe framebuffer format around
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig
74fd914a89 panfrost/midgard: Use Gallium framebuffer formats
Ideally, we would keep Galliumisms far away from the compiler;
unfortunately, Mesa hasn't standardized on system of format codes to be
shared across APIs and across drivers, so using Gallium formats is our
best bet in the short run.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig
2157fe967a panfrost/midgard: Use fp16 exclusively while blending
We now have some preliminary fp16 support available. We're not able to
expose this for GLSL quite yet, but for internal blend shaders, we're
able to do control bitness ourselves just fine. So let's fp16 that
stuff!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig
0cfa54801e panfrost/midgard: Remove opt_copy_prop_tex
Eventually this should be replaced by proper tex RA / not emitting so
many silly moves to begin with / better general copy prop. For now
remove it since it breaks things.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig
b113be7683 panfrost/midgard: Fix scalarification
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig
e92caad744 panfrost/midgard: Handle fp16 in embedded_to_inline_constants
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig
3dbedb26f5 panfrost/midgard: Eliminate redundant type convert
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig
64df54d894 panfrost/midgard: Fix fp16 embedded constants
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig
f8b18a4277 panfrost/midgard: Hoist mask field
Share a single mask field in midgard_instruction with a unified format,
rather than using separate masks for each instruction tag with
hardware-specific formats. Eliminates quite a bit of duplicated code and
will enable vec8/vec16 masks as well (which don't map as cleanly to the
hardware as we might like).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig
e69cf1fed9 panfrost/midgard: Allow fp16 in scalar ALU
The packing is a little different, so implement that.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig
d8c084d2ca panfrost/midgard: Implement f2u16 and friends
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig
954c6afa3e panfrost/midgard: Implement f2f16/f2f32
These conversions handle half-floats within the shader.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig
0ed8cca008 panfrost/midgard: Verify src_bitsize == dst_bitsize
We can handle differing, but we'd prefer not to because there are
restrictions on sizing which aren't accounted for yet.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig
1686ef8655 panfrost/midgard: Simplify blend read
It's not clear where the extra indirection was from (older hardware or
just older blobs?)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig
952993d3bb panfrost/midgard: NIRify blend load scale/convert
The scale and type-convert can now be expressed in NIR, rather than MIR,
which is significantly more maintainable and demonstrates correctness of
the type conversion patches.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig
ae42991b83 panfrost/midgard: Fix blend constant scheduling bug
Blend constant conflicts run in two directions.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig
7f807ef1fa panfrost/midgard: Implement upscaling type converts
Rather than using a dest_override, we upscale integers by using a half
field with a sign-extend bit. A variant of this trick should also work
for floats, but one step at a time!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig
541b329bd1 panfrost/midgard: Move blend load/store into NIR
We have dedicated intrinsics to access the raw contents of the tile
buffer so we can use a dedicated NIR pass to lower appropriately for
blend shaders, rather than introducing a bizarre hardcoded blend
epilogue that only works for RGBA8_UNORM.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig
f42e5be910 panfrost/midgard: Use nir_dest_num_components
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
4df80cab40 panfrost/midgard: Implement integer downsize ops
Oh, dear. No turning back now.

We begin implementing non-32-bit types, using downsizing integer type
conversions as the initial instructions. We implement them naively as
type-converting moves; substantially more efficient operation is
possible by copypropping the type conversion modifier, but this
optimization is not implemented here.

Size converting modifiers on Midgard allow an instruction to write to a
destination 1/2 the size, or to read from a source 1/2 the size. If we
need an extreme conversion (32-bit to 8-bit, for instance), multiple
type converting ops are chained together, which here is handled via an
algebraic pass.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
dc69d3bf8f panfrost/midgard: Move scale from MIR to NIR
This begins the process of removing blend shader specific MIR into a
more general NIR lowering pass for formats.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
d151319a3d panfrost/midgard: Passthrough nir_lower_framebuffer
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
8e4e46794e panfrost: Extend clear colour packing
Eventually, this will allow packing clear colours for all formats,
including floating-point framebuffers, pure integer buffers, and special
formats. Currently, a few of these formats are supported, and many more
are handled through a generic Gallium colour packing path (which is not
a perfect fit for the hardware, but works for many formats and is a sane
default for the moment.)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
21c863a695 panfrost/mfbd: Include codes for float framebuffers
We see the hardware doesn't actually support float framebuffers in the
native sense -- rather, it just allows higher bpp framebuffers and lets
a blend shader / additional clear_color fields sort out the formats.
This will be.. interesting.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
36b3e7ea90 panfrost: Prepare some code for MRT
Full MRT support is a while away, but in the mean time, we can remove
code that explicitly assumes nr_cbufs <= 0, to minimize the obstacles
we'll face later when we add the whole thing.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
7c82dfba8f panfrost: Use standard ALIGN_POT/INFINITY macros
We had vendored duplicates from pre-Mesa days; clean that up.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Eric Engestrom
c78d2d9840 egl: add glvnd symbols check
According to the spec [1], `__egl_Main` is the only symbol that needs to
be exported. We don't want applications directly linking against
libEGL_mesa.so (apps should always go through libEGL.so, regardless of
who is providing it), so we shouldn't export any other symbols either.

[1] https://github.com/NVIDIA/libglvnd/blob/master/include/glvnd/libeglabi.h
    (this header is the closest there is to a spec)

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 11:27:51 +00:00
Eric Engestrom
ba18b968e8 egl: rewrite entrypoints check
Part of the effort to replace shell scripts with portable python scripts.
I could've used a trivial `assert lines == sorted(lines)`, but this way
the caller is shown which entrypoint is out of order.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 11:27:51 +00:00
Eric Engestrom
b619f89e23 mapi: add shared glapi symbols check
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 11:27:51 +00:00
Eric Engestrom
1abae9e54a tu: add exported symbols check
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 11:27:51 +00:00
Eric Engestrom
0fd30c1011 vulkan: add symbols file
According to the Vulkan ICD spec [1], these two symbols must be exposed:
- vk_icdGetInstanceProcAddr
- vk_icdNegotiateLoaderICDInterfaceVersion

and this one is optional:
- vk_icdGetPhysicalDeviceProcAddr

[1] https://github.com/KhronosGroup/Vulkan-Loader/blob/master/loader/LoaderAndLayerInterface.md

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 11:27:51 +00:00
Eric Engestrom
915eab5e87 meson: remove unused env_test
No longer used as of last commit :)

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 11:27:51 +00:00
Eric Engestrom
6f305d0c61 gles: use new symbols check script
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 11:27:51 +00:00
Eric Engestrom
111c34d2ae gbm: sort symbols
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-10 11:27:51 +00:00
Eric Engestrom
aa6973e611 gbm: use new symbols check script
Note: the list in gbm-symbols.txt is the same as the one that was in
gbm-symbols-check, I just took the opportunity to sort it.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 11:27:51 +00:00
Eric Engestrom
1172263c87 egl: use new symbols check script
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 11:27:51 +00:00
Eric Engestrom
176f350fcf symbols-check: introduce new python script
I've re-written this in bash a couple times over the years, and then
I realised python is much more portable and already required by Mesa, so
we might as well make use of it.

I decided to still use the build system's NM instead of re-implementing
symbols extraction, to offload the complexity of keeping it compatible
with many systems (Linux, Unix, BSD, MacOS, etc.), especially when
cross-building.

This new script checks not only that nothing is exported when it
shouldn't be, but also that everything that should be exported is.
Sometimes, some symbols _can_ be exported but don't have to be, in which
case they can be prefixed with `(optional)`.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 11:27:51 +00:00
Karol Herbst
62362a4abb nv50/ir/nir: implement load/store_global
required by OpenCL

v2: fix setting globalAccess

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2019-07-10 13:23:00 +02:00
Karol Herbst
33a9b9fce5 nv50/ir/nir: handle kernel inputs
required by OpenCL

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2019-07-10 13:22:40 +02:00
Karol Herbst
2617c78fe2 nv50/ir/nir: don't assert on !main
required for OpenCL

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2019-07-10 13:22:21 +02:00
Karol Herbst
fa6bd3c639 nv50/ir/nir: parse system values first and stop for compute shaders
required by OpenCL

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2019-07-10 13:20:13 +02:00
Connor Abbott
133273aa22 nir/lower_io: Don't use variable to get deref mode
Drivers only use lower_io for modes where pointers don't have a
meaningful value, and dereferences can always be traced back to a
variable. But there can be other modes, like global mode with
VK_EXT_buffer_device_address, where pointers cannot be traced back to a
variable, and lower_io would segfault on loads/stores of these since
nir_deref_instr_get_variable() would return NULL.

Just use the mode on the deref itself to filter out these modes before
we try to get the variable.

Fixes: 118a66df99 ("radv: Use NIR barycentric coordinates")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 12:31:41 +02:00
Connor Abbott
f18b8a1174 radv: Don't optimize after lowering FS inputs
Currently this is done rather late in radv, after lowering booleans, so
it isn't safe to run additional optimizations that may add e.g. 1-bit
booleans. We could move the lowering parts earlier, but since right now
we only lower FS inputs and by this point all indirects have been
lowered away, there's no reason we should need to optimize anything.

One shader from Devil May Cry 5 was getting optimized, but only because
the optimization loop was working on 32-bit booleans which revealed an
opportunity that was hidden with 1-bit booleans, and we generated a
1-bit boolean which is invalid.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111092
Fixes: 118a66df99
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 10:10:20 +02:00
Mauro Rossi
fe3898547a android: amd/addrlib: add gfx10 support
Fix the following building error:

external/mesa/src/amd/addrlib/src/gfx10/gfx10addrlib.cpp:35:10:
fatal error: 'gfx10_gb_reg.h' file not found
         ^~~~~~~~~~~~~~~~
1 error generated.

Fixes: 78cdf9a ("amd/addrlib: add gfx10 support")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2019-07-10 09:03:55 +02:00
Mauro Rossi
b3d46cb539 android: amd/common/gfx10: add register JSON
The necessary Android makefile building rules are added
and the generation rules are simplified for readability

Fixes the following building errors:

external/mesa/src/amd/common/ac_llvm_build.c:1496:45:
error: use of undeclared identifier 'V_008F0C_IMG_FORMAT_8_UINT'
   case V_008F0C_BUF_DATA_FORMAT_8: format = V_008F0C_IMG_FORMAT_8_UINT; break;
                                             ^
Fixes: 74a26af ("amd/common/gfx10: add register JSON")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2019-07-10 09:03:51 +02:00
Mauro Rossi
2434fb3e8e android: radeonsi/gfx10: generate gfx10_format_table.h (v2)
Fix Android building rules for gfx10_format_table.h generated header

(v2) Add LOCAL_C_INCLUDES += $(intermediates)/radeonsi to fix error:

external/mesa/src/gallium/drivers/radeonsi/si_state.c:46:10:
fatal error: 'gfx10_format_table.h' file not found
         ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: 0ffa229 ("radeonsi/gfx10: generate gfx10_format_table.h")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2019-07-10 09:03:46 +02:00
Chih-Wei Huang
0d394f1734 android: virgl: remove unnecessary LOCAL_C_INCLUDES
The path could be imported automatically.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
2019-07-10 08:56:47 +02:00
Chih-Wei Huang
4dc129e4f4 android: vulkan/util: fix generating vk_enum_to_str.*
The gen_enum_to_str.py generates vk_enum_to_str.c and its header at once.
However, the makefiles incorrectly list both files parallel with the same
recipes. That means both two files may be generated simultaneously by two
processes. The generating files may be truncated by another process, as
shown below:

$ cd $OUT/obj/STATIC_LIBRARIES/libmesa_vulkan_util_intermediates/util
$ ls -l

-rw-rw-r-- 1 lh lh 193713 Jul  5 13:31 vk_enum_to_str.c
-rw-rw-r-- 1 lh lh   4609 Jul  5 13:31 vk_enum_to_str.d
-rw-rw-r-- 1 lh lh      0 Jul  5 16:21 vk_enum_to_str.h

Let one file depends on the other with empty recipe to avoid the issue.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-10 08:56:37 +02:00
Chih-Wei Huang
a74285def2 android: radv: import include paths from used libraries
It's unnecessary to manually add these include paths since they could
be imported automatically.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 08:56:28 +02:00
Chih-Wei Huang
f982c6789c android: anv: import include path of libmesa_nir
Add libmesa_nir to a common LOCAL_STATIC_LIBRARIES defined by
ANV_STATIC_LIBRARIES so that its include path can be imported
automatically. Then ANV_INCLUDES is unnecessary and could be
eliminated.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 08:56:23 +02:00
Chih-Wei Huang
5cb61f27d0 android: anv: eliminate libmesa_anv_entrypoints
The dummy library libmesa_anv_entrypoints is totally unnecessary.
The four VULKAN_GENERATED_FILES could be generated and built in
libmesa_vulkan_common directly. The libraries using the generated
headers should get it via the exported include path.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 08:56:16 +02:00
Chih-Wei Huang
4338e08bd6 android: vulkan/util: fix export path
Export the correct include path so that the libraries use it can
get it automatically.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 08:56:10 +02:00
Chih-Wei Huang
e2ef281da1 android: radv: fix improper use of LOCAL_WHOLE_STATIC_LIBRARIES
The libmesa_git_sha1 is a dummy library. There is no reason to put
it into LOCAL_WHOLE_STATIC_LIBRARIES.

Move libmesa_vulkan_util to the vulkan.radv which really needs it.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 08:56:04 +02:00
Chih-Wei Huang
8ff01f0342 android: anv: fix improper use of LOCAL_WHOLE_STATIC_LIBRARIES
The libmesa_anv_entrypoints and libmesa_genxml are dummy libraries.
There is no reason to put them into LOCAL_WHOLE_STATIC_LIBRARIES.

Move libmesa_vulkan_util to the vulkan HAL which really needs it.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 08:55:59 +02:00
Chih-Wei Huang
352d91ce5b android: radv: remove unused LOCAL_EXPORT_C_INCLUDE_DIRS
The vulkan module is the final HAL. No need to export its headers
since none will import it.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 08:55:50 +02:00
Chih-Wei Huang
4fb11c01c5 android: anv: remove unused LOCAL_EXPORT_C_INCLUDE_DIRS
The vulkan module is the final HAL. No need to export its headers
since none will import it.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 08:55:42 +02:00
Jason Ekstrand
7e0fcea727 nir/loop_analyze: Pass nir_const_values directly to helpers
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
ff972c7a3a nir/loop_analyze: Properly handle swizzles in loop conditions
This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for
all intermediate values so that we can properly handle swizzles.  Even
though if conditions are required to be scalars, they may still consume
swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your
loop termination condition.  The old code would just bail the moment it
saw its first non-zero swizzle but we can now properly chase the scalar
from the if condition to all the way to a, b, and c.

Shader-db results on Kaby Lake:

    total loops in shared programs: 4388 -> 4364 (-0.55%)
    loops in affected programs: 29 -> 5 (-82.76%)
    helped: 29
    HURT: 5

Shader-db results on Haswell:

    total loops in shared programs: 4370 -> 4373 (0.07%)
    loops in affected programs: 2 -> 5 (150.00%)
    helped: 2
    HURT: 5

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
0333649e63 nir/loop_analyze: Refactor detection of limit vars
This commit reworks both get_induction_and_limit_vars() and
try_find_trip_count_vars_in_iand to return true on success and not
modify their output parameters on failure.  This makes their callers
significantly simpler.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
8f7405ed9d nir: Add some helpers for chasing SSA values properly
There are various cases in which we want to chase SSA values through ALU
ops ranging from hand-written optimizations to back-end translation
code.  In all these cases, it can be very tricky to do properly because
of swizzles.  This set of helpers lets you easily work with a single
component of an SSA def and chase through ALU ops safely.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
9a3cb6f5fe nir/loop_analyze: Bail if we encounter swizzles
None of the current code knows what to do with swizzles.  Take the safe
option for now and bail if we see one.  This does have a small shader-db
impact but it is at least safe.

Shader-db results on Kaby Lake:

    total loops in shared programs: 4364 -> 4388 (0.55%)
    loops in affected programs: 5 -> 29 (480.00%)
    helped: 5
    HURT: 29

Shader-db results on Haswell:

    total loops in shared programs: 4373 -> 4370 (-0.07%)
    loops in affected programs: 5 -> 2 (-60.00%)
    helped: 5
    HURT: 2

Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
6455fa9710 nir/loop_analyze: Use new eval_const_* helpers in test_iterations
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
268ad47c11 nir/loop_analyze: Handle bit sizes correctly in calculate_iterations
The current code assumes everything is 32-bit which is very likely true
but not guaranteed by any means.  Instead, use nir_eval_const_opcode to
do the calculations in a bit-size-agnostic way.  We also use the new
constant constructors to build the correct size constants.

Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
9f7ffe41dd nir/loop_analyze: Fix phi-of-identical-alu detection
One issue was that the original version didn't check that swizzles
matched when comparing ALU instructions so it could end up matching
very different instructions.  Using the nir_instrs_equal function from
nir_instr_set.c which we use for CSE should be much more reliable.
Another was that the loop assumes it will only run two iterations which
may not be true.  If there's something which guarantees that this case
only happens for phis after ifs, it wasn't documented.

Fixes: 9e6b39e1d5 "nir: detect more induction variables"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
6e984bcb92 nir/instr_set: Expose nir_instrs_equal()
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
64328f947e nir/builder: Use nir_const_value_for_* for constructing immediates
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
3acddc733f nir: Refactor nir_src_as_* constant functions
Now that we have the nir_const_value_as_* helpers, every one of these
functions is effectively the same except for the suffix they use so we
can easily define them with a repeated macro.  This also means that
they're inline and the fact that the nir_src is being passed by-value
should no longer really hurt anything.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
ce5581e23e nir: Add more helpers for working with const values
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Chia-I Wu
b44bb8bded virgl: remove virgl_transfer_queue_lists
COMPLETED_LIST is always empty.  We only need one list.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Chia-I Wu
48aefcbd6b virgl: simplify virgl_transfer_queue_extend
We can reuse virgl_transfer_queue_find_pending.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Chia-I Wu
eae4527551 virgl: remove transfer after transfer_write
Now that virgl_transfer_queue_is_queued does not search
COMPLETED_LIST, we don't need to move transfers to that list.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Chia-I Wu
bec2a85c48 virgl: improve virgl_transfer_queue_is_queued
Search only the pending list and return immediately on the first
hit.

When the transfer queue was introduced, the function was used to
deal with

  write transfer -> draw -> write transfer

sequence.  It was used to tell if the second transfer intersects
with the first transfer. If yes, the transfer queue avoided
reordering the second transfer to before the draw (by flushing) in
case the draw uses the transferred data.

With the recent changes to the transfer code, the function is used
to deal with

  write transfer -> readback transfer

We want to avoid reordering the readback transfer to before the
first transfer (also by flushing).

In the old code, we needed to track the compeleted transfers as well
to avoid reordering.  But in the new code, a readback transfer is
guaranteed to see the data from the completed transfers (in other
words, it cannot be reoderered to before the already completed
transfers).  We don't need to search the COMPLETED_LIST.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Chia-I Wu
5f6aab2ee2 virgl: fix transfers_intersect for mipmaps
We never use transfers_intersect with textures, but fix it anyway to
avoid confusion.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Chia-I Wu
6ca1bbabbe virgl: fix some false positives in transfers_overlap
Rewrite the function and check z/depth more carefully.  We
intentionally avoid u_box_test_intersection_2d because it returns
true when two boxes touch but do not intersect and can be confusing.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Marek Olšák
2b2093961e radeonsi/gfx10: enable primitive binning by default
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
9f68367d19 radeonsi/gfx10: implement primitive binning
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
4e56a2aaa8 radeonsi: simplify primitive binning enablement
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
3521297251 radeonsi: set primitive binning tunables for dGPUs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
d7e80ba1e7 radeonsi: set FLUSH_ON_BINNING_TRANSITION when needed
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
9dbe63ceea radeonsi/gfx10: use the new scan converter when binning is disabled
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
80b3f4b4bd radeonsi/gfx9: fix an oversight in primitive binning code
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
1f53a3e766 radeonsi: use BREAK_BATCH instead of FLUSH_DFSM when CB_TARGET_MASK changes
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
605900d7dd radeonsi/gfx10: don't expose unimplemented PIPE_CAP_QUERY_SO_OVERFLOW
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
270a8ab648 radeonsi/gfx10: launch 2 compute waves per CU before going onto the next CU
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
ab1f36a1d3 radeonsi/gfx10: set more registers and fields
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
9b65f6618c radeonsi/gfx10: enable LATE_ALLOC_GS
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
4985c3ee22 radeonsi/gfx10: set HS/GS/CS.WGP_MODE
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
329406ec9c radeonsi/gfx10: set GE_PC_ALLOC
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
9d1483de3b radeonsi/gfx10: enable 1D textures
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
1d3bffaf9c radeonsi/gfx10: enable image stores with DCC
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
5b50fb9b7f radeonsi/gfx10: no need to invalidate L2 for framebuffer -> texture coherency
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
fbf781e401 radeonsi/gfx10: support pixel shaders without exports
It only works if there are not color and no Z exports.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
2adc8e2736 radeonsi/gfx10: enable vertex shaders without param space allocation
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
07fe51156d radeonsi: update DCC settings from PAL
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
4002913f8d radeonsi: reorder shader IO indices for better IO space usage for tess and GS
The highest used index determines the stride for shader outputs in shaders
that use LDS or memory for outputs.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
1c99a13f89 radeonsi: decrease maximum supported GENERIC varying index from 42 to 31
This can decrease LDS and/or memory usage for shader outputs when geometry
shaders or tessellation is used.

Only PS inputs support higher indices and those aren't eliminated by
kill_outputs.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
6335cc6a58 radeonsi: cosmetic cleanup in si_shader_io_get_unique_index
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
3be4ed2fe1 radeonsi: fix and clean up shader_type passing
- don't pass it via a parameter if it can be derived from other parameters
- set shader_type for ac_rtld_open
- use enum pipe_shader_type instead of unsigned

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
37b26671a7 radeonsi: enable RB+ for pixel shaders with no/non-contiguous color outputs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie airlied@redhat.com
2019-07-09 17:24:16 -04:00
Marek Olšák
5058d62b05 radeonsi: don't set READ_ONLY for const_uploader to fix bindless texture hangs
Bindless textures can update descriptors with WRITE_DATA.

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie airlied@redhat.com
2019-07-09 17:24:16 -04:00
Alyssa Rosenzweig
6074eae753 gallium: Add util_format_is_unorm8 check
Useful for formats that would work with the same driver code path as
RGBA8 UNORM but that don't meet the util_format_is_rgba8_variant
criteria due to a smaller channel count.

v2: Use simpler logic (suggested by Iago).

v3: Fix spelling erorr. boolean->bool (thank you airlied).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 21:17:47 +00:00
Alyssa Rosenzweig
15000c79da nir: Add Panfrost-specific blending intrinsic
This gives more flexibility than the normal store_deref/store_output
versions (particularly, it allows us to abuse the type system in awful
ways, which is necessary for efficient format conversion in blend
shaders.)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
2019-07-09 14:07:23 -07:00
Pratik Vishwakarma
177a3df7b0 radeonsi: Expose support for 10-bit VP9 decode
Fix si_vid_is_format_supported to expose support
for 10-bit VP9 decode using P016 format. Without
this change, 10-bit decode will be exposed only
for HEVC even though newer hardware support
10-bit decode for VP9.

Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2019-07-09 15:26:54 -04:00
Alyssa Rosenzweig
4a4b48fb05 nir: Add nir_imm_vec4_16
We already have nir_imm_float16 and nir_imm_vec4; let's add the ability
to easily make immediate fp16 vectors as well, now that fp16 support is
maturing in NIR/GLSL.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-09 18:43:07 +00:00
Karol Herbst
a110a8090d nvc0: remove nvc0_program.tp.input_patch_size
right now that's dead code

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-07-09 12:41:54 +02:00
Bas Nieuwenhuizen
14291342ec radv: Add a common member in the union to make things more clear.
This clarifies that the struct can be used when the shader can be
one of VS/TES.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-09 09:59:07 +00:00
Bas Nieuwenhuizen
f9070743a9 Revert "radv: keep track of whether NGG is used for GS on GFX10"
This reverts commit 63e0675d98.

The GS is merged with the preceding shader and since the preceding
shader will have as_ngg set the final binary will have is_ngg set.

So we do not need the gs key here.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-09 09:59:07 +00:00
Juan A. Suarez Romero
d33e93d332 docs: update calendar, add news item and link release notes for 19.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-07-09 11:22:13 +02:00
Juan A. Suarez Romero
3c90baf047 docs: add sha256 checksums for 19.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit e42399f4de)
2019-07-09 09:19:25 +00:00
Juan A. Suarez Romero
0f51d69087 docs: add release notes for 19.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit fe1f7b538b)
2019-07-09 09:19:24 +00:00
Connor Abbott
86968327df nir/lower_io_to_temporaries: Fix hash table leak
Fixes: c45f5db527 ("nir/lower_io_to_temporaries: Handle interpolation intrinsics")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-09 10:39:37 +02:00
Bas Nieuwenhuizen
64cd972ffb radv/gfx10: Use correct gs_out for tess point_mode.
Fixes: 204e4da9b4 "radv: Use correct gs_out with tessellation."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-09 09:52:50 +02:00
Samuel Pitoiset
3f50007ad8 radv: set correct number of VGPRs for GS on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:27 +02:00
Samuel Pitoiset
611ddf794e radv: fix VGT_ESGS_RING_ITEMSIZE for GS as NGG on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:24 +02:00
Samuel Pitoiset
eca8a478a5 radv: emit VGT_GS_MAX_VERT_OUT for legacy and NGG paths for GS
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:22 +02:00
Samuel Pitoiset
f240147cf7 radv: emit the geometry shader as NGG if enabled on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:21 +02:00
Samuel Pitoiset
63e0675d98 radv: keep track of whether NGG is used for GS on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:19 +02:00
Samuel Pitoiset
c81b719812 radv: add radv_pipeline_generate_hw_gs() helper
For legacy GS path.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:17 +02:00
Samuel Pitoiset
54e2470047 radv: fix setting VGT_REUSE_OFF for TES on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:16 +02:00
Samuel Pitoiset
d2a8b63a2c radv: fix computing the number of ES VGPRS for TES on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:14 +02:00
Samuel Pitoiset
2974df819e radv: set max workgroup size to 128 for TES as NGG on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:12 +02:00
Samuel Pitoiset
53c75f17ec radv: fix allocating USER SGPRs on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:11 +02:00
Alejandro Piñeiro
71446bf8e3 v3d: Early return with handle 0 when getting a bo on the simulator
Until now we were just asking entries on the bo hash table, and don't
worry if the handle was NULL, as we were just expecting to get a NULL
in return. It seems that now the hash table assert with some reserverd
pointers, included NULL. This commit just early returns with handle 0.

This change fixes several crashes on vk-gl-cts GLES tests when using
the v3d simulator, like:
KHR-GLES3.core.internalformat.copy_tex_image.*

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-09 08:40:35 +02:00
Lionel Landwerlin
b031dd9010 vulkan/overlay: use a single macro to lookup objects
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-09 09:13:21 +03:00
Lionel Landwerlin
b3a96e69ac vulkan/overlay: add queue present timing measurement
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-09 09:13:19 +03:00
Bas Nieuwenhuizen
f7f08b2d81 radv/gfx10: Enable tess.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 12:04:29 +10:00
Bas Nieuwenhuizen
795adbbadd radv/gfx10: Add pipeline state support for tess.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 12:04:26 +10:00
Bas Nieuwenhuizen
23c6698ea2 radv/gfx10: Only set HW edge flags with gs & tess disabled.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 12:04:23 +10:00
Bas Nieuwenhuizen
9a8e4a07ad radv/gfx10: Add tess eval ngg shader support.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 12:04:20 +10:00
Bas Nieuwenhuizen
204e4da9b4 radv: Use correct gs_out with tessellation.
We should use the primitives output by the TES in that case.

There is always a separate TES if there is no GS.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 12:04:16 +10:00
Bas Nieuwenhuizen
343a435c46 radv/gfx10: Use correct count of max_offchip_buffers.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 12:04:12 +10:00
Bas Nieuwenhuizen
5d0dbc2564 radv/gfx10: Load global pointers in correct userdata registers for hs/gs.
Fixes: cfaad5e3ca "radv/gfx10: implement radv_emit_global_shader_pointers()"

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 12:03:51 +10:00
Timothy Arceri
6b60cfd079 radeonsi: update function name in comment
This was missed in 2361558eb7

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-09 10:00:23 +10:00
Timothy Arceri
7c612c49b4 r600: remove query/apply_opaque_metadata callbacks
Theses seem to have been radeonsi specific callbacks that are no
longer needed now that these drivers no longer share this code
path.

These callbacks were removed from radeonsi in c0d44fe0e9.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-09 10:00:23 +10:00
Lionel Landwerlin
a72351cc76 vulkan/overlay: fix crash on freeing NULL command buffer
It is legal to call vkFreeCommandBuffers() on NULL command buffers.

This fix requires eb41ce1b01 ("util/hash_table: Properly handle
the NULL key in hash_table_u64").

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4438188f49 ("vulkan/overlay: record stats in command buffers and accumulate on exec/submit")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 21:49:26 +00:00
Lionel Landwerlin
6271d16320 vulkan: bump headers & registry to 1.1.114
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-09 00:09:36 +03:00
Dave Airlie
6422fa75b4 radv: only use specialised 3D meta paths on GFX9.
GFX10 appears to act like GFX8 here.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-09 06:32:28 +10:00
Ian Romanick
0349bc3ce2 mesa: Set minimum possible GLSL version
Set the absolute minimum possible GLSL version.  API_OPENGL_CORE can
mean an OpenGL 3.0 forward-compatible context, so that implies a minimum
possible version of 1.30.  Otherwise, the minimum possible version 1.20.
Since Mesa unconditionally advertises GL_ARB_shading_language_100 and
GL_ARB_shader_objects, every driver has GLSL 1.20... even if they don't
advertise any extensions to enable any shader stages (e.g.,
GL_ARB_vertex_shader).

Converts about 2,500 piglit tests from crash to skip on NV18.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109524
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110955
Cc: mesa-stable@lists.freedesktop.org
2019-07-08 12:34:09 -07:00
Caio Marcelo de Oliveira Filho
d577db293d anv: Set maxComputeSharedMemorySize to 64k
This value is supported since gen7.  See also 8514c75a26 "i965: Set
compute shader shared memory max to 64k".

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-08 11:35:42 -07:00
Ian Romanick
dd2dc7e707 intel/vec4: Delete vec4_visitor::emit_lrp
Effectivley unused since dd7135d55d ("intel/compiler: Use the flrp
lowering pass for all stages on Gen4 and Gen5").  I had intended to
remove this code as part of that series, but I forgot.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:11 -07:00
Ian Romanick
5450fd7a36 nir: Allow nir_ssa_alu_instr_src_components to operate on non-SSA destinations
Existing users only operate on instructions with SSA destinations.  Some
later patches add new direct calls and indirect calls (via existing NIR
functions) on instructions after going out of SSA.  At the very least,
these calls are added by:

intel/vec4: Try to emit a VF source in try_immediate_source
intel/vec4: Try to emit a single load for multiple 3-src instruction operands

The first commit adds direct calls, and the second adds calls via
nir_alu_srcs_equal and nir_alu_srcs_negative_equal.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:11 -07:00
Ian Romanick
12217de08c nir: Handle swizzle in nir_alu_srcs_negative_equal
When I added this function, I was not sure if swizzles of immediate
values were a thing that occurred in NIR.  The only existing user of
these functions is the partial redundancy elimination for compares.
Since comparison instructions are inherently scalar, this does not
occur.

However, a couple later patches, "nir/algebraic: Recognize
open-coded flrp(-1, 1, a) and flrp(1, -1, a)" combined with "intel/vec4:
Try to emit a single load for multiple 3-src instruction operands",
collaborate to create a few thousand instances.

No shader-db changes on any Intel platform.

v2: Handle the swizzle in nir_alu_srcs_negative_equal and leave
nir_const_value_negative_equal unchanged.  Suggested by Jason.

v3: Correctly handle write masks.  Add note (and assertion) that the
caller is responsible for various compatibility checks.  The single
existing caller only calls this for combinations of scalar fadd and
float comparison instructions, so all of the requirements are met.  A
later patch (intel/vec4: Try to emit a single load for multiple 3-src
instruction operands) will call this for sources of the same
instruction, so all of the requirements are met.

v4: Add unit test for nir_opt_comparison_pre that is fixed by this
commit.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:11 -07:00
Ian Romanick
ad50e812a3 nir: nir_const_value_negative_equal compares one value at a time
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:10 -07:00
Ian Romanick
bcd22b740c nir: Port some const_value_negative_equal tests to alu_src_negative_equal
The next commit will make the existing tests irrelevant.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-08 11:30:10 -07:00
Ian Romanick
ec96c289ea nir: Pass fully qualified type to nir_const_value_negative_equal
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:10 -07:00
Ian Romanick
0ac5ff9ecb nir: Use nir_src_bit_size instead of alu1->dest.dest.ssa.bit_size
This is important because, for example nir_op_fne has
dest.dest.ssa.bit_size == 1, but the source operands can be 16-, 32-, or
64-bits.  Fixing this helps partial redundancy elimination for compares
in a few more shaders.

v2: Add unit tests for nir_opt_comparison_pre that are fixed by this
commit.

All Intel platforms had similar results.
total instructions in shared programs: 17179408 -> 17179081 (<.01%)
instructions in affected programs: 43958 -> 43631 (-0.74%)
helped: 118
HURT: 2
helped stats (abs) min: 1 max: 5 x̄: 2.87 x̃: 2
helped stats (rel) min: 0.06% max: 4.12% x̄: 1.19% x̃: 0.81%
HURT stats (abs)   min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 5.83% max: 6.06% x̄: 5.94% x̃: 5.94%
95% mean confidence interval for instructions value: -3.08 -2.37
95% mean confidence interval for instructions %-change: -1.30% -0.85%
Instructions are helped.

total cycles in shared programs: 360959066 -> 360942386 (<.01%)
cycles in affected programs: 774274 -> 757594 (-2.15%)
helped: 111
HURT: 4
helped stats (abs) min: 1 max: 1591 x̄: 169.49 x̃: 36
helped stats (rel) min: <.01% max: 24.43% x̄: 8.86% x̃: 2.24%
HURT stats (abs)   min: 1 max: 2068 x̄: 533.25 x̃: 32
HURT stats (rel)   min: 0.02% max: 5.10% x̄: 3.06% x̃: 3.56%
95% mean confidence interval for cycles value: -200.61 -89.47
95% mean confidence interval for cycles %-change: -10.32% -6.58%
Cycles are helped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1]
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: be1cc3552b ("nir: Add nir_const_value_negative_equal")
2019-07-08 11:30:10 -07:00
Ian Romanick
47c2aa5b48 intel/vec4: Reswizzle VF immediates too
Previously, an instruction like

mul(8) vgrf29.xy:F, vgrf25.yxxx:F, [-1F, 1F, 0F, 0F]

would get rewritten as

mul(8) vgrf0.yz:F, vgrf25.yyxx:F, [-1F, 1F, 0F, 0F]

The latter does not produce the correct result.  The VF immediate in the
second should be either [-1F, -1F, 1F, 1F] or [0F, -1F, 1F, 0F].  This
commit produces the former.

Fixes: 1ee1d8ab46 ("i965/vec4: Reswizzle sources when necessary.")
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:10 -07:00
Ian Romanick
b08d704051 nir: Add unit tests for nir_opt_comparison_pre
Each tests has a comment with the expected before and after NIR.  The
tests don't actually check this.  The tests only check whether or not
the optimization pass reported progress.  I couldn't think of a robust,
future-proof way to check the before and after code.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:10 -07:00
Dongwon Kim
f734e2a042 anv: disable repacking for compression for applicable gen
set bit15 (Disable Repacking for Compression) of CACHE_MODE_0 register
if the gen attribute, 'disable_ccs_repack' is set.

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-07-08 10:54:38 -07:00
Dongwon Kim
6866765cb3 iris: disable repacking for compression for applicable gen
set bit15 (Disable Repacking for Compression) of CACHE_MODE_0 register
if the gen attribute, 'disable_ccs_repack' is set.

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-07-08 10:54:38 -07:00
Dongwon Kim
00df55cdc9 i965: disable repacking for compression for applicable gen
set bit15 (Disable Repacking for Compression) of CACHE_MODE_0 register
if the gen attribute, 'disable_ccs_repack' is set.

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-07-08 10:54:38 -07:00
Dongwon Kim
eb6d067e68 intel: add disable_ccs_repack to gen_device_info
add a new attribute, 'disable_ccs_repack' to gen_device info, which
indicates whether repacking of components in certain pixel formats
before compression needs to be disabled to keep the compatibility
with decompression capability of display controller (gen11+)

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-07-08 10:54:38 -07:00
Dongwon Kim
e6ac6d3224 intel/genxml: correct bit fields in CACHE_MODE_0 reg for gen11
correct bit fields information of CACHE_MODE_0 reg in current gen11.xml

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-07-08 10:54:37 -07:00
Caio Marcelo de Oliveira Filho
2614319259 nir: print ptr_stride for deref_casts
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-08 10:05:56 -07:00
Caio Marcelo de Oliveira Filho
9c7adaeb5f anv: Advertise VK_EXT_shader_demote_to_helper_invocation
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-08 08:57:25 -07:00
Caio Marcelo de Oliveira Filho
1a83c9a619 spirv: Implement SPV_EXT_demote_to_helper_invocation
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-08 08:57:25 -07:00
Caio Marcelo de Oliveira Filho
5a7c69399d spirv: Update the headers from latest Khronos master
This corresponds to 29c11140baaf9f7fdaa39a583672c556bf1795a1 in
https://github.com/KhronosGroup/SPIRV-Headers.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-08 08:57:25 -07:00
Caio Marcelo de Oliveira Filho
45f5db5a84 intel/fs: Implement "demote to helper invocation"
The "demote" intrinsic works like "discard" but don't change the
control flow, allowing derivative operations to work.  This is the
semantics of D3D discard.

The "is_helper_invocation" intrinsic will return true for helper
invocations -- both the ones that started as helpers and the ones that
where demoted.  This is needed to avoid changing the behavior of
gl_HelperInvocation which is an input (so not expected to change
during shader execution).

v2: Emit the discard jump and comment why it is safe.  (Jason)
    Rework the is_helper_invocation() that was stomping f0.1.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-08 08:57:25 -07:00
Caio Marcelo de Oliveira Filho
a42e8f0ed1 nir: Add demote and is_helper_invocation intrinsics
From SPV_EXT_demote_to_helper_invocation.  Demote will be implemented
as a variant of discard, so mark uses_discard if it is used.

v2: Add CAN_ELIMINATE flag to the new intrinsic.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-08 08:57:25 -07:00
Samuel Pitoiset
9b116173b6 radv: do not emit VGT_FLUSH on GFX10
We don't need it.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:45:23 +02:00
Connor Abbott
0c114ae3be ac/nir: Remove now-unused interp_deref handling
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-08 14:18:52 +02:00
Connor Abbott
b3a226691d radeonsi/nir: Use NIR barycentric intrinsics
This is simpler than radv, since the driver_location is already assigned
for us.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-08 14:18:46 +02:00
Connor Abbott
d1c65939e2 radeonsi/nir: Delete unreachable code
We always get gl_FragCoord as a system value, not a varying, so this is
never hit. We already set PIXEL_CENTER_INTEGER elsewhere.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-08 14:18:41 +02:00
Connor Abbott
e5536aa584 compiler: Add color system value
This is nice to have with radeonsi, where color varyings are handled
specially to avoid recompiles.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-08 14:18:34 +02:00
Connor Abbott
118a66df99 radv: Use NIR barycentric intrinsics
We have to add a few lowering to deal with things that used to be dealt
with inline when creating inputs. We also move the code that fills out
the radv_shader_variant_info struct for linking purposes to
radv_shader.c, as it's no longer tied to the NIR->LLVM lowering.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:18:25 +02:00
Connor Abbott
0cad0424e9 ac/nir: Implement barycentric intrinsics
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-08 14:18:25 +02:00
Connor Abbott
6b28808b22 intel/nir: Extract add_const_offset_to_base
Pretty much every driver using nir_lower_io_to_temporaries followed by
nir_lower_io is going to want this. In particular, radv and radeonsi in
the next commits.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Connor Abbott
c45f5db527 nir/lower_io_to_temporaries: Handle interpolation intrinsics
These weren't properly supported. This does pretty much the same thing
that the radv code did.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Connor Abbott
3a2ea2af9d nir: Avoid coalescing vars created by lower_io_to_temporaries
Right now nir_copy_prop_vars is effectively undoing
nir_lower_io_to_temporaries for inputs by propagating the original
variable through the copy created in lower_io_to_temporaries. A
theoretical variable coalescing pass would have the same issue with
output variables, although that doesn't exist yet. To fix this, add a
new bit to nir_variable, and disable copy propagation when it's set.

This doesn't seem to affect any drivers now, probably since since no one
uses lower_io_to_temporaries for inputs as well as copy_prop_vars, but
it will fix radv once we flip on lower_io_to_temporaries for fs inputs.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Connor Abbott
f3e2c65041 nir: Return correct size in nir_assign_io_var_locations()
It was double-counting cases where multiple variables were assigned to
the same slot, and not handling the case where the last variable is a
compact variable.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Connor Abbott
dd81d8808d nir: Handle compact variables when assigning i/o locations
These are used in Vulkan for clip/cull distances, instead of the GLSL
lowering when the clip/cull arrays are shared.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Connor Abbott
fd5ed6b9d6 nir: Move st_nir_assign_var_locations() to common code
It isn't really doing anything Gallium-specific, and it's needed for
handling component packing, overlapping, etc.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-08 14:15:06 +02:00
Connor Abbott
27f0c3c15e radv: Make FragCoord a sysval
load_fragcoord is already handled in common code for radeonsi, so we
don't need to do anything to handle it. However, there were some passes
creating NIR with the varying, so we switch them over to the sysval. In
the case of nir_lower_input_attachments which is used by both radv and
anv, we add handling for both until intel switches to using a sysval.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Connor Abbott
64f3fc5ea6 spirv: Add an option for making FragCoord a sysval
On AMD, FragCoord should be a sysval because it is handled separately
from all the other inputs. We were already doing this in radeonsi, but
we weren't doing it with radv. It'll be much more annoying to handle
VARYING_SLOT_POS in fragment shaders when we let NIR lower FS inputs for
us, so here we add an option so that radv can get it as a system value.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Daniel Schürmann
e41e932e57 radv: Lower input attachments in NIR.
v2 (Connor)
- Fix warning in release mode using MAYBE_UNUSED

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Daniel Schürmann
c65e880a65 radv: Implement nir_intrinsic_load_layer_id().
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-08 14:14:53 +02:00
Daniel Schürmann
c31f470066 anv,nir: Move lower_input_attachments pass from ANV to NIR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:02:50 +02:00
Dave Airlie
1d327689f9 radv/gfx10: don't emit PFP packets on ME.
This was done for all previous GPUs.

This fixes Talos Principle launch hangs.

Fixes: 7e43022e8c (radv/gfx10: add gfx10_cs_emit_cache_flush)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 17:19:42 +10:00
Samuel Pitoiset
49e5136887 ac: select the GFX ring when halting waves with UMR on GFX10
GFX10 has two rings, so UMR want to know which one to halt.
Select the first one by default.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 09:10:57 +02:00
Bas Nieuwenhuizen
4d118ad44a radv/gfx10: Move NGG output handling outside of giant if-statement.
In merged shaders we put a big if around each shader, so both stages
can have a different number of threads. However, the NGG output code
still needs to run if the first shader is not executed.

This can happen when there are more gs threads than vs/es threads, or
when there are 0 es/vs threads (why? no clue).

Fixes: ee21bd7440 "radv/gfx10: implement NGG support (VS only)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-08 01:49:54 +02:00
Bas Nieuwenhuizen
703efab7e4 radv: Actually use VK formats for the format table.
No ETC2 or ASTC on navi so nothing to add.

Fixes: 3dc5ec5d16 "radv/gfx10: generate gfx10_format_table.h"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-07 23:10:32 +02:00
Chia-I Wu
5824130389 anv: fix VkExternalBufferProperties for host allocation
It was reported as unsupported previously.  It should be importable
and is compatible with itself.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Fixes: 69cc6272fb ("anv: Implement VK_EXT_external_memory_host")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-07 13:31:58 -07:00
Chia-I Wu
f3c7a02a62 anv: fix VkExternalBufferProperties for unsupported handles
compatibleHandleTypes must include the queried handle type.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-07 13:31:58 -07:00
Bas Nieuwenhuizen
e46b41b3ae radv: Handle cmask being disallowed by addrlib.
alignment=0 does weird things with align64.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-07 21:29:52 +02:00
Samuel Pitoiset
5eaed7ecfc radv/gfx10: enable support for NAVI10, NAVI12 and NAVI14
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen
817bd0cc2e radv/gfx10: Use GS rectlist when needed.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
ee21bd7440 radv/gfx10: implement NGG support (VS only)
This needs to be cleaned up a bit, and it probably contains
missing stuff and/or bugs.

This doesn't fix the "half of the triangles" issue.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen
9e37609d0b radv: Combine vs and tes output keys parts.
That way the same deref is valid for both shader stages.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen
d0978427cb radv/gfx10: Use new uconfig reg index packet for GFX10+.
Otherwise the hardware/firmware seems to not set the registers.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen
aeb5b1a998 radv/gfx10: Set MEM_ORDERED flags on shaders.
Scattered because depending on stage they are at offset 24/25/27/30.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
67b6888d8b radv/gfx10: emit GE_CNTL instead of IA_MULTI_VGT_PARAM for legacy mode
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
74d69299d1 radv/gfx10: double the number of tessellation offchip buffers per SE
Each gfx10 shader engine corresponds to two gfx9 shader engines, so scale
the number of offchip buffers accordingly.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
bf1e1a29c3 radv/gfx10: require LLVM 9+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
0f769ed398 radv/gfx10: disable geometry and tessellation shaders
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
fe4419d3c7 radv/gfx10: disable binning
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
faf27ee9b3 radv/gfx10: disable CLEAR_STATE
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
698f9e6fd3 radv/gfx10: disable VK_EXT_transform_feedback
It requires a bunch of work, so disable for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
2141e6fc73 radv/gfx10: set user data base registers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
7e43022e8c radv/gfx10: add gfx10_cs_emit_cache_flush
The cache flush logic on GFX10 is quite different and it's
implemented with a new function.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
b0b6e27bca radv/gfx10: set the DCC constant encoding flag
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
ce3b5d4c17 radv/gfx10: do not declare streamout SGPRS
Streamout is completely different on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
352365c5e2 radv/gfx10: do not set stream output shader config
Transform feedback is really different on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
3f68329806 radv/gfx10: emit VGT_VERTEX_REUSE_BLOCK_CNTL during gfx initialization
The value doesn't need to be updated for tess.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
2a83154b4a radv/gfx10: update shader-related fields in si_emit_graphics()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
5556f16609 radv/gfx10: implement si_emit_compute()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
c90f46700d radv/gfx10: mask DCC tile swizzle by alignment
DCC alignment can be less than the alignment of the main surface. In that
case, the DCC tile swizzle needs to be masked accordingly. Should have no
impact on pre-gfx10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
b1b60a92b1 radv/gfx10: initialize GE_{MAX,MIN}_VTX_INDX/INDX_OFFSET
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
12a42c2d9f radv/gfx10: implement radv_flush_vertex_descriptors() change
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset
0ca09a7fe3 radv/gfx10: implement fill_geom_tess_rings()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:31 +02:00
Samuel Pitoiset
ebeb319f0e radv/gfx10: implement radv_CmdBindDescriptorSets()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:31 +02:00
Samuel Pitoiset
97891a0d10 radv/gfx10: implement write_buffer_descriptor()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:31 +02:00
Samuel Pitoiset
bdd8acde02 radv/gfx10: use the correct register for image descriptor dumping
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:31 +02:00
Samuel Pitoiset
e5a8f21b0e radv/gfx10: implement radv_pipeline_generate_hw_hs()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:31 +02:00
Samuel Pitoiset
4c82094b7b radv/gfx10: implement radv_fill_shader_variant()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:39 +02:00
Samuel Pitoiset
b144a70ca8 radv/gfx10: implement radv_pipeline_generate_geometry_shader()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:39 +02:00
Samuel Pitoiset
5551d6d6ea radv/gfx10: implement radv_init_sampler()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:39 +02:00
Samuel Pitoiset
4c31f3dcc0 radv/gfx10: fix PS exports for SPI_SHADER_32_AR
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:39 +02:00
Samuel Pitoiset
8574a84291 radv/gfx10: implement radv_get_device_name()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
863727c4a3 radv/gfx10: set RADV_FORCE_FAMILY
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
34b185cc43 radv/gfx10: fix a possible hang with exp pos0 with done=0 and exec=0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
b3a53de5fa radv/gfx10: set PA_SC_TILE_STEERING_OVERRIDE
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
96cd24588b radv/gfx10: set cache control registers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
9a01eded0c radv/gfx10: set llvm_has_working_vgpr_indexing
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
6b9dbb28ef radv/gfx10: update DB_DFSM_CONTROL register
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
2435b571de radv/gfx10: update DB_Z_INFO register
GFX10 uses the same register as GFX8.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
cfaad5e3ca radv/gfx10: implement radv_emit_global_shader_pointers()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
3f5ca22e9c radv/gfx10: implement radv_emit_tess_factor_ring()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
17048c1765 radv/gfx10: implement radv_emit_fb_ds_state()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
2481ac81d3 radv/gfx10: implement radv_initialise_ds_surface()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
c2a5d98148 radv/gfx10: implement radv_emit_fb_color_state()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
e80f189de0 radv/gfx10: implement radv_initialise_color_surface()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
ee8d6a2a6c radv/gfx10: implement radv_init_dcc_control_reg()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
ccce8f5915 radv/gfx10: implement radv_make_buffer_descriptor()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
549d0aeee4 radv/gfx10: implement si_set_mutable_tex_desc_fields()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
bf11f1c3a4 radv/gfx10: add gfx10_make_texture_descriptor
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
3dc5ec5d16 radv/gfx10: generate gfx10_format_table.h
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
9c1266048f radv/gfx10: increase maximum number of layers to 8192
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
0213fe09b8 radv/gfx10: increase maximum number of levels to 14
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
1f82007a9e radv/gfx10: set MAX_ALLOC_COUNT
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
c3459968cd ac/nir: unpacked GS invocation ID on GFX10+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Samuel Pitoiset
4d7c420a94 ac: add missing formats to ac_get_tbuffer_format() for GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Lionel Landwerlin
8f0f727fe4 vulkan/overlay: fix command buffer stats
Begin/Reset of command buffer both reset the content of the command
buffer. Don't forget to wipe them on Begin.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4438188f49 ("vulkan/overlay: record stats in command buffers and accumulate on exec/submit")
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 15:47:54 +03:00
Lionel Landwerlin
5493ec3c19 anv: manually add KHR_display to the list of platforms
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 38305e6c94 ("anv: replace hard-coded platform list with vk.xml parse")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111078
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-07 15:34:09 +03:00
Dave Airlie
002c8cae44 docs/features: add shader buffer and atomic support for llvmpipe 2019-07-07 16:24:21 +10:00
Dave Airlie
2f8cbdfc88 llvmpipe: enable ARB_shader_storage_buffer_object
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:24:17 +10:00
Dave Airlie
df46b3d196 llvmpipe: add support for shader buffer binding.
This add support for setting shader buffers and passing them
to draw or binding them to the fragment shader jit.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:24:12 +10:00
Dave Airlie
d8fb66a3e1 draw: add shader buffer interfaces.
This adds the interface to add mapped shader buffers,
and sets up the jit linkage for them.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:24:09 +10:00
Dave Airlie
b5ac381d8f gallivm: add buffer operations to the tgsi->llvm conversion.
This adds load, store and atomic operations. These operations
have to respect the exec_mask, and can't operate in lanes where
the execute is off. This is needed to avoid side effects seen
outside the shaders.

There is also bounds checking on the ssbo accesses vs the size
ptr.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:24:05 +10:00
Dave Airlie
a845baff16 gallivm: move mask_vec function up higher so it can be reused.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:24:01 +10:00
Dave Airlie
ab807859ea tgsi: denote which load/store/atomic channels are unsigned
llvmpipe will need this info.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:23:54 +10:00
Dave Airlie
e21007f426 llvmpipe: add support for ssbo to the fragment shader jit.
This just adds the ssbo ptrs to the jit fragment shader api.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:23:51 +10:00
Dave Airlie
69ff738eb0 draw: add support for ssbo ptrs to jit tables.
This adds ssbo/num_ssbo ptrs to the vs/gs jit tables.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:23:46 +10:00
Dave Airlie
e84570ba70 gallivm: add some basic SSBO limits. (v2)
v2: update ssbo size

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:23:44 +10:00
Dave Airlie
7c3807c1b3 util: add util_copy_shader_buffer.
This just adds an inline to copy a pipe_shader_buffer.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:23:40 +10:00
Dave Airlie
5ff697aa65 gallivm: add ssbo pointers to the soa build api.
Need to pass ssbo + ssbo size pointers just like constants.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:23:36 +10:00
Dave Airlie
2a55acbc1d gallivm: add compare exchange wrapper
This just pulls the wrapper from LLVM for older versions

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:23:32 +10:00
Dave Airlie
4f709c86a9 vertex shader: add exec masking (v2)
As suggested by Roland this is just a compare of fetch_max
vs the counter, much simpler than my original spaghetti code.

We require the vertex shader to have an exec mask to get proper
ssbo/image load/atore/atomics semantics

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:23:27 +10:00
Alexandros Frantzis
4271430dd7 virgl: Hide internal virgl_resource functions
Since the transition to virgl_resource_transfer_map(), several
previously public virgl_resource functions are not required to be public
anymore.

We also move the functions earlier in the file so they can be used
without functions declarations.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-07-06 19:30:38 -07:00
Alexandros Frantzis
e5b54d0018 virgl: Use virgl_resource_transfer_map for textures
Replace custom texture map code (for maps which don't require resolve)
with virgl_resource_transfer_map.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-07-06 19:30:37 -07:00
Alexandros Frantzis
f8975f8f2f virgl: Use virgl_resource_transfer_map for buffers
Replace custom buffer map code with virgl_resource_transfer_map.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-07-06 19:30:34 -07:00
Alexandros Frantzis
bb0a38d819 virgl: Introduce virgl_resource_transfer_map
Normal mapping of buffers and textures uses almost identical logic.
This commit extracts the this logic in the form of the
virgl_resource_transfer_map() helper function.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-07-06 19:30:22 -07:00
Jason Ekstrand
4633298fd6 iris: Use a uint16_t for key sizes
sizeof(struct brw_vs_prog_key) == 324.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-04 19:52:34 -05:00
Marek Olšák
aa5dab27f9 ac: destroy passes in ac_destroy_llvm_compiler
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-04 15:39:04 -04:00
Marek Olšák
ea64d66fde ac: use an LLVM fence instead of s.waitcnt when possible
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-04 15:39:03 -04:00
Marek Olšák
14450c8c41 ac: remove unused AC_WAIT_EXP
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-04 15:39:01 -04:00
Marek Olšák
fe5dbe75b2 ac: only set ac_dlc in ac_llvm_build.c
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-04 15:39:00 -04:00
Marek Olšák
8a71f60194 ac: replace glc,slc with cache_policy for loads
cosmetic change

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-04 15:38:56 -04:00
Marek Olšák
a29e781961 ac: replace glc,slc with cache_policy for stores
cosmetic change

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-04 15:38:54 -04:00
Jonathan Marek
5feb8adb0f etnaviv: implement buffer compression
Vivante GPUs have lossless buffer compression using the tile-status bits,
which can reduce memory access and thus improve performance.

This patch only enables compression for "V4" compression GPUs, but the
implementation is tested on GC2000(V1) and GC3000(V2). V1/V2 compresssion
looks absolutely useless, so it is not enabled.

I couldn't test if this patch breaks MSAA, because it looks like MSAA is
already broken.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-04 14:05:18 -04:00
Jonathan Marek
f6a0d17abe etnaviv: detect v4 compression
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-04 14:05:18 -04:00
Jonathan Marek
e910acb3f2 etnaviv: rs: don't use etna_compatible_rs_format when possible
This mirrors the change in blt. RS cares about this for msaa/compression.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-04 14:05:18 -04:00
Jonathan Marek
66411521ea etnaviv: combine translate_ts_sampler_format/translate_msaa_format
Both translate the same thing, so just add the missing cases into one.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-04 14:05:18 -04:00
Jonathan Marek
84c87f40fb etnaviv: fix compression format not set correctly in TS_MEM_CONFIG
VIVS_TS_MEM_CONFIG_COLOR_COMPRESSION_FORMAT() needs to be used.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-04 14:05:18 -04:00
Jonathan Marek
53475c85fd etnaviv: set correct ts_clear_value for BLT engine
BLT engine uses all ones to clear TS, set ts_clear_value to match that.
Note: ts_clear_value is never used with BLT engine.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-04 14:05:18 -04:00
Jonathan Marek
7c7eaaed4a etnaviv: remove initial CPU ts clear
Since we have "ts_valid" to avoid using uncleared ts, this memset serves
no purpose. Also it is broken because it doesn't use cpu_prep/cpu_fini.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-04 14:05:18 -04:00
Jonathan Marek
95d937852e etnaviv: implement TS_MODE for GC7000L
GC7000L has a TS mode with larger tiles, which improves performance.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-04 14:05:18 -04:00
Jonathan Marek
bc5ae6a330 etnaviv: fix ts size calculation
The size of the TS is screen->specs.bits_per_tile bits per tile, with each
tile being 64 bytes of the resource.

This gives the same result for 32bpp formats, but reduces the size of TS
for 16bpp formats by 2.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-04 14:05:09 -04:00
Jonathan Marek
2f540745ad etnaviv: update headers from rnndb
Update to etna_viv commit 8a8b13a and use new names in the code.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-07-04 14:04:47 -04:00
Eric Engestrom
c314ba2c26 scons: s/HAVE_NO_AUTOCONF/HAVE_SCONS/
Back when autotools and scons were the two build systems, it kinda made
sense to call scons "not autoconf", but autoconf's been gone for a while
now and other build systems have been added (android.mk and meson), so
the name really doesn't make any sense anymore.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-04 16:41:23 +01:00
Bas Nieuwenhuizen
bbbcb49f9b radeonsi: Fix some warnings.
../mesa/src/gallium/drivers/radeonsi/si_compute_blit.c: In function ‘si_clear_buffer’:
../mesa/src/gallium/drivers/radeonsi/si_compute_blit.c:195:11: warning: unused variable ‘clear_alignment’ [-Wunused-variable]
  unsigned clear_alignment = MIN2(clear_value_size, 4);
           ^~~~~~~~~~~~~~~
[23/60] Compiling C object 'src/gallium/drivers/radeonsi/3cdc30e@@radeonsi@sta/si_compute_prim_discard.c.o'.
../mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c: In function ‘si_prepare_prim_discard_or_split_draw’:
../mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c:1106:7: warning: unused variable ‘compute_has_space’ [-Wunused-variable]
  bool compute_has_space = sctx->ws->cs_check_space(cs, need_compute_dw, false);

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-04 11:12:27 +00:00
Nicolai Hähnle
cb07f91489 amd/common: move ac_shader_{binary,reloc} into r600 and rename
They are no longer used by radeonsi or radv.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-04 10:52:26 +00:00
Nicolai Hähnle
510e74ff48 amd/common: removed unused ac_shader_binary functions
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-04 10:52:26 +00:00
Nicolai Hähnle
b398230e6d amd/common: remove unused ac_compile_module_to_binary
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen
6a220e67ce radv: Switch to using rtld.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen
5ff651c0a7 radv: Move more stuff to variant create time.
Due to them depending on the linker result.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen
726a31df70 radv: Add the concept of radv shader binaries.
This simplifies a bunch of stuff by
(1) Keeping all the things in a single allocation, making things easier
 for the cache.
(2) creating a shader_variant creation helper.

This is immediately put to use by creating rtld shader binaries. This
is the main reason for the binaries, as we need to do the linking at
upload time, i.e. post caching. We do not enable rtld yet.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen
43f2f01cc8 radv: Add export_prim_id to the shader variant info.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen
15046ef7c8 radv: use last nir shader to determine stage in postprocessing
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen
7469516244 radv: Merge rsrc1/rsrc2 fields with the config fields.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Andres Gomez
4000428ada vulkan: Update headers to 1.1.113
Some headers were not dragged in the last update(s).

Fixes: 465ec0b145 ("vulkan: Update the XML and headers to 1.1.113")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-04 10:37:52 +00:00
Samuel Pitoiset
cce2645810 radv: do not crash when generating binning state for unknown chips
These values are only useful if binning is disabled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-04 12:22:46 +02:00
Samuel Pitoiset
8a425e057d radv: fix potential crash in the compute resolve path
If the destination attachment is UNUSED.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-04 12:22:43 +02:00
Tomeu Vizoso
0cc02c9ea6 panfrost: Take into account off-screen FBOs
In that case, ctx->pipe_framebuffer.cbufs[0] can be NULL.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Fixes: 5375d009be ("panfrost: Pass referenced BOs to the SUBMIT ioctls")
2019-07-04 10:48:09 +02:00
Christian Gmeiner
f39a7fd627 util/macros: rework DIV_ROUND_UP macro
Simplify used math.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-07-04 10:21:32 +02:00
Christian Gmeiner
e519d3c239 gitlab-ci: bump required libdrm version
Fixes following build problem:
 Message: libdrm 2.4.99 needed because amdgpu has the highest requirement
 Dependency libdrm_intel found: NO found '2.4.97' but need: '>=2.4.99'
 Dependency libdrm_intel found: NO

 meson.build:1178:4: ERROR:  Invalid version of dependency, need 'libdrm_intel' ['>=2.4.99'] found '2.4.97'.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-07-04 09:55:10 +02:00
Kenneth Graunke
9ea67f0a79 iris: Fix MOCS for grid surface
Hardcoding 4 is bad; we have a function for this now.
2019-07-03 22:24:50 -07:00
Kenneth Graunke
10560f8506 iris: Minor tidying 2019-07-03 22:24:44 -07:00
Marek Olšák
6ab23805c3 Revert "mesa/st: Passthrough scissor when clearing by quad"
This reverts commit 0a88aa3025.

It breaks a lot of piglit tests.
2019-07-04 01:08:02 -04:00
Marek Olšák
8dfdf5aae4 gallium/u_blitter: add return to fix the build 2019-07-03 23:44:14 -04:00
Alyssa Rosenzweig
0a88aa3025 mesa/st: Passthrough scissor when clearing by quad
The scissor state -is- setup, but the scissor test is not enabled. This
can prevent certain optimizations from occurring on tilers where
unaffected tiles are thrown out entirely.

v2: Only enable scissor test if the scissor test is actually set by the
app, to avoid regressing quad-based clears used for other reasons (like
a color mask).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-03 14:33:46 -07:00
Nicolai Hähnle
8845a23698 amd: add NAVI10 PCI IDs
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
92e34568b7 radeonsi/gfx10: fix legacy GS
LLVM doesn't insert s_waitcnt_vscnt before GS_DONE.

There was also the crash in legacy GS copy shader.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
dfa8e758c2 radeonsi/gfx10: disable clear state
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
0dd57f0fc0 radeonsi/gfx10: disable DPBB
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
815fd77a47 radeonsi/gfx10: disable SDMA
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
f66ee5af2f radeonsi: determine the rasterization primitive type accurately (v2)
v2: reworked version to fix bugs and make it more efficient

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
a4b3eea325 radeonsi/gfx10: consolidate & improve input_prim determination for NGG
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
969e5176c2 ac: rework ac_build_waitcnt for gfx10
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
214ddfb688 radeonsi/gfx10: implement si_shader_vs
Only used with tessellation + GS instancing.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
6cf2fb1fc4 radeonsi/gfx10: unpack GS invocation ID
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
32694456f7 radeonsi/gfx10: jump over the shader query atomic if the queries are disabled
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
244a8e6798 radeonsi/gfx10: cosmetic changes
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
09a905d930 radeonsi/gfx10: set cache control registers
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
b680f723f8 radeonsi/gfx10: export correct PrimitiveID from NGG vertex shaders
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
3203a74dcb radeonsi/gfx10: set PA_SC_TILE_STEERING_OVERRIDE
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
07aacdbfd5 radeonsi/gfx10: add a workaround for stencil HTILE with mipmapping
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
51db950419 radeonsi/gfx10: disable DCC with MSAA
It was only enabled for 2x MSAA anyway.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
6920f09f4b radeonsi/gfx10: fix GL_LINE polygon mode for decomposed primitives
We need to tell PA to accept edge flags generated by the input assembler,
because decomposed primitives shouldn't draw inner edges.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
e39d4594da radeonsi/gfx10: fix NGG GS color clamping
Just need to pass the input from ES to GS. Everything else is done.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
40e7c65590 radeonsi/gfx10: fix vertex color clamping for TES
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
cc7875150a radeonsi/gfx10: unbind NGG shaders when destroyed
This fixes glsl-max-varyings, which creates shaders, draws, and then
destroys them.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
b90ddff477 radeonsi/gfx10: don't use the GS workaround for triangle strips w/ adjancency
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
c3ac22a620 radeonsi/gfx10: don't do the query buffer atomic for blit shaders
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
adbec817d3 radeonsi/gfx10: update spi_map if API VS (as NGG) changes and PS doesn't
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
1e39c21c23 radeonsi/gfx10: fix a possible hang with exp pos0 with done=0 and exec=0
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
683cf11b81 radeonsi/gfx10: prefetch HW GS when NGG is used
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
76898a8062 amd/common/gfx10: set DLC for llvm.amdgcn.s.buffer.load
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
7f71579064 radeonsi/gfx10: fix PS exports for SPI_SHADER_32_AR
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
4bdf44724f radeonsi/gfx10: set DLC for loads when GLC is set
This fixes L1 shader array cache coherency.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
f81aa6b0c8 radeonsi/gfx10: fix shader images
Don't promote 2D image instructions to 3D, and don't set z=BASE_ARRAY.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
7c805a7c67 radeonsi/gfx10: set the DCC constant encoding flag
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
6eb219e963 radeonsi/gfx10: fix intensity formats
move the ALPHA_IS_ON_MSB fixup into vi_alpha_is_on_msb

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
6944f99176 radeonsi/gfx10: allocate GDS BOs for streamout
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák
395185912d radeonsi/gfx10: make sure GDS is idle between IBs
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
5ff3aff0d6 radeonsi/gfx10: implement streamout
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
792a638b03 radeonsi/gfx10: implement streamout-related queries
The NGG hardware pipeline doesn't track these statistics automatically,
and in fact *cannot* track them automatically when API geometry shaders
are involved, so we accumulate statistics in the shader using atomic
adds.

This implementation accumulates statistics via the memory system and
the RW buffer descriptor setup. We could use GDS, but since these
atomics aren't latency-sensitive, that basically just trades off
L2$ bandwidth vs. export bus bandwidth. One single memory transaction
per shader workgroup doesn't seem too bad. The result ring buffer in
memory is needed either way to avoid pipeline stalls.

The shader code contains the atomic unconditionally, though the
GFX10_GS_QUERY_BUF is a null buffer when no queries are active. The
atomic is simply discarded by the shader hardware in that case.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
bcd2d2e194 radeonsi/gfx10: enable the workaround for unaligned vertex fetch
Yes, really. Note that non-format buffer loads are unaffected and work
just fine with unaligned pointers (as long as SH_MEM_CONFIG is setup
correctly, which amdgpu ensures).

Fixes e.g. KHR-GL45.vertex_attrib_64bit.vao

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
22b85bfc02 radeonsi/gfx10: re-order the initialization order in si_compile_tgsi_main
It's useful to be able to access gs_ngg_scratch before creating the
main wrapping branch.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
3aa622aab1 radeonsi/gfx10: apply DCC MSAA blend workaround
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
bc25ccfe22 radeonsi/gfx10: implement si_emit_global_shader_pointers
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
6bcc273de8 radeonsi/gfx10: implement si_init_tess_factor_ring
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
2492cfde66 radeonsi/gfx10: initialize EXEC for TES-as-NGG (without geometry shader)
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
591537c7fa radeonsi/gfx10: use correct VGPR for instance ID in LS shader
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
f3b9a37278 radeonsi/gfx10: implement si_shader_hs
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
e4d6b4daae radeonsi/gfx10: implement si_create_sampler_state
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
0bf3e6fae7 radeonsi/gfx10: double the number of tessellation offchip buffers per SE
Each gfx10 shader engine corresponds to two gfx9 shader engines, so scale
the number of offchip buffers accordingly.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
2afd3c421d radeonsi/gfx10: implement get_tess_ring_descriptor
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
d028440f57 radeonsi/gfx10: mask DCC tile swizzle by alignment
DCC alignment can be less than the alignment of the main surface. In that
case, the DCC tile swizzle needs to be masked accordingly. Should have no
impact on pre-gfx10.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
1666ee183e radeonsi/gfx10: implement hardware MSAA resolve
MSAA is only supported for 64KB_{R,Z}_X modes, so the micro tile
optimization that we use on gfx9 and earlier does not work.

Be very explicit about how the swizzle mode of the temporary surface is
selected.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
69c41fb8ff radeonsi/gfx10: fix binding on si_update_scratch_relocs
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
fd8758366b radeonsi/gfx10: set llvm_has_working_vgpr_indexing
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
48810ad02d radeonsi/gfx10: implement load_const_buffer_desc_fast_path
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
1b11fb148c radeonsi/gfx10: take PRIMID from the correct output when exported by GS
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
8060339278 radeonsi/gfx10: change location of instance ID shader input
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
ccdf792910 radeonsi/gfx10: set USER_DATA_ADDR offset for geometry shaders
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
00707922d4 radeonsi/gfx10: implement si_emit_derived_tess_state
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
e0c2a4d58c radeonsi/gfx10: implement si_shader_gs
This is only used in the legacy, non-NGG path.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
2864d53deb radeonsi/gfx10: implement preload_ring_buffers
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
56cab3e996 radeonsi/gfx10: implement si_set_ring_buffer
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
3c1aeb834f radeonsi/gfx10: allow rectangle outputs from NGG primitive shader
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
77e715541c radeonsi/gfx10: emit VGT_GS_OUT_PRIM_TYPE from draw and add it to VS_STATE
With NGG, the VGT_GS_OUT_PRIM_TYPE can change without a shader change.

The VS_STATE is required for both streamout and culling from a vertex
shader without pre-compiling outprim-specific variants.

We could consider compiling specialized variants in the future. We
could also consider compiling the NGG logic as an epilog.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
4ecc39e1aa radeonsi/gfx10: NGG geometry shader PM4 and upload
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
a04aa4be2b radeonsi/gfx10: generate geometry shaders for NGG
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
efe1cd4859 radeonsi/gfx10: use the correct register for image descriptor dumping
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
1ce52c1e37 radeonsi/gfx10: emit GE_CNTL instead of IA_MULTI_VGT_PARAM for legacy mode
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
77c0f9e7ba radeonsi/gfx10: initialize GE_{MAX,MIN}_VTX_INDX/INDX_OFFSET
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
47c9505a92 radeonsi/gfx10: setup registers for OpenGL compute
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
b8d3fd46d6 radeonsi/gfx10: set user data base registers
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
016a465d7d radeonsi/gfx10: implement gfx10_shader_ngg
For pipelines without API GS. We will later expand this to cover NGG
geometry shaders as well.

Note that the vtx offset passed into the GS part is just the
vertex index multiplied by VGT_ESGS_RING_ITEMSIZE.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
d0c204a1e0 radeonsi/gfx10: add NGG registers to si_init_config
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
ae00cae0b7 radeonsi/gfx10: update shader-related fields in si_init_config
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
1dee01ee13 radeonsi/gfx10: implement si_shader_ps
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
612489bd5d radeonsi/gfx10: generate VS and TES as NGG merged ESGS shaders
This does not support geometry shading yet. Also missing are streamout
and NGG-specific optimizations.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
e86256c512 radeonsi/gfx10: distinguish between merged shaders and multi-part shaders
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
4063ea95e9 radeonsi/gfx10: update si_get_shader_name
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
8ec60d3031 radeonsi/gfx10: add as_ngg shader key bit
Also add the shader main part NGG variant, so that in principle
we can switch between legacy in NGG modes.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
40b12c0f5a radeonsi/gfx10: implement si_update_shaders
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
5726ec0d24 radeonsi/gfx10: implement si_build_vgt_shader_config
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
b45c3debe8 radeonsi/gfx10: keep track of whether NGG is used
We always use NGG by default, except when tessellation is enabled with
extreme geometry shader amplification.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
226f650d92 radeonsi/gfx10: document NGG shader stages
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
7bb9bb0540 radeonsi/gfx10: implement gfx10_emit_cache_flush
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
0c6c6810bd radeonsi/gfx10: add si_context::emit_cache_flush
The introduction of GCR_CNTL makes cache flush handling on gfx10
sufficiently different that it makes sense to just use a separate
function.

Since emit_cache_flush is called quite early during context init,
we initialize the pointer explicitly in si_create_context.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
08e2a62b07 radeonsi/gfx10: implement DB registers
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
372652bccc radeonsi/gfx10: set CB registers
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
44adae42ae radeonsi/gfx10: always set up sample locations
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
79b1eaf2fd radeonsi/gfx10: use Z32_FLOAT_CLAMP for upgraded depth textures
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
c049a6f895 radeonsi/gfx10: implement vertex format changes
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
62f73d8214 radeonsi/gfx10: implement si_set_{constant,shader}_buffer
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
21ac1da0d1 radeonsi/gfx10: implement si_make_buffer_descriptor
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
7bc818aef1 radeonsi/gfx10: implement si_set_mutable_tex_desc_fields
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
8598a999ea radeonsi/gfx10: gfx10 can render up to 8192 layers
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
3f2b2b52d0 radeonsi/gfx10: add gfx10_make_texture_descriptor
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
595a7f7c47 radeonsi/gfx10: add pipe_screen::make_texture_descriptor
Texture descriptors in gfx10 are very different.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
4afce5efdd radeonsi/gfx10: determine view->is_integer based on the pipe_format
It was convenient, but NUM_FORMAT no longer exists in gfx10.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
3163db3ba4 radeonsi/gfx10: implement si_is_format_supported
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
0ffa2292b3 radeonsi/gfx10: generate gfx10_format_table.h
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
af29ad7cc6 radeonsi/gfx10: set MAX_ALLOC_COUNT
The number for Vega was copied from PAL and has no effect because of MIN2.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
594010e366 radeonsi/gfx10: require LLVM 9
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
de99e0a563 radeon/vcn: update for new vcn enc interface
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
9ab1e427bb radeonsi: enable jpeg decode for navi10
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
6480c7b577 radeon/vcn: implement vcn 2.0 jpeg decode
Use direct register to implement vcn 2.0 jpeg deocde

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
0cd7953ece radeon/vcn: add direct register bool
VCN 2.0 uses direct register space where VCN 1.0 uses some indirect registers

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
7a5c22d32a radeon/vcn: add defines for vcn 2.0 jpeg
Add neccesary register defines for vcn 2.0 jpeg deocde

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
0c27971157 radeon/vcn: use variable to assign ib cmd
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
587b9c5dae radeon/vcn: implement vcn 2.0 encode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
40e1bed389 radeon/vcn: add vcn2.0 encode skeleton
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
(v2: build fix -- Nicolai)
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
8f6272d494 radeon/vcn: move vcn1.0 specific defines to c
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
b5287a9fa6 radeon/vcn: assign function pointer with ib functions
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
9940a6e066 radeon/vcn: add function pointer for ib functions
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
c6b5188505 radeon/vcn: move header related algorithm to vcn_enc
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
dd46740bc2 radeon/vcn: move add buf func to common file
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Boyuan Zhang
e6ca4d1bd8 radeon/vcn: move cs defines to enc header file
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Leo Liu
874881b26b radeon/vcn: add VP9 support for Navi10
It requires bigger DPB and context buffers

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Leo Liu
9bbb546c4f radeonsi: enable encode support for newer HW
Previously it was Raven only allowed to do so

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Leo Liu
d6acd29c9a radeon/vcn: add VCN2 set of internal registers for IB
From VCN2.0, the RBC have different views on the registers

Signed-off-by: Leo Liu <leo.liu@amd.com>
(v2: rebase -- Nicolai)
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Leo Liu
a38268ea5b radeonsi/uvd: allow newer HW to create HW decoder
Previously it was Raven only allowed to do so

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
84e7ee421f ac/surface/gfx10: allow "rotated" micro mode
Standard mode does not support DCC.

The R is retconned to "render target" on gfx10.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
a66be784c3 ac/surface/gfx10: DCC is only supported with SW_64KB_{Z,R}_X modes
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
97ddcfff7c amd/addrlib/gfx10: forbid DCC for swizzle modes which the hardware does not support
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
9eb4a79345 amd/addrlib/gfx10: fix assertion in Addr2IsValidDisplaySwizzleMode
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
6d416ac7e1 amd/common/gfx10: print gfx10 registers in debug dumps
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
70fd27d1e3 amd/common/gfx10: CMASK is only used for FMASK
All regular color compression is done via DCC.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
b52bf8f12a amd/common/gfx10: support new tbuffer encoding
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
c067aaa580 amd/common/gfx10: pad shader buffers for instruction prefetch
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
227c29a80d amd/common/gfx10: implement scan & reduce operations
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
7ba80c1d19 amd/common/gfx10: add GS_ALLOC_REQ message define
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
4c364c89e2 amd/common/gfx10: print out GCR_CNTL as part of {ACQUIRE,RELEASE}_MEM
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
74a26af913 amd/common/gfx10: add register JSON
A small number of fields now need new disambiguation.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
536782b0b7 amd/common: add GFX10 chips
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Marek Olšák
677bb80c98 meson: require libdrm_amdgpu 2.4.99 for Navi
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
db7e7a6cb5 radv: gfx10 is not supported
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Marek Olšák
78cdf9a99f amd/addrlib: add gfx10 support
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
112bf7f900 radeonsi: make emit_streamout_output externally accessible
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
e241b405ca radeonsi: pass the context to query destroy functions
We'll need this in the future.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
064f195ef0 radeonsi: make si_restore_qbo_state externally available
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
04e27ec136 radeonsi: make get_primitive_id externally visible
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
5059a4df8a radeonsi: make si_llvm_export_vs externally available
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
4a774ba893 radeonsi: various si_translate_*format functions only apply to pre-gfx10
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Marek Olšák
c53e6ea05d radeonsi: use a fragment shader blit instead of DB->CB copy for ZS CPU mappings
This mainly removes and simplifies code that is no longer needed.

There were some issues with the DB->CB stencil copy on gfx10, so let's
just use a fragment shader blit for all ZS mappings. It's more reliable.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-07-03 15:51:12 -04:00
Marek Olšák
6686d8a130 gallium/u_blitter: implement copying from ZS to color and vice versa
This is for drivers that can't map depth and stencil and need to blit
them to a color texture for CPU access.

This also useful for drivers using separate depth and stencil.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-07-03 15:51:12 -04:00
Marek Olšák
13a5e9d685 gallium/util: rewrite depth-stencil blit shaders
- merge all 3 functions (Z, S, ZS)
- don't write the color output
- read the value from texel.x, then write it to position.z or stencil.y
  (don't use the value from texel.y or texel.z)

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-07-03 15:51:12 -04:00
Marek Olšák
131d40cfc9 st/mesa: accelerate glCopyPixels(STENCIL)
Tested-by: Dieter Nützel
2019-07-03 15:50:04 -04:00
Yevhenii Kolesnikov
65dc4db08e glsl/standalone: meson test for --dump-builder
Added meson test for standalone compiler with --dump-builder option
on builtin texture* functions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107767
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-07-03 12:13:37 -07:00
Sergii Romantsov
9f85b4940c glsl/standalone: exit on unsupported texture functions
glsl/standalone with --dump-builder will exit when unsupported texture
functions are encountered.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107767
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-07-03 12:13:37 -07:00
Pierre-Eric Pelloux-Prayer
ea5b7de138 radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled
gl_SampleMaskIn is 1 when R_028BE0_PA_SC_AA_CONFIG is 0, so this commit rework the conditions
controlling this register.

Before it was set if the sctx->framebuffer had a sample count > 1.

Now we still require this condition, but we also need either:
  - GL_MULTISAMPLE to be enabled
  - to be executing an operation that doesn't depends on GL state using u_blitter.

This fixes the arb_sample_shading/sample_mask piglit tests on radeonsi.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-07-03 14:59:21 -04:00
Brian Paul
7bb3d6acec gallium/u_blitter: enable MSAA when blitting to MSAA surfaces
If we're doing a Z -> Z MSAA blit (for example) we need to enable
msaa rasterization when drawing the quads so that we can properly
write the per-sample values.

This fixes a number of Piglit ext_framebuffer_multisample blit tests
such as ext_framebuffer_multisample/no-color 2 depth combined with
the VMware driver.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-07-03 14:59:15 -04:00
Alexandros Frantzis
e5be4351c2 virgl: Clear the valid buffer range when possible
If we are discarding the whole resource, we don't care about previous contents,
and the resource storage is now unused, either because we have created new
resource storage, or because we have waited for the existing resource storage
to become unused, or because the transfer is unsynchronized.

In the last two cases this commit marks the storage as uninitialized, but only
if the resource is not host writable (in which case we can't clear the valid
range, since that would result in missed readbacks in future transfers).

In the first case, when the whole resource discard involves a reallocation, the
reallocation and subsequent rebinding already update the valid buffer range
appropriately.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-07-03 09:59:55 -07:00
Jan Zielinski
243db4980c swr/swr: Enable ARB_viewport_array
The rasterizer core supported ARB_viewport_array,
but the swr layer connecting core to Gallium state
tracker only allowed one viewport.

We add support for multiple viewports to swr layer.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-07-03 14:43:28 +02:00
Bas Nieuwenhuizen
c6cb9b197d radv: Support VK_EXT_queue_family_foreign.
Basically same as external for now.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Only case we might need to handle differently in the near future
is Raven's case of displayable DCC which is not renderable. But
we don't support that yet.
2019-07-03 10:56:21 +00:00
Bas Nieuwenhuizen
8a053254b8 radv: Fix interactions between variable descriptor count and inline uniform blocks.
Fixes: d7e6541cc7 "radv: Only allocate supplied number of descriptors when variable."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-03 10:43:35 +00:00
Michel Dänzer
11a3679e3a winsys/amdgpu: Make KMS handles valid for original DRM file descriptor
Getting a DMA-buf fd and converting that to a handle using our duplicate
of that file descriptor (getting at which requires passing a
radeon_winsys pointer to the buffer_get_handle hook) makes sure of this,
since duplicated file descriptors reference the same file description
and therefore the same GEM handle namespace.

This is necessary because libdrm_amdgpu may use a different DRM file
descriptor with a separate handle namespace internally, e.g. because it
always reuses any existing amdgpu_device_handle for the same device.
amdgpu_bo_export returns a handle which is valid for that internal
file descriptor.

Bugzilla: https://bugs.freedesktop.org/110903
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-03 09:19:07 +00:00
Michel Dänzer
cb446dc0fa winsys/amdgpu: Add amdgpu_screen_winsys
It extends pipe_screen / radeon_winsys and references amdgpu_winsys.
Multiple amdgpu_screen_winsys instances may reference the same
amdgpu_winsys instance, which corresponds to an amdgpu_device_handle.

The purpose of amdgpu_screen_winsys is to keep a duplicate of the DRM
file descriptor passed to amdgpu_winsys_create, which will be needed
in the next change.

v2:
* Add comment in amdgpu_winsys_unref explaining why it always returns
  true (Marek Olšák)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-03 09:19:07 +00:00
Michel Dänzer
6fce296400 winsys/amdgpu: Use amdgpu_winsys helper instead of open-coded casts
Cleanup to prevent breakage with the next change, no functional change
intended in this one.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-03 09:19:07 +00:00
Juan A. Suarez Romero
e06bc0b166 intel: fix wrong format usage
Do not use the view format when filling the surface state.

Fixes dEQP-VK.image.texel_view_compatible.compute.extended.texture.*

Fixes: fb1350c76f ("intel: Add and use helpers for level0 extent")

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-03 10:14:54 +02:00
Samuel Pitoiset
a7b6a869a7 radv: only allocate a 32-bit value for the TC-compat range metadata
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 08:52:01 +02:00
Samuel Pitoiset
6baa453dd5 radv: remove unused code in radv_update_tc_compat_zrange_metadata()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 08:51:58 +02:00
Samuel Pitoiset
a21f23c811 radv: add radv_get_depth_pipeline() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 08:51:42 +02:00
Mike Blumenkrantz
e005470466 iris: assert isl_surf_init success in resource_from_handle
this can fail unexpectedly due to bugs, so it's good to provide feedback
when this occurs

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-07-02 15:39:44 -07:00
Jason Ekstrand
e708261cb7 anv: Advertise a more accurate minTexelBufferOffsetAlignment
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-02 22:28:44 +00:00
Jason Ekstrand
0bc657f2db anv: Implement VK_EXT_texel_buffer_alignment
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-02 22:28:44 +00:00
Jason Ekstrand
465ec0b145 vulkan: Update the XML and headers to 1.1.113
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-02 22:28:44 +00:00
Caio Marcelo de Oliveira Filho
050eb6389a spirv: Ignore ArrayStride in OpPtrAccessChain for Workgroup
From OpPtrAccessChain description in the SPIR-V spec (1.4 rev 1):

    For objects in the Uniform, StorageBuffer, or PushConstant storage
    classes, the element’s address or location is calculated using a
    stride, which will be the Base-type’s Array Stride when the Base
    type is decorated with ArrayStride. For all other objects, the
    implementation will calculate the element’s address or location.

For non-CL shaders the driver should layout the Workgroup storage
class, so override any explicitly set ArrayStride in the shader.  This
currently fixes only the lower_workgroup_access_to_offsets case, which
is used by anv.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
2019-07-02 12:15:01 -07:00
Karol Herbst
95a7fd0f10 nouveau: handle new CAPS
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-07-02 20:09:44 +02:00
Jason Ekstrand
fa869f45c8 intel/fs: Use nir_lower_interpolation on gen11+
On gen11, the removed the PLN instruction so we have to emit a pile of
MAD to emulate it.  We may as well do that in NIR so we can optimize and
later schedule it.

Shader-db results on Ice Lake:

    total instructions in shared programs: 17145644 -> 16556440 (-3.44%)
    instructions in affected programs: 11507454 -> 10918250 (-5.12%)
    helped: 35763
    HURT: 42085
    helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18
    helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49%
    HURT stats (abs)   min: 1 max: 248 x̄: 2.22 x̃: 2
    HURT stats (rel)   min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47%
    95% mean confidence interval for instructions value: -7.67 -7.47
    95% mean confidence interval for instructions %-change: -4.46% -4.29%
    Instructions are helped.

    total loops in shared programs: 4370 -> 4370 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 360624645 -> 368220857 (2.11%)
    cycles in affected programs: 269631244 -> 277227456 (2.82%)
    helped: 15583
    HURT: 65874
    helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32
    helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44%
    HURT stats (abs)   min: 1 max: 238638 x̄: 133.87 x̃: 20
    HURT stats (rel)   min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97%
    95% mean confidence interval for cycles value: 67.42 119.09
    95% mean confidence interval for cycles %-change: 3.61% 3.73%
    Cycles are HURT.

    total spills in shared programs: 8943 -> 8981 (0.42%)
    spills in affected programs: 1925 -> 1963 (1.97%)
    helped: 44
    HURT: 14

    total fills in shared programs: 21815 -> 21925 (0.50%)
    fills in affected programs: 3511 -> 3621 (3.13%)
    helped: 41
    HURT: 18

    LOST:   70
    GAINED: 14

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Jason Ekstrand
2b79a9e5a5 intel/fs: Implement nir_intrinsic_load_fs_input_interp_deltas
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Jason Ekstrand
8e7d066682 intel/fs: Actually implement the load_barycentric intrinsics
If they never get used, dead code should clean them up.  Also, we rework
the at_offset and at_sample intrinsics so they return a proper vec2
instead of returning things in PLN layout.  Fortunately, copy-prop is
pretty good at cleaning this up and it doesn't result in any actual
extra MOVs.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Rob Clark
5787a2dfe3 nir: add pass to lower load_interpolated_input
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Boris Brezillon
5375d009be panfrost: Pass referenced BOs to the SUBMIT ioctls
Instead of manually adding the BOs from the various SLAB pools plus
the one backing the color FB, we insert them in the BO set attached
to the job and let panfrost_drm_submit_job() pass all BOs from this set
to the SUBMIT ioctl.
This means we are now passing all referenced BOs and let the scheduler
wait on referenced BO fences if needed.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 15:00:21 +02:00
Boris Brezillon
3557746e0d panfrost: Make SLAB pool creation rely on BO helpers
There's no point duplicating the code, and it will help us simplify
the bo_handles[] filling logic in panfrost_drm_submit_job().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:59:28 +02:00
Boris Brezillon
c684a79669 panfrost: Add the panfrost_drm_{create,release}_bo() helpers
To avoid the panfrost_memory <-> panfrost_bo dance done in
panfrost_resource_create_bo() and panfrost_bo_unreference().

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon
948fddfc42 panfrost: Move the mmap BO logic out of panfrost_drm_import_bo()
So we can re-use it for the panfrost_drm_create_bo() function we are
about to introduce.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon
8d4afcdacc panfrost: Avoid passing winsys handles to import/export BO funcs
Let's keep a clear split between ioctl wrappers and the rest of the
driver. All the import BO function need is a dmabuf FD and the screen
object, and the export one should only take care of generating a dmabuf
FD out of a BO object. Winsys handle manipulation should stay in the
resource.c file.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon
aa5bc35f31 panfrost: Move BO meta-data out of panfrost_bo
That's what most (all?) implementation seem to do, and my understanding
is that a BO is just a bunch of memory that can be used for anything GPU
related, not only texture/FB resources.

Let's move those meta data in panfrost_resource so we can use
panfrost_bo for all kind of memory allocation and make BO allocation
more consistent.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon
c4f4193ad4 panfrost: Stop exposing internal panfrost_drm_*() functions
panfrost_drm_submit_job() and panfrost_fence_create() are not used
outside of pan_drm.c.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:58:51 +02:00
Boris Brezillon
6ba61324f0 panfrost: Get rid of the "free imported BO" logic
bo->imported was never set to true which means this path was never taken.
Moreover, panfrost_drm_free_imported_bo() is doing missing the munmap()
call which seems wrong because the import BO function calls mmap().

Let's just kill this function along with the ->imported field.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:57:35 +02:00
Boris Brezillon
079aaa9c6d panfrost: Get rid of the panfrost_driver abstraction leftovers
Commit 5f81669d88 ("panfrost: Remove the panfrost_driver abstraction")
left a few things behind, remove them now.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:57:35 +02:00
Boris Brezillon
6608642d21 panfrost: Move scanout res creation out of panfrost_resource_create()
Which improves readability and help us avoid a memory leak.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-07-02 14:57:35 +02:00
Boris Brezillon
873b7b93e8 panfrost: Add the sampled texture BO to the job
Otherwise we get random use-after-{free,unmap} errors.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
Changes in v2:
- Move the panfrost_job_add_bo() call out of the loop
2019-07-02 14:57:35 +02:00
Samuel Pitoiset
6cc213b3c1 radv: enable DCC for layers on GFX8
It's currently only enabled if dcc_slice_size is equal to
dcc_slice_fast_clear_size because the driver assumes that
portions of multiple layers are contiguous but it's not
always true.

Still not supported on GFX9.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:38:02 +02:00
Samuel Pitoiset
233224c7f7 radv: do not enable DCC for mipmapped arrays because performance is worse
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:38:00 +02:00
Samuel Pitoiset
e41e575e24 radv: implement clearing DCC layers on GFX8
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:56 +02:00
Samuel Pitoiset
e47c68b7b0 radv: merge radv_dcc_clear_level() into radv_clear_dcc()
This will help for clearing DCC arrays because we need to know
the subresource range.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:51 +02:00
Samuel Pitoiset
f772fe6a11 radv: add support for decompressing DCC layers with compute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:49 +02:00
Samuel Pitoiset
83297baf2d ac: compute the DCC fast clear size per slice on GFX8
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:44 +02:00
Samuel Pitoiset
6517d226ac ac: compute the size of one DCC slice on GFX8
Addrlib doesn't provide this info. Because DCC is linear, at least
on GFX8, it's easy to compute the size of one slice.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-02 09:37:41 +02:00
Kenneth Graunke
457a55716e iris: Defer closing and freeing VMA until buffers are idle.
There will unfortunately be circumstances where we cannot re-use a
virtual memory address until it's no longer active on the GPU.  To
facilitate this, we instead move BOs to a "dead" list, and defer
closing them and returning their VMA until they are idle.  We
periodically sweep these away in cleanup_bo_cache, which triggers
every time a new object's refcount hits zero.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
2019-07-02 07:23:55 +00:00
Kenneth Graunke
07f3455664 iris: Add an explicit alignment parameter to iris_bo_alloc_tiled().
In the future, some images will need to be aligned to a larger value
than 4096.  Most buffers, however, don't have any such requirement,
so for now we only add the parameter to iris_bo_alloc_tiled() and
leave the others with the simpler interface.

v2: Fix missing alignment in vma_alloc, caught by Caio!

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
2019-07-02 07:23:55 +00:00
Iago Toral Quiroga
042aeffd5b v3d: do not flush jobs that are synced with 'Wait for transform feedback'
Generally, we achieve this by skipping the flush on calls to
v3d_flush_jobs_writing_resource() when we detect that the resource is written
in the current job from a transform feedback write.

The exception to this is the case where the caller is about to map the
resource, in which case we need to flush immediately since we can only emit
'Wait for transform feedback' commands on rendering jobs. We add a parameter
to the function so the caller can identify that scenario.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-02 08:57:20 +02:00
Iago Toral Quiroga
88cbc4f7f6 v3d: emit 'Wait for transform feedback' commands when needed
The hardware can flush transform feedback writes before reads in the same
job by inserting this command.

This patch detects when the rendering state for the current draw call reads
resources that had been previously written by transform feedback in the
same job and inserts the 'Wait for transform feedback' command before
emitting the new draw.

v2 (Eric):
  - this was intended to look at job->tf_write_prscs for TF jobs.
  - clear job->tf_write_prscs after we emit the TF flush.
  - can skip flushes for fragment shader reads from TF.

v3 (Eric):
  - all resources in job->tf_write_prscs are resources written by TF so
   we don't need to check if they are bound to PIPE_BIND_STREAM_OUTPUT.
  - documented optimization opportunity for geometry stages.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-02 08:57:20 +02:00
Iago Toral Quiroga
c7dff0e614 v3d: keep track of resources written by transform feedback
The hardware provides a feature to sync reads from previous transform feedback
writes in the same job so if we use this mechanism we no longer have to flush
the job.

In order to identify this scenario we need a mechanism to identify resources
that are written by transform feedback.

v2: use _mesa_pointer_set_create (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-02 08:57:20 +02:00
Mike Blumenkrantz
c8dcc308cc st/dri: fix typo in format table for GR1616 format
the dri image format here should match the fourcc format

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-01 15:17:10 -07:00
Mike Blumenkrantz
08fc14a979 st/dri: pass dri2_format_mapping directly to dri2_create_image_from_winsys
this makes the entire struct available for use here

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-01 15:16:56 -07:00
Mike Blumenkrantz
2cc85670a7 mesa/st: simplify format usage in st_bind_egl_image
the formats handled in the switch statement will always return an
unknown mesa format, so process them directly and leave the default
case for other/unknown formats

no functional changes

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-01 15:16:43 -07:00
Kenneth Graunke
9b1b971491 iris: Use MI_COPY_MEM_MEM for tiny resource_copy_region calls.
If our resource_copy_region size is a small number of DWords, then
instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after
a CS stall).  We also try and select the optimal batch.

Improves performance in Shadow of Mordor on Low settings at 1920x1080
on Skylake GT4e by 0.689096% +/- 0.473968% (n=4).  It tries to copy
4 bytes of data to a buffer which was most recently used as a writable
compute shader SSBO.  Previously we were switching from compute to the
render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes.

I arbitrarily decided to support 4/8/12/16 bytes.  Jason thinks this
is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.
2019-07-01 13:59:49 -07:00
Bas Nieuwenhuizen
d7e6541cc7 radv: Only allocate supplied number of descriptors when variable.
Fixes: b5e04e9217 "radv: Support allocating variable size descriptor sets."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111019
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-01 20:53:33 +02:00
Eric Engestrom
177c35bf13 egl: simplify loop
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
2019-07-01 19:35:22 +01:00
Eric Anholt
67ffb853f0 sparc: Reuse m_vector_asm.h.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-01 11:14:29 -07:00
Eric Anholt
20294dceeb mesa: Enable asm unconditionally, now that gen_matypes is gone.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 11:14:10 -07:00
Eric Anholt
52a39a332f mesa: Replace gen_matypes with a simple header for V4F/mat layout.
We can greatly simplify our builds by just hardcoding GLvector4f and
GLmatrix's layouts.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-01 11:12:15 -07:00
Eric Anholt
1738b38ce8 matypes: Drop some unused defines.
Most of these haven't been used since the conversion from checked-in
matypes to generation.  By cutting down the generated contents, this
should clarify why the file is generated: we need
architecture-specific offsets to the V4F fields in the asm that uses
it.

v2: Keep matrix offsets to prevent x86 build breakage..

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-07-01 11:09:26 -07:00
Eric Engestrom
1835f30097 meson: drop duplicate source & inc_dir
These two are already pulled from `idep_vulkan_util_headers`.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-07-01 18:53:57 +01:00
Eric Engestrom
04e0ac59b1 swrast: simplify function pointer calls
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-07-01 18:51:49 +01:00
Eric Engestrom
fbf7c38da3 egl/wayland: use bitset.h for formats bit set
Currently only 7 formats are supported, but we don't want the 16 limit
(it's an `unsigned`) to hit us by surprise :]

Let's use bitset.h's BITSET magic to allow us to have any number of
formats, with a static assert to make sure we don't forget to update it.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-01 18:35:54 +01:00
Sagar Ghuge
d5f63990b4 intel/tools: Add assembler unit tests for ROL/ROR instructions
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
e9c35dd7cc intel/tools: Add ROL/ROR support in assembler
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
456557a837 nir: Add lower_rotate flag and set to true in all drivers
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
1e92e83856 intel/compiler: Emit ROR and ROL instruction
v2: Reorder patch (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
80117117bd nir: Add optimization to use ROR/ROL instructions
v2: 1) Add more optimization rules for ROL/ROR (Matt Turner)
    2) Add lowering rules for ROL/ROR (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
81d342e2a1 nir: Add urol and uror opcodes
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
83fdec0f0d intel/compiler: Enable the emission of ROR/ROL instructions
v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support
       align16 mode (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Alyssa Rosenzweig
8d74749f81 panfrost: Implement instanced rendering
We implement GLES3.0 instanced rendering with full support for instanced
arrays (via instance divisors). To do so, we use the new invocation
helpers to invoke a triplet of (1, vertex_count, instance_count), rather
than simply (1, vertex_count, 1). We rewrite the attribute handling code
into a new pan_instancing.c file which handles both the simple LINEAR
case for non-instanced as well as each of the new instancing cases:
MODULO (for per-vertex attributes), POT and NPOT divisors.

As a side effect, we rework how vertex buffers are handled, duplicating
them to be 1:1 with vertex descriptors to simplify instancing code paths
dramatically. This might be a performance regression, but this remains
to be seen; if so, we can always deduplicate later with some added logic
in pan_instancing.c

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:50:57 -07:00
Alyssa Rosenzweig
e9e22546ff panfrost/decode: Compute padded_num_vertices for MODULO
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:49:18 -07:00
Alyssa Rosenzweig
9b97ed1250 panfrost/midgard: Emit type appropriate ld_vary
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:42:56 -07:00
Alyssa Rosenzweig
aa333ac6ad panfrost/midgard: Add unsigned ld/st ops
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:42:55 -07:00
Alyssa Rosenzweig
bbc050b82e panfrost/midgard: Use the appropriate ld_attr type
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:42:55 -07:00
Alyssa Rosenzweig
c9b164f9b5 panfrost: Implement dispatch helpers
Rather than open-coding workgroups_shift_* type fields, we include a
general routine for packing the vertex/tiler/compute descriptor based on
the provided dispatch parameters.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:42:55 -07:00
Alyssa Rosenzweig
8fd748de3d panfrost: Remove ancient comment
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:42:55 -07:00
Alyssa Rosenzweig
9fe4fd8a9c panfrost: Extend software tiling to larger bpp
Should not affect lima.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-01 07:40:19 -07:00
Alyssa Rosenzweig
f2801f7775 panfrost: Rewrite u-interleaving code
Rather than using a magic lookup table with no explanations, let's add
liberal comments to the code to explain what this tiling scheme is and
how to encode/decode it efficiently.

It's not so mysterious after all -- just reordering bits with some XORs
thrown in.

v2: Correct copyright identifier. Fix spelling error. Switch space_4 to
a LUT. Fix comment typo. Use LUT instead of space_x tricks. Fallback on
generic rather than split up unaligned writes.

v3: Correct stride order (fixes crash loading). Correct coordinate
system mishap.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
2019-07-01 07:39:51 -07:00
Rob Clark
02893fe73a freedreno: update generated registers
Corrects the a3xx texconst state for TILE_MODE.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-07-01 06:15:52 -07:00
Samuel Pitoiset
d8b079e4c7 radv: rework how the number of VGPRs is computed
Just a cleanup, it shouldn't change anything.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:27 +02:00
Samuel Pitoiset
e3baa54195 radv: gather if a vertex shaders needs the instance ID
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:24 +02:00
Samuel Pitoiset
17cb7ea6fc radv: fix decompressing DCC levels with compute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:22 +02:00
Samuel Pitoiset
f4d2c47cf6 radv: the number of VGPR_COMP_CNT for GS is expected to be 0 on GFX8
Just move around the switch case. GFX9+ is handled below.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:19 +02:00
Samuel Pitoiset
b4477fa4d4 radv: reduce number of VGPRs for TESS_EVAL if primitive ID is not used
We only need to 2.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:17 +02:00
Samuel Pitoiset
cc50c85e13 radv: make sure to mark the image as compressed when clearing DCC levels
Found while working on DCC for arrays.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:58:56 +02:00
Michel Dänzer
3fd21a6b77 targets/opencl: Add clangASTMatchers library as dependency
Fixes link failure since clang r364424 "[clang/DIVar] Emit the flag for
params that have unmodified value", clangCodeGen depends on
clangASTMatchers now.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-07-01 12:54:40 +02:00
Caio Marcelo de Oliveira Filho
5ad283550b glsl/nir: Lower buffers using Binding instead of Names
When using ARB_gl_spirv, the block names are optional and the uniform
blocks are referred using Bindings instead.  Teach
gl_nir_lower_buffers to handle those.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:27 -05:00
Alejandro Piñeiro
2af2235a32 glspirv: Enable the new deref-base UBO/SSBO path on gl_spirv
Among other things, it supports arrays of arrays of UBO/SSBO (default
codepath doesn't).

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>

v2: nir_address_format_vk_index_offset got renamed to
    nir_address_format_32bit_index_offset (after rebase against master)

v3: the ptr_type fields in spirv_to_nir_options got changed to be of
    type nir_address_format.

v4: remove phys_ssbo_addr_format and push_const_addr_format as they are
    not used by glspirv

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-30 16:58:27 -05:00
Alejandro Piñeiro
cae501b394 i965: call to gl_nir_link_uniform_blocks
When using a SPIR-V shader. Note that needs to be done before linking
uniforms, so when creating the uniform storage entries, block_index
could be filled properly (among other things).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:27 -05:00
Alejandro Piñeiro
678140e195 i965: use GLboolean for all brw_link_shader returns
The function had a mix of true/GL_TRUE and false/GL_FALSE
returns. Using GL_TRUE/GL_FALSE as the function returns a GLboolean.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:27 -05:00
Alejandro Piñeiro
a69a48d65a nir/linker: update already processed uniforms search for UBOs/SSBOs
Until now, we were using the uniform explicit location to check if the
current nir variable was already processed while adding entries on the
uniform storage. But for UBOs/SSBOs, entries are added too but we lack
a explicit location.

For those we need to rely on the UBO/SSBO binding and the unifor
storage block_index. In that case several uniforms would need to be
updated at once.

v2: (from Timothy review)
   * Improve wording and fix typos of some long comments.
   * Rename update_uniform_storage for mark_stage_as_active

v3: (from cmarcelo review)
   * Fixed some comment typos

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:27 -05:00
Alejandro Piñeiro
de05a6ccf5 nir/linker: fill up uniform_storage with explicit data
Specifically, offset, stride (coming from arrays or matrices) and
row_major.

On GLSL, most of that info is computed using the layout qualifier, but
on ARB_gl_spirv they are explicit, and for Mesa, included on the
glsl_type.

From ARB_gl_spirv spec:

   "Mapping of layouts

      std140/std430 -> explicit *Offset*, *ArrayStride*, and
                       *MatrixStride* Decoration on struct members""

    "7.6.2.spv SPIR-V Uniform Offsets and Strides

    The SPIR-V decorations *GLSLShared* or *GLSLPacked* must not be
    used. A variable in the *Uniform* Storage Class decorated as a
    *Block* must be explicitly laid out using the *Offset*,
    *ArrayStride*, and *MatrixStride* decorations"

For offset we needed to include the parent and index_in_parent while
processing the type, as the offset is maintained on glsl_struct_field
of the parent type, not on the type itself.

v2: Fix the default values for MATRIX_STRIDE, ARRAY_STRIDE and
    ROW_MAJOR when the variable is not backed by a buffer object
    (Antia Puentes).

v3: Update after Jason series "SPIR-V: Use NIR deref instructions for
    UBO/SSBO access" that included just one explicit stride, instead
    of a previous patch we wrote that had matrix_stride and
    array_stride (Alejandro)

Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:27 -05:00
Alejandro Piñeiro
eb50d1d2a6 nir/linker: use only the array element type for array of ssbo/ubo
For this interfaces, the inner members are added only once as uniforms
or resources, in opposite to other cases, like a uniform array of
structs.

For those guessing why a issue (16) from ARB_program_interface_query
was used, instead of a quote of the core spec: The core spec is not
really clear about how members of arrays of blocks should be
enumerated.

On GLSL this was also problematic, specially when we were trying to
pass the 4.5 CTS tests. See commit "glsl: Fix program interface
queries relating to interface blocks"
(4c4d9e4f03), as a reference. That one
also needed to rely on issue (16) to justify the change, pointing that
the core spec needs to be clarified.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:26 -05:00
Alejandro Piñeiro
eec1d5f801 nir/linker: fill is_shader_storage for uniforms
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:26 -05:00
Alejandro Piñeiro
5723919282 nir/linker: add gl_nir_link_uniform_blocks.c
Adding the ability to link uniform blocks and shader storage blocks
using NIR, intended for ARB_gl_spirv support. Among other things, this
linking needs to take into account that everything should work without
names, as they could be not present, while the GLSL IR uniform block
linking was wrote with the names on its core.

The other major difference compared with the GLSL IR linker is that we
don't deal with layouts. There are no references to std140, std430,
etc. Layouts are expressed through explicit offset, array stride and
matrix stride. That simplifies how the buffer size are computed. But
also means that we couldn't use the existing methods at glsl_types, so
we needed to implement new methods.

It is worth to note that this linking do a iteration over the
glsl_types, similarly to what the linking uniforms do. A possible
future improvement would be refactor both cases to try to share more
code that it sharing right now. On GLSL IR there are a class visitor,
specialized on each case, for that sharing. As adding a class visitor
on C would more complicated, for now we are just iterating on both.

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>

v2: (from Timothy review)
   * Fix variable name convention
   * Stop to use _function_name convention
   * Don't use // for comments
   * "nir/linker: Keep track of the stages referencing an UBO/SSBO"
     squashed with this patch

v3: (from Caio review)
   * Don't delete the linked shader on failure
   * Use rzalloc_array to avoid some explicit initializations

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-30 16:58:26 -05:00
Alejandro Piñeiro
39f4ef57d6 nir_types: add glsl_type_is_leaf helper
Helper used to know when a glsl_type is a leaf when iteraring through
a complex type. Note that GLSL IR linking also uses the concept of
leaf while doing the same iteration, although in that case it uses a
visitor. See link_uniform_blocks, process_array_leaf and others as
reference.

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>

v2:
   * Moved from gl_nir_linker to nir_types, so it could be used on nir
     xfb gathering (Timothy Arceri)
   * Minor update after Timothy's series about record to struct
     renaming landed master.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:26 -05:00
Alejandro Piñeiro
0019d61527 glsl/nir: add glsl_types::explicit_size plus nir C wrapper
While using SPIR-V shaders (ARB_gl_spirv), layout data is not implicit
to a specific value (std140, std430, etc) but explicitly included on
the type (explicit values for offset, stride and row_major).

So this method is equivalent to the existing std140_size and
std430_size, but using such explicit values.

Note that the value returned by this method is only valid if such data
is set, so when dealing with SPIR-V shaders.

v2: (all changes suggested by Jason Ekstrand)
   * Iterate through all struct members, instead of assume that fields
     are ordered by offset
   * Use else if
   * Take into account the case that explicit_stride > elem_size, to
     fine graine the final size on arrays and matrices
   * Handle different bit-sizes in general, not just 32 and 64.

v3: (change suggested by Caio Marcelo de Oliveira Filho)
   * fix up explicit_size() to consider interface types

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-30 16:58:26 -05:00
Alejandro Piñeiro
c23522add2 glsl_types: add type::bit_size and glsl_base_type_bit_size helpers
Note that the nir_types glsl_get_bit_size is not a wrapper of this
one, because for bools at the nir level, we want to return size 1, but
at the glsl_types we want to return 32.

v2: reuse the new method in order to simplify is_16bit and is_32bit
    helpers (Timothy)

v3: add a comment clarifying the difference between
    glsl_base_type_bit_size and glsl_get_bit_size.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:26 -05:00
Alejandro Piñeiro
12355c7e91 nir: add is_in_ubo/ssbo/block helpers
Equivalent to the already existing ir_variable is_in_buffer_block and
is_in_shader_storage_block, adding the uniform buffer object one. I'm
using the short forms (ssbo, ubo) to avoid having method names too
long.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:26 -05:00
Alejandro Piñeiro
15f134412f spirv/nir: fill up nir variable info for ubos and ssbo
The data for some nir variables is only filled up for some specific
modes. We need now too for UBO/SSBO, as such info would be used when
linking for OpenGL (ARB_gl_spirv).

There is an existing comment just before that code (starts with XXX)
that points that binding still needs to be filled up for uniform
variables at that point, and that should be fixed, although it doesn't
specify why that's a problem or what would be the alternative. For now
doing the same for UBO/SSBO, and will hope that the future fixing is
done for all of them.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:26 -05:00
Alejandro Piñeiro
7d7ab34d5f spirv/nir: create nir variable for UBO/SSBO
Providing nir variables for UBO/SSBO it is not required for Vulkan,
but it is needed for OpenGL (ARB_gl_spirv), like for example, to
gather info from the UBO/SSBO while linking.

In opposite with most cases where the nir variables is created, here
the type assigned is the full type (not just the bare type). This is
needed because while linking using the nir shader we need the explicit
layout info (explicit stride, explicit offset, row_major, etc).

Also, we need to assign an interface type, used also on the OpenGL
linker if it is a UBO/SSBO. See ir_variable::is_in_buffer_block as
example.

v2: assign interface_type to be the variable type, not need to be
    arrayness (Timothy)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-30 16:58:26 -05:00
Gert Wollny
75d8b4e795 vl: Use CS composite shader only if TEX_LZ and DIV are supported
Enable the compute shader copositer only when TEX_LZ is supported by the driver.

v2: Also check whether DIV is supported.

https://bugs.freedesktop.org/show_bug.cgi?id=110783

Fixes: 9364d66cb7
 gallium/auxiliary/vl: Add video compositor compute shader render

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-30 18:41:38 +02:00
Gert Wollny
843723e2f7 gallium: Add CAP for opcode DIV
Not all drivers support TGSI_OPCODE_DIV, so we should have a cap to be able
to check this.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-30 18:41:35 +02:00
Gert Wollny
187c308b96 vl: replace DIV-ADD with MAD using inverse size
Optimize the shader a bit by emitting MAD with the inverse size values
instead of DIV+ADD.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-30 18:41:26 +02:00
Jonathan Marek
89381191a9 etnaviv: blt: blit with the original format when possible
This fixes BGR565 blit: currently BGRA444 is used for the blit, but with
swizzles from the original BGR565 format, so the 4 alpha bits are set to 1.
We can't just use the swizzle from the 'compatible' format, since there are
cases where BGR<->RGB swap needs to happen.

We can avoid all this trouble by using the original formats and only
falling back to the 'compatible' format when we need to.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-29 21:49:50 -04:00
Jonathan Marek
a99a265b14 etnaviv: clear all bits for 24bpp depth without stencil
For fast clear to happen, all bits must be cleared.

This allows using fast clear for 24bpp depth without stencil.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-29 21:49:50 -04:00
Eric Engestrom
74f064ae90 mesa: use binary search for MESA_EXTENSION_OVERRIDE
Not a hot path obviously, but the table still has 425 extensions, which
you can go through in just 9 steps with a binary search.

The table is already sorted, as required by other parts of the code and
enforced by mesa's `main-test`.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-30 01:45:36 +01:00
Eric Engestrom
b738d4494c gitlab-ci: test meson installation
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-29 21:46:37 +00:00
Eric Engestrom
5f9764bc0b anv: fix indentation
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-29 22:41:06 +01:00
Eric Engestrom
42eb85a9d8 anv: fix typo
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-29 22:41:06 +01:00
Eric Engestrom
38305e6c94 anv: replace hard-coded platform list with vk.xml parse
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-29 22:38:54 +01:00
Chih-Wei Huang
bb75c73e96 android: fix typo LOCAL_EXPORT_C_INCLUDES
Should be LOCAL_EXPORT_C_INCLUDE_DIRS.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
2019-06-29 17:17:49 +02:00
Mauro Rossi
c237654dca android: virgl: fix generated virgl_driinfo.h building rules
Changelog in Android makefile:
- Add LOCAL_MODULE_CLASS, intermediates and LOCAL_GENERATED_SOURCES
- Use LOCAL_EXPORT_C_INCLUDE_DIRS to export $(intermediates) path
- Move generated header rules before 'include $(BUILD_STATIC_LIBRARY)'

Fixes the following building error:

In file included from external/mesa/src/gallium/targets/dri/target.c:1:
external/mesa/src/gallium/auxiliary/target-helpers/drm_helper.h:257:16:
fatal error: 'virgl/virgl_driinfo.h' file not found
      #include "virgl/virgl_driinfo.h"
               ^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: cf800998a ("virgl: Add driinfo file and tie it into the build")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Review-by: Chih-Wei Huang <cwhuang@linux.org.tw>
2019-06-29 16:25:01 +02:00
Lionel Landwerlin
5847de6e9a intel/compiler: don't use byte operands for src1 on ICL
The simulator complains about using byte operands, we also have
documentation telling us.

Note that add operations on bytes seems to work fine on HW (like ADD).
Using dwords operands with CMP & SEL fixes the following tests :

   dEQP-VK.spirv_assembly.type.vec*.i8.*

v2: Drop the GLK changes (Matt)
    Add validator tests (Matt)

v3: Drop GLK ref (Matt)
    Don't mix float/integer in MAD (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
BSpec: 3017
Cc: <mesa-stable@lists.freedesktop.org>
2019-06-29 12:56:09 +00:00
renchenglei
500b45a98a egl: Enable eglGetPlatformDisplay on Android Platform
This helps to add eglGetPlatformDisplay support on Android
Platform.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-29 12:20:17 +01:00
Ian Romanick
02c6cd8481 nir/serach: Increase maximum commutative expressions from 4 to 8
No shader-db change on any Intel platform.  No shader-db run-time
difference on a certain 36-core / 72-thread system at 95% confidence
(n=20).

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-28 18:56:19 -07:00
Ian Romanick
1a43cf9a40 nir/algebraic: Don't mark expression with duplicate sources as commutative
There is no reason to mark the fmul in the expression

    ('fmul', ('fadd', a, b), ('fadd', a, b))

as commutative.  If a source of an instruction doesn't match one of the
('fadd', a, b) patterns, it won't match the other either.

This change is enough to make this pattern work:

    ('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)),
                          ('fadd', 1.0, ('fneg', a))),
                 ('fmul', ('flrp', a, 1.0, a), b))

This pattern has 5 commutative expressions (versus a limit of 4), but
the first fmul does not need to be commutative.

No shader-db change on any Intel platform.  No shader-db run-time
difference on a certain 36-core / 72-thread system at 95% confidence
(n=20).

There are more subpatterns that could be marked as non-commutative, but
detecting these is more challenging.  For example, this fadd:

    ('fadd', ('fmul', a, b), ('fmul', a, c))

The first fadd:

    ('fmul', ('fadd', a, b), ('fadd', a, b))

And this fadd:

    ('flt', ('fadd', a, b), 0.0)

This last case may be easier to detect.  If all sources are variables
and they are the only instances of those variables, then the pattern can
be marked as non-commutative.  It's probably not worth the effort now,
but if we end up with some patterns that bump up on the limit again, it
may be worth revisiting.

v2: Update the comment about the explicit "len(self.sources)" check to
be more clear about why it is necessary.  Requested by Connor.  Many
Python fixes style / idom fixes suggested by Dylan.  Add missing (!!!)
opcode check in Expression::__eq__ method.  This bug is the reason the
expected number of commutative expressions in the bitfield_reverse
pattern changed from 61 to 45 in the first version of this patch.

v3: Use all() in Expression::__eq__ method.  Suggested by Connor.
Revert away from using __eq__ overloads.  The "equality" implementation
of Constant and Variable needed for commutativity pruning is weaker than
the one needed for propagating and validating bit sizes.  Using actual
equality caused the pruning to fail for my ('fmul', ('fadd', 1, a),
('fadd', 1, a)) case.  I changed the name to "equivalent" rather than
the previous "same_as" to further differentiate it from __eq__.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-28 18:56:19 -07:00
Ian Romanick
cae1af4339 nir/search: Log Boolean constants instead of asserting
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-28 18:56:19 -07:00
Ian Romanick
8d6b35fffd nir/algebraic: Fail build when too many commutative expressions are used
Search patterns that are expected to have too many (e.g., the giant
bitfield_reverse pattern) can be added to a white list.

This would have saved me a few hours debugging. :(

v2: Implement the expected-failure annotation as a property of the
search-replace pattern instead of as a property of the whole list of
patterns.  Suggested by Connor.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-28 18:56:19 -07:00
Ian Romanick
57704b8d22 nir/algebraic: Fix whitespace error
Trivial

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-28 18:56:19 -07:00
Alyssa Rosenzweig
f8fca4fe61 panfrost: Allow R11G11B10 rendering
Doesn't fully work yet, but better than crashing.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 18:48:13 -07:00
Alyssa Rosenzweig
7692ad19fb panfrost: Default to util_pack_color for clears
This might help as we bringup more render-target formats.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 18:48:13 -07:00
Ian Romanick
b04beaf41d intel/vec4: Try both sources as candidates for being immediates
For some reason, when I first wrote try_immediate_source, I thought the
sources had already been ordered so that the immediate value was the
second source.  That's rubbish.  The generator assumes *neither* source
is immediate, and it relies on later copy/constant propagation passes to
do the reordering.

For this reason, the changes to try_immediate_source have to go to some
efforts to reorder the operands and tell the caller when it reordered
them.  The generator for comparison instructions uses this to determine
when the comparison needs to change (e.g., from GT to LT).

No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

Haswell
total instructions in shared programs: 13484431 -> 13480500 (-0.03%)
instructions in affected programs: 441138 -> 437207 (-0.89%)
helped: 1883
HURT: 0
helped stats (abs) min: 1 max: 49 x̄: 2.09 x̃: 1
helped stats (rel) min: 0.07% max: 8.91% x̄: 1.10% x̃: 0.90%
95% mean confidence interval for instructions value: -2.19 -1.98
95% mean confidence interval for instructions %-change: -1.14% -1.06%
Instructions are helped.

total cycles in shared programs: 376420286 -> 376406400 (<.01%)
cycles in affected programs: 15995668 -> 15981782 (-0.09%)
helped: 1692
HURT: 219
helped stats (abs) min: 2 max: 764 x̄: 13.78 x̃: 4
helped stats (rel) min: <.01% max: 9.69% x̄: 0.69% x̃: 0.35%
HURT stats (abs)   min: 2 max: 516 x̄: 43.09 x̃: 22
HURT stats (rel)   min: 0.02% max: 12.09% x̄: 2.30% x̃: 1.13%
95% mean confidence interval for cycles value: -9.70 -4.83
95% mean confidence interval for cycles %-change: -0.42% -0.28%
Cycles are helped.

total spills in shared programs: 23166 -> 23158 (-0.03%)
spills in affected programs: 66 -> 58 (-12.12%)
helped: 2
HURT: 0

total fills in shared programs: 34592 -> 34580 (-0.03%)
fills in affected programs: 75 -> 63 (-16.00%)
helped: 2
HURT: 0

Ivy Bridge
total instructions in shared programs: 12051590 -> 12048513 (-0.03%)
instructions in affected programs: 355911 -> 352834 (-0.86%)
helped: 1481
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.08 x̃: 1
helped stats (rel) min: 0.07% max: 4.92% x̄: 1.08% x̃: 0.90%
95% mean confidence interval for instructions value: -2.17 -1.98
95% mean confidence interval for instructions %-change: -1.12% -1.04%
Instructions are helped.

total cycles in shared programs: 180319624 -> 180307642 (<.01%)
cycles in affected programs: 15591028 -> 15579046 (-0.08%)
helped: 1340
HURT: 174
helped stats (abs) min: 2 max: 764 x̄: 14.19 x̃: 2
helped stats (rel) min: <.01% max: 8.68% x̄: 0.64% x̃: 0.32%
HURT stats (abs)   min: 2 max: 518 x̄: 40.41 x̃: 14
HURT stats (rel)   min: 0.02% max: 8.37% x̄: 1.59% x̃: 0.67%
95% mean confidence interval for cycles value: -10.85 -4.97
95% mean confidence interval for cycles %-change: -0.45% -0.31%
Cycles are helped.

All Gen6 and earlier platforms had simlar results. (Sandy Bridge shown)
total instructions in shared programs: 10863159 -> 10861462 (-0.02%)
instructions in affected programs: 157839 -> 156142 (-1.08%)
helped: 715
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.37 x̃: 2
helped stats (rel) min: 0.23% max: 4.33% x̄: 1.07% x̃: 0.85%
95% mean confidence interval for instructions value: -2.53 -2.21
95% mean confidence interval for instructions %-change: -1.13% -1.02%
Instructions are helped.

total cycles in shared programs: 153957782 -> 153948778 (<.01%)
cycles in affected programs: 3171648 -> 3162644 (-0.28%)
helped: 696
HURT: 62
helped stats (abs) min: 2 max: 390 x̄: 15.72 x̃: 4
helped stats (rel) min: 0.02% max: 10.57% x̄: 0.57% x̃: 0.12%
HURT stats (abs)   min: 2 max: 300 x̄: 31.29 x̃: 2
HURT stats (rel)   min: 0.11% max: 7.23% x̄: 0.83% x̃: 0.34%
95% mean confidence interval for cycles value: -15.65 -8.11
95% mean confidence interval for cycles %-change: -0.56% -0.36%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-28 18:13:18 -07:00
Ian Romanick
379cf3bb87 intel/vec4: Try immediate sources for dot products too
No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

All Haswell and earlier platforms has similar results. (Haswell shown)
total instructions in shared programs: 13484467 -> 13484431 (<.01%)
instructions in affected programs: 8540 -> 8504 (-0.42%)
helped: 33
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1
helped stats (rel) min: 0.31% max: 1.53% x̄: 0.49% x̃: 0.35%
95% mean confidence interval for instructions value: -1.19 -0.99
95% mean confidence interval for instructions %-change: -0.60% -0.38%
Instructions are helped.

total cycles in shared programs: 376420572 -> 376420286 (<.01%)
cycles in affected programs: 56260 -> 55974 (-0.51%)
helped: 26
HURT: 5
helped stats (abs) min: 2 max: 204 x̄: 11.85 x̃: 2
helped stats (rel) min: 0.11% max: 3.08% x̄: 0.39% x̃: 0.13%
HURT stats (abs)   min: 2 max: 6 x̄: 4.40 x̃: 6
HURT stats (rel)   min: 0.03% max: 0.35% x̄: 0.24% x̃: 0.35%
95% mean confidence interval for cycles value: -22.91 4.45
95% mean confidence interval for cycles %-change: -0.56% -0.02%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-28 17:16:16 -07:00
Ian Romanick
eeebeb211f intel/vec4: Try emitting non-scalar immediates
Sometimes an instruction has a vector as a source, but all of the
components have the same value.  For example,

    vec3 32 ssa_16 = load_const (1.0, 1.0, 1.0)
    ...
    vec3 32 ssa_82 = fadd ssa_16, -ssa_81.xyz

No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

Haswell
total instructions in shared programs: 13487811 -> 13484467 (-0.02%)
instructions in affected programs: 421981 -> 418637 (-0.79%)
helped: 1859
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.80 x̃: 1
helped stats (rel) min: 0.04% max: 9.80% x̄: 1.04% x̃: 0.84%
95% mean confidence interval for instructions value: -1.85 -1.74
95% mean confidence interval for instructions %-change: -1.07% -1.00%
Instructions are helped.

total cycles in shared programs: 376423252 -> 376420572 (<.01%)
cycles in affected programs: 14800970 -> 14798290 (-0.02%)
helped: 1519
HURT: 329
helped stats (abs) min: 2 max: 462 x̄: 10.59 x̃: 4
helped stats (rel) min: 0.03% max: 16.73% x̄: 0.79% x̃: 0.36%
HURT stats (abs)   min: 2 max: 598 x̄: 40.74 x̃: 16
HURT stats (rel)   min: <.01% max: 10.32% x̄: 2.56% x̃: 0.98%
95% mean confidence interval for cycles value: -3.53 0.63
95% mean confidence interval for cycles %-change: -0.30% -0.09%
Inconclusive result (value mean confidence interval includes 0).

total fills in shared programs: 34601 -> 34592 (-0.03%)
fills in affected programs: 91 -> 82 (-9.89%)
helped: 9
HURT: 0

Ivy Bridge
total instructions in shared programs: 12053565 -> 12051626 (-0.02%)
instructions in affected programs: 298103 -> 296164 (-0.65%)
helped: 1228
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.58 x̃: 1
helped stats (rel) min: 0.04% max: 3.57% x̄: 0.91% x̃: 0.81%
95% mean confidence interval for instructions value: -1.63 -1.53
95% mean confidence interval for instructions %-change: -0.95% -0.88%
Instructions are helped.

total cycles in shared programs: 180322270 -> 180319922 (<.01%)
cycles in affected programs: 14123840 -> 14121492 (-0.02%)
helped: 1036
HURT: 195
helped stats (abs) min: 2 max: 462 x̄: 11.93 x̃: 2
helped stats (rel) min: 0.03% max: 14.05% x̄: 0.82% x̃: 0.35%
HURT stats (abs)   min: 2 max: 598 x̄: 51.33 x̃: 16
HURT stats (rel)   min: <.01% max: 9.68% x̄: 3.02% x̃: 0.72%
95% mean confidence interval for cycles value: -4.92 1.10
95% mean confidence interval for cycles %-change: -0.35% -0.07%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10864286 -> 10863189 (-0.01%)
instructions in affected programs: 159722 -> 158625 (-0.69%)
helped: 724
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.52 x̃: 1
helped stats (rel) min: 0.10% max: 2.91% x̄: 0.79% x̃: 0.62%
95% mean confidence interval for instructions value: -1.58 -1.46
95% mean confidence interval for instructions %-change: -0.82% -0.75%
Instructions are helped.

total cycles in shared programs: 153967938 -> 153957926 (<.01%)
cycles in affected programs: 1923186 -> 1913174 (-0.52%)
helped: 654
HURT: 56
helped stats (abs) min: 2 max: 170 x̄: 20.00 x̃: 4
helped stats (rel) min: 0.03% max: 11.82% x̄: 0.89% x̃: 0.18%
HURT stats (abs)   min: 2 max: 390 x̄: 54.75 x̃: 32
HURT stats (rel)   min: 0.05% max: 6.92% x̄: 3.09% x̃: 2.92%
95% mean confidence interval for cycles value: -17.42 -10.78
95% mean confidence interval for cycles %-change: -0.76% -0.40%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8142677 -> 8141721 (-0.01%)
instructions in affected programs: 139511 -> 138555 (-0.69%)
helped: 588
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.63 x̃: 1
helped stats (rel) min: 0.21% max: 4.39% x̄: 0.84% x̃: 0.46%
95% mean confidence interval for instructions value: -1.70 -1.55
95% mean confidence interval for instructions %-change: -0.89% -0.78%
Instructions are helped.

total cycles in shared programs: 188549394 -> 188547676 (<.01%)
cycles in affected programs: 3171960 -> 3170242 (-0.05%)
helped: 527
HURT: 0
helped stats (abs) min: 2 max: 18 x̄: 3.26 x̃: 2
helped stats (rel) min: <.01% max: 0.80% x̄: 0.08% x̃: 0.06%
95% mean confidence interval for cycles value: -3.49 -3.03
95% mean confidence interval for cycles %-change: -0.09% -0.07%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-28 17:16:06 -07:00
Eric Anholt
8fd8964302 nir: Fix lowering of bitfield_insert to shifts.
The bfi/bfm behavior change replaced the bfi/bfm usage in
lower_bitfield_insert_to_shifts with actual shifts like the name says,
but it failed to handle the offset=0, bits==32 case in the new
lowering.

v2: Use 31 < bits instead of bits == 32, to get the 31 < (iand bits,
    31) -> false optimization.

Fixes regressions in dEQP-GLES31.*bitfield_insert* on freedreno.

Fixes: 165b7f3a44 ("nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-06-28 16:38:23 -07:00
Dylan Baker
97c2c4546c Revert "meson: Add support for using cmake for finding LLVM"
This reverts commit 5157a42765.

There is a meson bug that causes llvm to always be statically linked,
which is obviously not what we want. I haven't had time to look into it
yet, but for now let's just revert it.
2019-06-28 16:36:38 -07:00
Dylan Baker
69f9fbab8a Revert "meson: try to use cmake as a finder for clang"
This reverts commit 0ba0c0c15c.
2019-06-28 16:36:27 -07:00
Eric Engestrom
78aa4a3c0a mesa: stop trying new filenames if the filename existing is not the issue
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 23:37:49 +01:00
Eric Engestrom
d02d2b626b mesa: use os_file_create_unique()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 23:37:49 +01:00
Eric Engestrom
1b259f1ae7 util: add os_file_create_unique()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 23:37:49 +01:00
Alyssa Rosenzweig
9de4325b27 panfrost: Disable DXT-style texture compression
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 15:10:05 -07:00
Alyssa Rosenzweig
e8ae998c1b panfrost: Dump unknown formats before aborting
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 15:10:05 -07:00
Alyssa Rosenzweig
68a5b58fb9 panfrost/midgard: Fix 3D texture regression
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 15:10:05 -07:00
Alyssa Rosenzweig
601d4d3157 panfrost: Add some special formats
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 15:10:05 -07:00
Alyssa Rosenzweig
e32af4b5c3 panfrost/midgard: Implement integer sampler
Turns out one of the magic bits in the texture instruction meant
'float'. Different magic bits mean int and uint then :)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 15:09:07 -07:00
Alyssa Rosenzweig
7d30000628 panfrost: Remove dubious assert
We already *can* support texture formats with bpp > 4, so..

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 15:09:07 -07:00
Alyssa Rosenzweig
7f5481258c panfrost: Implement primitive restart
For GLES3, just pass the flag through.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 15:09:07 -07:00
Anuj Phogat
804d1bd111 i965/icl: Apply WA_1606682166 to compute workloads
We missed the workaround for compute workloads in earlier patches.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-28 14:02:13 -07:00
Anuj Phogat
d96cba7754 Revert "iris/icl: Add WA_2204188704 to disable pixel shader panic dispatch"
SLICE_COMMON_CHICKEN3 is a privileged register not accesible from userspace.
This patch silences a simulator warning about it.

We don't need to add this workaround in linux kernel as the WA description
says it's fixed on latest stepping.

This reverts commit 9c421d6b47.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-28 14:02:13 -07:00
Anuj Phogat
387e43b52f Revert "anv/icl: Add WA_2204188704 to disable pixel shader panic dispatch"
SLICE_COMMON_CHICKEN3 is a privileged register not accesible from userspace.
This patch silences a simulator warning about it.

We don't need to add this workaround in linux kernel as the WA description
says it's fixed on latest stepping.

This reverts commit 2be60e0c73.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-28 14:02:13 -07:00
Anuj Phogat
7746d4edef Revert "i965/icl: Add WA_2204188704 to disable pixel shader panic dispatch"
SLICE_COMMON_CHICKEN3 is a privileged register not accesible from userspace.
This patch silences a simulator warning about it.

We don't need to add this workaround in linux kernel as the WA description
says it's fixed on latest stepping.

This reverts commit 85ecd14ef6.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-28 14:02:13 -07:00
Anuj Phogat
db093d028c i965/icl: Fix WA_1606682166
An earlier change was setting the SamplerCount = 0 for Gen 11
under #if GEN_GEN < 7. This commit fixes the problem.

This WA has also been added to the linux kernel.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-28 14:02:13 -07:00
Rob Clark
9753d7381c freedreno/ir3: small cleanup
`target` cannot be NULL here.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-28 13:02:59 -07:00
Rob Clark
016a9ab2f9 freedreno/ir3: fix missing (ss) in dummy bary.f case
In case we need to insert a dummy bary.f for the (ei) flag, it also
needs (ss) so we don't release varying storage to the next VS wave
before the ldlv completed.  Fixes random failures in:

dEQP-GLES3.functional.transform_feedback.random.interleaved.lines.*

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-28 13:02:59 -07:00
Rob Clark
21beddd3bc freedreno/a6xx: wire up dither state
Fixes:
dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.rebind_rbo_rgba4
dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.no_rebind_rbo_rgba4
dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.no_rebind_rbo_rgba4_stencil_index8
dEQP-GLES2.functional.fbo.render.recreate_depthbuffer.rebind_rbo_rgba4_depth_component16
dEQP-GLES2.functional.fbo.render.recreate_depthbuffer.no_rebind_rbo_rgba4_depth_component16
dEQP-GLES2.functional.fbo.render.recreate_stencilbuffer.rebind_rbo_rgba4_stencil_index8
dEQP-GLES2.functional.fbo.render.recreate_stencilbuffer.no_rebind_rbo_rgba4_stencil_index8

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-28 13:02:59 -07:00
Arfrever Frehtes Taifersar Arahesis
b120a02b21 meson: Improve detection of Python when using Meson >=0.50.
Previously, on systems where multiple versions of Python 3 (e.g. 3.6 and 3.7)
are installed, wrong version of Python 3 could have been used.

The proper fix requires availability of path() method in Meson's python
module, which has been added in Meson 0.50:
https://github.com/mesonbuild/meson/pull/4616

Distro Bug: https://bugs.gentoo.org/671308
Signed-off-by: Arfrever Frehtes Taifersar Arahesis <Arfrever@Apache.Org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

v2: - Add missing `endif` keyword (Dylan)
2019-06-28 12:51:21 -07:00
Pierre-Eric Pelloux-Prayer
c81c784a4a radeon/uvd: fix calc_ctx_size_h265_main10
Left shift was applied twice.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110702

Reviewed-by: Leo Liu <leo.liu@amd.com>
Tested-by: <irherder@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: <mesa-stable@lists.freedesktop.org>
2019-06-28 15:44:48 -04:00
Pierre-Eric Pelloux-Prayer
1f7d8f9786 mesa: add display list support for gl(Compressed)TextureSubImage2DEXT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:35 -04:00
Pierre-Eric Pelloux-Prayer
360ef82765 mesa: add glTextureParameteri/iv/f/fvEXT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:34 -04:00
Pierre-Eric Pelloux-Prayer
29194648a6 mesa: extend _mesa_lookup_or_create_texture to support EXT_dsa
Adds a boolean to implement EXT_dsa specifics.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:32 -04:00
Pierre-Eric Pelloux-Prayer
274104ec38 mesa: refactor bind_texture
Splits texture lookup and binding actions.

The new _mesa_lookup_or_create_texture will be useful to implement the EXT_direct_state_access extension.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:30 -04:00
Pierre-Eric Pelloux-Prayer
6535964fdf mesa: extract helper function for glTexParameter*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:29 -04:00
Pierre-Eric Pelloux-Prayer
32eefb7451 mesa: add buffer != 0 checks to glNamedBufferEXT functions
The EXT_direct_state_access spec says:

    INVALID_OPERATION is generated by GetNamedBufferParameterivEXT,
    GetNamedBufferPointervEXT, GetNamedBufferSubDataEXT,
    MapNamedBufferEXT, NamedBufferDataEXT, NamedBufferSubDataEXT, and
    UnmapNamedBufferEXT if the buffer parameter is zero.

This commits adds buffer != 0 validation to the implemented functions.

glNamedBufferStorageEXT isn't included in this list and the EXT_buffer_storage
doesn't says that buffer = 0 is an error either so I didn't add the same
validation for this function.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:26 -04:00
Marek Olšák
0de2754aa7 mesa: fix a typo in map_named_buffer_range
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:25 -04:00
Timothy Arceri
9c53a2ecb7 mesa: add support for glMapNamedBufferEXT()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:24 -04:00
Timothy Arceri
76e25edf6a mesa: add support for glUnmapNamedBufferEXT()
Since the ARB DSA function glUnmapNamedBuffer() is only exposed
for 3.1 or above we make glUnmapNamedBuffer() an alias of
glUnmapNamedBufferEXT() rather than the other way around.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:21 -04:00
Timothy Arceri
b5f930ea05 mesa: add support for glCompressedTextureSubImage2DEXT()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:20 -04:00
Timothy Arceri
b82b3d28d3 mesa: add support for glTextureSubImage2DEXT()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:19 -04:00
Timothy Arceri
cb0f25a926 mesa: add support for glMapNamedBufferRangeEXT()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:16 -04:00
Timothy Arceri
eec5c01b5e mesa: add support for glNamedBufferStorageEXT
This is available in ARB_buffer_storage when
EXT_direct_state_access is present.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:14 -04:00
Timothy Arceri
83ed9485b7 mesa: add support for glNamedBuffer*DataEXT()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:41:12 -04:00
Timothy Arceri
0972b0b059 mesa: add support for glBindMultiTextureEXT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:40:54 -04:00
Pierre-Eric Pelloux-Prayer
c37f03d464 mesa: delete framebuffer texture attachment sampler views
When a context is destroyed the destroy_tex_sampler_cb makes sure that all the
sampler views created by that context are destroyed.
This is done by walking the ctx->Shared->TexObjects hash table.

In a multiple context environment the texture can be deleted by a different context,
so it will be removed from the TexObjects table and will prevent the above mechanism
to work.
This can result in an assertion in st_save_zombie_sampler_view because the
sampler_view owns a reference to a destroyed context.

This issue occurs in blender 2.80.

This commit fixes this by explicitly releasing sampler_view created by the destroyed
context for all texture attachments.

Fixes: 593e36f956 (st/mesa: implement "zombie" sampler views (v2))
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110944
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-28 15:29:08 -04:00
James Clarke
7389bf9761 meson: GNU/kFreeBSD has DRM/KMS and requires -D_GNU_SOURCE
This is a regression from the old autotools build system.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-28 19:06:46 +00:00
Kenneth Graunke
65e0c4b64f gallium/u_transfer_helper: Don't leak a reference to the resource.
We pipe_resource_reference when handling transfers in map, we need to
do a corresponding unreference in unmap.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2019-06-28 11:25:56 -07:00
Eric Engestrom
6227e6faee meson: only add empty lines betwen active summary sections
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-28 19:15:18 +01:00
Eric Engestrom
5819bc0e5c meson: bump required libdrm version to 2.4.81
dbb4457d98 started using drmDevicesEqual(), which was
introduced in libdrm 2.4.81

We could either copy the function locally, or bump the required version.
Since the function is non-trivial and 2.4.81 is old enough already,
I suggesting the latter.

Fixes: dbb4457d98 ("egl: add EGL_EXT_device_drm support")
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-28 19:03:04 +01:00
Emil Velikov
4ec32413f3 ac: change ac_query_gpu_info() signature
Currently libdrm_amdgpu provides a typedef of the various handles. While
the goal was to make those opaque, it effectively became part of the API

To the best of my knowledge there are two ways to have opaque handles:
 - "typedef void *foo;" - rather messy IMHO
 - "stuct foo;" and use "struct foo *" through the API

In our case amdgpu_device_handle is used only internally, plus
respective code is not used or applicable for r300 and r600. Hence we
copied the typedef.

Seemingly this will be a problem since libdrm_amdgpu wants to change the
API, while not updating the code(?).

Either way, we can safely s/amdgpU_device_handle/void */ and carry on.

Cc: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak at amd.com>
2019-06-28 17:49:32 +01:00
Tomeu Vizoso
7c745f6148 panfrost: Only tag AFBC addresses when sampling
Rendering to AFBC was broken, as the HW will complaint loudly if we pass
a tagged pointer in bifrost_render_target.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Fixes: 3609b50a64 ("panfrost: Merge AFBC slab with BO backing")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 15:40:52 +02:00
Jose Fonseca
3573412981 gallivm: Improve lp_build_rcp_refine.
Use the alternative more accurate expression from
https://en.wikipedia.org/wiki/Division_algorithm#Newton%E2%80%93Raphson_division

v2: Use lp_build_fmuladd as suggested by Roland

Tested by enabling this code path, and running lp_test_arit.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-06-28 11:48:12 +01:00
Tomeu Vizoso
0ec8a292fb panfrost/ci: Don't error out on RK3288
At the moment we don't have enough people to ensure that RK3288 is
regression-free, so don't fail the CI in that case.

For now we'll focus on not regressing on RK3399 and we can expand to
other SoCs as more people join the effort.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Suggested-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 11:13:04 +02:00
Tomeu Vizoso
6a26d6f4d9 panfrost/ci: Don't print every kernel file
As there's lots of them and Gitlab struggles rendering logs with so many
lines.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 11:13:04 +02:00
Tomeu Vizoso
61b793dde4 panfrost/ci: Fix the image name
These changes will make sure we get the right image from the container
registry.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 11:13:04 +02:00
Tomeu Vizoso
0315350d9e panfrost/ci: Remove batching
Panfrost has grown and doesn't leak as much as before.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-28 11:13:04 +02:00
Kenneth Graunke
847ef8ee4f iris: Don't leak resources in iris_create_surface for incomplete FBOs
We were failing to pipe_resource_unreference on the failure path due
to a non-renderable format.  Instead of fixing this, just move the
checks earlier, before we even bother with refcounting or calloc.
2019-06-28 01:13:11 -07:00
Samuel Pitoiset
ef1787dbc9 radv: only enable VK_AMD_gpu_shader_{half_float,int16} on GFX9+
These two extensions are supported on GFX8 but the throughput
of 16-bit floats/integers is same as 32-bit. Also, shaderInt16
is only enabled on GFX9+ for the same reason, be more consistent.

This fixes a crash with Wolfenstein II because it expects
shaderInt16 to be enabled when VK_AMD_gpu_shader_half_float is
exposed. Note that AMDVLK only enables these extensions on GFX9+.

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-28 08:40:44 +02:00
Samuel Pitoiset
5d6d29ed5d radv: add si_emit_ia_multi_vgt_param() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-28 08:40:42 +02:00
Alexandros Frantzis
7da90a7cc9 virgl: Don't allow creating staging pipe_resources
Staging buffers are now created directly by the virgl_staging_mgr. We
don't need to support creating staging pipe_resources.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-28 04:30:02 +00:00
Alexandros Frantzis
5388be039b virgl: Use virgl_staging_mgr
Use an instance of virgl_staging_mgr instead of u_upload_mgr to handle
the staging buffer. This removes the need to track the availability
of the staging manager, since virgl_staging_mgr can handle concurrent
active allocations.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-28 04:30:02 +00:00
Alexandros Frantzis
790d1a0b17 virgl: Add tests for virgl_staging_mgr
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-28 04:30:02 +00:00
Alexandros Frantzis
55a58dfcfb virgl: Introduce virgl_staging_mgr
Add a manager for the staging buffer used in virgl. The staging manager
is heavily inspired by u_upload_mgr, but is simpler and is a better fit
for virgl's purposes. In particular, the staging manager:

* Allows concurrent staging allocations.
* Calls the virgl winsys directly to create and map resources, avoiding
  unnecessarily going through gallium resources and transfers.

olv: make virgl_staging_alloc_buffer return a bool

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-28 04:30:02 +00:00
Alexandros Frantzis
6a03f25522 virgl: Store the virgl_hw_res for copy transfers
Store the virgl_hw_res instead of the pipe_resource for copy transfer
sources. This prepares the codebase for a change to provide only the
virgl_hw_res for the staging buffers in upcoming commits.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-28 04:30:02 +00:00
Kenneth Graunke
bed305fb7a iris: Fix major resource leak in iris_set_shader_images
We were failing to unreference the old image resource.  Instead of open
coding this and doing it badly, just use the copier function which does
the right thing.
2019-06-27 19:08:46 -07:00
Kenneth Graunke
255c71ec07 gallium: Make util_copy_image_view handle shader_access
A while back, we added a new field, but failed to update the copier.
I believe iris is the only current user of the new field, and it hasn't
used the copier, so noone noticed.

Fixes: 8b626a22b2 st/mesa: Record shader access qualifiers for images
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-27 19:06:19 -07:00
Kenneth Graunke
0d6fc6f07e gallium: Teach GALLIUM_REFCNT_LOG about array textures
Otherwise they are classified as pipe_martian_resource, and don't
contain any helpful information about the texture.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-27 16:56:15 -07:00
Nanley Chery
02f6995d76 isl: Don't align phys_level0_sa by block dimension
Aligning phys_level0_sa by the compression block dimension prior to
mipmap layout causes the layout of compressed surfaces to differ from
the sampler's expectations in certain cases. The hardware docs agree:

From the BDW PRM, Vol. 5, Compressed Mipmap Layout,

   The compressed mipmaps are stored in a similar fashion to
   uncompressed mipmaps [...]

   The following exceptions apply to the layout of compressed (vs.
   uncompressed) mipmaps:
      * [...]
      * The dimensions of the mip maps are first determined by applying
	the sizing algorithm presented in Non-Power-of-Two Mipmaps
	above. Then, if necessary, they are padded out to compression
	block boundaries.

The last bullet indicates that alignment should not be done for
calculating a miplevel's dimensions, but rather for determining miplevel
placement/padding. Comply with this text by removing the extra
alignment.

Fixes some fbo-generatemipmap-formats piglit failures on all tested
platforms (SNB-KBL).

v2:
- Note fixed platforms.
- Update some consumers via a helper function.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-27 23:38:38 +00:00
Nanley Chery
fb1350c76f intel: Add and use helpers for level0 extent
Prepare for a bug fix by adding and using helpers which convert
isl_surf::logical_level0_px and isl_surf::phys_level0_sa to units of
surface elements.

v2:
- Update iris (Ken).
- Update anv.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-27 23:38:37 +00:00
Dylan Baker
0ba0c0c15c meson: try to use cmake as a finder for clang
Clang (like LLVM), very annoyingly refuses to provide pkg-config, and
only provides cmake (unlike LLVM which at least provides llvm-config,
even if llvm-config is terrible). Meson has gained the ability to use
cmake to find dependencies, and can successfully find Clang. This change
attempts to use cmake to find clang instead of a bunch of library
searches, when paired with -Dcmake_prefix_path we can much more reliably
use cmake to control which clang we're getting. This is only enabled for
meson >= 0.51, which adds the required options.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-27 22:12:02 +00:00
Dylan Baker
5157a42765 meson: Add support for using cmake for finding LLVM
Meson has support for using cmake as a finder for some dependencies,
including LLVM. Using cmake has a lot of advantages: it needs less meson
maintenance to keep working (even for llvm updates); it works more
sanely for cross compiles (as llvm-config is a compiled binary not a
shell script). Meson 0.51.0 also has a new generic variable getter that
can be used to get information from either cmake, pkg-config, or
config-tools dependencies, which is needed for cmake. We continue to
support using llvm-config if you don't have cmake installed, or if cmake
cannot find a suitable version.

Fixes: 0d59459432
       ("meson: Force the use of config-tool for llvm")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-27 22:12:02 +00:00
Kenneth Graunke
3d3685d354 iris: Fix memory leak of SO targets
We need to pitch these on context destroy.
2019-06-27 14:59:39 -07:00
Kenneth Graunke
d65819f054 iris: Fix memory leak for draw parameter resources
Need to pitch these on context destroy.
2019-06-27 14:59:39 -07:00
Kenneth Graunke
50eb1c1396 iris: Drop u_upload_unmap
We use persistent maps so this does nothing.
2019-06-27 14:59:39 -07:00
Lionel Landwerlin
836225840c intel/compiler: fix derivative on y axis implementation
This rewrites the ddy in EXECUTE_4 mode with a loop to make it more
obvious what is going on and also sets the group each of the 4 threads
in the groups are supposed to execute.

Fixes the following CTS tests :

   dEQP-VK.glsl.derivate.dfdyfine.dynamic_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2134ea3800 ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")
2019-06-27 18:14:58 +00:00
Eric Engestrom
53f17c4efd meson: set up a proper internal dependency for xmlconfig
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-27 17:42:25 +00:00
Eric Engestrom
ad0ee5bfa5 xmlconfig: add missing #include
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-27 17:42:25 +00:00
Eric Engestrom
069e6d587e xmlpool: fix typo in comment
s/otions/options/, and while here let's give the full path to xmlpool.h
since `../` won't be true in the generated file.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-27 17:42:25 +00:00
Kenneth Graunke
d6683e118f iris: Also properly restore INTERFACE_DESCRIPTOR_DATA buffer object
We were at least cleaning up this reference, but we were failing to
pin it in iris_restore_compute_saved_bos.
2019-06-27 08:12:22 -07:00
Kenneth Graunke
340df53d6a iris: Fix resource tracking for CS thread ID buffer
Today, we stream the compute shader thread IDs simply because they're
(annoyingly) relative to dynamic state base address.  We could upload
them once at compile time, but we'd need a separate non-streaming
uploader for IRIS_MEMZONE_DYNAMIC, and I'm not sure it's worth it.

stream_state pins the buffer for use in the current batch, but also
returns a reference to the pipe_resource.  We dropped this reference
on the floor, leaking a reference basically every time we dispatched
a compute shader after switching to a new one.

The reason it returns a reference is so that we can hold on to it and
re-pin it in iris_restore_compute_saved_bos, which we were also failing
to do.  So if we actually filled up a batch with repeated dispatches to
the same compute shader, and flushed, then continued dispatching, we
would fail to pin it and likely GPU hang.
2019-06-27 08:12:22 -07:00
Kenneth Graunke
16d334951e iris: Only bother with thread ID upload if doing MEDIA_CURBE_LOAD
We were unconditionally uploading the new data, but then conditionally
using it with MEDIA_CURBE_LOAD.  If we're not going to emit the command,
there's no point in uploading the data.
2019-06-27 08:12:22 -07:00
Kenneth Graunke
8f51f1ba6e iris: Do MEDIA_CURBE_LOAD when IRIS_DIRTY_CS is set, not constants
We only use push the compute shader thread IDs, not any actual constant
buffer data.  So we should track the compute shader variant changing,
not constbuf changes.
2019-06-27 08:12:22 -07:00
Kenneth Graunke
85c72da1b1 iris: Drop UBO range stuff from iris_restore_compute_saved_bos
Compute doesn't use UBO ranges (annoyingly), so this is dead code.
2019-06-27 08:12:22 -07:00
Kenneth Graunke
f94ebf0c9d iris: Properly align interface descriptor data addresses
MEDIA_INTERFACE_DESCRIPTOR's Interface Descriptor Data Start Address
field's docs say: "This bit specifies the 64-byte aligned address..."

And we were doing 32.  Superfluous thread ID uploading was apparently
saving us from GPU hangs in most cases.
2019-06-27 08:12:22 -07:00
Andrii Simiklit
62c6059584 mesa: use a correct function return type
v2: standard 'bool' can be used
     ( Eric Engestrom <eric.engestrom@intel.com> )

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
2019-06-27 07:53:41 +00:00
Tomeu Vizoso
9bef1f1ff1 panfrost/decode: Mention the address of a few descriptors
When the fault_pointer field in the header is set, we can get some idea
of which descriptor the HW isn't happy with if we know their addresses.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-27 09:13:48 +02:00
Tomeu Vizoso
de02fb19ed panfrost/decode: Wait for a job to finish before dumping
Then we can get some information back about any exception that might
have happened.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-27 09:13:42 +02:00
Tomeu Vizoso
fa36c194fd panfrost/decode: Decode exception status
Arm's kernel driver mentions how to decode this field, which makes a bit
clearer what had happened.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-27 09:13:35 +02:00
Tomeu Vizoso
b26c2b4840 panfrost/decode: Print AFBC struct when appropriate
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-27 09:12:56 +02:00
Samuel Pitoiset
d5004f60be radv: only export clip/cull distances if PS reads them
The only exception is the GS copy shader which emits them
unconditionally.

Totals from affected shaders:
SGPRS: 71320 -> 71008 (-0.44 %)
VGPRS: 54372 -> 54240 (-0.24 %)
Code Size: 2952628 -> 2941368 (-0.38 %) bytes
Max Waves: 9689 -> 9723 (0.35 %)

This helps Dota2, Doom, GTAV and Hitman 2.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-27 08:56:37 +02:00
Samuel Pitoiset
1e9ccc5429 radv: fix FMASK expand if layerCount is VK_REMAINING_ARRAY_LAYERS
This doesn't fix anything known, but it's likely going to
break if layerCount is ~0U.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-27 08:56:34 +02:00
Kenneth Graunke
8551dc17a7 iris: Disable loop unrolling in GLSL IR.
Leave it to NIR instead, like i965 does.  Thanks to Tim Arceri for
noticing that I'd left this enabled by accident.

shader-db results on Skylake:

total instructions in shared programs: 15522628 -> 15521642 (<.01%)
instructions in affected programs: 94008 -> 93022 (-1.05%)
helped: 34
HURT: 33
helped stats (abs) min: 12 max: 48 x̄: 33.82 x̃: 42
helped stats (rel) min: 0.06% max: 22.14% x̄: 9.86% x̃: 10.89%
HURT stats (abs)   min: 1 max: 16 x̄: 4.97 x̃: 3t
HURT stats (rel)   min: 0.82% max: 3.77% x̄: 1.73% x̃: 1.53%
95% mean confidence interval for instructions value: -20.08 -9.35
95% mean confidence interval for instructions %-change: -5.95% -2.36%
Instructions are helped.

total cycles in shared programs: 367105221 -> 367074230 (<.01%)
cycles in affected programs: 10017660 -> 9986669 (-0.31%)
helped: 266
HURT: 184
helped stats (abs) min: 1 max: 9556 x̄: 151.35 x̃: 12
helped stats (rel) min: 0.08% max: 59.91% x̄: 4.66% x̃: 1.67%
HURT stats (abs)   min: 1 max: 1716 x̄: 50.37 x̃: 6
HURT stats (rel)   min: <.01% max: 24.40% x̄: 2.42% x̃: 0.85%
95% mean confidence interval for cycles value: -133.90 -3.84
95% mean confidence interval for cycles %-change: -2.44% -1.10%
Cycles are helped.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-26 22:55:03 -07:00
Kenneth Graunke
acadeaff6a st/mesa: Set EmitNoIndirectSampler if GLSLVersion < 400.
This patch changes the code which sets EmitNoIndirectSampler to check
the core profile GLSL version, rather than the ARB_gpu_shader5 extension
enable.  st/mesa exposes ARB_gpu_shader5 if GLSLVersion (in core
profiles) or GLSLVersionCompat (in compat profiles) >= 400.

The Intel drivers do not currently expose ARB_gpu_shader5 in compat
profiles.  But the backend can absolutely handle indirect samplers.
Looking at the core profile version number should be a good indication
of what the driver supports.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-26 22:54:52 -07:00
Kenneth Graunke
116144d65e iris: Delete dead ice->state.streamout_strides field.
Nothing uses this, it must be a remnant from an earlier approach.
2019-06-26 20:17:22 -07:00
Caio Marcelo de Oliveira Filho
085c0f1f13 nir/algebraic: Add helpers and a rule involving wrapping
The helpers are needed so we can use the syntax `instr(cond)` in the
algebraic rules.  Add simple rule for dropping a pair of mul-div of
the same value when wrapping is guaranteed to not happen.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-26 14:13:02 -07:00
Caio Marcelo de Oliveira Filho
5a143965b8 spirv: Implement NoSignedWrap and NoUnsignedWrap decorations
When handling the specified ALU operations, check for the decorations
and set nir_alu_instr no_signed_wrap and no_unsigned_wrap flags accordingly.

v2: Add a glsl_base_type_is_unsigned_integer() helper.  (Karol)

v3: Rename helper to glsl_base_type_is_uint().

v4: Use two flags, so we don't need the helper anymore.  (Connor)

v5: Pass alu directly to handle function.  (Jason)

Reviewed-by: Karol Herbst <kherbst@redhat.com> [v3]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-26 14:13:02 -07:00
Caio Marcelo de Oliveira Filho
ae37237713 nir: Add a no wrapping bits to nir_alu_instr
They indicate the operation does not cause overflow or underflow.
This is motivated by SPIR-V decorations NoSignedWrap and
NoUnsignedWrap.

Change the storage of `exact` to be a single bit, so they pack
together.

v2: Handle no_wrap in nir_instr_set.  (Karol)

v3: Use two separate flags, since the NIR SSA values and certain
    instructions are typeless, so just no_wrap would be insufficient
    to know which one was referred to.  (Connor)

v4: Don't use nir_instr_set to propagate the flags, unlike `exact`,
    consider the instructions different if the flags have different
    values.  Fix hashing/comparing.  (Jason)

Reviewed-by: Karol Herbst <kherbst@redhat.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-26 14:13:02 -07:00
Dylan Baker
f97dcb7a55 docs: add news item and link release notes for 19.0.8
This is an emergency release due to a critical bug.
2019-06-26 13:48:06 -07:00
Dylan Baker
290495a431 docs: Add mesa 19.0.8 sha256 sums 2019-06-26 13:46:30 -07:00
Dylan Baker
10a24925a0 docs: Add docs for 19.0.8 2019-06-26 13:46:29 -07:00
Jonathan Marek
a70ff70158 nir: remove fnot/fxor/fand/for opcodes
There doesn't seem to be any reason to keep these opcodes around:
* fnot/fxor are not used at all.
* fand/for are only used in lower_alu_to_scalar, but easily replaced

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-06-26 15:26:10 -04:00
Jonathan Marek
0b5a483baa nir: opt_vectorize: combine different constant sources
We can vectorize instructions with different constant sources by creating
a new load_const and using that.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 14:56:28 -04:00
Alyssa Rosenzweig
10688257bd panfrost/midgard: Merge embedded constants
In Midgard, a bundle consists of a few ALU instructions. Within the
bundle, there is room for an optional 128-bit constant; this constant is
shared across all instructions in the bundle.

Unfortunately, many instructions want a 128-bit constant all to
themselves (how selfish!). If we run out of space for constants in a
bundle, the bundle has to be broken up, incurring a performance and
space penalty.

As an optimization, the scheduler now analyzes the constants coming in
per-instruction and attempts to merge shared components, adjusting the
swizzle accessing the bundle's constants appropriately. Concretely,
given the GLSL:

   (a * vec4(1.5, 0.5, 0.5, 1.0)) + vec4(1.0, 2.3, 2.3, 0.5)

instead of compiling to the naive two bundles:

   vmul.fmul [temp], [a], r26
   fconstants 1.5, 0.5, 0.5, 1.0

   vadd.fadd [out], [temp], r26
   fconstants 1.0, 2.3, 2.3, 0.5

The scheduler can now fuse into a single (pipelined!) bundle:

   vmul.fmul [temp], [a], r26.xyyz
   vadd.fadd [out], [temp], r26.zwwy
   fconstants 1.5, 0.5, 1.0, 2.3

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 10:01:36 -07:00
Alyssa Rosenzweig
a0a34946d8 panfrost/midgard: Share swizzle compose
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 10:01:36 -07:00
Alyssa Rosenzweig
f6fde45d5c panfrost/midgard: Share swizzle/mask code
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 10:01:36 -07:00
Alyssa Rosenzweig
0979ea9de8 panfrost: Fix checksumming typo
Fixes: 3e6c6bb0 ("panfrost: Merge checksum buffer with main BO")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 09:58:30 -07:00
Kenneth Graunke
ab009b7d6e iris: Fix overzealous query object batch flushing.
In the past, each query object had their own BO.  Checking if the batch
referenced that BO was an easy way to check if commands were still
queued to compute the query value.  If so, we needed to flush.

More recently (c24a574e6c), we started using an u_upload_mgr for query
objects, placing multiple queries in the same BO.  One side-effect is
that iris_batch_references is a no longer a reasonable way to check if
commands are still queued for our query.  Ours might be done, but a
later query that happens to be in the same BO might be queued.  We don't
want to flush in that case.

Instead, check if the current batch's signalling syncpt is the one we
referenced when ending the query.  We know the syncpt can't have been
reused because our query is holding a reference, so a simple pointer
comparison should suffice.

Removes all batch flushing caused by query objects in Shadow of Mordor.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2019-06-26 09:49:01 -07:00
Kenneth Graunke
db878a728c iris: Make an iris_batch_get_signal_syncpt() helper.
This returns a pointer to the signalling syncpt, without incrementing
the reference count.  This can be useful for comparisons.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2019-06-26 09:49:01 -07:00
Boris Brezillon
443e530194 panfrost: Remove unneeded check in panfrost_scissor_culls_everything()
The ss local var is guaranteed to be != NULL. Get rid of this useless
check.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 09:35:25 -07:00
Alyssa Rosenzweig
d4575c3071 panfrost: Update copyright identifiers
"Collabora, Ltd." should be listed in lieu of simply "Collabora"

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Daniel Stone <daniels@collabora.com>
2019-06-26 09:10:51 -07:00
Alyssa Rosenzweig
b0e8941df1 panfrost/midgard: Reorder to permit constant bias
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 09:08:37 -07:00
Alyssa Rosenzweig
213b62810d panfrost/midgard: Add helper to encode constant bias
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 09:08:37 -07:00
Alyssa Rosenzweig
b51727ea28 panfrost/midgard: Handle negative immediate bias
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-26 09:08:37 -07:00
Rob Clark
1833827eac freedreno: correct batch_depends_on() logic
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-26 08:43:02 -07:00
Rob Clark
2b10bb6e5e freedreno: drop unused arg from fd_batch_flush()
The `force` arg has been unused for a while.. but apparently I forgot to
garbage collect it.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-26 08:43:02 -07:00
Timothy Arceri
5f809e2707 st/glsl: fix silly regression finding gs/tes variants
Fixes: d19fe5e67a ("st/glsl: support clamping color outputs in compat for gs/tes")

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-06-26 23:13:02 +10:00
Timothy Arceri
d19fe5e67a st/glsl: support clamping color outputs in compat for gs/tes
This support requires the driver to be a NIR driver as we use the
NIR lowering pass to do the clamping.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-26 00:36:48 +00:00
Timothy Arceri
f5f31612d3 nir: add tess support to nir_lower_clamp_color_outputs()
This will be used to add compat profile support for higher GL
versions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-26 00:36:48 +00:00
Sagar Ghuge
06807e1948 glsl: Fix round64 conversion function
Fix round64 function to handle round to nearest even cases specially
with positive and negative numbers with fraction part 0.5.

v2: 1) Simplify unused bits (Elie Tournier)

Fixes:
   KHR-GL45.gpu_shader_fp64.builtin.round_dvec2
   KHR-GL45.gpu_shader_fp64.builtin.round_dvec3
   KHR-GL45.gpu_shader_fp64.builtin.round_dvec4
   KHR-GL45.gpu_shader_fp64.builtin.roundeven_double
   KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec2
   KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec3
   KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec4

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-06-25 15:19:10 -07:00
Alyssa Rosenzweig
e8f4c9f56c panfrost/ci: Add RK3288 flipflops I don't want to deal with right now
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:42:58 -07:00
Alyssa Rosenzweig
70a87a915d panfrost/ci: Update failures list
A ton of tests were fixed by this series. A few were incorrectly passing
before (QualityError, for instance) and now are explicitly failing. A
few legitimate regressions but overwhelmingly positive.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig
ddf5f04edf panfrost/ci: Set MESA_GLES_VERSION_OVERRIDE=3.0
Fixes cube map tests due to disagreements between Mesa, dEQP, and the
spec...

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig
33f3cac1c2 panfrost/ci: Run full set of mipmap tests
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig
f34635c699 panfrost: Advertise support for other 8-bit UNORM formats
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig
310ca6ba40 panfrost: Use pipe_surface->format directly in blitter
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig
5cfb4248c6 panfrost: Invert swizzle for rendering
Fixes rendering to e.g. alpha textures.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig
b96f119d85 panfrost: Honour first_layer...last_layer when sampling
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig
0ad17f56ae panfrost: Use the sampler_view target (not the textures)
u_blitter gets "special treatment" and uses this mechanism to cast
cube maps to 2D textures in order to texelFetch them.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig
faf8ad4875 panfrost/midgard: Assert guard texelFetch against cubemaps
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig
124f6b541b panfrost: Zero pixels in any axis is zero pixels total
Multiplication, not addition, so switch the logic operator.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
06211f45a7 panfrost: Respect mip level when wallpapering
Fixes DATA_INVALID_FAULT raised when wallpapering while rendering to a
mipmap.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
6729912a4b panfrost/midgard: Fixup NIR texture op
In a vertex shader, a tex op should map to txl, as there *must* be a LOD
given to the hardware (implicitly or explicitly).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
17adcfc008 panfrost: Support (non-)seamless cube maps
Identify the seamless cubemap bit and passthrough the Gallium state
rather than setting unconditionally.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
3e6c6bb0af panfrost: Merge checksum buffer with main BO
This is similar to the AFBC merge; now all (non-imported) buffers use a
common backing buffer. Reenables checksumming, eliminating a performance
regression.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
a9fc1c8399 panfrost/decode: Limit MRT blend count
I thought I already fixed this. Maybe that was a dream...? Then again, I
might be dreaming now.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
65e9d9b625 panfrost: Clamp tile coordinates before job submission
Fixes TILE_RANGE_FAULT raised on some tests in
dEQP-GLES3.functional.fbo.blit.*

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
7005c0d83b panfrost: Use dedicated u_blitter context for wallpapers
The main ctx->blitter instance should be reserved for blits originated
from Gallium (like mipmap generation). Since wallpapering is
conceptually different -- wallpaper blits can be triggered by Gallium
blits -- the blitter pipes must be separate to avoid potential u_blitter
recursion.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
64b7bd3f90 panfrost: Sanity check layer
It doesn't make sense to try to render to multiple array elements at
once.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
eb3c09716b panfrost: Divide array_size by 6 for cubemaps
Addresses the disparity between Mali and Gallium definitions of
array_size.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
65bc56b568 panfrost: Use get_texture_address for framebuffer computations
Allows for sharing some code as well as theoretically allowing cubemap
rendering.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
3609b50a64 panfrost: Merge AFBC slab with BO backing
Rather than tracking AFBC memory "specially", just use the same codepath
as linear and tiled. Less things to mess up, I figure. This allows us to
use the standard setup_slices() call with AFBC resources, allowing
mipmapped AFBC resources.

Unfortunately, we do have to disable AFBC (and checksumming) in the
meantime to avoid functional regressions, as we don't know _a priori_ if
we'll need to access a resource from software (which is not yet hooked
up with AFBC) and we don't yet have routines to switch the layout of a
BO at runtime.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
aea3f0ac1d panfrost: Z/S can't be tiled
As far as we know, Utgard-style tiling only works for color render
targets, not depth/stencil, so ensure we don't try to tile it (rather
than compress or plain old linear) and drive ourselves into a corner.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
ad56dd4e97 panfrost: Enable mipmapping
Now the autogeneration of mipmaps is working (via u_blitter), we can
finally enable mipmaps!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
5aeffa9517 panfrost: Enable blitting
Now that all the prerequisites breaking u_blitter are fixed, we can
finally hook up panfrost_blit.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
06d192c742 panfrost: Allow texelFetch for wallpaper blits
We just implemented the routine; we may as well use it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
f4bb7f096c panfrost/midgard: Implement texelFetch (2D only)
txf instructions can result from blits, so handle them rather than
crash. Only works for 2D textures (not even 2D array texture) due to a
register allocation constraint that may not be sorted for a while.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
4ac42f2b38 panfrost: Skip flushes only for wallpapers, not any blit
We need the flush from u_blitter for a normal blit (e.g. for mipmaps);
it's only wallpaper-related blits that are special-cased.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
ffcc4d1c4e panfrost: Handle generate_mipmap ourselves
To avoid interference with the wallpaper code, we need to do some state
tracking when generating mipmaps. In particular, we need to mark the
generated layers as invalid before generating the mipmap, so we don't
try to backblit them if they already had content.

Likewise, we need to flush both before and after generating a mipmap
since our usual set_framebuffer_state flushing isn't quite there yet.
Ideally better optimizations would save the flush but I digress.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig
f57dfe4cdd panfrost: Disable mipmapping if necessary
If a mipfilter is not set, it's legal to have an incomplete mipmap; we
should handle this accordingly. An "easy way out" is to rig the LOD
clamps.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-25 13:39:17 -07:00
Kenneth Graunke
748e5dac72 intel/blorp: Disable sampler state prefetching on Gen11
Sampler state prefetching is broken on Gen11, and WA_160668216 says
to disable it.  Apparently sampler state prefetching also has basically
zero impact on performance, so we don't need to worry there.

i965, anv, and iris already handle this correctly, but we missed BLORP.
Ideally the kernel should globally disable this by writing SARCHKMD, at
which point we wouldn't have to worry about it.  But let's be defensive
and handle it ourselves too.

v2: separate out from BTP workaround in case we change that eventually

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> [v1]
2019-06-25 13:29:31 -07:00
Jason Ekstrand
0a364a4a74 anv/descriptor_set: Only write texture swizzles if we have an image view
When immutable samplers are set we call write_image_view with a NULL
image view.  This causes issues on IVB where we have to fake texture
swizzling.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110999
Fixes: d2aa65eb18 "anv: Emulate texture swizzle in the shader when..."
2019-06-25 19:43:25 +00:00
Chia-I Wu
74786b3aa3 virgl: add VIRGL_DEBUG_XFER
When set, do as requested and skip any transfer optimization.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-25 12:01:45 -07:00
Chia-I Wu
e93d918b65 virgl: add VIRGL_DEBUG_SYNC
When set, wait after every each flush.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-25 12:01:43 -07:00
Chia-I Wu
119b5701e1 virgl: fix the value of VIRGL_DEBUG_BGRA_DEST_SWIZZLE
VIRGL_DEBUG_BGRA_DEST_SWIZZLE should use bit 3.  Make some cosmetic
changes as well.

Fixes: a478e56fbd
    virgl: Add debug flag to bypass driconf to enable the BGRA tweaks

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-25 12:01:14 -07:00
Samuel Pitoiset
8ea7ee1536 radv: rename and re-document cache flush flags
SMEM and VMEM caches are L0 on gfx10. Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-25 18:38:37 +02:00
Samuel Pitoiset
5411f47056 radv: set DISABLE_CONSTANT_ENCODE_REG to 1 for Raven2
Ported from RadeonSI, will be emitted for GFX10 too.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-25 16:45:15 +02:00
Samuel Pitoiset
34bef8a0d7 radv: clear CMASK layers instead of the whole buffer on GFX8
This reduces the size of fill operations needed to clear CMASK
for layered color textures.

GFX9 unsupported for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-25 16:36:28 +02:00
Samuel Pitoiset
476b907a3b radv: clear FMASK layers instead of the whole buffer on GFX8
This reduces the size of fill operations needed to clear FMASK
for layered color textures.

GFX9 unsupported for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-25 16:36:25 +02:00
Samuel Pitoiset
a5ba386b3f radv: always initialize levels without DCC as fully expanded
This fixes a rendering issue with RoTR/DXVK.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-25 16:36:23 +02:00
Sergii Romantsov
1931c97a1d i965: leaking of upload-BO with push constants
In case of any enabled VS members from: uses_firstvertex,
uses_baseinstance, uses_drawid, uses_is_indexed_draw
leaks may happens.
Call gen6_upload_push_constants allocates
stage_stat->push_const_bo. It than takes pointer from
push_const_bo to draw_params_bo (in the call
brw_prepare_shader_draw_parameters by brw_upload_data)
and do reference which finally haven't got unreferenced.

Fixes leak:
 136 bytes in 1 blocks are definitely lost in loss record 6 of 13
    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    by 0xC2B64B7: bo_alloc_internal (brw_bufmgr.c:596)
    by 0xC2B6748: brw_bo_alloc (brw_bufmgr.c:672)
    by 0xC314BB3: brw_upload_space (intel_upload.c:88)
    by 0xC2EBBC5: gen6_upload_push_constants (gen6_constant_state.c:155)
    by 0xC9E4FA6: gen9_upload_vs_push_constants (genX_state_upload.c:3300)
    by 0xC2E0EDA: check_and_emit_atom (brw_state_upload.c:540)
    by 0xC2E0EDA: brw_upload_pipeline_state (brw_state_upload.c:659)
    by 0xC2E0FF1: brw_upload_render_state (brw_state_upload.c:681)
    by 0xC2C5D2D: brw_draw_single_prim (brw_draw.c:1052)
    by 0xC2C62CB: brw_draw_prims (brw_draw.c:1175)
    by 0xC488AD1: vbo_exec_vtx_flush (vbo_exec_draw.c:386)
    by 0xC485270: vbo_exec_FlushVertices_internal (vbo_exec_api.c:652)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
2019-06-25 12:26:25 +00:00
Juan A. Suarez Romero
81d28c69ea docs: update calendar, add news item and link release notes for X.Y.Z
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-06-25 13:02:37 +02:00
Juan A. Suarez Romero
2c06071521 docs: fix some typos in 19.0.7 release notes
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-06-25 13:01:56 +02:00
Juan A. Suarez Romero
4a2b502a6b docs: add sha256 checksums for 19.1.1
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit d54dc24d6d)
2019-06-25 12:56:49 +02:00
Juan A. Suarez Romero
5f7c66676f docs: add release notes for 19.1.1
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 22eddd8b9d)
2019-06-25 12:56:46 +02:00
Tapani Pälli
7a6e5a4bc3 intel/compiler: silence a warning of using different enum type
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-25 10:09:22 +03:00
Eric Engestrom
e9286eb60b egl: replace dead vfunc with an error
st/egl used to support eglCreatePbufferFromClientBuffer, but now that
it's gone, any call to it would segfault.

Let's return a nice error instead.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-25 07:47:19 +01:00
Eric Engestrom
eeacd66324 egl: delete unused vfuncs
Nobody ever uses these, so let's just hard code them instead.

If an EGL driver ever comes around that needs them they're trivial to
re-add.

Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-25 07:47:19 +01:00
Eric Engestrom
83f01f5261 egl: drop empty eglfallbacks.c
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-25 06:36:54 +00:00
Eric Engestrom
757d2fb48d egl: move eglGetSyncAttrib() fallback to eglapi.c
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-25 06:36:54 +00:00
Eric Engestrom
26d5ca44ba egl: move eglSwapInterval() fallback to eglapi.c
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-25 06:36:54 +00:00
Eric Engestrom
9dc00c8433 egl: move eglSurfaceAttrib() fallback to eglapi.c
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-25 06:36:54 +00:00
Eric Engestrom
58be9d50a7 egl: move eglQuerySurface() fallback to eglapi.c
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-25 06:36:54 +00:00
Eric Engestrom
b792b3ebd7 egl: move eglQueryContext() fallback to eglapi.c
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-25 06:36:54 +00:00
Eric Engestrom
7f848f9713 egl: move eglGetConfigAttrib() fallback to eglapi.c
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-25 06:36:54 +00:00
Eric Engestrom
1b76cca40f egl: move eglChooseConfig() fallback to eglapi.c
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-25 06:36:53 +00:00
Eric Engestrom
b883d7f567 egl: move eglGetConfigs() fallback to eglapi.c
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-25 06:36:53 +00:00
Rob Clark
927fb50727 freedreno/a5xx: fix batch leak in fd5 blitter path
Fixes: 3d198926a4 freedreno: use fd_bc_alloc_batch instead of fd_batch_create.
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-24 18:43:20 -07:00
Marek Olšák
4a1421aa26 radeonsi: don't set spi_ps_input_* for monolithic shaders
The driver doesn't use these values and ac_rtld has assertions
expecting the value of 0.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 21:04:10 -04:00
Marek Olšák
1d6e358c36 radeonsi: rename and re-document cache flush flags
SMEM and VMEM caches are L0 on gfx10.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 21:04:10 -04:00
Marek Olšák
aa8d6e0507 radeonsi: fix AMD_DEBUG=nofmask
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 21:04:10 -04:00
Marek Olšák
f46efacd01 radeonsi: flatten the switch for DPBB tunables
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-06-24 21:04:10 -04:00
Marek Olšák
ac4b1e2f0a radeonsi: set the calling convention for inlined function calls
otherwise the behavior is undefined

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-06-24 21:04:10 -04:00
Nicolai Hähnle
610e1a81f7 radeonsi: refactor si_update_vgt_shader_config
We'll have to extend this at some point, and using a bitfield union in
this way makes it easier to get the right index without excessive
branching.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 21:04:10 -04:00
Nicolai Hähnle
bd3a3fd25a amd/rtld: update the ELF representation of LDS symbols
The initial prototype used a processor-specific symbol type, but
feedback suggests that an approach using processor-specific section
name that encodes the alignment analogous to SHN_COMMON symbols is
preferred.

This patch keeps both variants around for now to reduce problems
with LLVM compatibility as we switch branches around.

This also cleans up the error reporting in this function.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 21:04:10 -04:00
Marek Olšák
0032f6b8a0 ac/surface: remove addrlib_family_rev_id
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 21:04:10 -04:00
Dylan Baker
032fe7d602 docs: update calendar, add news item and link release notes for 19.0.7 2019-06-24 16:24:05 -07:00
Dylan Baker
7badae431a docs: Add SHA256 sums for 19.0.7 2019-06-24 16:22:21 -07:00
Dylan Baker
8c0e5c4cfc Docs add 19.0.7 release notes 2019-06-24 16:22:20 -07:00
Ian Romanick
ee1c69fadd glsl: Don't increase the iteration count when there are no terminators
Incrementing the iteration count was intended to fix an off-by-one error
when the first terminator was superseded by a later terminator.  If
there is no first terminator or later terminator, there is no off-by-one
error.  Incrementing the loop count creates one.  This can be seen in
loops like:

    do {
        if (something) {
            // No breaks or continues here.
        }
    } while (false);

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Abel Briggs <abelbriggs1@hotmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110953
Fixes: 646621c66d ("glsl: make loop unrolling more like the nir unrolling path")
2019-06-24 14:32:33 -07:00
Eric Anholt
5c4289dd4b freedreno: Only upload the used part of UBO0 to the constant buffer.
We were pessimistically uploading all of it in case of indirection,
but we can just bump that when we encounter indirection.

total constlen in shared programs: 2529623 -> 2485933 (-1.73%)

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-24 14:23:07 -07:00
Eric Anholt
852704976a freedreno: Stop treating UBO 0 specially in UBO uploading.
ir3_nir_analyze_ubo_ranges() has already told us how much of cb0 we
need to upload (all of it, since it will lower indirect UBO 0 accesses
from load_ubo back to indirection on the constant buffer).

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-24 14:23:07 -07:00
Rob Clark
572c76fd88 freedreno: Clamp UBO uploads to the constlen decided by the shader.
If the NIR-level analysis decided to move UBO loads to the constant
file, but the backend decided not to load those constants, we could
upload past the end of constlen.  This is particularly relevant for
pre-a6xx, where we emit a different constlen between bin and render
variants.

(Fix by Rob, commit message by anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-24 14:23:07 -07:00
Alyssa Rosenzweig
c1ca138475 panfrost: Allow up to 16 UBOs
This is the hardware max, as far as I can tell.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig
b670becb1e panfrost: DRY between shader stage setup
Just a little spring cleanup, extending UBOs to vertex shaders in the
process.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig
5e2c3d40bd panfrost/midgard: Implement UBO reads
UBOs and uniforms now use a common code path with an explicit `index`
argument passed, enabling UBO reads.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig
f28e9e868b panfrost: Handle disabled/empty UBOs
Prevents an assert(0) later in this (not so edge) case. We still have to
have a dummy there.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig
bd2fc60a8a panfrost: Identify "uniform buffer count" bits
We've known about this for a while, but it was never formally in the
machine header files / decoder, so let's add them in.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig
856e03902b panfrost: Upload UBOs
Now that all the counting is sorted, it's a matter of passing along a
GPU address and going.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig
4c6d751274 panfrost: Allow for dynamic UBO count
We already uploaded UBOs, but only a fixed number (1) for uniforms;
let's upload as many as we compute we need.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig
5d60be4e24 panfrost: Report UBO count
We look at the highest set bit in the UBO enable mask to work out the
maximum indexable UBO, i.e. the UBO count as we need to report to the
hardware.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig
ca2caf01df panfrost: Constant buffer refactor
We refactor panfrost_constant_buffer to mirror v3d's constant buffer
handling, to enable UBOs as well as a single set of uniforms.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig
f35f373850 panfrost: Replace varyings for point sprites
This doesn't handle Y-flipping, but it's good enough to render the stars
in Neverball.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-24 12:56:22 -07:00
Alyssa Rosenzweig
be03060066 panfrost: Track point sprites in fragment shader key
In preparation for lowering point sprites, track them like we track
alpha testing state.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-24 12:56:16 -07:00
Caio Marcelo de Oliveira Filho
7fc907118e i965: Move resources lowering after NIR linking
Those either depend on information filled by the NIR linking steps OR
are restricted by those:

- gl_nir_lower_samplers: depends on UniformStorage being set by the
  linker.

- brw_nir_lower_image_load_store: After 6981069fc8 "i965: Ignore
  uniform storage for samplers or images, use binding info" we want
  this pass to happen after gl_nir_lower_samplers.

- gl_nir_lower_buffers: depends on UniformBlocks and
  SharedStorageBlocks being set by the linker.

For the regular GLSL code path, those datastructures are filled
earlier.  For NIR linking code path we need to generate the nir_shader
first then process it -- and currently the processing works with all
shaders together.  So move the passes out of brw_create_nir into its
own function, called by the brwProgramStringNotify and
brw_link_shader().

This patch prepares ground for ARB_gl_spirv, that will make use of NIR
linker.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-24 11:44:03 -07:00
Caio Marcelo de Oliveira Filho
6e2ff10886 glsl/nir: Fix copying 64-bit values in uniform storage
The iterator `i` already walks the right amount now that is
incremented by `dmul`, so no need to `* 2`.  Fixes invalid memory
access in upcoming ARB_gl_spirv tests.

Failure bisected by Arcady Goldmints-Orlov.

Fixes: b019fe8a5b "glsl/nir: Fix handling of 64-bit values in uniform storage"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-24 11:32:14 -07:00
Caio Marcelo de Oliveira Filho
390ff8ac54 glsl/nir: Fix copying vector constant values
For n_columns == 1, we have a vector which is handled by the else
case.  Fixes invalid memory access in upcoming ARB_gl_spirv tests.

Failure bisected by Arcady Goldmints-Orlov.

Fixes: 81e51b412e "nir: Make nir_constant a vector rather than a matrix"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-24 11:32:14 -07:00
Daniel Schürmann
0daeb1d127 amd/common: lower bitfield_extract to ubfe/ibfe.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
48a75e7af0 amd/common: lower bitfield_insert to bfm & bitfield_select
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
a8b0b6e52b nir: introduce lowering of bitfield_insert to bfm and a new opcode bitfield_select.
bitfield_select is defined as:
bitfield_select(mask, base, insert) = (mask & base) | (~mask & insert)
matching the behavior of AMD's BFI instruction.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
1403c3a7bf nir/algebraic: Use unsigned comparison when lowering bitfield insert/extract
This lets us use the optimization pattern
(('ult', 31, ('iand', b, 31)), False) to remove the
bcsel instruction for code originating in D3D shaders.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
4eeb49ea71 nir/algebraic: Remove unnecessary iand of [iu]bfe and bfm sources
The [iu]bfe and bfm instructions are defined to only use the five
least significant bits.
This optimizes a common pattern from D3D -> SPIR-V translation.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
165b7f3a44 nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.
That is: the five least significant bits provide the values of
'bits' and 'offset' which is the case for all hardware currently
supported by NIR and using the bfm/bfe instructions.
This patch also changes the lowering of bitfield_insert/extract
using shifts to not use bfm and removes the flag 'lower_bfm'.

Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann
a74f256c58 nir/algebraic: add optimization pattern for ('ult', a, ('and', b, a)) and friends.
These optimizations are based on the fact that
'and(a,b) <= umin(a,b)'.
For AMD, this series moves the optimization from LLVM to NIR,
so currently no vkpipeline-db changes here.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-06-24 18:42:20 +02:00
Andreas Baierl
fa6ea16a8d lima/ppir: Add fsat op
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-06-24 16:41:33 +02:00
Andreas Baierl
f1d89bbc2f lima/ppir: Add fneg op
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-06-24 16:41:33 +02:00
Andreas Baierl
512397058d lima/ppir: Add fabs op
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-06-24 16:41:33 +02:00
Eric Engestrom
2d2e824fae util: support "y" and "n" in env_var_as_boolean()
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-24 12:49:13 +00:00
Andreas Baierl
0cb9ce12fd lima/ppir: lower ffma in ppir
Since we cannot handle ffma in ppir, lower it on nir level already.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-06-24 11:57:57 +00:00
Samuel Pitoiset
946193ae00 radv: add support for VK_AMD_buffer_marker
This simple extension might be useful for debugging purposes.
GAPID has support for it.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-24 10:50:54 +02:00
Tapani Pälli
ff77b0415b meson: error out if platforms contains empty string
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110939
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-06-24 08:40:18 +03:00
Nataraj Deshpande
d94fca5420 anv: Add HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED in vk_format
When HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED is used, then the platform
gralloc module will select a format based on the usage flags provided by
the camera device and the other endpoint of the stream.

The patch fixes crash in vulkan when the test is run with camera stream
set to HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED.

Test: android.graphics.cts.CameraVulkanGpuTest#testCameraImportAndRendering
on chromebook with camera HAL3.

v2: use AHARDWAREBUFFER_FORMAT_IMPLEMENTATION_DEFINED and take
    AHARDWAREBUFFER_USAGE_CAMERA_MASK in to account (Gurchetan)

Fixes: f1654fa7e3 "anv/android: support creating images from external format"
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-24 08:28:18 +03:00
Timur Kristóf
3b6d787e40 iris: move sysvals to their own constant buffer
This commit moves the sysvals to a separate, new constant buffer
at the end (before the shader constants). It also allows us to
remove the special handling we had for cbuf0, and enables all
constant buffers to support user-specified resources and user
buffers.

v2: (by Kenneth Graunke)
- Rebase on the previous patch to fix system value uploading.
- Fix disk cache num_cbufs calculation
- Fix passthrough TCS to report num_cbufs = 1 so upload actually occurs
- Change upload_sysvals to assert that num_cbufs > 0 when
  num_system_values > 0.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-23 18:33:23 +02:00
Kenneth Graunke
ebc8c20b3e iris: Mark cbuf0 as not needing uploading every single time
I neglected to mark cbuf0_needs_upload = false after uploading it.
The obvious fix regressed user clip plane tests, because of a second
bug: we also forgot to mark that they may need re-uploading when
changing shader programs (which may have more or less system values).

Thanks to Timur Kristóf for catching the original issue.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
2019-06-23 18:32:11 +02:00
Eric Engestrom
188dbb1679 Revert "egl: drop empty eglfallbacks.c" and "egl: move fallback calls to eglapi.c"
This reverts commits cc4b68a801 and
b27fb3eaca.

These caused a bunch of EGLSync tests to crash when they were previously
failing.

I have a hunch the tests are doing something wrong, like using
extensions without checking for they support, but until the issue is
investigated I'm just reverting these commits.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-22 21:59:06 +01:00
Eric Engestrom
cc4b68a801 egl: drop empty eglfallbacks.c
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-22 15:17:42 +00:00
Eric Engestrom
b27fb3eaca egl: move fallback calls to eglapi.c
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-22 15:17:42 +00:00
Eric Engestrom
262b767023 egl: drop _eglReturnFalse() fallbacks
v2: drop them altogether, they should never get called in the
    first place (Emil)

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-22 15:17:42 +00:00
Eric Engestrom
82487ede62 egl: remove unnecessary eglGetProcAddress() fallback
No need to add a function that returns `false` only to be cast into
a pointer, we can just use the existing `return NULL` :)

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-22 15:17:42 +00:00
Eric Engestrom
30ecd86947 egl: remove NULL assignments after calloc()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-22 15:17:42 +00:00
Eric Engestrom
64c7c05b71 egl: move bad_param check further up
This way other functions added in these entrypoints don't need to check
anything.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-22 15:17:42 +00:00
Kenneth Graunke
262787b9bc iris: Drop bo != NULL check from blorp 48b invalidate function.
There is always a BO.
2019-06-21 20:50:42 -05:00
Kenneth Graunke
5da37a826b Revert "iris: Don't check VF address high bits when there is no buffer."
This reverts commit db8f57a5cb.

This is bonkers.  There will always be a BO.
2019-06-21 20:50:42 -05:00
Eric Anholt
4449572c47 freedreno: Only upload UBO pointers for UBOs that haven't been lowered.
total constlen in shared programs: 2485933 -> 2462236 (-0.95%)

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-21 17:14:43 -07:00
Eric Anholt
01d0bad9ef freedreno: Remove silly return from ir3_optimize_nir().
We only ever return the shader we were passed in (but internally
modified).

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-21 17:14:43 -07:00
Eric Anholt
56842d33d5 freedreno: Fix up end range of unaligned UBO loads.
We need the constants uploaded to cover the NIR offset plus the size,
not the aligned-down start of our upload range plus the size.  Fixes
mistaken UBO analysis with mat3 loads.

Fixes: 893425a607 ("freedreno/ir3: Push UBOs to constant file")
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-21 17:14:43 -07:00
Eric Anholt
5e7c96b95d freedreno: Fix UBO load range detection on booleans.
NIR 1-bit bool dests will have a bit size of 1, and thus a calculated
"bytes" of 0.  load_ubo is always loading from dwords in the source.

Fixes: 893425a607 ("freedreno/ir3: Push UBOs to constant file")
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-21 17:14:43 -07:00
Eric Anholt
23a7feda63 freedreno: Stop reporting max_const in shader-db.
We end up uploading constlen regardless, so max_const would only get
you slightly improved granularity in const usage in comparison.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-21 17:14:43 -07:00
Eric Anholt
ee2e1e85d4 freedreno: Include binning shaders in shader-db.
We want to see if we've improved our binning VS output, as well as the
render VS.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-21 17:14:43 -07:00
Marek Olšák
8ab9f3a857 include: update GL headers from the registry
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-06-21 19:00:52 -04:00
Alyssa Rosenzweig
a6bef350ed panfrost: Fix unused variable warning
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-21 13:06:49 -07:00
Boris Brezillon
5f81669d88 panfrost: Remove the panfrost_driver abstraction
The non-DRM backend is gone. Let's get rid of the panfrost_driver
abstraction and call the panfrost_drm_xxx() functions directly.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-21 13:01:49 -07:00
Boris Brezillon
e8257f3de8 panfrost: Remove the perf counters interface
The DRM backend has a dummy implementation and the non-DRM backend is
gone, so let's remove this perf counter interface.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-21 13:01:12 -07:00
Tomeu Vizoso
0bcbccf887 panfrost: ci: Fix parsing of crashed tests
Without this fix, LAVA isn't parsing crashes as failed tests, because
the shell logging is interspersed within the fake deqp output.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-21 09:35:35 -07:00
Alyssa Rosenzweig
d38ac21297 panfrost: Conditionally submit fragment job
If there are no tiling jobs and no clears, there is no need to submit a
fragment job (relevant for transform feedback).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-21 09:35:35 -07:00
Alyssa Rosenzweig
cd5d618b5c panfrost: Implement rasterizer discard
D'aww, look how cute that is now that scoreboarding is setup.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-21 09:35:31 -07:00
Alyssa Rosenzweig
26c5a145a7 panfrost: Track buffer initialization
We want to know if a given slice of a buffer is initialized at a
particular point in the execution of the program. This is accomplished
easily enough -- start out uninitialized and upon an operation writing
to the buffer, mark it initialized.

The motivation is to optimize away expensive operations (like wallpaper
blits) when reading from an uninitialized buffer; since it's
uninitialized, the results of these operations are undefined, and it's
legal to take the fast path ^_^

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-21 09:35:09 -07:00
Alyssa Rosenzweig
f0854745fd panfrost: Implement command stream scoreboarding
This is a rather complex change, adding a lot of code but ideally
cleaning up quite a bit as we go.

Within a batch (single frame), there are multiple distinct Mali job
types: SET_VALUE, VERTEX, TILER, FRAGMENT for the few that we emit right
now (eventually more for compute and geometry shaders). Each hardware
job has a mali_job_descriptor_header, which contains three fields of
interest: job index, a dependencies list, and a next job pointer.

The next job pointer in each job is used to form a linked list of
submitted jobs. Easy enough.

The job index and dependencies list, however, are used to form a
dependency graph (a DAG, where each hardware job is a node and each
dependency is a directed edge). Internally, this sets up a scoreboarding
data structure for the hardware to dispatch jobs in parallel, enabling
(for example) vertex shaders from different draws to execute in parallel
while there are strict dependencies between tiling the geometry of a
draw and running that vertex shader.

For a while, we got by with an incredible series of total hacks,
manually coding indices, lists, and dependencies. That worked for a
moment, but combinatorial kaboom kicked in and it became an
unmaintainable mess of spaghetti code.

We can do better. This commit explicitly handles the scoreboarding by
providing high-level manipulation for jobs. Rather than a command like
"set dependency #2 to index 17", we can express quite naturally "add a
dependency from job T on job V". Instead of some open-coded logic to
copy a draw pointer into a delicate context array, we now have an
elegant exposed API to simple "queue a job of type XYZ".

The design is influenced by both our current requirements (standard ES2
draws and u_blitter) as well as the need for more complex scheduling in
the future. For instance, blits can be optimized to use only a tiler
job, without a vertex job first (since the screen-space vertices are
known ahead-of-time) -- causing tiler-only jobs. Likewise, when using
transform feedback with rasterizer discard enabled, vertex jobs are
created (to run vertex shaders) with no corresponding tiler job. Both of
these cases break the original model and could not be expressed with the
open-coded logic. More generally, this will make it easier to add
support for compute shaders, geometry shaders, and fused jobs (an
optimization available on Bifrost).

Incidentally, this moves quite a bit of state from the driver context to
the batch, which helps with Rohan's refactor to eventually permit
pipelining across framebuffers (one important outstanding optimization
for FBO-heavy workloads).

v2: Add comment explaining the meaning of "primary batch" as suggested
by Tomeu (trivial - not reviewed).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
2019-06-21 09:35:02 -07:00
Anuj Phogat
e334a595e4 intel/icl: Add new ICL PCI-IDs
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-21 08:38:08 -07:00
Jason Ekstrand
1a9e5b9094 anv: Implement "pop-free" clipping
This is the preferred clipping mode since it doesn't mean your points
disappear the moment part of the point crosses over the edge of the
viewport and that lines have weird endpoints at viewport edges.  We've
just never bothered to hook it up until now.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-21 14:18:59 +00:00
Jason Ekstrand
4a757d6c31 anv: Enable the guardband clip test
In workloads where there is a lot of geometry drawn that crosses over
the edge of the viewport, this should substantially improve clipper
performance.  Not really sure why it's taken 3 years to turn it on but
we never got around to it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-21 14:18:59 +00:00
Jason Ekstrand
13f0c278c5 i965,iris: Move guardband calculations to a common location
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-21 14:18:59 +00:00
Mauro Rossi
60c581b57d android: virgl: fix libmesa_winsys_virgil_common build and dependencies
Fixes the following building errors and resolves Bug 110922
Fixes gallium_dri target missing symbols at linking.

external/mesa/src/gallium/winsys/virgl/drm/Android.mk:
error: libmesa_winsys_virgl (STATIC_LIBRARIES android-x86_64) missing libmesa_winsys_virgl_common (STATIC_LIBRARIES android-x86_64)
...
external/mesa/src/gallium/winsys/virgl/vtest/Android.mk:
error: libmesa_winsys_virgl_vtest (STATIC_LIBRARIES android-x86_64) missing libmesa_winsys_virgl_common (STATIC_LIBRARIES android-x86_64)
...
build/core/main.mk:728: error: exiting from previous errors.

In file included from external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_socket.c:34:
external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_winsys.h:35:10:
fatal error: 'virgl_resource_cache.h' file not found
         ^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

In file included from external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_winsys.c:32:
external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_winsys.h:35:10:
fatal error: 'virgl_resource_cache.h' file not found
#include "virgl_resource_cache.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: b18f09a ("virgl: Introduce virgl_resource_cache")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
2019-06-21 15:53:29 +02:00
Mauro Rossi
cf389ba895 android: winsys/amdgpu,radv: fix generated amdgfxregs.h header dependecies
Fix android building errors in winsys/amdgpu and radv
due to 'amdgfxregs.h' not found.

Changelog:
amd/common - generated $(intermediated)/common path is added to exports
winsys/amdgpu - libmesa_amd_common static dependency is added
radv - correct generated $(intermediated)/common path is added to includes

Fixes: f480b8a ("amd/common: use generated register header")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-06-21 15:53:23 +02:00
Samuel Pitoiset
9bf47fefe0 radv: add support for VK_KHR_depth_stencil_resolve
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-21 14:50:38 +02:00
Samuel Pitoiset
e67fc11c26 radv: pass sample locations for transitions before depth/stencil resolves
HTILE decompressions need the user sample locations if specified
in the current subpass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-21 14:50:35 +02:00
Samuel Pitoiset
396da5c029 radv: clear the depth/stencil resolve attachment if necessary
The driver might need to clear one aspect of the depth/stencil
resolve attachment before performing the resolve itself.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-21 14:50:33 +02:00
Samuel Pitoiset
c7872237bf radv: decompress HTILE if the resolve src image is compressed
It's required to decompress HTILE before resolving with the
compute path.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-21 14:50:27 +02:00
Samuel Pitoiset
29c4d44cee radv: select the depth/stencil resolve method based on some conditions
Only fallback to the compute path for layers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-21 14:50:24 +02:00
Samuel Pitoiset
5cf350f565 radv: implement all depth/stencil resolve modes using compute
This path supports layers but it requires to decompress HTILE
before resolving. The driver also needs to fixup HTILE after
the resolve. This path is probably slower than the graphics one.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-21 14:50:19 +02:00
Samuel Pitoiset
cdc6efddf9 radv: implement all depth/stencil resolve modes using graphics
When using graphics, the driver doesn't need to decompress HTILE
before resolving. This path currently doesn't support layers
so we have to fallback to the compute path.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-21 14:50:15 +02:00
Samuel Pitoiset
e52ad9f845 radv: record if a render pass has depth/stencil resolve attachments
Only supported with vkCreateRenderPass2().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-21 14:50:12 +02:00
Samuel Pitoiset
ac6369a2d0 radv: rename has_resolve to has_color_resolve
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-21 14:50:10 +02:00
Samuel Pitoiset
203f60ebf2 radv: emit framebuffer state from primary if secondary doesn't inherit it
Otherwise fast color/depth clears can't work because they depend
on the framebuffer.

This fixes the following CTS (when the small hint is disabled):
- dEQP-VK.geometry.layered.1d_array.secondary_cmd_buffer
- dEQP-VK.geometry.layered.2d_array.secondary_cmd_buffer
- dEQP-VK.geometry.layered.cube.secondary_cmd_buffer
- dEQP-VK.geometry.layered.cube_array.secondary_cmd_buffer

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110810
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107986
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-21 13:49:35 +02:00
Eric Engestrom
6a9dd62882 drisw: move build logic to build systems
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-21 11:35:39 +00:00
Tomeu Vizoso
1cbe2ad394 panfrost: ci: Exclude two more flip-flop from results
These three tests pass on RK3399, but fail on RK3288:

dEQP-GLES2.functional.shaders.matrix.div.const_lowp_mat2_mat2_vertex
dEQP-GLES2.functional.shaders.operator.unary_operator.pre_increment_effect.highp_ivec4_vertex
dEQP-GLES2.functional.shaders.texture_functions.vertex.texture2dprojlod_vec3

They reliably pass when run individually, but reliably fail when run in
a full CI run.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-06-21 10:45:12 +02:00
Gert Wollny
ef4429d9c5 gallium/st: Add Gallium hud to swrast drivers
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-06-21 08:54:57 +02:00
Iago Toral Quiroga
4d8f82946b v3d: flush jobs writing to vertex buffers used in the current draw call
This can happen when any of our vertex buffers was written by a previous
transform feedback draw.

Fixes the following piglit tests:
spec/ext_transform_feedback/position-render-bufferbase
spec/ext_transform_feedback/position-render-bufferbase-discard
spec/ext_transform_feedback/position-render-bufferoffset
spec/ext_transform_feedback/position-render-bufferoffset-discard
spec/ext_transform_feedback/position-render-bufferrange
spec/ext_transform_feedback/position-render-bufferrange-discard

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-21 08:06:13 +02:00
Iago Toral Quiroga
eb44dcc219 v3d: flush jobs reading from transform feedback output buffers
If we are about to write to a transform feedback buffer, we should
make sure that we flush any prior work that intended to read from
any of these buffers.

Fixes piglit test:
spec/ext_transform_feedback/immediate-reuse

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-21 08:06:13 +02:00
Iago Toral Quiroga
42572f2f7d v3d: add a helper to check if transform feedback is enabled
v2: We should be safe assuming that bind_vs != NULL (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-21 08:06:13 +02:00
Dave Airlie
00a56acc23 llvmpipe: make remove_shader_variant static.
this isn't used outside this file.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-06-21 10:27:57 +10:00
Eric Engestrom
955c63d364 util/os_file: resize buffer to what was actually needed
Fixes: 316964709e "util: add os_read_file() helper"
Reported-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-20 21:49:30 +00:00
Tomeu Vizoso
2743e34f20 panfrost: ci: Update expectations
These tests have been fixed recently.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-06-20 20:57:41 +02:00
Alyssa Rosenzweig
195e297a92 panfrost/midgard: Broadcast swizzle
Fixes regression in shaders using ball/etc by explicitly passing through
the number of channels in the NIR op and broadcasting the last
components of the channel appropriately, as the Midgard ops are all vec4
implicitly but NIR can be vec2/3.

v2: Don't also regress every other swizzle in Equestria.

v3: Don't regress the swizzles at Canterlot High either.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-06-20 20:52:04 +02:00
Kenneth Graunke
31de802e7e iris: Use stream uploader for shader draw parameters.
Most vertex data lives in user VBOs in IRIS_MEMZONE_OTHER, which
typically have high bits set to 0xffff.  The shader draw parameters were
being uploaded in IRIS_MEMZONE_DYNAMIC, which have high bets set to 0x2.
This was causing a lot of ping-ponging of high bits, leading to
unnecessary VF cache flushing.

Cuts 7.2% of the flushes in the Civizilation VI demo on Kabylake GT2.
2019-06-20 13:32:16 -05:00
Kenneth Graunke
db8f57a5cb iris: Don't check VF address high bits when there is no buffer.
If there is no buffer, then it doesn't matter.  Leave the old stale
high bits in place (for next time) and don't bother invalidating.

Cuts 5.6% of the flushes in the Civilization VI demo on Kabylake GT2.
2019-06-20 13:32:16 -05:00
Kenneth Graunke
ecc500398f iris: Drop RT flushes from depth stencil clearing flushes.
These write depth and stencil, not color writes, so there's no need
to flush the render target.
2019-06-20 13:32:16 -05:00
Kenneth Graunke
1d63af0f2c iris: Don't bother with PIPE_CONTROLs for CPU writes and no history
If a buffer has no usage history, we don't have any read only cache
invalidates to do.  If we've written it with the CPU, we don't need
to flush the render cache.  The only bit remaining is the CS stall
from iris_flush_bits_for_history.  We can just skip the PIPE_CONTROL
in this case.

This is pretty common - an app creates a buffer, fills it with data,
and then binds it for some purpose.

Cuts 36% of the flushes in Manhattan 3.0 on Kabylake GT2.
2019-06-20 13:32:16 -05:00
Kenneth Graunke
dfff6e10b4 iris: Only do an RT flush for transfer maps if using copy_region.
If we wrote the data via the CPU, there's no point in doing a render
target flush.  If using BLORP, we do want a render target flush so the
data lands.
2019-06-20 13:32:15 -05:00
Kenneth Graunke
c4c17ab3ec iris: Use iris_flush_bits_for_history in iris_transfer_flush_region
Instead of using the combined iris_flush_and_dirty_for_history, use
iris_flush_bits_for_history directly - we were already using the split
out iris_dirty_for_history.  There's no need to dirty twice, and we can
avoid the looping altogether for non-buffers.
2019-06-20 13:32:15 -05:00
Kenneth Graunke
6890340c31 iris: Avoid double flushing in iris_transfer_flush_region when copying.
My intention was to have iris_copy_region not do flushing, and leave
that up to the callers.  iris_resource_copy_region needs to do this,
but iris_transfer_flush_region was already doing it.  The net result
was that we were doing it twice for transfers.

So, move the flushing from iris_copy_region to iris_resource_copy_region
so that it only happens in the callers as I intended.
2019-06-20 13:32:15 -05:00
Kenneth Graunke
64fb20ed32 iris: Fix iris_flush_and_dirty_history to actually dirty history.
When I split iris_flush_and_dirty_history into two helper functions,
I accidentally made it stop dirtying.  Which was...sort of the point.

Fixes: 21688a306b iris: Split iris_flush_and_dirty_for_history into two helpers.
2019-06-20 13:32:15 -05:00
Kenneth Graunke
5e501ffeb2 iris: Add maybe_flush calls to texture_barrier and memory_barrier
Otherwise, tests which loop on glMemoryBarrier may run us out of
batch space with piles of flushing.  (Ideally, we'd elide those bonus
PIPE_CONTROLs, but presumably this isn't that common of a case...)

Piglit's arb_pipeline_statistics_query-comp would hit this case after
some of the next patches remove other PIPE_CONTROLs with maybe_flushes.
2019-06-20 13:32:15 -05:00
Kenneth Graunke
d4a4384b31 iris: Implement INTEL_DEBUG=pc for pipe control logging.
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush.  That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on.  It should make it easier to determine whether we're doing
too many flushes and why.
2019-06-20 13:32:15 -05:00
Alyssa Rosenzweig
c378829a0d panfrost: Skip shading unaffected tiles
Looking at the scissor, we can discard some tiles. We specifially don't
care about the scissor on the wallpaper, since that's a no-op if the
entire tile is culled.

v2: Clarify clear comment (not reviewed but trivial).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-06-20 09:30:38 -07:00
Eric Engestrom
65b016b146 glx: fix glvnd pointer types
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110709
Fixes: 22a9e00aab ("glx: Implement the libglvnd interface.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-20 17:21:37 +01:00
Eric Engestrom
e0ee790ba7 glx: drop misleading comment about the file being "generated"
This `gen_scrn_dispatch.pl` has never existed, in the sense that NVIDIA
never published it.  There have been a number (6) of commits to fix
various things in there over the years, and never anything from NVIDIA.

For all intents and purposes this file is hand-written and
hand-maintained, and we're on our own.

Let's make this clear by removing this misleading comment.

Suggested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-20 16:19:58 +00:00
Boris Brezillon
56434450f6 nir/lower_tex: Add an assert() in nir_lower_txs_lod()
We don't expect the output of a TXS instruction to be wider than a
vec3. Add an assert() to make sure this never happens.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 09:15:53 -07:00
Tomeu Vizoso
babc3ad291 panfrost: Set job requirements during draw
Right now we are doing it at a moment when we don't have all the
information we need.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Suggested-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Rohan Garg <rohan.garg@collabora.com>
Cc: Rohan Garg <rohan.garg@collabora.com>
Fixes: bfca21b622 ("panfrost: Figure out job requirements in pan_job.c")
2019-06-20 18:07:04 +02:00
Alyssa Rosenzweig
dc668203db panfrost/meson: Link with libpanfrost_shared
Fixes: 035a07c0 ("panfrost: Switch to lima tiling")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 08:56:38 -07:00
Hyunjun Ko
f7f8fb1b55 freedreno/ir3: fix typo
Fixes: a9b556d3a0 ("freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov")
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-20 08:34:09 -07:00
Alyssa Rosenzweig
546236e27f panfrost: Load from tiled images
Now that we have lima tiling code available, use it to load from a tiled
source.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 08:22:38 -07:00
Alyssa Rosenzweig
035a07c0ae panfrost: Switch to lima tiling
Lima and Panfrost both have implementations of software tiling
(the Lima one was forked off the Panfrost one which was forked off the
original Lima one...). Switch to the most recent Lima code, since it's
more complete than ours at this point.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 08:22:38 -07:00
Alyssa Rosenzweig
7b46f09f26 panfrost: Fix tiled NPOT textures with bpp<4
Panfrost's tiling routines (incorrectly) ignored the source stride,
masking this bug; lima's routines respect this stride, causing issues
when tiling NPOT textures whose stride is not a multiple of 64
(for instance, NPOT textures with bpp=1).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 08:22:38 -07:00
Alyssa Rosenzweig
413242277a lima,panfrost: Move lima_tiling.c/h to /src/panfrost
This will allow both drivers to share this code. Both drivers
build-tested with meson. Android build not tested.

v2: Change naming from tiling->shared, in case Lima and Panfrost can
share more in the future. Fix Android build system.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-and-tested-by: Qiang Yu <yuq825@gmail.com>
2019-06-20 08:06:35 -07:00
Kenneth Graunke
c57b4c86c0 iris: Use render_batch/compute_batch locals in memory_barrier
We have them, may as well use them.
2019-06-20 10:04:38 -05:00
Lionel Landwerlin
4a61be24fe anv: only resort to sync fds internally with no syncobj support
We can rely on only one kind of synchronization object (drm-syncobj)
when it is available. This reduces the number of file descriptors we
use in our implementation.

This will be required later for timeline semaphores implementation, at
this point we won't ever want to use anything else but syncobjs.

v2: Only use has_syncobj for semaphores (Jason)

v3: Only has_syncobj in assert on semaphores in QueueSubmit (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-20 14:59:51 +00:00
Alyssa Rosenzweig
1d7e53a854 panfrost: Remove other commented pointers
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 07:48:05 -07:00
Alyssa Rosenzweig
2608da14b9 panfrost/decode: Elide more zero fields
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 07:48:05 -07:00
Alyssa Rosenzweig
cfc2218a8c panfrost/decode: Remove memory comments
These do more harm than good at this point.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig
8643b89c48 panfrost: Add missing 0x in invocation_count
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig
b6d46d09c2 panfrost/decode: Skip decode of fragment backend in non-fragment
This is all zero for anything but fragment shaders.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig
ae2bfab7b7 panfrost/decode: Clip mali_compute_fbd at 64-bytes
Looking at internal evidence (later fields including a literal other
compute job inception-style, seeming memory corruption, no clear
function, and the field after this being a pointer to *itself*), it
looks like this is really a much smaller descriptor.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig
3faf33488a panfrost/decode: Print COMPUTE uniforms as pointers
In OpenGL, uniforms generally represent fp32 vec4s (at least in highp
mode). In OpenCL, they represent vec2s of 64-bit pointers.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig
0021fae7f8 panfrost/decode: Show int uniforms
Float is ambiguous.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig
1f7dfee1b4 panfrost/decode: Expand pointers in compute descriptor
Just as an aid.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig
0aa5d89acb panfrost/decode: Identify "compute FBD"
There is fundamentally not a framebuffer associated with a compute job.
Allocate a new structure for it so we don't mess up graphics when
decoding.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 07:48:04 -07:00
Tomeu Vizoso
4f881237c3 panfrost: Allocate panfrost_job in panfrost_context
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 15:48:35 +02:00
Tomeu Vizoso
b5db7cce60 panfrost: Release transient pools
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 15:48:35 +02:00
Tomeu Vizoso
6cec937e22 panfrost: ci: Exclude flip-flops from results
These tests are failing at times, blacklist for now:

dEQP-GLES2.functional.fbo.render.shared_colorbuffer_clear.tex2d_rgba
dEQP-GLES2.functional.fbo.render.shared_colorbuffer_clear.tex2d_rgb
dEQP-GLES2.functional.shaders.matrix.mul.dynamic_highp_mat4_vec4_vertex

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-20 15:48:15 +02:00
Alejandro Piñeiro
6a159bca9d util: add empty line before virgl options
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-06-20 15:21:39 +02:00
Alejandro Piñeiro
790c3dbac8 util: add missing DRI_CONF_OPT_END
When DRI_CONF_GLES_EMULATE_BGRA was added for the virgl driver, it
missed a DRI_CONF_OPT_END.

This make some drivers, like v4c/v3d to crash with the following
error:
Fatal error in __driConfigOptions line 99, column 2: mismatched tag.

Not sure why it doesn't fail with virgl.

Fixes: b793663449
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-20 14:11:30 +02:00
Eric Engestrom
a9e09d56a9 isl: tag unreachable path as such
GCC should be able to figure out that all the possible enum values are
exhausted in the switch() and all the branches return from the function,
but apparently it doesn't, so let's tell the compiler explicitly.

This gets rid of the following warnings in GCC 9:

    [1/24] Compiling C object 'src/intel/isl/60d23f8@@isl@sta/isl.c.o'.
    ../src/intel/isl/isl.c: In function ‘isl_surf_init_s’:
    ../src/intel/isl/isl.c:1569:10: warning: ‘array_pitch_el_rows’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     1569 |    *surf = (struct isl_surf) {
          |    ~~~~~~^~~~~~~~~~~~~~~~~~~~~
     1570 |       .dim = info->dim,
          |       ~~~~~~~~~~~~~~~~~
     1571 |       .dim_layout = dim_layout,
          |       ~~~~~~~~~~~~~~~~~~~~~~~~~
     1572 |       .msaa_layout = msaa_layout,
          |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~
     1573 |       .tiling = tiling,
          |       ~~~~~~~~~~~~~~~~~
     1574 |       .format = info->format,
          |       ~~~~~~~~~~~~~~~~~~~~~~~
     1575 |
          |
     1576 |       .levels = info->levels,
          |       ~~~~~~~~~~~~~~~~~~~~~~~
     1577 |       .samples = info->samples,
          |       ~~~~~~~~~~~~~~~~~~~~~~~~~
     1578 |
          |
     1579 |       .image_alignment_el = image_align_el,
          |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     1580 |       .logical_level0_px = logical_level0_px,
          |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     1581 |       .phys_level0_sa = phys_level0_sa,
          |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     1582 |
          |
     1583 |       .size_B = size_B,
          |       ~~~~~~~~~~~~~~~~~
     1584 |       .alignment_B = base_alignment_B,
          |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     1585 |       .row_pitch_B = row_pitch_B,
          |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~
     1586 |       .array_pitch_el_rows = array_pitch_el_rows,
          |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     1587 |       .array_pitch_span = array_pitch_span,
          |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     1588 |
          |
     1589 |       .usage = info->usage,
          |       ~~~~~~~~~~~~~~~~~~~~~
     1590 |    };
          |    ~
    ../src/intel/isl/isl.c:1488:24: warning: ‘*((void *)&phys_total_el+4)’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     1488 |    struct isl_extent2d phys_total_el;
          |                        ^~~~~~~~~~~~~
    ../src/intel/isl/isl.c:1335:38: warning: ‘phys_total_el’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     1335 |       isl_align_div(phys_total_el->w * tile_el_scale,
          |                     ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
    ../src/intel/isl/isl.c:1488:24: note: ‘phys_total_el’ was declared here
     1488 |    struct isl_extent2d phys_total_el;
          |                        ^~~~~~~~~~~~~

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-20 12:05:14 +00:00
Samuel Pitoiset
f179febde0 radv: enable DCC for mipmapped color textures on GFX8
It's tricky on GFX9, so only GFX8 for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-20 11:04:02 +02:00
Samuel Pitoiset
17f94e1984 radv: do not fast clears if one level can't be fast cleared
And fallback to slow color clears.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-20 11:03:58 +02:00
Samuel Pitoiset
450bce522a radv: add fast clears support for mipmapped color images with DCC
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-20 11:03:57 +02:00
Samuel Pitoiset
fa903ba799 radv: add radv_dcc_clear_level() helper
For clearing only one level.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-20 11:03:53 +02:00
Samuel Pitoiset
b92d87f7f0 radv: re-initialize DCC metadata after decompressing using compute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-20 11:03:52 +02:00
Samuel Pitoiset
dc6e3053a7 radv: initialize levels without DCC during layout transitions
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-20 11:03:49 +02:00
Thomas Hellstrom
71b43490dd svga: Support ARB_buffer_storage
This basically boils down to supporting persistent and coherent buffer
storage.
We chose to use coherent buffer storage for all persistent buffers
even if it's not explicitly specified, since using glMemoryBarrier to
obtain coherency would be particularly expensive in our driver stack,
and require a lot of additional bookkeeping.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-06-20 09:30:22 +02:00
Thomas Hellstrom
8c01e5ed5f gallium/util: Make it possible to disable persistent maps in the upload manager
For svga, the use of persistent / coherent maps is typically slightly
slower than without them. It's probably a bit case-dependent and
possible to tune, but for now, make sure we can disable those.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-06-20 09:30:22 +02:00
Thomas Hellstrom
3b828c4e68 svga: Map vertex- index- and constant buffers ansynchronously when reading
With SWTNL and index translation we're mapping buffers for reading. These
buffers are commonly upload_mgr buffers that might already be referenced
by another submitted or unsubmitted GPU command. A synchronous map will
then trigger a flush and sync, at least on Linux that doesn't distinguish
between read- and write referencing. So map these buffers async. If they
for some obscure reason happen to be dirty (stream-output, buffer-copy),
the resource_buffer code will read-back and sync anyway. For persistent /
coherent buffers a corresponding read-back and sync will happen in the
kernel fault handler.

Testing: Piglit quick. No regressions.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-06-20 09:30:22 +02:00
Thomas Hellstrom
f51915ba62 svga: Fix index buffer uploads
In the case of SWTNL and index translation we were uploading index buffers
and then reading out from them using the CPU. Furthermore, when translating
indices we often cached the results with an upload_mgr buffer, causing the
cached indexes to be immediately discarded on the next write to that
upload_mgr buffer.

Fix this by only uploading when we know the index buffer is going to be
used by hardware. If translating, only cache translated indices if the
original buffer was not a user buffer. In the latter case when we're not
caching, use an upload_mgr buffer for the hardware indices.

This means we can also remove the SWTNL hand-crafted index buffer upload
mechanism in favour of the upload_mgr.

Finally avoid using util_upload_index_buffer(). It wastes index buffer
space by trying to make sure that the offset of the indices in the
upload_mgr buffer is larger or equal to the position of the indices in
the source buffer. From what I can tell, the SVGA device does not
require that.

Testing done: Piglit quick. No regressions.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-06-20 09:30:22 +02:00
Thomas Hellstrom
4f59d51d82 winsys/svga: Make it possible to specify coherent resources
Add a flag in the surface cache key and a winsys usage flag to
specify coherent memory.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-06-20 09:30:22 +02:00
Thomas Hellstrom
4412be40dd gallium/util: Make u_debug_flush support persistent maps
Previously unsynchronized maps have been assumed to also be persistent,
Now destinguish between persistent and unsynchronized map and also support
PIPE_TRANSFER_PERSISTENT from ARB_buffer_storage.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-06-20 09:30:22 +02:00
Gert Wollny
a478e56fbd virgl: Add debug flag to bypass driconf to enable the BGRA tweaks
This useful for testing, also because with vtest the dri configuration
is not read.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-06-20 08:50:38 +02:00
Gert Wollny
5dbecf7863 virgl: Add a tweak to set the value for emulated queries of GL_SAMPLES_PASSED
On GLES hosts GL_SAMPLES_PASSED is emulated by GL_ANY_SAMPLES_PASSED which returns a boolen.
With this tweak the value that is returned if any sample passed can be set. This
may be of iterest when an application decides whether some geometry is rendered based
on an amount of visibility and not just a binary desicion. virgelrenderer sets a default
of 1024 on th host.

v2: Remove reference from virgl and correct description (Emil)
v3: Send the tweak binary encoded instead of using strings (Gurchetan)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-06-20 08:50:38 +02:00
Gert Wollny
59757dbad6 virgl: Add tweak to apply a swizzle when drawing/blitting to a emulated BGRA texture
With Qemu this final swizzle is not needed, but with vtest it is, i.e. it depends on
how a program using virglrenderer uses the surface that is rendered to, hence
a tweak is added.

v2: Update description and fix spelling (Emil)
v3: Send tweak as binary value instead of using strings (Gurchetan)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-06-20 08:50:38 +02:00
Gert Wollny
b793663449 virgl: Add driconf tweak for emulating BGRA surfaces on GLES
These tweaks are used to fix rendering issues with Valve games and
at least also "The Raven Remastered" when run on a GLES host.

v2: Fix type in define and remove virgl from driconf option (Emil)
v3: Encode tweak binary instead of using strings (Gurchetan)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-06-20 08:50:38 +02:00
Gert Wollny
13d4a34c44 virgl: Add override for BGRA format to use swizzled SRGB format
Tie in the check whether the host supports tweaks and whether this tweak
is enabled.

v2: Add comment about the emulated formats not being used directly in the
    guest (Gurchetan)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-06-20 08:50:38 +02:00
Gert Wollny
22edafb239 virgl: Add code to accept BGRx_SRGB as RGBx_SRGB
This will be enabled in later patches by the emulation tweak.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-06-20 08:50:38 +02:00
Gert Wollny
d8967b7951 virgl: Add skeleton to evaluate cap and send tweaks
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-06-20 08:50:38 +02:00
Gert Wollny
28dc096e15 virgl: factor out format host bits check
This will make it a single location when we want to replace a format.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-06-20 08:50:38 +02:00
Gert Wollny
30eb1fdc51 gallium/virgl: Add code path for virgl to read driconf
This works only for the drm variant of virgl and not for the vtest
variant.

v2: Rebase, replace the configuration query function by a pointer to
    the configuration data.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-06-20 08:50:38 +02:00
Gert Wollny
cf800998af virgl: Add driinfo file and tie it into the build
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-06-20 08:50:37 +02:00
Caio Marcelo de Oliveira Filho
9b0720c436 glspirv: Call pass to lower frexp instructions
These were previously handled by the spirv_to_nir, but that changed to
be an explict pass in 23d30f4099 "spirv,nir: lower
frexp_exp/frexp_sig inside a new NIR pass"

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-19 22:07:57 -07:00
Caio Marcelo de Oliveira Filho
12131096fa spirv: Restrict use of descriptor intrinsics to Vulkan
In ARB_gl_spirv we'll be able to use variables for uniform buffers, so
don't use the descriptor intrinsics to lower the block access.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-19 22:07:51 -07:00
Nicolai Hähnle
21dd881416 ac/rtld: report better error messages for LDS overallocation
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-06-19 20:30:32 -04:00
Marek Olšák
b64bd5887e ac/rtld: check correct LDS max size
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-06-19 20:30:32 -04:00
Nicolai Hähnle
1ee0f0d315 radeonsi: add s_sethalt to shaders for debugging
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-06-19 20:30:32 -04:00
Nicolai Hähnle
87182200c7 ac/rtld: fix sorting of LDS symbols by alignment
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-06-19 20:30:32 -04:00
Bas Nieuwenhuizen
d1c04835ab meson: Allow building radeonsi with just the android platform.
Just as was allowed by autotools.

Fixes: 108d257a16 "meson: build libEGL"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-19 23:42:49 +00:00
Bas Nieuwenhuizen
755c633b8d anv: Fix vulkan build in meson.
Apparently the android part was never ported to meson.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-19 23:27:46 +00:00
Bas Nieuwenhuizen
4c300bd328 radv: Fix vulkan build in meson.
Apparently the android part was never ported to meson.

CC: <mesa-stable@lists.freedesktop.org>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-19 23:27:46 +00:00
Jason Ekstrand
ef323d02d8 anv/image: Set different usage flags for shadow surfaces
For the block BLOCK_TEXEL_VIEW_COMPATIBLE case, this didn't matter
because the flags were already more-or-less what we wanted.  However,
for gen7 stencil shadow images, it still had ISL_SURF_USAGE_STENCIL_BIT
so we were getting W-tiled which isn't what we want for the shadow.  By
passing just ISL_SURF_USAGE_TEXTURE_BIT (and CUBE if we care), we now
get something that's actually texturable.

Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
2019-06-19 22:21:46 +00:00
Jason Ekstrand
215f9f83f5 anv: Flush caches in anv_image_copy_to_shadow
Copies to a shadow image happen during a VkCmdPipelineBarrier or at
subpass transitions.  We could potentially be a bit more conservative
but these transitions shouldn't happen often and it's better to have our
bases covered.

Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
2019-06-19 22:21:46 +00:00
Jason Ekstrand
81e51b412e nir: Make nir_constant a vector rather than a matrix
Most places in NIR, we treat matrices like arrays.  The one annoying
exception to this has been nir_constant where a matrix is a first-class
thing.  This commit changes that so a matrix nir_constant is the same as
an array nir_constant.  This makes matrix nir_constants a tiny bit more
expensive but shrinks all others by 96B.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-19 21:05:54 +00:00
Jason Ekstrand
b019fe8a5b glsl/nir: Fix handling of 64-bit values in uniform storage
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-19 21:05:54 +00:00
Jason Ekstrand
a54e397152 spirv: Only copy needed components for OpSpecConstantOp
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-19 21:05:54 +00:00
Jason Ekstrand
96bb9c9277 spirv: Use a single path for OpSpecConstantOp of OpVectorShuffle
Now that nir_const_value is a scalar, there's no reason why we need
multiple paths here and it's just extra paths to keep working.  While
we're here, we also add a vtn_fail_if check that component indices are
in-bounds.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-19 21:05:54 +00:00
Jason Ekstrand
280e5442e5 spirv: Use vtn_constan_uint() for array lengths and gather components
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-19 21:05:54 +00:00
Jason Ekstrand
aa11c2e75e spirv: Add a vtn_constant_int helper
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-19 21:05:54 +00:00
Jason Ekstrand
93f4aa9889 glsl/types: Add a real is_integer helper
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-19 20:28:52 +00:00
Jason Ekstrand
f0920e266c glsl/types: Rename is_integer to is_integer_32
It only accepts 32-bit integers so it should have a more descriptive
name.  This patch should not be a functional change.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-19 20:28:52 +00:00
Jason Ekstrand
21a7e6d569 glsl/types: Ignore bit sizes in contains_integer()
All of the callers for this function are looking at interpolation
qualifiers and want to make sure they're declared flat.  Any 64-bit
integer inputs need to be flat.  It's also makes the function make more
sense since "integer" is fairly generic.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-19 20:28:52 +00:00
Jason Ekstrand
0d1fb380b1 glsl/types: Handle all bit sizes in glsl_type_is_integer
All of the callers of this function really just want to know if the type
is an integer and don't care about bit size.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-19 20:28:52 +00:00
Caio Marcelo de Oliveira Filho
feb0cdcb52 glsl/nir_opt_access: Update uniforms correctly when only vars change
Even if only variables access flags are changed, the existing NIR
infrastructure expects metadata to be explicitly preserved, so do
that.  Don't care about avoiding preserve to be called twice since the
cost is negligible.

This scenario can be triggered by dead variables, and also by other
intrinsics that read the variables -- but not cause progress to be
made when processing the intrinsics.

Fixes: f2d0e48ddc "glsl/nir: Add optimization pass for access flags"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-19 12:50:41 -07:00
Caio Marcelo de Oliveira Filho
d7ea433a5f glsl/nir: Fix getting the sampler dim when arrays are involved
Unwrap any array in the variable type so we can get the sampler dim.

This fixes piglit test
spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore-const-uniform-index.shader_test.

Fixes: f2d0e48ddc "glsl/nir: Add optimization pass for access flags"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-19 12:50:39 -07:00
Jory Pratt
10e8d46601 meson: Search for execinfo.h
Rather than checking __GLIBC__/__UCLIBC__ macros as a proxy for
execinfo.h presence, just check directly. This allows the build to work
on musl.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-19 12:16:18 -07:00
Jory Pratt
fd7b7f14d8 util: Heap-allocate 256K zlib buffer
The disk cache code tries to allocate a 256 Kbyte buffer on the stack.
Since musl only gives 80 Kbyte of stack space per thread, this causes a
trap.

See https://wiki.musl-libc.org/functional-differences-from-glibc.html#Thread-stack-size

(In musl-1.1.21 the default stack size has increased to 128K)

[mattst88]: Original author unknown, but I think this is small enough
            that it is not copyrightable.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-19 12:16:18 -07:00
Kenneth Graunke
9c19d07b1c anv: Fix wrong printf formatter
%lu is for unsigned long, %zu is for size_t.  Just cast the data.
2019-06-19 11:57:01 -05:00
Kenneth Graunke
bbbf7a538c iris: Bail on queries for INTEL_NO_HW=1.
We don't execute any of the commands to record snapshots, so we can't
actually produce a real result.  We do however need to avoid waiting
on a syncpt which will never be signalled.  So, just return 0.
2019-06-19 11:55:43 -05:00
David Riley
11e74daae5 virgl: Support VIRGL_BIND_SHARED
Support a new virgl bind type for shared buffers.

Signed-off-by: David Riley <davidriley@chormium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2019-06-19 07:28:47 -07:00
Lionel Landwerlin
bc62673dce anv: write spirv-nir logs back to the application
Using the existing VK_EXT_debug_report extension.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-19 15:45:52 +03:00
Connor Abbott
53a7649e5d ac/nir: Set speculatable for buffer loads where allowed
This brings the nir path in line with the TGSI path.

Totals from affected shaders:
SGPRS: 2984 -> 2984 (0.00 %)
VGPRS: 2792 -> 2652 (-5.01 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 247380 -> 248072 (0.28 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 121 -> 132 (9.09 %)
Wait states: 0 -> 0 (0.00 %)

Most of the change came from DiRT: Showdown, and came from sinking SSBO
loads.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-19 14:08:28 +02:00
Connor Abbott
77be5b2f88 nir: Use reorderable access flag
No changes with radeonsi shader-db.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-19 14:08:28 +02:00
Connor Abbott
a1c737927c nir: Add a helper to determine if an intrinsic can be reordered
This is simple now, but we're going to be adding a few more conditions
to this later.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-19 14:08:28 +02:00
Connor Abbott
6fc83c253f st/nir: Use gl_nir_opt_access
Nothing uses its results yet, that will come with the following commits.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-19 14:08:28 +02:00
Connor Abbott
f2d0e48ddc glsl/nir: Add optimization pass for access flags
Right now, this just deduces when we can arbitrarily reorder SSBO and
image loads, matching the existing logic in radeonsi's TGSI->LLVM pass.
This approach can't handle some things that nir_opt_copy_prop_vars can,
but it can handle images, and with GCM it lets us hoist reads outside of
loops. We can also pass this information to LLVM which lets it do its
own optimizations on it.

This is GLSL only as I haven't tested it on Vulkan yet, and it would
probably need a few changes to work there.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-19 14:08:28 +02:00
Connor Abbott
c813c5776d nir: Add reorderable memory access enum
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-19 14:08:28 +02:00
Connor Abbott
75063fbac5 nir/copy_prop_vars: Ignore volatile accesses
The spec explicitly says that volatile writes can't be removed and
volatile reads do not guarantee that the same value will still be around
after the read, as if there were a barrier after each read/write. Just
ignore them.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-19 14:08:28 +02:00
Connor Abbott
364996d70d glsl/nir: Propagate access qualifiers
We were completely ignoring these before, except for putting them on
variables. While we're here, don't set access qualifiers when converting
to bindless since glsl_to_nir will already have set a more accurate
qualifier that includes any qualifiers on struct members that are
dereferenced.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-19 14:08:27 +02:00
Connor Abbott
6f20643b47 nir: Allow qualifiers on copy_deref and image instructions
In the next commit, we'll properly handle access qualifiers on struct
members by propagating them to load/store instructions, but these
instructions had no way to specify the qualifier.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-19 14:08:27 +02:00
Connor Abbott
3bf8981c51 ac,radeonsi: Always mark buffer stores as inaccessiblememonly
inaccessiblememonly means that it doesn't modify memory accesible via
normal LLVM pointers. This lets LLVM's dead store elimination, memcpy
forwarding, etc. ignore functions with this attribute. We don't
represent descriptors as pointers, so this property is always true of
buffer and image stores. There are plans to represent descriptors via
pointers, but this just means that now nothing is inaccessiblememonly,
as LLVM will then understand loads/stores via its usual alias analysis.

Radeonsi was mistakenly only setting it if the driver could prove that
there were no reads, and then it was cargo-culted into ac_llvm_build
and ac_llvm_to_nir. Rip it out of everything.

statistics with nir enabled:

Totals from affected shaders:
SGPRS: 152 -> 152 (0.00 %)
VGPRS: 128 -> 132 (3.12 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 9324 -> 9244 (-0.86 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 17 -> 17 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

The only difference was a manhattan31 shader.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-19 14:08:27 +02:00
Eric Engestrom
4db2c1e2fe egl: add missing #include
close() is in <unistd.h>

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-19 12:05:58 +00:00
Samuel Pitoiset
0a313cc285 radv: disable viewport clamping even if FS doesn't write Z
This fixes new CTS dEQP-VK.pipeline.depth_range_unrestricted.*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-19 11:18:50 +02:00
Samuel Pitoiset
e91c1ea06c radv: implement compressed FMASK texture reads with RADV_PERFTEST=tccompatcmask
This allows us to disable the FMASK decompress pass when
transitioning from CB writes to shader reads.

This will likely be improved and enabled by default in the future.

No CTS regressions on GFX8 but a few number of multisample CTS
failures on GFX9 (they look related to the small hint).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-19 10:06:39 +02:00
Samuel Pitoiset
a7f75377ab radv: fix FMASK expand with SRGB formats
Found while working on DCC for MSAA.

Fixes: 6b976024a8 ("radv: add support for FMASK expand")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-19 07:53:53 +02:00
Tomeu Vizoso
0fcf73bc2d panfrost: Move to use ralloc for some allocations
We have some serious leaks, so plug some and also move to ralloc to
limit the lifetime of some objects to that of their parent.

Lots more such work to do.

For some reason, this fixes:

dEQP-GLES2.functional.lifetime.attach.deleted_output.texture_framebuffer

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-19 07:34:15 +02:00
Mathias Fröhlich
5743a36b2b egl: Don't add hardware device if there is no render node v2.
Do not offer a hardware drm backed egl device if no render node
is available. The current implementation will fail on this
egl device. On top it issues a warning that is actually missleading.
There are finally more error paths that can fail on the way to a
hardware backed egl device. Fixing all of them would kind of require
opening the drm device and see if there is a usable driver associated
with the device. The taken approach avoids a full probe and fixes at
least this kind of problem on kvm virtualization hosts I observe here.

Fixes: dbb4457d98 ("egl: add EGL_EXT_device_drm support")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-06-19 07:17:23 +02:00
Christian Gmeiner
8dd26fa2f0 etnaviv: support GL_ARB_seamless_cubemap_per_texture
Passes spec@amd_seamless_cubemap_per_texture@amd_seamless_cubemap_per_texture

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-By: Guido Günther <agx@sigxcpu.org>
2019-06-19 00:39:50 +02:00
Christian Gmeiner
a13efb3cdb etnaviv: update headers from rnndb
Update to etna_viv commit a3bf0da.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-19 00:39:50 +02:00
Dave Airlie
378ea92bf6 radeonsi: fix undefined shift in macro definition
Pointed out by coverity

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-19 08:32:36 +10:00
Dave Airlie
93ba356544 nouveau: fix frees in unsupported IR error paths.
This is pointless in that we won't ever hit those paths in real life,
but coverity complains.

Fixes: f014ae3c7c ("nouveau: add support for nir")
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-06-19 08:32:19 +10:00
Rohan Garg
ad284f794c panfrost: Move clearing logic into pan_job
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 12:32:43 -07:00
Chia-I Wu
98eda99ab8 virgl: fix sync issue regarding discard/unsync transfers
GL_MAP_INVALIDATE_BUFFER_BIT cannot be treated as
GL_MAP_INVALIDATE_RANGE_BIT naively.  When we run into

  ptr = glMapBufferRange(buf, 0, size,
          GL_WRITE_BIT|GL_MAP_INVALIDATE_BUFFER_BIT);
  memcpy(ptr, data1, size);
  glUnmapBuffer(buf);
  ptr = glMapBufferRange(buf, size, size,
          GL_WRITE_BIT|GL_MAP_UNSYNCHRONIZED_BIT);
  memcpy(ptr, data2, size);
  glUnmapBuffer(buf);

we never want data1 to be copy_transfer'ed.  Because that would mean
that data2 might overwrite valid data.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis alexandros.frantzis@collabora.com
Fixes: a22c5df079 ("virgl: Use buffer copy transfers to avoid waiting when mapping")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-18 10:38:21 -07:00
Alyssa Rosenzweig
2a717f300b panfrost: Enable sRGB
Now that sRGB formats are supported for both rendering and sampling,
advertise support.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig
5aa51ba97f panfrost: Disable AFBC on sRGB buffers
The performance impact is slightly mitigated by tiling the render
target, but it's undeniably still slow compared to AFBC. Unfortunately,
it doesn't look like AFBC and sRGB play nice...

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig
6585bb9f52 panfrost: Enable sRGB fixed-function blending
For fixed-function, we have hardware to handle sRGB so we just set a
flag. For blend shaders, it's rather more involved; this is currently
unimplemented. Assert it out for now; we don't need it quite yet.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig
4b137da409 panfrost: Specify sRGB in the render target
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig
58c34e4a6c panfrost: Implement sRGB texturing
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig
31a4ef847c panfrost: Add sRGB render target flag
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig
01e1eecb95 panfrost: Implement tiled rendering
We already can sample from Mali's linear/tiled encoding (the one from
Utgard -- AFBC is mostly unrelated); let's be able to render to it as
well.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig
d50795109b panfrost: Decode rendering block type
A mode for rendering tiled/uncompressed was noticed, so we reshuffle the
MFBD render target definitions to explicitly include block type.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:59:28 -07:00
Alyssa Rosenzweig
83c02a5ea9 panfrost: Refactor texture targets
This combines the two cmdstream bits "is_3d" and "is_not_cubemap" into a
single 2-bit texture target selection, noticing it's the same as the
2-bit selection in Midgard and Bifrost texturing ops. Accordingly, we
share this definition and add the missing entry for 1D/buffer textures.

This requires a nontrivial (but functionally similar) refactor of all
parts of the driver to use the new definitions appropriately.
Theoretically, this should add support for buffer textures, but that's
obviously not tested and probably wouldn't work.

While doing so, we notice the sRGB enable bit, which we document and
decode as well here so we don't forget about it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:59:28 -07:00
Rohan Garg
bfca21b622 panfrost: Figure out job requirements in pan_job.c
Requirements for a job should be figured out in pan_job.c

v2: [Alyssa] Fix early return

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:52:20 -07:00
Rohan Garg
debb85d1ec panfrost: Reset job counters once the job is submitted
Move the reset out of frame invalidation into job submission

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:52:20 -07:00
Rohan Garg
0f43a2ae8a panfrost: Initial implementation of panfrost_job_submit
Start fleshing out panfrost_job

v2: [Alyssa: Remove unused variable, warning introduced]

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 09:52:01 -07:00
Gurchetan Singh
2daf3d8215 virgl_hw: add YUV support
Add corresponding entries from p_format.h

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-18 09:18:58 -07:00
Gurchetan Singh
2480ce802a virgl: sync to virglrenderer virgl_hw.h
It's nice to keep these two files in sync, as they define
guest userspace <---> host userspace communcation.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-18 09:18:48 -07:00
Jason Ekstrand
58cb865313 anv: Make border colors the right size and alignment on HSW
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-18 16:07:08 +00:00
Lionel Landwerlin
51076eb87c imgui: bump imgui memory editor copy
Getting rid of a compiler warning :

In file included from ../src/intel/tools/aubinator_viewer.cpp:225:
../src/imgui/imgui_memory_editor.h: In member function ‘void MemoryEditor::DisplayPreviewData(size_t, const u8*, size_t, MemoryEditor::DataType, MemoryEditor::DataFormat, char*, size_t) const’:
../src/imgui/imgui_memory_editor.h:637:16: warning: enumeration value ‘DataType_COUNT’ not handled in switch [-Wswitch]
         switch (data_type)
                ^

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-18 15:34:13 +00:00
Alyssa Rosenzweig
9402970751 panfrost/midgard: Enable autovectorization
Enable nir_opt_vectorize.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 06:44:13 -07:00
Connor Abbott
47e7c6961a nir: add a vectorization pass
This effectively does the opposite of nir_lower_alus_to_scalar, trying
to combine per-component ALU operations with the same sources but
different swizzles into one larger ALU operation. It uses a similar
model as CSE, where we do a depth-first approach and keep around a hash
set of instructions to be combined, but there are a few major
differences:

1. For now, we only support entirely per-component ALU operations.
2. Since it's not always guaranteed that we'll be able to combine
equivalent instructions, we keep a stack of equivalent instructions
around, trying to combine new instructions with instructions on the
stack.

The pass isn't comprehensive by far; it can't handle operations where
some of the sources are per-component and others aren't, and it can't
handle phi nodes. But it should handle the more common cases, and it
should be reasonably efficient.

[Alyssa: Rebase on latest master, updating with respect to typeless
moves]

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-18 06:43:30 -07:00
Boris Brezillon
c3558868da panfrost: Add support for TXS instructions
This patch adds support for nir_texop_txs instructions which are needed
to support the OpenGL textureSize() function. This is also needed to
support RECT texture sampling which is currently lowered to 2D sampling +
a TXS() instruction by the nir_lower_tex() helper.

Changes in v2:
* Split options for the 1st and 2nd tex lowering passes

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 06:36:07 -07:00
Boris Brezillon
5c17f84ae2 panfrost: Prepare things to support non-native texture ops
We are about to add support for the TXS (texture size) op which is not
implemented using a midgard texture instruction. Let's rename emit_tex()
into emit_texop_native() and repurpose emit_tex() as a dispatcher.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 06:36:07 -07:00
Boris Brezillon
c57f7d0f15 panfrost: Move sysval upload logic out of panfrost_emit_for_draw()
We're about to add more sysval types, and panfrost_emit_for_draw()
is big enough, so let's move the sysval upload logic in a separate
function.

We also add one sub-function per sysval type to keep the
panfrost_upload_sysvals() small/readable.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 06:36:07 -07:00
Boris Brezillon
bd49c8f0eb panfrost: Make the sysval logic more generic
We are about to add support for nir_texop_txs which requires adding a
sysval/uniform containing the texture size. Let's change the
emit_sysval_read() prototype to take a nir_instr object instead of
a nir_intrinsic_instr one so we can re-use this function when emitting
a sysval for a txs instruction.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 06:36:07 -07:00
Boris Brezillon
296c5fd25d nir/lower_tex: Add a way to lower TXS(non-0-LOD) instructions
The V3D driver has an open-coded solution for this, and we need the
same thing for Panfrost, so let's add a generic way to lower TXS(LOD)
into max(TXS(0) >> LOD, 1).

Changes in v2:
* Use == 0 instead of !
* Rework the minification logic as suggested by Jason
* Assign cursor pos at the beginning of the function
* Patch the LOD just after retrieving the old value

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 06:36:07 -07:00
Boris Brezillon
0e489fd360 nir/lower_tex: Update ->sampler_dim value before calling get_texture_size()
get_texture_size() will create a txs instruction with ->sampler_dim set
to the original tex->sampler_dim. The condition to call lower_rect()
only checks the value of ->sampler_dim and whether lower_rect is
requested or not. This leads to an infinite loop when calling
nir_lower_tex() with the same options until it returns false.

In order to avoid that, let's move the tex->sampler_dim patching before
get_texture_size() is called. This way the txs instruction will have
->sampler_dim set to GLSL_SAMPLER_DIM_2D and nir_lower_tex() won't try
to lower it on the subsequent passes.

Changes in v2:
* Add Jason R-b
* Add a comment explaining why we patch ->sampler_dim at the beginning
  of the lower_rect() func

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 06:36:07 -07:00
Boris Brezillon
352b1d9c31 nir/lower_tex: Actually report when projector lowering happened
The code considers that projector lowering was done even if it's not
really the case. Change the project_src() prototype to return a bool
encoding whether projector lowering happened or not and update the
progress var accordingly in nir_lower_tex_block().

---
Changes in v2:
* Add Jason R-b
* Drop the part suggesting that nir_lower_rect() could be called in
  a do-while(progress) loop.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 06:36:07 -07:00
Tomeu Vizoso
6f60fec48f panfrost: Adapt to constant name change in UABI
We hadn't updated the kernel header after the driver got into mainline.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 15:26:08 +02:00
Tomeu Vizoso
5ad5777f89 panfrost: ci: Update results
Alyssa fixed some failing tests last night.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-18 15:25:01 +02:00
Samuel Pitoiset
c16bf48bfc radv: adjust the DCC base VA for mipmapped color attachments
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-18 12:24:26 +02:00
Samuel Pitoiset
6ee40efd02 radv: fix color decompressions for FMASK/CMASK
Only skip levels without DCC when it's a DCC decompression.
Whoops.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-18 12:09:04 +02:00
Samuel Pitoiset
42a41a9e4a radv: do not decompress levels without DCC with the graphics path
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-18 11:24:50 +02:00
Samuel Pitoiset
e8917dcadb radv: do not decompress levels without DCC with the compute path
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-18 11:24:41 +02:00
Samuel Pitoiset
864ddda8a3 radv: check if DCC is enabled per mip not for the whole image
In other words, make use of radv_dcc_enabled() instead of
radv_image_has_dcc() all over the places.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-18 11:24:36 +02:00
Iago Toral Quiroga
79a30543ee v3d: implement simultaneous peripheral access exceptions for V3D 4.1+
Shader-db results:

total instructions in shared programs: 9117550 -> 9102719 (-0.16%)
instructions in affected programs: 1752873 -> 1738042 (-0.85%)
helped: 7076
HURT: 478
helped stats (abs) min: 1 max: 22 x̄: 2.19 x̃: 2
helped stats (rel) min: 0.07% max: 13.89% x̄: 1.70% x̃: 1.07%
HURT stats (abs)   min: 1 max: 7 x̄: 1.41 x̃: 1
HURT stats (rel)   min: 0.09% max: 10.17% x̄: 0.86% x̃: 0.54%
95% mean confidence interval for instructions value: -2.00 -1.92
95% mean confidence interval for instructions %-change: -1.58% -1.50%
Instructions are helped.

total max-temps in shared programs: 1327774 -> 1327728 (<.01%)
max-temps in affected programs: 1025 -> 979 (-4.49%)
helped: 47
HURT: 2
helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1
helped stats (rel) min: 2.63% max: 20.00% x̄: 7.67% x̃: 5.26%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 4.17% max: 4.17% x̄: 4.17% x̃: 4.17%
95% mean confidence interval for max-temps value: -1.06 -0.82
95% mean confidence interval for max-temps %-change: -8.89% -5.49%
Max-temps are helped.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-18 08:09:03 +02:00
Iago Toral Quiroga
6d97c8fac1 v3d: only flush jobs accessing the query BO when reading query results
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-18 08:09:03 +02:00
Iago Toral Quiroga
5491883a9a v3d: add a helper function to flush jobs using a BO
v2: use _mesa_set_search() (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-18 08:09:03 +02:00
Kenneth Graunke
e8cd7a30d5 iris: Support more RGBX pipe formats.
Without them, the state tracker falls back to an RGBA format, but it
doesn't always manage to override the swizzle for us.  So we lose the
information that the API expects an X channel, where alpha is garbage
and reads back as 1.  We have no equivalent ISL RGBX format for these,
so we just use RGBA directly and override the swizzle in all cases.
2019-06-17 21:52:38 -05:00
Kenneth Graunke
3c10a2726b glsl: Fix out of bounds read in shader_cache_read_program_metadata
The VaryingNames array has NumVaryings entries.  But BufferStride is
a small array of MAX_FEEDBACK_BUFFERS (4) entries.  Programs with
more than 4 varyings would read out of bounds.

Also, BufferStride is set based on the shader itself, which means that
it's inherently already included in the hash, and doesn't need to be
included again.  At the point when shader_cache_read_program_metadata
is called, the linker hasn't even set those fields yet.  So, just drop
it entirely.

Fixes valgrind errors in KHR-GL45.transform_feedback.linking_errors_test.

Fixes: 6d830940f7 glsl/shader_cache: Allow shader cache usage with transform feedback

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-17 21:22:19 -05:00
Jason Ekstrand
9672b7044c anv: Set STATE_BASE_ADDRESS upper bounds on gen7
This should fix floating-point border color on all gen7 HW.  Integer is
still thoroughly busted on gen7 because it doesn't exist on IVB and it's
crazy on HSW.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-17 18:53:07 -05:00
Bas Nieuwenhuizen
925c04b4c7 radv: Disable linear tiled compressed textures.
Support got removed in the new addrlib update.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-18 01:00:49 +02:00
Jason Ekstrand
1be38f9178 anv:Use VK_EXT_separate_stencil_usage to avoid stencil shadows on gen7
Whenever stencil texturing is not required (most of the time), we can
use VK_EXT_separate_stencil_usage to only create the shadow image when
VK_IMAGE_USAGE_SAMPLED_BIT is required for stencil.  Of course, this
depends on applications to use the extension but hopefully DXVK and
similar translators are doing so and that covers most of the apps.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-17 22:32:26 +00:00
Jason Ekstrand
f3ea0cf828 anv: Add stencil texturing support for gen7
Intel hardware didn't get support for sampling from W-tiled (required
for stencil) images until Broadwell so we can't directly sample from
stencil.  Instead, if we want to support stencil texturing on gen7
hardware, we have to keep a texture-capable shadow copy around and use
BLORP to update when stencil changes.  The one thing this commit does
not implement is self-dependencies with stencil input attachments.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99493
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-17 22:32:26 +00:00
Jason Ekstrand
4faa3145b1 anv/blorp: Update shadow images when clearing or uploading
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-17 22:32:26 +00:00
Jason Ekstrand
2b736d9e6c anv/cmd_buffer: Add a stencil transition helper
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-17 22:32:26 +00:00
Jason Ekstrand
86fc268142 anv/blorp: Take an aspect in anv_image_copy_to_shadow
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-17 22:32:26 +00:00
Jason Ekstrand
fcbefe013a anv/formats: Re-arrange the way se set some flag bits
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-17 22:32:26 +00:00
Kenneth Graunke
659d4f613e iris: Make resource_copy_region handle packed depth-stencil resources.
Also copy along the separate stencil buffer if needed.

Fixes Piglit's arb_copy_image-formats.
2019-06-17 17:29:09 -05:00
Kenneth Graunke
a36f1542ae iris: Order CS stall and TC invalidate for format reinterpretation hacks
This should ensure the TC invalidate happens after the stall.

Fixes KHR-GL43.copy_image.functional which does a CopyImage (blorp_copy)
from a buffer (using R8G8B8A8_UINT), then GetTexImage to read back the
original image (using R10G10B10A2_UNORM).
2019-06-17 16:38:08 -05:00
Kenneth Graunke
94b9f50e63 iris: Be more aggressive at post-format-reintepret TC invalidate hack
When copying/blitting with format reinterpretation, we invalidate the
texture cache before/after.  Before is so the source of the copy works,
and after is to get rid of our new data in the "wrong" format to protect
future attempts to sample.

When I ported these hacks to iris, I tried to be cautious by only
bothering with the hacks if the batch referenced the BO.  This makes
some sense for the before case.  If it isn't referenced, the texture
cache can't really have any data for the BO (since it's also invalidated
between batches).  But we still need to do the after case regardless,
as we've just polluted the cache with hazardous entries.
2019-06-17 16:38:08 -05:00
Gert Wollny
2b87753a84 virgl: Assume sRGB write control for older guest kernels or virglrenderer hosts
When the host virglrenderer is an older version that doesn't check the sRGB write
control feature, or when the guest kernel doesn't support CAPS v2, then the guest
will only report support for GL 2.1 on a GL 3.3 host, even though it was supporting
3.3 with earlier guest mesa versions.

By also checking the host feature check version this regression can be avoided.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110921
Fixes: 2845939d6a
   virgl: Set sRGB write control CAP based on host capabilities

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-17 21:16:11 +00:00
Rob Clark
21c795ab07 freedreno/a6xx: disallow UBWC for x24s8
Fixes:
  dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_2d
  dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_2d
  dEQP-GLES31.functional.stencil_texturing.misc.compare_mode_effect

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-17 20:29:13 +00:00
Rob Clark
4e72abcd97 freedreno/a6xx: un-swap X24S8_UINT
The stencil is actually in the .w component, but we used to use SWAP to
remap the channels.  This doesn't work when tiled/ubwc.

Fixes:
  dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_2d_array
  dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_cube
  dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_2d_array
  dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_cube
  dEQP-GLES31.functional.stencil_texturing.misc.base_level
  dEQP-GLES31.functional.texture.border_clamp.formats.stencil_index8.nearest_size_pot
  dEQP-GLES31.functional.texture.border_clamp.formats.stencil_index8.nearest_size_npot
  dEQP-GLES31.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_pot
  dEQP-GLES31.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_npot
  dEQP-GLES31.functional.texture.border_clamp.sampler.uint_stencil

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-17 20:29:13 +00:00
Samuel Pitoiset
6e3aee4630 radv: add mipmaps support for DCC decompression on compute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 22:20:53 +02:00
Samuel Pitoiset
ebb1db96d5 radv: add mipmaps support for color decompressions (DCC/FMASK/CMASK)
And some cleanups.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 22:20:53 +02:00
Samuel Pitoiset
00f0e5c6fd radv: set the DCC/FCE predicates from the base level
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 22:20:53 +02:00
Samuel Pitoiset
7832e75ea8 radv: load the fast color clear values from the base level
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 22:20:53 +02:00
Samuel Pitoiset
7971697efe radv: store the DCC predicate for each mip
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 22:20:53 +02:00
Samuel Pitoiset
38aa386e96 radv: store the FCE predicate for each mip
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 22:20:53 +02:00
Samuel Pitoiset
7295512037 radv: store the fast color clear values for each mip
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 22:20:53 +02:00
Samuel Pitoiset
58506fec63 radv: allocate DCC metadata for each mip
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 22:20:53 +02:00
Caio Marcelo de Oliveira Filho
4b0bc664a5 gallium: Remove unused util_ringbuffer
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-06-17 13:02:44 -07:00
Caio Marcelo de Oliveira Filho
397d1a18ef llvmpipe: Don't use u_ringbuffer for lp_scene_queue
Inline the ring buffer and signal logic into lp_scene_queue instead of
using a u_ringbuffer.  The code ends up simpler since there's no need
to handle serializing data from / to packets.

This fixes a crash when compiling Mesa with LTO, that happened because
of util_ringbuffer_dequeue() was writing data after the "header
packet", as shown below

    struct scene_packet {
       struct util_packet header;
       struct lp_scene *scene;
    };

    /* Snippet of old lp_scene_deque(). */
    packet.scene = NULL;
    ret = util_ringbuffer_dequeue(queue->ring,
                                  &packet.header,
                                  sizeof packet / 4,
    return packet.scene;

but due to the way aliasing analysis work the compiler didn't
considered the "&packet->header" to alias with "packet->scene".  With
the aggressive inlining done by LTO, this would end up always
returning NULL instead of the content read by
util_ringbuffer_dequeue().

Issue found by Marco Simental and iThiago Macieira.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110884
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-06-17 13:02:44 -07:00
Alyssa Rosenzweig
390126e70a panfrost/midgard: Simplify 2D array logic
It shouldn't matter if we stick a z in for non-arrays, anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 12:52:51 -07:00
Alyssa Rosenzweig
a3ae3cb8e9 panfrost/midgard: Handle non-zero component in store
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 12:52:51 -07:00
Alyssa Rosenzweig
2c9e124f81 panfrost/midgard: Apply writemask to LUTs
Fixes LUT instructions with NIR registers.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 12:52:50 -07:00
Marek Olšák
eba932ea43 amd: update addrlib
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 15:14:55 -04:00
Nicolai Hähnle
d15cc1f55a radeonsi: reduce MAX_GEOMETRY_OUTPUT_VERTICES
This fixes piglit spec@glsl-1.50@gs-max-output on gfx9.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 15:14:51 -04:00
Alyssa Rosenzweig
aef01dd2e5 panfrost: Cleanup default blend mode
Just encode the Mali magic number for `replace` rather than awkwardly
forcing Gallium structures through.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 10:45:52 -07:00
Alyssa Rosenzweig
fbbb29aa5b panfrost: Don't accidentally include blend shader
Some residual dirty state can leak through across frames; zero this out.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 10:45:52 -07:00
Alyssa Rosenzweig
565c446dab panfrost/midgard: Use typeless moves internally
We switch all fmov to (i)mov, following the NIR switch. This simplifies
some code surrounding blend shaders and should have no functional
changes elsewhere.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 10:45:52 -07:00
Chia-I Wu
1fece5fa5f virgl: better support for PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE
When the resource to be mapped is busy and the backing storage can
be discarded, reallocate the backing storage to avoid waiting.

In this new path, we allocate a new buffer, emit a state change,
write, and add the transfer to the queue .  In the
PIPE_TRANSFER_DISCARD_RANGE path, we suballocate a staging buffer,
write, and emit a copy_transfer (which may allocate, memcpy, and
blit internally).  The win might not always be clear.  But another
win comes from that the new path clears res->valid_buffer_range and
does not clear res->clean_mask.  This makes it much more preferable
in scenarios such as

  access = enough_space ? GL_MAP_UNSYNCHRONIZED_BIT :
                          GL_MAP_INVALIDATE_BUFFER_BIT;
  glMapBufferRange(..., GL_MAP_WRITE_BIT | access);
  memcpy(...); // append new data
  glUnmapBuffer(...);

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-17 09:36:31 -07:00
Chia-I Wu
9975a0a84c virgl: add virgl_rebind_resource
We are going support reallocating the HW resource for a
virgl_resource.  When that happens, the virgl_resource needs to be
rebound to the context.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-17 09:36:31 -07:00
Chia-I Wu
7e0508d9aa virgl: save virgl_hw_res in virgl_transfer
When PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE is properly supported,
virgl_transfer might refer to a different virgl_hw_res than
virgl_resource does.  We need to save the virgl_hw_res and use the
saved one.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-17 09:36:31 -07:00
Chia-I Wu
ad1ef35dc1 virgl: add resource_reference to virgl_winsys
It works similar to pipe_resource_reference but is for virgl_hw_res.
It can also replace resource_unref.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-17 09:36:31 -07:00
Alyssa Rosenzweig
73bf669e3f panfrost/midgard: Add rounding mode specific opcodes
This adds a set of opcodes for performing moves and type conversions
with respect to particular rounding modes, required for OpenCL.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 09:32:31 -07:00
Alyssa Rosenzweig
9865b79a88 panfrost: Drop draws with complete scissor
The hardware support for scissoring requires minimally 1 pixel to be
drawn. If the scissor culls *everything*, we need to drop the draw
entirely early on.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 09:29:09 -07:00
Alyssa Rosenzweig
3a9b7692f1 panfrost: Disable pipelining temporarily
Pipelined rendering is important for performance but is not working
right these days. Disable it for correctness until the panfrost_job
refactor is enabled and we can do it right.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 09:25:52 -07:00
Alyssa Rosenzweig
d4aed00214 panfrost/mfbd: Handle rendering to linear mipmap
In anticipation of more general mipmapping support, we implemented
support for rendering to linear mipmaps (a very simple case).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 08:42:54 -07:00
Alyssa Rosenzweig
531715431f panfrost: Implement sampling from non-zero initial levels
In preparation for more complex mipmap operations. glGenerateMipmap() in
particular, as implemented by u_blitter, requires reading from non-zero
initial mip levels.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 08:42:54 -07:00
Alyssa Rosenzweig
a5f5b0640c panfrost: Resource management for linear 2D texture arrays
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 08:36:15 -07:00
Alyssa Rosenzweig
dabfc71d36 panfrost/midgard: Adjust swizzles for 2D arrays
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 08:36:14 -07:00
Alyssa Rosenzweig
67a34acd00 panfrost: Set array_size to permit array textures
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 08:36:14 -07:00
Alyssa Rosenzweig
bdf169abb3 panfrost: Decode array textures
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 08:36:14 -07:00
Alyssa Rosenzweig
0ae6bbe8a9 panfrost: Implement 3D texture resource management
Passes dEQP-GLES3.functional.texture.format.unsized.*3d*

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 08:36:14 -07:00
Alyssa Rosenzweig
36a7b2b018 panfrost: Specify 3D in texture descriptor
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 08:28:13 -07:00
Alyssa Rosenzweig
8429beef5e panfrost/midgard: Fix 3D texture masks/swizzles
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 08:28:13 -07:00
Alyssa Rosenzweig
56f9b47efd panfrost/midgard: Add swizzle_of/mask_of helpers
These make manipulating vectors in the Midgard compiler easier.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 08:28:13 -07:00
Alyssa Rosenzweig
8d1adc091b panfrost: Enable helper invocations when texturing
it turns out we have explicit control over helper invocations; if a
particular bit in the fragment shader descriptor is set, helper
invocations are launched; if it clear, they are not. Helper invocations
are required whenever computing derivatives, whether explicitly
(dFdx/dFdy) *or* implicitly (any texturing). Accordingly, we set this
bit when texturing to fix edge case behaviour (literally, haha).

Thank you to Jason Ekstrand and Ilia Mirkin for pointing out the
representative dEQP test failed along triangle edges and for suggesting
helper invocations / derivatives as a list of suspect pieces (which led
to discovering the helper invocations enable bit in the first place).

Ideally we would use the new NIR analysis pass for this, but that hasn't
landed quite yet.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 08:22:37 -07:00
Alyssa Rosenzweig
0219b99500 panfrost: Handle missing texture case
In some cases, Gallium can give us bad info about the texture count,
counting some NULL textures. We pass Gallium's info to the hardware
blindly, which can confuse the hardware in edge cases. This patch
adjusts accordingly.
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
443f9ae0ad panfrost: Remove forced flush on clears
This worked around a bug in oooold versions of Panfrost. Nowadays, its
presence is, at best, *creating* bugs. Let's wack it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
6460442049 panfrost: Flush scanout too
In a poorly coded app, the framebuffer can be partially drawn, an FBO
switched, switch back to the framebuffer and keep drawing, etc.
Reordering would fix this, but for now we need to just be careful about
flushing scanout too.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
fc3f57bd7f panfrost: Improve viewport (clipping) robustness
On more complex apps (possibly using desktop GL specific extensions?),
our viewport code was getting wacky results for unclear reasons. Let's
be a little less wacky.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
f9ecca2ff0 panfrost: Disable the tiler for clear-only jobs
To do so, we route some basic information through to the FBD creation
routines (currently just a binary toggle of "has draws?"). Eventually,
more refactoring will enable dynamic hierarchy mask selection, but right
now we do the most basic.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
ac68946d9d panfrost: Identify and decode mfbd_flags
Previously known as the unk3 field.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
12d4289bf9 panfrost: Stub out hierarchy mask selection
Quite a bit of refactoring in the main driver will be necessary to make
use of this effectively, so the implementation is incomplete.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
6434f5c494 panfrost: Rename misc_0 -> tiler_polygon_list
Just for readability.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
e2c2ccd5b8 panfrost: Sanity check tiler polygon list size
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
953cc4b540 panfrost: Compute and use polygon list body size
This is a bit of a hack, but it gets the point across.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
b660953733 panfrost: Use polygon list header size computation
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
edfba9bee2 panfrost: Calculate polygon list header size
As per the notes at the beginning of pan_tiler.c, we implement a routine
to calculate the size of the polygon list header given the framebuffer
dimensions and the provided hierarchy mask.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig
e88ff9ad85 panfrost: Add pan_tiler.h header
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:47:49 -07:00
Alyssa Rosenzweig
21eb411d2f panfrost: Document tile size heuristic
I'm not sure how the blob does it, but this seems to be a dead simple
test and roughly corresponds to what I've noticed from the blob, so
maybe it's good enough.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:47:49 -07:00
Alyssa Rosenzweig
7f26bb3553 panfrost: Rename tiler fields per tiler research
Following the research into Midgard's hierarchical tiling
infrastructure, we now understand (in broad stokes) the purpose of each
tiler field in the MFBD. Additionally, we understand more of the tiling
fields in the SFBD and in Bifrost's structures, although this knowledge
is still incomplete.

Update the names, decoder, and comments to reflect this new
understanding.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:47:49 -07:00
Alyssa Rosenzweig
8d6fb66e3a panfrost: Add notes about the tiler allocations
This explains how the polygon list is allocated, updating the headers
appropiately to sync the terminology.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:47:49 -07:00
Alyssa Rosenzweig
85e745f2b4 panfrost: Integrate kernel names for tiler FBD
These names are from the replay workaround in kbase; they begin to shine
some light on the meaning of these fields. In particular, we now
understand why the "tiler_meta" field has the effect it does on
performance in certain scenes (controlling tile granularity).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-17 07:47:49 -07:00
Bas Nieuwenhuizen
1a7caac9e9 radv: Add asserts that buffer descriptors are created with valid buffer formats.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-17 10:56:50 +00:00
Bas Nieuwenhuizen
4107590911 radv: Decompress DCC when the image format is not allowed for buffers.
Otherwise the buffer loads/stores in the bufimage meta operations fail.

If we decompress DCC then we can use the "canonical" format compatible
with the not-supported format.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-17 10:56:50 +00:00
Samuel Pitoiset
e9875fc0b6 radv: make sure to init the DCC decompress compute path state
This fixes a segfault when forcing DCC decompressions on compute
because internal meta objects are not created since the on-demand
stuff.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 11:30:49 +02:00
Samuel Pitoiset
4c7ef1b02e ac: make ac_compute_cmask() a static function
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 11:30:47 +02:00
Samuel Pitoiset
cf77d3abf1 radv: rely on ac_compute_cmask() for CMASK info
Instead of re-computing in the driver. The 3d and cube flags
are correctly set, so the same values should returned by
ac_compute_surface().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 11:30:44 +02:00
Samuel Pitoiset
6880b42cfc radv: silent a compiler warning in radv_CmdPushDescriptorSetKHR()
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-17 09:53:26 +02:00
Tomeu Vizoso
e655d63644 panfrost: ci: Speed things up a bit by skipping a git clone
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-06-17 09:17:53 +02:00
Tomeu Vizoso
f1efb0f254 panfrost: ci: Exclude all blend tests from results
As they randomly fail on T760.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-06-17 09:17:53 +02:00
Samuel Pitoiset
b5012a0518 ac: update llvm.amdgcn.icmp intrinsic name for LLVM 9+
LLVM r363339 changed llvm.amdgcn.icmp.i* to llvm.amdgcn.icmp.i64.i*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-17 08:58:33 +02:00
Erico Nunes
d72bbb2c89 lima: lower fmod in ppir and gpir
Since commit 4f3c82c72c fmod is no longer being lowered in nir, and
ends up crashing lima programs with "unsupported nir_op: fmod" in both
ppir and gpir.
There seems to be no mod operation in hardware in utgard and there is an
optimization in nir to lower fmod to instructions that lima already
implements, so let's use that.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-06-16 10:11:59 +00:00
Rob Clark
a417c323ad freedreno/a6xx: re-enable UBWC for depth/stencil
Now that we can blit depth/stencil in a way that plays nicely with UBWC,
re-enable it.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
2019-06-15 07:33:04 -07:00
Rob Clark
363a9ed614 freedreno/a6xx: handle z24s8/z24x8 blits with u_blitter
Now that it can turn these blits into rendering to RB6_Z24_UNORM_S8_UINT
it can properly handle cases where only one of depth+stencil is being
blit.  And this avoids lying about he format, which completely doesn't
work when UBWC is used.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
2019-06-15 07:33:04 -07:00
Rob Clark
a96ae18de6 freedreno/a6xx: handle fallback for rewritten blits ourself
For re-written z/s blits, we want to use the re-written `pipe_blit_info`
even if we have to fallback to 3d pipe (`u_blitter`).  So handle that
fallback ourself.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
2019-06-15 07:33:04 -07:00
Rob Clark
94c36a8554 freedreno/a6xx: rename variable
The name 'separate' doesn't make a while lot of sense, as only one of
the cases is the blit actually split.  But split out from previous patch
in an attempt to reduce the noise.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
2019-06-15 07:33:04 -07:00
Rob Clark
5fe7b627eb freedreno/a6xx: consolidate z/s blit handling
This will get even simpler with the next patch

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
2019-06-15 07:33:04 -07:00
Rob Clark
4c75d62ce8 gallium: add z24s8_as_r8g8b8a8 format
This maps to a special format that recent generations of adreno have,
for blitting z24s8.  Conceptually it is similar to doing Z and/or S
blits by pretending it is r8g8b8a8 (with appropriate writemask).  But
it differs when bandwidth compression is used, as z24 is a different
type from r8g8b8.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
2019-06-15 07:33:04 -07:00
Kenneth Graunke
1d75f52589 st/mesa: Respect GL_TEXTURE_SRGB_DECODE_EXT in GenerateMipmaps()
Apparently, we're supposed to look at the texture object's built-in
sampler object's sRGB decode setting in order to decide whether to
decode/downsample/re-encode, or simply downsample as-is.  Previously,
we had just respected the pipe_resource's format.

Fixes SKQP's Skia_Unit_Tests.SRGBMipMaps test.

(This ports commit 337a808062 from i965
to st/mesa for Gallium drivers.)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 20:13:46 +00:00
Erico Nunes
3ddea5e8c5 lima: fix dynarray usage in lima_submit_add_bo
Commit de8a919702 refactored dynarray usage and changed the size of the
allocation in lima_submit_add_bo.
That causes a segfault in programs running with lima.
This commit restores the allocation size back to the previous size.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-06-14 20:47:35 +02:00
Alyssa Rosenzweig
9ab8d31f32 panfrost: Fix variant selection
Fixes 1acffb ("panfrost: Unify...")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-14 10:35:07 -07:00
Marek Olšák
abe9a51d27 ac: add radeon_info::is_amdgpu instead of checking drm_major == 3
and clean up

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-14 13:31:18 -04:00
Mauro Rossi
bbbbea243a android: amd/common: fix missing include path
Fixes the following building error in Android:

In file included from external/mesa/src/amd/common/ac_llvm_helper.cpp:34:
In file included from external/mesa/src/amd/common/ac_llvm_build.h:30:
In file included from external/mesa/src/compiler/nir/nir.h:40:
In file included from external/mesa/src/compiler/nir_types.h:36:
external/mesa/src/compiler/glsl_types.h:37:10: fatal error: 'main/config.h' file not found
         ^~~~~~~~~~~~~~~
1 error generated.

Fixes: bd4c661 ("ac,ac/nir: use a better sync scope for shared atomics")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-14 18:36:10 +02:00
Mauro Rossi
51e24af8fd android: radv: fix necessary dependecies
Fixes building errors due to libmesa_util and libexpat dependencies:

In file included from external/mesa/src/amd/vulkan/radv_device.c:52:
external/mesa/src/util/xmlpool.h:115:10: fatal error: 'xmlpool/options.h' file not found
         ^~~~~~~~~~~~~~~~~~~
1 error generated.

FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/vulkan.radv_intermediates/LINKED/vulkan.radv.so
...
external/mesa/src/util/xmlconfig.c:670: error: undefined reference to 'XML_ParserCreate'
...
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)

Fixes: 3c2e826 ("radv: Add support for driconf.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-14 18:35:10 +02:00
Alejandro Piñeiro
d317944c24 docs: document three NIR_ envvars
Initially I was only interested on documenting NIR_PRINT, as today I
needed to check the code to find this envvar, that at the moment I
vaguely remembered that existed.

As we are here, though, let's just document all of them (assuming that
makes sense).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 16:18:43 +02:00
Alexandros Frantzis
83829abe03 virgl: Return immediately when finding a compatible resource in the cache
When searching for resources in the cache, we previously released all
expired resources even after having found a compatible resource.

This commit changes this behavior to return immediately when finding a
compatible resource, so that the operation finishes more quickly.  This
moves more of the burden of releasing expired resources to cache
addition, which, since it happens at resource destruction time, it's
less time critical.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-14 12:59:51 +03:00
Alexandros Frantzis
801753d4b3 virgl: Use virgl_resource_cache in the vtest winsys
Replace the cache implementation in the vtest winsys with
virgl_resource_cache.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-14 12:59:49 +03:00
Alexandros Frantzis
13f70d3668 virgl: Use virgl_resource_cache in the drm winsys
Replace the cache implementation in the drm winsys with
virgl_resource_cache.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-14 12:59:43 +03:00
Alexandros Frantzis
b18f09a509 virgl: Introduce virgl_resource_cache
Introduce a resource cache implementation that can be used by any virgl
winsys backend.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-14 12:58:51 +03:00
Haihao Xiang
8ead5bebdb i965: support UYVY for external import only
It is similar with YUYV

Fixes: 165e704719 ("i965/i915: Add UYVY as the supported format")
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-14 15:45:56 +08:00
Neil Roberts
34d4b3e367 glsl: Set default precision on record members
Record types have their own slot to store the precision for each
member in glsl_struct_field. Previously if the member didn’t have an
explicit precision qualifier this was being left as
GLSL_PRECISION_NONE. This patch makes it take into account the type’s
default precision qualifier like it does for regular variables in
apply_type_qualifier_to_variable.

This has the additional benefit of correctly reporting an error when a
float type is used in a struct without declaring the default type.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 09:29:53 +02:00
Neil Roberts
235425771c glsl/linker: Make precision matching optional in intrastage_match
This function is confusingly also used to match interstage interfaces
as well as intrastage. In the interstage case it needs to avoid
comparing the precisions. This patch adds a parameter to specify
whether to take the precision into account or not so that it can be
used for both cases.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 09:29:53 +02:00
Neil Roberts
19b27a8569 glsl/linker: Don’t check precision for shader interface
On GLES, the interface between vertex and fragment shaders doesn’t
need to have matching precision.

Section 4.3.10 of the GLSL ES 3.00 spec:

“The type of vertex outputs and fragment inputs with the same name
 must match, otherwise the link command will fail. The precision does
 not need to match.”

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 09:29:53 +02:00
Neil Roberts
230d1e8d86 compiler/types: Making comparing record precision optional
On GLES, the interface between vertex and fragment shaders doesn’t
need to have matching precision. This adds an extra argument to
glsl_types::record_compare to disable the precision comparison. This
will later be used for the shader interface check.

In order to make this work this patch also adds a helper function to
recursively compare types while ignoring the precision.

v2: Call record_compare from within compare_no_precision to avoid
    duplicating code (Eric Anholt).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 09:29:53 +02:00
Lucas Stach
ab74699190 etnaviv: fix some pm query issues
The offsets to read the query results were off-by-one, which causes the
counters to report bogus increasing values.

Also the counter result is u32, so we need to initialize the query type
to reflect that.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-14 09:06:28 +02:00
Iago Toral Quiroga
360b832c58 v3d: do not setup execute flags for else block in uniform control flow
Either all channels executed the 'then' block, in which case all
channels will directly jump to the 'endif' block at the end of the
'then' block, or all channels execute the 'else' block (so no
execution masking is necessary).

Shader-db results:

total instructions in shared programs: 9119238 -> 9117550 (-0.02%)
instructions in affected programs: 401252 -> 399564 (-0.42%)
helped: 855
HURT: 77

total uniforms in shared programs: 3022622 -> 3022605 (<.01%)
uniforms in affected programs: 3566 -> 3549 (-0.48%)
helped: 17
HURT: 0

total max-temps in shared programs: 1327762 -> 1327774 (<.01%)
max-temps in affected programs: 619 -> 631 (1.94%)
helped: 2
HURT: 15

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 08:00:52 +02:00
Iago Toral Quiroga
2a2501247b nir: detect more dynamically uniform expressions
Shader-db results for v3d:

total instructions in shared programs: 9132728 -> 9119238 (-0.15%)
instructions in affected programs: 596886 -> 583396 (-2.26%)
helped: 1118
HURT: 224

total threads in shared programs: 234298 -> 234308 (<.01%)
threads in affected programs: 10 -> 20 (100.00%)
helped: 5
HURT: 0

total uniforms in shared programs: 3022949 -> 3022622 (-0.01%)
uniforms in affected programs: 29163 -> 28836 (-1.12%)
helped: 108
HURT: 37

total max-temps in shared programs: 1328030 -> 1327762 (-0.02%)
max-temps in affected programs: 10097 -> 9829 (-2.65%)
helped: 263
HURT: 15

total spills in shared programs: 3793 -> 3777 (-0.42%)
spills in affected programs: 432 -> 416 (-3.70%)
helped: 16
HURT: 0

total fills in shared programs: 4380 -> 4266 (-2.60%)
fills in affected programs: 828 -> 714 (-13.77%)
helped: 16
HURT: 0

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 08:00:52 +02:00
Tapani Pälli
287b58f827 ir3: initialize progress false before ir3_nir_lower_imul
Removes a compiler warning about uninitialized variable.

Fixes: c02ffd2700 "ir3: Use the new NIR lowering pass for integer multiplication"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rob Clark <robclark@gmail.com>
Reviewed-by: Eduardo Lima <elima@igalia.com>
2019-06-14 08:21:42 +03:00
Boris Brezillon
749c544b84 panfrost: Fix general purpose varying handling
When both the fragment and vertex shaders point to the same varying
location they expect to share the same varying slot.
Make sure vertex and fragment varyings pointing to the same loc have
->src_offset set to the same value.

[Alyssa: In addition a patch implement txs, this fixes GALLIUM_HUD on
Panfrost]

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-13 10:54:18 -07:00
Marek Olšák
7566a9a58a ac/registers: use better names for disambiguated definitions
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-13 13:52:06 -04:00
Marek Olšák
08ab9b70ce ac/registers: remove deprecated/inapplicable definitions
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-13 13:52:06 -04:00
Caio Marcelo de Oliveira Filho
5bd48ff252 iris: Enable INTEL_shader_atomic_float_minmax
Supported only for gen >= 9.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-06-13 09:03:58 -07:00
Caio Marcelo de Oliveira Filho
81835f87a4 gallium: Add PIPE_CAP_ATOMIC_FLOAT_MINMAX
Used to enable INTEL_shader_atomic_float_minmax.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-06-13 09:03:58 -07:00
Rob Clark
9f10e40cde freedreno/a6xx: fix MAX_INDICES
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-13 08:56:27 -07:00
Rob Clark
ce12ac8c2b freedreno/blitter: remove dead code
The src/dst format is overriden from the pipe_blit_info, so this just
logic just serves to confuse the reader.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-13 08:56:27 -07:00
Rob Clark
a8be53211d freedreno: turn staging cube into 2d-array
Since we could only need a subset of the layers, and otherwise we
trigger an assert in util_max_layer()

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-13 08:56:27 -07:00
Tomeu Vizoso
3adf9b0757 panfrost: ci: Exclude some tests from results
These are tests that regressed in RK3288 but still pass on RK3399.

So we still have a CI we can rely on, add them to the flip-flop list for
now.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-06-13 17:45:27 +02:00
Tomeu Vizoso
50901a27f6 panfrost: ci: Update test expectations
Some tests got fixed since the last update, but also some regressions
crept in.

To keep the CI green, add the regressions to the expected failures.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-06-13 17:45:22 +02:00
Connor Abbott
37b92b0ae6 nir: Don't manually index intrinsic index enum
This fixes a rebase fail in ea51275e07, and prevents it from happening
again. There's no reason to do this manually.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-13 17:10:41 +02:00
Erik Faye-Lund
901795238b docs: work around broken altsoftware.com link
altsoftware.com seems to no longer be around, and is currently being
held by a domain squatter. Let's link to waybackmachine instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Erik Faye-Lund
795b5d923f docs: work around broken dsbox.com link
dsbox.com now forwards to haystax.com, which is tehcnially unrealted to
this link. Let's link to waybackmachine instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Erik Faye-Lund
b16e77e051 docs: work around broken sgi.com links
sgi.com now forwards to hpe.com, which is technically unrelated to these
links. Let's link to waybackmachine instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Erik Faye-Lund
a9956ed87a docs: update link to OpenGL FAQ
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Erik Faye-Lund
12f4cd6a09 docs: update link to the Linux OpenGL ABI
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Erik Faye-Lund
e19448c102 docs: update link to glw
GLW is currently living in gitlab, the cgit-page is just a mirror.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Erik Faye-Lund
04fc0bc3f3 docs: fixup link-target
Just a couple of lines above, we have this exact same link, but this
time with a leading "www.". Let's match that.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Erik Faye-Lund
372f9f6947 docs: eliminate another stale autoconf-reference
Meson is what should tell you about these issues, not the configure
script. We no longer have that.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Erik Faye-Lund
c9d396710b docs: replace autoconf with meson
We no longer have an autoconf build-system to maintain, but we do have a
meson build-system. So let's mention that instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Erik Faye-Lund
f4f78a59b0 docs: update required packages
Automake and libtool are no longer required to build, instead we need
meson and ninja-build.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Erik Faye-Lund
26287b91ac docs: remove pointless haiku-comment
The only build system that doesn't support Haiku is `Android.mk`,
which also doesn't support most other platforms either, so there is
no need to single it out.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Erik Faye-Lund
c339c0175a docs: fixup typo
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-13 14:14:05 +00:00
Daniel Schürmann
c58dff753c radv: enable AMD_shader_ballot with RADV_PERFTEST_SHADER_BALLOT ('shader_ballot')
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Daniel Schürmann
deedc0b31d amd/common: add support for AMD_shader_ballot functions
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Daniel Schürmann
7a858f274c spirv/nir: add support for AMD_shader_ballot and Groups capability
This commit also renames existing AMD capabilities:
 - gcn_shader -> amd_gcn_shader
 - trinary_minmax -> amd_trinary_minmax

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Daniel Schürmann
ea51275e07 nir: add intrinsics for AMD_shader_ballot
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Daniel Schürmann
f2277c327a radv: enable shader_subgroup_vote & shader_subgroup_ballot extensions
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Daniel Schürmann
1b89ebeede nir/spirv: add support for the SubgroupBallotKHR SPIR-V capability
This capability is required for the VK_EXT_shader_subgroup_ballot extension.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Daniel Schürmann
de56ebadce nir/spirv: add support for the SubgroupVoteKHR SPIR-V capability
This capability is required for the VK_EXT_shader_subgroup_vote extension.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Alejandro Piñeiro
17c2c9cd67 v3d: fix checking twice auf flag
Seems a C&P error, and should check for auf/muf.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110902
Fixes: 8f065596d2 "v3d: Add an optimization pass for redundant flags updates."

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-13 11:45:18 +02:00
Samuel Pitoiset
ca6bf9a6cd radv: flush and invalidate CB before resetting query pools on GFX9
We have to emit a CACHE_FLUSH_AND_INV_TS_EVENT to be sure all
prior GPU work is done. While we are at it, also flush and
invalidate DB.

This fixes the following CTS (when the small hint is disabled):
dEQP-VK.query_pool.statistics_query.reset_before_copy.*

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-13 11:23:48 +02:00
Bas Nieuwenhuizen
cb728f28ac vl: Always enable drm winsys.
The dri2 winsys also uses libdrm (and you can only enable dri3 if
you enable dri2), and the drm winsys only requires libdrm.

So if any winsys is enabled you can also enable the drm winsys, and
since we always want at least one winsys we can always enable it.

I removed the check for the drm platform for VA and OMX since they
do not care anymore. Since we still check for one of r600g, nouveau
or radeonsi, we are guarantueed to still only enable it by default
in a configuration that requires libdrm anyway. So for people using
va=auto, we don't suddenly start requiring libdrm were we did not
before.

This supersedes "vl: Enable DRM by default.", which I pushed, but
rolled back because it used dep_libdrm before its definition.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-13 08:25:48 +00:00
Bas Nieuwenhuizen
b4c7ce360b radv: Always disable DCC on shareable images.
Do not want it for perf reasons. Always have to disable DCC when
transferring to external queue.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-13 08:15:45 +00:00
Bas Nieuwenhuizen
0667c1f14b radv: Skip transitions coming from external queue.
Transitions to external queue should do the transition & make sure
it works on all queues.

Fixes: 8ebc7dcb59 "radv: Allow fast clears with concurrent queue mask for some layouts."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-13 08:15:45 +00:00
Mateusz Krzak
60009aefdb lima/ppir: change offset type to int
Offset doesn't need to be 64-bit. This fixes compilation error
with 64-bit off_t.

Fixes: af0de6b9 lima/ppir: implement discard and discard_if

Suggested-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Mateusz Krzak <kszaquitto@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
2019-06-13 07:43:24 +02:00
Chia-I Wu
900a80f9e4 virgl: virgl_transfer should own its virgl_resource
We should avoid having potentially dangling pointers to
pipe_resources in general.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-12 18:20:30 -07:00
Chia-I Wu
74051efbea virgl: pass virgl_context to transfer create/destroy
A pipe_transfer is a context object.  It is fine for the
constructor/destructor to have access to the context.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-12 18:20:30 -07:00
Chia-I Wu
514e12b1b8 virgl: init transfer queue from virgl_context
A pipe_transfer is a context object.  It is fine for
virgl_transfer_queue to have access to the context.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-12 18:20:30 -07:00
Chia-I Wu
308ba2c0f9 virgl: clean up virgl_transfer_queue.h
Add header guard and forward declare structs.  Move virgl_resource.h
inclusion to the C file.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-12 18:20:30 -07:00
Nicolai Hähnle
2d114e6267 radeonsi: add radeonsi_debug_disassembly option
This dumps disassembly to the pipe_debug_callback together with shader
stats.

Can be used together with shader-db to get full disassembly of all shaders
in the database.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
3bde69e789 radeonsi: fix line splitting in si_shader_dump_assembly
Compute the count since the start of the current line instead of the
count since the start of the the disassembly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
aa737e8580 radeonsi: raise the alignment of LDS memory for compute shaders
This implies that the memory will always be at address 0, which allows
LLVM to generate slightly better code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
33be5ad8a3 radeonsi: use an explicit symbol for the LSHS LDS memory
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
174fad7075 radeonsi: rename lds_{load,store} to lshs_lds_{load,store}
These functions are now only used in LS/HS shaders (both separate and
merged).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
b519ddc35c radeonsi/gfx9: declare LDS ESGS ring as an explicit symbol on LLVM >= 9
This will make it easier to use LDS for other purposes in geometry
shaders in the future.

The lifetime of the esgs_ring variable is as follows:
- declared as [0 x i32] while compiling shader parts or monolithic shaders
- just before uploading, gfx9_get_gs_info computes (among other things)
  the final ESGS ring size (this depends on both the ES and the GS shader)
- during upload, the "esgs_ring" symbol is given to ac_rtld as a shared
  LDS symbol, which will lead to correctly laying out the LDS including
  other LDS objects that may be defined in the future
- si_shader_gs uses shader->config.lds_size as the LDS size

This change depends on the LLVM changes for emitting LDS symbols into
the ELF file.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
f8315ae04b amd/rtld: layout and relocate LDS symbols
Upcoming changes to LLVM will emit LDS objects as symbols in the ELF
symbol table, with relocations that will be resolved with this change.

Callers will also be able to define LDS symbols that are shared between
shader parts. This will be used by radeonsi for the ESGS ring in gfx9+
merged shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
dc99a8cd9b radeonsi: cleanup some #includes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
1ff2440eee amd/common: use ARRAY_SIZE for the LLVM command line options
This is more convenient for changing it around during debug.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
ca21ba2a08 radeonsi: inline si_shader_binary_read_config into its only caller
Since it can only be used for reading the config of an individual,
non-combined shader, it is not very reusable anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
bf8a1ca902 radeonsi: use the new run-time linker for shaders
v2:
- fix a memory leak

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
16bee0e5f6 radeonsi: don't declare pointers to static strings
The compiler should be able to optimize them away, but still. There's
no point in declaring those as pointers, and if the compiler *doesn't*
optimize them away, they add unnecessary load-time relocations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
3c958d924a amd/common: add ac_compile_module_to_elf
A new variant of ac_compile_module_to_binary that allows us to
keep the entire ELF around.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
66da60f4da radeonsi: dump shader binary buffer contents
Help identify bugs related to corruption of shaders in memory,
or errors in shader upload / rtld.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
bf11c594dd radeonsi: return bool from si_shader_binary_upload
We didn't really use error codes anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
8b1343ca79 radeonsi: let si_shader_create return a boolean
We didn't really use error codes anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
77b05cc42d radeonsi: use ac_shader_config
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle
b3be346c68 amd/common: add a more powerful runtime linker
Using an explicit linker instead of just concatenating .text
sections will allow us to start using .rodata sections and
explicit descriptions of data on LDS that is shared between
stages.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Caio Marcelo de Oliveira Filho
608257cf82 i965: Fix INTEL_DEBUG=bat
Use hash_table_u64 instead of hash_table directly, since the former
will also handle the special keys (deleted and freed) and allow use
the whole u64 space.

Fixes crash in INTEL_DEBUG=bat when using a key with value 0 -- the
current value for a freed key.

Fixes: b38dab101c "util/hash_table: Assert that keys are not reserved pointers"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-12 15:57:16 -07:00
Caio Marcelo de Oliveira Filho
eb41ce1b01 util/hash_table: Properly handle the NULL key in hash_table_u64
The hash_table_u64 should support any uint64_t as input.  It does
special handling for the "deleted" key, storing the data in the table
itself; do the same for the "freed" key.

Fixes: b38dab101c "util/hash_table: Assert that keys are not reserved pointers"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-12 15:57:16 -07:00
Nicolai Hähnle
c129cb3861 amd/common: clarify ac_shader_binary::lds_size
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:33:21 -04:00
Nicolai Hähnle
2e96c01073 amd/common: extract ac_parse_shader_binary_config
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:33:08 -04:00
Nicolai Hähnle
de8a919702 u_dynarray: turn util_dynarray_{grow, resize} into element-oriented macros
The main motivation for this change is API ergonomics: most operations
on dynarrays are really on elements, not on bytes, so it's weird to have
grow and resize as the odd operations out.

The secondary motivation is memory safety. Users of the old byte-oriented
functions would often multiply a number of elements with the element size,
which could overflow, and checking for overflow is tedious.

With this change, we only need to implement the overflow checks once.
The checks are cheap: since eltsize is a compile-time constant and the
functions should be inlined, they only add a single comparison and an
unlikely branch.

v2:
- ensure operations are no-op when allocation fails
- in util_dynarray_clone, call resize_bytes with a compile-time constant element size

v3:
- fix iris, lima, panfrost

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:30:25 -04:00
Nicolai Hähnle
71b45bae14 u_dynarray: return 0 on realloc failure and ensure no-op
We're not very good at handling out-of-memory conditions in general, but
this change at least gives the caller the option of handling it gracefully
and without memory leaks.

This happens to fix an error in out-of-memory handling in i965, which has
the following code in brw_bufmgr.c:

      node = util_dynarray_grow(vma_list, sizeof(struct vma_bucket_node));
      if (unlikely(!node))
         return 0ull;

Previously, allocation failure for util_dynarray_grow wouldn't actually
return NULL when the dynarray was previously non-empty.

v2:
- make util_dynarray_ensure_cap a no-op on failure, add MUST_CHECK attribute
- simplify the new capacity calculation: aside from avoiding a useless loop
  when newcap is very large, this also avoids an infinite loop when newcap
  is larger than 1 << 31

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:30:25 -04:00
Nicolai Hähnle
dc75362511 freedreno: use util_dynarray_clear instead of util_dynarray_resize(_, 0)
This is more expressive and simplifies a subsequent change.

v2:
- fix one more call-site after rebase

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 18:30:25 -04:00
Alyssa Rosenzweig
1ee2366693 panfrost/midgard: Differentiate vertex/fragment texture tags
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:32:12 -07:00
Alyssa Rosenzweig
5062b612be panfrost/midgard: Assert on unknown texture source
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:32:07 -07:00
Alyssa Rosenzweig
4ea512844c panfrost/midgard: Set minimal swizzle on texture input
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:32:01 -07:00
Alyssa Rosenzweig
6ae4f9c523 panfrost/midgard: Lower texture projectors
We do have native support for perspective division on the load/store
unit, but this is for the future, something ideally we would select
generally, not just for textures. Meanwhile, flipping on projector
lowering works now.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:31:53 -07:00
Alyssa Rosenzweig
4012e06788 panfrost/midgard: Implement txl
This follows the txb implementation, but requires an adjustment to how
the cont/last flags are set.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:31:45 -07:00
Alyssa Rosenzweig
a19ca344ab panfrost/midgard: Implement txb op
We refactor the main tex handling to fit a bias argument in as well.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:31:37 -07:00
Alyssa Rosenzweig
1acffb5671 panfrost: Unify bind_vs/fs_state
This replaces bind_vs/fs_state calls to a unified bind_shader_state
call, removing a great deal of duplicated logic related to variant
selection.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:29:05 -07:00
Alyssa Rosenzweig
f8a4090f80 panfrost: Add panfrost_job_type_for_pipe helper
This logic is repeated in a bunch of places and will only grow worse as
we support more job types; collect it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:27:47 -07:00
Alyssa Rosenzweig
15fae1e38c panfrost/midgard: Extract emit_varying_read
Paralleling emit_uniform_read, this allows varying reads to be emitted
independent of an honest-to-goodness load vary instruction in the NIR.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:25:17 -07:00
Alyssa Rosenzweig
8c88bd0253 panfrost: Remove "vertex/tiler render target" silliness
I don't think these are actual structures, just figments over
cargoculting dumped memory without making any sense of it. Nothing seems
to break if the region is zeroed out, anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:21:56 -07:00
Alyssa Rosenzweig
b96df80069 panfrost/decode: Print line number of bad memory access
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:14:53 -07:00
Alyssa Rosenzweig
fc7bcee865 panfrost: Replace pantrace with direct decoding
History lesson! In the early days of a Panfrost, we had a library
independent of the driver called `panwrap` which would be LD_PRELOAD'ed
into a driver to decode its cmdstream in real-time. When upstreaming
Panfrost, we realized that we would much rather have this decode
functionality maintained in-tree to avoid divergence, but that we could
not upstream panwrap because of its use with the legacy API. So we
instead dumped GPU memory to the filesystem with an out-of-tree panwrap,
and decoded that with the in-tree pandecode module. When we migrated to
the new kernel, we just added support for doing this memory dump
directly from the driver (via a module "pantrace").

This works, but dumping memory every frame is sloooooooooooooow and
error-prone. I figured if we have pandecode in-tree, we might as well
link to it directly in the driver, allowing us to decode Panfrost's
command streams without dumping memory to the filesystem first. This
cleans up the code *substantially* and improves dumping performance by a
HUGE margin. I'm talking "several seconds per frame" to "dumping in
real-time" kind of jump.

Note to users: this removes the environmental option "PANTRACE_BASE".
Instead, for equivalent functionality set "PAN_MESA_DEBUG=trace" and
redirect stdout to the file of your choosing.

This should be debugging Panfrost much more pleasant.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-12 14:07:09 -07:00
Kevin Strasser
845ec8576a st/mesa: Add rgbx handling for fp formats
Add missing cases for fp32 and fp16 formats.

Fixes: c68334ffc0 "st/mesa: add floating point formats in st_new_renderbuffer_fb()"
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-12 19:03:47 +00:00
Kevin Strasser
ec0a68e50d gallium/winsys/kms: Fix dumb buffer bpp
The bpp in the dumb buffer creation request is hardcoded to 32, which is an
incorrect assumption as the caller is free to pick any pipe format. Use the
bpp supplied to us through util_format_get_blocksizebits().

Fixes: 3b176c441b "gallium: Add a dumb drm/kms winsys backed swrast provider"
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-12 11:44:10 -07:00
Eric Engestrom
9996ddbb27 util/futex: fix dangling pointer use
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110901
Fixes: 7dc2f47882 "util: emulate futex on FreeBSD using umtx"
Cc: Greg V <greg@unrelenting.technology>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-12 17:27:44 +01:00
Samuel Pitoiset
d378151246 radv: fix VK_EXT_memory_budget if one heap isn't available
When the visible VRAM size is equal to the VRAM size only two
heaps are exposed.

This fixes dEQP-VK.api.info.device.memory_budget.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-12 15:52:48 +02:00
Samuel Pitoiset
2ef9d2738c radv: fix occlusion queries on VegaM
The number of render backends is 16 but the enabled mask is 0xaaaa.

As noticed by Bas, allowing disabled render backends might break
the OCCLUSION_QUERY packet. We don't use it yet but keep this in
mind.

This fixes dEQP-VK.query_pool.* and dEQP-VK.multiview.*.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-12 15:51:12 +02:00
Lionel Landwerlin
93b93e5a9d anv: do not parse genxml data without INTEL_DEBUG=bat
This significantly slows down the CTS runs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 32ffd90002 ("anv: add support for INTEL_DEBUG=bat")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-06-12 12:53:35 +03:00
Lionel Landwerlin
f80679c8e8 intel/dump: fix segfault when the app hasn't accessed the device
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-12 09:49:55 +03:00
Caio Marcelo de Oliveira Filho
48d7e7a9b8 iris: Only upload surface state for grid info when needed
Special care is needed to ensure that when we have two consecutive
calls with the same grid size, we only bail in the second one if it
either don't need the surface state or the surface state was already
uploaded.

v2: Instead of having a new bool in ice->state to know whether we had
    a surface, check whether we have state->ref.  (Ken)
    Clean up the logic a little bit by adding 'grid_updated' local. (Ken)

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 17:57:37 -07:00
Caio Marcelo de Oliveira Filho
f346b277d1 iris: Create binding table slot for num_work_groups only when needed
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 17:57:37 -07:00
Rui Salvaterra
7b43362f29 r300g: implement GLSL disk shader caching
This implements GLSL disk shader caching for the R300-R500 series of AMD GPUs.

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-11 20:49:34 -04:00
Richard Thier
ffd2f948fe r300g: restore performance after RADEON_FLAG_NO_INTERPROCESS_SHARING was added
v1: Fix skipped slab allocators and the buffer cache.

v2: Use only 1 domain for texture allocation

v3: Added flag for the create_fence call too

Based on Marek v1 and v2 proposed fixes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=1107812.patch

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-11 20:45:27 -04:00
Marek Olšák
ec0956a194 radeonsi: don't test SDMA perf if SDMA is disabled/unsupported 2019-06-11 20:05:21 -04:00
Marek Olšák
993bf52977 radeonsi: always interpolate PrimID as flat 2019-06-11 20:05:21 -04:00
Marek Olšák
7f7ffa0883 radeonsi: move color clamping to si_llvm_export_vs to unify the code 2019-06-11 20:05:21 -04:00
Marek Olšák
4773f5a293 radeonsi: use the ac helper for index buffer stores in the culling shader 2019-06-11 20:05:21 -04:00
Marek Olšák
579003e7bd radeonsi: use the ac helper for image stores 2019-06-11 20:05:21 -04:00
Marek Olšák
deef3833f8 radeonsi: use the ac helper for SSBO stores 2019-06-11 20:05:21 -04:00
Marek Olšák
e5fe38484a radeonsi: fixes for vec3 buffer stores in LLVM 9 2019-06-11 20:05:21 -04:00
Caio Marcelo de Oliveira Filho
9c81db8adb iris: Enable PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED
This avoids lowering of CS system values by GLSL (configured by state
tracker).  In i965 we don't use that lowering, and we also shouldn't
need that in Iris.

Using it cause some unnecessary round trip between values, e.g.:
shader uses gl_LocalInvocationIndex, GLSL rewrites it in terms of
gl_LocalInvocationID, then driver rewrites those in terms of
gl_LocalInvocationIndex again.  Copy propagation can make some of
those go away, but not all as seen below.

Intel SKL shader-db results:

    total instructions in shared programs: 15595189 -> 15594556 (<.01%)
    instructions in affected programs: 74880 -> 74247 (-0.85%)
    helped: 81
    HURT: 4
    helped stats (abs) min: 2 max: 172 x̄: 7.88 x̃: 4
    helped stats (rel) min: 0.19% max: 5.66% x̄: 1.71% x̃: 1.23%
    HURT stats (abs)   min: 1 max: 2 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.45% max: 1.65% x̄: 0.76% x̃: 0.46%
    95% mean confidence interval for instructions value: -11.56 -3.34
    95% mean confidence interval for instructions %-change: -1.91% -1.28%
    Instructions are helped.

    total loops in shared programs: 4831 -> 4831 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 372136618 -> 372145628 (<.01%)
    cycles in affected programs: 9218230 -> 9227240 (0.10%)
    helped: 131
    HURT: 86
    helped stats (abs) min: 1 max: 798 x̄: 39.79 x̃: 12
    helped stats (rel) min: <.01% max: 6.75% x̄: 0.42% x̃: 0.13%
    HURT stats (abs)   min: 2 max: 2442 x̄: 165.38 x̃: 6
    HURT stats (rel)   min: <.01% max: 20.83% x̄: 0.74% x̃: 0.12%
    95% mean confidence interval for cycles value: -2.07 85.11
    95% mean confidence interval for cycles %-change: -0.22% 0.30%
    Inconclusive result (value mean confidence interval includes 0).

    total spills in shared programs: 11956 -> 11950 (-0.05%)
    spills in affected programs: 77 -> 71 (-7.79%)
    helped: 3
    HURT: 0

    total fills in shared programs: 25619 -> 25549 (-0.27%)
    fills in affected programs: 593 -> 523 (-11.80%)
    helped: 4
    HURT: 0

    LOST:   0
    GAINED: 0

    Total CPU time (seconds): 1695.69 -> 1706.03 (0.61%)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 15:12:17 -07:00
Caio Marcelo de Oliveira Filho
46de3beab1 gallium: Add PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED
Tells whether or not the driver can handle gl_LocalInvocationIndex and
gl_GlobalInvocationID.  If not supported (the default), state tracker
will lower those on behalf of the driver.

v2: Add case to u_screen.c.  (Anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 15:12:17 -07:00
Caio Marcelo de Oliveira Filho
f03b21ae69 st/glsl: Perform some var optimizations
Perform those before some derefs are gone when we lower the buffers
after the st_nir_opts() call.

Intel SKL shader-db results:

    total instructions in shared programs: 15593685 -> 15590708 (-0.02%)
    instructions in affected programs: 378078 -> 375101 (-0.79%)
    helped: 777
    HURT: 44
    helped stats (abs) min: 1 max: 68 x̄: 4.07 x̃: 4
    helped stats (rel) min: 0.04% max: 31.58% x̄: 2.88% x̃: 1.37%
    HURT stats (abs)   min: 1 max: 24 x̄: 4.20 x̃: 2
    HURT stats (rel)   min: 0.17% max: 8.00% x̄: 1.60% x̃: 1.27%
    95% mean confidence interval for instructions value: -4.02 -3.23
    95% mean confidence interval for instructions %-change: -2.93% -2.35%
    Instructions are helped.

    total loops in shared programs: 4815 -> 4815 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 371965528 -> 371788566 (-0.05%)
    cycles in affected programs: 184190307 -> 184013345 (-0.10%)
    helped: 3650
    HURT: 2855
    helped stats (abs) min: 1 max: 59400 x̄: 99.45 x̃: 15
    helped stats (rel) min: <.01% max: 43.18% x̄: 2.60% x̃: 1.02%
    HURT stats (abs)   min: 1 max: 16362 x̄: 65.16 x̃: 10
    HURT stats (rel)   min: <.01% max: 66.22% x̄: 2.78% x̃: 0.81%
    95% mean confidence interval for cycles value: -53.73 -0.68
    95% mean confidence interval for cycles %-change: -0.39% -0.08%
    Cycles are helped.

    total spills in shared programs: 11936 -> 11956 (0.17%)
    spills in affected programs: 443 -> 463 (4.51%)
    helped: 0
    HURT: 8

    total fills in shared programs: 25644 -> 25619 (-0.10%)
    fills in affected programs: 2306 -> 2281 (-1.08%)
    helped: 24
    HURT: 2

    LOST:   7
    GAINED: 16

    Total CPU time (seconds): 1679.04 -> 1695.69 (0.99%)

shader-db results radeonsi (VEGA64):

    Totals from affected shaders:
    SGPRS: 180160 -> 179552 (-0.34 %)
    VGPRS: 115368 -> 114544 (-0.71 %)
    Spilled SGPRs: 5627 -> 5603 (-0.43 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 7808364 -> 7803268 (-0.07 %) bytes
    LDS: 192 -> 192 (0.00 %) blocks
    Max Waves: 19202 -> 19340 (0.72 %)
    Wait states: 0 -> 0 (0.00 %)

Radeonsi results provided by Timothy.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 14:53:54 -07:00
Ville Syrjälä
6230bfeb65 anv/cmd_buffer: Reuse gen8 Cmd{Set, Reset}Event on gen7
Modern DXVK requires event support [1], but looks like it only
uses vkCmdSetEvent() + vkGetEventStatus(). So we can just
borrow the relevant code from gen8, leaving CmdWaitEvents still
unimplemented.

[1] 8c3900c533

v2: Also move CmdWaitEvents into genX_cmd_buffer.c (Jason)

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-11 16:25:07 -05:00
Ian Romanick
39f4dc23a5 intel/fs: Mark source 0 of bcsel as needing Boolean resolve
The other sources of the bcsel behave like the sources of an and or
other logical operation.  However, source zero behaves differently.
It is evaluated as a Boolean, so it needs to be resolved.

No shader-db changes, but the tests mentioned in the bug get a couple
instructions added back.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110857
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-11 12:12:07 -07:00
Rob Clark
f9f89df8bc freedreno/a5xx: enable a540
Tested-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-11 12:03:10 -07:00
Rob Clark
832010f6ac freedreno/a6xx: enable UBWC by default
Flip the FD_MESA_DEBUG flag to a disable rather than enable, drop the
obsolete comment (and bonus, drop unused softpin debug flag)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
81cc555e9a freedreno/a6xx: disallow UBWC for z24s8
This is slightly annoying because it *mostly* works.. but we have some
issues to sort out about how to blit z24s8/x24s8/z24x8 with UBWC before
we can enable UBWC by default.  For now it is a step forward to at least
enable it for non-z/s while we figure out how to blit z24s8+UBWC.

(The basic issue is that pretending z24s8 is an equivalently sized rgba
format for the purpose of blitting falls apart when UBWC is in the
picture.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
4f1319a17d freedreno/a6xx: use correct UBWC reg builders
No functional change, the registers have the same layout as MRT flags
pitch reg.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
d42ce659ed freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
490baa6974 freedreno/a6xx: disable UBWC for some formats
An older blob claims to support UBWC w/ r32ui an r32i, but not r32f.
Results from deqp indicate that it doesn't work with r32ui and r32i.

This *could* also just mean that use as "IBO" (image) is more limited
than as texture, although blob also doesn't seem to bother to try to use
UBWC with images at all, so hard to know for sure.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
8ddffa75c0 freedreno/a6xx: handle non-UWC-compatible image views
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
dac3bc9862 freedreno/a6xx: handle non-UBWC-compatible texture views
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
fe5c7b2b75 freedreno: add helper to uncompress UBWC resource
We'll need this for a few edge cases, like image/sampler view that uses
a format that UBWC does not support with a resource originally created
in a format that UBWC does support.

NOTE we *could* in some cases do an in-place uncompress.  But that has
a couple potential sharp edges:

 1) the uncompressed buffer could have different layout, ie. a5xx
    with meta and pixel data of layers/levels interleaved.

 2) if it comes mid-batch, it would force flush, or somehow fixing
    up cmdstream for draws already emitted.  But with the resource
    shadowing approach we can rely on batch re-ordering to avoid
    splitting things.. older draws see the older compressed version,
    newer draws see the new uncompressed version of the rsc.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
846b8a76bd freedreno: handle images in rebind_resource()
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
c6ae354299 freedreno: allow null discard box in shadow path
When uncompressing a UBWC buffer, we don't want to discard anything.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
12201d7a8b freedreno: swap UBWC state in shadow path
It doesn't come up yet, as so far we only hit this path with linear
buffers.  But it will when we start re-using the shadow path for
uncompressing UBWC buffers.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
3c9a31eb50 freedreno: add modifier param to fd_try_shadow_resource()
To uncompress UBWC, I want to re-use the shadow path, but we'll need a
way to request that the new buffer is not compressed.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Rob Clark
3b05a120a3 freedreno: correct modifier for UBWC buffers
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-11 10:55:27 -07:00
Chia-I Wu
15323c14fd virgl: consider newly created resources idle
A newly created resource can be regarded as idle.  We don't care if
the RESOURCE_CREATE command has been retired, unless it is used for
fencing.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-11 10:03:54 -07:00
Chia-I Wu
9e4452cfd9 virgl: make resource_wait/resource_is_busy cheaper
The round trip to the kernel is expensive.  Add a local cache to
avoid it when possible.

There is a race condition when two contexts access the same resource
at the same time (e.g., ctx1 submits a cmdbuf that accesses a
resource while ctx2 maps the resource).  But that is probably an app
bug in the first place.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-11 10:03:54 -07:00
Chia-I Wu
ddc90be907 virgl: add virgl_drm_{alloc,free,clear}_res_list
Helpers to work with resource list.  virgl_drm_release_all_res is
removed.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-11 10:03:54 -07:00
Chia-I Wu
71465fe569 virgl: do not cache external resources
We should not reuse a resource for other purposes when it can still
be accessed by another process or device.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-11 10:03:54 -07:00
Alyssa Rosenzweig
7d43999e63 panfrost: Enable AFBC on depth/stencil
This seems to be a performance win, but more rigorous testing is
necessary to figure out the exact circumstances when this is good/bad.
Incidentally, this fixes non-aligned ZS.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:46:43 -07:00
Alyssa Rosenzweig
15f62b8e7c panfrost: Linear depth/stencil should be aligned
We might render to it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:46:43 -07:00
Alyssa Rosenzweig
d7ad29ce25 panfrost/midgard: Decode LOD/bias registers
For constant LODs/biases, we can use an immediate embedded in the
texture (already decoded); for non-constant, we have to use a register
squeezed into the usual immediate field, which is decoded here.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
b4a3296e77 panfrost/midgard: Decode texture offset register swizzle
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
4e9e42cc56 panfrost/midgard/disasm: include textureGather()
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
6c18ae33bc panfrost/midgard: Support negative immediate offsets
It's not at all clear why this work for texelFetch but not texture.
Maybe the top bits are dual-purpose on other texturing ops...?

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
4d8157f12d panfrost/midgard: Fix redunant mask redundancy
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
3dee556c4e panfrost/midgard/disasm: Print LOD for texelFetch
Its encoding differs slightly from the LOD used in normal texture calls.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
cda9f32909 panfrost/midgard: Identify the in_reg_full field
This is clear for texelFetch, hence the confusion with Bifrost's filter
field, but it's much more general in reality.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
445a7b523f panfrost/midgard/disasm: Correctly dump bias/LOD
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
873a3ed342 panfrost/midgard/disasm: Cleanup texture op code
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
289405392d panfrost/midgard/disasm: Add missing space
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
f4ee8d055c panfrost/midgard/disasm: LOD immediate/register select
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig
59fa7c95c8 panfrost/midgard/disasm: Use texture op name bare
This allows us to show a call to textureLod in a reasonable way.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:18 -07:00
Alyssa Rosenzweig
109460f03a panfrost/midgard/disasm: Varying perspective divides
With an extra flag, we're able to do a perspective division "for free"
while loading a varying.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:18 -07:00
Alyssa Rosenzweig
fc472007e7 panfrost/midgard: Add perspective division opcodes
...on the load/store unit, not the ALUs. Looks goofy but hey.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:18 -07:00
Alyssa Rosenzweig
b0396d6dda panfrost/midgard: Print texture offsets
This patch identifies the two modes of offsets in a texture instruction
(immediate and register, disambiguated by the bit-once-known-as
"has_offset") and implements disassembly for both.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:18 -07:00
Alyssa Rosenzweig
ed1c48e91d panfrost/midgard: Expand texture to 4-channel swizzle
This eliminates some unknowns, clarifies 3D textures, and will maybe
help with array/shadow textures?

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-11 08:44:18 -07:00
Juan A. Suarez Romero
b586ed51f3 docs: update calendar, add news item and link release notes for 19.1.0
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-06-11 17:38:22 +02:00
Juan A. Suarez Romero
cc7fc7e319 docs: Add SHA256 sums for 19.1.0
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 2a5b4e2b9f)
2019-06-11 15:26:42 +00:00
Juan A. Suarez Romero
7e8e49475c docs: Add release notes for 19.1.0
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 1517811f4f)
2019-06-11 15:26:38 +00:00
Samuel Iglesias Gonsálvez
32e1d85cb6 radv: assert on inline uniform blocks in radv_CmdPushDescriptorSetKHR()
According to the Vulkan spec, inline uniform blocks are not allowed
to be updated through vkCmdPushDescriptorSetKHR().

These are the spec quotes from "13.2.1. Descriptor Set Layout"
that are relevant for this case:

"VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies
 that descriptor sets must not be allocated using this layout, and
 descriptors are instead pushed by vkCmdPushDescriptorSetKHR."

"If flags contains
 VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all
 elements of pBindings must not have a descriptorType of
 VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT".

There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid
this case but it is implied in the creation of the descriptor set
layout as aforementioned.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-11 16:32:27 +02:00
Samuel Iglesias Gonsálvez
d0c52ff610 anv: ignore inline uniform blocks in anv_CmdPushDescriptorSetKHR()
According to the Vulkan spec, inline uniform blocks are not allowed
to be updated through vkCmdPushDescriptorSetKHR().

These are the spec quotes from "13.2.1. Descriptor Set Layout"
that are relevant for this case:

"VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies
that descriptor sets must not be allocated using this layout, and
descriptors are instead pushed by vkCmdPushDescriptorSetKHR."

"If flags contains
VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all
elements of pBindings must not have a descriptorType of
VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT".

There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid
this case but it is implied in the creation of the descriptor set
layout as aforementioned.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-11 16:25:53 +02:00
Eric Engestrom
773ff93bc4 egl: compare the whole list of attributes
`memcmp()` compares a given number of bytes, but `EGLAttrib` is larger than a byte.

Fixes: 8e991ce539 "egl: handle the full attrib list in display::options"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-11 12:18:09 +00:00
Eduardo Lima Mitev
3fb7b1fd35 freedreno/a5xx: Fix indirect draw max_indices calculation
The number of elements to draw should not be affected by the offset.

A similar fix was submitted for a6xx at 79180a05.

Fixes these dEQP tests on a5xx:

dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_8
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_separate_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_combined_grid_500x500_drawcount_2500
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_8
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_2500

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-11 08:28:45 +02:00
Samuel Pitoiset
40699f74b8 radv: remove extra assignment in radv_decompress_resolve_subpass_src()
baseArrayLayer is defined twice, trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-11 08:17:22 +02:00
Samuel Pitoiset
c39a1611ab radv: add radv_get_resolve_pipeline() helper in the graphics path
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-11 08:06:42 +02:00
Samuel Pitoiset
b06d1f029d radv: do not decompress all image layers before resolving inside a subpass
When decompressing resolve source images, we should rely on the
framebuffer layer count instead of resolving all images layers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-11 08:06:39 +02:00
Samuel Pitoiset
4efbd963ec radv: initialize the aspect mask when decompressing resolve source images
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-11 08:06:35 +02:00
Samuel Pitoiset
c31a07fa85 radv: perform proper layout transitions before resolving
Use an explicit pipeline barrier for doing layout transitions
instead of duplicating some code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-11 08:06:32 +02:00
Samuel Pitoiset
92fa6264cb radv: do not resolve all image layers with compute inside a subpass
When resolving inside a subpass, we should rely on the framebuffer
layer count instead of resolving all images layers. This should
improve performance of layered resolves a bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-11 08:06:28 +02:00
Kenneth Graunke
a8588f512b iris: Bypass half-float pack/unpack lowering.
This skips GLSL IR lowering of pack/unpackHalf operations, allowing
the NIR optimizer to see them

Improves performance in Synmark2's OglCSDof by about 2x, by cutting
about 90% of the cycles from one of the compute shaders.

shader-db statistics on Skylake:

4 compute shaders went from SIMD8 to SIMD16.

total instructions in shared programs: 15598871 -> 15542568 (-0.36%)
instructions in affected programs: 143016 -> 86713 (-39.37%)
helped: 144
HURT: 0
helped stats (abs) min: 17 max: 4669 x̄: 390.99 x̃: 164
helped stats (rel) min: 7.48% max: 85.28% x̄: 30.17% x̃: 24.22%
95% mean confidence interval for instructions value: -510.50 -271.49
95% mean confidence interval for instructions %-change: -32.70% -27.65%
Instructions are helped.

total cycles in shared programs: 371973958 -> 368902103 (-0.83%)
cycles in affected programs: 5557722 -> 2485867 (-55.27%)
helped: 144
HURT: 0
helped stats (abs) min: 106 max: 1026600 x̄: 21332.33 x̃: 1697
helped stats (rel) min: 0.53% max: 88.98% x̄: 36.12% x̃: 34.67%
95% mean confidence interval for cycles value: -41570.02 -1094.64
95% mean confidence interval for cycles %-change: -38.44% -33.80%
Cycles are helped.

total spills in shared programs: 11936 -> 11903 (-0.28%)
spills in affected programs: 110 -> 77 (-30.00%)
helped: 3
HURT: 2

total fills in shared programs: 25644 -> 25178 (-1.82%)
fills in affected programs: 677 -> 211 (-68.83%)
helped: 5
HURT: 0

total loops in shared programs: 4830 -> 4829 (-0.02%)
loops in affected programs: 1 -> 0
helped: 1
HURT: 0
2019-06-10 16:01:36 -07:00
Bas Nieuwenhuizen
e0d12f79c5 radv: Handle UNDEFINED format in image format list.
Was watching a presentation on YT where this was used and it turns
out it is not invalid.

The only case it is actually valid as format in the creation of an
image or image view is with Android Hardware Buffers which have
their format specified externally.

So we can just ignore all entries with VK_FORMAT_UNDEFINED.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-10 22:21:16 +00:00
Bas Nieuwenhuizen
39c71e0025 radv: Prevent out of bound shift on 32-bit builds.
uintptr_t is 32-bits then and shifting it by 32 bits results in undefined
behavior IIRC.

Fixes: b3c8de1c55 "radv: save all descriptor pointers into the trace BO"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-10 22:18:51 +00:00
Caio Marcelo de Oliveira Filho
2cb5907508 glsl: Check order and uniqueness of interlock functions
With this commit all remaining compilation tests in Piglit for
ARB_fragment_shader_interlock will pass.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2019-06-10 14:29:32 -07:00
Caio Marcelo de Oliveira Filho
b7c9fc72fd glsl: Make interlock builtins follow same compiler rules as barriers
Generalize the barrier code to provide correct error messages for
other builtins.

Fixes most of piglit compilation tests for
ARB_fragment_shader_interlock.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2019-06-10 14:29:26 -07:00
Eduardo Lima Mitev
fb2169040a nir/opt_algebraic: Fix rules for imadsh_mix16
The rules added in patch 3addd7c are inverted:

It should be:

(al * bh) << 16 + c

instead of:

(ah * bl) << 16 + c

Fixes a number of regressions under
dEQP-GLES31.functional.draw_indirect.compute_interop.large.*
on Freedreno.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-10 22:27:46 +02:00
Alyssa Rosenzweig
e9703fb416 panfrost: Ignore discards in dead branch analysis
Fixes regressions in
dEQP-GLES2.functional.shaders.discard.dynamic_loop_*

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 08:23:08 -07:00
Samuel Pitoiset
e9316fdfd4 radv: fix setting CB_SHADER_MASK for dual source blending
CB_SHADER_MASK was computed without the second color buffer
format which looks totally wrong to me.

While we are at it, copy a comment from RadeonSI.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-10 17:21:56 +02:00
Alyssa Rosenzweig
50ffaaff3b panfrost/midgard: Disambiguate register mode
We postfix instructions by their size if a destination override is in
place (a la AT&T assembly), disambiguating instruction sizes.
Previously, "16-bit instruction, 16-bit dest, 16-bit sources"
disassembled identically to "32-bit instruction, 16-bit dest, 16-bit
sources", which is semantically distinct due to the lessened opportunity
for parallelism but (potentially) greater precision. Adding a postfix
removes the ambiguity and relieves mental gymnastics reading weird
disassemblies even in some cases that are not ambiguous.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:50:12 -07:00
Alyssa Rosenzweig
8027cc9975 panfrost/midgard: Expose vec8/vec16 modes
Midgard ALUs can operate in one of four modes: vec2 64-bit, vec4 32-bit,
vec8 16-bit, or vec16 8-bit. Our compiler (and indeed, any OpenGL ES
shader) only uses 32-bit (and eventually vec4 16-bit) modes in normal
circumstances. Nevertheless, the other modes do exist and are easily
accessible through OpenCL; they also come up in cases like blend
shaders.

While we have had minimal support for decoding 8-bit/64-bit modes, we
did so pretending they were vec4 in each case; 16-bit registers had a
synthetically duplicated register file to separate lo/hi halves, etc.
This works for GL, but it doesn't map to what the hardware is -actually-
doing, which can cause some headscratchingly bizarre disassemblies from
OpenCL. So, we dive in the deep end and support these other modes
natively in the disassembler, using absurdly long masks/swizzles, since
the hardware is considerably more flexible than what was exposed before.

Outside of some fixed routines for blending, none of the above is
supported in the compiler yet. But it's better to have it in the ISA
definitions and disassembler than not, for future use if nothing else.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig
2d0bda0885 panfrost/midgard: Add shifting int modifiers
As a source modifier, shift allows shifting a value left by the bit
size, useful in conjunction with a greater register mode, for instance
to implement `upsample`. As a concrete example, the following OpenCL:

   ushort hr0 = /* ... */, uint r1 = /* ... */;
   uint r2 = (convert_uint(hr0) << 16) ^ b;

compiles to the following Midgard assembly:

   ixor r, (hr0) << 16, b

In reverse, the ".hi" output modifier shifts the value right by the bit
size, leaving just the carry/overflow at the bottom. To implement *_hi
functions in OpenCL (for <64-bit), we do arithmetic in the 2x higher
mode with the .hi modifier. (For 64-bit, things are hairier, since there
is not an 128-bit int mode).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig
6780481a3f panfrost/midgard: Add integer outmods
For floats, output modifiers determine clamping behaviour. For integers,
they determine wrapping/saturation behaviour (or shifting -- see next
commit). These are very different; they are conceptually two unrelated
enums union'ed together; the distinction is responsible for many-a-bug.
While clamping behaviour for floats was clear from GL, the int behaviour
is only known From OpenCL contortion with convert_*_sat() functions.

With the underlying functions known, clean up the codebase, likely
fixing outmod type related bugs in the process.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig
215b8844ee panfrost/midgard: Note floating compares type convert
OP_TYPE_CONVERTS denotes an opcode that returns a different type than is
source (going from int-domain to float-domain or vice versa), named
after the f2i/i2f family of opcodes it covers. We care because source
mods are determined by the source type (i/f) but output modifiers are
determined by the output type (equals the source type, unless the op
type converts, in which case it's the opposite).

The upshot is that floating-point compares (feq/fne/etc) actually do
type-convert.  That is, that take in floating-points and output in
integer space (a boolean), so we mark them off this way to ensure the
correct output modifiers are used.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig
d48d991ce2 panfrost: Align linear renderable resources
It's just -easier- to render to aligned framebuffers. For winsys
targets, we already align, but even for an internal linear FBO we ought
to align everything nicely.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:48:07 -07:00
Alyssa Rosenzweig
d89e0716a1 panfrost: Fix stride check when mipmapping
Now that we support custom strides on mipmapped textures (theoretically,
at least), extend the stride check to support mipmaps.  Fixes incorrect
strides of linear windows in Weston.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-06-10 06:47:18 -07:00
Alyssa Rosenzweig
416fc3b5ef panfrost: Refactor texture/sampler upload
We move some coding packing the texture/sampler descriptors into
dedicated functions (out of the terrifyingly long emit_for_draw
monolith), cleaning them up as we go.

The discovery triggering the cleanup is the format for including manual
strides in the presence of mipmaps/cubemaps. Rather than placed at the
end like previously assumed, they are interleaved after each address.
This difference is relevant when handling NPOT linear mipmaps.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:45:33 -07:00
Alyssa Rosenzweig
a35069a7b5 panfrost: Refactor blitting code
We refactor the wallpaper rendering code to separate the
wallpaper-specific bits from the general blitting capabilities. In the
(hopefully near) future, we'll turn this on to implement real Gallium
blits, e.g. for automatic mipmap generation.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:45:25 -07:00
Alyssa Rosenzweig
d878753efa panfrost: Refactor AFBC code
This patch does a substantial cleanup of the code for handling AFBC,
moving various disparate misplaced functions into a new central
pan_afbc.c file.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:45:14 -07:00
Alyssa Rosenzweig
b4763984ac panfrost: Move pan_screen() to pan_screen.h
Trivial.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:45:05 -07:00
Alyssa Rosenzweig
a38583e352 panfrost: Always align strides to cache line (64)
(Performance tweak.)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 06:44:56 -07:00
Emil Velikov
0534fcf57d docs: fixup 19.0.5 <> 19.0.6 confusion
The title of the release notes says 19.0.5 while the rest of the file
(correctly) says 19.0.6

Fixes: fe79d75ccf ("docs: Add relnotes for 19.0.6")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan at pnwbakers.com>
2019-06-10 14:04:39 +01:00
Emil Velikov
a379b1c0ee mapi: correctly handle the full offset table
Earlier commit converted ES1 and ES2 to a new, much simpler, dispatch
generator. At the same time, GL/glapi and the driver side are still
using the old code.

There is a hidden ABI between GL*.so and glapi.so, former referencing
entry-points by offset in the _glapi_table. Hence earlier commit added
the full table of entry-points, alongside a marker for other cases like
indirect GL(X) and driver-size remapping.

Yet the patches did not handle things fully, thus it was possible to
get different interpretations of the dispatch table after the marker.

This commit fixes that adding an indicative error message to catch
future bugs.

While here correct the marker (MAX_OFFSETS) comment.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110302
Fixes: cf317bf093 ("mapi: add all _glapi_table entrypoints tostatic_data.py")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-06-10 14:04:30 +01:00
Emil Velikov
497de977bd mapi: add static_date offset to EXT_dsa
As elaborated in the next patch, there is some hidden ABI that
effectively require most entrypoints to be listed in the file.

Cc: Marek Olšák <marek.olsak@amd.com>
Fixes: d2906293c4 ("mesa: EXT_dsa add selectorless matrix stackfunctions")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-06-10 14:04:25 +01:00
Emil Velikov
61960547df mapi: add static_date offset to MaxShaderCompilerThreadsKHR
As elaborated in the next patch, there is some hidden ABI that
effectively require most entrypoints to be listed in the file.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110302
Cc: Marek Olšák <maraeo@gmail.com>
Fixes: c5c38e831e ("mesa: implement ARB/KHR_parallel_shader_compile")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-06-10 14:04:18 +01:00
Mathias Fröhlich
a7ecf78b90 egl: Let the caller of dri2_create_drawable decide about loaderPrivate.
In the call arguments to dri2_create_drawable decouple loaderPrivate
from dri2_surf. For all callers of dri2_create_drawable the two
pointers are the same with the exception of the gbm backed platform.
Let the calling code of dri2_create_drawable decide what
loaderPrivate shall be.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-06-10 11:06:48 +02:00
Samuel Pitoiset
91aa25f462 radv: fix alpha-to-coverage when there is unused color attachments
When alphaToCoverage is enabled, we should always write the alpha
channel of MRT0 if it's unused. This now matches RadeonSI.

This fixes the new CTS:
dEQP-VK.pipeline.multisample.alpha_to_coverage_unused_attachment.samples_*.alpha_invisible

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
2019-06-10 09:23:41 +02:00
Tomeu Vizoso
2fe7f9f2ae panfrost: ci: Switch from direct Docker use to buildah
Use the infrastructure in wayland/ci-templates to build the container
images.

This prevents from getting into some situations in which the images
wouldn't be rebuilt, and allows us to share some infrastructure with
other projects in freedesktop.org.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Suggested-by: Michel Dänzer <michel@daenzer.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-10 08:09:23 +02:00
Kenneth Graunke
81582e9366 gallium/u_transfer_helper: Free the staging buffer on unmap.
u_transfer_helper sometimes mallocs a staging buffer, and leaked it.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-09 15:16:10 -07:00
Lionel Landwerlin
17898a9b7e intel/gpu_dump: fix argument passing
We were dropping "/' around arguments grouped together.
This was triggering failures with :

   $ ./framemetrics -g "Memory Writes Distribution Gen9" -o /tmp/output.csv -f ./my.trace 10 11

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-06-09 19:45:13 +00:00
Eric Engestrom
93349d7118 util/os_file: suppress sign comparison warning
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-09 13:14:13 +00:00
Eric Engestrom
fd5c18de88 util/os_file: fix error being sign-cast back and forth
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-09 13:14:13 +00:00
Eric Engestrom
341ba406fd util/os_file: avoid shadowing read() with a local variable
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-09 13:14:13 +00:00
Eric Engestrom
7e35f20d44 util/os_file: actually return the error read() gave us
Fixes: 316964709e "util: add os_read_file() helper"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-09 13:14:13 +00:00
Alexandros Frantzis
f8f222ea36 virgl: Work around possible memory exhaustion
Since we don't normally flush before performing copy transfers, it's
possible in some scenarios to use too much memory for staging resources
and start failing. This can happen either because we exhaust the total
available memory (including system memory virtio-gpu swaps out to), or,
more commonly, because the total size of resources in a command buffer
doesn't fit in virtio-gpu video memory.

To reduce the chances of this happening, force a flush before a copy
transfer if the total size of queued staging resources exceeds a certain
limit. Since after a flush any queued staging resources will be
eventually released, this ensures both that each command buffer doesn't
require too much video memory, and that we don't end up consuming too
much memory for staging resources in total.

Fixes kernel errors reported when running texture_upload tests in glbench.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:45 -07:00
Alexandros Frantzis
e34f79c918 virgl: Remove incorrect resource wait condition
Now that we have copy transfers in place, we can remove the incorrect
resource wait condition. Copy transfers and other optimizations minimize
the performance impact of this removal, while providing the correct
behavior.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:43 -07:00
Alexandros Frantzis
236c55f650 virgl: Use copy transfers for textures
Extend copy transfers to also be used for busy textures.

Performance results:
Unigine Valley, qemu before: 22.7 FPS after: 23.1 FPS

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:42 -07:00
Alexandros Frantzis
a22c5df079 virgl: Use buffer copy transfers to avoid waiting when mapping
We typically need to wait for a buffer to become ready before mapping,
so that we don't write new contents while the host is still using the
old contents. However, if we are allowed to discard the contents of the
mapped buffer range, then we can avoid waiting by using a staging buffer
range which we guarantee to never be busy, copying from the staging
buffer range to the target buffer in the host.

This commit implements this optimization by utilizing a dedicated
u_upload_mgr for the staging buffer.

Performance results:
Twilight Struggle (Steam/Proton), qemu before: 7 FPS after: 25 FPS
glmark2 ubo, qemu before: 38 FPS after: 331 FPS

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Suggested-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:39 -07:00
Alexandros Frantzis
6e7726e50c virgl: Support copy transfers
Support transfers that use a different resource as the source of data to
transfer. This will be used in upcoming commits to send data to host
buffers through a transfer upload buffer, in order to avoid waiting
when the buffer resource is busy.

Note that we don't support queueing copy transfers in the transfer
queue. Copy transfers should be emitted directly in the command queue,
allowing us to avoid flushes before them and leads to better
performance.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:36 -07:00
Alexandros Frantzis
199d95f29e virgl: Add copy_transfer3d definitions
Introduce definitions for the copy_transfer3d protocol command and virgl
capability. This command transfers data to the host by copying through
another resource, and will be used in upcoming commits to avoid waiting
when transferring data for busy resources.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:34 -07:00
Alexandros Frantzis
ccec1555c1 virgl: Make VIRGL_BIND_STAGING resources cacheable
This could help performance when trying to recreate such resources for
copy transfers.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:33 -07:00
Alexandros Frantzis
636345f496 virgl: Support VIRGL_BIND_STAGING
Support a new virgl bind type for staging buffers which don't require
dedicated host-side storage. These will be used to implement copy
transfers.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:31 -07:00
Alexandros Frantzis
f38cdaebac virgl: Avoid unfinished transfer_get with PIPE_TRANSFER_DONTBLOCK
If we are not allowed to block, and we know that we will have to wait,
either because the resource is busy, or because it will become busy due
to a readback, return early to avoid performing an incomplete
transfer_get. Such an incomplete transfer_get may finish at any time,
during which another unsynchronized map could write to the resource
contents, leaving the contents in an undefined state.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Suggested-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:22 -07:00
Alexandros Frantzis
8eb8222c10 virgl: Deduplicate checks for resource caching
Also fixes a missed check for VIRGL_BIND_CUSTOM in one of the duplicate
code snippets.

Note that legacy fences also use VIRGL_BIND_CUSTOM, but we ensured they
don't go through the cache in the previous commit.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:20 -07:00
Alexandros Frantzis
e0ffcdf16a virgl: Don't try to use cached resources for legacy fences
Resources for fences should not be from the cache, since we are basing
the fence status on the resource creation busy status.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:16 -07:00
Alexandros Frantzis
8089d3658a virgl: More info about chosen alignment value
Add more info about why the value of VIRGL_MAP_BUFFER_ALIGNMENT.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:44:53 -07:00
Chia-I Wu
371743157e virgl: store all info about atomic buffers
We will need the full info.  This also speeds up
virgl_attach_res_atomic_buffers and fixes resource leaks when the
context is destroyed.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu
98fd742d7e virgl: add shader images to virgl_shader_binding_state
It replaces virgl_context::images.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu
f965efb3c8 virgl: add SSBOs to virgl_shader_binding_state
It replaces virgl_context::ssbos.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu
920c4143f0 virgl: add UBOs to virgl_shader_binding_state
It replaces virgl_context::ubos.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu
2e21d66d7a virgl: add virgl_shader_binding_state
virgl_shader_binding_state will be used to manage all per-stage
shader bindings.  For now, it manages only sampler views.

This replaces virgl_textures_info and fixes some issues

 - start_slot is now honored
 - views outside of [start_slot, slart_slot+count) are unmodified
 - views are released when the context is destroyed

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Kenneth Graunke
30314270d4 iris: Zero shs->cbuf0 when binding a passthrough TCS
Fixes valgrind errors when running two CTS tests back to back:
- KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT*
(The first test has an actual TCS, the second uses passthrough.)
2019-06-07 15:13:42 -07:00
Jason Ekstrand
1e6b32d08c intel/blorp: Only double the fast-clear rect alignment on HSW
This restriction was accidentally added to the BSpec/PRM as an
unrestricted restriction starting with the HSW docs and it was never
removed.  However, it only ever applied to HSW and actually potentially
causes problems on BDW and above where we have mipmapped fast-clears.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-06-07 22:00:55 +00:00
Rob Clark
3c456cf583 freedreno/a6xx: re-arrange program stageobj/group
Split out a separate program config state group to run early before the
other groups.

This seems to help w/ intermittent "missed tiles" (although I had
assumed that was a mem2gmem issue), or at least I can't reproduce that
issue with this patch, but can without.

It has the benefit of HLSQ_VS_CNTL.CONSTLEN matching for VS and BS.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00
Rob Clark
958f6ffb60 freedreno/a6xx: fix hangs with newer sqe fw
With the newer (v1.76) fw, we were getting hangs (compared to older
v1.66 fw).  Re-work the GMEM code to structure things a bit closer to
the blob.  This moves some PKT7 packets from IB2 to IB1, which I think
is what was confusing SQE and causing it to get stuck in an infinite
loop.  But in general structuring things at least closer to the same way
blob does makes it easier to compare cmdstream.

Note: this is a bit on the large side for what I'd normally consider for
stable.. but right now it is looking  like it is the newer fw that is
headed for linux-firmware.  This should defn have some soak time on
master, but probably a good idea for this patch to end up in distro mesa
builds by the time a630_sqe.fw hits linux-firmware.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00
Rob Clark
1d002cfade freedreno/a6xx: WFI before RB_CCU_CNTL writes
This seems to be in a block of non buffered/context regs.  Blob always
WFIs before write, so probably a good idea.

Annoyingly, compared to ealier gens, it is a bit harder to tell from the
register offset whether it is a buffered reg, it isn't as simple as
everything below 0x2000, it seems.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00
Rob Clark
8a02ca807d freedreno/a6xx: don't pre-dispatch texture fetch on accident
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00
Rob Clark
b820c09fa8 freedreno/a6xx: fix issues with gallium HUD
In some cases the draw for the text wasn't working.  This seems to be
fixed by resyncing some of the "golded registers" from blob (initial
values were based on somewhat older blob version).

Perhaps good to have a bit of soak time on master, but would be good
to eventually land in 19.x stable branches.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00
Nanley Chery
b4198e792c anv/cmd_buffer: Initalize the clear color struct for CNL+
On CNL+, the clear color struct is composed of RGBA channel values and
fields which are either reserved by the HW or used to control
fast-clears. Currently anv initializes the channel values to zero and
allows the other fields to be undefined.

Satisfy the MBZ field requirements by removing an optimization that
doesn't hold true for CNL+ and pulling in the number of dwords to
initialize from ISL.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-07 18:43:06 +00:00
Jon Turney
87173ded6e glx/windows: Fix compilation with -Werror-format
Fix compilation where the DWORD type is used with a format, after
-Werror-format added by c9c1e261.

Some Win32 API types are different fundamental types in the 32-bit and
64-bit versions. This problem is then further compounded by the fact
that whilst both 32-bit Cygwin and 32-bit MinGW use the ILP32 data
model, 64-bit MinGW uses the LLP64 data model, but 64-bit Cygwin uses
the LP64 data model. This makes it near impossible to write printf
format specifiers which are correct for all those targets.

In the Win32 API, DWORD is an unsigned, 32-bit type.  So, it is defined
in terms of an unsigned long, except in the LP64 data model used by
64-bit Cygwin, where it is an unsigned int.

It should always be safe to cast it to unsigned int and use %u or %x.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 11:28:48 -07:00
Kenneth Graunke
cd796120c9 iris: Rename bind_state to bind_shader_state.
bind_state is possibly the worst name ever.  For create, we used
create_shader_state, which is more descriptive.  Put shader in the name.
2019-06-07 11:26:20 -07:00
Kenneth Graunke
d5d2fb5c4c isl: Mark enum isl_channel_select packed so it becomes 1 byte.
I recently discovered that the following code lead to valgrind errors:

   struct isl_swizzle swizzle = ISL_SWIZZLE_IDENTITY;
   VALGRIND_CHECK_MEM_IS_DEFINED(&swizzle, sizeof(swizzle));

which is surprising, because struct isl_swizzle is simply:

   struct isl_swizzle {
      enum isl_channel_select r:4;
      enum isl_channel_select g:4;
      enum isl_channel_select b:4;
      enum isl_channel_select a:4;
   };

and the above code initializes all of them with a C99 initializer.
Iván Briano reminded me that C99 initializers don't necessarily zero
padding.  A quick inspection revealed that sizeof(struct isl_swizzle)
was 4 (rather than the expected 2).  Ian Romanick suggested changing
it to uint16_t, since this is essentially dicing up an unsigned, and
that worked.

This patch marks enum isl_channel_select packed, changing its size
from 4 bytes to 1 byte.  This then makes struct isl_swizzle 2 bytes,
with no bogus padding fields.  This eliminates valgrind undefined
memory warnings.

These isl_swizzle values become part of our BLORP blit program keys,
which are then hashed.  This undefined padding was being included in
the hashing, possibly leading to issues.  I originally saw this error
when running KHR-GL45.texture_size_promotion.functional in iris under
valgrind.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-07 11:09:44 -07:00
Alyssa Rosenzweig
e1c14b2820 panfrost/ci: Texture wrap tests are legitimately fixed
These depended on the wallpaper reload.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:29 -07:00
Alyssa Rosenzweig
8442dde169 panfrost/midgard: Lower inot to inor with 0
We were previously lowering to inand, but the second arg was not
duplicated so inot would always return ~0. Oops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:29 -07:00
Alyssa Rosenzweig
d415748955 panfrost/midgard: Cleanup tag fetch in disassembler
Trivial.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:29 -07:00
Alyssa Rosenzweig
d3ad8d6b48 panfrost/midgard: Use fancy iterator
Trivial cleanup.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:29 -07:00
Alyssa Rosenzweig
ae20bee75e panfrost/midgard: Cull dead branches
This fixes bugs with complex control flow.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig
c62f2ff852 panfrost/midgard: Add mir_print_bundle helper
This helps with debugging scheduling/emission.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig
fd6d6c1b15 panfrost/midgard/disasm: Pretty-print branch tags
Just makes it a little more obvious what's going on.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig
2ebf22c399 panfrost/ci: Note some since-fixed tests
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig
de8d49acdc panfrost/midgard: Vectorize I/O
This uses the new mesa/st functionality for NIR I/O vectorization, which
eliminates a number of corner cases (resulting in assorted dEQP
failures and regressions) and should improve performance substantial due
to lessened pressure on the load/store pipe.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig
4aced18031 panfrost/midgard: Remove varyings delay pass
This pass interfered with the more delicate path required for
non-vectorized I/O. It's also ugly and duplicating the job of an actual
honest-to-goodness scheduler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig
43568f2675 panfrost/midgard: Apply component to load_input
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Eric Engestrom
440fe0eb43 nir: fix s/&&/||/ typo
Fixes: cd73b6174b "nir/lower_to_source_mods: Stop turning add, sat, and neg into mov"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-07 16:06:25 +01:00
Kristian H. Kristensen
b9bbac6234 freedreno/a6xx: Drop struct stage array
This now boils down to just picking between binning or vertex shader
and dummy_fs or real fs, which we can do in a couple of lines of code
instead.  The constlen logic isn't doing what it thinks it's doing,
both constlens at this point

  MAX2(s[VS].constlen, align(state->bs->constlen, 4));

are binning shader constlens.  We'll have to revisit the constlen
logic, but this commit doesn't change how it works.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 07:33:12 -07:00
Kristian H. Kristensen
9382a3c11d freedreno/a6xx: Drop support for SS6_DIRECT shader upload
a6xx only supports indirect shaders.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 07:33:10 -07:00
Kristian H. Kristensen
0ef00ceb2e freedreno/a6xx: Share shader_t_to_opcode
We have a similar function in fd6_program.c. Move to fd6_emit.h and
share.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 07:33:03 -07:00
Kristian H. Kristensen
4552162e2d freedreno/a6xx: Consolidate more of dword 0 building in fd6_draw_vbo
There's already a bit of duplicated logic here and tessellation will
add more. Build up dword 0 in fd6_draw_vbo() and drop the a4xx in the
process.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 07:32:59 -07:00
Kristian H. Kristensen
cae6b4d741 freedreno: Move fd4_size2indextype() helper to freedreno_util.h
In preparation for refactoring fd6_draw.c a bit.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 07:32:34 -07:00
Samuel Pitoiset
0905189a25 radv: enable VK_EXT_sample_locations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:17 +02:00
Samuel Pitoiset
05f5fa661f radv: enable HTILE for images that might need variable sample locations
This is now supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:14 +02:00
Samuel Pitoiset
e7677a697b radv: handle sample locations during automatic layout transitions
From the Vulkan spec 1.1.109:

   "Some implementations may need to evaluate depth image values
    while performing image layout transitions. To accommodate this,
    instances of the VkSampleLocationsInfoEXT structure can be
    specified for each situation where an explicit or automatic
    layout transition has to take place. [...] and
    VkRenderPassSampleLocationsBeginInfoEXT can be chained from
    VkRenderPassBeginInfo to provide sample locations for layout
    transitions performed implicitly by a render pass instance."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:11 +02:00
Samuel Pitoiset
d0d41e58c3 radv: determine the first subpass id for every attachments
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:08 +02:00
Samuel Pitoiset
f58e9f6d69 radv: handle sample locations during explicit depth/stencil transitions
From the Vulkan spec 1.1.109,

   "Some implementations may need to evaluate depth image values
    while performing image layout transitions. To accommodate this,
    instances of the VkSampleLocationsInfoEXT structure can be
    specified for each situation where an explicit or automatic
    layout transition has to take place. VkSampleLocationsInfoEXT
    can be chained from VkImageMemoryBarrier structures to provide
    sample locations for layout transitions performed by
    vkCmdWaitEvents and vkCmdPipelineBarrier calls."

This handles explicit depth/stencil layout transitions performed
with CmdWaitEvents() or CmdPipelineBarrier().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:01 +02:00
Samuel Pitoiset
a20925f2a9 radv: allow the depth decompress pass to emit dynamic sample locations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:00 +02:00
Samuel Pitoiset
2dd8dfd913 radv: allow to set dynamic sample locations to the depth decompress pass
If VK_EXT_sample_locations is used, the driver might need to emit
the sample locations specified during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:10:55 +02:00
Samuel Pitoiset
d78990c174 radv: allow to save/restore sample locations during meta operations
This will be used for the depth decompress pass that might need
to emit variable sample locations during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:10:50 +02:00
Kenneth Graunke
22025595f3 iris: Sweep the NIR in iris_create_uncompiled_shader().
We run a ton of backend specific passes here (mostly brw_preprocess_nir)
and ought to sweep up any unused memory at this point, since we're going
to hang on to this NIR for as long as the linked program lives.
2019-06-07 01:29:38 -07:00
Eduardo Lima Mitev
c02ffd2700 ir3: Use the new NIR lowering pass for integer multiplication
Shader-db stats courtesy of Eric Anholt:

total instructions in shared programs: 6480215 -> 6475457 (-0.07%)
instructions in affected programs: 662105 -> 657347 (-0.72%)
helped: 1209
HURT: 13
total constlen in shared programs: 1432704 -> 1427769 (-0.34%)
constlen in affected programs: 100063 -> 95128 (-4.93%)
helped: 512
HURT: 0
total max_sun in shared programs: 875561 -> 873387 (-0.25%)
max_sun in affected programs: 46179 -> 44005 (-4.71%)
helped: 1087
HURT: 0

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Eduardo Lima Mitev
340277ad71 ir3/nir: Add new NIR AlgebraicPass for lowering imul
Currently, ir3 backend compiler is lowering integer multiplication from:

dst = a * b

to:

dst = (al * bl) + (ah * bl << 16) + (al * bh << 16)

by emitting this code:

mull.u tmp0, a, b           ; mul low, i.e. al * bl
madsh.m16 tmp1, a, b, tmp0  ; mul-add shift high mix, i.e. ah * bl << 16
madsh.m16 dst, b, a, tmp1   ; i.e. al * bh << 16

which at that point has very low chances of being optimized.

This patch adds a new nir_algebraic.AlgebraicPass to performs this
lowering during NIR algebraic optimization passes, giving it a better
chance for optimizing the resulting code.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Eduardo Lima Mitev
3addd7c8d9 nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16
For umul_low (al * bl), zero is returned if the low 16-bits word of either
source is zero.

for imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl'
is zero.

A couple of nir_search_helpers are added:

is_upper_half_zero() returns true if the highest word of all components of
an integer NIR alu src are zero.

is_lower_half_zero() returns true if the lowest word of all components of
an integer nir alu src are zero.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Eduardo Lima Mitev
e45de3a6c3 ir3/compiler: Handle new alu opcodes 'umul_low' and 'imadsh_mix16'
They directly emit ir3_MULL_U and ir3_MADSH_M16 respectively.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Eduardo Lima Mitev
c27b3758fa nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes
'umul_low' is the low 32-bits of unsigned integer multiply. It maps
directly to ir3's MULL_U.

'imadsh_mix16' is multiply add with shift and mix, an ir3 specific
instruction that maps directly to ir3's IMADSH_M16.

Both are necessary for the lowering of integer multiplication on
Freedreno, which will be introduced later in this series.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Iago Toral Quiroga
9b96ae69bc v3d: don't emit point coordinates varyings if the FS doesn't read them
We still need to emit them in V3D 3.x since there there is no mechanism to
disable them.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:29:42 +02:00
Iago Toral Quiroga
5e26e55e72 v3d: add a helper to track variables that need point coordinates
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:26:52 +02:00
Kenneth Graunke
4e3297f7d4 egl/x11: calloc dri2_surf so it's properly zeroed
Commit 2282ec0a refactored drawable creation across various platforms
into a new dri2_create_drawable helper function.

The GBM code in platform_drm.c code passed in dri2_surf->gbm_surf as the
loaderPrivate, while most other backends passed in dri2_surf directly.

To try and handle this, the patch checked if dri2_surf->gbm_surf was
non-NULL, and if so, presumed that the caller is the DRM platform and
we should use the dri2_surf->gbm_surf pointer.

This worked for most platforms, which calloc their dri2_surf structure,
zeroing the data.  Unfortunately, platform_x11.c used malloc, leaving
most of the dri2_surf as garbage.  In particular, dri2_surf->gbm_surf
was often non-NULL, causing dri2_create_drawable to try and use it,
passing a garbage pointer to the createNewDrawable hook, usually leading
to a SIGBUS or SIGSEGV when trying to dereference that bad pointer.

Since most callers calloc the data, make platform_x11.c follow suit.

Fixes crashes with i915_dri.so when running dEQP-GLES2.

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-06 22:45:27 -07:00
Mark Janes
04dac69752 tests/graw: use C99 print conversion specifier for 32 bit builds
Fixes formatting errors for 32 bit compilations, eg:

  error: format specifies type 'unsigned long' but the argument has
  type 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat]
  printf("result1 = %lu result2 = %lu\n", res1.u64, res2.u64);

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-06 14:39:41 -07:00
Alyssa Rosenzweig
30adeb7a53 panfrost/midgard: Fix crash with unused SSA values
Crash introduced in "b38dab101ca7e0896255dccbd85fd510c47d84d1" but not
adding a Fixes tag since it's our bug anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-06 13:44:27 -07:00
Boris Brezillon
3d661a4ef9 panfrost: Report sRGB colorspace as not supported
The driver does not support sRGB yet, so let's report it as unsupported.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-06 13:41:54 -07:00
Erik Faye-Lund
c0dfe8c6df docs: do not use div for line-breaking
HTML has the <p>-tag for this purpose. It adds some margins, but that
just makes this read better, IMO.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-06 17:51:45 +00:00
Erik Faye-Lund
f3235cfa70 docs: fixup code-tag positioning
This reads better if we include the asterisk in the code-block, as it's
part of the function-reference, even though it's not technically
speaking code. But as the <code>-tag isn't purely for code, this should
be fine.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-06 17:51:45 +00:00
Erik Faye-Lund
205f960e08 docs: add missing code-tags
Looks like I missed a few cases when I recently added more code-tags
here. So let's add these cases as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-06 17:51:45 +00:00
Erik Faye-Lund
54b7a1f175 docs: add accidentally dropped "at"
When rewriting 20c56e18c2 after review, I accidentally dropped the "at"
here. Sorry for that, and let's fix it up!

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 20c56e18c2 ("docs: use proper links instead of code-tags")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-06 17:51:45 +00:00
Gurchetan Singh
110f139f98 anv: allow NV12 <--> AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 inter-op
AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 is an implementation defined
flexible YUV format.  Most of the times, it's NV12 or YV12.
On Intel, NV12 is preferred since it can be used by the display
engine.  

This API adds a dependency between gralloc and buffer consumers,
unfortunately.  Right now, the code seems to work for i915 gralloc,
but not cros_gralloc.  Add a preprocessor flag to fix this.

TEST=android.graphics.cts.MediaVulkanGpuTest#testMediaImportAndRendering

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-06 09:20:03 -07:00
Connor Abbott
9d93d2a404 ac/nir: Remove stale TODO
While we're here, copy the comment explaining this from radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-06 17:14:28 +02:00
Connor Abbott
1d55b0da59 radeonsi: Don't force dcc disable for loads
When e9d935ed0e added force_dcc_off(), we forced it off for any
preloaded image descriptor which had stores associated with them, since
the same preloaded descriptors were used for loads and stores. However,
when the preloading was removed in 16be87c904, the existing logic was
kept despite it not being necessary anymore. The comment above
force_dcc_off() only mentions stores, so only force DCC off for stores.

Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-06 17:14:28 +02:00
Gert Wollny
10895c39c3 mesa/main: Expose EXT_clip_control and related enums and the function
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-06 12:25:17 +02:00
Gert Wollny
f1f6228a38 mapi/glapi/registry: Update gl.xml to latest upstream version
The old copy didn't include EXT_clip_control, so update it.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-06 12:25:12 +02:00
Gert Wollny
8657257a6e virgl: Enable CAP_CLIP_HALFZ if host supports it
On according hosts this enables the piglits as "pass":
  arb_clip_control-*

v2: sync flag with host

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-06 12:24:53 +02:00
Charmaine Lee
f29b8fde91 svga: Remove unnecessary check for the pre flush bit for setting vertex buffers
This fixes the missing rebind when the can_pre_flush bit
is not set and the vertex buffers are the same as what have been sent.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Signed-off-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2019-06-06 10:27:10 +02:00
Deepak Rawat
72fc886826 winsys/svga/drm: Fix 32-bit RPCI send message
Depending on whether compiled with frame-pointer or not, the temporary
memory location used for the bp parameter in these macros are referenced
relative to the stack pointer or the frame pointer.
Hence we can never reference that parameter when we've modified either
the stack pointer or the frame pointer, because then the compiler would
generate an incorrect stack reference.

Fix this by pushing the temporary memory parameter on a known location on
the stack before modifying the stack- and frame pointers.

Also in case of failuire RPCI channel is not closed which lead to vmx
running out of channels.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2019-06-06 10:27:10 +02:00
Samuel Pitoiset
b9d3a6b656 radv: set the subpass before any initial subpass transitions
This might fix initial subpass transitions when multiview is used.
Noticed while implementing sample locations during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-06 10:00:29 +02:00
Nataraj Deshpande
d6724471a5 anv: Fix check for isl_fmt in assert
Checking isl_fmt returned value in assert seems appropriate
instead of format variable.

Fixes: f1654fa7e3 "anv/android: support creating images from external format"
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-06-06 09:24:08 +03:00
Iago Toral Quiroga
09d230c6cf v3d: fix scheduling dependency tracking for ALU with small immediates
We were not accountint for small immediates in the B mux so the scheduler
was interpreting these are regular register file accesses, which could
lead to additional (incorrect) write-read dependencies.

Shader-db changes:

total instructions in shared programs: 9163664 -> 9137263 (-0.29%)
instructions in affected programs: 3931035 -> 3904634 (-0.67%)
helped: 12457
HURT: 2563

total max-temps in shared programs: 1325787 -> 1325597 (-0.01%)
max-temps in affected programs: 5746 -> 5556 (-3.31%)
helped: 186
HURT: 16
helped stats (abs) min: 1 max: 4 x̄: 1.12 x̃: 1
helped stats (rel) min: 1.45% max: 22.22% x̄: 4.42% x̃: 3.28%
HURT stats (abs)   min: 1 max: 3 x̄: 1.12 x̃: 1
HURT stats (rel)   min: 2.86% max: 10.00% x̄: 5.76% x̃: 5.88%
95% mean confidence interval for max-temps value: -1.04 -0.84
95% mean confidence interval for max-temps %-change: -4.16% -3.07%
Max-temps are helped.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-06 08:16:43 +02:00
Vasily Khoruzhick
b412e05751 lima/ppir: add missing handling of min/max ops for vec4 add slot
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-06-06 04:30:36 +00:00
Vasily Khoruzhick
5980565a37 lima/ppir: fix crash when program uses no registers at all
Program may need no regalloc at all, e.g. in case when program consists
of single discard op.

Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-06-06 04:30:36 +00:00
Jason Ekstrand
b38dab101c util/hash_table: Assert that keys are not reserved pointers
If we insert a NULL key, it will appear to succeed but will mess up
entry counting.  Similar errors can occur if someone accidentally
inserts the deleted key.  The later is highly unlikely but technically
possible so we should guard against it too.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-06 00:27:53 +00:00
Jason Ekstrand
8306dabc03 util/set: Assert that keys are not reserved pointers
If we insert a NULL key, it will appear to succeed but will mess up
entry counting.  Similar errors can occur if someone accidentally
inserts the deleted key.  The later is highly unlikely but technically
possible so we should guard against it too.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-06 00:27:53 +00:00
Jason Ekstrand
7a18ce0b91 glsl/loop_analysis: Don't search for NULL variables in the hash table
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-06 00:27:53 +00:00
Jason Ekstrand
d96878a66a nir/propagate_invariant: Don't add NULL vars to the hash table
Fixes: 8410cf66d "nir/propagate_invariant: Skip unknown vars"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-06 00:27:53 +00:00
Ian Romanick
1c30d26d89 intel/compiler: Treat b32csel as potentially producing a Boolean result for resolve analysis
If the 2nd and 3rd source are both Boolean values, we can potentially
avoid a resolve by only resolving the result of the b32csel.

No changes on any Gen6+ Intel platform.

v2: Use ?: instead of cast from bool to unsigned.  Suggested by Caio.

Iron Lake
total instructions in shared programs: 8142729 -> 8142677 (<.01%)
instructions in affected programs: 12890 -> 12838 (-0.40%)
helped: 26
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.25% max: 0.74% x̄: 0.45% x̃: 0.38%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.52% -0.39%
Instructions are helped.

total cycles in shared programs: 188549632 -> 188549394 (<.01%)
cycles in affected programs: 60754 -> 60516 (-0.39%)
helped: 25
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 9.92 x̃: 8
helped stats (rel) min: 0.07% max: 2.23% x̄: 0.59% x̃: 0.27%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70%
95% mean confidence interval for cycles value: -12.91 -5.40
95% mean confidence interval for cycles %-change: -0.84% -0.23%
Cycles are helped.

GM45
total instructions in shared programs: 5013119 -> 5013093 (<.01%)
instructions in affected programs: 6764 -> 6738 (-0.38%)
helped: 13
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.24% max: 0.68% x̄: 0.43% x̃: 0.36%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.52% -0.34%
Instructions are helped.

total cycles in shared programs: 128977804 -> 128977700 (<.01%)
cycles in affected programs: 37738 -> 37634 (-0.28%)
helped: 13
HURT: 0
helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8
helped stats (rel) min: 0.18% max: 0.46% x̄: 0.30% x̃: 0.26%
95% mean confidence interval for cycles value: -8.00 -8.00
95% mean confidence interval for cycles %-change: -0.36% -0.24%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:04:17 -07:00
Ian Romanick
0ba9497e66 intel/fs: Improve discard_if code generation
Previously we would blindly emit an sequence like:

        mov(1)          f0.1<1>UW       g1.14<0,1,0>UW
        ...
        cmp.l.f0(16)    g7<1>F          g5<8,8,1>F      0x41700000F  /* 15F */
(+f0.1) cmp.z.f0.1(16)  null<1>D        g7<8,8,1>D      0D

The first move sets the flags based on the initial execution mask.
Later discard sequences contain a predicated compare that can only
remove more SIMD channels.  Often times the only user of the result from
the first compare is the second compare.  Instead, generate a sequence
like

        mov(1)          f0.1<1>UW       g1.14<0,1,0>UW
        ...
        cmp.l.f0(16)    g7<1>F          g5<8,8,1>F      0x41700000F  /* 15F */
(+f0.1) cmp.ge.f0.1(8)  null<1>F        g5<8,8,1>F      0x41700000F  /* 15F */

If the results stored in g7 and f0.0 are not used, the comparison will
be eliminated.  This removes an instruction and potentially reduces
register pressure.

v2: Major re-write of the commit message (including fixing the assembly
code).  Suggested by Matt.

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224434 -> 17198659 (-0.15%)
instructions in affected programs: 2908125 -> 2882350 (-0.89%)
helped: 18891
HURT: 5
helped stats (abs) min: 1 max: 12 x̄: 1.38 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 1.76% x̃: 1.02%
HURT stats (abs)   min: 9 max: 105 x̄: 51.40 x̃: 35
HURT stats (rel)   min: 0.43% max: 4.92% x̄: 2.34% x̃: 1.56%
95% mean confidence interval for instructions value: -1.39 -1.34
95% mean confidence interval for instructions %-change: -1.79% -1.73%
Instructions are helped.

total cycles in shared programs: 361468458 -> 361170679 (-0.08%)
cycles in affected programs: 38470116 -> 38172337 (-0.77%)
helped: 16202
HURT: 1456
helped stats (abs) min: 1 max: 4473 x̄: 26.24 x̃: 18
helped stats (rel) min: <.01% max: 28.44% x̄: 2.90% x̃: 2.18%
HURT stats (abs)   min: 1 max: 5982 x̄: 87.51 x̃: 28
HURT stats (rel)   min: <.01% max: 51.29% x̄: 5.48% x̃: 1.64%
95% mean confidence interval for cycles value: -18.24 -15.49
95% mean confidence interval for cycles %-change: -2.26% -2.14%
Cycles are helped.

total spills in shared programs: 12147 -> 12176 (0.24%)
spills in affected programs: 175 -> 204 (16.57%)
helped: 8
HURT: 5

total fills in shared programs: 25262 -> 25292 (0.12%)
fills in affected programs: 269 -> 299 (11.15%)
helped: 8
HURT: 5

Haswell
total instructions in shared programs: 13530316 -> 13502647 (-0.20%)
instructions in affected programs: 2507824 -> 2480155 (-1.10%)
helped: 18859
HURT: 10
helped stats (abs) min: 1 max: 12 x̄: 1.48 x̃: 1
helped stats (rel) min: 0.03% max: 27.78% x̄: 2.38% x̃: 1.41%
HURT stats (abs)   min: 5 max: 39 x̄: 25.70 x̃: 31
HURT stats (rel)   min: 0.22% max: 1.66% x̄: 1.09% x̃: 1.31%
95% mean confidence interval for instructions value: -1.49 -1.44
95% mean confidence interval for instructions %-change: -2.42% -2.34%
Instructions are helped.

total cycles in shared programs: 377865412 -> 377639034 (-0.06%)
cycles in affected programs: 40169572 -> 39943194 (-0.56%)
helped: 15550
HURT: 1938
helped stats (abs) min: 1 max: 2482 x̄: 25.67 x̃: 18
helped stats (rel) min: <.01% max: 37.77% x̄: 3.00% x̃: 2.25%
HURT stats (abs)   min: 1 max: 4862 x̄: 89.17 x̃: 35
HURT stats (rel)   min: <.01% max: 67.67% x̄: 6.16% x̃: 2.75%
95% mean confidence interval for cycles value: -14.42 -11.47
95% mean confidence interval for cycles %-change: -2.05% -1.91%
Cycles are helped.

total spills in shared programs: 26769 -> 26814 (0.17%)
spills in affected programs: 826 -> 871 (5.45%)
helped: 9
HURT: 10

total fills in shared programs: 38383 -> 38425 (0.11%)
fills in affected programs: 834 -> 876 (5.04%)
helped: 9
HURT: 10

LOST:   5
GAINED: 10

Ivy Bridge
total instructions in shared programs: 12079250 -> 12044139 (-0.29%)
instructions in affected programs: 2409680 -> 2374569 (-1.46%)
helped: 16135
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 2.18 x̃: 2
helped stats (rel) min: 0.07% max: 37.50% x̄: 2.72% x̃: 1.68%
95% mean confidence interval for instructions value: -2.21 -2.14
95% mean confidence interval for instructions %-change: -2.76% -2.67%
Instructions are helped.

total cycles in shared programs: 180116747 -> 179900405 (-0.12%)
cycles in affected programs: 25439823 -> 25223481 (-0.85%)
helped: 13817
HURT: 1499
helped stats (abs) min: 1 max: 1886 x̄: 26.40 x̃: 18
helped stats (rel) min: <.01% max: 38.84% x̄: 2.57% x̃: 1.97%
HURT stats (abs)   min: 1 max: 3684 x̄: 98.99 x̃: 52
HURT stats (rel)   min: <.01% max: 97.01% x̄: 6.37% x̃: 3.42%
95% mean confidence interval for cycles value: -15.68 -12.57
95% mean confidence interval for cycles %-change: -1.77% -1.63%
Cycles are helped.

LOST:   8
GAINED: 10

Sandy Bridge
total instructions in shared programs: 10878990 -> 10863659 (-0.14%)
instructions in affected programs: 1806702 -> 1791371 (-0.85%)
helped: 13023
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.18 x̃: 1
helped stats (rel) min: 0.07% max: 13.79% x̄: 1.65% x̃: 1.10%
95% mean confidence interval for instructions value: -1.18 -1.17
95% mean confidence interval for instructions %-change: -1.68% -1.62%
Instructions are helped.

total cycles in shared programs: 154082878 -> 153862810 (-0.14%)
cycles in affected programs: 20199374 -> 19979306 (-1.09%)
helped: 12048
HURT: 510
helped stats (abs) min: 1 max: 323 x̄: 20.57 x̃: 18
helped stats (rel) min: 0.03% max: 17.78% x̄: 2.05% x̃: 1.52%
HURT stats (abs)   min: 1 max: 448 x̄: 54.39 x̃: 16
HURT stats (rel)   min: 0.02% max: 37.98% x̄: 4.13% x̃: 1.17%
95% mean confidence interval for cycles value: -17.97 -17.08
95% mean confidence interval for cycles %-change: -1.84% -1.75%
Cycles are helped.

LOST:   1
GAINED: 0

Iron Lake
total instructions in shared programs: 8155075 -> 8142729 (-0.15%)
instructions in affected programs: 949495 -> 937149 (-1.30%)
helped: 5810
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.12 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.53% x̃: 1.85%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.59% -2.48%
Instructions are helped.

total cycles in shared programs: 188584610 -> 188549632 (-0.02%)
cycles in affected programs: 17274446 -> 17239468 (-0.20%)
helped: 3881
HURT: 90
helped stats (abs) min: 2 max: 168 x̄: 9.08 x̃: 6
helped stats (rel) min: <.01% max: 23.53% x̄: 0.83% x̃: 0.30%
HURT stats (abs)   min: 2 max: 10 x̄: 2.80 x̃: 2
HURT stats (rel)   min: <.01% max: 0.60% x̄: 0.10% x̃: 0.07%
95% mean confidence interval for cycles value: -9.35 -8.27
95% mean confidence interval for cycles %-change: -0.85% -0.77%
Cycles are helped.

GM45
total instructions in shared programs: 5019308 -> 5013119 (-0.12%)
instructions in affected programs: 489028 -> 482839 (-1.27%)
helped: 2912
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.13 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.46% x̃: 1.81%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.54% -2.39%
Instructions are helped.

total cycles in shared programs: 129002592 -> 128977804 (-0.02%)
cycles in affected programs: 12669152 -> 12644364 (-0.20%)
helped: 2759
HURT: 37
helped stats (abs) min: 2 max: 168 x̄: 9.03 x̃: 4
helped stats (rel) min: <.01% max: 21.43% x̄: 0.75% x̃: 0.31%
HURT stats (abs)   min: 2 max: 10 x̄: 3.62 x̃: 4
HURT stats (rel)   min: <.01% max: 0.41% x̄: 0.10% x̃: 0.04%
95% mean confidence interval for cycles value: -9.53 -8.20
95% mean confidence interval for cycles %-change: -0.79% -0.70%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:04:13 -07:00
Ian Romanick
a288708506 intel/fs: Add need_dest parameter to fs_visitor::nir_emit_alu
This is the same as the need_dest parameter to
prepare_alu_destination_and_sources.  This allows us to not change the
register that is expected to hold an result if an instruction is
re-emitted.  This is particularly a problem if the re-emitted
instruction is a partial write.  A later patch will use this feature.

No shader-db changes on any Intel platform.

v2: Don't do the Boolean resolve when there is no destination.  If the
ALU instruction didn't write a register, there's nothing to resolve.
This replaces an earlier patch "intel/fs: Allocate dummy destination
register when need_dest is false".

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:04:08 -07:00
Ian Romanick
e13a5c7d67 intel/fs: Allow cmod propagation across reads and writes of different flags
This also helps a later patch (intel/fs: Improve discard_if code
generation) on about 200 shaders.

v2: Document that other instruction sequences are also valid in
subtract_merge_with_compare_intervening_mismatch_flag_write.  Suggested
by Caio.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17224434 (<.01%)
instructions in affected programs: 296 -> 292 (-1.35%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.99% max: 1.92% x̄: 1.43% x̃: 1.40%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.04% -0.81%
Instructions are helped.

total cycles in shared programs: 361468455 -> 361468458 (<.01%)
cycles in affected programs: 2862 -> 2865 (0.10%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.24% max: 0.39% x̄: 0.31% x̃: 0.31%
HURT stats (abs)   min: 3 max: 4 x̄: 3.50 x̃: 3
HURT stats (rel)   min: 0.32% max: 0.70% x̄: 0.51% x̃: 0.51%
95% mean confidence interval for cycles value: -4.34 5.84
95% mean confidence interval for cycles %-change: -0.70% 0.90%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:03:45 -07:00
Ian Romanick
8030cb75c1 intel/fs: Fix flag_subreg handling in cmod propagation
There were two errors.  First, the pass could propagate conditional
modifiers from an instruction that writes on flag register to an
instruction that writes a different flag register.  For example,

    cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F
    cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F

could be come

    cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F

Second, if an instruction writes f0.1 has it's condition propagated, the
modified instruction will incorrectly write flag f0.0.  For example,

    linterp(16) vgrf6:F, g2:F, attr0:F
    cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F
    (-f0.1) discard_jump(16) (null):UD

could become

    linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F
    (-f0.1) discard_jump(16) (null):UD

None of these cases will occur currently.  The only time we use f0.1 is
for generating discard intrinsics.  In all those cases, we generate a
squence like:

    cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F
    (+f0.1) cmp.z(16) null:D, vgrf7:D, 0d
    (-f0.1) discard_jump(16) (null):UD

Due to the mixed types and incompatible conditions, this sequence would
never see any cmod propagation.  The next patch will change this.

No shader-db changes on any Intel platform.

v2: Fix typo in comment in test case subtract_delete_compare_other_flag.
Noticed by Caio.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:03:40 -07:00
Ian Romanick
2dd6013933 intel/fs: Add missing tests for cmod_propagate_not
Tests like this should have been added in 4467040cb6 ("i965/fs:
Propagate conditional modifiers from not instructions").

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-05 17:03:31 -07:00
Kenneth Graunke
6a7d387394 i965: Allow signed/unsigned integer conversions in miptree up/download
BLORP now handles this so there's no reason to fall back.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-05 16:58:07 -07:00
Kenneth Graunke
f06c86358c intel/blorp: Handle SINT/UINT clamping on blits.
This patch makes blorp_blit handle SINT<->UINT blit value clamping.
After reading the source's integer data (which is expanded to 32-bit),
we either IMAX with 0 (for SINT -> UINT, to clamp negative numbers) or
UMIN with (1 << 31) - 1 (for UINT -> SINT, to clamp positive numbers
outside of the representable range).

Such blits are not allowed by the OpenGL or Vulkan APIs directly:

   The Vulkan 1.1 spec for vkCmdBlitImage says:

   "Integer formats can only be converted to other integer formats with
    the same signedness."

   The GL 4.5 spec for glBlitFramebuffer says:

   "An INVALID_OPERATION error is generated if format conversions are
    not supported, which occurs under any of the following conditions:
    [...]
    * The read buffer contains unsigned integer values and any draw
      buffer does not contain unsigned integer values.
    * The read buffer contains signed integer values and any draw buffer
      does not contain signed integer values."

However, they are useful for other operations, such as texture upload
and download, which typically are implemented via blorp_blit().  i965
has code to fall back in this case (which the next commit will delete),
and Gallium expects blit() to handle this case for texture upload.

Fixes the following tests on iris:
- GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels
- GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo
- GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-05 16:58:07 -07:00
Caio Marcelo de Oliveira Filho
1aea4cd0d9 anv/pipeline: Move lowering of nir_var_mem_global later
This let deref optimizations apply to globals before lowering them.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-05 16:57:09 -07:00
Kenneth Graunke
4f3c82c72c st/nir: Don't use GLSL IR's MOD_TO_FLOOR lowering when using NIR.
Both GLSL IR and NIR perform the same mod -> floor lowering for 32-bit
types.  But nir_lower_double_ops is slightly more defensive against
lowered drcp precision loss, and handles mod(x, x) = 0 directly.  This
works well...assuming nir_lower_double_ops actually gets an fmod op to
lower in the first place.

The previous patches enabled NIR-based lowering for the remaining
drivers, so we can stop using the GLSL IR lowering when using NIR.

Fixes KHR-GL45.gpu_shader_fp64.builtin.mod_dvec[234] on iris.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Kenneth Graunke
f4d4c42608 radeonsi: Enable NIR's lower_fmod option.
Currently, st/mesa is always calling the GLSL IR lower_instructions()
pass with MOD_TO_FLOOR set, so mod operations will be lowered before
ever reaching NIR.  This enables the same lowering at the NIR level,
which will let me shut off the GLSL IR path for NIR-based drivers.

The AMD NIR backend also has code to handle fmod, so we could
potentially skip this and still be fine.  I don't have an opinion
on that.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Kenneth Graunke
e0641e0728 vc4: Enable NIR's lower_fmod option.
Currently, st/mesa is always calling the GLSL IR lower_instructions()
pass with MOD_TO_FLOOR set, so mod operations will be lowered before
ever reaching NIR.  This enables the same lowering at the NIR level,
which will let me shut off the GLSL IR path for NIR-based drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Eric Anholt <eric@anholt.net>
2019-06-05 16:45:12 -07:00
Kenneth Graunke
b0e3bd79dc v3d: Enable NIR's lower_fmod option.
Currently, st/mesa is always calling the GLSL IR lower_instructions()
pass with MOD_TO_FLOOR set, so mod operations will be lowered before
ever reaching NIR.  This enables the same lowering at the NIR level,
which will let me shut off the GLSL IR path for NIR-based drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Eric Anholt <eric@anholt.net>
2019-06-05 16:45:12 -07:00
Kenneth Graunke
c7d1b52a2c nir: Combine lower_fmod16/32 back into a single lower_fmod.
We originally had a single lower_fmod option.  In commit 2ab2d2e5, Sam
split 32 and 64-bit lowering into separate flags, with the rationale
that some drivers might want different options there.  This left 16-bit
unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.

Now that lower_fmod64 is gone (in favor of nir_lower_doubles and
nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a
single lower_fmod flag again.  I'm not aware of any hardware which
need lowering for one bitsize and not the other.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Kenneth Graunke
edd45af9ba nir: Drop lower_fmod64 option.
nir_lower_doubles offers a wide variety of fp64 lowering, including
lowering fmod@64.  The version there also better handles imprecisions
due to lowered frcp@64.  Let's consolidate on one version.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Kenneth Graunke
dfb18f0a28 panfrost: Switch to nir_lower_doubles instead of lower_fmod64.
I don't think panfrost actually does doubles yet, but it at least
claims to support PIPE_CAP_DOUBLES, so at least pretend to switch
to the new lowering.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-05 16:45:12 -07:00
Kenneth Graunke
d13059f4d5 nouveau: Use nir_lower_doubles instead of lower_fmod64 on nvc0.
We currently have two duplicate mechanisms for lowering fmod@64.
One is a nir_opt_algebraic rule keyed off of options->lower_fmod64,
and the other is nir_lower_doubles, which offers a full gamut of
fp64 lowering.  The latter works slightly better in some corner cases,
so I'm trying to eliminate lower_fmod64 and drop the redundancy.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Kenneth Graunke
fa56a3795f gallium: Drop lower_fmod64 from drivers that don't support doubles.
Neither freedreno nor nv50 expose PIPE_CAP_DOUBLES, so there's no
fmod64 to be lowered.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Dylan Baker
e8a60e5d50 docs: update calendar, and news item and link release notes for 19.0.6 2019-06-05 16:42:36 -07:00
Dylan Baker
fccd44940d docs: Add SHA256 sums for 19.0.6 2019-06-05 16:40:57 -07:00
Dylan Baker
fe79d75ccf docs: Add relnotes for 19.0.6 2019-06-05 16:40:55 -07:00
Erik Faye-Lund
ff41ac7292 docs: add day of month to all news-entries
This makes it easier to batch-convert them to other structured
markup-formats.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:46 +02:00
Erik Faye-Lund
382776d241 docs: add MD5 checksums for 9.2.2 files
These checksums were obtained by downloading the releases from
ftp://ftp.freedesktop.org/pub/mesa/older-versions/9.x/9.2.2/ and
running md5sum on them.

Hopefully the server wasn't compromised since release.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:46 +02:00
Erik Faye-Lund
1941e642bc docs: use pre-block for showing commit-note
Having a single-item list for this seems odd. Let's just use a pre-block
in stead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:46 +02:00
Erik Faye-Lund
b16e593f79 docs: switch to definition list and code-tags
A definition list is a better semantic match for what this list is
supposed to convey, so let's use that instead. And while we're at it,
let's add some code-tags around filenames, as they stand a bit more out
that way.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:46 +02:00
Erik Faye-Lund
3b0d48e219 docs: combine headings
This is more in line with how we mark-up other definition lists, and
avoids portability issues with other markup-formats.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:46 +02:00
Erik Faye-Lund
942c4daac9 docs: more code-tags in llvmpipe.html
This makes the article a bit easier to read.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:46 +02:00
Erik Faye-Lund
52667f990e docs: use more code-tags in envvars.html
This wraps code, identifiers, values and paths in code-tags, which makes
them appear in a monospace-font for readability.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:46 +02:00
Erik Faye-Lund
d311d8f424 docs: use code-tags for envvars and options
This makes it a bit easier to tell what's what.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
5639f0d5ee docs: use dl instead of ul
A HTML definition-list is more semantically strong than just some
unordered list, and renders a bit cleaner by default. So let's use that
instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
0bca0f1aa2 docs: remove pointlessly repeated list
The examples listed above are exactly the same ones are we're about to
list, so let's just keep the list that defines what they do.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
aed4ac6da8 docs: remove stray whitespace
There's some stray whitespace in these files that doesn't do anything
useful. Let's get rid of if.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
20c56e18c2 docs: use proper links instead of code-tags
These links are a bit odd in that the URLs are simply placed in
code-tags. This makes them harder to work with. Let's use proper
links instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
c59c793ae5 docs: update doxygen-links
One of these URLs are dead these days, and the other one forwards to the
current one, doxygen.nl. Let's get these links up to date.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
7c4a4fb09a docs: remove some noisy spacing in pre-blocks
These newlines caused the blocks to have trailing newlines in them,
which renders a bit noisily.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
412046f74e docs: improve quoting slightly
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
8620f53212 docs: do not use br-tag for non-significant breaks
According to the W3C, we shouldn't use the br-tag unless the line-break
is part of the content:

https://www.w3.org/TR/2011/WD-html5-author-20110809/the-br-element.html

All of these instances are for non-content usage, and is as such technically
out-of-spec. So let's either remove them, or split paragraphs, based on
how related the content are.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
d5e273aad2 docs: remove pointless line-break
Line-breaks at the end of a paragraph doesn't do anything useful,
so let's just get rid of it.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
db8211a883 docs: remove pointless trailing hard-breaks
Line-break at the end of an article is quite pointless, and doesn't do
much to increase the readability. Let's get rid of them.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
74a6a68196 docs: rewrite paragraph to be free-form
These half-way structured sections are needlessly problematic to
translate cleanly to other markup-languages, so let's just make this
into a free-form paragraph instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
9e5bc2c868 docs: use h4 instead of free-standing paragraphs and br-tags
This makes this document a bit more structured, which is generally
considered a good thing for HTML. It will also translate a bit better
into other markup-formats.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:45 +02:00
Erik Faye-Lund
38652a29ae docs: slightly reword paragraph and tweak markup
This makes this paragraph a bit easier to digest.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:44 +02:00
Erik Faye-Lund
b2ac7582d9 docs: remove stray space in code-block
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:44 +02:00
Erik Faye-Lund
0114d15ed6 docs: remove some pointless spacing
The different headers and header-sizes already convey the hierarchical
structure of this document, the unusual spacing arguably just looks a
bit inconsistent with the rest of the site. Let's remove it; it looks
fine without it, and will translate better to other markup languages.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:44 +02:00
Erik Faye-Lund
392c083377 docs: add more more code-tags
It's easier to read function-names, file-names and other
"machine"-related strings if they are formatted in a monospace font. So
let's mark these up with code-tags.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:44 +02:00
Erik Faye-Lund
0ee366960c docs: use code instead of tt-tag
The tt-tag has been removed from HTML5, so let's normalize this to
code-tags intead. This just makes things a bit more consistent, as we've
mixed these left and right so far anyway.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:44 +02:00
Erik Faye-Lund
d60dc5d16f docs: use paragraph instead of double newlines
This is a bit more semantically clean in HTML, and makes us keep
content and presentation a bit more separated.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:44 +02:00
Erik Faye-Lund
9a65de343e docs: use verbatim .plan quote
This quote is now verbatim, as archived here:

https://github.com/ESWAT/john-carmack-plan-archive/blob/master/by_year/johnc_plan_1999.txt

This makes it look a bit more consistent with the following news-entry,
and makes things IMO a bit more clear.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-05 23:48:44 +02:00
Alyssa Rosenzweig
905d914cb6 panfrost/midgard: Verify SSA claims when pipelining
The pipeline register creation algorithm is only valid for SSA indices;
NIR registers and such cannot be pipelined without more complex
analysis. However, there are the ocassional class of "liars" -- indices
that claim to be SSA but are not. This occurs in the blend shader
prologue, for example. Detect this and just bail quietly for now.

Eventually we need to rewrite the blend shader prologue to occur in NIR
anyway (which would mitigate the issue), but that's more involved and
depends on a better understanding of pixel formats in blend shaders (for
non-RGBA8888/UNORM cases).

Fixes some blend shader regressions.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-05 14:40:08 -07:00
Alyssa Rosenzweig
dcd12aad46 panfrost/midgard: Don't assign var locations ourselves
This piece of code was cargo-culted from the ir3 standalone compiler and
made sense when we were a standalone compiler ourselves. Unfortunately,
for the online compiler, mesa/st *already handles this for us* and if we
duplicate it here, we're duplicating it *incorrectly*. So just delete
these lines and fix a heck of a lot of tests.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-05 14:40:08 -07:00
Tomeu Vizoso
de5c882973 panfrost: Reload framebuffer contents if there's no clear
If by flush time the client hasn't submitted a clear, add jobs for
reloading the framebuffer contents as the first draw in the frame.

This is required by programs such as Weston which don't do clears and
rely on the previous contents of the framebuffer being there.

Reloading the whole framebuffer on every frame without regards to what
is needed or what is going to be covered is very inefficient, but future
work will introduce support for damage regions and partial updates so we
know what needs to be actually reloaded.

Fixes quite a few tests in dEQP-EGL.functional.buffer_age.*.

[Alyssa: The context is that tilers do an implicit glClear() on every
frame, whether you asked them to or not. If you want a clear, this is
very efficient. But if you don't, you have to explicitly blit the
backbuffer back into tile memory, accomplished by a dummy texturing
draw. This patch generates that draw via u_blitter, although we could do
a bit better ourselves by eliding the vertex job. This fixes "black
rectangles in Weston/sway" as well as "video not displaying when UI
visible in mpv"]

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-05 14:35:48 -07:00
Alyssa Rosenzweig
2adf35e4f5 panfrost: Don't flip scanout
The mesa/st flips the viewport, so we respect that rather than
trying to flip the framebuffer itself and ignoring the viewport and
using a messy heuristic.

However, this brings an underlying disagreement about the interpretation
of winding order to light. The blob uses a different strategy than Mesa
for handling viewport Y flipping, so the meanings of the winding order
bit are flipped for it. To keep things clean on our end, we rename to
explicitly use Gallium (rather than flipped OpenGL) conventions.

Fixes upside-down Xwayland/egl windows.

v2: Adjust lowering configuration to correctly flip gl_PointCoord.y and
gl_FragCoord.y. v1 was R-b'd by Tomeu, but then retracted due to these
regressions which are not fixed.

Suggested-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Sort-of-reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-06-05 14:35:48 -07:00
Timur Kristóf
c94b70a178 st/nine: Use tgsi_to_nir when preferred IR is NIR.
This patch allows nine to read the preferred IR from pipe caps and use
NIR when that is preferred by the driver, by calling tgsi_to_nir. Also
adds some debug options that allow overriding it.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
2019-06-05 23:32:13 +02:00
Lionel Landwerlin
c162127440 intel/perf: improve dynamic loading config detection
We're currently trying to detect dynamic loading config support by
trying to remove to test config (hard coded in the i915 driver) and
checking we get ENOENT.

This can fail if the test config was updated in Mesa but not yet in
i915.

A better way to do this is to pick an invalid ID and check for ENOENT.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-05 20:16:23 +00:00
Jason Ekstrand
811c05dfe6 intel/nir: Take nir_shader*s in brw_nir_link_shaders
Since NIR_PASS no longer swaps out the NIR pointer when NIR_TEST_* is
enabled, we can just take a single pointer and not a pointer to pointer.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-05 20:07:28 +00:00
Jason Ekstrand
bb67a99a2d intel/nir: Stop returning the shader from helpers
Now that NIR_TEST_* doesn't swap the shader out from under us, it's
sufficient to just modify the shader rather than having to return in
case we're testing serialization or cloning.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-05 20:07:28 +00:00
Jason Ekstrand
fe2fc30cb5 nir: Don't replace the nir_shader when NIR_TEST_SERIALIZE=1
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-06-05 20:07:28 +00:00
Jason Ekstrand
9eba6d9a88 nir: Don't replace the nir_shader when NIR_TEST_CLONE=1
Instead, we add a new helper which stomps one nir_shader and replaces it
with another.  The new helper effectively just changes which pointer
gets used for the base nir_shader.  It should be 99% as good at testing
cloning but without requiring that everything handle having the shader
swapped out from under it constantly.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-06-05 20:07:28 +00:00
Caio Marcelo de Oliveira Filho
747926ddfb iris: Only recompile CS when needed
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-06-05 12:57:54 -07:00
Lionel Landwerlin
0430c6d18a intel/perf: fix EuThreadsCount value in performance equations
EuThreadsCount is supposed to be the number of threads per EU, not the
total number of threads in the whole device.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1fc7b95127 ("i965: Add Gen8+ INTEL_performance_query support")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-05 22:41:01 +03:00
Mark Janes
36d8a922de intel/tools: use C99 print conversion specifier for 32 bit builds
Fixes formatting errors for 32 bit compilations, eg:

  error: format ‘%lx’ expects argument of type ‘long unsigned int’,
  but argument 5 has type ‘uint64_t’ {aka ‘long long unsigned int’}
  [-Werror=format=]

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-05 19:25:15 +00:00
Samuel Pitoiset
8a31eaa4e2 radv: use only one descriptor in the fmask expand pass
This removes one useless SMEM load operations which pointed to
the same descriptor anyway.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-05 20:50:58 +02:00
Samuel Pitoiset
7664eb8f2b radv: set ACCESS_NON_READABLE on the fmask expand pass output image
The driver will emit GLC=1.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-05 20:50:56 +02:00
Samuel Pitoiset
8206390546 radv: remove one useless image type in the fmask expand shader
Both input and output images use the same type.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-05 20:50:53 +02:00
Kristian H. Kristensen
1e6c873f1f freedreno/ir3: Extend debug helpers to support TCS/TES/GS
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-05 11:15:04 -07:00
Kristian H. Kristensen
3da9a24f35 freedreno/a6xx: Use VALIDREG in next_regid() helper
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-05 11:15:04 -07:00
Kristian H. Kristensen
6fffc091e2 freedreno/a6xx: Remove dead code from a5xx
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-05 11:15:04 -07:00
Kristian H. Kristensen
cea39af2fb freedreno/ir3: Generalize ir3_shader_disasm()
Use a helper function to get the sysval/attribute/varying/output name
and make the disam debug output independent of shader stage.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-05 11:15:04 -07:00
Alyssa Rosenzweig
1ea987576d panfrost/midgard: Always break up fragment writeout
In a fragment shader, r0 is written out with a special branch sequence.
r0 is not a real register here, but essentially a pipeline register --
as such, it needs to be written out in full and on time, with hanging
dependencies in the bundle. Otherwise, we break up the bundle, which
costs an extra ALU cycle and adds a move.

When the scheduler ran last thing, we could do this analysis within the
scheduler. Now that RA can run after scheduling, that's no longer valid,
so we remove the analysis and always break it up (at a performance
penalty). Future work can add a post-RA/post-schedule pass to merge
writeout blocks if possible. It's a bit of a low-priority next to fixing
conformance regressions, of course.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-05 18:06:49 +00:00
Alyssa Rosenzweig
3d11b075f0 panfrost/midgard: Fix cubemap regression
Fixes: 2d9802233 ("panfrost/midgard: Extend RA to non-vec4 sources")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-05 18:06:48 +00:00
Deepak Rawat
828e1b0b4c winsys/drm: Fix out of scope variable usage
In this particular instance, struct member were used outside of the
block where it was defined. Fix this by moving the definition outside of
block.

Signed-off-by: Deepak Rawat <drawat@vmware.com>
Fixes: 569f838987 ("winsys/svga: Add support for new surface ioctl, multisample pattern")
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-06-02 22:31:07 -07:00
Alyssa Rosenzweig
c51312bc94 panfrost/midgard: Lower integer division
We use the shared nir_lower_idiv pass to lower integer division, fixing
144 dEQP tests. This pass was not applied in the past due to breakage
from iabs fixed earlier in the series.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-By: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-05 17:59:27 +00:00
Alyssa Rosenzweig
88c59798fe panfrost/midgard: Fix 1-arg ALU memory corruption
Certain ops that only take one argument have an imaginary "zero"
constant for their second argument. For instance, conversions:

   i2f [dest], [source], #0

Memory corruption meant that #0 was instead random noise. For some ops,
that doesn't matter (manifested as abnormally large code size and poor
scheduling due to extra constants in random places). But for others,
where a 1-op is emulated by a 2-op with an implicit 0 second argument,
that broke things.

Fixes iabs (emulated by iabsdiff).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-By: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-05 17:59:24 +00:00
Alyssa Rosenzweig
9f14e20fa1 panfrost/midgard: Add a bunch of new ALU ops
These ops are used to accelerate various functions exposed in OpenCL.
This commit only includes the routine additions to the table. They are
not wired through the compiler; rather, they are just here to keep a
reference for the disassembler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-By: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-05 17:58:14 +00:00
Emil Velikov
d6edccee8d egl: add EGL_platform_device support
This new 'platform' is added by default with no guards.

It is effectively a copy of the surfaceless one, with updated function
names and brand new probe function.

Due to the reuse, some of the ifdef HAVE_SURFACELESS_PLATFORM guards
have been dropped.

A worthy mention are the changes in _egFindDisplay, since the original
and dup'd fd are required, we make use of the plat_opt argument.

Note that no hacks for eglGetDisplay are added - the API works only with
the eglGetPlatformDisplay* API.

v2:
 - s/_eglCompareDeviceDisplay/_eglSameDeviceDisplay/ (Eric)
 - let ^^ return bool (Eric)
 - fixup meson build, move files() further up (Eric)
 - copy from plat. surfaceless w/o the visual cleanups
 - close and free when destroying the dpy
 - sprinkle a few _eglDeviceSupports
 - split fd handling into separate function
 - use directly the render node if no FD is given (Mathias)

v3:
 - s/dpy/disp/g
 - drop swap_buffers* callbacks
 - drop loader_set_logger()
 - drop local define
 - re-introduce _eglGetDRMDeviceRenderNode()
 - EGL_WARN on ForceSoftware with HW device - continue using the HW device
 - bail out for "EGL_MESA_device_software" until it's fixed
 - wire-up the Android build

v4:
 - use new style _eglFindDisplay()
 - split hw vs sw code paths
 - don't close the internal fd (already handled in FiniDisplay())
 - make swrast work (bit hacky bit will do for now)
 - Android for real, drop autotools
 - Correct HW + LIBGL_ALWAYS_SOFTWARE check
 - use the dri2_create_drawable() helper

v5:
 - enhance comment around fd checks (Mathias)
 - rebase for dri2_init_surface() changes

Cc: Mathias Fröhlich <Mathias.Froehlich@gmx.net>
Acked-by: Marek Olšák <marek.olsak@amd.com> (v4)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 13:35:21 -04:00
Emil Velikov
2f11957532 egl: keep the software device at the end of the list
By default, the user is likely to pick the first device so it should
not be the least performant (aka software) one.

v2: Drop odd comment (Marek)

Suggested-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 13:35:21 -04:00
Emil Velikov
2282ec0ad6 egl/dri: flesh out and use dri2_create_drawable()
Wrap the loader->createNewDrawable() dance into a helper and use it
throughout the codebase.

This addresses a cases like surfaceless (SL) on swrast (SL on kms_swrast
is fine) where we'd attempt using the wrong driver and crash out.

v2: fixup quirky GBM (Mathias)
v3: fixup GBM for real (Marek)

Cc: mesa-stable@lists.freedesktop.org
Cc: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (v2)
Signed-off-by: Marek Olšák <marek.olsak@amd.com> (v2)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-05 13:35:21 -04:00
Emil Velikov
5e0f527d60 egl: fold X11 attrib handling like other platforms
Since we no longer need special handling for X11, refactor the code to
follow the style used by all other platforms.

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 13:35:21 -04:00
Adam Jackson
2b29cf2468 egl: remove Options::Platform handling
The full set of attributes is already handled with previous patches.
Thus all this is not dead code.

v2 (Emil) - split from a larger patch.

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 13:35:21 -04:00
Adam Jackson
4aebd86f9a egl/x11: pick the user requested screen
At the moment the user will pass the screen number via attribs, yet we
would throw that away. Reason being that the int *screen passed to
xcb_connect() is output only.

v2 (Emil):
 - split from a larger patch
 - use xcb_connect() returned screen, as fallback
 - use helper function only as needed

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 13:35:21 -04:00
Adam Jackson
8e991ce539 egl: handle the full attrib list in display::options
Earlier spec is vague, although EGL 1.5 makes it clear:

     Multiple calls made to eglGetPlatformDisplay with the same
     parameters will return the same EGLDisplay handle.

With this commit we store and compare the full attrib list.

v2 (Emil):
 - Split into separate patches
 - Use EGLBoolean over int masked as such
 - Don't return free'd pointed on calloc failure

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 13:35:21 -04:00
Emil Velikov
72b9aa973b egl: flesh out a _eglNumAttribs() helper
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 13:35:21 -04:00
Krzysztof Raszkowski
4ff02b3edd swr: fix support for GL_ARB_copy_image extension
This commit fix support and adjusts the capabilities
returned by the SWR driver and the documentation
to correctly report the GL_ARB_copy_image extension.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-06-05 15:26:47 +00:00
Guido Günther
755fdd6f9d etnaviv: etnaviv_bo_cache_test: Use /dev/dri/renderD128 by default
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
90cc0de102 build: Build etnaviv drm tests
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
01a8ba79fe etnaviv: drm tests: Use mesa header locations
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
2b377547c3 etnaviv: Add libdrm tests as of 922d92994267743266024ecceb734ce0ebbca808
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
b921df352d build: Build etnaviv drm
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
3696235f82 etnaviv: gallium: Use internal etnaviv_drmif.h
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
95d8b4ac0b etnaviv: drm: s/bo_del/_etna_bo_del/
This avoids a conflict with freedreno's bo_del().

Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
17d7282cca etnaviv: drm: s/table_lock/etna_table_lock/
This avoids a conflict with freedreno's table_lock

Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
7a5b19346a etnaviv: drm: Move uapi header
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
3925d38870 etnaviv: drm: Drop excessive debugging in perfmon
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
22fa1c95ff entaviv: drm: Don't use drmMsg()
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
c93007a618 etnaviv: drm: Use _mesa_hash_table instead of drmHash
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
92fc14321f etnaviv: drm: Use mesa's ARRAY_SIZE
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
e87d128b52 etnaviv: drm: Use mesa's os_m{un,}map
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
2ebd444c10 etnaviv: drm: Use mesa's atomic definitions
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
6ab83b8474 etnaviv: drm: Drop drm_{public,private}
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
66eb554d46 etnaviv: drm: Drop inexistent headers
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
58eec3808e etnaviv: Add libdrm code as of 922d92994267743266024ecceb734ce0ebbca808
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Guido Günther
3835e21369 etnaviv: untabify
Two driver files had tabs mixed with spaces. Remove the tabs.

Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-05 08:58:05 +00:00
Tomeu Vizoso
c7a6e07454 panfrost: bifrost: Fix format string in disassembler
The compiler configuration was hardened to fail on format warnings and
things stopped building.

Fixes: c9c1e26106 ("mesa: prevent common string formatting security issues")
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-By: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-05 10:40:19 +02:00
Kenneth Graunke
8d4f68ee20 iris: Free the buffer when reading from the disk cache. 2019-06-04 23:53:57 -07:00
Alyssa Rosenzweig
bfa9f56a2a panfrost/midgard: Don't promote non-SSA to pipeline registers
Fixes: 33800f4612 ("panfrost/midgard: Implement "pipeline register"
prepass")

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-05 00:12:36 +00:00
Eric Anholt
36cb209787 freedreno: Drop invalid scissor optimization.
We do support TF now, so it's no longer valid.  Besides, if we want this
optimization, we should probably have mesa/st doing it right for everyone.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-04 16:44:37 -07:00
Eric Anholt
8843b90cac freedreno: Reuse glsl_get_sampler_coordinate_components().
We have the GLSL type, so we can just ask it how many coordinates there
are.  The GLSL function already has Vulkan cases that we'd probably want
eventually.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-04 16:44:24 -07:00
Eric Anholt
fb872748ec freedreno: Improve the pi approximations in trig lowering.
When comparing our sin/cos behavior to the closed source driver, I
noticed that we were off by a bit (or, in the case of 1/2pi, 3 bits).

Fixes:
dEQP-GLES3.functional.shaders.random.trigonometric.vertex.52
dEQP-GLES3.functional.shaders.random.all_features.vertex.0

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-04 23:35:38 +00:00
Marek Olšák
ff63b99531 ac: rename LLVM <= 7 helpers for readability
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-04 18:53:46 -04:00
Marek Olšák
c9b64b58de ac: fix a typo in ac_build_wg_scan_bottom
Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-04 18:53:46 -04:00
Caio Marcelo de Oliveira Filho
04e8ff8595 glx: Fix error message when no driverName is available
Just provide a "(null)" literal in case driverName is NULL.

  In file included from ../src/glx/dri3_glx.c:76:
  ../src/glx/dri3_glx.c: In function ‘dri3_create_screen’:
  ../src/glx/dri_common.h:70:36: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
     70 | #define CriticalErrorMessageF(...) dri_message(_LOADER_FATAL, __VA_ARGS__)
        |                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ../src/glx/dri3_glx.c:1002:4: note: in expansion of macro ‘CriticalErrorMessageF’
   1002 |    CriticalErrorMessageF("failed to load driver: %s\n", driverName);
        |    ^~~~~~~~~~~~~~~~~~~~~
  ../src/glx/dri3_glx.c:1002:50: note: format string is defined here
   1002 |    CriticalErrorMessageF("failed to load driver: %s\n", driverName);
        |                                                  ^~
  cc1: some warnings being treated as errors

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-04 15:28:12 -07:00
Chia-I Wu
65439291a0 virgl: resolve to correct level during texture read
When PIPE_TRANSFER_READ requires a resolve, we blit from the host
storage to a temporary storage, and do a format conversion from the
temporary storage to the guest storage.  This change makes sure we
convert to the correct level of the guest storage.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-04 21:37:03 +00:00
Chia-I Wu
067018d4e7 virgl: fix texture resolving with compressed formats
util_format_translate_3d expects the source box to be aligned to the
block size.  When resolving, make sure the size of the staging
buffer is aligned to the block size.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-04 21:37:03 +00:00
Bas Nieuwenhuizen
a6a5a6f67f freedreno: Add printf pattern string.
Some new flag setting disallows it due to being a security risk.

Fixes: c9c1e26106 "mesa: prevent common string formatting security issues"
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-04 23:20:50 +02:00
Bas Nieuwenhuizen
6256925b11 Revert "vl: Enable DRM by default."
Reason:

meson.build:586:7: ERROR: Unknown variable "dep_libdrm".

if building without x11 platform.

This reverts commit 392c60928a.
2019-06-04 23:14:56 +02:00
Alyssa Rosenzweig
4a03d37827 panfrost/midgard: .pos propagation
A previous optimization converts fmax(x, 0.0) instructions to fmov.pos.
This pass then propagates the .pos from the move up to the source
instruction (when possible). From there, copy propagation will eliminate
the move.

In the future, we might prefer to do this in common NIR code like we do
for saturate, as Bifrost can also benefit.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
5da0a33fab panfrost/midgard: Cleanup copy propagation
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
33800f4612 panfrost/midgard: Implement "pipeline register" prepass
This prepass, run after scheduling but before RA, specializes to
pipeline registers where possible. It walks the IR, checking whether
sources are ever used outside of the immediate bundle in which they are
written. If they are not, they are rewritten to a pipeline register (r24
or r25), valid only within the bundle itself. This has theoretical
benefits for power consumption and register pressure (and performance by
extension). While this is tested to work, it's not clear how much of a
win it really is, especially without an out-of-order scheduler (yet!).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
2a79afc5f0 panfrost/midgard: Helpers for pipeline
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
3c7abbfbe8 panfrost/midgard: Refactor schedule/emit pipeline
First, this moves the scheduler and emitter out of midgard_compile.c
into their own dedicated files.

More interestingly, this slims down midgard_bundle to be essentially an
array of _pointers_ to midgard_instructions (plus some bundling
metadata), rather than the instructions and packing themselves. The
difference is critical, as it means that (within reason, i.e. as long as
it doesn't affect the schedule) midgard_instrucitons can now be modified
_after_ scheduling while having changes updated in the final binary.

On a more philosophical level, this removes an IR. Previously, the IR
before scheduling (MIR) was separate from the IR after scheduling
(post-schedule MIR), requiring a separate set of utilities to traverse,
using different idioms. There was no good reason for this, and it
restricts our flexibility with the RA. So unify all the things!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
0524ab9c37 panfrost/midgard: Cleanup RA (stylistic changes)
Trivial.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
debc29b9ad panfrost/midgard: Share MIR utilities
These are more generally useful than the files they were constrained to.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
1bfa0d6ccc panfrost/midgard: Misc. cleanup for readibility
Mostly, this fixes a number of instances of lines >> 80 chars,
refactoring them into something legible.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
2d98022330 panfrost/midgard: Extend RA to non-vec4 sources
This represents a major break with the former RA design. We now use
conflicting register classes to represent the subdivision of Midgard's
128-bit registers into varying sizes and arrangement. We determine class
based on the number of components in the instructions' masks. To support
this, we include a number of helpers in the RA to allow composing
swizzles and masks, such that MIR written implicitly assuming .xyzw
sources can be transformed to use actual (non-aligned) sources.

The net result is a marked decrease in register pressure on
non-vec4-exclusive shaders. We could still be doing much better. Not
implemented yet are:

   - Register spilling
   - Per-component liveness

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
c1715b558a panfrost/midgard: Set masks on ld_vary
These masks distinguish scalar/vec2/vec3 loads from the default vec4,
which helps with assembly readability (since it's immediately obvious
how many components are _actually_ affected, rather than doing
mysterious things to an unknown number of unused components). Later in
the series, this will enable smarter register allocation, as the unused
components will not be interpreted abnormally.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
550be763fa panfrost/midgard: Fix liveness analysis bugs
This fixes liveness analysis with respect to inline constants and
branching. in practice, the symptom is abnormally high register
pressure.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
c54f3f42eb panfrost/midgard: Set int outmod for "pasted" code
These snippets of integer assembly are injected for various purposes.
Eventually, we'll want to implement these in NIR directly. Regardless,
the "default" output modifier is different between floats and ints, so
let's set the right one.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
51196c3591 panfrost/midgard: Hoist some utility functions
These were static to midgard_compile.c but are more generally useful
across the compiler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
005d9b1ada panfrost/midgard: Remove pinning
This mechanism is only used by blend shaders, so just use a move here.
Ideally, it'll be copy-propped and DCE'd away; this removes a source of
considerable indirection and will simplify RA logic.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig
d2d3cc66cf nir/algebraic: Simplify max(abs(a), 0.0) -> abs(a)
This pattern was noticed in glmark's jellyfish scene.

v2: Add inexact qualifier due to NaN behaviour.

Minimal shader-db changes (slightly helped).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-06-04 19:57:19 +00:00
Mark Janes
c9c1e26106 mesa: prevent common string formatting security issues
Adds a compile-time error for obvious security issues like:

  printf(string_var);

The proposed flag is more tolerant than -Wformat-nonliteral.
Specifically, it tolerates common mesa formatting like:

  static const char *shader_template = "really long string %d";
  printf(shader_template, uniform_number);

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110833
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-06-04 12:49:38 -07:00
Jason Ekstrand
f4ef34f207 intel/fs: Add an UNDEF instruction to avoid excess live ranges
With 8 and 16-bit types and anything where we have to use non-trivial
strides registersto deal with restrictions, we end up with things that
look like partial writes even though we don't care about any values in
the register except those written by that instruction.  This is
particularly important when dealing with loops because liveness sees
is_partial_write and the fact that an old version from a previous loop
iteration may be valid at that point and extends all purely partially
written values to the entire loop.

This commit adds a new UNDEF instruction which does nothing (the
generator doesn't emit anything) but which does a fake write to the
register.  This informs liveness that we don't care about any values
before that point so it won't consider those registers to be falsely
live.  We can safely emit UNDEF instructions for all SSA values that
come in from NIR and nearly all temporaries generated by various stages
of the compiler.  In particular, we need to insert UNDEF instructions
when we handle region restrictions because the newly allocated registers
are almost guaranteed to be partially written.

No shader-db changes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110432
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-04 14:27:30 -05:00
Caio Marcelo de Oliveira Filho
d482a8f680 spirv: Update the OpenCL.std.h header
This corresponds to commit 8b911bd2ba37677037b38c9bd286c7c05701bcda on
GitHub.

We previously tweaked OpenCL.std.h from upstream to be included in C
code.  Now upstream header can be included, however the symbol names
are slightly different (include an OpenCLstd_ prefix), so this patch
also fixes vtn_opencl.c to use those.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-04 12:12:51 -07:00
Bas Nieuwenhuizen
9701cb1034 radv: Use bo metadata for imported image tiling on Android.
This way we handle linear images etc. correctly.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-06-04 18:32:45 +00:00
Bas Nieuwenhuizen
392c60928a vl: Enable DRM by default.
If libdrm is found the pipe loader enables drm anyway, and that is
pretty much the only extra dependency this code has.

This enables creating libva display using a drm fd without having
to enable the DRM (GBM really) backend of EGL, which is completely
unrelated.

Leaving the X11 platforms alone as they would still result in the
additional inclusion of extra deps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-06-04 20:01:34 +02:00
Jason Ekstrand
c2a0335bb0 anv: Advertise support for VK_EXT_fragment_shader_interlock
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-04 17:30:51 +00:00
Jason Ekstrand
5176805471 spirv: Implement SPV_EXT_fragment_shader_interlock
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-04 17:30:51 +00:00
Jason Ekstrand
b5aa76b1df spirv: Update the headers from latest Khronos master
This corresponds to 8b911bd2ba37677037b38c9bd286c7c05701bcda in
https://github.com/KhronosGroup/SPIRV-Headers.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-04 17:30:51 +00:00
Jason Ekstrand
8339e3f010 vulkan: Update the XML and headers to 1.1.110
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-04 17:30:51 +00:00
Rhys Perry
73dda85512 ac/nir: mark some texture intrinsics as convergent
Otherwise LLVM can sink them and their texture coordinate calculations
into divergent branches.

v2: simplify the conditions on which the intrinsic is marked as convergent
v3: only mark as convergent in FS and CS with derivative groups

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-04 17:30:53 +01:00
Rhys Perry
d4a2f8b33b radv: fix some compiler warnings
Fixes -Woverflow warnings with GCC 9.1.1

v2: use a cast instead of a bitwise and

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-04 17:30:53 +01:00
Jason Ekstrand
a84de3fb7c intel/fs: Skip registers faster when setting spill costs
This might be slightly faster since we're doing one read rather than
two before we decide to skip.  The more important reason, however, is
because no_spill prevents us from re-spilling spill registers.  In the
new world in which we don't re-calculate liveness every spill, we may
not have valid liveness for spill registers so we shouldn't even look
their live ranges up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110825
Fixes: e99081e76d "intel/fs/ra: Spill without destroying the..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-04 14:37:56 +00:00
Connor Abbott
d68218dbca radeonsi/nir: Fix type in bindless address computation
Bindless handles in GL are 64-bit. This fixes an assert failure in LLVM.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-04 15:15:46 +02:00
Christian Gmeiner
a6e879984c etnaviv: implement set_active_query_state(..) for hw queries
Clear w/ quad uses a normal draw which adds up to OQ. st/meta
uses set_active_query_state(..) to tell the driver to pause
queries in such cases.

Fixes spec@arb_occlusion_query@occlusion_query_meta_save piglit.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-04 14:58:02 +02:00
Samuel Pitoiset
8a35eb0602 radv: do not use gfx fast depth clears for layered depth/stencil images
The driver should only fast depth clears with the graphics path
when the view covers all image layers, otherwise this might
corrupt layers when HTILE is enabled.

Cc: 19.0 19.1 mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-04 08:55:32 +02:00
Samuel Pitoiset
33f4e04d5a ac,radv: do not emit vec3 for raw load/store on SI
It's unsupported, only load/store format with vec3 are supported.

Fixes: 6970a9a6ca ("ac,radv: remove the vec3 restriction with LLVM 9+")"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-04 08:47:26 +02:00
Sagar Ghuge
3016756398 intel/compiler: Fix assertions in brw_alu3
v2: Fix assertion for src1 (Ian Romanick)

Fixes: 3b967e17 (intel/compiler: Avoid false positive assertions)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-03 23:14:34 -07:00
Kenneth Graunke
34d3103dee iris: Fix SO stride units for DrawTransformFeedback
Mesa measures in DWords.  The hardware also claims to measure in DWords.
Except the SO_WRITE_OFFSET field is actually bits 31:2, with 1:0 MBZ.
Which means that it really measures in bytes.  So, convert to bytes.

Without this, our offset / stride denominator was 1/4th the size it
should be, leading to 4x the vertex count that we should have had.

Fixes GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_two_buffers
2019-06-03 22:51:18 -07:00
Timothy Arceri
fea36a8f43 st/glsl: make sure to propagate initialisers to driver storage
This essentially reverts 20234cfe3a.

Fixes piglit test:
tests/spec/arb_get_program_binary/execution/uniform-after-restore.shader_test

Fixes: 20234cfe3a "st/mesa: don't propagate uniforms when restoring from cache"

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110784
2019-06-04 11:36:45 +10:00
Caio Marcelo de Oliveira Filho
61de825e11 spirv: Like Uniform, do nothing for UniformId
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho
b4eff83180 spirv: Implement SpvOpCopyLogical
This is the same as SpvOpCopyObject but without the type checking,
which is how vtn_composite_copy works, so we just need to hook the
operation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho
81586e9f53 spirv: Generalize OpSelect
SPIR-V 1.4 supports OpSelect over any composite type, and also allows
scalar boolean condition for vector types -- a case which we already
handled to support old GLSLang.

Added a helper function to recursively perform nir_bcsel, that makes
easier to support structs.

v2: Replace asserts() with vtn_fail_if().  (Jason)

v3: Simplify Condition and Result types verifications.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho
17630291e5 spirv: Move OpSelect handling to a function
This will make a later change easier to review.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho
ea0e89859c nir/vars_to_ssa: Handle UNDEF_NODE in more places
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110832
Fixes: 911ea2c66f "nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 17:09:22 -07:00
Marek Olšák
b2bbd1a27b ac/registers: don't use the si, cik, vi names, use gfxN
trivial
2019-06-03 20:06:41 -04:00
Nicolai Hähnle
f480b8aaa4 amd/common: use generated register header 2019-06-03 20:05:20 -04:00
Nicolai Hähnle
853ef5ccba amd/common: use SH{0,1}_CU_EN definitions only of COMPUTE_STATIC_THREAD_MGMT_SE0
The automatic header generation unifies identical registers in a series
and only emits definitions for the first one. This is mostly to avoid
emitting excessive definitions for CB registers, but special-casing
an exception for this family of registers doesn't seem worth it.
2019-06-03 20:05:20 -04:00
Nicolai Hähnle
cf51009ad2 amd/common: unify PITCH_GFX6 and PITCH_GFX9
The definition of the fields differs, but PITCH_GFX9 is a mere extension
of PITCH_GFX6 that does not conflict with any other fields.

This aligns the definitions with what will be generated from the
register JSON.

The information about how large the fields really are is preserved in
the register database.
2019-06-03 20:05:20 -04:00
Nicolai Hähnle
e04215815e amd/common: rename R_3F2_CONTROL to IB_CONTROL for disambiguation
This "register" name collides with R_370_CONTROL.

This aligns the definitions with what will be generated from the
register JSON.
2019-06-03 20:05:20 -04:00
Nicolai Hähnle
cd247cf456 amd/common: cleanup DATA_FORMAT/NUM_FORMAT field names
The field layout wasn't actually changed in gfx9, so having the suffix
isn't very useful. The field *contents* were changed, but this is
reflected in the V_xxx_xxx definitions and is taken into account by
the ac_debug logic based on the register JSON.

This aligns the definitions with what will be generated from the
register JSON.
2019-06-03 20:05:20 -04:00
Nicolai Hähnle
ef6ef098af amd/common: derive ac_debug tables from register JSON 2019-06-03 20:05:20 -04:00
Nicolai Hähnle
d02286c753 amd/registers: add JSON description of packet3 fields 2019-06-03 20:05:20 -04:00
Nicolai Hähnle
67702e3319 amd/registers: add JSON descriptions of registers
The descriptions are mostly derived from parsing the existing
register headers.
2019-06-03 20:05:20 -04:00
Nicolai Hähnle
e6184b0892 amd/registers: scripts for processing register descriptions in JSON
We will derive both the debugging tables and (the majority of) the
register headers from descriptions in JSON, instead of deriving the
debugging tables from an awkward parsing of the register headers.

Some of the scripts are useful for maintaining the register database
itself. The scripts are designed to output reasonably readable JSON
by default.
2019-06-03 20:05:20 -04:00
Vinson Lee
d4e70be739 freedreno: Fix GCC build error.
../src/freedreno/vulkan/tu_device.c:900:4: error: initializer element is not constant
    .minImageTransferGranularity = (VkExtent3D) { 1, 1, 1 },
    ^

Suggested-by: Kristian Høgsberg <krh@bitplanet.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110698
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-03 16:46:54 -07:00
Mark Janes
774a088f64 mesa: Use string literals for format strings
Android build settings require format strings to be string literals.

Fixes: d2906293c4 "mesa: EXT_dsa add selectorless matrix stack functions"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110833
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 16:17:23 -07:00
Caio Marcelo de Oliveira Filho
045aeccf0e iris: Always reserve binding table space for NIR constants
Don't have a separate mechanism for NIR constants to be removed from
the table.  If unused, we will compact it away.  The use_null_surface
is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
5611444809 iris: Print binding tables when INTEL_DEBUG=bt
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
97cd865be2 iris: Compact binding tables
Change the iris_binding_table to keep track of what surfaces are
actually going to be used, then assign binding table indices just for
those.  Reducing unused bytes on those are valuable because we use a
reduced space for those tables in Iris.

The rest of the driver can go from "group indices" (i.e. UBO #2) to
BTI and vice-versa using helper functions.  The value
IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is
not used or a certain BTI is not valid.

The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be
set to skip compacting binding table.

v2: (all from Ken)
    Use BITFIELD64_MASK helper. Improve comments.
    Assert all group is marked as used when we have indirects.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
79f1529ae0 iris: Create an enum for the surface groups
This will make convenient to handle compacting and printing the
binding table.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
1c8ea8b300 iris: Handle binding table in the driver
Stop using brw_compiler to lower the final binding table indices for
surface access.  This is done by simply not setting the
'prog_data->binding_table.*_start' fields.  Then make the driver
perform this lowering.

This is a better place to perfom the binding table assignments, since
the driver has more information and will also later consume those
assignments to upload resources.

This also prepares us for two changes: use ibc without having to
implement binding table logic there; and remove unused entries from
the binding table.

Since the `block` field in brw_ubo_range now refers to the final
binding table index, we need to adjust it before using to index
shs->constbuf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
518f83236b iris: Pull brw_nir_analyze_ubo_ranges() call out setup_uniforms
We'll change iris to perform lowering of the binding table indices
earlier (before the backend kick in), but the backend compiler uses
the result of the analysis to identify load_ubo intrinsics, so we do
the analysis after the lowering to have the right indices.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
1f8546ba2f spirv: Implement OpPtrEqual, OpPtrNotEqual and OpPtrDiff
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 13:45:09 -07:00
Caio Marcelo de Oliveira Filho
ca164ab495 nir: Add functions to subtract and compare addresses
v2: Fix comparing addresses from formats that have more than one
    component by using nir_ball_iequal().  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 13:45:09 -07:00
Caio Marcelo de Oliveira Filho
09cc3389b9 nir: Add nir_ball_iequal() helper
Similar to nir_bany_inequal().  Suggested by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 13:45:09 -07:00
Sergii Romantsov
88340372ee mesa: ARB program parser should clean parameters
Program parser allocates parameter list.
In case of parsing error some variables will not be freed.
Patch adds freeing of it.

Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 16:41:26 -04:00
Hyunjun Ko
382e3553af freedreno/ir3: fix counting and printing for half registers.
v2: defining 0x100 and use this for setting the FS_OUTPUT_REG.HALF_PRECISION
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-03 13:31:51 -07:00
Neil Roberts
fb53b326c2 freedreno/ir3: Fix up the half reg source even when src instr==NULL
Previously the loop for assigning registers was bailing out early if
the register had a null source. I think the intention is that in this
case it isn’t necessary to assign a register. However it was also
missing out the part to fix up the types. This can happen if the
instruction is copy propagated to be a move from a constant half-float
input register. In that case it still needs to fix up the types.

Fixes assert in
dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump

when lowering the precision of the variables.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-03 13:31:51 -07:00
Neil Roberts
3222216a58 freedreno/ir3: Add a 16-bit implementation of nir_op_imul
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-03 13:31:51 -07:00
Hyunjun Ko
daee6bc1a1 freedreno/ir3: set dst type of alu instructions correctly.
Though it should be fixed in RA pass, it needs to be set correctly from
the beginning according to the bitsize of NIR dest.

v2: Would be better for mad,fddx,fddy to fixup later in RA pass.

[small cleanup of fallout from imov/fmov removal fallout]
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-03 13:31:26 -07:00
Hyunjun Ko
43d80a3e20 freedreno/ir3: adjust the bitsize of regs when an array loading.
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-03 12:44:03 -07:00
Hyunjun Ko
cbd1f47433 freedreno/ir3: convert back to 32-bit values for half constant registers.
It seems to handle only 32-bit values for half constant registers
within floating point opcodes according to the blob driver.
So we need to convert back to 32-bit values from 16-bit values, when a
lower precision pass is in effect.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-03 12:44:03 -07:00
Hyunjun Ko
a9b556d3a0 freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov.
If the type of dest reg and src reg of absneg opcode are different,
it shouldn't be considered as same type mov.

This patch becomes meaningful when we start to use mediump information for
doing precision lowering to 16bit.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-03 12:44:03 -07:00
Hyunjun Ko
6fb8ef3da6 freedreno/ir3: set proper dst type for uniform according to the type of nir dest.
eg. uniform mediump vec4 f;

This patch means nothing since there's no mediump lowering pass for now,
but will be meaningful when the pass land in the near future.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-03 12:44:03 -07:00
Neil Roberts
689c3c7d40 freedreno/ir3: Use output type size to set OUTPUT_REG_HALF_PRECISION
Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending
on whether half_precision was set in the shader key. With support for
mediump precision, it is possible to have different outputs use
different precisions. That means we can’t have a global shader state
to specify it. Instead it now tries to copy the half-float-ness
from the nir_variable for the output into the ir3_shader_variant. This
is then used to decide whether to set half-precision for each output.

The a6xx version is copied from the a5xx code but it has not been
tested.

v2. [Hyunjun Ko (zzoon@igalia.com)] There's the half flag recently
added, which represents precision based on IR3_REG_HALF. Now use this
flag to avoid duplication.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-03 12:44:03 -07:00
Neil Roberts
8cd1b76b7d freedreno/ir3: Fix loading half-float immediate vectors
Previously the code to load from a constant instruction was always
using the u32 pointer. If the constant is actually a 16-bit source
this would end up with the wrong values because the pointer would be
offset by the wrong size. This fixes it to use the u16 pointer.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-03 12:44:03 -07:00
Rob Clark
7bbf21e898 freedreno/ir3: immediately schedule meta instructions
The aren't real instructions, and don't change # of live values, so no
point in them competing with real instructions.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-03 12:44:03 -07:00
Rob Clark
771d04c82d freedreno/ir3: scheduler improvements
For instructions that increase the # of live values, apply a threshold
to avoid scheduling them too early.  And factor the net change of # of
live values that would result from scheduling an instruction, to
prioritize instructions that reduce number of live values as the number
of live values increases.

For manhattan:

  total instructions in shared programs: 27869 -> 28413 (1.95%)
  instructions in affected programs: 26756 -> 27300 (2.03%)
  helped: 102
  HURT: 87

  total full in shared programs: 1903 -> 1719 (-9.67%)
  full in affected programs: 1390 -> 1206 (-13.24%)
  helped: 124
  HURT: 9

The reduction in register usage nets ~20% gain in manhattan.  (So
getting mediump support should be a huge win for gles gfxbench.)

Also significantly helps some of the more complex shadertoy shaders,
like IQ's Piano (32 to 18 regs, doubles fps).

The effect is less pronounced on smaller shaders.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-03 12:44:03 -07:00
Rob Clark
bb3aa44ade freedreno/ir3: sched should mark outputs used
Account for shader outputs and values live in any direct/indirect
successor block.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-03 12:44:03 -07:00
Pierre-Eric Pelloux-Prayer
d2906293c4 mesa: EXT_dsa add selectorless matrix stack functions
Allows the legacy matrix stacks to be manipulated without disturbing the
matrix mode selector.

Adapted from a patch from Chris Forbes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 15:28:51 -04:00
Pierre-Eric Pelloux-Prayer
28ce704bb0 mesa: factor out enum -> matrix stack lookup
Split this out from glMatrixMode since we're about to need it
independently for EXT_DSA.

Adapted from Chris Forbes commit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 15:28:49 -04:00
Timothy Arceri
b69584ad69 mesa: add new EXT_direct_state_access tokens
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 15:28:47 -04:00
Chris Forbes
028682f7f4 glapi: add EXT_direct_state_access
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 15:28:45 -04:00
Timothy Arceri
9c5d86af38 mesa: add a list of EXT_direct_state_access to dispatch sanity
This extension is huge and this gives us a TODO list of functions
to implement.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 15:28:33 -04:00
Pierre-Eric Pelloux-Prayer
4583f09caa radeonsi: init sctx->dma_copy before using it
Commit a1378639ab reordered context functions initializations but broke
sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma.

In this case sctx->dma_copy was assigned a value after being used in:
   sctx->b.resource_copy_region = sctx->dma_copy;

This commit moves the FORCE_DMA special case after sctx->dma_copy initialization.

See https://bugs.freedesktop.org/show_bug.cgi?id=110422

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 15:05:30 -04:00
Axel Davy
5820ac6756 d3dadapter9: Revert to old throttling limit value
Recently PIPE_CAP_MAX_FRAMES_IN_FLIGHT was changed from 2
to 1:
20909284f2

No driver seems to overwrite the default value.

One user reports severe regressions for some games.
For now, revert to the value 2 for nine.

Cc: "19.1" mesa-stable@lists.freedesktop.org

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
2019-06-03 20:37:13 +02:00
Marek Olšák
486bc1e17e ac: use amdgpu-flat-work-group-size
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-03 14:32:47 -04:00
Marek Olšák
4b11ed443b u_blitter: don't fail mipmap generation for depth formats containing stencil
Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109754

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-06-03 14:32:47 -04:00
Christian Gmeiner
3135ca4172 etnaviv: drop a bunch of duplicated gallium PIPE_CAP default code
Now that we have the util function for the default values, we can get
rid of the boilerplate.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-06-03 16:29:59 +02:00
Samuel Pitoiset
445098916a radv: flush pending query reset caches before copying results
From the Vulkan spec 1.1.108:
   "vkCmdCopyQueryPoolResults is guaranteed to see the effect of
    previous uses of vkCmdResetQueryPool in the same queue, without any
    additional synchronization."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-03 16:05:46 +02:00
Jonathan Marek
91672becc3 nir: copy intrinsic type when lowering load input/uniform and store output
Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
2019-06-03 12:46:14 +00:00
Samuel Pitoiset
6970a9a6ca ac,radv: remove the vec3 restriction with LLVM 9+
This changes requires LLVM r356755.

32706 shaders in 16744 tests
Totals:
SGPRS: 1448848 -> 1455984 (0.49 %)
VGPRS: 1016684 -> 1016220 (-0.05 %)
Spilled SGPRs: 25871 -> 25815 (-0.22 %)
Spilled VGPRs: 122 -> 122 (0.00 %)
Scratch size: 11964 -> 11956 (-0.07 %) dwords per thread
Code Size: 55324500 -> 55301152 (-0.04 %) bytes
Max Waves: 235660 -> 235586 (-0.03 %)

Totals from affected shaders:
SGPRS: 293704 -> 300840 (2.43 %)
VGPRS: 246716 -> 246252 (-0.19 %)
Spilled SGPRs: 159 -> 103 (-35.22 %)
Scratch size: 188 -> 180 (-4.26 %) dwords per thread
Code Size: 8653664 -> 8630316 (-0.27 %) bytes
Max Waves: 60811 -> 60737 (-0.12 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 11:30:08 +02:00
Caio Marcelo de Oliveira Filho
75590604a9 nir: Return nir_type_invalid for non-numeric base types
Now that the type gathering function look at instructions that might
have other types, return invalid type instead of crashing.  That
invalid will be properly ignored later.

Fixes: c12750527b "nir: add type information to load uniform/input and store output intrinsics"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 16:27:03 -07:00
Caio Marcelo de Oliveira Filho
27497c5c02 iris: Drop unused locals from iris_clear.c to avoid warning
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-05-31 15:55:05 -07:00
Jonathan Marek
f387c2b238 nir: remove bool lowering from lower_int_to_float
Removes the bool_to_float logic from the int_to_float pass, so that both
can be used separately. By having separate passes we have better validation
and it makes it possible to use with the lower_ftrunc option (int lowering
generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should
probably be reworked). For now we always expect lower_bool to come after
lower_int.

Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek
f6579ee204 nir: fix lower_{int,bool}_to_float for new mov opcode
It is treated like the vecN instructions which also have no type.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek
f889180ee1 nir: add lower_bitshift option
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek
887c2a6092 nir: fix gather_ssa_types
Consts and undefs can be used as different types (common with "0" constant)
so don't copy types from consts/undefs, only to them. It doesn't entirely
solve the problem that the type given to the const could be wrong , but
now the only realistic case is with "0" which is the same when casted to
float, so it doesn't matter for lower_int_to_float.

The other change is to get type information for load input/uniform and
store output, and use that to get correct results.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek
c12750527b nir: add type information to load uniform/input and store output intrinsics
This type information will be used by gather_ssa_types to get usable results

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek
6016df211f nir: improvements to native_integers removal
Improvements related to the patch that removed native_integers:
 * In glsl_to_nir, special cases for i2f,u2f,etc are no longer needed
 * In prog_to_nir, use sge/slt and let lower_scmp lower it if needed

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Rob Clark
32131a9568 freedreno/a6xx: add 'type' to shader state key
We could have identical texture state for both VS and FS.. which would
result in VS state getting created first, and FS state mapping to the
identical cmdstream.  Resulting in VS state getting emitted twice and no
FS state emitted.

Fixes:
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:47 -07:00
Rob Clark
8b7bf5e07a freedreno/ir3: fix constlen versus indirect UBO
If we access the address of the UBO indirectly, and there is no higher
const emitted w/ direct access (like an immediate lowered to uniform)
the assembler won't figure out the correct constlen.

Fixes:
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-31 12:58:33 -07:00
Rob Clark
8eaa2d5021 freedreno/a6xx: fix GPU crash on small render targets
Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:33 -07:00
Rob Clark
f9fa456e1d freedreno/ir3: set more barrier bits
Blob is also setting the .l bit, and it seems to solve some intermittent
failures with a couple of deqp's:

dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i
dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:33 -07:00
Rob Clark
5d43b806ba freedreno/ir3: set (ss) on last_input if ldlv
It seems like (ei) handling doesn't sync on (ss), so we could end up in
a situation where we release varying storage before an ldlv for flat
shaded varyings completes.  Keep track if we've done an (ss) since the
last ldlv, and if not add (ss) flag to last_input which gets (ei).

Noticed with dEQP-GLES3.functional.fragment_out.random.24 and
dEQP-GLES3.functional.fragment_out.random.27, which previously passed by
luck because ir3_sched ordered instructions in a way that resulted in a
lucky (ss).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:33 -07:00
Rob Clark
73fb02c5d6 freedreno/ir3: add assert
The special handling for last_input assumes that all the varying loads
are in the first block.  Add an assert to catch if anyone breaks that
assumption.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:33 -07:00
Connor Abbott
8c74772edc util/hash_table: Use fast modulo computation
While we're here, copy the size table from set.c to get rid of hard tabs
in the hash_table.c version.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:35 +02:00
Connor Abbott
83667f7a61 util/set: Use fast modulo computation
Compilation times with my shader-db database:

Difference at 95.0% confidence
	-1.22312 +/- 0.726033
	-0.283979% +/- 0.168254%
	(Student's t, pooled s = 1.02177)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:30 +02:00
Connor Abbott
b87817871b util: Add a helper for faster remainders
This should be at least as fast as using fast_idiv_by_const, and has the
advantage that the precomputation is simple enough to be evaluated at
Mesa-compile time for hash tables and sets which have a fixed table of
possible divisors.

Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:27 +02:00
Connor Abbott
983b001c77 util/hash_table: Add specialized resizing add function
To keep it in sync with the set implementation.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:22 +02:00
Connor Abbott
6f9beb28bb util/set: Add specialized resizing add function
A significant portion of the time spent in nir_opt_cse for the Dolphin
ubershaders was in resizing the set. When resizing a hash table, we know
in advance that each new element to be inserted will be different from
every other element, so we don't have to compare them, and there will be
no tombstone elements, so we don't have to worry about caching the
first-seen tombstone. We add a specialized add function which skips
these steps entirely, speeding up resizing.

Compile-time results from my shader-db database:

Difference at 95.0% confidence
	-2.29143 +/- 0.845534
	-0.529475% +/- 0.194767%
	(Student's t, pooled s = 1.08807)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:16 +02:00
Connor Abbott
451211741c util/hash_table: Pull out loop-invariant computations
To keep the set and hash table in sync. Note that some of this had
already been done for hash tables, in particular pulling out the
hash % ht->size computation.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:09 +02:00
Connor Abbott
f7ff685649 util/set: Pull out loop-invariant computations
Unfortunately GCC can't do this for us, probably because we call the key
comparison function which GCC can't prove won't modify arbitrary memory.
This is a pretty hot function, so do the optimization manually to be
sure the compiler will get it right.

While we're here, make the computation of the new probe address use a
single conditional subtract instead of a modulo, since we know that it
won't ever get as big as 2 * ht->size before the modulo. Modulos tend to
be pretty expensive operations.

shader-db compile time results for my database:

Difference at 95.0% confidence
	-2.24934 +/- 0.69897
	-0.516296% +/- 0.159993%
	(Student's t, pooled s = 0.983684)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:14:04 +02:00
Connor Abbott
3bd0733011 nir/instr_set: Use _mesa_set_search_or_add()
Before this change, we were searching for each instruction twice, once
when checking if it exists and once when figuring out where to insert
it. By using the new function, we can do everything we need to do in one
operation.

Compilation time numbers for my shader-db database:

Difference at 95.0% confidence
	-4.04706 +/- 0.669508
	-0.922142% +/- 0.151948%
	(Student's t, pooled s = 0.95824)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:13:59 +02:00
Connor Abbott
8a838e172f util/set: Add a _mesa_set_search_or_add() function
Unlike _mesa_set_search_and_add(), it doesn't replace an entry if it's
found, returning it instead. This is useful for nir_instr_set, where
we have to know both the original original instruction and its
equivalent.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:13:45 +02:00
Jonathan Marek
1db86d8b62 freedreno/ir3: fix input ncomp for vertex shaders
ncomp is never set for vertex shaders, but a3xx and a4xx still use it.

Fixes: 831f1a05c0 freedreno/ir3: rework varying packing

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-05-31 12:21:23 -04:00
Ian Romanick
65df6122da intel/compiler: Use compare rematerialization pass
Almost all of the spill / fill benefit is in Deus Ex.

Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17196395 (-0.16%)
instructions in affected programs: 1518658 -> 1490615 (-1.85%)
helped: 1550
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 18.11 x̃: 2
helped stats (rel) min: 0.04% max: 8.35% x̄: 1.12% x̃: 0.45%
HURT stats (abs)   min: 5 max: 10 x̄: 6.67 x̃: 5
HURT stats (rel)   min: 0.32% max: 0.41% x̄: 0.35% x̃: 0.32%
95% mean confidence interval for instructions value: -19.86 -16.26
95% mean confidence interval for instructions %-change: -1.19% -1.04%
Instructions are helped.

total cycles in shared programs: 361468455 -> 361288721 (-0.05%)
cycles in affected programs: 197367688 -> 197187954 (-0.09%)
helped: 990
HURT: 683
helped stats (abs) min: 1 max: 119045 x̄: 806.00 x̃: 16
helped stats (rel) min: <.01% max: 38.56% x̄: 1.06% x̃: 0.26%
HURT stats (abs)   min: 1 max: 12190 x̄: 905.14 x̃: 22
HURT stats (rel)   min: <.01% max: 25.18% x̄: 1.16% x̃: 0.47%
95% mean confidence interval for cycles value: -315.45 100.58
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 12147 -> 8948 (-26.34%)
spills in affected programs: 5433 -> 2234 (-58.88%)
helped: 343
HURT: 0

total fills in shared programs: 25262 -> 21814 (-13.65%)
fills in affected programs: 7771 -> 4323 (-44.37%)
helped: 343
HURT: 3

LOST:   0
GAINED: 17

Ivy Bridge
total instructions in shared programs: 12083517 -> 12081427 (-0.02%)
instructions in affected programs: 540744 -> 538654 (-0.39%)
helped: 786
HURT: 29
helped stats (abs) min: 1 max: 42 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.06% max: 5.44% x̄: 0.55% x̃: 0.36%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.16% max: 0.95% x̄: 0.38% x̃: 0.31%
95% mean confidence interval for instructions value: -2.83 -2.30
95% mean confidence interval for instructions %-change: -0.57% -0.47%
Instructions are helped.

total cycles in shared programs: 180153463 -> 180124798 (-0.02%)
cycles in affected programs: 72597920 -> 72569255 (-0.04%)
helped: 572
HURT: 249
helped stats (abs) min: 1 max: 14830 x̄: 109.48 x̃: 13
helped stats (rel) min: <.01% max: 8.92% x̄: 0.71% x̃: 0.26%
HURT stats (abs)   min: 1 max: 11060 x̄: 136.37 x̃: 10
HURT stats (rel)   min: <.01% max: 10.85% x̄: 0.54% x̃: 0.32%
95% mean confidence interval for cycles value: -96.22 26.39
95% mean confidence interval for cycles %-change: -0.43% -0.23%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 3625 -> 3623 (-0.06%)
spills in affected programs: 46 -> 44 (-4.35%)
helped: 1
HURT: 0

total fills in shared programs: 4065 -> 4061 (-0.10%)
fills in affected programs: 104 -> 100 (-3.85%)
helped: 1
HURT: 0

LOST:   0
GAINED: 8

Sandy Bridge
total instructions in shared programs: 10879656 -> 10878699 (<.01%)
instructions in affected programs: 275167 -> 274210 (-0.35%)
helped: 544
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.06% max: 3.11% x̄: 0.39% x̃: 0.25%
95% mean confidence interval for instructions value: -1.97 -1.55
95% mean confidence interval for instructions %-change: -0.43% -0.36%
Instructions are helped.

total cycles in shared programs: 154089096 -> 154081132 (<.01%)
cycles in affected programs: 4422722 -> 4414758 (-0.18%)
helped: 459
HURT: 214
helped stats (abs) min: 1 max: 258 x̄: 26.67 x̃: 8
helped stats (rel) min: <.01% max: 5.45% x̄: 0.51% x̃: 0.14%
HURT stats (abs)   min: 1 max: 226 x̄: 19.99 x̃: 4
HURT stats (rel)   min: <.01% max: 3.15% x̄: 0.34% x̃: 0.09%
95% mean confidence interval for cycles value: -15.51 -8.15
95% mean confidence interval for cycles %-change: -0.31% -0.17%
Cycles are helped.

total spills in shared programs: 2880 -> 2876 (-0.14%)
spills in affected programs: 636 -> 632 (-0.63%)
helped: 2
HURT: 0

total fills in shared programs: 3161 -> 3157 (-0.13%)
fills in affected programs: 1519 -> 1515 (-0.26%)
helped: 2
HURT: 0

LOST:   0
GAINED: 2

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8157361 -> 8155067 (-0.03%)
instructions in affected programs: 382491 -> 380197 (-0.60%)
helped: 677
HURT: 0
helped stats (abs) min: 1 max: 43 x̄: 3.39 x̃: 2
helped stats (rel) min: 0.09% max: 5.19% x̄: 0.66% x̃: 0.42%
95% mean confidence interval for instructions value: -3.76 -3.01
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 188588292 -> 188583040 (<.01%)
cycles in affected programs: 3155064 -> 3149812 (-0.17%)
helped: 377
HURT: 13
helped stats (abs) min: 2 max: 180 x̄: 14.13 x̃: 6
helped stats (rel) min: <.01% max: 3.96% x̄: 0.39% x̃: 0.12%
HURT stats (abs)   min: 2 max: 8 x̄: 5.85 x̃: 6
HURT stats (rel)   min: <.01% max: 0.22% x̄: 0.06% x̃: 0.04%
95% mean confidence interval for cycles value: -15.67 -11.27
95% mean confidence interval for cycles %-change: -0.45% -0.30%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Ian Romanick
3ee2e84c60 nir: Rematerialize compare instructions
On some architectures, Boolean values used to control conditional
branches or condtional selection must be propagated into a flag.  This
generally means that a stored Boolean value must be compared with zero.
Rather than force the generation of extra compares with zero, re-emit
the original comparison instruction.  This can save register pressure by
not needing to store the Boolean value.

There are several possible ares for future improvement to this pass:

1. Be more conservative.  If both sources to the comparison instruction
are non-constants, it may be better for register pressure to emit the
extra compare.  The current shader-db results on Intel GPUs (next
commit) lead me to believe that this is not currently a problem.

2. Be less conservative.  Currently the pass requires that all users of
the comparison match the pattern.  The idea is that after the pass is
complete, no instruction will use the resulting Boolean value.  The only
uses will be of the flag value.  It may be beneficial to relax this
requirement in some cases.

3. Be less conservative.  Also try to rematerialize comparisons used for
discard_if intrinsics.  After changing the way the Intel compiler
generates cod e for discard_if (see MR!935), I tried implementing this
already.  The changes were pretty small.  Instructions were helped in 19
shaders, but, overall, cycles were hurt.  A commit "nir: Rematerialize
comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.

4. Copy the preceeding ALU instruction.  If the comparison is a
comparison with zero, and it is the only user of a particular ALU
instruction (e.g., (a+b) != 0.0), it may be a further improvment to also
copy the preceeding ALU instruction.  On Intel GPUs, this may enable
cmod propagation to make additional progress.

v2: Use much simpler method to get the prev_block for an if-statement.
Suggested by Tim.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Ian Romanick
336eab0630 nir: Add a shallow clone function for nir_alu_instr
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Tomeu Vizoso
0e1c5cc78f panfrost: Remove link stage for jobs
And instead, link them as they are added.

Makes things a bit clearer and prepares future work such as FB reload
jobs.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-31 14:37:10 +02:00
Tomeu Vizoso
da9f7ab6d4 panfrost: ci: Switch to kernel 5.2-rc2
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-31 13:51:51 +02:00
Tomeu Vizoso
77f5663cf3 panfrost: ci: Update expectations
A bunch of tests have been fixed, but some regressions have appeared on
T760.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-05-31 13:51:43 +02:00
Connor Abbott
78f33620e8 radeonsi/nir: Remove hack for builtins
We now bounds check properly in the uniform loading fast path, so
there's no need to disable it by pretending there are other UBO bindings
in use. The way this looks at the variable name was causing problems
when two piglit shaders, one with a name that triggered the hack and one
that didn't, got hashed to the same thing after stripping out the names.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 11:03:05 +02:00
Connor Abbott
fca1a35163 radeonsi/nir: Use correct location for uniform access bound
location is the API-level location, but driver_location is the actual
location the uniform gets passed to the driver. This apparently only
caused failures with builtins, where the location is 0 because it's
represented via the state tokens instead.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 11:02:57 +02:00
Connor Abbott
6571032af1 radeonsi/nir: Correctly handle double TCS/TES varyings
ac expands the store to 32-bit components for us, but we still have to
deal with storing up to 8 components, and when a varying is split across
two vec4 slots we have to calculate the address again for the second
slot, since they aren't adjacent in memory. I didn't do this on the ac
level because we should generate better indexing arithmetic for the lds
store, where slots are contiguous.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 11:02:11 +02:00
Christian Gmeiner
ca19f7639a etnaviv: blt: s/TRUE/true && s/FALSE/false
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-05-31 10:04:49 +02:00
Christian Gmeiner
9e6463e62a etnaviv: rs: s/TRUE/true && s/FALSE/false
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-05-31 10:04:49 +02:00
Bas Nieuwenhuizen
e24a7840f6 nir: Actually propagate progress in nir_opt_move_load_ubo.
Found with Jasons new metadata rework (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950).

Fixes: af355aaa07 "nir: add nir_opt_move_load_ubo() optimization pass"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 07:45:43 +00:00
Samuel Pitoiset
9178076a46 radv: use RADV_CMD_DIRTY_DYNAMIC_* when restoring viewport/scissor
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-31 08:50:16 +02:00
Samuel Pitoiset
0e7b029d00 radv: use CmdPushConstants when restoring constants after meta operations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-31 08:50:13 +02:00
Jason Ekstrand
f1cb3348f1 nir/split_vars: Properly bail in the presence of complex derefs
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Jason Ekstrand
cc59503b16 nir/vars_to_ssa: Properly ignore variables with complex derefs
Because the core principle of the vars_to_ssa pass is that it globally
(within a function) looks at all of the uses of a never-indirected path
and does a full into-SSA on that path, it can't handle a path which has
any chance of having aliasing.  If a function_temp variable has a cast
or anything else which may cause aliasing, we have to assume that all
paths to that variable may alias and ignore the entire variable.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Jason Ekstrand
911ea2c66f nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer
We're about to change the meaning of get_deref_node returning NULL so we
need a non-NULL value to mean properly undefined.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Jason Ekstrand
e84194686d nir/deref: Add a has_complex_use helper
This lets passes easily detect derefs which have uses that fall outside
the standard load/store/copy pattern so they can bail appropriately.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Jason Ekstrand
8948048c6f nir/dead_cf: Call instructions aren't dead
When we inlined cf_node_has_side_effects into node_is_dead, all the
conditions flipped and we forgot to flip one.  Fortunately, it doesn't
matter right now because no one uses this pass on shaders with more than
one function.

Fixes: b50465d197 "nir/dead_cf: Inline cf_node_has_side_effects"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Dave Airlie
5441d56243 vtn: create cast with type stride.
When creating function parameters, we create pointers from ssa
values, this creates nir casts with stride 0, however we have
no where else to get this value from. Later passes to lower
explicit io need this stride value to do the right thing.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-05-31 09:57:45 +10:00
Rob Clark
372e83b95f list: add some iterator debug
Debugging use of unsafe iterators when you should have used the _safe
version sucks.  Add some DEBUG build support to catch and assert if
someone does that.

I didn't update the UPPERCASE verions of the iterators.  They should
probably be deprecated/removed.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-05-30 22:11:26 +00:00
Caio Marcelo de Oliveira Filho
03ce12c5ed nir: Accept nir_var_mem_global in derefs used by phis
This mode is used by PhysicalStorageBufferEXT storage class.

Fixes: 8bdf5a008b "nir: Allow derefs to be used as phi sources"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-30 14:07:29 -07:00
Jason Ekstrand
5e43a75950 intel/blorp: Use the hardware op for CCS ambiguate on gen10+
Cannonlake hardware adds a new resolve type in 3DSTATE_PS called
FAST_CLEAR_0 which does an ambiguate.  Now that the hardware can do it
directly, we should use that instead of binding the CCS as a render
target and doing it manually.  This was tested with a full Vulkan CTS
run on Cannonlake.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-05-30 13:49:48 -07:00
Jan Zielinski
b31a31bba5 swr/rast: Enable ARB_GL_texture_buffer_range
No significant changes in the code needed to enable
the extension. Just updating SWR capabilities
and the documentation

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-05-30 15:42:15 +00:00
Jan Zielinski
cf673747ce swr/rast: fix 32-bit compilation on Linux
Removing unused but problematic code from simdlib header to fix
compilation problem on 32-bit Linux.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-05-30 15:31:15 +00:00
Jason Ekstrand
9e403dc56e intel/fs: Do a stalling MFENCE in endInvocationInterlock()
Fixes: 939312702e "i965: Add ARB_fragment_shader_interlock support"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-30 14:00:26 +00:00
Jason Ekstrand
859de4a748 intel/fs,vec4: Use g0 as the header for MFENCE
We set header_present but then pass it some random garbage.  Give it g0
instead.  I'm not actually sure this does anything but g0 is the usual
header data and this is what the windows driver does so it seems like a
good idea.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-30 14:00:26 +00:00
Samuel Pitoiset
43cc3dc9c0 radv: enable transformFeedbackStreamsLinesTriangles
The driver should already support this without any changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-30 15:42:36 +02:00
Samuel Pitoiset
da26013eb7 radv: implement VK_EXT_sample_locations and disable it
Basically, this extension allows applications to use custom
sample locations. It doesn't support variable sample locations
during subpass. Note that we don't have to upload the user
sample locations because the spec doesn't allow this.

The extension is currently disabled because the driver needs to
support variable sample locations during layout transitions. The
depth decompress needs to know them and that's a bit invasive.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-30 09:52:16 +02:00
Kenneth Graunke
e917bb7ad4 iris: Avoid holding the lock while allocating pages.
We only need the lock for:
1. Rummaging through the cache
2. Allocating VMA

We don't need it for alloc_fresh_bo(), which does GEM_CREATE, and also
SET_DOMAIN to allocate the underlying pages.  The idea behind calling
SET_DOMAIN was to avoid a lock in the kernel while allocating pages,
now we avoid our own global lock as well.

We do have to re-lock around VMA.  Hopefully this shouldn't happen too
much in practice because we'll find a cached BO in the right memzone
and not have to reallocate it.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2019-05-30 00:46:37 -07:00
Kenneth Graunke
0cb380a6b3 iris: Move SET_DOMAIN to alloc_fresh_bo()
Chris pointed out that the order between SET_DOMAIN and SET_TILING
doesn't matter, so we can just do the page allocation when creating
a new BO.  Simplifies the flow a bit.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2019-05-30 00:15:26 -07:00
Kenneth Graunke
53878f7a89 iris: Be lazy about cleaning up purged BOs in the cache.
Mathias Fröhlich reported that commit 6244da8e23 crashes.
list_for_each_entry_safe is safe against removing the current entry,
but iris_bo_cache_purge_bucket was potentially removing next entries
too, which broke our saved next pointer.

To fix this, don't bother with the iris_bo_cache_purge_bucket step.
We just detected a single entry where the kernel has purged the BO's
memory, and so it isn't a usable entry for our cache.  We're about to
continue the search with the next BO.  If that one's purged, we'll
clean it up too.  And so on.

We may miss cleaning up purged BOs that are further down the list
after non-purged BOs...but that's probably fine.  We still have the
time-based cleaner (cleanup_bo_cache) which will take care of them
eventually, and the kernel's already freed their memory, so it's not
that harmful to have a few kicking around a little longer.

Fixes: 6244da8e23 iris: Dig through the cache to find a BO in the right memzone
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2019-05-29 23:38:01 -07:00
Kenneth Graunke
6244da8e23 iris: Dig through the cache to find a BO in the right memzone
This saves some util_vma thrash when the first entry in the cache
happens to be in a different memory zone, but one just a tiny bit
ahead is already there and instantly reusable.  Hopefully the cost
of a little extra searching won't break the bank - if it does, we
can consider having separate list heads or keeping a separate VMA
cache.

Improves OglDrvRes performance by 22%, restoring a regression from
deleting the bucket allocators in 694d1a08d3.

Thanks to Clayton Craft for alerting me to the regression.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 20:03:45 -07:00
Kenneth Graunke
4c2d9729df iris: Tidy BO sizing code and comments
Buckets haven't been power of two sized in over a decade.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:42:15 -07:00
Kenneth Graunke
7acc88a47c iris: Move some field setting after we drop the lock.
It's not much, but we may as well hold the lock for a bit less time.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:42:04 -07:00
Kenneth Graunke
76c5a19668 iris: Move cached BO allocation into a helper function.
There's enough going on here to warrant a helper.  This also simplifies
the control flow and eliminates the last non-error-case goto.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:41:52 -07:00
Kenneth Graunke
cea6671395 iris: Fall back to fresh allocations of mapping for zero-memset fails.
It is unlikely that we would fail to map a cached BO in order to zero
its contents.  When we did, we would free the first BO in the cache and
try again with the second.  It's possible that this next BO already had
a map setup, in which case we'd succeed.  But if it didn't, we'd likely
fail again in the same manner.

There's not much point in optimizing this case (and frankly, if we're
out of CPU-side VMA we should probably dump the cache entirely)...so
instead, just fall back to allocating a fresh BO from the kernel which
will already be zeroed so we don't have to try and map it.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:41:50 -07:00
Kenneth Graunke
042f8514e6 iris: Move fresh BO allocation into a helper function.
There's enough going on here to warrant a helper.  More cleaning coming.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:41:22 -07:00
Kenneth Graunke
06421e5be7 iris: Do SET_TILING at a single point rather than in two places.
Both the from-cache and fresh-from-GEM cases were calling SET_TILING.
In the cached case, we would retry the allocation on failure, pitching
one BO from the cache each time.  This is silly, because the only time
it should fail is if the tiling or stride parameters are unacceptable,
which has nothing to do with the particular BO in question.  So there's
no point in retrying - we should simply fail the allocation.

This patch moves both calls to bo_set_tiling_internal() below the
cache/fresh split, so we have it at a single point in time instead
of two.

To preserve the ordering between SET_TILING and SET_DOMAIN, we move
that below as well.  (I am unsure if the order matters.)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:41:08 -07:00
Kenneth Graunke
43d835cb0f iris: Use the BO cache even for coherent buffers on non-LLC.
We mark snooped BOs as non-reusable, so we never return them to the
cache.  This means that we'd need to call I915_GEM_SET_CACHING to make
any BO we find in the cache snooped.  But then again, any BO we freshly
allocate from the kernel will also be non-snooped, so it has the same
issue.  There's really no reason to skip the cache - we may as well use
it to avoid the I915_GEM_CREATE overhead.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:40:18 -07:00
Kenneth Graunke
78003014d0 iris: Fix locking around vma_alloc in iris_bo_create_userptr
util_vma needs to be protected by a lock.  All other callers of
vma_alloc and vma_free appear to be holding a lock already.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:40:16 -07:00
Kenneth Graunke
5fc11fd988 iris: Fix lock/unlock mismatch for non-LLC coherent BO allocation.
The goto jumped over the mtx_lock, but proceeded to hit the mtx_unlock.
We can simply set the bucket to NULL and it will skip the cache without
goto, and without messing up locking.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:40:15 -07:00
Marek Olšák
2285b93032 radeonsi: fix timestamp queries for compute-only contexts
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2019-05-29 21:13:35 -04:00
Marek Olšák
b5697c311b Change a few frequented uses of DEBUG to !NDEBUG
debugoptimized builds don't define NDEBUG, but they also don't define
DEBUG. We want to enable cheap debug code for these builds.
I only chose those occurences that I care about.

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-05-29 21:13:35 -04:00
Kenneth Graunke
0f1b68ebee iris: Re-emit Surface State Base Address when context is lost.
When we hit a GPU hang, we failed to reset Surface State Base Address
right away, and would keep hanging until we filled up the binder.  Then
we'd finally get it right after a lot of repeated stumbles.  Update it
right away so we hopefully hang fewer times before succeeding.
2019-05-29 16:35:02 -07:00
Jason Ekstrand
e459d6d6df iris: Enable nir_opt_large_constants
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15306230 -> 15304726 (<.01%)
    instructions in affected programs: 4570 -> 3066 (-32.91%)
    helped: 16
    HURT: 0

    total cycles in shared programs: 361703436 -> 361680041 (<.01%)
    cycles in affected programs: 129388 -> 105993 (-18.08%)
    helped: 16
    HURT: 0

    LOST:   0
    GAINED: 2

The helped programs were in XCom 2, Deus Ex: Mankind Divided, and Kerbal
Space Program

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-29 21:09:16 +00:00
Jason Ekstrand
9dc57eebd5 iris: Don't assume UBO indices are constant
It will be true for the constant/system value buffer because they use a
constant zero but it's not true in general.  If we ever got here when
the source wasn't constant, nir_src_as_uint would assert.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2019-05-29 21:09:16 +00:00
Jason Ekstrand
744f93f5c1 iris: Move upload_ubo_ssbo_surf_state to iris_program.c
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-29 21:09:16 +00:00
Brian Paul
e584fd894e nir: silence three compiler warnings seen with MinGW
Silence two unused var warnings.  And init elem_size, elem_align to
zero to silence "maybe uninitialized" warnings.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-29 13:59:24 -06:00
Brian Paul
c71ca65405 svga: clamp max_const_buffers to SVGA_MAX_CONST_BUFS
In case the device reports 15 (or more) buffers.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2019-05-29 13:59:23 -06:00
Kenneth Graunke
6892d2b94a iris: Clone before calling nir_strip and serializing
This is non-destructive and leaves the debugging information in place.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-29 18:16:32 +00:00
Kenneth Graunke
e1409aead5 iris: Only store the SHA1 of the NIR in iris_uncompiled_shader
Jason pointed out that we don't need to keep an entire copy of the
serialized NIR around, we just need the SHA1.  This does change our
disk cache key to be taking a SHA1 of a SHA1, which is a bit odd,
but should work out and be faster and use less memory.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-29 18:16:32 +00:00
Caio Marcelo de Oliveira Filho
e45bf01940 spirv: Change spirv_to_nir() to return a nir_shader
spirv_to_nir() returned the nir_function corresponding to the
entrypoint, as a way to identify it.  There's now a bool is_entrypoint
in nir_function and also a helper function to get the entry_point from
a nir_shader.

The return type reflects better what the function name suggests.  It
also helps drivers avoid the mistake of reusing internal shader
references after running NIR_PASS on it.  When using NIR_TEST_CLONE or
NIR_TEST_SERIALIZE, those would be invalidated right in the first pass
executed.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho
a3bfdacb6c radv: Don't re-use entry_point pointer from spirv_to_nir
Replace its uses with checking for is_entrypoint and calling
nir_shader_get_entrypoint().

This is a preparation to change spirv_to_nir() return type.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho
ee59bac9f4 glspirv: Don't re-use entry_point pointer from spirv_to_nir
Replace its use with checking for is_entrypoint.

This is a preparation to change spirv_to_nir() return type.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-29 10:34:30 -07:00
Caio Marcelo de Oliveira Filho
c92d002982 turnip: Don't re-use entry_point pointer from spirv_to_nir
Replace its uses with nir_shader_get_entrypoint(), and change the
helper function to return nir_shader *.

This is a preparation to change spirv_to_nir() return type.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:26:22 -07:00
Chia-I Wu
0a0be7aee0 virgl: fix readback with pending transfers
When readback is true, and there are pending writes in the transfer
queue, we should flush to avoid reading back outdated data.  This
fixes piglit arb_copy_buffer/dlist and a subtest of
arb_copy_buffer/data-sync.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-05-29 16:47:04 +00:00
Caio Marcelo de Oliveira Filho
8bdf5a008b nir: Allow derefs to be used as phi sources
It is possible and valid for a pointer to be selected based on a
conditional before used, and depending on the mode, those cases will
result in a phi with derefs as sources.

To achieve this, we don't rematerialize derefs that are used by phis.
As a consequence, when converting from SSA to regs, we may have phis
that come from different blocks and are used by phis.  We now convert
those to regs too.

Validation was added to ensure only derefs of certain modes can be
used as phi sources.  No extra validation is needed for the presence
of cast, any instruction that uses derefs will validate the
deref-chain is complete (ending in a cast or a var).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-29 08:19:15 -07:00
Connor Abbott
ee2a92bcde radeonsi: Fix editorconfig
At least on vim, indenting doesn't work without this. Copied from
src/amd/vulkan.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-29 15:55:40 +02:00
Erik Faye-Lund
551b61528f mesa/main: clean up extension-check for GL_SAMPLE_MASK
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-29 10:54:09 +02:00
Erik Faye-Lund
426e896515 mesa/main: clean up extension-check for GL_SAMPLE_SHADING
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-29 10:54:09 +02:00
Erik Faye-Lund
b9e9d701dc mesa/main: correct extension-checks for GL_PRIMITIVE_RESTART_FIXED_INDEX
This shouldn't be allowed in GLES 1/2.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-29 10:54:09 +02:00
Erik Faye-Lund
34ade0dc7c mesa/main: correct extension-checks for GL_BLEND_ADVANCED_COHERENT_KHR
KHR_blend_equation_advanced_coherent isn't exposed on OpenGL ES 1.x, so
we shouldn't allow its enums there either.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-29 10:54:09 +02:00
Erik Faye-Lund
c0dabc6192 mesa/main: correct extension-checks for GL_FRAMEBUFFER_SRGB
This enum shouldn't be allowed on OpenGL ES 1.x, so let's instead
use the extenion-helpers, and check for desktop and gles extensions
separately.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-29 10:54:09 +02:00
Erik Faye-Lund
a33ff7876f mesa/main: correct extension-checks for MESA_tile_raster_order
This extension isn't enabled for GLES 1.x, so we shouldn't allow the
state there. Let's use the extension-helpers instead of CHECK_EXTENSION
for this.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-29 10:54:09 +02:00
Erik Faye-Lund
bf91d6ae4a mesa/main: make the CONSERVATIVE_RASTERIZATION_NV checks consistent
This just makes the logic of the checks for this enum the same for
gl{Enable,Disable} and for glIsEnabled. They are already functionally
the same, so this is just a minor code-cleanup.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-29 10:54:09 +02:00
Erik Faye-Lund
00c683bc8e mesa/main: make the PRIMITIVE_RESTART_NV checks consistent
{En,Dis}ableClientState(PRIMITIVE_RESTART_NV) should only work on
compatibility contextxs. While we're at it, modernize the code a bit,
by using the extension helpers instead of open-coding.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-29 10:54:09 +02:00
Samuel Pitoiset
d3771ccaa3 radv: use view format when selecting the resolve path for subpasses
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 08:53:48 +02:00
Samuel Pitoiset
017170a785 radv: always use view format when performing subpass resolves
It makes sense to use the image view formats when resolving
inside subpasses, while we have to use the image formats for
normal resolves.

Original patch by Philip Rebohle.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110348
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 08:53:46 +02:00
Samuel Pitoiset
eaeaad25f7 radv: sync before resetting a pool if there is active pending queries
Make sure to sync all previous work if the given command buffer
has pending active queries. Otherwise the GPU might write queries
data after the reset operation.

This fixes a bunch of new dEQP-VK.query_pool.* CTS failures.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 08:47:54 +02:00
Kenneth Graunke
bc273dece2 intel/decoder: Use get_state_size() over guessed counts in more cases
This makes the following packets use actual driver provided sizes rather
than guessing an arbitrary number:

  - CC_VIEWPORT
  - SF_CLIP_VIEWPORT
  - BLEND_STATE
  - COLOR_CALC_STATE
  - SCISSOR_RECT

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-05-28 13:44:16 -07:00
Mike Lothian
29ea92e6a1 meson: Link Gallium drivers with ld_args_build_id
Link all Gallium drivers with ld_args_build_id to prevent failures in
Iris that uses GNU_BUILD_ID

Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=110757
Fixes: 4756864cdc "iris: Start wiring up on-disk shader cache"

Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-28 13:37:36 -07:00
Lionel Landwerlin
366811bedb nir/lower_non_uniform: safely iterate over blocks
This fixes a problem where the same instruction gets replaced twice.
This was happening when the replaced instruction would be at the end
of a block.

Replacement of :

   if ssa_8 {
                ....
      intrinsic bindless_image_store (ssa_44, ssa_16, ssa_0, ssa_15) (5, 0, 34836, 32) /* image_dim=Buf */ /* image_array=false */ /* format=34836 */ /* access=32 */
   }

Would be :

   if ssa_8 {
      loop {
         vec1 32 ssa_47 = intrinsic read_first_invocation (ssa_44) ()
         vec1 1 ssa_48 = ieq ssa_47, ssa_44
         if ssa_48 {
            loop {
               vec1 32 ssa_49 = intrinsic read_first_invocation (ssa_44) ()
               vec1 1 ssa_50 = ieq ssa_49, ssa_44
               if ssa_50 {
                  intrinsic bindless_image_store (ssa_44, ssa_16, ssa_0, ssa_15) (5, 0, 34836, 32) /* image_dim=Buf */ /* image_array=false */ /* format=34836 */ /* access=32 */
                  break
               } else {
        ....
   }

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3bd5457641 ("nir: Add a lowering pass for non-uniform resource access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-28 20:23:16 +01:00
Samuel Pitoiset
47a10edefb radv: allocate more space in the CS when emitting events
If the driver waits for CP DMA to be idle and emit an EOP event
we need more space.

This fixes a crash with Quake Champions.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-28 16:56:17 +02:00
Kenneth Graunke
6a9e39d44b iris: Ask st to vectorize our IO.
(Technically this is common code, but it doesn't affect i965 or anv.)

Improves performance of GFXBench5/gl_tess_off on Skylake GT4e at 1080p
by 9.3933% +/- 0.0305157% by eliminating all spilling in the GS.

Improves performance of GFXBench5/gl_4_off (Car Chase) on Skylake GT4e
at 1080p by 0.325208% +/- 0.0842233% (n=18).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-28 01:06:48 -07:00
Kenneth Graunke
c31b4420e7 st/nir: Re-vectorize shader IO
We scalarize IO to enable further optimizations, such as propagating
constant components across shaders, eliminating dead components, and
so on.  This patch attempts to re-vectorize those operations after
the varying optimizations are done.

Intel GPUs are a scalar architecture, but IO operations work on whole
vec4's at a time, so we'd prefer to have a single IO load per vector
rather than 4 scalar IO loads.  This re-vectorization can help a lot.

Broadcom GPUs, however, really do want scalar IO.  radeonsi may want
this, or may want to leave it to LLVM.  So, we make a new flag in the
NIR compiler options struct, and key it off of that, allowing drivers
to pick.  (It's a bit awkward because we have per-stage settings, but
this is about IO between two stages...but I expect drivers to globally
prefer one way or the other.  We can adjust later if needed.)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-28 01:06:48 -07:00
Mathias Fröhlich
1d0a8cf40d mesa: Prevent classic swrast crash on a surfaceless context v2.
This fixes the egl_mesa_platform_surfaceless piglit test as well
as the new egl_ext_device_base piglit test on classic swrast.

v2: Fix swrast surfaceless contexts on the driver side.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-05-28 08:27:16 +02:00
Samuel Pitoiset
15cb19ed6f radv add radv_get_resolve_pipeline() in the compute path
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-28 08:17:26 +02:00
Samuel Pitoiset
469258c3b1 radv: cleanup the compute resolve path for subpass
This makes use of radv_meta_resolve_compute_image() by filling
a VkImageResolve region instead of duplicating code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-28 08:17:23 +02:00
Timothy Arceri
d2b0246741 radeonsi: add drirc workaround for American Truck Simulator
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110711
2019-05-28 08:47:44 +10:00
Timothy Arceri
11e16ca7ce Revert "st/mesa: expose 0 shader binary formats for compat profiles for Qt"
This reverts commit 55376cb31e.

It's been over a year and both QT 5.9.5 and 5.11.0 contained a fix for the
original issue. It seems i965 only ever applied this workaround to the
18.0 branch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-28 08:46:50 +10:00
Lionel Landwerlin
2042f22e28 anv: fix apply_pipeline_layout pass for arrays of YCbCr descriptors
When using the binding tables to access arrays of YCbCr descriptors we
did not consider the offset of the accessed element. We can't do a
simple multiple because the binding table entries are tightly packed.

For example element 0 of the array could use 2 entries/planes and
element 1 could use 2 entries/planes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3bb8768b9d ("anv: toggle on support for VK_EXT_ycbcr_image_arrays")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-05-27 22:47:53 +01:00
Marek Olšák
fccced57cf radeonsi: clean up winsys creation
- unify the code
- choose radeon or amdgpu based on the DRM version, not based on which one
  succeeds first
2019-05-27 15:26:06 -04:00
Marek Olšák
bb5d82bd06 radeonsi: allow query functions for compute-only contexts 2019-05-27 15:26:06 -04:00
Marek Olšák
b257956021 ac: treat Mullins as Kabini, remove the enum
it's the same design
2019-05-27 15:10:51 -04:00
Christian Gmeiner
37af75f88c etnaviv: rs: choose clear format based on block size
Fixes following piglit and does not introduce any regressions.
  spec@ext_packed_depth_stencil@fbo-depth-gl_depth24_stencil8-blit

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2019-05-27 20:55:11 +02:00
Vasily Khoruzhick
af0de6b91c lima/ppir: implement discard and discard_if
This commit also adds codegen for branch since we need it
for discard_if.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-27 07:39:03 -07:00
Samuel Pitoiset
7a7be61398 radv: ignore the loadOp if the first use of an attachment is a resolve
Based on ANV.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-27 13:52:39 +02:00
Samuel Pitoiset
ff27eb509a radv: always dirty the framebuffer when restoring a subpass
The old code was not wrong because the transitions performed
after the resolves should re-emit the framebuffer if needed.

This change is mostly a no-op but it improves consistency
regarding other meta operations that need to save/restore subpasses.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-27 13:52:36 +02:00
Samuel Pitoiset
9af15986b0 radv: add radv_clear_htile() helper
This helper will be useful for clearing HTILE after some
depth/stencil resolves.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-27 13:52:34 +02:00
Chenglei Ren
13b38ca1e4 anv/android: fix missing dependencies issue during parallel build
The libmesa_anv_gen* modules require anv_extensions.h, patch makes sure
it gets generated as a dependency before building them.

Signed-off-by: Chenglei Ren <chenglei.ren@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2019-05-27 10:13:17 +03:00
Samuel Pitoiset
2d2e7954c3 radv: tidy up GetQueryPoolResults for occlusion queries
Just move the block that checks the availability bit into the
switch like other query types.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-27 08:50:55 +02:00
Kenneth Graunke
b5fa3abfc2 iris: Don't flag IRIS_DIRTY_URB after BLORP operations unless it changed
We already flag IRIS_DIRTY_URB when we change it, but we were
additionally flagging it on every BLORP operation, even if we didn't.
2019-05-26 17:45:18 -07:00
Dave Airlie
7fe5a8e874 Revert "mesa: unreference current winsys buffers when unbinding winsys buffers"
This reverts commit 12bf7cfecf.

This commits caused lots of problems:
https://bugs.freedesktop.org/show_bug.cgi?id=110721
https://bugs.freedesktop.org/show_bug.cgi?id=110761

Fixes: 12bf7cfecf ("mesa: unreference current winsys buffers when unbinding winsys buffers")
Pushing without review as we need to get it into next stable.
2019-05-27 09:36:28 +10:00
Alyssa Rosenzweig
659aa3dd65 panfrost/midgard: Implement fneg/fabs/fsat
Fix a regression I inadvertently caused by acking typeless movs before
implementing/pushing this *whistles*

Nothing to see here, move along folks.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-26 03:16:37 +00:00
Qiang Yu
1dc593e9b9 lima: fix lima_blit with non-zero level source resource
lima_blit will do blit between resources with different levels.
When blit from a level!=0 source, it will sample from that level
of resource as texture.

Current texture setup won't respect level when not mipmap filter.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
2019-05-25 12:41:44 +08:00
Qiang Yu
54490b0b36 lima: fix render to non-zero level texture
Current implementation won't respect level of surface to render.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
2019-05-25 12:41:31 +08:00
Dylan Baker
9838185056 editorconfig: Fix meson style
The syntax was wrong, resulting in it not working at all.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-24 18:44:18 +00:00
Chia-I Wu
ea1e0acfd0 virgl: remove an incorrect check in virgl_res_needs_flush
Imagine this

  resource_copy_region(ctx, dst, ..., src, ...);
  transfer_map(ctx, src, 0, PIPE_TRANSFER_WRITE, ...);

at the beginning of a cmdbuf.  We need to flush in transfer_map so
that the transfer is not reordered before the resource copy.  The
check for "vctx->num_draws == 0 && vctx->num_compute == 0" is not
enough.  Removing the optimization entirely.

Because of the more precise resource tracking in the previous
commit, I hope the performance impact is minimized.  We will have to
go with perfect resource tracking, or attempt a more limited
optimization, if there are specific cases we really need to optimize
for.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-24 17:37:40 +00:00
Chia-I Wu
56f9b60e50 virgl: reemit resources on first draw/clear/compute
This gives us more precise resource tracking.  It can be beneficial
because glFlush is often followed by state changes.  We don't want
to reemit resources that are going to be unbound.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-24 17:37:40 +00:00
Chia-I Wu
424ec2356b virgl: add missing emit_res for SO targets
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-24 17:37:40 +00:00
Roland Scheidegger
d4e8a44bf6 gallivm: fix default cbuf info.
The default null_output really needs to be static, otherwise the values
we'll eventually get later are doubly random (they are not initialized,
and even if they were it's a pointer to a local stack variable).
VMware bug 2349556.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2019-05-24 19:22:50 +02:00
Roland Scheidegger
84f3f1cf00 scons: fix build with llvm 9.
The x86asmprinter component is gone, and things seem to work by just
removing it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110707

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2019-05-24 18:28:28 +02:00
Tomeu Vizoso
9fe1a925e2 panfrost: Dereference sampled texture
We are currently leaking resources if they were sampled from. Once we
are done with a sampler, we should dereference that resource.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 16:50:09 +02:00
Tomeu Vizoso
3c81010213 panfrost: ci: Avoid pulling Docker image on every run
Jump over the container stage if we haven't changed any of the files
that involved in building the container images.

This saves 1-2 minutes in each run and helps conserve resources.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 16:50:09 +02:00
Jason Ekstrand
f2dc0f2872 nir: Drop imov/fmov in favor of one mov instruction
The difference between imov and fmov has been a constant source of
confusion in NIR for years.  No one really knows why we have two or when
to use one vs. the other.  The real reason is that they do different
things in the presence of source and destination modifiers.  However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
2019-05-24 08:38:11 -05:00
Jason Ekstrand
22421ca7be nir/builder: Merge nir_[if]mov_alu into one nir_mov_alu helper
Unless source modifiers are present, fmov and imov are the same.
There's no good reason for having two helpers.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Jason Ekstrand
cd73b6174b nir/lower_to_source_mods: Stop turning add, sat, and neg into mov
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Jason Ekstrand
2a39788d03 nir/source_mods: Add a helpers for setting source modifiers
It's potentially a tiny bit less efficient but the helpers make it much
easier to sort out the rules for updating source modifiers.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Jason Ekstrand
8ffbb54405 intel: Implement abs, neg, and sat in the back-end
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Jason Ekstrand
4fde459563 intel/nir: Call alu_to_scalar one last time before out-of-ssa
A few of our very late passes can end up generating vectors accidentally
so we need to get rid of them.  The only known case of this is the ffma
peephole which generates fneg and fabs as vectors.  Currently, they're
not a problem because they get turned into fmov which the back-end
compiler knows how to handle as a vector.  That's about to change.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Jason Ekstrand
ddd08e1888 nir/builder: Remove the use_fmov parameter from nir_swizzle
This flag has caused more confusion than good in most cases.  You can
validly use imov for floats or fmov for integers because, without source
modifiers, neither modify their input in any way.  Using imov for floats
is more reliable so we go that direction.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Jason Ekstrand
6c2ca2a5d3 ptn,ttn: Use nir_channel for selecting channels
Both of these passes predate the nir_channel helper.  We should just use
it instead of hand-rolling it in both passes.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Michel Zou
88eb2a1f7e scons: For MinGW use -posix flag.
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
2019-05-24 12:18:40 +01:00
Christian Gmeiner
78fb5594be etnaviv: use the correct uniform dirty bits
Found during code inspection.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-05-24 12:41:43 +02:00
Danylo Piliaiev
c82dcf89ae anv: Do not emulate texture swizzle for INPUT_ATTACHMENT, STORAGE_IMAGE
If descriptorType is VK_DESCRIPTOR_TYPE_STORAGE_IMAGE
or VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT, the imageView member of each
element of pImageInfo must have been created with the identity swizzle.

Fixes: d2aa65eb

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-24 09:20:38 +00:00
Tapani Pälli
397fe0cc50 st/dri: enable EGL_ANDROID_blob_cache on gallium drivers
Verified to work properly with Iris driver on Android Celadon. Cache
files get generated as 'com.android.opengl.shaders_cache' for each
application.

v2: check that cache was returned (Eric Engestrom)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-24 09:17:04 +03:00
Alyssa Rosenzweig
ea6b581444 panfrost: Remove the standalone compiler
Now that the online compiler and pandecode are reliable and upstreamed,
nobody is using this. If somebody does need it, it should be easy enough
to bring back, I suppose. At the moment, it's just a maintenance hazard,
since meson is silly and does double builds for compiler updates (triple
for disassembler changes).

If people need the standalone _disassembler_, that can be added
trivially into pandecode (pandecode already includes the disassembler).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-05-24 03:10:43 +00:00
Eric Engestrom
8d386e6eef vk/util: suppress warning about out-of-enum android value
src/vulkan/util/vk_enum_to_str.c: In function ‘vk_structure_type_size’:
src/vulkan/util/vk_enum_to_str.c:3335:9: warning: case value ‘1000010000’ not in enumerated type ‘VkStructureType’ {aka ‘const enum VkStructureType’} [-Wswitch]
         case VK_STRUCTURE_TYPE_NATIVE_BUFFER_ANDROID: return sizeof(VkNativeBufferANDROID);
         ^~~~

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-23 15:28:43 +00:00
Kenneth Graunke
25afbb04c2 iris: Advertise coherent framebuffer fetches
This lets us advertise GL_EXT_shader_framebuffer_fetch and
GL_KHR_blend_equation_advanced_coherent support.
2019-05-23 08:13:10 -07:00
Kenneth Graunke
cca8af0c7d gallium: Add PIPE_CAP_FBFETCH_COHERENT and expose extensions
st/mesa now exposes KHR_blend_equation_advanced_coherent and
EXT_shader_framebuffer_fetch if the new capability is supported.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-23 08:13:09 -07:00
Kenneth Graunke
87f4286137 st/mesa: Advertise GL_EXT_shader_framebuffer_fetch_non_coherent
This extension requires the ability to read from all render targets,
so we only enable it if PIPE_CAP_FBFETCH >= PIPE_CAP_MAX_RENDER_TARGETS.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-23 08:13:09 -07:00
Kenneth Graunke
a2d7834457 gallium: Change PIPE_CAP_TGSI_FS_FBFETCH bool to PIPE_CAP_FBFETCH count
TGSI's FBFETCH instruction currently only supports reading from a single
render target, but NIR intrinsics can support multiple render targets.

radeonsi can only support fetching from RT 0, but other drivers may be
able to support fetching from any render target.

To express this, this patch renames PIPE_CAP_TGSI_FS_FBFETCH to simply
PIPE_CAP_FBFETCH, and converts it from a boolean "is FBFETCH supported?"
to an integer number of render targets which can be fetched.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-23 08:13:07 -07:00
Kenneth Graunke
7d2b54e393 iris: Record state sizes for INTEL_DEBUG=bat decoding.
Felix noticed a crash when using INTEL_DEBUG=bat decoding.  It turned
out that we were sometimes placing variable length data near the end
of a buffer, and with the decoder guessing random lengths rather than
having an actual count, it was walking off the end and crashing.  So
this does more than improve the decoder output.

Unfortunately, this is a bit more complicated than i965's handling,
because we don't have a single state buffer.  Various places upload
data via u_upload_mgr, and so there isn't a central place to record
the size.  We don't need to catch every single place, however, since
it's only important to record variable length packets (like viewports
and binding tables).

State data also lives arbitrarily long, rather than being discarded on
every batch like i965, so we don't know when to clear out old entries
either.  (We also don't have a callback when an upload buffer is
released.)  So, this tracking may space leak over time.  That's probably
okay though, as this is only a debugging feature and it's a slow leak.
We may also get lucky and overwrite existing entries as we reuse BOs,
though I find this unlikely to happen.

The fact that the decoder works in terms of offsets from a state base
address is also not ideal, as dynamic state base address and surface
state base address differ for iris.  However, because dynamic state
addresses start from the top of a 4GB region, and binding tables start
from addresses [0, 64K), it's highly unlikely that we'll get overlap.

We can always improve this, but for now it's better than what we had.
2019-05-23 08:07:08 -07:00
Eric Engestrom
00cfeacf31 vk/util: drop no-op compiler warning workaround
`-Wswitch` applies to `switch()`, not `case:`, and is bypassed by the
presence of a `default:` anyway, so let's drop the `default:` and move
the warning suppression to where it can make a difference, and then it
turns out that we don't need to keep a list of special cases anymore :)

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-23 15:06:11 +00:00
Erik Faye-Lund
90e7ce5bde mesa/main: make the CONSERVATIVE_RASTERIZATION_INTEL checks consistent
INTEL_conservative_rasterization isn't exposed on compatibility
contexts, nor for GLES 3.0 and below. We already do this correctly for
gl{Enable,Disable}, but we should do the same for glIsEnabled as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-23 11:43:18 +02:00
Erik Faye-Lund
0dff3eecda mesa/main: make the FRAGMENT_PROGRAM checks consistent
IsEnabled(FRAGMENT_PROGRAM) isn't supposed to be allowed, but our
check allowed this anyway. Let's make these checks consistent, and
while we're at it, modernize them a bit.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-23 11:35:55 +02:00
Erik Faye-Lund
147751a856 mesa/main: make the TEXTURE_CUBE_MAP checks consistent
IsEnabled(TEXTURE_CUBE_MAP) isn't supposed to be allowed, but our
check allowed this anyway. Let's make these checks consistent, and
while we're at it, modernize them a bit.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-23 11:35:55 +02:00
Erik Faye-Lund
182d75d2a5 mesa/main: remove duplicate macros
These are already defined as the exactly same, so let's get rid of
the duplicate definitions.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-23 11:35:55 +02:00
Erik Faye-Lund
e002763c99 mesa/main: remove unused argument
The 'CAP' argument has been unused in both of these macros since
2010, so let's get rid of it from both.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-23 11:35:55 +02:00
Erik Faye-Lund
619b2c9a7d mesa/main: remove unused macro
The first version of this macro is unused, so let's get rid of it.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-05-23 11:35:55 +02:00
Timothy Arceri
a482cf6ab2 glsl: simplify resource list building code
This greatly simplifies the code to calculate if we should add a
buffer to the resource list. This uses the spec rules and simple
math to decide if we should add the buffer rather than complex
string processing.

This patch refines a patch present in the ARB_gl_spriv merge
request for the NIR linker and applies it to the GLSL IR linker.
This is why we also move the function to the shared linker code,
because we will want to reuse the code for the NIR linker also.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-05-23 15:06:20 +10:00
Chia-I Wu
96c2851586 virgl: track valid buffer range for transfer sync
virgl_transfer_queue_is_queued was used to avoid flushing.  That
fails when the resource is being accessed by previous cmdbufs but
not the current one.

The new approach of tracking the valid buffer range does not apply
to textures however.  But hopefully it is fine because the goal is
to avoid waiting for this scenario

  glBufferSubData(..., offset, size, data1);
  glDrawArrays(...);
  // append new vertex data
  glBufferSubData(..., offset+size, size, data2);
  glDrawArrays(...);

If glTex(Sub)Image* turns out to be an issue, we will need to track
valid level/layer ranges as well.

v2: update virgl_buffer_transfer_extend as well
v3: do not remove virgl_transfer_queue_is_queued

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> (v1)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org> (v2)
2019-05-22 09:28:19 -07:00
Chia-I Wu
440982cdd6 virgl: remove support for buffer surfaces
st/mesa does not need it and virglrenderer does not really support
it.  Remove the support so that we are sure pipe_surface never
refers to a buffer resource.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-22 09:28:19 -07:00
Chia-I Wu
fa9afb9de0 virgl: handle NULL shader resource explicitly
When shader images/buffers are set, do not rely on
virgl_encoder_write_res and virgl_resource_dirty to do the implicit
NULL check.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-22 09:28:19 -07:00
Lionel Landwerlin
cb7c9b2a93 vulkan: fix build dependency issue with generated files
On machines with many cores, you can run into that issue :

../mesa-9999/src/vulkan/overlay-layer/overlay.cpp:42:10: fatal error: vk_enum_to_str.h: No such file or directory

v2: Move declare_dependency around (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Jan Ziak
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-22 14:07:14 +00:00
Greg V
506ebf55c0 gallium: enable dmabuf on BSD as well
The DRM_CONF_SHARE_FD code did not check for Linux, so the commit that
introduced PIPE_CAP_DMABUF broke Wayland-EGL clients on FreeBSD.

Fixes: 8ae50e60 (gallium: replace DRM_CONF_SHARE_FD with PIPE_CAP_DMABUF)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-22 13:20:31 +00:00
Tapani Pälli
ed563b79df iris: fix android build
Fixes: 4756864cdc ""iris: Start wiring up on-disk shader cache
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-22 14:01:41 +03:00
Philipp Zabel
1ccb8a071b etnaviv: fill missing offset in etna_resource_get_handle
Without this gbm_bo_get_offset() can return 0 where it shouldn't.

Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: <mesa-stable@lists.freedesktop.org>
2019-05-22 12:57:40 +02:00
Samuel Pitoiset
32a0bc915a radv: do not reset query pool during creation
From the Vulkan spec 1.1.108:
   "After query pool creation, each query must be reset before
    it is used."

So, the driver doesn't need to do this at creation time.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-22 08:36:41 +02:00
Samuel Pitoiset
e9bfd88183 radv: fix the sample max distance value for 8x
It should be 7, not 8.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-22 08:36:39 +02:00
Samuel Pitoiset
bc4548ca3d radv: emit correct centroid priority based on the number of samples
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-22 08:36:37 +02:00
Samuel Pitoiset
a7763ddcf2 radv: clean up the sample locations codebase
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-22 08:36:35 +02:00
Samuel Pitoiset
135dff8dcf radv: remove remaining code related to 16 samples
The driver only supports up to 8 samples.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-22 08:36:33 +02:00
Kenneth Graunke
6dc1c2d8bd iris: Fix ALT mode regressions from shader cache
We were checking this based on nir->info.name, but with the shader
cache enabled, nir_strip throws out the name, causing us to use IEEE
mode for ARB programs.

gl-1.0-spot-light regressed because it wants ALT mode for 0^0 behavior.

Fixes: dc5dc727d5 iris: Serialize the NIR to a blob we can use for shader cache purposes.
2019-05-21 16:58:54 -07:00
Marek Olšák
d6053bf2a1 radeonsi: fix a regression in si_rebind_buffer
Don't update non-buffer images.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110701
Fixes: 78e35df52a "radeonsi: update buffer descriptors in all contexts after buffer invalidation"

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Tested-By: Gert Wollny <gert.wollny@collabora..com>
2019-05-21 18:58:03 -04:00
Kenneth Graunke
fb1d08dcfd iris: Expose the disk cache to the state tracker as well.
This lets st/nir cache the NIR for shaders, based on the shader source
string hash, allowing us to skip initial compiles altogether, and also
letting us start from there should we need to recompile for NOS.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-21 15:05:38 -07:00
Dylan Baker
601c9bc135 iris: Cache assembly shaders in the on-disk shader cache
This implements storing and retrieving iris_compiled_shader objects
from the on-disk shader cache.

(by Dylan Baker and Kenneth Graunke)
2019-05-21 15:05:38 -07:00
Kenneth Graunke
dc5dc727d5 iris: Serialize the NIR to a blob we can use for shader cache purposes.
We will use a hash of the serialized NIR together with brw_prog_*_key
(for NOS) as the disk cache key, where the disk cache contains actual
assembly shaders.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-21 15:05:38 -07:00
Dylan Baker
4756864cdc iris: Start wiring up on-disk shader cache
This creates the on-disk shader cache data structure, and handles the
build-id keying aspects.  The next commits will fill it out so it's
actually used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-21 15:05:38 -07:00
Kenneth Graunke
6ae2caf201 iris: Move iris_uncompiled_shader definition to iris_context.h
It had been internal to iris_program.c, but with the upcoming disk cache
code, the "program module" is going to be spread across a couple source
files.  Into a header it goes!

Now it lives alongside iris_compiled_shader, which makes sense.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-21 15:05:38 -07:00
Kenneth Graunke
419d9b21e1 intel: Move brw_prog_key_set_id from i965 to the compiler.
I want to use it in iris.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-21 15:05:38 -07:00
Dylan Baker
b589c2547d docs: update calendar, and news item and link release notes for 19.0.5 2019-05-21 14:25:36 -07:00
Dylan Baker
e2987f83ad docs: Add Sha256 sums for 19.0.5 2019-05-21 14:23:16 -07:00
Dylan Baker
74e8dfecc8 docs: Add release notes for 19.0.5 2019-05-21 14:23:14 -07:00
Caio Marcelo de Oliveira Filho
9b9f7030c6 spirv: Drop GOOGLE suffix from names incorporated to SPIR-V
SPV_GOOGLE_decorate_string and SPV_GOOGLE_hlsl_functionality1 were
incorporated to SPIR-V.  Let's pick the names used by SPIR-V core.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-05-21 11:52:41 -07:00
Caio Marcelo de Oliveira Filho
02d140ce9a spirv: Pick the right bitsize when doing SpvUConvert
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-05-21 11:52:29 -07:00
Caio Marcelo de Oliveira Filho
fd94a45823 spirv: Trivially handle new 1.4 loop controls
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-05-21 11:52:12 -07:00
Caio Marcelo de Oliveira Filho
e21dee6c21 spirv: Update JSON and Headers to 1.4
This refers to commit c4f8f65792d4bf2657ca751904c511bbcf2ac77b from
GitHub.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-05-21 11:50:58 -07:00
Caio Marcelo de Oliveira Filho
4b474e2e8a spirv: Handle instruction aliases in spirv_info_c.py
Choose the first we see in the grammar file as the main one.  This is
needed to parse SPIR-V 1.4 because it introduced opcode aliases.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-05-21 11:50:47 -07:00
Erik Faye-Lund
810b95e02c Revert "glsl: do not use deprecated bison-keyword"
This reverts commit eb85124a9f.
2019-05-21 17:53:54 +02:00
Eric Engestrom
93d900ece3 imgui: delete demo file
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-21 14:40:22 +01:00
Lionel Landwerlin
fd80f1e8d1 vulkan/overlay: update remaining manual error checks
Through a series of rebases, I forgot to switch a bunch of error
checks to use a macro that will show where the problem is, rather than
printing out a dumb "ERROR!".

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-05-21 14:08:35 +01:00
Lionel Landwerlin
213d6527d4 vulkan/overlay: fix timestamp query emission with no pipeline stats
The
   if (!pipe && timestamp)

logic was broken. It should have been :

   if (!pipe && !timestamp)

Let just drop this condition as the following code does the right
thing for all cases.

An error was appearing with the following variables :

VK_INSTANCE_LAYERS=VK_LAYER_MESA_overlay VK_LAYER_MESA_OVERLAY_CONFIG=gpu_timing

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ea7a6fa980 ("vulkan/overlay: add pipeline statistic & timestamps support")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-05-21 14:08:35 +01:00
Erik Faye-Lund
eb85124a9f glsl: do not use deprecated bison-keyword
%error-verbose has been deprecated since Bison 3.0, which was released
in 2013. In Bison 3.3.1 which was recently released, this has started
causing warnings. Let's update the code to do this in the modern way
intead, to avoid cluttering the output needlessly.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-21 11:31:43 +00:00
Karol Herbst
67f9496893 glsl: handle 8 and 16 bit ints in glsl_base_type_is_integer
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-21 08:47:16 +00:00
Dave Airlie
4785e50e75 nir/test: add split vars tests (v2)
This just adds some split var splitting tests, it verifies
by counting derefs and local vars.

a basic load from inputs, store to array,
same as before but with a shader temp
struct { float } [4] don't split test
a basic load from inputs, with some out of band loads.
a load/store of only half the array
two level array, load from inputs store to all levels
a basic load from inputs with an indirect store to array.
two level array, indirect store to lvl 0
two level array, indirect store to lvl 1
load from inputs, store to array twice
load from input, store to array, load from array, store to another array.
load and from input and copy deref to array
create wildcard derefs, and do a copy

v2: use array_imm helpers, move derefs out of loops,
rename toplevel/secondlevel, use ints, fix lvl1 don't split test,
rename globabls to shader_temp, add comment, check the derefs type

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-21 13:43:28 +10:00
Caio Marcelo de Oliveira Filho
cf05ffbfd6 anv: Don't re-use entry_point pointer from spirv_to_nir
When running with NIR_TEST_CLONE=1, the pointer will not be valid, as
the whole shader is going to be recreated every pass.  Prefer using
is_entrypoint (to query when looping) and nir_shader_get_entrypoint()
instead.

Fixes the Vulkan Piglit tests
- vulkan/glsl450/frexp-double
- vulkan/glsl450/isinf-double
- vulkan/shaders/fs-multiple-large-local-array

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-20 16:47:39 -07:00
Caio Marcelo de Oliveira Filho
005cc9ae37 nir: Fix clone of nir_variable state slots
When num_state_slots is 0, don't create the array.  This was
triggering the following assert when running vkcube with
NIR_TEST_CLONE=1

    vkcube: ../src/compiler/nir/nir_split_per_member_structs.c:66:
    split_variable: Assertion `var->state_slots == NULL' failed.

Fixes: 9fbd390dd4 "nir: Add support for cloning shaders"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-20 16:47:28 -07:00
Charmaine Lee
12bf7cfecf mesa: unreference current winsys buffers when unbinding winsys buffers
This fixes surface leak when no winsys buffers are bound.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-05-20 13:09:32 -07:00
Charmaine Lee
b480adfa5e st/mesa: purge framebuffers with current context after unbinding winsys buffers
With commit c89e8470e5, framebuffers are purged after unbinding context,
but this change also introduces a heap corruption when running Rhino application
on VMware svga device. Instead of purging the framebuffers after the context
is unbound, this patch first ubinds the winsys buffers, then purges the framebuffers
with the current context, and then finally unbinds the context.

This fixes heap corruption.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-05-20 13:09:32 -07:00
Caio Marcelo de Oliveira Filho
7e5723d6d7 spirv: Generate proper NULL pointer values
Use the storage class address format information to pick the right
constant values for a NULL pointer.

v2: Don't add a deref_cast to the values.  (Jason)

v3: Update to use vtn_storage_class_to_mode() and
    vtn_mode_to_address_format() explicitly.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho
83550b7dc4 spirv: Reuse helpers in vtn_handle_type()
And change vtn_storage_class_to_mode() to accept NULL as
interface_type.  In this case, if we have a SpvStorageClassUniform, we
assume it is uses an ubo_addr_format, like the code being replaced by
the helper.

That assumption is a problem, but no different than the previous
code.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho
48ea3bbff6 spirv: Add vtn_variable_mode_image
Corresponding to SpvStorageClassImage.  We see pointers for that
storage class in tests, but don't use the storage class any further.
Adding this so that we can call vtn_mode_to_address_format() for all
supported pointers.

v2: Fail when trying to create a SpvStorageClassImage
    variable.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho
672a3f42d9 spirv: Add vtn_mode_to_address_format()
Handles all the modes and we can use it in combination with
nir_address_format_to_glsl_type() to replace the
vtn_ptr_type_for_mode() helper.  Since the new helper is more generic,
moved the assertions from the old one to the call sites.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho
192daf68a4 spirv: Add vtn_mode_uses_ssa_offset()
Just the mode is needed to decide whether SSA offsets are needed, so
make a function that takes that and reuse it for
vtn_pointer_uses_ssa_offset().

This will be used for constant null pointers, that won't have a
vtn_pointer handy.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho
f9336751bc spirv: Add and use vtn_type_without_array() helper
v2: Renamed from vtn_interface_type. (Jason)
    Accept any type not only pointers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho
8af9de0a38 spirv: Change vtn_null_constant() to use vtn_type
This is a preparation to handle OpConstantNull for pointers, we'll use
the vtn_type to get to the address format and then the appropriate
representation of NULL pointer.

v2: Move rest of body to use vtn_type. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho
bdf2361b87 spirv: Export vtn_storage_class_to_mode()
So we can reuse in spirv_to_nir.c.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho
f051fa6ad7 nir: Add nir_address_format_null_value()
Returns the nir_const_value * with the representation of the NULL
pointer for each address format.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho
31a7476335 spirv, radv, anv: Replace ptr_type with addr_format
Instead of setting the glsl types of the pointers for each resource,
set the nir_address_format, from which we can derive the glsl_type,
and in the future the bit pattern representing a NULL pointer.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho
6bc9cdb1b7 nir: Add nir_address_format_32bit_offset
This is a simple 32-bit address which is not a global address.  Gives
us a format that don't use 0 as its null pointer value.  We will need
this in anv to represent nir_var_mem_shared addresses.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho
bdaf41107a nir: Add nir_address_format_logical
An address format representing a purely logical addressing model.  In
this model, all deref chains must be complete from the dereference
operation to the variable.  Cast derefs are not allowed.  These
addresses will be 32-bit scalars but the format is immaterial because
you can always chase the chain.  E.g. push constants in anv.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Rob Clark
9f61aa3f75 freedreno/a6xx: WFI in program stateobj too
This "fixes" hangs seen w/ various android games.  I think a similar
issue to with constant state, we need to avoid CP_LOAD_STATE until
previous draw completes.

It isn't entirely clear why blob doesn't need to do this, but it might
have a different way to accomplish the same thing.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-20 09:10:12 -07:00
Rob Clark
abfb31acdb freedreno/a6xx: make sure binning pass constlen is large enough
Since we use same constant state for both binning pass program state and
draw pass state, and it is possible for binning pass shader to use fewer
consts, we need to make sure we program a large enough constlen.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-20 09:10:12 -07:00
Rob Clark
d200d58e65 freedreno/a6xx: limit IBO state to draw pass
Currently we are only supporting images in FS (and CS) so limit this
stateobj to draw pass.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-20 09:10:12 -07:00
Rob Clark
54d94f5780 freedreno/a6xx: don't evaluate FS tex state in binning pass
It is unneeded since FS doesn't run in binning pass.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-20 09:10:12 -07:00
Samuel Pitoiset
daa85a882e radv: decompress FMASK before performing a MSAA decompress using FMASK
This fixes some CTS failures related to VK_EXT_sample_locations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 12:41:47 +02:00
Dave Airlie
6b2b150a66 nir/validate: fix crash if entry is null.
we validate assert entry just before this, but since that doesn't
stop execution, we need to check entry before the next validation
assert.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-20 16:26:48 +10:00
Qiang Yu
a1d419603f lima/gpir: switch to use nir_lower_viewport_transform
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
2019-05-20 10:57:11 +08:00
Qiang Yu
a7688b2713 lima/gpir: support vector ssa load
Some vector sysval can't be lowered to scaler, so need to break
it to scaler in nir to gpir convertion.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
2019-05-20 10:57:11 +08:00
Qiang Yu
4a74e28130 lima/gpir: add helper function for emit load node
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
2019-05-20 10:57:11 +08:00
Timothy Arceri
ac779ff2b7 util: add missing include to build_id.h
Required to use uint8_t

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-05-20 10:24:23 +10:00
Alyssa Rosenzweig
1155446c19 panfrost/midgard: Split up midgard_compile.c (RA)
This commit moves the register allocator out of midgard_compile.c and
into its own midgard_ra.c file. In doing so, a number of dependencies
are identified and moved into their own files in turn. midgard_compile.c
is still fairly monolithic, but this should help.

Code churn, but no functional changes should be introduced by this
commit.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-19 23:37:45 +00:00
Alyssa Rosenzweig
9cd8cd26de panfrost: Improve fixed-function blending
This fixes a few miscellaneous issues with the fixed-function blending
programming, though it is far from complete. For cases known to be
buggy, we force a fallback to blend shaders.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-19 17:56:35 +00:00
Alyssa Rosenzweig
d1a9b760ea panfrost: Wire up nir_lower_blend
This implements blend shaders via nir_lower_blend, by creating dummy
fragment shaders simply passing through the source color and using the
new lowering pass to inject blendability.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-19 17:56:34 +00:00
Alyssa Rosenzweig
39104221e1 panfrost/midgard: Route new blending intrinsics
To prepare for the new nir_lower_blend pass, we wire up the intrinsics
for tilebuffer reads and constant colour loading.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-19 17:56:14 +00:00
Alyssa Rosenzweig
a1885b2a35 panfrost/nir: Add nir_lower_blend pass
This new lowering pass implements the OpenGL ES blend pipeline in
shaders, applicable to hardware lacking full-featured blending hardware
(including Midgard/Bifrost and vc4). This pass is run on a fragment
shader, rewriting the store to a blended version, loading in the
framebuffer destination color and constant color via intrinsics as
necessary. This pass is sufficient for OpenGL ES 2.0 and is verified to
pass dEQP's blend tests. MIN/MAX modes are included and tested as well.
That said, at present it has the following limitations:

 - MRT is not supported (ES3).
 - sRGB support is missing (ES3).
 - Extended blending is not yet ported from GLSL IR lowering (ES3.2)
 - Dual-source blending is not supported. (N/A)
 - Logic ops are not supported. (N/A)

v2: Fix code conventions (per Ian Romanick's feedback). Implement color
masks.

This pass should be in common nir/ space, but due to non-technical
reasons, for now it's in Panfrost space. In the future, depending if
other drivers need some of the functionality, we can move this back to
src/compiler/nir space.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-19 17:54:56 +00:00
Alyssa Rosenzweig
6b2457e75c panfrost: Fix Bifrost-specific padding
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-05-19 17:41:28 +00:00
Alyssa Rosenzweig
7b5217ad70 panfrost: Cleanup panfrost_job comments
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-05-19 17:41:26 +00:00
Alyssa Rosenzweig
ae705387a9 panfrost/decode: Decode blend constant
This adds a forgotten decode line on Midgard and adds the field of a
blend constant on Bifrost. The Bifrost encoding is fairly weird; whereas
Midgard is just a regular 32-bit float, Bifrost uses a fancy
fixed-point-esque encoding. The decode logic here is experimentally
correct. The encode logic is a sort of "guesstimate", assuming that the
high byte is just int(f / 255.0) and then solving algebraicly for the
low byte. This might be slightly off in some cases.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-05-19 17:41:23 +00:00
Alyssa Rosenzweig
3645c781ab panfrost: Hoist blend constant into Midgard-specific struct
This eliminates one major source of #ifdef parity between Midgard and
Bifrost, better representing how the struct acts on Midgard and allowing
proper decodes on Bifrost.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-05-19 17:41:21 +00:00
Alyssa Rosenzweig
50382df728 panfrost/decode: Disassemble Bifrost shaders
We already have the Bifrost disassembler in-tree, so now that panwrap is
able to dump Bifrost command streams, hook up the disassembler to
pandecode.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>
2019-05-19 17:41:08 +00:00
Bas Nieuwenhuizen
4689e98fe8 vulkan/wsi: Set X11 minImageCount to 3.
For IMMEDIATE and FIFO, most games work in a pipelined manner where the
can produce frames at a rate of 1/MAX(CPU duration, GPU duration), but
the render latency is CPU duration + GPU duration.

This means that with scanout from pageflipping we need 3 frames to run
full speed:
1) CPU rendering work
2) GPU rendering work
3) scanout

Once we have a nonblocking acquire that returns a semaphore we can merge
1 and 3. Hence the ideal implementation needs only 2 images, but games
cannot tellwe currently do not have an ideal implementation and that
hence they need to allocate 3 images. So let us do it for them.

This is a tradeoff as it uses more memory than needed for non-fullscreen
and non-performance intensive applications.

Since this is pretty much a TODO that can use the context I added this as
a comment.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-19 00:38:03 +00:00
Eric Engestrom
ccb8ea7acf meson: expose glapi through osmesa
Suggested-by: Pierre Guillou <pierre.guillou@lip6.fr>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109659
Fixes: f121a669c7 "meson: build gallium based osmesa"
Fixes: cbbd5bb889 "meson: build classic osmesa"
Cc: Brian Paul <brianp@vmware.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
2019-05-18 11:15:04 +01:00
Kenneth Graunke
28c2ce7105 egl: Allow EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY in ES and GL
EGL annoyingly defines a few variants of this token:

   EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_EXT - 0x3138
   EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_KHR - 0x31BD
   EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY     - 0x31BD

The EGL_EXT_create_context_robustness extension specifies that the EXT
token is only valid for ES contexts, not GL.  The EGL_KHR_create_context
extension defines the KHR version, and says it is only allowed for GL
contexts, and specifically calls out that it's an error for ES contexts.

But EGL 1.5 includes the new suffixless token, which has the same value
as the KHR version, and specifically calls out that it's now valid to
use with both GL and ES contexts.  So we should allow this.

Fixes KHR-NoContext.es32.robustness.no_reset_notification and
KHR-NoContext.es32.robustness.lose_context_on_reset on iris, which
apparently is exposing EGL 1.5.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-05-17 15:13:15 -07:00
Jason Ekstrand
1c92358bd8 anv: Only consider minSampleShading when sampleShadingEnable is set
From the Vulkan 1.1.107 spec:

    Sample shading is enabled for a graphics pipeline:

      - If the interface of the fragment shader entry point of the
        graphics pipeline includes an input variable decorated with
        SampleId or SamplePosition. In this case minSampleShadingFactor
        takes the value 1.0.

      - Else if the sampleShadingEnable member of the
        VkPipelineMultisampleStateCreateInfo structure specified when
        creating the graphics pipeline is set to VK_TRUE. In this case
        minSampleShadingFactor takes the value of
        VkPipelineMultisampleStateCreateInfo::minSampleShading.

    Otherwise, sample shading is considered disabled.

In other words, if sampleShadingEnable is set to VK_FALSE, we should
ignore minSampleShading.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-17 20:33:57 +00:00
Jason Ekstrand
8413fd136c anv: Stop forcing bindless for images
This was an unintended artifact of my testing of bindless images.  We
should be choosing bindless or not dynamically.

Fixes: c0d9926df7 "anv: Use bindless handles for images"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-17 19:58:51 +00:00
Neha Bhende
926a6a35cf draw: fix memory leak introduced 7720ce32a
We need to free memory allocation PrimitiveOffsets in draw_gs_destroy().
This fixes memory leak found while running piglit on windows.

Fixes: 7720ce32a ("draw: add support to tgsi paths for geometry streams. (v2)")

Tested with piglit

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-05-17 12:26:48 -06:00
Jason Ekstrand
d2aa65eb18 anv: Emulate texture swizzle in the shader when needed
Now that we have the descriptor buffer mechanism, emulated texture
swizzle can be implemented in a very non-invasive way.  Previous
attempts all tried to extend the push constant based image param
mechanism which was gross.  This could, in theory, be done much faster
with a magic back-end instruction which does indirect MOVs but Vulkan on
IVB is already so slow this isn't going to matter much.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104355
Cc: "19.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-17 12:25:58 -05:00
Alyssa Rosenzweig
ea479fdc1d panfrost/midgard: Typofix
Reported-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-17 14:59:52 +00:00
Eric Engestrom
6a1f609a4c gitlab-ci: build-test the tools as well
Suggested-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-17 11:21:48 +01:00
Samuel Pitoiset
d7501834cd radv: add a workaround for Monster Hunter World and LLVM 7&8
The load/store optimizer pass doesn't handle WaW hazards correctly
and this is the root cause of the reflection issue with Monster
Hunter World. AFAIK, it's the only game that are affected by this
issue.

This is fixed with LLVM r361008, but we need a workaround for older
LLVM versions unfortunately.

Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-17 11:41:19 +02:00
Thomas Hellstrom
47afc5eed7 svga: Add an environment variable to force coherent surface memory
The vmwgfx driver supports emulated coherent surface memory as of version
2.16. Add en environtment variable to enable this functionality for
texture- and buffer maps: SVGA_FORCE_COHERENT.
This environment variable should be used for testing only.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-05-17 08:44:31 +02:00
Thomas Hellstrom
1a66ead1c7 pipebuffer, winsys/svga: Add functionality to update pb_validate_entry flags
In order to be able to add access modes to a pb_validate_entry, update
the pb_validate_add_buffer function to take a pointer hash table and also
to return whether the buffer was already on the validate list.

Update the svga winsys accordingly.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2019-05-17 08:44:31 +02:00
Thomas Hellstrom
a119da3bc9 svga: Set the rendered-to flag for dma transfers to surfaces
The rendered-to flag indicates that the HW surface content is more recent
than the content of the mob. That's the case after a SurfaceDMA transfer
to the surface.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2019-05-17 08:44:31 +02:00
Thomas Hellstrom
fb6d09764d winsys/svga: Fix RELOC_INTERNAL mob GPU access
SVGA_RELOC_INTERNAL indicates a transfer between surface and backing mob.
This means that if the GPU for example reads from the surface it writes
to the backing mob. But since the buffer mapping code allows for
simultaneous gpu- and cpu read access, a read from the surface to the mob
will not synchronize a subsequent map to the readback.

Fix this by inverting the mob access mode in a surface relocation with
SVGA_RELOC_INTERNAL set.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2019-05-17 08:44:31 +02:00
Thomas Hellstrom
eed24156ec svga: Remove the surface_invalidate winsys function
Instead unconditionally call SVGA3D_InvalidateGBSurface() since it's needed
also for Linux for dirty buffers and operation without SurfaceDMA.
For non-guest-backed operation, remove the surface cache surface invalidation
altogether.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2019-05-17 08:44:31 +02:00
Gert Wollny
0f598ed7b3 Revert "softpipe/buffer: load only as many components as the the buffer resource type provides"
This reverts commit 865b9ddae4.

The buffer always reports format PIPE_FORMAT_R8_UNORM so with this patch only
one component would be supported. The original issue is still relevant, but
the fix should be different.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-05-17 08:27:55 +02:00
Dave Airlie
b6e2a9eca7 glsl/nir: init non-static class member.
glsl_to_nir.cpp:276: uninit_member: Non-static class member "sig" is not initialized in this constructor nor in any functions that it calls.

Reported by coverity

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-05-17 12:33:09 +10:00
Dave Airlie
ebdddb36a0 imgui: fix undefined behaviour bitshift.
imgui_draw.cpp:1781: error[shiftTooManyBitsSigned]: Shifting signed 32-bit value by 31 bits is undefined behaviour

Reported by coverity

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-05-17 12:33:09 +10:00
Dave Airlie
2bfe5b8556 glsl: init non-static class member in link uniforms. (v2)
link_uniforms.cpp:477: uninit_member: Non-static class member "shader_storage_blocks_write_access" is not initialized in this constructor nor in any functions that it calls.

Reported by coverity.

v2: fix 9->0 typo (Ilia)

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-05-17 12:33:09 +10:00
Dave Airlie
b2d4d08a5c glsl: init packed in more constructors.
src/compiler/glsl_types.cpp:577: uninit_member: Non-static class member "packed" is not initialized in this constructor nor in any functions that it calls.

from Coverity.

Fixes: 659f333b3a (glsl: add packed for struct types)

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-05-17 12:33:09 +10:00
Alyssa Rosenzweig
81d3262fa5 panfrost: Cleanup leak todos
Many of these are now patched; one of them we patch here. Regardless,
this is one less thing to worry about in the code, I suppose.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-17 00:14:49 +00:00
Alyssa Rosenzweig
c65271c929 panfrost: assert(0) -> unreachable for some switch
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 23:42:33 +00:00
Nanley Chery
629806b55b anv: Fix some depth buffer sampling cases on ICL+
Don't attempt sampling with HiZ if the sampler lacks support for it. On
ICL, the HW docs state that sampling with HiZ is not supported and that
instances of AUX_HIZ in the RENDER_SURFACE_STATE object will be
interpreted as AUX_NONE.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-05-16 20:54:53 +00:00
Caio Marcelo de Oliveira Filho
ded2c202d5 nir: Only convert SSA values to regs when needed
If the SSA def produced by this instruction is only in the block in
which it is defined and is not used by ifs or phis, then we don't have
a reason to convert it to a register in
nir_lower_ssa_defs_to_regs_block().

The special case for derefs is covered by the general case, so can be
removed: at this point all derefs in the block are
materialized (i.e. the whole deref chain is in the block) and derefs
are not used in phis.

v2: Fix wrong check for if_uses.  If there's such an use, the def is
    not "local_to_block".  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-16 12:23:47 -07:00
Kenneth Graunke
4b5e8eb3c8 st/mesa: Record samplers for extra planes in info->textures_used.
Normally gl_nir_lower_samplers_as_deref records info->textures_used
for us, but this pass runs after that, attempting to assign samplers
in the same order as st_atom_texture's external_samplers_used loop
so the stars align and we get the same locations.

Since we're adding textures late, we need to amend info->textures_used.

iris uses info->textures_used to set up texture bindings; this fixes
Piglit's ext_image_dma_buf_import-sample-{nv12,yuv420,yvu420} there.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-05-16 11:54:07 -07:00
Caio Marcelo de Oliveira Filho
8a995f2b5e nir: Fix nir_opt_idiv_const when negatives are involved
First, allow the case for negative powers of two.  Then ensure that we
use the absolute value of the non-constant value to calculate the
quotient -- this was hinted in the code by the name 'uq'.

This fixes an issue when 'd' is positive and 'n' is negative.  The
ishr will propagate the negative sign and we'll use nir_ineg() again,
incorrectly.

v2: First version used only ishr, but that isn't sufficient, since it
    never can produce a zero as a result.  (Jason)
    Allow negative powers of two.  (Caio)

Fixes: 74492ebad9 "nir: Add a pass for lowering integer division by constants"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-16 10:55:03 -07:00
Eric Anholt
ef88e23d03 freedreno: Log the number of loops in the shader for shader-db.
shader-db's report.py will use this to see when we've changed loop
unrolling behavior on a shader and skip including other stats like
instruction count from being considered for that shader, since they won't
be useful as a proxy for real world performance in that case.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
2019-05-16 10:25:22 -07:00
Eric Anholt
c2e68bebb4 freedreno: Output the same shader-db format as v3d and intel.
This lets us reuse their report.py, at the expense of fd-report.py no
longer working.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
2019-05-16 10:25:20 -07:00
Eric Anholt
6d9b45171d freedreno: Remove the ir3_tgsi_to_nir() helper function.
It was more of a hindrance, as it pretended that we could compile in the
driver with a missing screen.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
2019-05-16 10:25:18 -07:00
Eric Anholt
a0d4d7febf freedreno: Fix assertion failures in context setup in shader-db mode.
The TTN path needs access to the screen to make the right decisions about
lowering, but we didn't have pctx->screen set up at fdN_prog_init time.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
2019-05-16 10:25:06 -07:00
Marek Olšák
9d1485554c ac: match radeonsi code in ac_shader_binary_read_config 2019-05-16 13:15:36 -04:00
Marek Olšák
894e017c9c r600+radeonsi: use ctx_query_reset_status on radeon
This allows a nice cleanup, because the winsys always handles it.
2019-05-16 13:15:36 -04:00
Marek Olšák
4549c36788 winsys/radeon: implement ctx_query_reset_status by copying radeonsi
To make it behave like amdgpu. I'm just trying to move this out of
radeonsi. The radeonsi code will be removed in the next commit.
2019-05-16 13:15:36 -04:00
Marek Olšák
6b3343e5d8 winsys/amdgpu: report a CS rejection as a reset only if there's no GPU reset 2019-05-16 13:15:36 -04:00
Marek Olšák
78e35df52a radeonsi: update buffer descriptors in all contexts after buffer invalidation
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108824

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
2019-05-16 13:15:36 -04:00
Marek Olšák
0f1b070bad radeonsi: remove old_va parameter from si_rebind_buffer by remembering offsets
This is a prerequisite for the next commit.

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
2019-05-16 13:14:55 -04:00
Marek Olšák
f3ae455eb0 radeonsi: compute culling - flush CS to remove write references to buffers
Only read-only buffers can use compute culling.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:36 -04:00
Marek Olšák
04122532e3 radeonsi: invalidate caches at the beginning of the prim discard compute IB
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:36 -04:00
Marek Olšák
9f505ce21d radeonsi: disable primitive restart for triangles for DiRT Rally
It may decrease performance and it prevents compute-based primitive culling.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:36 -04:00
Marek Olšák
0252fb92b8 radeonsi: add primitive culling stats to the HUD
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:36 -04:00
Marek Olšák
c9b7a37b8f radeonsi: cull primitives with async compute for large draw calls
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:34 -04:00
Marek Olšák
187f1c999f winsys/amdgpu: add REWIND emulation via INDIRECT_BUFFER into cs_check_space
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:10:07 -04:00
Marek Olšák
4eb377d1c3 radeonsi: add si_vs_prolog_bits::unpack_instance_id_from_vertex_id:1
The prim discard compute shader bakes InstanceID into the output index buffer.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:10:07 -04:00
Marek Olšák
b206f007de radeonsi: make some functions non-static
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:10:07 -04:00
Marek Olšák
301344008f radeonsi: allow si_shader_select_with_key to return an optimized shader or fail
If a prim discard compute shader hasn't finished compilation, we don't want
to any shader.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:10:07 -04:00
Marek Olšák
ca9edd7cd0 radeonsi: use pipe_draw_info::instance_count indirectly
It will be modified by compute shader culling.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:10:07 -04:00
Marek Olšák
d380fabdbb radeonsi: use pipe_draw_info::prim and primitive_restart indirectly
so that the fields can be changed by the driver.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:10:07 -04:00
Marek Olšák
43aa2f4f7c radeonsi: make functions for creating LLVM functions non-static
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:10:07 -04:00
Marek Olšák
b19884e08e winsys/amdgpu: add a parallel compute IB coupled with a gfx IB
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:07:00 -04:00
Marek Olšák
eda281e977 ac: add LLVM code for triangle culling
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:58 -04:00
Marek Olšák
07c83d25fd radeonsi: add a cs parameter into si_cp_copy_data
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:57 -04:00
Marek Olšák
ce264d19a0 radeonsi: add a cs parameter into si_cp_release_mem
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:56 -04:00
Marek Olšák
9624855f13 radeonsi: add threadgroups_per_cu param into si_get_compute_resource_limits
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:54 -04:00
Marek Olšák
6e38af0631 radeonsi: move si_*_descriptors_idx functions into si_state.h
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:53 -04:00
Marek Olšák
49a016ec5d radeonsi: make si_initialize_compute reusable
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:51 -04:00
Marek Olšák
c44c6951d4 radeonsi: extract COMPUTE_RESOURCE_LIMITS code into a helper
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:49 -04:00
Marek Olšák
c7ceeea093 radeonsi: return the last part's return value from @wrapper
The primitive discard compute shader will get the position output this way.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:40 -04:00
Marek Olšák
d569b7cb31 winsys/amdgpu: always set NO_CPU_ACCESS and NO_SUBALLOC on GDS resources
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:06:18 -04:00
Jan Zielinski
d65b160e6a swr: clean up supported OGL4.0/4.1 extensions list
This commit adjusts the capabilities returned
by the SWR driver and the documentation to correctly
report the following extensions:

GL_ARB_texture_query_lod, GL_ARB_texture_cube_map_array,
GL_ARB_gpu_shader_fp64, GL_ARB_texture_gather,
GL_ARB_vertex_attrib_64bit.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-05-16 17:41:14 +02:00
Leo Liu
aa040d3b3c vl/dri3: set back buffer from output to NULL with front buffer case
Since the using output optimization is only for back buffer case

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-16 10:28:38 -04:00
Alejandro Piñeiro
16a1ef7860 docs: advice to resolve discussion on gitlab MR doc
For newcomers to gitlab, it is not evident that it is better to press
the "Resolve Discussion" button when you update your branch handling
feedback.

v2:
   * Fix several grammar nits, reorder, use new corrected text (Connor
     Abbot)
   * Use "reviewers", instead of reviewer (Eric Engestrom)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-16 16:16:32 +02:00
Roland Scheidegger
4171a26193 auxiliary/draw: fix crash with zero-stride draw auto
transform feedback draws get the number of vertices from the transform
feedback object. In draw, we'll figure this out with the number of bytes
written divided by the stride. However, it is apparently possible we end
up with a stride of 0 there (not entirely sure it could happen with GL).
Probably when nothing was actually ever written (so we don't actually
have a stride set). Just avoid the division by zero by setting the count
to 0.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2019-05-16 14:01:33 +02:00
Eric Engestrom
22c1657d05 util/os_file: always use the 'grow' mechanism
Use fstat() only to pre-allocate a big enough buffer.

This fixes a race where if the file grows between fstat() and read()
we would be missing the end of the file, and if the file slims down
read() would just fail.

Fixes: 316964709e "util: add os_read_file() helper"
Reported-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-16 12:56:25 +01:00
Lionel Landwerlin
e04cf0b612 nir: lower_non_uniform_access: iterate over instructions safely
This pass moves instructions around and adds control-flow in the
middle of blocks. We need to use nir_foreach_instr_safe to ensure that
we iterate over instructions correctly anyway.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3bd5457641 ("nir: Add a lowering pass for non-uniform resource access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-16 10:22:01 +01:00
Kenneth Graunke
752367b766 iris: Dodge more GLSL IR lowering
This avoids some lower_instructions bits in st.
2019-05-15 19:44:21 -07:00
Jason Ekstrand
fce0214e94 intel/fs/live_variables: Do compute_start_end in BITSET_WORD chunks
For a block with a contiguous chunk of 32 vars that don't need updating,
this lets us skip 32 vars at a time. Also, by using bitscan, we only
iterate for each set bit rather than testing them all one at a time.
Looking at perf (with -O0 which is unfortunately necessary to get
reasonable back-traces), this seems to cuts about 50-60% of the time
spent in compute_start_end() which is, itself about 4-6% of the
run-time. In the real world, with a release driver build, this cuts
1.34% off a full shader-db run. (I ran shader-db 5 times in each
configuration).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-16 02:14:40 +00:00
Jason Ekstrand
b2d274c677 intel/fs/ra: Choose a spill reg before throwing away the graph
Otherwise, we get an effectively random spill reg because we no longer
have the information from RA to guide us.  Also, a completely clean
graph has undefined data in in_stack which is used for choosing the
spill reg so it really is non-deterministic.

Fixes: e99081e76d "intel/fs/ra: Spill without destroying the..."
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand
c19acf321c intel/fs/ra: Add spill costs to the graph on-demand
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand
2c14e2b5bf intel/fs/ra: Add a helper for discarding the interference graph
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Alyssa Rosenzweig
46494c3dc1 nir/algebraic: Remove problematic "optimization"
This line is no longer relevant now that booleans are 1-bit, and in fact
causes issues (infinite progress loop between algebraic optimizations
and copy prop) with constant vector masks.

No shader-db changes on Intel platforms (Jason).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2019-05-16 02:08:37 +00:00
Alyssa Rosenzweig
74ab80b92d panfrost/midgard: Add load/store opcodes
This commit adds a bunch of new load/store opcodes, largely related to
OpenCL, as well as adjusting the name of existing opcodes to be more
uniform. The immediate effect is compute shaders are substantially
easier to interpret now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:25:25 +00:00
Alyssa Rosenzweig
f73c0b73ec panfrost/midgard: Enable integer constant inlining
Midgard ALU features two types of constants: embedded constants (128-bit
chunk, zero/one per schedule bundle) and inline constants (16-bit
splattered into the op, second source if present). Inline constants are
much more efficient from a space and scheduling freedom standpoint, so
it's desirable to inline when possible. Now that integer ops are well
understood and in use, we enable inlining of integers constants in
addition to floats (which have been inlined since forever).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:20:41 +00:00
Alyssa Rosenzweig
8214aaa3c8 panfrost/midgard: Remove imov workaround
The previous commit fixes the issue this patched around.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:20:41 +00:00
Alyssa Rosenzweig
0a13babdd8 panfrost/midgard: Set int outmod for ops writing integers
By default, the "normal" output modifier is set on ALU ops. This is the
correct default for float outputs -- for floats, it preserves the semantic
value. Unfortunately, when used with integers, it does not preserve the
bitstream encoding, causing misbehaviour. (It's an open question what
happens when `normal` is used with integers -- does it apply some other
transformation? or does it do floating point normalization/etc on the
ints as if they were floats?).

Instead, we default to the "clamp to integer" output modifier for
ops writing integers. Semantically, this makes sense (clamping an
integer to the nearest integer is the identity function). In the
hardware with an integer opcode, this is the actual "normal".

This fixes numerous sporadic and sometimes bizarre bugs relating to
integers, especially integer moves. With this in place, we no longer
care about the types involved; it's just bits on the wire again.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:20:30 +00:00
Alyssa Rosenzweig
81b1053d9b panfrost: Set custom stride for textures when necessary
From Gallium (and our) perspective, the stride of a BO is arbitrary. For
internal buffers, we can make it something nice, but for imported linear
buffers (e.g. EGL clients), we don't always have that luxury. To cope,
we calculate the expected stride of a texture, compare it to the BO's
actual reported stride, and if they differ, set the latter as a custom
stride.

Fixes rendering of windows not on tile boundaries (noticeable in Weston
with es2gears_wayland, for instance). Also, this should fix stride
issues with bufer reloading.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:16:36 +00:00
Alyssa Rosenzweig
cea9352059 panfrost/decode: Stride decoding
With a special flag, texture descriptors can include custom stride(s).
We haven't seen a case of this used for mipmaps/cubemaps, so it's not
clear how that will be encoded, but this dumps correctly for single
one-level 2D textures.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:15:37 +00:00
Alyssa Rosenzweig
d699ffbf0e panfrost/decode: Futureproof texture dumping
One field was not dumped for some reason. It's observed to be 0, but
it's still good to have it available.

Also, extra fields might be snuck in the bitmaps array (it's
variable-lengthed at the end), and we want to guard against that
possibility, so we dump a little more.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-16 01:15:37 +00:00
Marek Olšák
ccfcb9d818 ac: rename SI-CIK-VI to GFX6-GFX7-GFX8
Acked-by: Dave Airlie <airlied@redhat.com>

We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.

It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
2019-05-15 20:54:10 -04:00
Marek Olšák
e5cc363f43 ac: add comments to chip enums
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (except GFX2 changes)
Reviewed-by: Dave Airlie <airlied@redhat.com> (except <= GFX5 changes)
2019-05-15 20:54:10 -04:00
Anuj Phogat
a42163cbbc compiler: Add lowering support for 64-bit saturate operations to software
Fixes 7 Khronos GL CTS tests:
KHR-GL45.gpu_shader_fp64.builtin.smoothstep_dvec{double, 2, 3, 4}
KHR-GL45.gpu_shader_fp64.builtin.smoothstep_against_scalar_dvec{2, 3, 4}

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-15 23:30:30 +00:00
Kenneth Graunke
d305409db5 st/dri: Minor style fixes
Trivial.
2019-05-15 14:49:14 -07:00
Chia-I Wu
659c5800e5 virgl: handle DONT_BLOCK and MAP_DIRECTLY
Handle PIPE_TRANSFER_DONT_BLOCK and PIPE_TRANSFER_MAP_DIRECTLY.
Make virgl_resource_transfer_prepare return an enum instead of a
bool for extensibility (e.g., instruct the callers to map
differently).

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-05-15 20:51:28 +00:00
Chia-I Wu
e87186fc67 virgl: add virgl_resource_transfer_prepare
virgl_resource_transfer_prepare should be called before mapping to
prepare the resource.  It does flush, readback, and wait as needed.
virgl_res_needs_flush and virgl_res_needs_readback become internal
helpers to the new function.

There should be no externally visible change.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-05-15 20:51:28 +00:00
Chia-I Wu
cdcf38b98a virgl: honor DISCARD_WHOLE_RESOURCE in virgl_res_needs_readback
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-05-15 20:51:28 +00:00
Chia-I Wu
a62ab178ce virgl: clean up virgl_res_needs_readback
Add comments and follow the coding style of virgl_res_needs_flush.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-05-15 20:51:28 +00:00
Lionel Landwerlin
391a836e8f nir: fix lower_non_uniform_access pass
Obviously missing the instruction insertion into the SSA list.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3bd5457641 ("nir: Add a lowering pass for non-uniform resource access")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-15 18:15:20 +00:00
Alex Villacís Lasso
b2200514af gbm: gbm_bo_get_handle_for_plane fallback to nonplanar handle
Commit f9567ab435 (gbm: Export a getter for per
plane handles) contains an API version check that fails on i915 (API version 7
vs. check for minimum API version 13). Any client that migrates to the planar
API will start failing on i915 (see https://gitlab.gnome.org/GNOME/mutter/issues/127
for mutter, and https://bugs.freedesktop.org/show_bug.cgi?id=108487 for weston).

This commit adds a fallback for plane 0 when the API check fails and returns the
non-planar handle in this scenario, making the call equivalent to
gbm_bo_get_handle(). This is enough for weston 6.0.0 to start working again on an
i915 system.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108487

Signed-off-by: Alex Villacís Lasso <a_villacis@palosanto.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2019-05-15 18:27:30 +01:00
Alyssa Rosenzweig
a9cef4f0e5 gallium: Add default check for PIPE_CAP_FRAGMENT_SHADER_INTERLOCK
Fixes: c704c0226 ("gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK")

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 21:34:49 -07:00
Andrii Kryvytskyi
eca53f00aa iris: Check if resource has stencil before returning it
Signed-off-by: Andrii Kryvytskyi <andrii.o.kryvytskyi@globallogic.com>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 21:16:11 -07:00
Jordan Justen
49958c4b5d i965/blorp: Set MOCS for gen11 in blorp_alloc_vertex_buffer
v2:
 * Add build error for gen > 6 if MOCS is not set. (Lionel)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-05-14 19:57:01 -07:00
Kenneth Graunke
bb5db02bab iris: Enable fragment shader interlock on Gen9+.
There's some debate about whether we should support this on older
hardware as well.  Currently i965 turns it off on Gen8- though, so
we follow suit.  If this changes, we can update this as well.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-14 19:34:33 -07:00
Kenneth Graunke
c704c0226c gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK.
Corresponding to GL_ARB_fragment_shader_interlock and
GL_NV_fragment_shader_interlock.  Currently, only the NIR paths
support this functionality, but someone could conceivably add it
to TGSI too.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-14 19:34:29 -07:00
Dave Airlie
4efd04ab18 intel/compiler: use bitset instead of opencoding a 32-bit bitset. (v2)
In the future I want to expand this to 128-bits, for vec16 support, so
lets just put the code in place to use bitset ranges now.

v2: just declare the bitset to be the max of what we should ever see
and change assert to reflect it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-15 07:10:34 +10:00
Dave Airlie
3b2c433167 intel/compiler: remove repeated bit_size / 8 in brw mem lowering pass.
Just use a variable already.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-15 07:10:30 +10:00
Kenneth Graunke
646924cfa1 intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8
Our tessellation control shaders can be dispatched in several modes.

- SINGLE_PATCH (Gen7+) processes a single patch per thread, with each
  channel corresponding to a different patch vertex.  PATCHLIST_N will
  launch (N / 8) threads.  If N is less than 8, some channels will be
  disabled, leaving some untapped hardware capabilities.  Conditionals
  based on gl_InvocationID are non-uniform, which means that they'll
  often have to execute both paths.  However, if there are fewer than
  8 vertices, all invocations will happen within a single thread, so
  barriers can become no-ops, which is nice.  We also burn a maximum
  of 4 registers for ICP handles, so we can compile without regard for
  the value of N.  It also works in all cases.

- DUAL_PATCH mode processes up to two patches at a time, where the first
  four channels come from patch 1, and the second group of four come
  from patch 2.  This tries to provide better EU utilization for small
  patches (N <= 4).  It cannot be used in all cases.

- 8_PATCH mode processes 8 patches at a time, with a thread launched per
  vertex in the patch.  Each channel corresponds to the same vertex, but
  in each of the 8 patches.  This utilizes all channels even for small
  patches.  It also makes conditions on gl_InvocationID uniform, leading
  to proper jumps.  Barriers, unfortunately, become real.  Worse, for
  PATCHLIST_N, the thread payload burns N registers for ICP handles.
  This can burn up to 32 registers, or 1/4 of our register file, for
  URB handles.  For Vulkan (and DX), we know the number of vertices at
  compile time, so we can limit the amount of waste.  In GL, the patch
  dimension is dynamic state, so we either would have to waste all 32
  (not reasonable) or guess (badly) and recompile.  This is unfortunate.
  Because we can only spawn 16 thread instances, we can only use this
  mode for PATCHLIST_16 and smaller.  The rest must use SINGLE_PATCH.

This patch implements the new 8_PATCH TCS mode, but leaves us using
SINGLE_PATCH by default.  A new INTEL_DEBUG=tcs8 flag will switch to
using 8_PATCH mode for testing and benchmarking purposes.  We may
want to consider using 8_PATCH mode in Vulkan in some cases.

The data I've seen shows that 8_PATCH mode can be more efficient in
some cases, but SINGLE_PATCH mode (the one we use today) is faster
in other cases.  Ultimately, the TES matters much more than the TCS
for performance, so the decision may not matter much.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:30 -07:00
Kenneth Graunke
076159b40b intel/compiler: Move ICP handle fetching into a helper function.
This will be significantly different in 8_PATCH mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:28 -07:00
Kenneth Graunke
3d84fd29e8 intel/compiler: Don't repeat dispatch max fixing condition
Having a single flag will keep both places in sync if the condition
gets more complicated.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:27 -07:00
Kenneth Graunke
f0d52cf2b0 intel/compiler: Rename invocation_id_mask to instance_id_mask
The payload field is actually "instance" (thread number), which is used
to calculate the invocation ID.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:25 -07:00
Kenneth Graunke
d86260719e intel/compiler: Refactor TCS invocation ID setup into a helper
When we add 8_PATCH mode, this will get a bit more complex, so we may
as well start by putting it in a helper function.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:24 -07:00
Kenneth Graunke
381c2aded2 i965: Pass compiler to default key populators
This lets us get devinfo and other misc. compiler settings.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:21 -07:00
Marek Olšák
6b0b8f132a ac: use 1D GEPs for descriptors and constants
just a cleanup

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-14 15:15:11 -04:00
Marek Olšák
67b4785958 mesa: fix _mesa_max_texture_levels for GL_TEXTURE_EXTERNAL_OES
This helps fix:
    piglit/bin/ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto

Fixes: d88f3392ff

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 15:15:11 -04:00
Eric Anholt
e5db87b00b freedreno: Restore msm_drm.h to a pristine "make headers_install" copy.
This diverged back in f1374805a8 ("drm-uapi: use local files, not system
libdrm") to point at drm-uapi's copy, which we don't need now that we're
actually in drm-uapi.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-05-14 11:51:57 -07:00
Eric Anholt
18d11cb4dc freedreno: Move msm_drm.h to the same spot as other DRM uapi.
The new location matches other drivers, and has a README about the rules
for updating it.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-05-14 11:51:55 -07:00
Ian Romanick
32d259713b nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions
The goal is to avoid having an extra MOV instruction to perform the
saturate.  Doing the subtraction first allows the saturate to be applied
to the ADD instruction making the MOV unnecessary.  Values generated in
different block and values from non-ALU instructions (e.g., texture
instructions) almost always need the extra MOV.

Multiply instructions are restricted because doing this rearrangement
can interfere with the generation of flrp and ffma instructions.

v2: Now that the final method has been selected, squash three commits
into one.

All Intel platforms has similar results. (Ice Lake shown)
total instructions in shared programs: 17223214 -> 17219386 (-0.02%)
instructions in affected programs: 1524376 -> 1520548 (-0.25%)
helped: 2686
HURT: 26
helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37%
HURT stats (abs)   min: 1 max: 2 x̄: 1.69 x̃: 2
HURT stats (rel)   min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35%
95% mean confidence interval for instructions value: -1.46 -1.36
95% mean confidence interval for instructions %-change: -0.56% -0.50%
Instructions are helped.

total cycles in shared programs: 360811571 -> 360791896 (<.01%)
cycles in affected programs: 103650214 -> 103630539 (-0.02%)
helped: 1557
HURT: 675
helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16
helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64%
HURT stats (abs)   min: 1 max: 1513 x̄: 66.44 x̃: 14
HURT stats (rel)   min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49%
95% mean confidence interval for cycles value: -14.82 -2.81
95% mean confidence interval for cycles %-change: -0.50% -0.20%
Cycles are helped.

LOST:   2
GAINED: 0

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:23 -07:00
Ian Romanick
a7f0c57673 nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1)
v2: Fix copy-and-paste bug in a cmp b vs b cmp a cases.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224337 -> 17224269 (<.01%)
instructions in affected programs: 13578 -> 13510 (-0.50%)
helped: 68
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.31% max: 3.12% x̄: 0.84% x̃: 0.42%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.05% -0.63%
Instructions are helped.

total cycles in shared programs: 360826090 -> 360825137 (<.01%)
cycles in affected programs: 94867 -> 93914 (-1.00%)
helped: 58
HURT: 1
helped stats (abs) min: 2 max: 28 x̄: 17.74 x̃: 18
helped stats (rel) min: 0.08% max: 3.17% x̄: 1.39% x̃: 1.22%
HURT stats (abs)   min: 76 max: 76 x̄: 76.00 x̃: 76
HURT stats (rel)   min: 2.86% max: 2.86% x̄: 2.86% x̃: 2.86%
95% mean confidence interval for cycles value: -19.53 -12.78
95% mean confidence interval for cycles %-change: -1.56% -1.08%
Cycles are helped.

No changes on any other Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:23 -07:00
Ian Romanick
281f20e26d nir/algebraic: Strip double negatives from comparison sources
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224623 -> 17224337 (<.01%)
instructions in affected programs: 32648 -> 32362 (-0.88%)
helped: 148
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.93 x̃: 2
helped stats (rel) min: 0.16% max: 2.74% x̄: 1.07% x̃: 1.08%
95% mean confidence interval for instructions value: -1.97 -1.89
95% mean confidence interval for instructions %-change: -1.15% -1.00%
Instructions are helped.

total cycles in shared programs: 360828714 -> 360826090 (<.01%)
cycles in affected programs: 347416 -> 344792 (-0.76%)
helped: 148
HURT: 26
helped stats (abs) min: 1 max: 426 x̄: 26.33 x̃: 18
helped stats (rel) min: 0.03% max: 15.10% x̄: 1.78% x̃: 1.41%
HURT stats (abs)   min: 2 max: 337 x̄: 48.96 x̃: 6
HURT stats (rel)   min: 0.04% max: 18.82% x̄: 2.15% x̃: 0.27%
95% mean confidence interval for cycles value: -23.78 -6.38
95% mean confidence interval for cycles %-change: -1.59% -0.79%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
45c7ff95fc intel/compiler: Repeat nir_opt_algebraic_late
A tiny bit of help seems to come from nir_copy_prop.  Future patches
will benefit from this change.

Doing more copy propagation on the vec4 backend led to a disaster in
hurt cycles.

v2: Fix typo in comment.  Noticed by Matt.

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224634 -> 17224623 (<.01%)
instructions in affected programs: 4586 -> 4575 (-0.24%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.36% -0.19%
Instructions are helped.

total cycles in shared programs: 360828542 -> 360828714 (<.01%)
cycles in affected programs: 151159 -> 151331 (0.11%)
helped: 49
HURT: 28
helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6
helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42%
HURT stats (abs)   min: 1 max: 196 x̄: 52.36 x̃: 15
HURT stats (rel)   min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88%
95% mean confidence interval for cycles value: -13.48 17.95
95% mean confidence interval for cycles %-change: -0.69% 0.84%
Inconclusive result (value mean confidence interval includes 0).

Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13529544 -> 13529542 (<.01%)
instructions in affected programs: 358 -> 356 (-0.56%)
helped: 2
HURT: 0

total cycles in shared programs: 357290311 -> 357289678 (<.01%)
cycles in affected programs: 178324 -> 177691 (-0.35%)
helped: 48
HURT: 40
helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13
helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66%
HURT stats (abs)   min: 1 max: 224 x̄: 22.00 x̃: 6
HURT stats (rel)   min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31%
95% mean confidence interval for cycles value: -18.28 3.89
95% mean confidence interval for cycles %-change: -1.01% 0.32%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8159110 -> 8158980 (<.01%)
instructions in affected programs: 22719 -> 22589 (-0.57%)
helped: 65
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74%
95% mean confidence interval for instructions value: -2.06 -1.94
95% mean confidence interval for instructions %-change: -0.78% -0.68%
Instructions are helped.

total cycles in shared programs: 188609448 -> 188609214 (<.01%)
cycles in affected programs: 1875852 -> 1875618 (-0.01%)
helped: 109
HURT: 104
helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4
helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07%
HURT stats (abs)   min: 2 max: 20 x̄: 3.31 x̃: 2
HURT stats (rel)   min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02%
95% mean confidence interval for cycles value: -1.95 -0.25
95% mean confidence interval for cycles %-change: -0.04% -0.01%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
d2a9ba03e3 Revert "nir: add late opt to turn inot/b2f combos back to bcsel"
This reverts commit 7acc865226.

With these optimizations in place, the extra constant folding added in
the next commit extends some live ranges of 0.0 and ±1.0 constants, and
that causes several hundred shaders to have more spills and fills.

I believe this optimization we made basically irrelevant by 7725d60938
"intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))".

All Gen7.5+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17225303 -> 17224634 (<.01%)
instructions in affected programs: 879402 -> 878733 (-0.08%)
helped: 679
HURT: 1
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45%
95% mean confidence interval for instructions value: -1.02 -0.95
95% mean confidence interval for instructions %-change: -0.26% -0.22%
Instructions are helped.

total cycles in shared programs: 360842595 -> 360828542 (<.01%)
cycles in affected programs: 110443594 -> 110429541 (-0.01%)
helped: 389
HURT: 265
helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28
helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11%
HURT stats (abs)   min: 1 max: 7614 x̄: 185.96 x̃: 48
HURT stats (rel)   min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10%
95% mean confidence interval for cycles value: -75.65 32.67
95% mean confidence interval for cycles %-change: -0.49% -0.06%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 12159 -> 12161 (0.02%)
spills in affected programs: 13 -> 15 (15.38%)
helped: 0
HURT: 1

total fills in shared programs: 25207 -> 25208 (<.01%)
fills in affected programs: 25 -> 26 (4.00%)
helped: 0
HURT: 1

Ivy Bridge
total instructions in shared programs: 12082019 -> 12082013 (<.01%)
instructions in affected programs: 1033 -> 1027 (-0.58%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.78% -0.45%
Instructions are helped.

total cycles in shared programs: 179849270 -> 179849157 (<.01%)
cycles in affected programs: 4735 -> 4622 (-2.39%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18
helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36%
95% mean confidence interval for cycles value: -82.73 26.23
95% mean confidence interval for cycles %-change: -7.98% 2.28%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10882750 -> 10882748 (<.01%)
instructions in affected programs: 266 -> 264 (-0.75%)
helped: 2
HURT: 0

Iron Lake
total cycles in shared programs: 188609440 -> 188609448 (<.01%)
cycles in affected programs: 4320 -> 4328 (0.19%)
helped: 0
HURT: 2

GM45
total cycles in shared programs: 129016868 -> 129016872 (<.01%)
cycles in affected programs: 2302 -> 2306 (0.17%)
helped: 0
HURT: 1

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
3cb091f8b4 nir/algebraic: Eliminate a tautological compare
The value-range tracking pass that is coming is not clever enough to
know that the result of the ffma must be non-negative.  Making it that
smart will require quite a bit of work.  It might be possible to add a
special case that detects that a whole tree of fadd(fmul(fsat(a),
fneg(fsat(a))), 1.0) cannot be negative.

For cases when the comparison is used in the domain guard for a
square-root (see nir/algebraic: Simplify fsqrt domain guard), the
compare may be converted to a fmax.  This patch also handles that case.

All of the affected cases are in DiRT: Showdown.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17225365 -> 17225303 (<.01%)
instructions in affected programs: 40051 -> 39989 (-0.15%)
helped: 62
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.07% max: 0.66% x̄: 0.27% x̃: 0.26%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.31% -0.22%
Instructions are helped.

total cycles in shared programs: 360842788 -> 360842595 (<.01%)
cycles in affected programs: 1818081 -> 1817888 (-0.01%)
helped: 29
HURT: 22
helped stats (abs) min: 1 max: 206 x̄: 20.66 x̃: 14
helped stats (rel) min: <.01% max: 9.55% x̄: 0.87% x̃: 0.42%
HURT stats (abs)   min: 1 max: 108 x̄: 18.45 x̃: 7
HURT stats (rel)   min: <.01% max: 4.48% x̄: 0.56% x̃: 0.19%
95% mean confidence interval for cycles value: -14.48 6.91
95% mean confidence interval for cycles %-change: -0.71% 0.21%
Inconclusive result (value mean confidence interval includes 0).

No changes on any other Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
9725e45b3d nir/algebraic: Simplify fsqrt domain guard
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17228376 -> 17225365 (-0.02%)
instructions in affected programs: 280732 -> 277721 (-1.07%)
helped: 1072
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.81 x̃: 2
helped stats (rel) min: 0.16% max: 5.10% x̄: 1.43% x̃: 1.07%
95% mean confidence interval for instructions value: -2.92 -2.70
95% mean confidence interval for instructions %-change: -1.48% -1.37%
Instructions are helped.

total cycles in shared programs: 360935690 -> 360842788 (-0.03%)
cycles in affected programs: 7838017 -> 7745115 (-1.19%)
helped: 1569
HURT: 69
helped stats (abs) min: 1 max: 1198 x̄: 63.53 x̃: 20
helped stats (rel) min: 0.06% max: 26.17% x̄: 3.44% x̃: 2.12%
HURT stats (abs)   min: 1 max: 2820 x̄: 98.22 x̃: 47
HURT stats (rel)   min: 0.05% max: 16.67% x̄: 3.50% x̃: 2.31%
95% mean confidence interval for cycles value: -63.55 -49.89
95% mean confidence interval for cycles %-change: -3.33% -2.96%
Cycles are helped.

No changes on any other platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
e2ad047779 nir/search: Don't compare 8-bit or 1-bit constants with floats
Without this, adding an algebraic rule like

   (('bcsel', ('flt', a, 0.0), 0.0, ...), ...),

will cause assertion failures inside nir_src_comp_as_float in
GTF-GL46.gtf21.GL.lessThan.lessThan_vec3_frag (and related tests) from
the OpenGL CTS and shaders/closed/steam/witcher-2/511.shader_test from
shader-db.

All of these cases have some code that ends up like

   ('bcsel', ('flt', a, 0.0), 'b@1', ...)

When the 'b@1' is tested, nir_src_comp_as_float fails because there's
no such thing as a 1-bit float.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
5116646a76 nir/algebraic: Recognize open-coded fsat with modifiers
This change also enables a later change (nir/algebraic: Replace
1-fsat(a) with fsat(1-a)) to affect more shaders.

Almost all of the affected shaders are in Bioshock Infinite, and all of
those shaders all require GLSL 4.10.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17228584 -> 17228376 (<.01%)
instructions in affected programs: 31438 -> 31230 (-0.66%)
helped: 105
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.98 x̃: 1
helped stats (rel) min: 0.08% max: 1.53% x̄: 0.73% x̃: 0.70%
95% mean confidence interval for instructions value: -2.20 -1.76
95% mean confidence interval for instructions %-change: -0.80% -0.67%
Instructions are helped.

total cycles in shared programs: 360936431 -> 360935690 (<.01%)
cycles in affected programs: 420100 -> 419359 (-0.18%)
helped: 71
HURT: 21
helped stats (abs) min: 1 max: 160 x̄: 19.28 x̃: 10
helped stats (rel) min: <.01% max: 9.78% x̄: 0.95% x̃: 0.48%
HURT stats (abs)   min: 1 max: 198 x̄: 29.90 x̃: 10
HURT stats (rel)   min: 0.05% max: 8.36% x̄: 1.24% x̃: 0.90%
95% mean confidence interval for cycles value: -16.77 0.66
95% mean confidence interval for cycles %-change: -0.85% -0.06%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
c769641c8e nir/algebraic: Push unary operations into source operands of fsat source
Pushing a unary operation, like fneg, into the operation that generates
its operand allows the fsat to be applied to the inner instruction
instead of on a separate instruction that performs the unary operation.
This changes

    fmul	ssa_100, ssa_99, ssa_98
    fmov.sat	ssa_101, -ssa_100

into

    fmul.sat	ssa_100, -ssa_99, ssa_98

Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown)
total instructions in shared programs: 17228658 -> 17228584 (<.01%)
instructions in affected programs: 3163 -> 3089 (-2.34%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.51 x̃: 2
helped stats (rel) min: 0.58% max: 9.09% x̄: 3.69% x̃: 3.51%
95% mean confidence interval for instructions value: -1.66 -1.37
95% mean confidence interval for instructions %-change: -4.37% -3.00%
Instructions are helped.

total cycles in shared programs: 360937144 -> 360936431 (<.01%)
cycles in affected programs: 24029 -> 23316 (-2.97%)
helped: 47
HURT: 2
helped stats (abs) min: 4 max: 18 x̄: 15.34 x̃: 16
helped stats (rel) min: 0.69% max: 6.18% x̄: 3.78% x̃: 4.27%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.34% max: 0.67% x̄: 0.50% x̃: 0.50%
95% mean confidence interval for cycles value: -16.05 -13.05
95% mean confidence interval for cycles %-change: -4.07% -3.15%
Cycles are helped.

All Gen7 and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs: 13536059 -> 13535884 (<.01%)
instructions in affected programs: 8797 -> 8622 (-1.99%)
helped: 150
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.40% max: 11.11% x̄: 3.51% x̃: 1.96%
95% mean confidence interval for instructions value: -1.23 -1.11
95% mean confidence interval for instructions %-change: -3.97% -3.05%
Instructions are helped.

total cycles in shared programs: 357696119 -> 357694193 (<.01%)
cycles in affected programs: 50216 -> 48290 (-3.84%)
helped: 109
HURT: 14
helped stats (abs) min: 2 max: 92 x̄: 18.97 x̃: 16
helped stats (rel) min: 0.26% max: 19.09% x̄: 7.37% x̃: 5.37%
HURT stats (abs)   min: 2 max: 26 x̄: 10.14 x̃: 5
HURT stats (rel)   min: 0.18% max: 4.73% x̄: 1.84% x̃: 0.92%
95% mean confidence interval for cycles value: -19.27 -12.05
95% mean confidence interval for cycles %-change: -7.34% -5.31%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
3b74790941 nir/algebraic: Recognize open-coded flrp(a, b, fsat(c))
All Gen6+ GPUs had similar results. (Skylake shown)
total instructions in shared programs: 15336712 -> 15336622 (<.01%)
instructions in affected programs: 3952 -> 3862 (-2.28%)
helped: 24
HURT: 0
helped stats (abs) min: 3 max: 5 x̄: 3.75 x̃: 4
helped stats (rel) min: 1.75% max: 2.70% x̄: 2.34% x̃: 2.46%
95% mean confidence interval for instructions value: -4.06 -3.44
95% mean confidence interval for instructions %-change: -2.47% -2.22%
Instructions are helped.

total cycles in shared programs: 355722052 -> 355721235 (<.01%)
cycles in affected programs: 27326 -> 26509 (-2.99%)
helped: 20
HURT: 4
helped stats (abs) min: 1 max: 227 x̄: 44.75 x̃: 14
helped stats (rel) min: 0.12% max: 22.95% x̄: 3.83% x̃: 1.23%
HURT stats (abs)   min: 2 max: 64 x̄: 19.50 x̃: 6
HURT stats (rel)   min: 0.21% max: 3.63% x̄: 1.24% x̃: 0.55%
95% mean confidence interval for cycles value: -61.61 -6.47
95% mean confidence interval for cycles %-change: -5.59% -0.39%
Cycles are helped.

No changes on Ice Lake, Iron Lake, or GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:21 -07:00
Ian Romanick
a79570099b intel/fs: Allow cmod propagation to instructions with saturate modifier
v2: Add unit tests.  Suggested by Matt.

All Intel GPUs had similar results. (Ice Lake shown)
total instructions in shared programs: 17229441 -> 17228658 (<.01%)
instructions in affected programs: 159574 -> 158791 (-0.49%)
helped: 489
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.60 x̃: 1
helped stats (rel) min: 0.07% max: 2.70% x̄: 0.61% x̃: 0.59%
95% mean confidence interval for instructions value: -1.72 -1.48
95% mean confidence interval for instructions %-change: -0.64% -0.58%
Instructions are helped.

total cycles in shared programs: 360944149 -> 360937144 (<.01%)
cycles in affected programs: 1072195 -> 1065190 (-0.65%)
helped: 254
HURT: 27
helped stats (abs) min: 2 max: 234 x̄: 30.51 x̃: 9
helped stats (rel) min: 0.04% max: 8.99% x̄: 0.75% x̃: 0.24%
HURT stats (abs)   min: 2 max: 83 x̄: 27.56 x̃: 24
HURT stats (rel)   min: 0.09% max: 3.79% x̄: 1.28% x̃: 1.16%
95% mean confidence interval for cycles value: -30.11 -19.75
95% mean confidence interval for cycles %-change: -0.70% -0.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
2019-05-14 11:38:21 -07:00
Ian Romanick
a7724b1cbb nir/algebraic: Add missing ffma(-1, a, b) pattern
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17229439 -> 17229377 (<.01%)
instructions in affected programs: 9859 -> 9797 (-0.63%)
helped: 41
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 1.51 x̃: 1
helped stats (rel) min: 0.08% max: 11.54% x̄: 1.65% x̃: 0.67%
95% mean confidence interval for instructions value: -1.88 -1.14
95% mean confidence interval for instructions %-change: -2.48% -0.81%
Instructions are helped.

total cycles in shared programs: 360944145 -> 360942989 (<.01%)
cycles in affected programs: 178167 -> 177011 (-0.65%)
helped: 36
HURT: 19
helped stats (abs) min: 1 max: 222 x̄: 38.03 x̃: 5
helped stats (rel) min: 0.01% max: 31.01% x̄: 4.01% x̃: 0.45%
HURT stats (abs)   min: 1 max: 34 x̄: 11.21 x̃: 6
HURT stats (rel)   min: 0.03% max: 2.74% x̄: 0.72% x̃: 0.50%
95% mean confidence interval for cycles value: -36.01 -6.02
95% mean confidence interval for cycles %-change: -4.18% -0.57%
Cycles are helped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:03 -07:00
Ian Romanick
7b4ff6a1af nir: Mark ffma as 2src_commutative
This doesn't make any real difference now, but future work (not in this
series) will add a LOT of ffma patterns.  Having to duplicate all of
them for ffma(a, b, c) and ffma(b, a, c) is just terrible.

No shader-db changes on any Intel platform.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:02 -07:00
Ian Romanick
e049a9c92b nir: Add support for 2src_commutative ops that have 3 sources
v2: Instead of handling 3 sources as a special case, generalize with
loops to N sources.  Suggested by Jason.

v3: Further generalize by only checking that number of sources is >= 2.
Suggested by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:02 -07:00
Ian Romanick
ede45bf9cf nir: Rename commutative to 2src_commutative
The meaning of the new name is that the first two sources are
commutative.  Since this is only currently applied to two-source
operations, there is no change.

A future change will mark ffma as 2src_commutative.

It is also possible that future work will add 3src_commutative for
opcodes like fmin3.

v2: s/commutative_2src/2src_commutative/g.  I had originally considered
this, but I discarded it because I did't want to deal with identifiers
that (should) start with 2.  Jason suggested it in review, so we decided
that _2src_commutative would be used in nir_opcodes.py.  Also add some
comments documenting what 2src_commutative means.  Also suggested by
Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:02 -07:00
Jason Ekstrand
e99081e76d intel/fs/ra: Spill without destroying the interference graph
Instead of re-building the interference graph every time we spill, we
modify it in place so we can avoid recalculating liveness and the whole
O(n^2) interference graph building process.  We make a simplifying
assumption in order to do so which is that all spill/fill temporary
registers live for the entire duration of the instruction around which
we're spilling.  This isn't quite true because a spill into the source
of an instruction doesn't need to interfere with its destination, for
instance.  Not re-calculating liveness also means that we aren't
adjusting spill costs based on the new liveness.  The combination of
these things results in a bit of churn in spilling.  It takes a large
cut out of the run-time of shader-db on my laptop.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311224 -> 15311360 (<.01%)
    instructions in affected programs: 77027 -> 77163 (0.18%)
    helped: 11
    HURT: 18

    total cycles in shared programs: 355544739 -> 355830749 (0.08%)
    cycles in affected programs: 203273745 -> 203559755 (0.14%)
    helped: 234
    HURT: 190

    total spills in shared programs: 12049 -> 12042 (-0.06%)
    spills in affected programs: 2465 -> 2458 (-0.28%)
    helped: 9
    HURT: 16

    total fills in shared programs: 25112 -> 25165 (0.21%)
    fills in affected programs: 6819 -> 6872 (0.78%)
    helped: 11
    HURT: 16

    Total CPU time (seconds): 2469.68 -> 2360.22 (-4.43%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
147665d0a2 intel/fs/ra: Put the VGRFs at the end of the nodes
This is slightly less convenient in some places but it will make it much
easier when we want to start adding nodes dynamically.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
e7b7d572b3 intel/fs/ra: Re-arrange interference setup
The old code was arranged by the type of interference being added.  It
would set up payload registers and then add payload interference for all
VGRFs.  It would set up MRFs and add MRF interference for all VGRFs.
This commit re-arranges things to be organized differently.  It first
creates and sets up all RA nodes and then groups interference into two
new categories:  live range and instruction interference.  Once all the
RA nodes have been set up, it walks the list of VGRFs and sets up their
live range interference and then walks the list of instructions and sets
up instruction interference.  This new arrangement will be advantageous
for a future patch but, at the moment, it cuts 2% off the run-time of
shader-db on my laptop.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311224 -> 15311224 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355544739 -> 355544739 (0.00%)
    cycles in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    Total CPU time (seconds): 2523.45 -> 2469.68 (-2.13%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
0fd60e95fb intel/fs/ra: Do the spill loop inside RA
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
47b1dcdcab intel/fs/ra: Only add MRF hack interference if we're spilling
The only use of the MRF hack these days is for spilling and there we
don't need the precise MRF usage information.  If we're spilling then we
know pretty well how many MRFs are going to be used.  It is possible if
the only things that are spilled have fewer SIMD channels than the
dispatch width of the shader that this may be more MRFs than needed.
That's a risk we're willing to takd.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311224 (<.01%)
    instructions in affected programs: 16664 -> 16788 (0.74%)
    helped: 1
    HURT: 5

    total cycles in shared programs: 355543197 -> 355544739 (<.01%)
    cycles in affected programs: 731864 -> 733406 (0.21%)
    helped: 3
    HURT: 6

The hurt shaders are all SIMD32 compute shaders where we reserve enough
space for a 32-wide spill/fill but don't need it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
69878a9bb0 intel/fs/ra: Pull the guts of RA into its own class
This accomplishes two things.  First, it makes interfaces which are
really private to RA private to RA.  Second, it gives us a place to
store some common stuff as we go through the algorithm.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
9e00a251be intel/fs/ra: Move assign_regs further down in the file
It's the main function from which all the other functions are called.
It belongs at the bottom.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
5d9ac57c8c intel/fs/ra: Split building the interference graph into a helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
472ef2f98d intel/fs/ra: Initialize grf_used with first_non_payload_grf
There's no reason why we need to use the calculated payload_node_count
value which is just first_non_payload_grf aligned up.  The grf_used
value will be aligned up to 16 anyway (which is a much bigger alignment)
before being handed off to hardware.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
096ad8a809 intel/fs/ra: Stop adding RA interference to too many SENDS nodes
We only have one node per VGRF so this was adding way too much
interference.  No idea how we didn't catch this before.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311100 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355468050 -> 355543197 (0.02%)
    cycles in affected programs: 2472492 -> 2547639 (3.04%)
    helped: 17
    HURT: 20

Fixes: 014edff0d2 "intel/fs: Add interference between SENDS sources"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
5911abd76f util/ra: Assert nodes are in-bounds in add_node_interference
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
88cac12230 intel/fs/ra: Only add dest interference to sources that exist
Fixes: 83dedb6354 "i965: Add src/dst interference for certain"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
e291cd8a7e util/ra: Don't destroy the graph in ra_allocate()
We want to be able to call ra_allocate() and, when it fails, mutate the
graph and try again rather than re-building the graph from scratch.
This commit moves all the scratch bits except the final register
allocation (which is really an out value not scratch) into sub-structs
named "tmp" to make it clear which things are scratch.  It also adds
bits to the ra_select() initialization loop to initialize things (since
we can't trust rzalloc anymore) and copy q_test and forced_reg over.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
9040215f5d util/ra: Add a helper for resetting a node's interference
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
698bb9b984 util/ra: Add helpers for adding nodes to an interference graph
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
6c0f75c953 util/ralloc: Add helpers for growing zero-initialized memory
Unfortunately, we can't quite follow the standard C conventions for
these because ralloc doesn't know the sizes of pointers.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
6212326941 intel/fs: Stop doing extra RA calls
In the last phase of the schedule and RA loop, the RA call is redundant
if we spill.  Immediately afterwards, we're going to see that we
couldn't allocate without spilling and call back into RA and tell it to
go ahead and spill.  We've known about it for a while but we've always
brushed over it on the theory that, if you're going to spill, you'll be
calling RA a bunch anyway and what does one extra RA hurt?  As it turns
out, it hurts more than you'd expect.  Because the RA interference graph
gets sparser with each spill and the RA algorithm is more efficient on
sparser graphs, the RA call that we're duplicating is actually the most
expensive call in the RA-and-spill loop.

There's another extra RA call we do that's a bit harder to see which
this also removes.  If we try to compile a shader that isn't the minimum
dispatch width and it fails to allocate without spilling we call fail()
to set an error but then go ahead and do the first spilling RA pass and
only after that's complete do we detect the fail and bail out.  By
making minimum dispatch widths part of the spill condition, we side-step
this problem.

Getting rid of these extra spills takes the compile time of a nasty
Aztec Ruins shader from about 28 seconds to about 26 seconds on my
laptop.  It also makes shader-db 1.5% faster

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311100 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355468050 -> 355468050 (0.00%)
    cycles in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    Total CPU time (seconds): 2524.31 -> 2486.63 (-1.49%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
41b310e219 util/ra: Improve the performance of ra_simplify
The most expensive part of register allocation is the ra_simplify step
which is a fixed-point algorithm with a worst-case complexity of O(n^2)
which adds the registers to a stack which we then use later to do the
actual allocation.  This commit uses bit sets and changes the core loop
of ra_simplify to first walk 32-node chunks and then walk each chunk.
This lets us skip whole 32-node chunks in one go based on bit operations
and compute the minimum q value potentially 32x as fast.  Of course, the
algorithm still has the same fundamental O(n^2) run-time but the
constant is now much lower.

In the nasty Aztec Ruins compute shader, this shaves a full four seconds
off the 30s compile time for a release build of mesa.  In a debug build
(needed for accurate stack traces), perf says that ra_select takes 20%
of runtime before this patch and only 5-6% of runtime after this patch.
It also makes shader-db runs faster.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311100 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355468050 -> 355468050 (0.00%)
    cycles in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
e1511f1d4c util/ra: Only update q_total if the reg is not assigned
We only use q_total if the reg is not assigned so there's no point in
updating it if the reg is not assigned.  This has no known perf benefit
but it will reduce churn in a future commit.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
9d6d1f47e7 util/ra: Only update best_optimistic_node if !progress
This shaves about half a second off the 30 second compile time of one of
the compute shaders in Aztec ruins.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
de56d3a2d1 util/ra: Make in_stack a bitset in the graph
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
7720ad65ae util/ra: Get rid of tabs
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Chia-I Wu
34810f4237 virgl: clean up virgl_res_needs_flush
Add comments and some minor cleanups.

v2: document the function

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> (v1)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
2019-05-14 17:00:22 +00:00
Chia-I Wu
08241624ad virgl: comment on a sync issue in transfers
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-14 17:00:22 +00:00
Chia-I Wu
76e45534d2 virgl: PIPE_TRANSFER_READ does not imply flush
virgl_res_needs_flush should suffice.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-14 17:00:22 +00:00
Chia-I Wu
9f8521882a virgl: do not skip readback because of explicit flush
Both apps and we (see virgl_buffer_transfer_flush_region) might
flush regions that are unmodified.  We have to read back for those
flushes.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-14 17:00:22 +00:00
Chia-I Wu
be8eeb3b59 virgl: remove unused virgl_transfer_inline_write
It currently has no user and is probably incorrect (resource_wait is
required in some more cases).  Remove it so that we can focus on
transfers first.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-14 17:00:22 +00:00
Nanley Chery
e81392868e iris/resource: Drop redundant checks for aux support
Drop some checks that are already done by ISL.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Nanley Chery
75a3947af4 iris/resource: Fall back to no aux if creation fails
No surface requires an auxiliary surface to operate correctly. Fall back
to an uncompressed surface if mesa fails to create and allocate an
auxiliary surface. This enables adding more restrictions to ISL without
having to update iris.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Nanley Chery
1423b78633 i965/miptree: Refactor intel_miptree_supports_ccs_e()
Update and rename this function to format_supports_ccs_e() to better
match its behavior.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Nanley Chery
779bd8d332 i965/miptree: Drop intel_*_supports_hiz()
intel_tiling_supports_hiz() and intel_miptree_supports_hiz() duplicate
much the work done by isl_surf_get_hiz_surf(). Replace them with simple
expressions.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Nanley Chery
29a13eb71d isl: Add restrictions to isl_surf_get_hiz_surf()
Import some restrictions from intel_tiling_supports_hiz() and
intel_miptree_supports_hiz().

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Nanley Chery
942755bec4 i965/miptree: Drop intel_*_supports_ccs()
intel_tiling_supports_ccs() and intel_miptree_supports_ccs() duplicate
much the work done by isl_surf_get_ccs_surf(). Drop them both and index
a boolean array to choose CCS_D in intel_miptree_choose_aux_usage().

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Nanley Chery
d57242190e isl: Add restriction and comments to isl_surf_get_ccs_surf()
Import some restrictions and comments from intel_miptree_supports_ccs().

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Nanley Chery
91a42537d1 i965/miptree: Drop intel_miptree_supports_mcs()
This function duplicates much the work done by isl_surf_get_mcs_surf().
Replace it with a simple expression.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Nanley Chery
1de089797c isl: Modify restrictions in isl_surf_get_mcs_surf()
Import some restrictions from intel_miptree_supports_mcs() and don't
assume that the caller knows which device generations are supported.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Nanley Chery
cf758c4182 i965/miptree: Fall back to no aux if creation fails
No surface requires an auxiliary surface to operate correctly. Fall back
to an uncompressed surface if mesa fails to create and allocate an
auxiliary surface. This enables adding more restrictions to ISL without
having to update i965.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Mathias Fröhlich
fc455797c1 mesa: Set _NEW_VARYING_VP_INPUTS iff varying_vp_inputs are set.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-05-14 18:09:49 +02:00
Mathias Fröhlich
b4b1df5a17 mesa: Avoid setting _NEW_VARYING_VP_INPUTS in non fixed function mode.
Instead of checking the API variant on entry of set_varying_vp_inputs
to check if we can ever be interrested in fixed function processing
or not, we can check if we are actually fixed function processing.
To check this we can use the immediately updated
gl_context::VertexProgram._VPMode value that tells us if we have a
user provided shader program or if we are in fixed function processing
either through an internal TNL shader of directly through hardware.
When doing so, we also need to recheck the varying_vp_inputs variable
at the time gl_context::VertexProgram._VPMode is set to VP_MODE_FF.
Put asserts at the consumers of gl_context::varying_vp_inputs to make
sure gl_context::VertexProgram._VPMode is set to VP_MODE_FF. By that
gl_context::varying_vp_inputs should be up to date then.

By not looking at the opengl api for this decision we should actually
catch more cases where we can avoid setting a state change flag, including
the ones where we cannot get into VP_MODE_FF by the choice of the api.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-05-14 18:09:49 +02:00
Mathias Fröhlich
663f93c869 mesa: Fix test for setting the _NEW_VARYING_VP_INPUTS flag.
The precondition stated in the comment is not true. The values mentioned are
only set from _mesa_update_state which in turn may not yet be called.
For now set the _NEW_VARYING_VP_INPUTS flag a bit more often, we will narrow
that down to a minimum again in a later patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-05-14 18:09:49 +02:00
Mathias Fröhlich
df50af19d3 mesa: Make _mesa_set_varying_vp_inputs static in state.c.
Is no longer used outside that file.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-05-14 18:09:49 +02:00
Mathias Fröhlich
99952579f3 mesa: Fix old outdated variable name in a comment.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-05-14 18:09:49 +02:00
Mathias Fröhlich
e634ba5116 mesa/vbo: Update Comment to what is actually happening.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-05-14 18:09:49 +02:00
Jonas Ådahl
903ad59407 wayland/egl: Ensure correct buffer size when allocating
Whenever a buffer is allocated, e.g. by the first draw call or EGL call after a
buffer swap, make sure the size is up to date. Prior to this commit, we
failed to do so when querying the buffer age, or swapping buffers
without any prior EGL call or draw call.

Signed-off-by: Jonas Ådahl <jadahl@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-14 15:33:35 +00:00
Paulo Zanoni
73055ae1c9 egl: check if a window/pixmap is already used on surface creation
The spec says we can't create another surface if we already created a
surface with the given window or pixmap. Implement this check.

This behavior is exercised by piglit/egl-create-surface.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-05-14 12:41:14 +00:00
Paulo Zanoni
04ecda3b3c egl: store the native surface pointer in struct _egl_surface
Each platform stores this in a different place:
  - platform_drm uses dri2_surf->gbm_surf->base
  - platform_android uses dri2_surf->window
  - platform_wayland uses dri2_surf->wl_win
  - platform_x11 uses dri2_surf->drawable
  - platform_x11_dri3 uses dri3_surf->loader_drawable.drawable
  - haiku doesn't even store it!

We need access to the native surface since the specification asks us
to refuse creating a new surface if there's already an EGLSurface
associated with native_surface.

An alternative to this patch would be to create a new
API.GetNativeWindow callback that each platform would have to
implement. While that's something we can definitely do, I prefer
this approach.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-05-14 12:41:14 +00:00
Samuel Pitoiset
9520e7c1e9 radv: add support for VK_KHR_uniform_buffer_standard_layout
Nothing to do.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-14 09:15:28 +02:00
Gert Wollny
865b9ddae4 softpipe/buffer: load only as many components as the the buffer resource type provides
Otherwise we risk to read past the end of the buffer.

In addition, change the loop counters to unsigned to be consistent
with the types.

Fixes: afa8707ba9
    softpipe: add SSBO/shader atomics support.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-05-14 06:49:43 +00:00
Tomeu Vizoso
1050273094 panfrost: ci: Reduce batch size to 3000
As with the previous value of 5000 we seemed to be reaching OOM in some
circumstances.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-14 07:43:11 +02:00
Tomeu Vizoso
9beb8aedeb panfrost: ci: Update expectations
Since last Friday, these two tests have been fixed:

dEQP-GLES2.functional.shaders.functions.control_flow.return_in_nested_loop_fragment
dEQP-GLES2.functional.shaders.linkage.varying_7

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-14 07:43:06 +02:00
Eric Anholt
db329260bf freedreno: Fix warning on printing a uint64_t using %llx.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-13 15:37:01 -07:00
Eric Anholt
40dd28acc3 freedreno: Silence compiler warnings about "*" in boolean context.
It sure looks like we just want both of them to be nonzero, and && is
probably going to be cheaper than * anyway.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-13 15:37:01 -07:00
Eric Anholt
06168d3f6a freedreno: Silence compiler warnings about uninit 'layers'
My gcc can't see that the uninitialized value from the PIPE_BUFFER case
isn't used from the !PIPE_BUFFER cases later.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-13 15:37:01 -07:00
Eric Anholt
c49f0159bd freedreno: Quiet compiler warnings on 64-bit.
__u64 is a ulonglong on x86_64, not uint64_t, so my gcc was complaining
about the wrong type being passed in.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-13 15:37:01 -07:00
Eric Anholt
0734905d9a freedreno: Make emacs indent the way robclark's eclipse does.
The .editorconfig helps with the tabs, but we've got this
two-tabs-from-previous-indentation line continuation style that requires
whacking the c-file-offsets.  This will throw emacs warnings when first
opening a file in the directory, press '!' to shut it up for the future.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-13 15:37:01 -07:00
Eric Anholt
257999d9a8 freedreno: Make .editorconfig match .dir-locals.el.
The editorconfig takes precedence over dir-locals in emacs26 with
editorconfig enabled, so the /.editorconfig was affecting these
directories.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-13 15:37:01 -07:00
Jason Ekstrand
0745d4bd96 anv: Implement VK_KHR_uniform_buffer_standard_layout
There's no real work to do here since we already support scalar block
layout which is a direct superset of what this extension allows.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-13 17:20:33 -05:00
Jason Ekstrand
b464504777 vulkan: Update the XML and headers to 1.1.108
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-13 17:20:33 -05:00
Jason Ekstrand
072227da0a tu/entrypoints: Import copy
It's used without being imported
2019-05-13 17:20:33 -05:00
Karol Herbst
fc800af83b nv50/ir/nir: make use of SYSTEM_VALUE_MAX when iterating read sysvals
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <dev@pmoreau.org>
2019-05-13 23:40:40 +02:00
Karol Herbst
358e52383c nv50/ir/nir: prefer to shift 1ull instead of 1ll
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <dev@pmoreau.org>
2019-05-13 23:40:40 +02:00
Bas Nieuwenhuizen
1619f20883 radv: Clean up signalled and submitted fields from winsys fences.
Other types like syncobj do not need it, so lets make things a bit more uniform.

Also reduce confusion what the signalled/submitted referred to (especially with
imported fences)

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-05-13 20:36:29 +00:00
Samuel Pitoiset
5555db103e radv: bump reported version to 1.1.107
VK_AMD_draw_indirect_count has been promoted with the suffix
changed to KHR.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-13 21:38:01 +02:00
Eric Anholt
60a64f028d v3d: Use driconf to expose non-MSAA texture limits for Xorg.
The V3D 4.2 HW has a limit to MSAA texture sizes of 4096.  With non-MSAA,
we can go up to 7680 (actually probably 8138, but that hasn't been
validated by the HW team).  Exposing 7680 in X11 will allow dual 4k displays.
2019-05-13 12:03:11 -07:00
Eric Anholt
0c31fe9ee7 gallium: Redefine the max texture 2d cap from _LEVELS to _SIZE.
The _LEVELS assumes that the max is always power of two.  For V3D 4.2, we
can support up to 7680 non-power-of-two MSAA textures, which will let X11
support dual 4k displays on newer hardware.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-13 12:03:08 -07:00
Eric Anholt
f33cb272f0 mesa: Replace MaxTextureLevels with MaxTextureSize.
In most places (glGetInteger, max_legal_texture_dimensions), we wanted the
number of pixels, not the number of levels.  Number of levels is easily
recovered with util_next_power_of_two() and ffs().  More importantly, for
V3D we want to be able to expose a non-power-of-two maximum texture size
to cover 2x4k displays on HW that can't quite do 8192 wide.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-13 12:03:05 -07:00
Eric Anholt
ce6dbc0417 mesa: Remove proxy image checks for maximum level.
We've already verified this by _mesa_legal_texture_dimensions() before
this call.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-13 12:03:03 -07:00
Eric Anholt
d88f3392ff mesa: Reuse _mesa_max_texture_levels() instead of open-coding it.
The shared function has some extension presence checks, but other than
that has the same switch statement contents.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-13 12:02:59 -07:00
Vinson Lee
20b42fad9b intel/tools: Fix build with glibc < 2.27.
glibc < 2.27 defines OVERFLOW in /usr/include/math.h.

This patch fixes this build error.

In file included from ../include/c99_math.h:37:0,
                 from ../src/util/u_math.h:44,
                 from ../src/mesa/main/macros.h:35,
                 from ../src/intel/compiler/brw_reg.h:47,
                 from ../src/intel/tools/i965_asm.h:32,
                 from ../src/intel/tools/i965_gram.y:29:
src/intel/tools/i965_gram.tab.c:562:5: error: expected identifier before numeric constant
     OVERFLOW = 412,
     ^

Fixes: 70308a5a8a ("intel/tools: New i965 instruction assembler tool")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110656
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
2019-05-13 11:05:48 -07:00
Marek Olšák
84816d1464 st/mesa: enable the ST_DEBUG env var in release and debugoptimized builds
Useful for dumping shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-13 13:01:01 -04:00
Nicolai Hähnle
d814c21b1b radeonsi: overhaul the vertex fetch fixup mechanism
The overall goal is to support unaligned loads from vertex buffers
natively on SI.

In the unaligned case, we fall back to the general case implementation in
ac_build_opencoded_load_format. Since this function is fully general,
we will also use it going forward for cases requiring fully manual format
conversions of dwords anyway.

This requires a different encoding of the fix_fetch array, which will now
contain the entire format information if a fixup is required.

Having to check the alignment of vertex buffers is awkward. To keep the
impact on the fast path minimal, the si_context will keep track of which
vertex buffers are (not) at least dword-aligned, while the
si_vertex_elements will note which vertex buffers have some (at most dword)
alignment requirement. Vertex buffers should be dword-aligned most of the
time, which allows a fast early-out in almost all cases.

Add the radeonsi_vs_fetch_always_opencode configuration variable for
testing purposes. Note that it can only be used reliably on LLVM >= 9,
because support for byte and short load is required.

v2:
- add a missing check to si_bind_vertex_elements

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-13 17:07:23 +02:00
Nicolai Hähnle
8a951c3d2f radeonsi: store sctx->vertex_elements in a local in si_shader_selector_key_vs
Purely as a shorthand in the remainder of the function.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-13 17:07:23 +02:00
Nicolai Hähnle
81fe33735a amd/common: add ac_build_opencoded_fetch_format
Implement software emulation of buffer_load_format for all types required
by vertex buffer fetches.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-13 17:07:23 +02:00
Jason Ekstrand
712f99934c nir/validate: Use a single set for SSA def validation
The current SSA def validation we do in nir_validate validates three
things:

 1. That each SSA def is only ever used in the function in which it is
    defined.

 2. That an nir_src exists in an SSA def's use list if and only if it
    points to that SSA def.

 3. That each nir_src is in the correct use list (uses or if_uses) based
    on whether it's an if condition or not.

The way we were doing this before was that we had a hash table which
provided a map from SSA def to a small ssa_def_validate_state data
structure which contained a pointer to the nir_function_impl and two
hash sets, one for each use list.  This meant piles of allocation and
creating of little hash sets.  It also meant one hash lookup for each
SSA def plus one per use as well as two per src (because we have to look
up the ssa_def_validate_state and then look up the use.)  It also
involved a second walk over the instructions as a post-validate step.

This commit changes us to use a single low-collision hash set of SSA
sources for all of this by being a bit more clever.  We accomplish the
objectives above as follows:

 1. The list is clear when we start validating a function.  If the
    nir_src references an SSA def which is defined in a different
    function, it simply won't be in the set.

 2. When validating the SSA defs, we walk the uses and verify that they
    have is_ssa set and that the SSA def points to the SSA def we're
    validating.  This catches the case of a nir_src being in the wrong
    list.  We then put the nir_src in the set and, when we validate the
    nir_src, we assert that it's in the set.  This takes care of any
    cases where a nir_src isn't in the use list.  After checking that
    the nir_src is in the set, we remove it from the set and, at the end
    of nir_function_impl validation, we assert that the set is empty.
    This takes care of any cases where a nir_src is in a use list but
    the instruction is no longer in the shader.

 3. When we put a nir_src in the set, we set the bottom bit of the
    pointer to 1 if it's the condition of an if.  This lets us detect
    whether or not a nir_src is in the right list.

When running shader-db with an optimized debug build of mesa on my
laptop, I get the following shader-db CPU times:

   With NIR_VALIDATE=0       3033.34 seconds
   Before this commit       20224.83 seconds
   After this commit         6255.50 seconds

Assuming shader-db is a representative sampling of GLSL shaders, this
means that making this change yields an 81% reduction in the time spent
in nir_validate.  It still isn't cheap but enabling validation now only
increases compile times by 2x instead of 6.6x.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-13 14:43:47 +00:00
Jason Ekstrand
bab08c791d util/set: Add a helper to resize a set
Often times you don't know how big a set will be and you want the code
to just grow it as needed.  However, sometimes you do know and you can
avoid a lot of rehashing if you just specify a size up-front.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-13 14:43:47 +00:00
Jason Ekstrand
abb450870e util/set: Add a search_and_add function
This function is identical to _mesa_set_add except that it takes an
extra out parameter that lets the caller detect if a replacement
happened.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-13 14:43:47 +00:00
Jason Ekstrand
460567eabf nir/validate: Use a ralloc context for our temporary data
All of our hash tables and sets are already using ralloc.  There's
really no good reason why we don't just make a ralloc context rather
than try to remember to clean everything up manually.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-13 14:43:47 +00:00
Patrick Lerda
6963f59cae lima: add Allwinner H5 support
The H5 hardware variant requires a specific plb_max_blk number. This
value can't be probed at the hardware level.

Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-05-13 13:32:55 +02:00
Patrick Lerda
38c5a5a8b5 lima: refactor plb_max_blk
Move plb_max_blk to lima_screen, and add a new debug option:
LIMA_PLB_MAX_BLK

Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-05-13 13:32:55 +02:00
Bas Nieuwenhuizen
f53ebfb450 radv: Do not use extra descriptor space for the 3rd plane.
While ImageFormatProperties returns the number of internal descriptors,
it turns out that applications do not need to actually allocate more
descriptors in the descriptor pool.

So if we make descriptors with more planes larger we have to be
convervative and always allocate space for the larger descriptors
which is a waste given the low usage of this ext.

So let us make use of the fact that 3plane formats all have the
same formats & dimensions for the last two planes. This way we
only need the first half of the descriptor of the 3rd plane and
can share the second half of the second plane.

This allows us to use 16 bytes for the descriptor which nicely
fits into the 16 bytes that are unused right next to the sampler.

Fixes: 5564c38212 "radv: Update descriptor sets for multiple planes."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-12 23:02:44 +00:00
Bas Nieuwenhuizen
d6dfb2cf50 radv: Add support for icd loader interface v4.
Adds support for physical device functions unknown to the loader.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-13 00:41:31 +02:00
Alyssa Rosenzweig
726f0263e1 panfrost/midgard: Handle csel correctly
We use an algebraic pass for the csel optimizations, and use proper
vectorized csel ops (i/fcsel_v) for mixed, rather lowering.

To avoid regressions along the way, we fix an issue with the copy
propagation pass (it should not attempt to propagate constants).
Similarly, we take care to break bundles when using csel to fix some
scheduler corner cases.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-12 22:21:49 +00:00
Illia Iorin
a35269cf44 iris: Implement ARB_indirect_parameters
iris_draw_vbo is divided into two functions to remove unnecessary
operations from the loop. This implementation of ARB_indirect_parameters
takes into account NV_conditional_render by saving MI_PREDICATE_RESULT
at the start of a draw call and restoring it at the end also the result
of NV_conditional_render is taken into account when computing predicates
that limit draw calls for ARB_indirect_parameters in a similar way
to 1952fd8d in ANV.

v2: Optimize indirect draws (suggested by Kenneth Graunke)
v3: (by Kenneth Graunke)
 - Fix an issue where indirect draws wouldn't set patch information
   before updating the compiled TCS.
 - Move some code back to iris_draw_vbo to avoid duplicating it.
 - Fix minor indentation issues.

Signed-off-by: Illia Iorin <illia.iorin@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-11 23:56:52 -07:00
Kenneth Graunke
21a0be4a79 iris: Split iris_update_draw_info into two functions.
Shader draw parameters need updating on each iteration of a multidraw
loop, but the primitive based information only needs to be updated once.

Also, patch information needs to be recorded before filling out the TCS
program key, as it determines the number of HS instances.
2019-05-11 23:54:15 -07:00
Ruslan Kabatsayev
974c4d679c nir: Fix wrong sign in lower_rcp
The nested fma calls were supposed to implement

x_new = x + x * (1 - x*src),

but instead current code is equivalent to

x_new = x - x * (1 - x*src).

The result is that Newton-Raphson steps don't improve precision at all.
This patch fixes this problem.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110435
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-11 09:25:22 -07:00
Mike Blumenkrantz
7b2468bf6e intel: drop misleading driver name from gen_get_device_info() 2019-05-11 04:14:06 +00:00
Józef Kucia
24af0f1318 radv: clear vertex bindings while resetting command buffer
Only vertex inputs accessed by vertex shader must have valid buffers
bound.

Signed-off-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 5010436e09 "radv: bail out when binding the same vertex buffers"
2019-05-11 02:51:00 +02:00
Marek Olšák
83435e748f st/mesa: fix 2 crashes in st_tgsi_lower_yuv
src/mesa/state_tracker/st_tgsi_lower_yuv.c:68: void reg_dst(struct
 tgsi_full_dst_register *, const struct tgsi_full_dst_register *, unsigned
 int): assertion "dst->Register.WriteMask" failed

The second crash was due to insufficient allocated size for TGSI
instructions.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-05-10 20:51:16 -04:00
Kenneth Graunke
72ccefb529 iris: Use full ways for L3 cache setup on Icelake.
Anuj fixed this in i965 and anv, but the fix never landed in iris.
Fixes tessellation corruption on Icelake.  Thanks to Rafael for
bisecting this and tracking it down.

Fixes: d0996d5fab iris: Emit default L3 config for the render pipeline
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-10 16:50:14 -07:00
Caio Marcelo de Oliveira Filho
3610081daa anv: Fix limits when VK_EXT_descriptor_indexing is used
Update various limits in
VkPhysicalDeviceDescriptorIndexingPropertiesEXT that were previously
zero to their values from VkPhysicalDeviceLimits.  When using
VK_EXT_descriptor_indexing, the former limits will apply to all the
descriptor layout sets -- not only those using the new feature bits.

For the reference, VK_EXT_descriptor_indexing says

    "There are new descriptor set layout and descriptor pool creation
    flags that are required to opt in to the update-after-bind
    functionality, and there are separate maxPerStage* and
    maxDescriptorSet* limits that apply to these descriptor set
    layouts which may be much higher than the pre-existing limits. The
    old limits only count descriptors in non-updateAfterBind
    descriptor set layouts, and the new limits count descriptors in
    all descriptor set layouts in the pipeline layout."

Fixes: 6e230d7607 "anv: Implement VK_EXT_descriptor_indexing"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-10 15:15:11 -07:00
Lionel Landwerlin
ad2b4aa378 vulkan/overlay: keep allocating draw data until it can be reused
The original implementation assumed that we could allocate the same
amount of command buffers as the number of images in the swapchain.
But the application could potentially render much faster and rerender
into images that have been submitted for presentation but not yet
presented.

This change keeps on allocating command buffers, vertex buffer, vertex
indices as well as a semaphore and a fence for as long as we can't
reuse a previously submitted one.

This fixes rendering issues in the overlay at high frame rates.

v2: Don't recreate semaphores constantly (Józef)

v3: Drop useless surface & FreeCommandBuffers (Józef)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110655
Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Józef Kucia <joseph.kucia@gmail.com>
2019-05-10 21:54:48 +01:00
Lionel Landwerlin
877b371cbb vulkan/overlay: fix truncating error on 32bit platforms
Non dispatchable handles can be uint64_t. When compiling the layer on
a 32bit platform, this will lead to casting uint64_t into (void *)
which is 32bit, leading to incorrect handles being mapped internally
in the layer.

v2: Use more HKEY() (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Józef Kucia <joseph.kucia@gmail.com>
Fixes: 2d2927938f ("vulkan/overlay-layer: fix cast errors")
Reviewed-by: Józef Kucia <joseph.kucia@gmail.com>
2019-05-10 21:54:48 +01:00
Kenneth Graunke
3f60810de0 i965: Fix memory leaks in brw_upload_cs_work_groups_surface().
This was taking a reference to the 64kB upload buffer and never
returning it, leaking a reference each time this atom triggered.

This leaked lots of 64kB upload BOs, eventually running us out of
of VMA space.  This would usually happen when using mpv to watch a
movie, after 20-40 minutes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110134
Fixes: 63d7b33f51 i965/cs: Setup surface binding for gl_NumWorkGroups
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-10 12:50:19 -07:00
Julien Isorce
98b852cd07 st/va: set the visible image dimensions in vlVaDeriveImage
This fixes video being rendered incorrectly.

User wants height of 360 but internally pipe_video_buffer 's height
is 368 in the test below.

Test:
  GST_GL_PLATFORM=egl gst-launch-1.0 videotestsrc ! video/x-raw, width=868, height=360, format=NV12 ! vaapipostproc ! glimagesink

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110443
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2019-05-10 17:13:31 +00:00
Alyssa Rosenzweig
292187afcc swrast: Rename blend_func->swrast_blend_func
This avoids a conflict with the new (driver-agnostic) blend_func enum in
shader_enum.h, which broke the build of swrast (and i965 by extension).

My apologies :(

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Fixes: f41be53a ("compiler: Add enums for blend state")
Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-10 09:34:55 -07:00
Eric Engestrom
6e5728e5c9 travis: fix syntax, and drop unused stuff
Fixes: a988d95389 "ci: Delete autotools build jobs"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-10 17:26:53 +01:00
Alyssa Rosenzweig
006cafc243 nir: Add blend_const_color_rgba sysval
This represents a float vec4 constant color, as passed to glBlendColor.
While the existing 4 shader sysvals are retained to minimize code churn,
a single vectorized intrinsic is required for efficient blending on
vector architectures. (This may also apply to archictectures like
Bifrost where ALU is scalar but load/store is vector; it largely depends
on how blending is implemented per-driver.)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:49:28 +00:00
Alyssa Rosenzweig
6b0472b181 gallium: Add helper to convert PIPE blending to shader_enum style
Complementing the new API-agnostic shader_enum blending style, we add
helpers to translate between the two forms. Ideally, we could just use
PIPE blending directly, but that makes Vulkan support challenging.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:49:16 +00:00
Alyssa Rosenzweig
f41be53a17 compiler: Add enums for blend state
We add enums corresponding to (GLES) blend state to shader_enums.h,
complementing the existing advanced blending enums in the file. This
allows us to represent blending state in a driver-agnostic, API-agnostic
way to permit lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:49:01 +00:00
Jonathan Marek
d0bff89159 nir: allow specifying a set of opcodes in lower_alu_to_scalar
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:10:41 +00:00
Jason Ekstrand
f8bda81887 intel/fs/copy-prop: Don't walk all the ACPs for each instruction
In order to set up KILL sets, the dataflow code was walking the entire
array of ACPs for every instruction.  If you assume the number of ACPs
increases roughly with the number of instructions, this is O(n^2).  As
it turns out, regions_overlap() is not nearly as cheap as one would like
and shows up as a significant chunk on perf traces.

This commit changes things around and instead first builds an array of
exec_lists which it uses like a hash table (keyed off ACP source or
destination) similar to what's done in the rest of the copy-prop code.
By first walking the list of ACPs and populating the table and then
walking instructions and only looking at ACPs which probably have the
same VGRF number, we can reduce the complexity to O(n).  This takes the
execution time of the piglit vs-isnan-dvec test from about 56.4 seconds
on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to
about 38.7 seconds.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-10 09:10:17 -05:00
Jason Ekstrand
20bbc175a4 intel/fs/copy-prop: Purge unused ACPs
If the destination of an ACP entry exists only within this block, then
there's no need to keep it for dataflow analysis.  We can delete it from
the out_acp table and avoid growing the bitsets any bigger than we
absolutely have to.  This reduces the maximum number of global ACP
entries in the vs-isnan-dvec with software fp64 on Kaby Lake from 8630
to 3942 and takes the execution time of the piglit vs-isnan-dvec test
from about 1:16.2 on an unoptimized debug build (what we run in CI) with
NIR_VALIDATE=0 to about 56.4 seconds.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-10 09:10:17 -05:00
Jason Ekstrand
0b6da5bac6 intel/fs/copy-prop: Bump the hash table size to 64
While the number of ACPs is generally not huge compared to the number of
blocks, 16 does seem a bit small.  Bumping it to 64 takes the execution
time of the piglit vs-isnan-dvec test from about 1:18.1 on an unoptimized
debug build (what we run in CI) with NIR_VALIDATE=0 to about 1:16.2.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-10 09:10:17 -05:00
Leo Liu
ceba9ff294 winsys/amdgpu: add VCN JPEG to no user fence group
There is no user fence for JPEG, the bug triggering
kernel WARN_ON(flags & AMDGPU_FENCE_FLAG_64BIT)

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: mesa-stable@lists.freedesktop.org
2019-05-10 08:24:49 -04:00
Qiang Yu
e2fc0c4a0c lima: fix width 4096 resolution GP fail
When width=4096 and shift_w=0, block_w=0x100 which overflow
the PLBU_CMD 8 bits for it.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
2019-05-10 16:07:40 +08:00
Tomeu Vizoso
1b97d9c180 panfrost: Add CAPFs for conservative rasterization
Just do what everybody else but Nouveau does and return 0.0f.

This prevents the repeated logging of these messages on startup:

Unexpected PIPE_CAPF 6 query
Unexpected PIPE_CAPF 7 query
Unexpected PIPE_CAPF 8 query

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-10 07:40:52 +02:00
Tomeu Vizoso
c3538ab570 panfrost: Only take the fast paths on buffers aligned to block size
As the functions operate on 16-byte blocks.

Fixes this Valgrind error:

Invalid read of size 4
   at 0x5857568: swizzle_bpp1_align16 (pan_swizzle.c:85)
   by 0x585780F: panfrost_texture_swizzle (pan_swizzle.c:171)
   by 0x584F587: panfrost_tile_texture (pan_resource.c:489)
   by 0x584F641: panfrost_transfer_unmap (pan_resource.c:525)
   by 0x587718D: u_transfer_helper_transfer_unmap (u_transfer_helper.c:516)
   by 0x5875D85: pipe_transfer_unmap (u_inlines.h:515)
   by 0x5875F13: u_default_texture_subdata (u_transfer.c:80)
   by 0x53FFDC3: st_TexSubImage (st_cb_texture.c:1480)
   by 0x54005BB: st_TexImage (st_cb_texture.c:1709)
   by 0x5391353: teximage (teximage.c:3105)
   by 0x5391353: teximage_err (teximage.c:3132)
   by 0x5391B9B: _mesa_TexImage2D (teximage.c:3170)
   by 0x5097A77: shared_dispatch_stub_183 (glapi_mapi_tmp.h:18833)
 Address 0x1e94f1e8 is 0 bytes after a block of size 16 alloc'd
   at 0x483F5C8: malloc (vg_replace_malloc.c:299)
   by 0x584F47D: panfrost_transfer_map (pan_resource.c:467)
   by 0x587694D: u_transfer_helper_transfer_map (u_transfer_helper.c:243)
   by 0x5875EA7: u_default_texture_subdata (u_transfer.c:59)
   by 0x53FFDC3: st_TexSubImage (st_cb_texture.c:1480)
   by 0x54005BB: st_TexImage (st_cb_texture.c:1709)
   by 0x5391353: teximage (teximage.c:3105)
   by 0x5391353: teximage_err (teximage.c:3132)
   by 0x5391B9B: _mesa_TexImage2D (teximage.c:3170)
   by 0x5097A77: shared_dispatch_stub_183 (glapi_mapi_tmp.h:18833)
   by 0x4DA8AB: glu::CallLogWrapper::glTexImage2D(unsigned int, int, int, int, int, int, unsigned int, unsigned int, void const*) (in /home/tomeu/deqp-build/modules/gles2/deqp-gles2)

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: 19.1 <mesa-stable@lists.freedesktop.org>
2019-05-10 07:39:39 +02:00
Tomeu Vizoso
554975bafa panfrost: Fix two uninitialized accesses in compiler
Valgrind was complaining of those.

NIR_PASS only sets progress to TRUE if there was progress.

nir_const_load_to_arr() only sets as many constants as components has
the instruction.

This was causing some dEQP tests to flip-flop, such as:

dEQP-GLES2.functional.fragment_ops.blend.equation_src_func_dst_func.add_src_color_constant_color

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Fixes: 14531d676b ("nir: make nir_const_value scalar")
2019-05-10 07:37:57 +02:00
Tomeu Vizoso
67b9c196d0 panfrost: ci: Skip running some tests
These tests add too much time to the total run time, and some of them
even hang the DUTs, even if I haven't been able to reproduce it locally.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-10 07:37:47 +02:00
Tomeu Vizoso
a94cf20051 panfrost: ci: Don't restart Weston
There doesn't seem to actually be any noticeably memory leaks on Weston
when running dEQP. We do seem to leak quiet a bit in the client, so we
still have to run the dEQP runner in batches.

This removes the risk of Weston not restarting properly and introducing
spurious failures.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-10 07:37:30 +02:00
Tomeu Vizoso
0d0823638f panfrost: ci: Update list of expected failures
This matches the current state of things on both RK3288 and RK3399.
Hopefully, from now on we'll only remove stuff from this list.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-10 07:37:23 +02:00
Tomeu Vizoso
8a328c725a panfrost: ci: Tweak dEQP to improve throughput
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-10 07:37:18 +02:00
Tomeu Vizoso
bbed39bbf2 panfrost: ci: Fix list of tests to run
Make sure we have only test case names in the list, excluding names of
test groups.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-10 07:37:13 +02:00
Tomeu Vizoso
7842fe3a45 panfrost: ci: Check for incomplete runs
To improve robustness, check that we got the expected number of results.
Right now we hard-code the expected number of tests run, but with some
effort we may be able to infer it.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-10 07:37:05 +02:00
Tomeu Vizoso
8e139250aa panfrost: ci: Add tests to flip-flop list
These tests aren't giving reliable results. Mask them for now.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-10 07:37:00 +02:00
Tomeu Vizoso
dab01348d0 panfrost: ci: Add support for running the tests on RK3288
Build artifacts for armhf and schedule them on a Veyron Chromebook with
RK3288.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-10 07:32:29 +02:00
Vasily Khoruzhick
e44a4bae52 lima: fix tile buffer reloading
Buffer needs to be reloaded every time unless explicit clear() was
called.

Fixes rendering issues with wayland compositors.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-09 21:45:04 -07:00
Caio Marcelo de Oliveira Filho
f7d53fffa2 anv: Remove special allocation for anv_push_constants
The key reason for that mechanism is gone: all the extra optional data
that could be in the anv_push_constants was moved elsewhere.  At this
point, just put anv_push_constants directly in anv_cmd_state (part of
anv_cmd_buffer).

v2: Remove a NULL check we don't need anymore in
    anv_cmd_buffer_push_constants().  (Lionel)
    Fix size we consider for valid push params.  (Lionel)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-09 19:01:14 -07:00
Kenneth Graunke
c61862ddfc iris: Expose PIPE_CAP_DEVICE_RESET_STATUS_QUERY
This provides a way for the application to query whether any resets have
happened, which lets us expose "robust" contexts.  This also enables the
KHR_robust_buffer_access_behavior tests.
2019-05-09 16:49:07 -07:00
Kenneth Graunke
343f41781c iris: Hook up device reset callbacks
This mechanism lets the driver inform the state tracker about GPU
resets, say for destroying a robust API context and reporting a "device
lost" error to the application, making it take action to deal with this.
2019-05-09 16:49:07 -07:00
Kenneth Graunke
c5c12bdd00 iris: Try to recover from GPU hangs.
The iris batch module now tries to detect that the kernel has banned
our GEM context, creates a new non-banned context, and informs the
iris context module that all assumptions about state are now invalid
and it needs to reinitialize the relevant state.

Based on Chris Wilson's work, but significantly rewritten by me.
2019-05-09 16:49:07 -07:00
Chris Wilson
7402564c07 iris: Add helpers to clone a hardware context.
(Chris Wilson wrote this code in a patch titled "i965: Be resilient in
the face of GPU hangs"; Ken fixed a bug and copied it to iris.)
2019-05-09 16:49:07 -07:00
Kenneth Graunke
c3701e9070 iris: Mark render batches as non-recoverable.
Adapted from Chris Wilson's patch.  The comment is largely his.

Currently, when iris hangs the GPU, it will continue sending batches
which incrementally update the state, assuming it's preserved across
batches.  However, the kernel's GPU reset support reinitializes the
guilty context to the default GPU state (reasonably not wanting to
trust the current state).  This ends up resetting critical things
like STATE_BASE_ADDRESS, causing memory accesses in all subsequent
batches to be garbage, and almost certainly result in more hangs
until we're banned or we kill the machine.

We now ask the kernel to ban our render context immediately, so we
notice we've gone off the rails as fast as possible.  Eventually, we'll
attempt to recover and continue.  For now, we just avoid torching the
GPU over and over.
2019-05-09 16:49:07 -07:00
Rob Clark
9faf218b8c freedreno/ir3: fix rasterflat/glxgears
Ofc legacy gl features that are broken don't trigger fails in deqp.  I
should remember to test glxgears more often.

Fixes: 7ff6705b8d freedreno/ir3: convert to "new style" frag inputs
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-09 16:21:05 -07:00
Lionel Landwerlin
f2f6ac1c08 anv: Use corresponding type from the vector allocation
We didn't notice this issue much because the 2 struct share a similar
layout, expect for the additional fields...

We run into that issue in Anv :

==15236== Invalid write of size 8
==15236==    at 0x8CF3939C: anv_state_table_expand_range (anv_allocator.c:211)
==15236==    by 0x8CF394D5: anv_state_table_grow (anv_allocator.c:264)
==15236==    by 0x8CF3967E: anv_state_table_add (anv_allocator.c:312)
==15236==    by 0x8CF3B13C: anv_state_pool_alloc_no_vg (anv_allocator.c:1167)
==15236==    by 0x8CF3B2B0: anv_state_pool_alloc (anv_allocator.c:1190)
==15236==    by 0x8CF60871: alloc_surface_state (anv_image.c:1122)
==15236==    by 0x8CF61FF9: anv_CreateImageView (anv_image.c:1519)
==15236==    by 0x8BCBD2ED: vkCreateImageView (trampoline.c:1358)
==15236==  Address 0x8994ef10 is 0 bytes after a block of size 128 alloc'd
==15236==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15236==    by 0x8D2578E6: u_vector_init (u_vector.c:47)
==15236==    by 0x8CF3929A: anv_state_table_init (anv_allocator.c:168)
==15236==    by 0x8CF3A99A: anv_state_pool_init (anv_allocator.c:921)
==15236==    by 0x8CF56517: anv_CreateDevice (anv_device.c:1909)
==15236==    by 0x8BCB4FBA: terminator_CreateDevice (loader.c:6073)
==15236==    by 0x8DD2CB3D: ??? (in /home/djdeath/.steam/ubuntu12_64/libVkLayer_steam_fossilize.so)
==15236==    by 0x8DF4D241: vkCreateDevice (in /home/djdeath/.steam/ubuntu12_64/steamoverlayvulkanlayer.so)
==15236==    by 0x8BCB35C6: loader_create_device_chain (loader.c:5449)
==15236==    by 0x8BCBC230: vkCreateDevice (trampoline.c:838)

v2: Rename mmap_cleanups to avoid confusion (Caio)

v3: s/fail_mmap_cleanups/fail_cleanups/ (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110648
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-09 21:57:26 +01:00
Dylan Baker
79ad8acd01 docs: update calendar, and news item and link release notes for 19.0.4 2019-05-09 13:48:47 -07:00
Dylan Baker
723f74c270 docs: Add SHA256 sums for mesa 19.0.4 2019-05-09 13:46:30 -07:00
Dylan Baker
ce32b71a8c Docs: add 19.0.4 release notes 2019-05-09 13:46:26 -07:00
Pierre-Eric Pelloux-Prayer
62ed82ea1a mesa: fix GL_PROGRAM_BINARY_RETRIEVABLE_HINT handling
When first implemented in fefd03e16c Mesa's behavior was aligned on behavior
of Nvidia's driver. This caused a failing test in piglit but was ok since the
specification is unclear on this subject.

Nvidia's driver behavior has been modified because using version 410.104, the
problematic test (program_binary_retrievable_hint) now passes.

This commit defers BinaryRetrievableHint update until the next linking so the
test passes on Mesa as well.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-05-09 16:15:20 -04:00
Ian Romanick
1f1007a4ed nir: Initialize lower_flrp_progress everywhere
I don't know why I thought NIR_PASS always set the progress variable.
Derp.

Fixes: d41cdef2a5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic")
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Coverity CID: 1444996
Coverity CID: 1444995
Coverity CID: 1444994
Coverity CID: 1444993
Coverity CID: 1444991
Coverity CID: 1444989
2019-05-09 10:03:51 -07:00
Eric Engestrom
8b3baa2744 gallium: fix typo in comment
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-09 11:14:37 +01:00
Eric Engestrom
86628ed79f meson: fix a couple typos in comments
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-09 11:14:37 +01:00
Eric Engestrom
6c6af0c8b0 i965_asm: avoid free()ing uninitialized pointers
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-09 10:03:15 +00:00
Eric Engestrom
51597eca84 i965_asm: fix memleak
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-09 10:03:15 +00:00
Samuel Pitoiset
53dfff1c4d radv: fix setting the number of rectangles when it's dyanmic
We need to know the number of rectangles.

This fixes new CTS dEQP-VK.draw.discard_rectangles.dynamic_*.

Fixes: 5db0bf9994 ("radv: Implement VK_EXT_discard_rectangles.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-09 11:42:25 +02:00
Chris Wilson
8b81256469 iris: Reorganise execbuf to have a single point of failure
Propagate the failure from GEM_EXECBUFFER2, cleanup then report failure
if need be. We retain the current behaviour to abort() at the first sign
of trouble -- for a non-robustness context, arguably this is the right
thing to do as the client cannot recover, and the system state is lost.
How to properly integrate with KHR_robustness and reset-strategy is
left as a future exercise.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-08 17:21:07 -07:00
Chris Wilson
8b7e19dbc5 drm-uapi: Update i915_drm.h for I915_CONTEXT_PARAM_RECOVERABLE
Pull i915_drm.h to include

kernel commit ba4fda620a5f7db521aa9e0262cf49854c1b1d9c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Feb 18 10:58:21 2019 +0000

    drm/i915: Optionally disable automatic recovery after a GPU reset

for improved resilience in handling GPU hangs.
2019-05-08 17:21:07 -07:00
Dave Airlie
0a42d5b98b kmsro: add _dri.so to two of the kmsro drivers.
Fixes: 8cfc17bdda (kmsro: Add the rest of the current set of tinydrm drivers.)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-09 07:15:26 +10:00
Kenneth Graunke
d9b9bb91ff iris: Report the same video memory settings as i965.
This just copy and pastes Ian's code from i965.
2019-05-08 12:43:08 -07:00
Eric Engestrom
5f8d29ab4b gitlab-ci: add the vulkan overlay layer to the vulkan build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-08 19:51:46 +02:00
Eric Engestrom
c6306125b5 gitlab-ci: add the vulkan overlay layer to the vulkan build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

[ Michel Dänzer: Take changes affecting the docker image from !299,
  plus remove the unzip package again before generating the image ]
2019-05-08 16:59:02 +00:00
Michel Dänzer
fcf75534ec gitlab-ci: Don't install WINE packages
They were just making the docker image larger for no benefit at this
point.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 16:59:02 +00:00
Michel Dänzer
82b30094ed gitlab-ci: Reorder jobs a bit to be generally ordered longer => shorter
This makes the longer jobs likely to run earlier, which can help the
overall pipeline duration.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 16:59:02 +00:00
Michel Dänzer
6897715770 gitlab-ci: Build clover against all supported versions of LLVM
And consolidate it all into a single job.

It doesn't take much longer than a single version, thanks to ccache.
Overall, this single job might be faster or at least use fewer CPU
cycles than the two jobs before, while covering thrice as many versions
of LLVM.

v2:
* Move "rm -rf _build" to meson-build.sh.
* Set GALLIUM_DRIVERS the same way both times in the meson-clover job,
  for symmetry.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> # v1
2019-05-08 16:59:02 +00:00
Michel Dänzer
cc2b3a99cc gitlab-ci: Move meson job script to separate file
No functional change intended (except for no longer running meson
--version separately, as the version appears early in meson's output
anyway).

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 16:59:02 +00:00
Michel Dänzer
d0b9a7f0d7 gitlab-ci: Remove superfluous comment about image tag counter suffix
We really shouldn't ever need a suffix, otherwise it indicates a failure
in coordination. :) In which case, it doesn't really matter how the tag
is disambiguated.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 16:59:02 +00:00
Dylan Baker
0d59459432 meson: Force the use of config-tool for llvm
meson git now has a cmake find method for llvm, but it lacks a couple of
features that we use from the config tool version. Until that reaches
parity we need to use the config-tool version.

CC: 19.0 19.1 <<mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 09:39:03 -07:00
Brian Paul
a17c1ae165 gallium/util: fix two MSVC compiler warnings
Remove stray const qualifier.
s/unsigned/enum tgsi_semantic/

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-08 10:05:42 -06:00
Brian Paul
4f54e550e9 gallium/pp: s/uint/enum tgsi_semantic/ to fix MSVC warning
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-08 10:05:42 -06:00
Brian Paul
cf5c7beb63 noop: s/enum pipe_transfer_usage/unsigned/ to fix MSVC warning
The function pointer declaration in pipe_context uses unsigned
for the bitmask.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-08 10:05:41 -06:00
Brian Paul
bc517dbbf7 ddebug: fix a few MSVC compiler warnings
Don't return an expression in void functions.
Replace an unsigned int with proper enum.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-08 10:05:41 -06:00
Brian Paul
2e28983ed2 glsl: s/GLboolean/bool/ to silence MSVC compiler warning
It complains about mixing GLboolean and bool in the |= expression.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-08 10:05:41 -06:00
Ian Romanick
ed5f024515 nir/flrp: Reassociate add in flrp(±1, b, c) lowering path
With this reassociation, this lowering path is still beneficial.

Ice Lake
total instructions in shared programs: 17220191 -> 17207181 (-0.08%)
instructions in affected programs: 999871 -> 986861 (-1.30%)
helped: 3703
HURT: 17
helped stats (abs) min: 1 max: 686 x̄: 3.52 x̃: 3
helped stats (rel) min: 0.09% max: 51.97% x̄: 2.21% x̃: 1.35%
HURT stats (abs)   min: 1 max: 9 x̄: 1.47 x̃: 1
HURT stats (rel)   min: 0.08% max: 4.55% x̄: 0.78% x̃: 0.55%
95% mean confidence interval for instructions value: -4.01 -2.99
95% mean confidence interval for instructions %-change: -2.29% -2.11%
Instructions are helped.

total cycles in shared programs: 360871298 -> 360755040 (-0.03%)
cycles in affected programs: 9931334 -> 9815076 (-1.17%)
helped: 2388
HURT: 1569
helped stats (abs) min: 1 max: 10228 x̄: 93.54 x̃: 18
helped stats (rel) min: <.01% max: 74.11% x̄: 3.36% x̃: 1.07%
HURT stats (abs)   min: 1 max: 1917 x̄: 68.27 x̃: 22
HURT stats (rel)   min: <.01% max: 44.90% x̄: 3.44% x̃: 1.72%
95% mean confidence interval for cycles value: -39.48 -19.28
95% mean confidence interval for cycles %-change: -0.86% -0.46%
Cycles are helped.

total spills in shared programs: 12355 -> 12159 (-1.59%)
spills in affected programs: 295 -> 99 (-66.44%)
helped: 2
HURT: 1

total fills in shared programs: 25398 -> 25207 (-0.75%)
fills in affected programs: 288 -> 97 (-66.32%)
helped: 2
HURT: 1

LOST:   3
GAINED: 44

Iron Lake
total instructions in shared programs: 8169225 -> 8159729 (-0.12%)
instructions in affected programs: 1025712 -> 1016216 (-0.93%)
helped: 3352
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 2.83 x̃: 3
helped stats (rel) min: 0.15% max: 12.00% x̄: 1.51% x̃: 1.05%
95% mean confidence interval for instructions value: -2.86 -2.80
95% mean confidence interval for instructions %-change: -1.56% -1.46%
Instructions are helped.

total cycles in shared programs: 188656796 -> 188612280 (-0.02%)
cycles in affected programs: 18633584 -> 18589068 (-0.24%)
helped: 3085
HURT: 14
helped stats (abs) min: 2 max: 72 x̄: 14.45 x̃: 12
helped stats (rel) min: 0.02% max: 5.73% x̄: 0.73% x̃: 0.31%
HURT stats (abs)   min: 2 max: 4 x̄: 3.71 x̃: 4
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -14.55 -14.18
95% mean confidence interval for cycles %-change: -0.76% -0.69%
Cycles are helped.

GM45
total instructions in shared programs: 5026905 -> 5021856 (-0.10%)
instructions in affected programs: 584169 -> 579120 (-0.86%)
helped: 1776
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 2.84 x̃: 3
helped stats (rel) min: 0.15% max: 11.11% x̄: 1.43% x̃: 0.98%
95% mean confidence interval for instructions value: -2.88 -2.80
95% mean confidence interval for instructions %-change: -1.50% -1.37%
Instructions are helped.

total cycles in shared programs: 129047376 -> 129018918 (-0.02%)
cycles in affected programs: 12941924 -> 12913466 (-0.22%)
helped: 1722
HURT: 14
helped stats (abs) min: 4 max: 72 x̄: 16.56 x̃: 18
helped stats (rel) min: 0.02% max: 5.73% x̄: 0.72% x̃: 0.30%
HURT stats (abs)   min: 2 max: 4 x̄: 3.71 x̃: 4
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -16.65 -16.13
95% mean confidence interval for cycles %-change: -0.76% -0.66%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-08 07:41:54 -07:00
Ian Romanick
ba203a3cd7 nir/flrp: Fix typo on the flrp(±1, b, c) path
After Samuel reported the bisect, I was able to find the bug by
inspection.  Good thing for well-named varibles. :)

Unfortunately, this undoes almost all of the benefit of the original
patch.

Ice Lake
total instructions in shared programs: 17183159 -> 17218166 (0.20%)
instructions in affected programs: 1308722 -> 1343729 (2.67%)
helped: 98
HURT: 4746
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.47% max: 2.70% x̄: 0.60% x̃: 0.57%
HURT stats (abs)   min: 1 max: 691 x̄: 7.40 x̃: 8
HURT stats (rel)   min: 0.10% max: 700.00% x̄: 5.82% x̃: 2.83%
95% mean confidence interval for instructions value: 6.82 7.64
95% mean confidence interval for instructions %-change: 5.22% 6.15%
Instructions are HURT.

total cycles in shared programs: 360705959 -> 360853522 (0.04%)
cycles in affected programs: 10754380 -> 10901943 (1.37%)
helped: 1594
HURT: 3331
helped stats (abs) min: 1 max: 1896 x̄: 119.81 x̃: 60
helped stats (rel) min: <.01% max: 35.48% x̄: 5.06% x̃: 3.64%
HURT stats (abs)   min: 1 max: 10208 x̄: 101.63 x̃: 38
HURT stats (rel)   min: 0.01% max: 878.95% x̄: 9.01% x̃: 2.78%
95% mean confidence interval for cycles value: 21.11 38.81
95% mean confidence interval for cycles %-change: 3.76% 5.15%
Cycles are HURT.

total spills in shared programs: 12158 -> 12355 (1.62%)
spills in affected programs: 98 -> 295 (201.02%)
helped: 1
HURT: 2

total fills in shared programs: 25204 -> 25398 (0.77%)
fills in affected programs: 94 -> 288 (206.38%)
helped: 0
HURT: 3

LOST:   15
GAINED: 8

Iron Lake
total instructions in shared programs: 8121430 -> 8166733 (0.56%)
instructions in affected programs: 1148353 -> 1193656 (3.95%)
helped: 2
HURT: 4046
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.85% max: 1.92% x̄: 1.89% x̃: 1.89%
HURT stats (abs)   min: 1 max: 43 x̄: 11.20 x̃: 11
HURT stats (rel)   min: 0.20% max: 716.67% x̄: 7.40% x̃: 3.87%
95% mean confidence interval for instructions value: 11.02 11.37
95% mean confidence interval for instructions %-change: 6.84% 7.94%
Instructions are HURT.

total cycles in shared programs: 188376326 -> 188601568 (0.12%)
cycles in affected programs: 27416674 -> 27641916 (0.82%)
helped: 68
HURT: 3947
helped stats (abs) min: 2 max: 222 x̄: 13.88 x̃: 6
helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01%
HURT stats (abs)   min: 2 max: 670 x̄: 57.31 x̃: 64
HURT stats (rel)   min: <.01% max: 1811.11% x̄: 4.11% x̃: 1.09%
95% mean confidence interval for cycles value: 55.01 57.20
95% mean confidence interval for cycles %-change: 2.88% 5.19%
Cycles are HURT.

LOST:   35
GAINED: 3

GM45
total instructions in shared programs: 4979794 -> 5003551 (0.48%)
instructions in affected programs: 635174 -> 658931 (3.74%)
helped: 1
HURT: 2142
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.85% max: 1.85% x̄: 1.85% x̃: 1.85%
HURT stats (abs)   min: 1 max: 43 x̄: 11.09 x̃: 11
HURT stats (rel)   min: 0.20% max: 716.67% x̄: 7.00% x̃: 3.53%
95% mean confidence interval for instructions value: 10.85 11.33
95% mean confidence interval for instructions %-change: 6.25% 7.74%
Instructions are HURT.

total cycles in shared programs: 128519586 -> 128654990 (0.11%)
cycles in affected programs: 17635304 -> 17770708 (0.77%)
helped: 46
HURT: 2088
helped stats (abs) min: 4 max: 220 x̄: 18.13 x̃: 6
helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01%
HURT stats (abs)   min: 2 max: 670 x̄: 65.25 x̃: 66
HURT stats (rel)   min: <.01% max: 1464.29% x̄: 4.05% x̃: 0.99%
95% mean confidence interval for cycles value: 61.75 65.15
95% mean confidence interval for cycles %-change: 2.58% 5.34%
Cycles are HURT.

LOST:   38
GAINED: 38

Fixes: 5b908db604 ("nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently")
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-08 07:41:26 -07:00
Lionel Landwerlin
43596e5f34 anv: fix use after free
Once mem->bo is removed from the cache, it is likely to be freed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b80930a6fe ("anv: add support for VK_EXT_memory_budget")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 12:02:13 +01:00
Lionel Landwerlin
a07d06f103 anv: rework queries writes to ensure ordering memory writes
We use a mix of MI & PIPE_CONTROL commands to write our queries' data
(results & availability). Those commands' memory write order is not
guaranteed with regard to their order in the command stream, unless CS
stalls are inserted between them. This is problematic for 2 reasons :

   1. We copy results from the device using MI commands even though
      the values are generated from PIPE_CONTROL, meaning we could
      copy unlanded values into the results and then copy the
      availability that is inconsistent with the values.

   2. We allow the user to poll on the availability values of the
      query pool from the CPU. If the availability lands in memory
      before the values then we could return invalid values.

This change does 2 things to address this problem :

      - We use either PIPE_CONTROL or MI commands to write both
        queries values and availability, so that the ordering of the
        memory writes guarantees that if availability is visible,
        results are also visible.

      - For the occlusion & timestamp queries we apply a CS stall
        before copying the results on the device, to ensure copying
        with MI commands see the correct values of previous
        PIPE_CONTROL writes of availability (required by the Vulkan
        spec).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-08 09:49:09 +00:00
Timothy Arceri
e19a8fe033 radv: call constant folding before opt algebraic
The pattern of calling opt algebraic first seems to have originated
in i965. The order in OpenGL drivers generally doesn't matter
because the GLSL IR optimisations do constant folding before
opt algebraic.

However in Vulkan drivers calling opt algebraic first can result
in missed constant folding opportunities.

vkpipeline-db results (VEGA64):

Totals from affected shaders:
SGPRS: 3160 -> 3176 (0.51 %)
VGPRS: 3588 -> 3580 (-0.22 %)
Spilled SGPRs: 52 -> 44 (-15.38 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 12 -> 12 (0.00 %) dwords per thread
Code Size: 261812 -> 261036 (-0.30 %) bytes
LDS: 7 -> 7 (0.00 %) blocks
Max Waves: 346 -> 348 (0.58 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-08 19:45:01 +10:00
Erik Faye-Lund
ecdab0dfea docs: drop h1 in header
It's generally frowned upon to have more than one H1 per document in
HTML4. So let's put the text directly inside the header. This means we
can drop the flex-based centering, which makes things a bit easier. We
also need to change the padding to rem instead of em, because the em has
now changed.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 07:18:15 +00:00
Erik Faye-Lund
6e0e550904 docs: harmonize headings and titles
We're pretty insonsistent in what the headings and titles are, especially
compared to what the articles are listed as in the sidebar. Let's
harmonize this.

There's a notable exception for meson.html, where the sidebar uses a
short-hand form that makes sense in the sidebar, but not in the article
due to the visible context being different.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 07:18:15 +00:00
Erik Faye-Lund
269474b428 docs: renumber headings
It's generally frowned upon to have multiple H1 headings in HTML4. So
let's make sure each article has a primary heading for the article, and
that that heading is the title that is used in the sidebar.

While we're at it, let's update the title in the articles to match the
title from the sidebar as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 07:18:15 +00:00
Erik Faye-Lund
87683ba058 docs: give download-article a primary heading
It's generally frowned upon to have multiple H1 headings in HTML4. So
let's add a primary heading for the article, and source that from the
title used in the sidebar.

While we're at it, let's update the title in the article to match the
title from the sidebar as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 07:18:15 +00:00
Erik Faye-Lund
a8df27b0b2 docs: use title-casing for all headings in sidebar
We generally use title-casing for headings in the sidebar. But not
all headings was constently cased like that. Let's make sure this
is consistent.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 07:18:15 +00:00
Erik Faye-Lund
f06e698aad docs: spell out "and" in sidebar
There's no need to keep this short, we can just spell out "and" here.
Besides, a slash kind of implies "or", but these articles are about
both of these, not either.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 07:18:15 +00:00
Erik Faye-Lund
7809331cb9 docs: remove pointless list-entry
It's quite visible that there's more docs below, we don't need to spell
it out for the reader.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 07:18:15 +00:00
Erik Faye-Lund
7421fdf68a docs: spell out faq in sidebar
We're not short on space here, so there's little point in abbreviating
this. This also matches the heading in the article.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 07:18:15 +00:00
Erik Faye-Lund
e4fe83c8a0 docs: spell out "and" in sidebar
We're not short on space here, so let's just spell out "and" instead of
using the ampersand. This is more consistent with the entry above in the
sidebar.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 07:18:15 +00:00
Timothy Arceri
4fd8161773 glsl_to_nir: remove unused type_is_int()
This was missed in e00fa99b08.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-08 14:11:38 +10:00
Timothy Arceri
a01b393c39 Revert "glx: Fix synthetic error generation in __glXSendError"
This reverts commit e91ee763c3.

This seems to have broken a number of wine games. Lets revert
everything for now and try again later.

Acked-by: Adam Jackson <ajax@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110632
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110590
2019-05-08 13:16:44 +10:00
Timothy Arceri
024232b26c radeonsi: add an AMD_TEX_ANISO environment variable
This brings it inline with the recently added AMD_DEBUG.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109619
2019-05-08 09:32:25 +10:00
Kenneth Graunke
d568fcd0a0 i965: leave the top 4Gb of the high heap VMA unused
This ports commit 9e7b0988d6 from anv
to i965.  Thanks to Lionel for noticing that it was missing!

Fixes: 01058a5522 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 15:45:56 -07:00
Kenneth Graunke
17210c63a9 i965: Force VMA alignment to be a multiple of the page size.
This should happen regardless, but let's be paranoid.

Fixes: 01058a5522 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 15:45:56 -07:00
Kenneth Graunke
15f134c628 i965: Fix BRW_MEMZONE_LOW_4G heap size.
The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages,
and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB.

So we can't use the top page.

Fixes: 01058a5522 i965: Add virtual memory allocator infrastructure to brw_bufmgr.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 15:45:56 -07:00
Matt Turner
e8c74a1e16 intel/compiler: Unset flag reg when FB write is not predicated
In the FS IR we pretend that the instruction is predicated with (+f0.1)
just for flag dependency tracking purposes. Since the instruction
doesn't support predication before Haswell, we unset the predicate so we
should also unset the flag register so that we can round-trip the
disassembly.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
5d7a9e0811 intel/disasm: Disassemble immediate value properly for dim
On haswell, for dim instruction we encode immediate float value operand
into double float,

v2: Fix comment (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
6c83a68ebc intel/disasm: Disassemble JIP offset for while
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
9db616e8a2 intel/compiler: Replicate 16 bit immediate value correctly
For the W or UW (signed or unsigned word) source types, the 16-bit value
must be replicated in both the low and high words of the 32-bit
immediate value.

v2: Fix replication in other places as well
V3: fix a few nits (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
5211159b5b intel/compiler: Print quad value in hex format
Print quad value same as unsigned quad so that we can distinguish in
between quater control disassembled values for e.g 1/2/3[Q] and
immediate quad value for e.g 1Q. This allows round-tripping through the
assembler/disassembler.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
4e828bb48a intel/tools: Add unit tests for assembler
v1: Pass executable object from meson to test(Dylan Baker)
v2: Ignore generated output files from git status(Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-07 14:33:48 -07:00
Mika Kuoppala
1fb5ce0a11 intel/tools: Initialize offset correctly for i965_asm
If we leave offset uninitialized, access to store
will be random depending on stack value and can
segfault.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Mika Kuoppala
85da1194ec intel/tools: Add meson pthread dependancy for i965_asm
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
70308a5a8a intel/tools: New i965 instruction assembler tool
Tool is inspired from igt's assembler tool. Thanks to Matt Turner, who
mentored me through out this project.

v2: Fix memory leaks and naming convention (Caio)
v3: Fix meson changes (Dylan Baker)
v4: Fix usage options (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/141
2019-05-07 14:33:38 -07:00
Kenneth Graunke
a232aa5c50 iris: Also handle res->offset for buffer sampler/image views 2019-05-07 13:36:18 -07:00
Mike Blumenkrantz
ddd716e746 iris: support dmabuf imports with offsets
this adds support for imports where the image data begins at an offset
from the start of the buffer, as used in h/x264

fixes kwg/mesa#47

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-07 13:36:08 -07:00
Roland Scheidegger
748f603390 gallivm: fix broken 8-wide s3tc decoding
Brian noticed there was an uninitialized var for the 8-wide case and 128
bit blocks, which made it always crash. Likewise, the 64bit block case
had another crash bug due to type mismatch.
Color decode (used for all s3tc formats) also had a bogus shuffle for
this case, leading to decode artifacts.
Fix these all up, which makes the code actually work 8-wide. Note that
it's still not used - I've verified it works, and the generated assembly
does look quite a bit simpler actually (20-30% less instructions for the
s3tc decode part with avx2), however in practice it still seems to be
sligthly slower for some unknown reason (tested with openarena) on my
haswell box, so for now continue to split things into 4-wide vectors
before decoding.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2019-05-07 18:59:38 +02:00
Juan A. Suarez Romero
92dba1c66e docs: Add relnotes stub for 19.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-05-07 16:07:29 +00:00
Juan A. Suarez Romero
14a7959cfa Bump version for 19.1 branch
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-05-07 16:02:34 +00:00
Vasily Khoruzhick
6b46399e2f lima: enable sin and cos lowering for GP
GP doesn't support sin/cos natively, so we have to lower them.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 15:25:21 +00:00
Vasily Khoruzhick
e67e4e90b2 nir: implement lowering for fsin and fcos
Lower sin and cos using Nick's fast sin/cos approximation from
https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648

It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 15:25:21 +00:00
Rob Clark
b15c46e6bf freedreno/ir3: move const_state to ir3_shader
For a6xx, we construct/emit a single VS const state used for both
binning pass and draw pass.  So far we were mostly getting lucky that
there were not (obvious) mismatches between the const_state (like
different lowered immediates) between the binning and draw pass
VS ir3_shader_variant.

And I guess this situation will come up more as GS and tess is added
into the equation.

Since really everything about the const state is not specific to the
variant, move this.  The main exception is lowered immediates, but these
are the last to appear in the layout, and it doesn't hurt for each new
shader variant to just append any immed's it lowers to the end of the
immediate state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark
5690f83bb5 freedreno/ir3: split out const_state setup
Next patch moves const_state to ir3_shader, before the compile context
is created.  So move the code around in prep to call it earlier.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark
9403184ddd freedreno/ir3: move immediates to const_state
They are really part of the constant state, and it will moving things
from ir3_shader_variant to ir3_shader if we combine them.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark
23e7a34466 freedreno/ir3: consolidate const state
Combine the offsets of differenet parts of the constant space with (what
was formerly known as) ir3_driver_const_layout.  Bunch of churn, but no
functional change.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark
ef3eecd66b freedreno/ir3: move ir3_pointer_size()
Move to ir3_compiler so it doesn't depend on the compile context.  Prep
work for moving constant state from variant (where we have compile
context) to shader (where we do not).

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Lionel Landwerlin
2d2927938f vulkan/overlay-layer: fix cast errors
Not quite sure what version of GCC/Clang produces errors (8.3.0
locally was fine).

v2: also fix an integer literal issue (Karol)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-07 10:45:45 +01:00
Samuel Iglesias Gonsálvez
bc66cebc0d anv: fix alphaToCoverage when there is no color attachment
There are tests in CTS for alpha to coverage without a color attachment
that are failing. This happens because we remove the shader color
outputs when we don't have a valid color attachment for them, but when
alpha to coverage is enabled we still want to preserve the the output
at location 0 since we need the alpha component. In that case we will
also need to create a null render target for RT 0.

v2:
  - We already create a null rt when we don't have any, so reuse that
    for this case (Jason)
  - Simplify the code a bit (Iago)

v3:
  - Take alpha to coverage from the key and don't tie this to depth-only
    rendering only, we want the same behavior if we have multiple render
    targets but the one at location 0 is not used. (Jason).
  - Rewrite commit message (Iago)

v4:
  - Make sure we take into account the array length of the shader outputs,
    which we were no handling correctly either and make sure we also
    create null render targets for any invalid array entries too.

v5:
  - Simplify removal of unused outputs by using rt_used[] so we don't have
    to special case alpha to coverage there too.

Fixes the following CTS tests:
dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 09:35:47 +02:00
Ian Romanick
c866500525 intel/compiler: Don't always require precise lowering of flrp
No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8164367 -> 8135551 (-0.35%)
instructions in affected programs: 3271235 -> 3242419 (-0.88%)
helped: 13636
HURT: 90
helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1
helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97%
HURT stats (abs)   min: 1 max: 4 x̄: 1.80 x̃: 2
HURT stats (rel)   min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78%
95% mean confidence interval for instructions value: -2.13 -2.07
95% mean confidence interval for instructions %-change: -1.16% -1.13%
Instructions are helped.

total cycles in shared programs: 188719974 -> 188586222 (-0.07%)
cycles in affected programs: 70415766 -> 70282014 (-0.19%)
helped: 12563
HURT: 515
helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6
helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27%
HURT stats (abs)   min: 2 max: 54 x̄: 6.07 x̃: 4
HURT stats (rel)   min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08%
95% mean confidence interval for cycles value: -10.56 -9.90
95% mean confidence interval for cycles %-change: -0.47% -0.45%
Cycles are helped.

LOST:   0
GAINED: 13

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
ab86926156 nir/algebraic: Reassociate open-coded flrp(1, b, c)
In a previous verion of this patch, Jason commented,

   "Re-associating based on whether or not something has a constant
   value of 1.0 seems a bit sneaky.  I think it's well within the rules
   but it seems like something that could bite you."

That is possibly true.  The reassociation will generate different
results if fabs(b) >= 2**24 and fabs(c) < 0.5.  The delta increases as
fabs(c) approaches 0.

However, i965 has done this same reassociation indirectly for years.
We would previously allow nir_op_flrp on all pre-Gen11 hardware even
though Gen4 and Gen5 do not have a LRP instruction.  Optimizations in
nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a,
b, c).  On Gen7+, the hardware performs the same arithmetic as
a(1-c)+bc.  Gen6 seems to implement LRP as a+c(b-a).  On Gen4 and
Gen5, we would lower LRP to a sequence of instructions that implement
a(1-c)+bc.  The lowering happens after all constant folding, so we
would litterally generate a 1+(-1) instruction sequence in this
scenario: one instruction to load either 1 or -1 in a register, and
another instruction to add either -1 or 1 to it.

This patch just cuts out the middle man.  Do the reassociation that
we've always done, but do it explicitly at a time when we can benefit
from other optimizations.

A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a,
±1, c) differently" are restored by this patch.  This includes a few
shaders in ET:QW.

I tried a similar thing for open-coded flrp(-1, b, c), and it hurt
instructions on 35 shaders for ILK without helping any.  The helped /
hurt cycles was about even.

No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8172020 -> 8164367 (-0.09%)
instructions in affected programs: 1089851 -> 1082198 (-0.70%)
helped: 3285
HURT: 64
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
95% mean confidence interval for instructions value: -2.32 -2.25
95% mean confidence interval for instructions %-change: -1.16% -1.09%
Instructions are helped.

total cycles in shared programs: 188758338 -> 188719974 (-0.02%)
cycles in affected programs: 20004922 -> 19966558 (-0.19%)
helped: 3012
HURT: 477
helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
HURT stats (abs)   min: 2 max: 328 x̄: 4.27 x̃: 4
HURT stats (rel)   min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: -11.38 -10.62
95% mean confidence interval for cycles %-change: -0.46% -0.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
c995d1ca3a nir/flrp: Lower flrp(a, b, #c) differently
This doesn't help on Intel GPUs now because we always take the
"always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
ae02622d8f nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists
There is little effect on Intel GPUs now because we almost always take
the "always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any other Intel platforms.

GM45 and Iron Lake had similar results. (Iron Lake shown)
total cycles in shared programs: 188852500 -> 188852484 (<.01%)
cycles in affected programs: 14612 -> 14596 (-0.11%)
helped: 4
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.09% max: 0.13% x̄: 0.11% x̃: 0.11%
95% mean confidence interval for cycles value: -4.00 -4.00
95% mean confidence interval for cycles %-change: -0.13% -0.09%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
6698d861a5 nir/flrp: Lower flrp(a, b, c) differently if another flrp(a, _, c) exists
This doesn't help on Intel GPUs now because we always take the
"always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any Intel platform.  Before a number of large rebases this
helped cycles in a couple shaders on Iron Lake and GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
5b908db604 nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently
No changes on any other Intel platforms.

v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
nir_opt_algebraic")

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8189888 -> 8153912 (-0.44%)
instructions in affected programs: 1199037 -> 1163061 (-3.00%)
helped: 4124
HURT: 10
helped stats (abs) min: 1 max: 40 x̄: 8.73 x̃: 9
helped stats (rel) min: 0.20% max: 86.96% x̄: 4.96% x̃: 3.02%
HURT stats (abs)   min: 1 max: 2 x̄: 1.20 x̃: 1
HURT stats (rel)   min: 1.06% max: 3.92% x̄: 1.62% x̃: 1.06%
95% mean confidence interval for instructions value: -8.84 -8.56
95% mean confidence interval for instructions %-change: -5.12% -4.77%
Instructions are helped.

total cycles in shared programs: 188606710 -> 188426964 (-0.10%)
cycles in affected programs: 27505596 -> 27325850 (-0.65%)
helped: 4026
HURT: 77
helped stats (abs) min: 2 max: 646 x̄: 44.99 x̃: 46
helped stats (rel) min: <.01% max: 94.58% x̄: 2.35% x̃: 0.85%
HURT stats (abs)   min: 2 max: 376 x̄: 17.79 x̃: 6
HURT stats (rel)   min: <.01% max: 2.60% x̄: 0.22% x̃: 0.04%
95% mean confidence interval for cycles value: -44.75 -42.87
95% mean confidence interval for cycles %-change: -2.44% -2.17%
Cycles are helped.

LOST:   3
GAINED: 35

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
23c5501b77 nir/flrp: Lower flrp(#a, #b, c) differently
If the magnitudes of #a and #b are such that (b-a) won't lose too much
precision, lower as a+c(b-a).

No changes on any other Intel platforms.

v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
nir_opt_algebraic")

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8192503 -> 8192383 (<.01%)
instructions in affected programs: 18417 -> 18297 (-0.65%)
helped: 68
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.19% max: 7.89% x̄: 1.10% x̃: 0.43%
95% mean confidence interval for instructions value: -2.48 -1.05
95% mean confidence interval for instructions %-change: -1.56% -0.63%
Instructions are helped.

total cycles in shared programs: 188662536 -> 188661956 (<.01%)
cycles in affected programs: 744476 -> 743896 (-0.08%)
helped: 62
HURT: 0
helped stats (abs) min: 4 max: 60 x̄: 9.35 x̃: 6
helped stats (rel) min: 0.02% max: 4.84% x̄: 0.27% x̃: 0.06%
95% mean confidence interval for cycles value: -12.37 -6.34
95% mean confidence interval for cycles %-change: -0.48% -0.06%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
dd7135d55d intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5
Previously lower_flrp32 was only set for vertex shaders.  Fragment
shaders performed a(1-c)+bc lowering during code generation.

The shaders with loops hurt are SIMD8 and SIMD16 shaders for a
text-identical fragment shader.

v2: Rebase on 26391cceaa ("intel/compiler: Lower ffma on Gen4 and
Gen5").

v3: Rebase on a004e95dd7 ("radeonsi/nir: create si_nir_opts() helper")

Iron Lake
total instructions in shared programs: 8211385 -> 8185974 (-0.31%)
instructions in affected programs: 2503898 -> 2478487 (-1.01%)
helped: 9936
HURT: 921
helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11%
HURT stats (abs)   min: 1 max: 12 x̄: 3.24 x̃: 2
HURT stats (rel)   min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89%
95% mean confidence interval for instructions value: -2.43 -2.25
95% mean confidence interval for instructions %-change: -1.41% -1.33%
Instructions are helped.

total cycles in shared programs: 188523186 -> 188401198 (-0.06%)
cycles in affected programs: 71541604 -> 71419616 (-0.17%)
helped: 11649
HURT: 1871
helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6
helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25%
HURT stats (abs)   min: 2 max: 138 x̄: 13.38 x̃: 8
HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17%
95% mean confidence interval for cycles value: -9.42 -8.63
95% mean confidence interval for cycles %-change: -0.54% -0.50%
Cycles are helped.

total loops in shared programs: 852 -> 856 (0.47%)
loops in affected programs: 0 -> 4
helped: 0
HURT: 4
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
95% mean confidence interval for loops value: 1.00 1.00
95% mean confidence interval for loops %-change: 0.00% 0.00%
Loops are HURT.

LOST:   3
GAINED: 12

GM45
total instructions in shared programs: 5046407 -> 5033694 (-0.25%)
instructions in affected programs: 1303584 -> 1290871 (-0.98%)
helped: 5010
HURT: 464
helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2
helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08%
HURT stats (abs)   min: 1 max: 75 x̄: 3.39 x̃: 2
HURT stats (rel)   min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87%
95% mean confidence interval for instructions value: -2.45 -2.20
95% mean confidence interval for instructions %-change: -1.40% -1.28%
Instructions are helped.

total cycles in shared programs: 128889476 -> 128812366 (-0.06%)
cycles in affected programs: 44845402 -> 44768292 (-0.17%)
helped: 6079
HURT: 940
helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8
helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25%
HURT stats (abs)   min: 2 max: 138 x̄: 16.01 x̃: 8
HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17%
95% mean confidence interval for cycles value: -11.63 -10.34
95% mean confidence interval for cycles %-change: -0.58% -0.52%
Cycles are helped.

total loops in shared programs: 633 -> 635 (0.32%)
loops in affected programs: 0 -> 2
helped: 0
HURT: 2

total spills in shared programs: 60 -> 69 (15.00%)
spills in affected programs: 54 -> 63 (16.67%)
helped: 0
HURT: 1

total fills in shared programs: 92 -> 105 (14.13%)
fills in affected programs: 80 -> 93 (16.25%)
helped: 0
HURT: 1

LOST:   15
GAINED: 15

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
2019-05-06 22:52:29 -07:00
Ian Romanick
d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
158370ed2a nir/flrp: Add new lowering pass for flrp instructions
This pass will soon grow to include some optimizations that are
difficult or impossible to implement correctly within nir_opt_algebraic.
It also include the ability to generate strictly correct code which the
current nir_opt_algebraic lowering lacks (though that could be changed).

v2: Document the parameters to nir_lower_flrp.  Rebase on top of
3766334923 ("compiler/nir: add lowering for 16-bit flrp")

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Ian Romanick
dc566a033c nir/algebraic: Pull common multiplication out of flrp arguments
All Intel platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342485 -> 15337495 (-0.03%)
instructions in affected programs: 217456 -> 212466 (-2.29%)
helped: 1539
HURT: 1
helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3
helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56%
95% mean confidence interval for instructions value: -3.39 -3.09
95% mean confidence interval for instructions %-change: -3.24% -2.96%
Instructions are helped.

total cycles in shared programs: 355734320 -> 355728237 (<.01%)
cycles in affected programs: 1851555 -> 1845472 (-0.33%)
helped: 835
HURT: 575
helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14
helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81%
HURT stats (abs)   min: 1 max: 322 x̄: 48.40 x̃: 14
HURT stats (rel)   min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43%
95% mean confidence interval for cycles value: -8.50 -0.13
95% mean confidence interval for cycles %-change: 0.48% 1.62%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Ian Romanick
a83a6e9690 nir/algebraic: Pull common addition out of flrp arguments
v2: Augment the late optimization patterns with a couple pre-ffma pass
patterns.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342982 -> 15342485 (<.01%)
instructions in affected programs: 56304 -> 55807 (-0.88%)
helped: 235
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1
helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74%
95% mean confidence interval for instructions value: -2.31 -1.92
95% mean confidence interval for instructions %-change: -1.46% -1.09%
Instructions are helped.

total cycles in shared programs: 355734740 -> 355734320 (<.01%)
cycles in affected programs: 1028807 -> 1028387 (-0.04%)
helped: 134
HURT: 104
helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8
helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61%
HURT stats (abs)   min: 1 max: 203 x̄: 29.06 x̃: 8
HURT stats (rel)   min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46%
95% mean confidence interval for cycles value: -8.51 4.98
95% mean confidence interval for cycles %-change: -0.35% 0.39%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10886815 -> 10886390 (<.01%)
instructions in affected programs: 36883 -> 36458 (-1.15%)
helped: 147
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23%
95% mean confidence interval for instructions value: -3.12 -2.67
95% mean confidence interval for instructions %-change: -1.83% -1.38%
Instructions are helped.

total cycles in shared programs: 154188360 -> 154186902 (<.01%)
cycles in affected programs: 388094 -> 386636 (-0.38%)
helped: 90
HURT: 58
helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15
helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83%
HURT stats (abs)   min: 1 max: 684 x̄: 31.97 x̃: 10
HURT stats (rel)   min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51%
95% mean confidence interval for cycles value: -22.62 2.92
95% mean confidence interval for cycles %-change: -0.68% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8221239 -> 8220357 (-0.01%)
instructions in affected programs: 54560 -> 53678 (-1.62%)
helped: 186
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3
helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17%
95% mean confidence interval for instructions value: -5.21 -4.28
95% mean confidence interval for instructions %-change: -2.23% -1.72%
Instructions are helped.

total cycles in shared programs: 188654442 -> 188650364 (<.01%)
cycles in affected programs: 1454384 -> 1450306 (-0.28%)
helped: 204
HURT: 0
helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18
helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22%
95% mean confidence interval for cycles value: -22.38 -17.60
95% mean confidence interval for cycles %-change: -0.67% -0.46%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Christian Gmeiner
e00fa99b08 glsl_to_nir: drop supports_ints
At initial nir level all drivers are supporting ints.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:59 +02:00
Christian Gmeiner
4e110eca42 nir: nir_shader_compiler_options: drop native_integers
Driver which do not support native integers should use a lowering
pass to go from integers to floats.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:52 +02:00
Alyssa Rosenzweig
050b934a24 panfrost: Refactor blend descriptors
This commit does a fairly large cleanup of blend descriptors, although
there should not be any functional changes. In particular, we split
apart the Midgard and Bifrost blend descriptors, since they are
radically different. From there, we can identify that the Midgard
descriptor as previously written was really two render targets'
descriptors stuck together. From this observation, we split the Midgard
descriptor into what a single RT actually needs. This enables us to
correctly dump blending configuration for MRT samples on Midgard. It
also allows the Midgard and Bifrost blend code to peacefully coexist,
with runtime selection rather than a #ifdef. So, as a bonus, this will
help the future Bifrost effort, eliminating one major source of
compile-time architectural divergence.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-07 03:21:08 +00:00
Vasily Khoruzhick
d4a249aa09 lima/gpir: enable lowering for ftrunc
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Vasily Khoruzhick
f4659bea7c lima/gpir: implement nir_op_fmov
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Vasily Khoruzhick
cf1ab4b96b lima: use int_to_float lowering pass
Neither GP nor PP in Mali4x0 support integers, so utilize new pass
and set native_integers to true for now until this flag is dropped.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Vasily Khoruzhick
443c5a3cd6 nir: add int_to_float lowering pass
This new pass lowers ints and bools to floats. It allows hardware
that doesn't have native integers (e.g. Mali4x0) use the same
code paths as modern hardware.

It uses newly introduced pass to gather SSA types and should be
used as late as possible.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Timothy Arceri
49025292fb radeonsi: add config entry for Counter-Strike Global Offensive
This fixes rendering issues with gun scopes which is rather
important.

Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100239
2019-05-07 09:42:09 +10:00
Vasily Khoruzhick
d085920b64 lima/gpir: fix float uniform alignment issue
If PIPE_CAP_PACKED_UNIFORMS is not set uniforms are vec4 aligned,
so lima_nir_lower_uniform_to_scalar should use first channel of vec4
for float uniforms.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-06 14:08:09 -07:00
Erik Faye-Lund
d84b85bc28 draw: flush when setting stream-out targets
We need to re-prepare the middle-end state to pick up changes to this
state to react correctly to pausing/resuming stream-out. So let's add a
flush here.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: ec8cbd79ac "draw/softpipe: EXT_transform_feedback support (v2)"
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-06 22:42:37 +02:00
Erik Faye-Lund
ed53e61bec llvmpipe: pass stream-out targets to draw-module early
We currently set this state in the draw-module twice on each draw, but
which trashes this state. So far that's not a problem, because we don't
really do much from that function.

But it turns out, we're going to have to do more; namely flush when the
state changes. This will incur a large performance penalty due to the
excessive setting.

Instead, let's rely on the CSO caching making sure that
llvmpipe_set_so_targets doesn't get called needlessly, and setup the
state directly there instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-06 22:42:37 +02:00
Uros Bizjak
fc7649c4b7 doc: Update GL_KHR_robustness in features.txt for r600
glxinfo for Cypress XT [Radeon HD 5870] lists GL_KHR_robustness
as supported extension.  This was the last missing extension
for GL 4.5, so Mark GL 4.5 as all DONE for r600.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-05-07 06:21:48 +10:00
Chia-I Wu
c7078397ca virgl: do not use inline writes for subdata
Inline writes skip transfer map/unamp at the cost of an extra copy
on the data during execbuffer.  That is generally a win for small
transfers.  But the heuristic to use inline writes based on buffer
sizes rather than transfer sizes makes little sense.  More
importantly, inline writes miss optimizations that are done for
buffer transfers.

Let's just use transfers.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2019-05-06 10:31:56 -07:00
Chia-I Wu
898be8036d virgl: rework queries
virglrender has been changed such that

 - VIRGL_CCMD_GET_QUERY_RESULT is fenced
 - query buffers (PIPE_BIND_CUSTOM) are coherent

We can check if a query is ready using DRM_IOCTL_VIRTGPU_WAIT, and also
avoid a synchronized transfer to retrieve the query result.  When
running against an older virglrenderer, it falls back to the old
behavior automatically.

TF2 @ 640x480 for pts4.dem went from 17fps to 40fps on my testing
machine.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-06 10:20:40 -07:00
Chia-I Wu
b4da53b0c3 virgl: export resource_is_busy from winsys
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-06 10:20:38 -07:00
Samuel Pitoiset
c10808441c radv: fix rowPitch for R32G32B32 formats on GFX9
The pitch is actually the number of components per row. We found
the problem when we implemented some meta operations for these
formats and the wrong pitch has been confirmed with a small test case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108325
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-06 19:07:44 +02:00
Kenneth Graunke
a032a9665f iris: Enable PIPE_CAP_SURFACE_REINTERPRET_BLOCKS
This makes CompressedTexSubImage from a PBO source do proper GPU
rendering to upload instead of stalling to map the PBO source on
the CPU (then copying it on the CPU).

Thanks Bas Nieuwenhuizen for pointing out that Vulkan includes this
functionality, and to Jason Ekstrand for writing the code I adapted.
Vulkan only supports a single layer, however, and this code tries to
support multiple layers as long as it's miplevel 0.

Improves performance in Sid Meier's Civilization VI:

   Average frame time (ms):         -3.67423% +/- 1.46201% (n=5)
   99th percentile frame time (ms): -5.09910% +/- 3.87874% (n=5)
2019-05-06 09:50:32 -07:00
Bas Nieuwenhuizen
8139efbbbd radv: Use given stride for images imported from Android.
Handled similarly as radeonsi. I checked the offsets are actually used.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-06 15:36:39 +00:00
Erico Nunes
11602ccd5d lima/ppir: abort compilation in case of unsupported intrinsic
Currently ppir continues compilation when there is an unsupported
intrinsic, resulting in a shader that will surely not work as intended.

This is a problem during piglit runs as some tests don't compile
properly due to this but actually still get submitted to the gpu and
leave the system in an unstable state after executing, causing further
tests to fail.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-05-06 17:15:27 +02:00
Erico Nunes
60a128fe81 lima/ir: print names of unsupported intrinsics
While lima still doesn't support some kinds of intrinsics, it is more
helpful to display the name of the unsupported instr->intrinsic to make
debugging easier.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-05-06 17:15:06 +02:00
John Stultz
c7f2145b4b mesa: Makefile.sources: Add nir_lower_fb_read.c to Makefile.sources list
In commit a99c360a46 (nir: add pass to lower fb reads), a new
file was added that needs to also be added to the
Makefile.sources list used by the Android and SCons build system.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: a99c360a46 ("nir: add pass to lower fb reads")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
John Stultz
d04f44a459 mesa: Makefile.sources: Add ir3_nir_lower_load_barycentric_at_sample/offset to Makefile.sources
In commit 2f0b9d2249 ("freedreno/ir3: lower
load_barycentric_at_offset") a new file was added that needs to
also be added to the Makefile.sources list used by Android and
SCons build system.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 2f0b9d2249 ("freedreno/ir3: lower load_barycentric_at_offset")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
John Stultz
c935862127 mesa: android: freedreno: Fix build failure due to path change
The ir3_nir_trig.py file was moved in a previous commit,
aa0fed10d3 (freedreno: move ir3 to common location),
so update the Android.gen.mk file to match.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: aa0fed10d3 ("freedreno: move ir3 to common location")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
Amit Pundir
88105375c9 mesa: android: freedreno: build libfreedreno_{drm,ir3} static libs
Add libfreedreno_drm/ir3 to the build

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: b4476138d5 ("freedreno: move drm to common location")
Fixes: aa0fed10d3 ("freedreno: move ir3 to common location")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[jstultz: Tweaked to add extra ir3 files from master]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
Alistair Strachan
0fda3eac31 mesa: android: Remove unnecessary dependency tracking rules
The current AOSP master build system breaks building mesa due to the
following error:

external/mesa3d/src/compiler/Android.glsl.gen.mk:94: error:
  writing to readonly directory: "external/mesa3d/src/compiler/glsl/ir.h"

This error is bogus -- nothing "writes" to ir.h -- but the rule is
unnecessary because the generated header that is a dependency of the
non-generated header should be added to LOCAL_GENERATED_SOURCES and this
will track if the dependency needs to be regenerated.

(This change fixes a similar problem affecting nir.h too.)

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Alistair Strachan <astrachan@google.com>
[jstultz: Forward ported and tweaked commit subject]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:25 +00:00
Bas Nieuwenhuizen
5692351264 radv: Implement cosited_even sampling.
Apparently cosited_even was the required one instead of midpoint.

This adds slight offset of 0.5 pixels to the coordinates (+ we need
the image size to convert to normalized coords)

Fixes: 91702374d5 "radv: Add ycbcr lowering pass."
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-06 11:09:30 +00:00
Michel Dänzer
28784e494e Restore erroneously removed .gitignore entry for "build" directory
It was removed in "delete autotools .gitignore files", but the build
directory is created by scons.

[Skip CI]

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-06 12:11:44 +02:00
Bas Nieuwenhuizen
5cbe12ad1b radv: Disable subsampled formats.
Broken on Polaris and since I discovered NV12 is not subsampled, but
a 2-plane format I decided I don't really care.

Work to do to re-enable:

1) Figure out which devices support it natively.
2) Write some software emulation for the others.

Fixes: 52c1adda21 "radv: Add ycbcr format features."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-06 09:53:37 +00:00
Timothy Arceri
1af72fa4d6 util/drirc: add workarounds for bugs in Doom 3: BFG
This makes the game playable on radeonsi.

Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110143
2019-05-06 17:32:36 +10:00
2873 changed files with 243488 additions and 103676 deletions

View File

@@ -35,6 +35,6 @@ indent_size = 2
[*.patch]
trim_trailing_whitespace = false
[meson.build,meson_options.txt]
[{meson.build,meson_options.txt}]
indent_style = space
indent_size = 2

2
.gitignore vendored
View File

@@ -1,2 +1,4 @@
*.pyc
*.pyo
*.out
build

View File

@@ -12,12 +12,9 @@
# main repository, it's recommended to remove the image from the source
# repository's container registry, so that the image from the main
# repository's registry will be used there as well.
#
# The format of the tag is "%Y-%m-%d-${counter}" where ${counter} stays
# at "01" unless you have multiple updates on the same day :)
variables:
UPSTREAM_REPO: mesa/mesa
DEBIAN_TAG: "2019-05-01"
DEBIAN_TAG: "2019-08-09"
DEBIAN_VERSION: stretch-slim
DEBIAN_IMAGE: "$CI_REGISTRY_IMAGE/debian/$DEBIAN_VERSION:$DEBIAN_TAG"
@@ -29,6 +26,7 @@ include:
stages:
- containers-build
- build+test
- test
# When to automatically run the CI
@@ -42,6 +40,14 @@ stages:
when:
- runner_system_failure
.ci-deqp-artifacts: &ci-deqp-artifacts
artifacts:
when: always
untracked: false
paths:
# Watch out! Artifacts are relative to the build dir.
# https://gitlab.com/gitlab-org/gitlab-ce/commit/8788fb925706cad594adf6917a6c5f6587dd1521
- artifacts
# CONTAINERS
@@ -64,8 +70,12 @@ debian:
paths:
- ccache
artifacts:
when: on_failure
untracked: true
when: always
paths:
- _build/meson-logs/*.txt
# scons:
- build/*/config.log
- shader-db
variables:
CCACHE_COMPILERCHECK: "content"
# Use ccache transparently, and print stats before/after
@@ -76,37 +86,19 @@ debian:
- ccache --zero-stats || true
- ccache --show-stats || true
after_script:
# In case the install dir is being saved as artifacts, tar it up
# so that symlinks and hardlinks aren't each packed separately in
# the zip file.
- if [ -d install ]; then
tar -cf artifacts/install.tar install;
fi
- export CCACHE_DIR="$PWD/ccache"
- ccache --show-stats
.meson-build:
extends: .build
script:
# We need to control the version of llvm-config we're using, so we'll
# generate a native file to do so. This requires meson >=0.49
- if test -n "$LLVM_VERSION"; then
LLVM_CONFIG="llvm-config-${LLVM_VERSION}";
echo -e "[binaries]\nllvm-config = '`which $LLVM_CONFIG`'" > native.file;
$LLVM_CONFIG --version;
else
touch native.file;
fi
- meson --version
- meson _build
--native-file=native.file
-D buildtype=debug
-D build-tests=true
-D libunwind=${UNWIND}
${DRI_LOADERS}
-D dri-drivers=${DRI_DRIVERS:-[]}
${GALLIUM_ST}
-D gallium-drivers=${GALLIUM_DRIVERS:-[]}
-D vulkan-drivers=${VULKAN_DRIVERS:-[]}
-D I-love-half-baked-turnips=true
- cd _build
- meson configure
- ninja -j4
- LC_ALL=C.UTF-8 ninja test
- .gitlab-ci/meson-build.sh
.scons-build:
extends: .build
@@ -123,12 +115,8 @@ debian:
# gallium drivers combined.
# Start this early so that it doesn't limit the total run time.
#
# We also put softpipe (and therefore gallium nine, which requires
# it) here, since softpipe/llvmpipe can't be built alongside classic
# swrast.
#
# Putting glvnd here is arbitrary, but we want it in one of the builds
# for coverage.
# We also stick the glvnd build here, since we want non-glvnd in
# meson-main for actual driver CI.
meson-swr-glvnd:
extends: .meson-build
variables:
@@ -143,10 +131,9 @@ meson-swr-glvnd:
-D gallium-omx=disabled
-D gallium-va=false
-D gallium-xa=false
-D gallium-nine=true
-D gallium-nine=false
-D gallium-opencl=disabled
-D osmesa=gallium
GALLIUM_DRIVERS: "swr,swrast,iris"
GALLIUM_DRIVERS: "swr,iris"
LLVM_VERSION: "6.0"
meson-clang:
@@ -164,6 +151,75 @@ meson-clang:
# clang++ breaks if it picks up the GCC 8 directory without libstdc++.so
- apt-get remove -y libgcc-8-dev
scons-swr:
extends: .scons-build
variables:
SCONS_TARGET: "swr=1"
SCONS_CHECK_COMMAND: "true"
LLVM_VERSION: "6.0"
scons-win64:
extends: .scons-build
variables:
SCONS_TARGET: platform=windows machine=x86_64
SCONS_CHECK_COMMAND: "true"
meson-main:
extends: .meson-build
variables:
UNWIND: "true"
DRI_LOADERS: >
-D glx=dri
-D gbm=true
-D egl=true
-D platforms=x11,wayland,drm,surfaceless
DRI_DRIVERS: "i915,i965,r100,r200,nouveau"
GALLIUM_ST: >
-D dri3=true
-D gallium-extra-hud=true
-D gallium-vdpau=true
-D gallium-xvmc=true
-D gallium-omx=bellagio
-D gallium-va=true
-D gallium-xa=true
-D gallium-nine=true
-D gallium-opencl=disabled
GALLIUM_DRIVERS: "iris,nouveau,kmsro,r300,r600,freedreno,swrast,svga,v3d,vc4,virgl,etnaviv,panfrost,lima"
LLVM_VERSION: "7"
EXTRA_OPTION: >
-D osmesa=gallium
-D tools=all
MESON_SHADERDB: "true"
BUILDTYPE: "debugoptimized"
<<: *ci-deqp-artifacts
meson-clover:
extends: .meson-build
variables:
UNWIND: "true"
DRI_LOADERS: >
-D glx=disabled
-D egl=false
-D gbm=false
GALLIUM_ST: >
-D dri3=false
-D gallium-vdpau=false
-D gallium-xvmc=false
-D gallium-omx=disabled
-D gallium-va=false
-D gallium-xa=false
-D gallium-nine=false
-D gallium-opencl=icd
script:
- export GALLIUM_DRIVERS="r600,radeonsi"
- .gitlab-ci/meson-build.sh
- LLVM_VERSION=7 .gitlab-ci/meson-build.sh
- export GALLIUM_DRIVERS="i915,r600"
- LLVM_VERSION=3.9 .gitlab-ci/meson-build.sh
- LLVM_VERSION=4.0 .gitlab-ci/meson-build.sh
- LLVM_VERSION=5.0 .gitlab-ci/meson-build.sh
- LLVM_VERSION=6.0 .gitlab-ci/meson-build.sh
meson-vulkan:
extends: .meson-build
variables:
@@ -185,39 +241,19 @@ meson-vulkan:
-D gallium-opencl=disabled
VULKAN_DRIVERS: intel,amd,freedreno
LLVM_VERSION: "7"
EXTRA_OPTION: >
-D vulkan-overlay-layer=true
meson-main:
.meson-cross:
extends: .meson-build
variables:
UNWIND: "true"
DRI_LOADERS: >
-D glx=dri
-D gbm=true
-D egl=true
-D platforms=x11,wayland,drm,surfaceless
-D osmesa=classic
DRI_DRIVERS: "i915,i965,r100,r200,swrast,nouveau"
GALLIUM_ST: >
-D dri3=true
-D gallium-extra-hud=true
-D gallium-vdpau=true
-D gallium-xvmc=true
-D gallium-omx=bellagio
-D gallium-va=true
-D gallium-xa=true
-D gallium-nine=false
-D gallium-opencl=disabled
GALLIUM_DRIVERS: "iris,nouveau,kmsro,r300,r600,freedreno,svga,v3d,vc4,virgl,etnaviv,panfrost,lima"
LLVM_VERSION: "7"
meson-clover-llvm:
extends: .meson-build
variables:
UNWIND: "true"
UNWIND: "false"
DRI_LOADERS: >
-D glx=disabled
-D egl=false
-D gbm=false
-D egl=false
-D platforms=surfaceless
-D osmesa=none
GALLIUM_ST: >
-D dri3=false
-D gallium-vdpau=false
@@ -226,14 +262,52 @@ meson-clover-llvm:
-D gallium-va=false
-D gallium-xa=false
-D gallium-nine=false
-D gallium-opencl=icd
GALLIUM_DRIVERS: "r600,radeonsi"
-D llvm=false
<<: *ci-deqp-artifacts
script:
- .gitlab-ci/meson-build.sh
meson-clover-llvm39:
extends: meson-clover-llvm
meson-armhf:
extends: .meson-cross
variables:
GALLIUM_DRIVERS: "i915,r600"
LLVM_VERSION: "3.9"
CROSS: armhf
VULKAN_DRIVERS: freedreno
GALLIUM_DRIVERS: "etnaviv,freedreno,kmsro,lima,nouveau,panfrost,tegra,v3d,vc4"
# Disable the tests since we're cross compiling.
EXTRA_OPTION: >
-D build-tests=false
-D I-love-half-baked-turnips=true
-D vulkan-overlay-layer=true
meson-arm64:
extends: .meson-cross
variables:
CROSS: arm64
VULKAN_DRIVERS: freedreno
GALLIUM_DRIVERS: "etnaviv,freedreno,kmsro,lima,nouveau,panfrost,tegra,v3d,vc4"
# Disable the tests since we're cross compiling.
EXTRA_OPTION: >
-D build-tests=false
-D I-love-half-baked-turnips=true
-D vulkan-overlay-layer=true
# While the main point of this build is testing the i386 cross build,
# we also use this one to test some other options that are exclusive
# with meson-main's choices (classic swrast and osmesa)
meson-i386:
extends: .meson-cross
variables:
CROSS: i386
VULKAN_DRIVERS: intel
DRI_DRIVERS: "swrast"
GALLIUM_DRIVERS: "iris"
# Disable i386 tests, because u_format_tests gets precision
# failures in dxtn unpacking
EXTRA_OPTION: >
-D build-tests=false
-D vulkan-overlay-layer=true
-D llvm=false
-D osmesa=classic
scons-nollvm:
extends: .scons-build
@@ -250,15 +324,59 @@ scons-llvm:
# LLVM 3.4 packages were built with an old libstdc++ ABI
CXX: "g++ -D_GLIBCXX_USE_CXX11_ABI=0"
scons-swr:
extends: .scons-build
.deqp-test:
<<: *ci-run-policy
stage: test
image: $DEBIAN_IMAGE
variables:
SCONS_TARGET: "swr=1"
SCONS_CHECK_COMMAND: "true"
LLVM_VERSION: "6.0"
GIT_STRATEGY: none # testing doesn't build anything from source
DEQP_SKIPS: deqp-default-skips.txt
script:
# Note: Build dir (and thus install) may be dirty due to GIT_STRATEGY
- rm -rf install
- tar -xf artifacts/install.tar
- ./artifacts/deqp-runner.sh
artifacts:
when: on_failure
name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
paths:
- results/
scons-win64:
extends: .scons-build
test-llvmpipe-gles2:
parallel: 4
variables:
SCONS_TARGET: platform=windows machine=x86_64
SCONS_CHECK_COMMAND: "true"
DEQP_VER: gles2
DEQP_EXPECTED_FAILS: deqp-llvmpipe-fails.txt
LIBGL_ALWAYS_SOFTWARE: "true"
DEQP_RENDERER_MATCH: "llvmpipe"
extends: .deqp-test
dependencies:
- meson-main
test-softpipe-gles2:
parallel: 4
variables:
DEQP_VER: gles2
DEQP_EXPECTED_FAILS: deqp-softpipe-fails.txt
LIBGL_ALWAYS_SOFTWARE: "true"
DEQP_RENDERER_MATCH: "softpipe"
GALLIUM_DRIVER: "softpipe"
extends: .deqp-test
dependencies:
- meson-main
# The GLES2 CTS run takes about 8 minutes of CPU time, while GLES3 is
# 25 minutes. Until we can get its runtime down, just do a partial
# (every 10 tests) run.
test-softpipe-gles3-limited:
variables:
DEQP_VER: gles3
DEQP_EXPECTED_FAILS: deqp-softpipe-fails.txt
LIBGL_ALWAYS_SOFTWARE: "true"
DEQP_RENDERER_MATCH: "softpipe"
GALLIUM_DRIVER: "softpipe"
CI_NODE_INDEX: 1
CI_NODE_TOTAL: 10
extends: .deqp-test
dependencies:
- meson-main

View File

@@ -5,17 +5,22 @@ set -o xtrace
export DEBIAN_FRONTEND=noninteractive
CROSS_ARCHITECTURES="armhf arm64 i386"
for arch in $CROSS_ARCHITECTURES; do
dpkg --add-architecture $arch
done
apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
wget \
gnupg \
software-properties-common
unzip \
gnupg
curl -fsSL https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -
add-apt-repository "deb https://apt.llvm.org/stretch/ llvm-toolchain-stretch-7 main"
add-apt-repository "deb https://apt.llvm.org/stretch/ llvm-toolchain-stretch-8 main"
echo "deb [trusted=yes] https://apt.llvm.org/stretch/ llvm-toolchain-stretch-7 main" >/etc/apt/sources.list.d/llvm7.list
echo "deb [trusted=yes] https://apt.llvm.org/stretch/ llvm-toolchain-stretch-8 main" >/etc/apt/sources.list.d/llvm8.list
sed -i -e 's/http:\/\/deb/https:\/\/deb/g' /etc/apt/sources.list
echo 'deb https://deb.debian.org/debian stretch-backports main' >/etc/apt/sources.list.d/backports.list
@@ -26,18 +31,25 @@ apt-get install -y -t stretch-backports \
llvm-3.4-dev \
llvm-3.9-dev \
libclang-3.9-dev \
llvm-4.0-dev \
libclang-4.0-dev \
llvm-5.0-dev \
libclang-5.0-dev \
llvm-6.0-dev \
libclang-6.0-dev \
llvm-7-dev \
libclang-7-dev \
llvm-8-dev \
libclang-8-dev \
g++ \
clang-8 \
libclang-7-dev
clang-8
# Install remaining packages from Debian buster to get newer versions
add-apt-repository "deb https://deb.debian.org/debian/ buster main"
add-apt-repository "deb https://deb.debian.org/debian/ buster-updates main"
echo "deb https://deb.debian.org/debian/ buster main" >/etc/apt/sources.list.d/buster.list
echo "deb https://deb.debian.org/debian/ buster-updates main" >/etc/apt/sources.list.d/buster-updates.list
apt-get update
apt-get install -y \
git \
bzip2 \
zlib1g-dev \
pkg-config \
@@ -45,6 +57,10 @@ apt-get install -y \
libxdamage-dev \
libxxf86vm-dev \
gcc \
git \
libepoxy-dev \
libegl1-mesa-dev \
libgbm-dev \
libclc-dev \
libxvmc-dev \
libomxil-bellagio-dev \
@@ -54,24 +70,44 @@ apt-get install -y \
libelf-dev \
libunwind-dev \
libglvnd-dev \
libgtk-3-dev \
libpng-dev \
libgbm-dev \
libgles2-mesa-dev \
python-mako \
python3-mako \
meson \
scons
# autotools build deps
apt-get install -y \
automake \
libtool \
bison \
flex \
gettext \
make
cmake \
meson \
scons
# Cross-build Mesa deps
for arch in $CROSS_ARCHITECTURES; do
apt-get install -y \
libdrm-dev:${arch} \
libexpat1-dev:${arch} \
libelf-dev:${arch}
done
apt-get install -y \
dpkg-dev \
gcc-aarch64-linux-gnu \
g++-aarch64-linux-gnu \
gcc-arm-linux-gnueabihf \
g++-arm-linux-gnueabihf \
gcc-i686-linux-gnu \
g++-i686-linux-gnu
# for 64bit windows cross-builds
apt-get install -y \
wine64 \
mingw-w64
apt-get install -y mingw-w64
# for the vulkan overlay layer
wget https://github.com/KhronosGroup/glslang/releases/download/master-tot/glslang-master-linux-Release.zip
unzip glslang-master-linux-Release.zip bin/glslangValidator
install -m755 bin/glslangValidator /usr/local/bin/
rm bin/glslangValidator glslang-master-linux-Release.zip
# dependencies where we want a specific version
export XORG_RELEASES=https://xorg.freedesktop.org/releases/individual
@@ -82,16 +118,16 @@ export XORGMACROS_VERSION=util-macros-1.19.0
export GLPROTO_VERSION=glproto-1.4.17
export DRI2PROTO_VERSION=dri2proto-2.8
export LIBPCIACCESS_VERSION=libpciaccess-0.13.4
export LIBDRM_VERSION=libdrm-2.4.97
export LIBDRM_VERSION=libdrm-2.4.99
export XCBPROTO_VERSION=xcb-proto-1.13
export RANDRPROTO_VERSION=randrproto-1.3.0
export LIBXRANDR_VERSION=libXrandr-1.3.0
export RANDRPROTO_VERSION=randrproto-1.5.0
export LIBXRANDR_VERSION=libXrandr-1.5.0
export LIBXCB_VERSION=libxcb-1.13
export LIBXSHMFENCE_VERSION=libxshmfence-1.3
export LIBVDPAU_VERSION=libvdpau-1.1
export LIBVA_VERSION=libva-1.7.0
export LIBWAYLAND_VERSION=wayland-1.15.0
export WAYLAND_PROTOCOLS_VERSION=wayland-protocols-1.8
export WAYLAND_PROTOCOLS_VERSION=wayland-protocols-1.12
wget $XORG_RELEASES/util/$XORGMACROS_VERSION.tar.bz2
tar -xvf $XORGMACROS_VERSION.tar.bz2 && rm $XORGMACROS_VERSION.tar.bz2
@@ -163,19 +199,87 @@ tar -xvf $WAYLAND_PROTOCOLS_VERSION.tar.xz && rm $WAYLAND_PROTOCOLS_VERSION.tar.
cd $WAYLAND_PROTOCOLS_VERSION; ./configure; make install; cd ..
rm -rf $WAYLAND_PROTOCOLS_VERSION
pushd /usr/local
git clone https://gitlab.freedesktop.org/mesa/shader-db.git --depth 1
rm -rf shader-db/.git
cd shader-db
make
popd
# Use ccache to speed up builds
apt-get install -y ccache
# We need xmllint to validate the XML files in Mesa
apt-get install -y libxml2-utils
# Remove unused packages
# Generate cross build files for Meson
for arch in $CROSS_ARCHITECTURES; do
cross_file="/cross_file-$arch.txt"
/usr/share/meson/debcrossgen --arch "$arch" -o "$cross_file"
# Work around a bug in debcrossgen that should be fixed in the next release
if [ "$arch" = "i386" ]; then
sed -i "s|cpu_family = 'i686'|cpu_family = 'x86'|g" "$cross_file"
fi
done
############### Build dEQP
git config --global user.email "mesa@example.com"
git config --global user.name "Mesa CI"
# XXX: Use --depth 1 once we can drop the cherry-picks.
git clone \
https://github.com/KhronosGroup/VK-GL-CTS.git \
-b opengl-es-cts-3.2.5.1 \
/VK-GL-CTS
cd /VK-GL-CTS
# Fix surfaceless build
git cherry-pick -x 22f41e5e321c6dcd8569c4dad91bce89f06b3670
git cherry-pick -x 1daa8dff73161ea60ead965bd6c9f2a0a2165648
# surfaceless links against libkms and such despite not using it.
sed -i '/gbm/d' targets/surfaceless/surfaceless.cmake
sed -i '/libkms/d' targets/surfaceless/surfaceless.cmake
sed -i '/libgbm/d' targets/surfaceless/surfaceless.cmake
python3 external/fetch_sources.py
mkdir -p /deqp
cd /deqp
cmake -G Ninja \
-DDEQP_TARGET=surfaceless \
-DCMAKE_BUILD_TYPE=Release \
/VK-GL-CTS
ninja
# Copy out the mustpass lists we want from a bunch of other junk.
mkdir /deqp/mustpass
for gles in gles2 gles3 gles31; do
cp \
/deqp/external/openglcts/modules/gl_cts/data/mustpass/gles/aosp_mustpass/3.2.5.x/$gles-master.txt \
/deqp/mustpass/$gles-master.txt
done
# Remove the rest of the build products that we don't need.
rm -rf /deqp/external
rm -rf /deqp/modules/internal
rm -rf /deqp/executor
rm -rf /deqp/execserver
rm -rf /deqp/modules/egl
rm -rf /deqp/framework
du -sh *
rm -rf /VK-GL-CTS
############### Uninstall the build software
apt-get purge -y \
automake \
libtool \
make \
git \
curl \
wget \
unzip \
gnupg \
software-properties-common
cmake \
git \
libgles2-mesa-dev \
libgbm-dev
apt-get autoremove -y --purge

View File

@@ -0,0 +1,10 @@
# Note: skips lists for CI are just a list of lines that, when
# non-zero-length and not starting with '#', will regex match to
# delete lines from the test list. Be careful.
# Skip the perf/stress tests to keep runtime manageable
dEQP-GLES[0-9]*.performance
dEQP-GLES[0-9]*.stress
# These are really slow on tiling architectures (including llvmpipe).
dEQP-GLES[0-9]*.functional.flush_finish

View File

@@ -0,0 +1,124 @@
dEQP-GLES2.functional.clipping.line.wide_line_clip_viewport_center
dEQP-GLES2.functional.clipping.line.wide_line_clip_viewport_corner
dEQP-GLES2.functional.clipping.point.wide_point_clip
dEQP-GLES2.functional.clipping.point.wide_point_clip_viewport_center
dEQP-GLES2.functional.clipping.point.wide_point_clip_viewport_corner
dEQP-GLES2.functional.clipping.triangle_vertex.clip_two.clip_neg_y_neg_z_and_neg_x_neg_y_pos_z
dEQP-GLES2.functional.clipping.triangle_vertex.clip_two.clip_pos_y_pos_z_and_neg_x_neg_y_neg_z
dEQP-GLES2.functional.fbo.render.color_clear.rbo_rgba4
dEQP-GLES2.functional.fbo.render.color_clear.rbo_rgba4_depth_component16
dEQP-GLES2.functional.fbo.render.color_clear.rbo_rgba4_stencil_index8
dEQP-GLES2.functional.fbo.render.depth.rbo_rgba4_depth_component16
dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.no_rebind_rbo_rgba4
dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.no_rebind_rbo_rgba4_stencil_index8
dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.rebind_rbo_rgba4
dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.rebind_rbo_rgba4_stencil_index8
dEQP-GLES2.functional.fbo.render.recreate_depthbuffer.no_rebind_rbo_rgba4_depth_component16
dEQP-GLES2.functional.fbo.render.recreate_depthbuffer.rebind_rbo_rgba4_depth_component16
dEQP-GLES2.functional.fbo.render.recreate_stencilbuffer.no_rebind_rbo_rgba4_stencil_index8
dEQP-GLES2.functional.fbo.render.recreate_stencilbuffer.rebind_rbo_rgba4_stencil_index8
dEQP-GLES2.functional.fbo.render.shared_colorbuffer.rbo_rgba4
dEQP-GLES2.functional.fbo.render.shared_colorbuffer.rbo_rgba4_depth_component16
dEQP-GLES2.functional.fbo.render.shared_depthbuffer.rbo_rgba4_depth_component16
dEQP-GLES2.functional.polygon_offset.default_displacement_with_units
dEQP-GLES2.functional.polygon_offset.fixed16_displacement_with_units
dEQP-GLES2.functional.rasterization.interpolation.basic.line_loop_wide
dEQP-GLES2.functional.rasterization.interpolation.basic.line_strip_wide
dEQP-GLES2.functional.rasterization.interpolation.basic.lines_wide
dEQP-GLES2.functional.rasterization.interpolation.projected.line_loop_wide
dEQP-GLES2.functional.rasterization.interpolation.projected.line_strip_wide
dEQP-GLES2.functional.rasterization.interpolation.projected.lines_wide
dEQP-GLES2.functional.rasterization.limits.points
dEQP-GLES2.functional.shaders.texture_functions.fragment.texture2d_bias
dEQP-GLES2.functional.shaders.texture_functions.fragment.texture2dproj_vec3_bias
dEQP-GLES2.functional.shaders.texture_functions.fragment.texture2dproj_vec4_bias
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_linear_clamp_rgba8888
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_linear_mirror_etc1
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_linear_mirror_rgba8888
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_linear_repeat_etc1
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_linear_repeat_rgba8888
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_clamp_rgba8888
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_mirror_etc1
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_mirror_rgba8888
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_repeat_etc1
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_repeat_l8
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_repeat_rgb888
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_repeat_rgba4444
dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_repeat_rgba8888
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_linear_clamp_rgba8888
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_linear_mirror_etc1
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_linear_mirror_rgba8888
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_linear_repeat_etc1
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_linear_repeat_rgba8888
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_clamp_rgba8888
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_mirror_etc1
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_mirror_rgba8888
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_repeat_etc1
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_repeat_l8
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_repeat_rgb888
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_repeat_rgba4444
dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_repeat_rgba8888
dEQP-GLES2.functional.texture.mipmap.2d.affine.linear_linear_repeat
dEQP-GLES2.functional.texture.mipmap.2d.affine.nearest_linear_clamp
dEQP-GLES2.functional.texture.mipmap.2d.affine.nearest_linear_mirror
dEQP-GLES2.functional.texture.mipmap.2d.affine.nearest_linear_repeat
dEQP-GLES2.functional.texture.mipmap.2d.basic.linear_linear_repeat
dEQP-GLES2.functional.texture.mipmap.2d.basic.linear_linear_repeat_non_square
dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_clamp
dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_clamp_non_square
dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_mirror
dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_mirror_non_square
dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_repeat
dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_repeat_non_square
dEQP-GLES2.functional.texture.mipmap.2d.projected.linear_linear_repeat
dEQP-GLES2.functional.texture.mipmap.2d.projected.nearest_linear_clamp
dEQP-GLES2.functional.texture.mipmap.2d.projected.nearest_linear_mirror
dEQP-GLES2.functional.texture.mipmap.2d.projected.nearest_linear_repeat
dEQP-GLES2.functional.texture.mipmap.cube.basic.linear_linear
dEQP-GLES2.functional.texture.mipmap.cube.basic.linear_nearest
dEQP-GLES2.functional.texture.mipmap.cube.bias.linear_linear
dEQP-GLES2.functional.texture.mipmap.cube.bias.linear_nearest
dEQP-GLES2.functional.texture.mipmap.cube.projected.linear_linear
dEQP-GLES2.functional.texture.mipmap.cube.projected.linear_nearest
dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_linear_clamp
dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_linear_mirror
dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_linear_repeat
dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_nearest_clamp
dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_nearest_mirror
dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_nearest_repeat
dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_linear_clamp
dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_linear_mirror
dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_linear_repeat
dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_nearest_clamp
dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_nearest_mirror
dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_nearest_repeat
dEQP-GLES2.functional.texture.vertex.2d.wrap.clamp_clamp
dEQP-GLES2.functional.texture.vertex.2d.wrap.clamp_mirror
dEQP-GLES2.functional.texture.vertex.2d.wrap.clamp_repeat
dEQP-GLES2.functional.texture.vertex.2d.wrap.mirror_clamp
dEQP-GLES2.functional.texture.vertex.2d.wrap.mirror_mirror
dEQP-GLES2.functional.texture.vertex.2d.wrap.mirror_repeat
dEQP-GLES2.functional.texture.vertex.2d.wrap.repeat_clamp
dEQP-GLES2.functional.texture.vertex.2d.wrap.repeat_mirror
dEQP-GLES2.functional.texture.vertex.2d.wrap.repeat_repeat
dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_linear_clamp
dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_linear_mirror
dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_linear_repeat
dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_nearest_clamp
dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_nearest_mirror
dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_nearest_repeat
dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_linear_clamp
dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_linear_mirror
dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_linear_repeat
dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_nearest_clamp
dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_nearest_mirror
dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_nearest_repeat
dEQP-GLES2.functional.texture.vertex.cube.wrap.clamp_clamp
dEQP-GLES2.functional.texture.vertex.cube.wrap.clamp_mirror
dEQP-GLES2.functional.texture.vertex.cube.wrap.clamp_repeat
dEQP-GLES2.functional.texture.vertex.cube.wrap.mirror_clamp
dEQP-GLES2.functional.texture.vertex.cube.wrap.mirror_mirror
dEQP-GLES2.functional.texture.vertex.cube.wrap.mirror_repeat
dEQP-GLES2.functional.texture.vertex.cube.wrap.repeat_clamp
dEQP-GLES2.functional.texture.vertex.cube.wrap.repeat_mirror
dEQP-GLES2.functional.texture.vertex.cube.wrap.repeat_repeat

112
.gitlab-ci/deqp-runner.sh Executable file
View File

@@ -0,0 +1,112 @@
#!/bin/bash
set -ex
DEQP_OPTIONS=(--deqp-surface-width=256 --deqp-surface-height=256)
DEQP_OPTIONS+=(--deqp-surface-type=pbuffer)
DEQP_OPTIONS+=(--deqp-gl-config-name=rgba8888d24s8ms0)
DEQP_OPTIONS+=(--deqp-visibility=hidden)
DEQP_OPTIONS+=(--deqp-log-images=disable)
DEQP_OPTIONS+=(--deqp-watchdog=enable)
DEQP_OPTIONS+=(--deqp-crashhandler=enable)
if [ -z "$DEQP_VER" ]; then
echo 'DEQP_VER must be set to something like "gles2" or "gles31" for the test run'
exit 1
fi
if [ -z "$DEQP_SKIPS" ]; then
echo 'DEQP_SKIPS must be set to something like "deqp-default-skips.txt"'
exit 1
fi
# Prep the expected failure list
if [ -n "$DEQP_EXPECTED_FAILS" ]; then
export DEQP_EXPECTED_FAILS=`pwd`/artifacts/$DEQP_EXPECTED_FAILS
else
export DEQP_EXPECTED_FAILS=/tmp/expect-no-failures.txt
touch $DEQP_EXPECTED_FAILS
fi
sort < $DEQP_EXPECTED_FAILS > /tmp/expected-fails.txt
# Fix relative paths on inputs.
export DEQP_SKIPS=`pwd`/artifacts/$DEQP_SKIPS
# Be a good citizen on the shared runners.
export LP_NUM_THREADS=4
# Set up the driver environment.
export LD_LIBRARY_PATH=`pwd`/install/lib/
export EGL_PLATFORM=surfaceless
# the runner was failing to look for libkms in /usr/local/lib for some reason
# I never figured out.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
RESULTS=`pwd`/results
mkdir -p $RESULTS
cd /deqp/modules/$DEQP_VER
# Generate test case list file
cp /deqp/mustpass/$DEQP_VER-master.txt /tmp/case-list.txt
# Note: not using sorted input and comm, becuase I want to run the tests in
# the same order that dEQP would.
while read -r line; do
if echo "$line" | grep -q '^[^#]'; then
sed -i "/$line/d" /tmp/case-list.txt
fi
done < $DEQP_SKIPS
# If the job is parallel, take the corresponding fraction of the caselist.
# Note: N~M is a gnu sed extension to match every nth line (first line is #1).
if [ -n "$CI_NODE_INDEX" ]; then
sed -ni $CI_NODE_INDEX~$CI_NODE_TOTAL"p" /tmp/case-list.txt
fi
if [ ! -s /tmp/case-list.txt ]; then
echo "Caselist generation failed"
exit 1
fi
# Cannot use tee because dash doesn't have pipefail
touch /tmp/result.txt
tail -f /tmp/result.txt &
./deqp-$DEQP_VER "${DEQP_OPTIONS[@]}" --deqp-log-filename=$RESULTS/results.qpa --deqp-caselist-file=/tmp/case-list.txt >> /tmp/result.txt
DEQP_EXITCODE=$?
sed -ne \
'/StatusCode="Fail"/{x;p}; s/#beginTestCaseResult //; T; h' \
$RESULTS/results.qpa \
> /tmp/unsorted-fails.txt
# Scrape out the renderer that the test run used, so we can validate that the
# right driver was used.
if grep -q "dEQP-.*.info.renderer" /tmp/case-list.txt; then
# This is an ugly dependency on the .qpa format: Print 3 lines after the
# match, which happens to contain the result.
RENDERER=`sed -n '/#beginTestCaseResult dEQP-.*.info.renderer/{n;n;n;p}' $RESULTS/results.qpa | sed -n -E "s|<Text>(.*)</Text>|\1|p"`
echo "GL_RENDERER for this test run: $RENDERER"
if [ -n "$DEQP_RENDERER_MATCH" ]; then
echo $RENDERER | grep -q $DEQP_RENDERER_MATCH > /dev/null
fi
fi
if [ $DEQP_EXITCODE -ne 0 ]; then
exit $DEQP_EXITCODE
fi
sort < /tmp/unsorted-fails.txt > $RESULTS/fails.txt
comm -23 $RESULTS/fails.txt /tmp/expected-fails.txt > /tmp/new-fails.txt
if [ -s /tmp/new-fails.txt ]; then
echo "Unexpected failures:"
cat /tmp/new-fails.txt
exit 1
else
echo "No new failures"
fi

View File

@@ -0,0 +1,445 @@
dEQP-GLES2.functional.clipping.line.wide_line_clip_viewport_center
dEQP-GLES2.functional.clipping.line.wide_line_clip_viewport_corner
dEQP-GLES2.functional.clipping.point.wide_point_clip
dEQP-GLES2.functional.clipping.point.wide_point_clip_viewport_center
dEQP-GLES2.functional.clipping.point.wide_point_clip_viewport_corner
dEQP-GLES2.functional.clipping.triangle_vertex.clip_two.clip_neg_y_neg_z_and_neg_x_neg_y_pos_z
dEQP-GLES2.functional.clipping.triangle_vertex.clip_two.clip_pos_y_pos_z_and_neg_x_neg_y_neg_z
dEQP-GLES2.functional.polygon_offset.default_displacement_with_units
dEQP-GLES2.functional.polygon_offset.fixed16_displacement_with_units
dEQP-GLES2.functional.rasterization.interpolation.basic.line_loop_wide
dEQP-GLES2.functional.rasterization.interpolation.basic.line_strip_wide
dEQP-GLES2.functional.rasterization.interpolation.basic.lines_wide
dEQP-GLES2.functional.rasterization.interpolation.projected.line_loop_wide
dEQP-GLES2.functional.rasterization.interpolation.projected.line_strip_wide
dEQP-GLES2.functional.rasterization.interpolation.projected.lines_wide
dEQP-GLES2.functional.rasterization.limits.points
dEQP-GLES2.functional.rasterization.primitives.points
dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_center
dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_corner
dEQP-GLES3.functional.clipping.point.wide_point_clip
dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_center
dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_corner
dEQP-GLES3.functional.clipping.triangle_vertex.clip_two.clip_neg_y_neg_z_and_neg_x_neg_y_pos_z
dEQP-GLES3.functional.clipping.triangle_vertex.clip_two.clip_pos_y_pos_z_and_neg_x_neg_y_neg_z
dEQP-GLES3.functional.draw.random.124
dEQP-GLES3.functional.fbo.depth.depth_test_clamp.depth24_stencil8
dEQP-GLES3.functional.fbo.depth.depth_test_clamp.depth32f_stencil8
dEQP-GLES3.functional.fbo.depth.depth_test_clamp.depth_component16
dEQP-GLES3.functional.fbo.depth.depth_test_clamp.depth_component24
dEQP-GLES3.functional.fbo.depth.depth_test_clamp.depth_component32f
dEQP-GLES3.functional.fbo.depth.depth_write_clamp.depth32f_stencil8
dEQP-GLES3.functional.fbo.depth.depth_write_clamp.depth_component32f
dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_color
dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_depth
dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_depth_stencil
dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_stencil
dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_color
dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_depth
dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_depth_stencil
dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_stencil
dEQP-GLES3.functional.fbo.msaa.2_samples.depth24_stencil8
dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8
dEQP-GLES3.functional.fbo.msaa.2_samples.depth_component16
dEQP-GLES3.functional.fbo.msaa.2_samples.depth_component24
dEQP-GLES3.functional.fbo.msaa.2_samples.depth_component32f
dEQP-GLES3.functional.fbo.msaa.2_samples.r11f_g11f_b10f
dEQP-GLES3.functional.fbo.msaa.2_samples.r16f
dEQP-GLES3.functional.fbo.msaa.2_samples.r8
dEQP-GLES3.functional.fbo.msaa.2_samples.rg16f
dEQP-GLES3.functional.fbo.msaa.2_samples.rg8
dEQP-GLES3.functional.fbo.msaa.2_samples.rgb10_a2
dEQP-GLES3.functional.fbo.msaa.2_samples.rgb565
dEQP-GLES3.functional.fbo.msaa.2_samples.rgb5_a1
dEQP-GLES3.functional.fbo.msaa.2_samples.rgb8
dEQP-GLES3.functional.fbo.msaa.2_samples.rgba4
dEQP-GLES3.functional.fbo.msaa.2_samples.rgba8
dEQP-GLES3.functional.fbo.msaa.2_samples.srgb8_alpha8
dEQP-GLES3.functional.fbo.msaa.2_samples.stencil_index8
dEQP-GLES3.functional.fbo.msaa.4_samples.depth24_stencil8
dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8
dEQP-GLES3.functional.fbo.msaa.4_samples.depth_component16
dEQP-GLES3.functional.fbo.msaa.4_samples.depth_component24
dEQP-GLES3.functional.fbo.msaa.4_samples.depth_component32f
dEQP-GLES3.functional.fbo.msaa.4_samples.r11f_g11f_b10f
dEQP-GLES3.functional.fbo.msaa.4_samples.r16f
dEQP-GLES3.functional.fbo.msaa.4_samples.r8
dEQP-GLES3.functional.fbo.msaa.4_samples.rg16f
dEQP-GLES3.functional.fbo.msaa.4_samples.rg8
dEQP-GLES3.functional.fbo.msaa.4_samples.rgb10_a2
dEQP-GLES3.functional.fbo.msaa.4_samples.rgb565
dEQP-GLES3.functional.fbo.msaa.4_samples.rgb5_a1
dEQP-GLES3.functional.fbo.msaa.4_samples.rgb8
dEQP-GLES3.functional.fbo.msaa.4_samples.rgba4
dEQP-GLES3.functional.fbo.msaa.4_samples.rgba8
dEQP-GLES3.functional.fbo.msaa.4_samples.srgb8_alpha8
dEQP-GLES3.functional.fbo.msaa.4_samples.stencil_index8
dEQP-GLES3.functional.multisample.fbo_max_samples.proportionality_alpha_to_coverage
dEQP-GLES3.functional.multisample.fbo_max_samples.proportionality_sample_coverage
dEQP-GLES3.functional.multisample.fbo_max_samples.proportionality_sample_coverage_inverted
dEQP-GLES3.functional.multisample.fbo_max_samples.sample_coverage_invert
dEQP-GLES3.functional.negative_api.buffer.blit_framebuffer_multisample
dEQP-GLES3.functional.negative_api.buffer.read_pixels_fbo_format_mismatch
dEQP-GLES3.functional.polygon_offset.default_displacement_with_units
dEQP-GLES3.functional.polygon_offset.fixed16_displacement_with_units
dEQP-GLES3.functional.polygon_offset.fixed24_displacement_with_units
dEQP-GLES3.functional.polygon_offset.float32_displacement_with_units
dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.interpolation.lines_wide
dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide
dEQP-GLES3.functional.rasterization.fbo.rbo_singlesample.interpolation.lines_wide
dEQP-GLES3.functional.rasterization.fbo.rbo_singlesample.primitives.points
dEQP-GLES3.functional.rasterization.fbo.texture_2d.interpolation.lines_wide
dEQP-GLES3.functional.rasterization.fbo.texture_2d.primitives.points
dEQP-GLES3.functional.rasterization.interpolation.basic.line_loop_wide
dEQP-GLES3.functional.rasterization.interpolation.basic.line_strip_wide
dEQP-GLES3.functional.rasterization.interpolation.basic.lines_wide
dEQP-GLES3.functional.rasterization.interpolation.projected.line_loop_wide
dEQP-GLES3.functional.rasterization.interpolation.projected.line_strip_wide
dEQP-GLES3.functional.rasterization.interpolation.projected.lines_wide
dEQP-GLES3.functional.rasterization.primitives.points
dEQP-GLES3.functional.rasterizer_discard.basic.write_depth_points
dEQP-GLES3.functional.rasterizer_discard.basic.write_stencil_points
dEQP-GLES3.functional.rasterizer_discard.fbo.write_depth_points
dEQP-GLES3.functional.rasterizer_discard.fbo.write_stencil_points
dEQP-GLES3.functional.rasterizer_discard.scissor.write_depth_points
dEQP-GLES3.functional.rasterizer_discard.scissor.write_stencil_points
dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.float_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.float_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec2_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec3_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec4_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.float_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.float_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec2_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec3_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec4_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.float_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.float_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec2_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec3_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec4_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec4_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.float_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.float_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec2_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec2_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec3_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec3_mediump
dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec4_highp
dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec4_mediump
dEQP-GLES3.functional.state_query.integers.max_samples_getfloat
dEQP-GLES3.functional.state_query.integers.max_samples_getinteger64
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_clamp_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_clamp_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_clamp_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_mirror_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_mirror_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_mirror_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_clamp_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_clamp_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_clamp_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_mirror_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_mirror_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_mirror_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_clamp_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_clamp_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_clamp_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_mirror_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_mirror_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_mirror_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_clamp_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_clamp_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_clamp_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_mirror_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_mirror_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_mirror_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_clamp_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_clamp_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_clamp_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_mirror_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_mirror_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_mirror_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_clamp_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_clamp_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_mirror_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_mirror_repeat
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_repeat_mirror
dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_repeat_repeat
dEQP-GLES3.functional.texture.filtering.3d.formats.r11f_g11f_b10f_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.r11f_g11f_b10f_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.r11f_g11f_b10f_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.r11f_g11f_b10f_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.r11f_g11f_b10f_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb10_a2_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb10_a2_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb10_a2_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb10_a2_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb10_a2_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb565_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb565_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb565_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb565_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb565_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb5_a1_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb5_a1_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb5_a1_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb5_a1_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb5_a1_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb9_e5_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb9_e5_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb9_e5_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb9_e5_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgb9_e5_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba16f_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba16f_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba16f_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba16f_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba16f_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba4_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba4_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba4_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba4_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba4_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_snorm_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_snorm_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_snorm_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_snorm_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_snorm_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.srgb8_alpha8_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.srgb8_alpha8_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.srgb8_alpha8_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.srgb8_alpha8_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.srgb8_alpha8_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.srgb_r8_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.srgb_r8_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.srgb_r8_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.formats.srgb_r8_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.formats.srgb_r8_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.sizes.128x32x64_linear
dEQP-GLES3.functional.texture.filtering.3d.sizes.128x32x64_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.sizes.128x32x64_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.sizes.128x32x64_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.sizes.128x32x64_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.sizes.63x63x63_linear
dEQP-GLES3.functional.texture.filtering.3d.sizes.63x63x63_linear_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.sizes.63x63x63_linear_mipmap_nearest
dEQP-GLES3.functional.texture.filtering.3d.sizes.63x63x63_nearest_mipmap_linear
dEQP-GLES3.functional.texture.filtering.3d.sizes.63x63x63_nearest_mipmap_nearest
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_linear_clamp
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_linear_mirror
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_linear_repeat
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_linear_clamp
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_linear_mirror
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_linear_repeat
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_nearest_clamp
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_nearest_mirror
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_nearest_repeat
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_nearest_linear_repeat
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_nearest_clamp
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_nearest_mirror
dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_nearest_repeat
dEQP-GLES3.functional.texture.vertex.3d.filtering.nearest_linear_repeat
dEQP-GLES3.functional.texture.vertex.3d.filtering.nearest_mipmap_linear_linear_repeat
dEQP-GLES3.functional.texture.vertex.3d.filtering.nearest_mipmap_nearest_linear_repeat
dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_clamp_clamp
dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_clamp_mirror
dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_clamp_repeat
dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_mirror_mirror
dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_mirror_repeat
dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_repeat_mirror
dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_repeat_repeat
dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_clamp_clamp
dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_clamp_mirror
dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_clamp_repeat
dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_mirror_mirror
dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_mirror_repeat
dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_repeat_mirror
dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_repeat_repeat
dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_clamp_clamp
dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_clamp_mirror
dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_clamp_repeat
dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_mirror_clamp
dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_mirror_mirror
dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_mirror_repeat
dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_repeat_clamp
dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_repeat_mirror
dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_repeat_repeat
dEQP-GLES3.functional.texture.wrap.astc_8x8.repeat_repeat_linear_divisible
dEQP-GLES3.functional.texture.wrap.astc_8x8.repeat_repeat_linear_not_divisible
dEQP-GLES3.functional.texture.wrap.astc_8x8_srgb.repeat_repeat_linear_divisible
dEQP-GLES3.functional.texture.wrap.astc_8x8_srgb.repeat_repeat_linear_not_divisible
dEQP-GLES3.functional.vertex_arrays.single_attribute.normalize.int2_10_10_10.components4_quads1
dEQP-GLES3.functional.vertex_arrays.single_attribute.normalize.int2_10_10_10.components4_quads256

62
.gitlab-ci/meson-build.sh Executable file
View File

@@ -0,0 +1,62 @@
#!/bin/bash
set -e
set -o xtrace
# We need to control the version of llvm-config we're using, so we'll
# generate a native file to do so. This requires meson >=0.49
if test -n "$LLVM_VERSION"; then
LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
echo -e "[binaries]\nllvm-config = '`which $LLVM_CONFIG`'" > native.file
$LLVM_CONFIG --version
else
rm -f native.file
touch native.file
fi
rm -rf _build
meson _build --native-file=native.file \
${CROSS+--cross /cross_file-$CROSS.txt} \
-D prefix=`pwd`/install \
-D libdir=lib \
-D buildtype=${BUILDTYPE:-debug} \
-D build-tests=true \
-D libunwind=${UNWIND} \
${DRI_LOADERS} \
-D dri-drivers=${DRI_DRIVERS:-[]} \
${GALLIUM_ST} \
-D gallium-drivers=${GALLIUM_DRIVERS:-[]} \
-D vulkan-drivers=${VULKAN_DRIVERS:-[]} \
-D I-love-half-baked-turnips=true \
${EXTRA_OPTION}
cd _build
meson configure
ninja -j4
LC_ALL=C.UTF-8 ninja test
ninja install
cd ..
if test -n "$MESON_SHADERDB"; then
./.gitlab-ci/run-shader-db.sh;
fi
# Delete 2MB of includes from artifacts.
rm -rf install/include
# Strip the drivers in the artifacts to cut 80% of the artifacts size.
if [ -n "$CROSS" ]; then
STRIP=`sed -n -E "s/strip\s*=\s*'(.*)'/\1/p" /cross_file-$CROSS.txt`
if [ -z "$STRIP" ]; then
echo "Failed to find strip command in cross file"
exit 1
fi
else
STRIP="strip"
fi
find install -name \*.so -exec $STRIP {} \;
# Test runs don't pull down the git tree, so put the dEQP helper
# script and associated bits there.
mkdir -p artifacts/
cp -Rp .gitlab-ci/deqp* artifacts/
# cp -Rp src/freedreno/ci/expected* artifacts/

17
.gitlab-ci/run-shader-db.sh Executable file
View File

@@ -0,0 +1,17 @@
set -e
set -v
ARTIFACTSDIR=`pwd`/shader-db
mkdir -p $ARTIFACTSDIR
export DRM_SHIM_DEBUG=true
LIBDIR=`pwd`/install/lib
export LD_LIBRARY_PATH=$LIBDIR
cd /usr/local/shader-db
for driver in freedreno v3d; do
env LD_PRELOAD=$LIBDIR/lib${driver}_noop_drm_shim.so \
./run -j 4 ./shaders \
> $ARTIFACTSDIR/${driver}-shader-db.txt
done

View File

@@ -1,198 +1,62 @@
language: c
dist: xenial
os: osx
cache:
apt: true
ccache: true
env:
global:
- XORG_RELEASES=https://xorg.freedesktop.org/releases/individual
- XCB_RELEASES=https://xcb.freedesktop.org/dist
- WAYLAND_RELEASES=https://wayland.freedesktop.org/releases
- XORGMACROS_VERSION=util-macros-1.19.0
- GLPROTO_VERSION=glproto-1.4.17
- DRI2PROTO_VERSION=dri2proto-2.8
- LIBPCIACCESS_VERSION=libpciaccess-0.13.4
- LIBDRM_VERSION=libdrm-2.4.97
- XCBPROTO_VERSION=xcb-proto-1.13
- RANDRPROTO_VERSION=randrproto-1.3.0
- LIBXRANDR_VERSION=libXrandr-1.3.0
- LIBXCB_VERSION=libxcb-1.13
- LIBXSHMFENCE_VERSION=libxshmfence-1.2
- LIBVDPAU_VERSION=libvdpau-1.1
- LIBVA_VERSION=libva-1.7.0
- LIBWAYLAND_VERSION=wayland-1.15.0
- WAYLAND_PROTOCOLS_VERSION=wayland-protocols-1.8
- PKG_CONFIG_PATH=$HOME/prefix/lib/pkgconfig:$HOME/prefix/share/pkgconfig
- LD_LIBRARY_PATH="$HOME/prefix/lib:$LD_LIBRARY_PATH"
- PATH="$HOME/prefix/bin:$PATH"
- PKG_CONFIG_PATH=""
matrix:
include:
- env:
- LABEL="macOS meson"
- BUILD=meson
- DRI_LOADERS="-Dplatforms=x11"
- GALLIUM_DRIVERS=swrast
os: osx
- BUILD=meson
- env:
- BUILD=scons
before_install:
- |
if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then
HOMEBREW_NO_AUTO_UPDATE=1 brew install python3 ninja expat gettext
# Set PATH for homebrew pip3 installs
PATH="$HOME/Library/Python/3.6/bin:${PATH}"
# Set PKG_CONFIG_PATH for keg-only expat
PKG_CONFIG_PATH="/usr/local/opt/expat/lib/pkgconfig:${PKG_CONFIG_PATH}"
# Set PATH for keg-only gettext
PATH="/usr/local/opt/gettext/bin:${PATH}"
# Install xquartz for prereqs ...
XQUARTZ_VERSION="2.7.11"
wget -nv https://dl.bintray.com/xquartz/downloads/XQuartz-${XQUARTZ_VERSION}.dmg
hdiutil attach XQuartz-${XQUARTZ_VERSION}.dmg
sudo installer -pkg /Volumes/XQuartz-${XQUARTZ_VERSION}/XQuartz.pkg -target /
hdiutil detach /Volumes/XQuartz-${XQUARTZ_VERSION}
# ... and set paths
PATH="/opt/X11/bin:${PATH}"
PKG_CONFIG_PATH="/opt/X11/share/pkgconfig:/opt/X11/lib/pkgconfig:${PKG_CONFIG_PATH}"
ACLOCAL="aclocal -I /opt/X11/share/aclocal -I /usr/local/share/aclocal"
- HOMEBREW_NO_AUTO_UPDATE=1 brew install expat gettext
- if test "x$BUILD" = xmeson; then
HOMEBREW_NO_AUTO_UPDATE=1 brew install python3 ninja;
fi
- if test "x$BUILD" = xscons; then
HOMEBREW_NO_AUTO_UPDATE=1 brew install python2 scons;
fi
# Set PATH for homebrew pip3 installs
- PATH="$HOME/Library/Python/3.6/bin:${PATH}"
# Set PKG_CONFIG_PATH for keg-only expat
- PKG_CONFIG_PATH="/usr/local/opt/expat/lib/pkgconfig:${PKG_CONFIG_PATH}"
# Set PATH for keg-only gettext
- PATH="/usr/local/opt/gettext/bin:${PATH}"
# Install xquartz for prereqs ...
- XQUARTZ_VERSION="2.7.11"
- wget -nv https://dl.bintray.com/xquartz/downloads/XQuartz-${XQUARTZ_VERSION}.dmg
- hdiutil attach XQuartz-${XQUARTZ_VERSION}.dmg
- sudo installer -pkg /Volumes/XQuartz-${XQUARTZ_VERSION}/XQuartz.pkg -target /
- hdiutil detach /Volumes/XQuartz-${XQUARTZ_VERSION}
# ... and set paths
- PKG_CONFIG_PATH="/opt/X11/share/pkgconfig:/opt/X11/lib/pkgconfig:${PKG_CONFIG_PATH}"
install:
# Install a more modern meson from pip, since the version in the
# ubuntu repos is often quite old.
- if test "x$BUILD" = xmeson; then
pip3 install --user meson;
pip3 install --user mako;
fi
# Install dependencies where we require specific versions (or where
# disallowed by Travis CI's package whitelisting).
- |
if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
wget $XORG_RELEASES/util/$XORGMACROS_VERSION.tar.bz2
tar -jxvf $XORGMACROS_VERSION.tar.bz2
(cd $XORGMACROS_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XORG_RELEASES/proto/$GLPROTO_VERSION.tar.bz2
tar -jxvf $GLPROTO_VERSION.tar.bz2
(cd $GLPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XORG_RELEASES/proto/$DRI2PROTO_VERSION.tar.bz2
tar -jxvf $DRI2PROTO_VERSION.tar.bz2
(cd $DRI2PROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XCB_RELEASES/$XCBPROTO_VERSION.tar.bz2
tar -jxvf $XCBPROTO_VERSION.tar.bz2
(cd $XCBPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XCB_RELEASES/$LIBXCB_VERSION.tar.bz2
tar -jxvf $LIBXCB_VERSION.tar.bz2
(cd $LIBXCB_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XORG_RELEASES/lib/$LIBPCIACCESS_VERSION.tar.bz2
tar -jxvf $LIBPCIACCESS_VERSION.tar.bz2
(cd $LIBPCIACCESS_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget https://dri.freedesktop.org/libdrm/$LIBDRM_VERSION.tar.bz2
tar -jxvf $LIBDRM_VERSION.tar.bz2
(cd $LIBDRM_VERSION && ./configure --prefix=$HOME/prefix --enable-vc4 --enable-freedreno --enable-etnaviv-experimental-api && make install)
wget $XORG_RELEASES/proto/$RANDRPROTO_VERSION.tar.bz2
tar -jxvf $RANDRPROTO_VERSION.tar.bz2
(cd $RANDRPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XORG_RELEASES/lib/$LIBXRANDR_VERSION.tar.bz2
tar -jxvf $LIBXRANDR_VERSION.tar.bz2
(cd $LIBXRANDR_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XORG_RELEASES/lib/$LIBXSHMFENCE_VERSION.tar.bz2
tar -jxvf $LIBXSHMFENCE_VERSION.tar.bz2
(cd $LIBXSHMFENCE_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget https://people.freedesktop.org/~aplattner/vdpau/$LIBVDPAU_VERSION.tar.bz2
tar -jxvf $LIBVDPAU_VERSION.tar.bz2
(cd $LIBVDPAU_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget https://www.freedesktop.org/software/vaapi/releases/libva/$LIBVA_VERSION.tar.bz2
tar -jxvf $LIBVA_VERSION.tar.bz2
(cd $LIBVA_VERSION && ./configure --prefix=$HOME/prefix --disable-wayland --disable-dummy-driver && make install)
wget $WAYLAND_RELEASES/$LIBWAYLAND_VERSION.tar.xz
tar -axvf $LIBWAYLAND_VERSION.tar.xz
(cd $LIBWAYLAND_VERSION && ./configure --prefix=$HOME/prefix --enable-libraries --without-host-scanner --disable-documentation --disable-dtd-validation && make install)
wget $WAYLAND_RELEASES/$WAYLAND_PROTOCOLS_VERSION.tar.xz
tar -axvf $WAYLAND_PROTOCOLS_VERSION.tar.xz
(cd $WAYLAND_PROTOCOLS_VERSION && ./configure --prefix=$HOME/prefix && make install)
# Meson requires ninja >= 1.6, but xenial has 1.3.x
wget https://github.com/ninja-build/ninja/releases/download/v1.6.0/ninja-linux.zip
unzip ninja-linux.zip
mv ninja $HOME/prefix/bin/
# Generate this header since one is missing on the Travis instance
mkdir -p linux
printf "%s\n" \
"#ifndef _LINUX_MEMFD_H" \
"#define _LINUX_MEMFD_H" \
"" \
"#define MFD_CLOEXEC 0x0001U" \
"#define MFD_ALLOW_SEALING 0x0002U" \
"" \
"#endif /* _LINUX_MEMFD_H */" > linux/memfd.h
# Generate this header, including the missing SYS_memfd_create
# macro, which is not provided by the header in the Travis
# instance
mkdir -p sys
printf "%s\n" \
"#ifndef _SYSCALL_H" \
"#define _SYSCALL_H 1" \
"" \
"#include <asm/unistd.h>" \
"" \
"#ifndef _LIBC" \
"# include <bits/syscall.h>" \
"#endif" \
"" \
"#ifndef __NR_memfd_create" \
"# define __NR_memfd_create 319 /* Taken from <asm/unistd_64.h> */" \
"#endif" \
"" \
"#ifndef SYS_memfd_create" \
"# define SYS_memfd_create __NR_memfd_create" \
"#endif" \
"" \
"#endif" > sys/syscall.h
- if test "x$BUILD" = xscons; then
pip2 install --user mako;
fi
script:
if test "x$BUILD" = xmeson; then
if test -n "$LLVM_CONFIG"; then
# We need to control the version of llvm-config we're using, so we'll
# generate a native file to do so. This requires meson >=0.49
#
echo -e "[binaries]\nllvm-config = '`which $LLVM_CONFIG`'" > native.file
$LLVM_CONFIG --version
else
: > native.file
- if test "x$BUILD" = xmeson; then
meson _build -Dbuild-tests=true;
ninja -C _build || travis_terminate 1;
ninja -C _build test || travis_terminate 1;
fi
- if test "x$BUILD" = xscons; then
scons || travis_terminate 1;
scons check || travis_terminate 1;
fi
export CFLAGS="$CFLAGS -isystem`pwd`"
meson _build \
--native-file=native.file \
-Dbuild-tests=true \
${DRI_LOADERS} \
-Ddri-drivers=${DRI_DRIVERS:-[]} \
-Dgallium-drivers=${GALLIUM_DRIVERS:-[]} \
-Dvulkan-drivers=${VULKAN_DRIVERS:-[]}
meson configure _build
ninja -C _build
ninja -C _build test
fi

View File

@@ -39,7 +39,7 @@ LOCAL_CFLAGS += \
-Wno-initializer-overrides \
-Wno-mismatched-tags \
-DPACKAGE_VERSION=\"$(MESA_VERSION)\" \
-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\"
-DPACKAGE_BUGREPORT=\"https://gitlab.freedesktop.org/mesa/mesa/issues\"
# XXX: The following __STDC_*_MACROS defines should not be needed.
# It's likely due to a bug elsewhere, but let's temporarily add them
@@ -98,6 +98,11 @@ ifeq ($(filter 5 6 7 8 9, $(MESA_ANDROID_MAJOR_VERSION)),)
LOCAL_CFLAGS += -DHAVE_TIMESPEC_GET
endif
# Android's libc began supporting shm in Oreo
ifeq ($(shell test $(PLATFORM_SDK_VERSION) -ge 26 && echo true),true)
LOCAL_CFLAGS += -DHAVE_SYS_SHM_H
endif
ifeq ($(strip $(MESA_ENABLE_ASM)),true)
ifeq ($(TARGET_ARCH),x86)
LOCAL_CFLAGS += \

View File

@@ -110,6 +110,7 @@ endef
# add subdirectories
SUBDIRS := \
src/freedreno \
src/gbm \
src/loader \
src/mapi \
@@ -121,7 +122,8 @@ SUBDIRS := \
src/broadcom \
src/intel \
src/mesa/drivers/dri \
src/vulkan
src/vulkan \
src/panfrost \
INC_DIRS := $(call all-named-subdir-makefiles,$(SUBDIRS))
INC_DIRS += $(call all-named-subdir-makefiles,src/gallium)

View File

@@ -73,7 +73,7 @@ with open("VERSION") as f:
mesa_version = f.read().strip()
env.Append(CPPDEFINES = [
('PACKAGE_VERSION', '\\"%s\\"' % mesa_version),
('PACKAGE_BUGREPORT', '\\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\\"'),
('PACKAGE_BUGREPORT', '\\"https://gitlab.freedesktop.org/mesa/mesa/issues\\"'),
])
# Includes

View File

@@ -1 +1 @@
19.1.0-devel
19.2.8

51
bin/.cherry-ignore Normal file
View File

@@ -0,0 +1,51 @@
# warnings that are not useful
da5ebe30105f70e3520ce3ae145793b755552569
6b8cb087568699ca9a6e9e8b7bf49179e622b59f
# Jason doesn't want this applied to 19.2 (it's a revert)
d15fe8ca8262d502435c4f83985ac414f950bc5f
# This doesn't apply to 19.2
f833b4cada07b746a10ffa4d93fcd821920c3cb1
d2db43fcad6a2ea2070ff5f7884411f4b7d3925c
66f2aa6ccd0b226eebe2c1a46281160b0a54d522
# The author requested that this not be applied to 19.2
dcc0e23438f3e5929c2ef74d57e8207be25ecb41
# This doesn't apply cleanly, and no one really cares about this file on stable
# branches anyway.
bcd9224728dcb8d8fe4bcddc4bd9b2c36fcfe9dd
# De-nominated by its author due to alternate fix not being backported
43041627445540afda1a05d11861935963660344
# This is immediately reverted, so just don't apply
19546108d3dd5541a189e36df4ea83b3f519e48f
# The authors requested these not be applied to 19.2
869e32593a9096b845dd6106f8f86e1c41fac968
a2c3c65a31de90fdb55f76f2894860dfbafe2043
bb0c5c487e63e88acbb792f092dd8f392bad8540
937b9055698be0dfdb7d2e0673a989e2ecc05912
21376cffb37018160ad3eef38b5a640ba1675a4f
# This is reverted shortly after it was landed
4432a2d14d80081d062f7939a950d65ea3a16eed
# These aren't relevant for 19.2
1a05811936dd8d0c3a367c6f00629624ef39d537
911a8261419f48dcd756f78832fa5a5f4c5b8d93
# This was manually backported
2afeed301010917c4eae55dcd2544f9d329df934
4b392ced2d744fccffe95490ff57e6b41033c266
# This is not being backported to 19.2 due to causing build regressions for
# downstream projects
eaf43966027cf9654e91ca57aecc8f5a65b58f49
# Invalid sha warnings
023282a4f667695ea1dbbe9fbe1cd3a9d550a426
2fca325ea65f068043d4c18c9cd0fe7f25bde8f7
7564c5fc6d79a2ddec49a19f67183fb3be799fe5

View File

@@ -1,35 +0,0 @@
#!/bin/sh
# This script is used to generate the list of fixed bugs that
# appears in the release notes files, with HTML formatting.
#
# Note: This script could take a while until all details have
# been fetched from bugzilla.
#
# Usage examples:
#
# $ bin/bugzilla_mesa.sh mesa-9.0.2..mesa-9.0.3
# $ bin/bugzilla_mesa.sh mesa-9.0.2..mesa-9.0.3 > bugfixes
# $ bin/bugzilla_mesa.sh mesa-9.0.2..mesa-9.0.3 | tee bugfixes
# regex pattern: trim before bug number
trim_before='s/.*show_bug.cgi?id=\([0-9]*\).*/\1/'
# regex pattern: reconstruct the url
use_after='s,^,https://bugs.freedesktop.org/show_bug.cgi?id=,'
echo "<ul>"
echo ""
# extract fdo urls from commit log
git log --pretty=medium $* | grep 'bugs.freedesktop.org/show_bug' | sed -e $trim_before | sort -n -u | sed -e $use_after |\
while read url
do
id=$(echo $url | cut -d'=' -f2)
summary=$(wget --quiet -O - $url | grep -e '<title>.*</title>' | sed -e 's/ *<title>[0-9]\+ &ndash; \(.*\)<\/title>/\1/')
echo "<li><a href=\"$url\">Bug $id</a> - $summary</li>"
echo ""
done
echo "</ul>"

272
bin/gen_release_notes.py Executable file
View File

@@ -0,0 +1,272 @@
#!/usr/bin/env python3
# Copyright © 2019 Intel Corporation
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
"""Generates release notes for a given version of mesa."""
import asyncio
import datetime
import os
import pathlib
import sys
import textwrap
import typing
import urllib.parse
import aiohttp
from mako.template import Template
from mako import exceptions
CURRENT_GL_VERSION = '4.5'
CURRENT_VK_VERSION = '1.1'
TEMPLATE = Template(textwrap.dedent("""\
<%!
import html
%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa ${next_version} Release Notes / ${today}</h1>
<p>
%if not bugfix:
Mesa ${next_version} is a new development release. People who are concerned
with stability and reliability should stick with a previous release or
wait for Mesa ${version[:-1]}1.
%else:
Mesa ${next_version} is a bug fix release which fixes bugs found since the ${version} release.
%endif
</p>
<p>
Mesa ${next_version} implements the OpenGL ${gl_version} API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL ${gl_version}. OpenGL
${gl_version} is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<p>
Mesa ${next_version} implements the Vulkan ${vk_version} API, but the version reported by
the apiVersion property of the VkPhysicalDeviceProperties struct
depends on the particular driver being used.
</p>
<h2>SHA256 checksum</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<ul>
%for f in features:
<li>${html.escape(f)}</li>
%endfor
</ul>
<h2>Bug fixes</h2>
<ul>
%for b in bugs:
<li>${html.escape(b)}</li>
%endfor
</ul>
<h2>Changes</h2>
<ul>
%for c, author in changes:
%if author:
<p>${html.escape(c)}</p>
%else:
<li>${html.escape(c)}</li>
%endif
%endfor
</ul>
</div>
</body>
</html>
"""))
async def gather_commits(version: str) -> str:
p = await asyncio.create_subprocess_exec(
'git', 'log', f'mesa-{version}..', '--grep', r'Closes: \(https\|#\).*',
stdout=asyncio.subprocess.PIPE)
out, _ = await p.communicate()
assert p.returncode == 0, f"git log didn't work: {version}"
return out.decode().strip()
async def gather_bugs(version: str) -> typing.List[str]:
commits = await gather_commits(version)
issues: typing.List[str] = []
for commit in commits.split('\n'):
sha, message = commit.split(maxsplit=1)
p = await asyncio.create_subprocess_exec(
'git', 'log', '--max-count', '1', r'--format=%b', sha,
stdout=asyncio.subprocess.PIPE)
_out, _ = await p.communicate()
out = _out.decode().split('\n')
for line in reversed(out):
if line.startswith('Closes:'):
bug = line.lstrip('Closes:').strip()
break
else:
raise Exception('No closes found?')
if bug.startswith('h'):
# This means we have a bug in the form "Closes: https://..."
issues.append(os.path.basename(urllib.parse.urlparse(bug).path))
else:
issues.append(bug.lstrip('#'))
loop = asyncio.get_event_loop()
async with aiohttp.ClientSession(loop=loop) as session:
results = await asyncio.gather(*[get_bug(session, i) for i in issues])
typing.cast(typing.Tuple[str, ...], results)
return list(results)
async def get_bug(session: aiohttp.ClientSession, bug_id: str) -> str:
"""Query gitlab to get the name of the issue that was closed."""
# Mesa's gitlab id is 176,
url = 'https://gitlab.freedesktop.org/api/v4/projects/176/issues'
params = {'iids[]': bug_id}
async with session.get(url, params=params) as response:
content = await response.json()
return content[0]['title']
async def get_shortlog(version: str) -> str:
"""Call git shortlog."""
p = await asyncio.create_subprocess_exec('git', 'shortlog', f'mesa-{version}..',
stdout=asyncio.subprocess.PIPE)
out, _ = await p.communicate()
assert p.returncode == 0, 'error getting shortlog'
assert out is not None, 'just for mypy'
return out.decode()
def walk_shortlog(log: str) -> typing.Generator[typing.Tuple[str, bool], None, None]:
for l in log.split('\n'):
if l.startswith(' '): # this means we have a patch description
yield l, False
else:
yield l, True
def calculate_next_version(version: str, is_point: bool) -> str:
"""Calculate the version about to be released."""
if '-' in version:
version = version.split('-')[0]
if is_point:
base = version.split('.')
base[2] = str(int(base[2]) + 1)
return '.'.join(base)
return version
def calculate_previous_version(version: str, is_point: bool) -> str:
"""Calculate the previous version to compare to.
In the case of -rc to final that verison is the previous .0 release,
(19.3.0 in the case of 20.0.0, for example). for point releases that is
the last point release. This value will be the same as the input value
for a point release, but different for a major release.
"""
if '-' in version:
version = version.split('-')[0]
if is_point:
return version
base = version.split('.')
if base[1] == '0':
base[0] = str(int(base[0]) - 1)
base[1] = '3'
else:
base[1] = str(int(base[1]) - 1)
return '.'.join(base)
def get_features(is_point_release: bool) -> typing.Generator[str, None, None]:
p = pathlib.Path(__file__).parent.parent / 'docs' / 'relnotes' / 'new_features.txt'
if p.exists():
if is_point_release:
print("WARNING: new features being introduced in a point release", file=sys.stderr)
with p.open('rt') as f:
for line in f:
yield line
else:
yield "None"
async def main() -> None:
v = pathlib.Path(__file__).parent.parent / 'VERSION'
with v.open('rt') as f:
raw_version = f.read().strip()
is_point_release = '-rc' not in raw_version
assert '-devel' not in raw_version, 'Do not run this script on -devel'
version = raw_version.split('-')[0]
previous_version = calculate_previous_version(version, is_point_release)
next_version = calculate_next_version(version, is_point_release)
shortlog, bugs = await asyncio.gather(
get_shortlog(previous_version),
gather_bugs(previous_version),
)
final = pathlib.Path(__file__).parent.parent / 'docs' / 'relnotes' / f'{next_version}.html'
with final.open('wt') as f:
try:
f.write(TEMPLATE.render(
bugfix=is_point_release,
bugs=bugs,
changes=walk_shortlog(shortlog),
features=get_features(is_point_release),
gl_version=CURRENT_GL_VERSION,
next_version=next_version,
today=datetime.date.today(),
version=previous_version,
vk_version=CURRENT_VK_VERSION,
))
except:
print(exceptions.text_error_template().render())
if __name__ == "__main__":
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

View File

@@ -0,0 +1,62 @@
# Copyright © 2019 Intel Corporation
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
from unittest import mock
import pytest
from .gen_release_notes import *
@pytest.mark.parametrize(
'current, is_point, expected',
[
('19.2.0', True, '19.2.1'),
('19.3.6', True, '19.3.7'),
('20.0.0-rc4', False, '20.0.0'),
])
def test_next_version(current: str, is_point: bool, expected: str) -> None:
assert calculate_next_version(current, is_point) == expected
@pytest.mark.parametrize(
'current, is_point, expected',
[
('19.3.6', True, '19.3.6'),
('20.0.0-rc4', False, '19.3.0'),
])
def test_previous_version(current: str, is_point: bool, expected: str) -> None:
assert calculate_previous_version(current, is_point) == expected
@pytest.mark.asyncio
async def test_get_shortlog():
# Certainly not perfect, but it's something
version = '19.2.0'
out = await get_shortlog(version)
assert out
@pytest.mark.asyncio
async def test_gather_commits():
# Certainly not perfect, but it's something
version = '19.2.0'
out = await gather_commits(version)
assert out

View File

@@ -32,7 +32,7 @@ is_sha_nomination()
{
fixes=`git show --pretty=medium -s $1 | tr -d "\n" | \
sed -e 's/'"$2"'/\nfixes:/Ig' | \
grep -Eo 'fixes:[a-f0-9]{8,40}'`
grep -Eo 'fixes:[a-f0-9]{4,40}'`
fixes_count=`echo "$fixes" | grep "fixes:" | wc -l`
if test $fixes_count -eq 0; then
@@ -92,7 +92,7 @@ is_revert_nomination()
}
# Use the last branchpoint as our limit for the search
latest_branchpoint=`git merge-base origin/master HEAD`
latest_branchpoint=`git merge-base upstream/master HEAD`
# List all the commits between day 1 and the branch point...
git log --reverse --pretty=%H $latest_branchpoint > already_landed
@@ -103,7 +103,7 @@ git log --reverse --pretty=medium --grep="cherry picked from commit" $latest_bra
sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//' > already_picked
# Grep for potential candidates
git log --reverse --pretty=%H -i --grep='^CC:.*mesa-stable\|^CC:.*mesa-dev\|\<fixes\>\|\<broken by\>\|This reverts commit' $latest_branchpoint..origin/master |\
git log --reverse --pretty=%H -i --grep='^CC:.*mesa-stable\|^CC:.*mesa-dev\|\<fixes\>\|\<broken by\>\|This reverts commit' $latest_branchpoint..upstream/master |\
while read sha
do
# Check to see whether the patch is on the ignore list.
@@ -143,7 +143,7 @@ do
esac
printf "[ %8s ] " "$tag"
git --no-pager show --no-patch --oneline $sha
git --no-pager show --no-patch --pretty=oneline $sha
done
rm -f already_picked

View File

@@ -19,3 +19,4 @@
# SOFTWARE.
git_sha1_gen_py = files('git_sha1_gen.py')
symbols_check = find_program('symbols-check.py')

117
bin/post_version.py Executable file
View File

@@ -0,0 +1,117 @@
#!/usr/bin/env python3
# Copyright © 2019 Intel Corporation
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
"""Update the main page, release notes, and calendar."""
import argparse
import calendar
import datetime
import pathlib
from lxml import (
etree,
html,
)
def calculate_previous_version(version: str, is_point: bool) -> str:
"""Calculate the previous version to compare to.
In the case of -rc to final that verison is the previous .0 release,
(19.3.0 in the case of 20.0.0, for example). for point releases that is
the last point release. This value will be the same as the input value
for a poiont release, but different for a major release.
"""
if '-' in version:
version = version.split('-')[0]
if is_point:
return version
base = version.split('.')
if base[1] == '0':
base[0] = str(int(base[0]) - 1)
base[1] = '3'
else:
base[1] = str(int(base[1]) - 1)
return '.'.join(base)
def is_point_release(version: str) -> bool:
return not version.endswith('.0')
def update_index(is_point: bool, version: str, previous_version: str) -> None:
p = pathlib.Path(__file__).parent.parent / 'docs' / 'index.html'
with p.open('rt') as f:
tree = html.parse(f)
news = tree.xpath('.//h1')[0]
date = datetime.date.today()
month = calendar.month_name[date.month]
header = etree.Element('h2')
header.text = f"{month} {date.day}, {date.year}"
body = etree.Element('p')
a = etree.SubElement(
body, 'a', attrib={'href': f'relnotes/{previous_version}.html'})
a.text = f"Mesa {previous_version}"
if is_point:
a.tail = " is released. This is a bug fix release."
else:
a.tail = (" is released. This is a new development release. "
"See the release notes for mor information about this release.")
root = news.getparent()
index = root.index(news) + 1
root.insert(index, body)
root.insert(index, header)
tree.write(p.as_posix(), method='html')
def update_release_notes(previous_version: str) -> None:
p = pathlib.Path(__file__).parent.parent / 'docs' / 'relnotes.html'
with p.open('rt') as f:
tree = html.parse(f)
li = etree.Element('li')
a = etree.SubElement(li, 'a', href=f'relnotes/{previous_version}.html')
a.text = f'{previous_version} release notes'
ul = tree.xpath('.//ul')[0]
ul.insert(0, li)
tree.write(p.as_posix(), method='html')
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument('version', help="The released version.")
args = parser.parse_args()
is_point = is_point_release(args.version)
previous_version = calculate_previous_version(args.version, is_point)
update_index(is_point, args.version, previous_version)
update_release_notes(previous_version)
if __name__ == "__main__":
main()

View File

@@ -1,29 +0,0 @@
#!/bin/sh
# This script is used to generate the list of changes that
# appears in the release notes files, with HTML formatting.
#
# Usage examples:
#
# $ bin/shortlog_mesa.sh mesa-9.0.2..mesa-9.0.3
# $ bin/shortlog_mesa.sh mesa-9.0.2..mesa-9.0.3 > changes
# $ bin/shortlog_mesa.sh mesa-9.0.2..mesa-9.0.3 | tee changes
in_log=0
git shortlog $* | while read l
do
if [ $in_log -eq 0 ]; then
echo '<p>'$l'</p>'
echo '<ul>'
in_log=1
elif echo "$l" | egrep -q '^$' ; then
echo '</ul>'
echo
in_log=0
else
mesg=$(echo $l | sed 's/ (cherry picked from commit [0-9a-f]\+)//;s/\&/&amp;/g;s/</\&lt;/g;s/>/\&gt;/g')
echo ' <li>'${mesg}'</li>'
fi
done

130
bin/symbols-check.py Normal file
View File

@@ -0,0 +1,130 @@
#!/usr/bin/env python
import argparse
import os
import platform
import subprocess
# This list contains symbols that _might_ be exported for some platforms
PLATFORM_SYMBOLS = [
'__bss_end__',
'__bss_start__',
'__bss_start',
'__end__',
'_bss_end__',
'_edata',
'_end',
'_fini',
'_init',
]
def get_symbols(nm, lib):
'''
List all the (non platform-specific) symbols exported by the library
'''
symbols = []
platform_name = platform.system()
output = subprocess.check_output([nm, '-gP', lib],
stderr=open(os.devnull, 'w')).decode("ascii")
for line in output.splitlines():
fields = line.split()
if len(fields) == 2 or fields[1] == 'U':
continue
symbol_name = fields[0]
if platform_name == 'Linux':
if symbol_name in PLATFORM_SYMBOLS:
continue
elif platform_name == 'Darwin':
assert symbol_name[0] == '_'
symbol_name = symbol_name[1:]
symbols.append(symbol_name)
return symbols
def main():
parser = argparse.ArgumentParser()
parser.add_argument('--symbols-file',
action='store',
required=True,
help='path to file containing symbols')
parser.add_argument('--lib',
action='store',
required=True,
help='path to library')
parser.add_argument('--nm',
action='store',
required=True,
help='path to binary (or name in $PATH)')
args = parser.parse_args()
try:
lib_symbols = get_symbols(args.nm, args.lib)
except:
# We can't run this test, but we haven't technically failed it either
# Return the GNU "skip" error code
exit(77)
mandatory_symbols = []
optional_symbols = []
with open(args.symbols_file) as symbols_file:
qualifier_optional = '(optional)'
for line in symbols_file.readlines():
# Strip comments
line = line.split('#')[0]
line = line.strip()
if not line:
continue
# Line format:
# [qualifier] symbol
qualifier = None
symbol = None
fields = line.split()
if len(fields) == 1:
symbol = fields[0]
elif len(fields) == 2:
qualifier = fields[0]
symbol = fields[1]
else:
print(args.symbols_file + ': invalid format: ' + line)
exit(1)
# The only supported qualifier is 'optional', which means the
# symbol doesn't have to be exported by the library
if qualifier and not qualifier == qualifier_optional:
print(args.symbols_file + ': invalid qualifier: ' + qualifier)
exit(1)
if qualifier == qualifier_optional:
optional_symbols.append(symbol)
else:
mandatory_symbols.append(symbol)
unknown_symbols = []
for symbol in lib_symbols:
if symbol in mandatory_symbols:
continue
if symbol in optional_symbols:
continue
unknown_symbols.append(symbol)
missing_symbols = [
sym for sym in mandatory_symbols if sym not in lib_symbols
]
for symbol in unknown_symbols:
print(args.lib + ': unknown symbol exported: ' + symbol)
for symbol in missing_symbols:
print(args.lib + ': missing symbol: ' + symbol)
if unknown_symbols or missing_symbols:
exit(1)
exit(0)
if __name__ == '__main__':
main()

View File

@@ -17,6 +17,9 @@ import SCons.Script.SConscript
host_platform = _platform.system().lower()
if host_platform.startswith('cygwin'):
host_platform = 'cygwin'
# MSYS2 default platform selection.
if host_platform.startswith('mingw'):
host_platform = 'windows'
# Search sys.argv[] for a "platform=foo" argument since we don't have
# an 'env' variable at this point.
@@ -49,9 +52,18 @@ if 'PROCESSOR_ARCHITECTURE' in os.environ:
else:
host_machine = _platform.machine()
host_machine = _machine_map.get(host_machine, 'generic')
# MSYS2 default machine selection.
if _platform.system().lower().startswith('mingw') and 'MSYSTEM' in os.environ:
if os.environ['MSYSTEM'] == 'MINGW32':
host_machine = 'x86'
if os.environ['MSYSTEM'] == 'MINGW64':
host_machine = 'x86_64'
default_machine = host_machine
default_toolchain = 'default'
# MSYS2 default toolchain selection.
if _platform.system().lower().startswith('mingw'):
default_toolchain = 'mingw'
if target_platform == 'windows' and host_platform != 'windows':
default_machine = 'x86'

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -48,16 +48,16 @@ start-up because of an extension string buffer-overflow problem.
<p>
The problem is a modern OpenGL driver will return a very long string
for the glGetString(GL_EXTENSIONS) query and if the application
for the <code>glGetString(GL_EXTENSIONS)</code> query and if the application
naively copies the string into a fixed-size buffer it can overflow the
buffer and crash the application.
</p>
<p>
The work-around is to set the MESA_EXTENSION_MAX_YEAR environment variable
to the approximate release year of the game.
This will cause the glGetString(GL_EXTENSIONS) query to only report extensions
older than the given year.
The work-around is to set the <code>MESA_EXTENSION_MAX_YEAR</code>
environment variable to the approximate release year of the game.
This will cause the <code>glGetString(GL_EXTENSIONS)</code> query to only report
extensions older than the given year.
</p>
<p>

View File

@@ -2,19 +2,19 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Bug Reporting</title>
<title>Report a Bug</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Report a bug</h1>
<h1>Report a Bug</h1>
<p>
The Mesa bug database is hosted on
@@ -24,8 +24,8 @@ The old bug database on SourceForge is no longer used.
<p>
To file a Mesa bug, go to
<a href="https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa">
Bugzilla on freedesktop.org</a>
<a href="https://gitlab.freedesktop.org/mesa/mesa/issues">
GitLab on freedesktop.org</a>
</p>
<p>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -48,19 +48,19 @@ For example:
}
</pre>
<li>Put a space before/after operators. For example, <tt>a = b + c;</tt>
and not <tt>a=b+c;</tt>
<li>Put a space before/after operators. For example, <code>a = b + c;</code>
and not <code>a=b+c;</code>
<li>This GNU indent command generally does the right thing for formatting:
<pre>
indent -br -i3 -npcs --no-tabs infile.c -o outfile.c
</pre>
<li>Use comments wherever you think it would be helpful for other developers.
<li>
<p>Use comments wherever you think it would be helpful for other developers.
Several specific cases and style examples follow. Note that we roughly
follow <a href="https://www.stack.nl/~dimitri/doxygen/">Doxygen</a> conventions.
<br>
<br>
follow <a href="http://www.doxygen.nl">Doxygen</a> conventions.
</p>
Single-line comments:
<pre>
/* null-out pointer to prevent dangling reference below */
@@ -120,19 +120,21 @@ the opening brace goes on the next line by itself (see above.)
_mesa_foo_bar() - an internal non-static Mesa function
</pre>
<li>Constants, macros and enum names are ALL_UPPERCASE, with _ between
words.
<li>Mesa usually uses camel case for local variables (Ex: "localVarname")
while gallium typically uses underscores (Ex: "local_var_name").
<li>Constants, macros and enum names are <code>ALL_UPPERCASE</code>, with _
between words.
<li>Mesa usually uses camel case for local variables (Ex:
<code>localVarname</code>) while gallium typically uses underscores (Ex:
<code>local_var_name</code>).
<li>Global variables are almost never used because Mesa should be thread-safe.
<li>Booleans. Places that are not directly visible to the GL API
should prefer the use of <tt>bool</tt>, <tt>true</tt>, and
<tt>false</tt> over <tt>GLboolean</tt>, <tt>GL_TRUE</tt>, and
<tt>GL_FALSE</tt>. In C code, this may mean that
<tt>#include &lt;stdbool.h&gt;</tt> needs to be added. The
<tt>try_emit_</tt>* methods in src/mesa/program/ir_to_mesa.cpp and
src/mesa/state_tracker/st_glsl_to_tgsi.cpp can serve as examples.
should prefer the use of <code>bool</code>, <code>true</code>, and
<code>false</code> over <code>GLboolean</code>, <code>GL_TRUE</code>, and
<code>GL_FALSE</code>. In C code, this may mean that
<code>#include &lt;stdbool.h&gt;</code> needs to be added. The
<code>try_emit_*</code> methods in <code>src/mesa/program/ir_to_mesa.cpp</code>
and <code>src/mesa/state_tracker/st_glsl_to_tgsi.cpp</code> can serve as
examples.
</ul>

View File

@@ -2,19 +2,19 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Conformance</title>
<title>Conformance Testing</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Conformance</h1>
<h1>Conformance Testing</h1>
<p>
The SGI OpenGL conformance tests verify correct operation of OpenGL

View File

@@ -33,18 +33,17 @@
<li><a href="index.html" target="_parent">News</a>
<li><a href="developers.html" target="_parent">Developers</a>
<li><a href="systems.html" target="_parent">Platforms and Drivers</a>
<li><a href="license.html" target="_parent">License &amp; Copyright</a>
<li><a href="faq.html" target="_parent">FAQ</a>
<li><a href="license.html" target="_parent">License and Copyright</a>
<li><a href="faq.html" target="_parent">Frequently Asked Questions</a>
<li><a href="relnotes.html" target="_parent">Release Notes</a>
<li><a href="thanks.html" target="_parent">Acknowledgements</a>
<li><a href="conform.html" target="_parent">Conformance Testing</a>
<li>more docs below...
</ul>
<h2>Download / Install</h2>
<h2>Download and Install</h2>
<ul>
<li><a href="download.html" target="_parent">Downloading / Unpacking</a>
<li><a href="install.html" target="_parent">Compiling / Installing</a>
<li><a href="download.html" target="_parent">Downloading and Unpacking</a>
<li><a href="install.html" target="_parent">Compiling and Installing</a>
<ul>
<li><a href="meson.html" target="_parent">Meson</a></li>
</ul>
@@ -66,13 +65,13 @@
<li><a href="egl.html" target="_parent">EGL</a>
<li><a href="opengles.html" target="_parent">OpenGL ES</a>
<li><a href="envvars.html" target="_parent">Environment Variables</a>
<li><a href="osmesa.html" target="_parent">Off-Screen Rendering</a>
<li><a href="osmesa.html" target="_parent">Off-screen Rendering</a>
<li><a href="debugging.html" target="_parent">Debugging Tips</a>
<li><a href="perf.html" target="_parent">Performance Tips</a>
<li><a href="extensions.html" target="_parent">Mesa Extensions</a>
<li><a href="llvmpipe.html" target="_parent">Gallium llvmpipe driver</a>
<li><a href="vmware-guest.html" target="_parent">VMware SVGA3D guest driver</a>
<li><a href="postprocess.html" target="_parent">Gallium post-processing</a>
<li><a href="llvmpipe.html" target="_parent">Gallium LLVMpipe Driver</a>
<li><a href="vmware-guest.html" target="_parent">VMware SVGA3D Guest Driver</a>
<li><a href="postprocess.html" target="_parent">Gallium Post-processing</a>
<li><a href="application-issues.html" target="_parent">Application Issues</a>
<li><a href="viewperf.html" target="_parent">Viewperf Issues</a>
</ul>
@@ -85,24 +84,24 @@
<li><a href="helpwanted.html" target="_parent">Help Wanted</a>
<li><a href="devinfo.html" target="_parent">Development Notes</a>
<li><a href="codingstyle.html" target="_parent">Coding Style</a>
<li><a href="submittingpatches.html" target="_parent">Submitting patches</a>
<li><a href="releasing.html" target="_parent">Releasing process</a>
<li><a href="release-calendar.html" target="_parent">Release calendar</a>
<li><a href="submittingpatches.html" target="_parent">Submitting Patches</a>
<li><a href="releasing.html" target="_parent">Releasing Process</a>
<li><a href="release-calendar.html" target="_parent">Release Calendar</a>
<li><a href="sourcedocs.html" target="_parent">Source Documentation</a>
<li><a href="dispatch.html" target="_parent">GL Dispatch</a>
</ul>
<h2>Links</h2>
<ul>
<li><a href="https://www.opengl.org" target="_parent">OpenGL website</a>
<li><a href="https://dri.freedesktop.org" target="_parent">DRI website</a>
<li><a href="https://www.opengl.org" target="_parent">OpenGL Website</a>
<li><a href="https://dri.freedesktop.org" target="_parent">DRI Website</a>
<li><a href="https://www.freedesktop.org" target="_parent">freedesktop.org</a>
<li><a href="https://planet.freedesktop.org" target="_parent">Developer blogs</a>
<li><a href="https://planet.freedesktop.org" target="_parent">Developer Blogs</a>
</ul>
<h2>Hosted by:</h2>
<dl>
<dd><a href="https://freedesktop.org" target="_parent">freedesktop.org</a>
<dd><a href="https://www.freedesktop.org" target="_parent">freedesktop.org</a>
</dl>
</body>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -20,9 +20,9 @@
Normally Mesa (and OpenGL) records but does not notify the user of
errors. It is up to the application to call
<code>glGetError</code> to check for errors. Mesa supports an
environment variable, MESA_DEBUG, to help with debugging. If
MESA_DEBUG is defined, a message will be printed to stdout whenever
an error occurs.
environment variable, <code>MESA_DEBUG</code>, to help with debugging. If
<code>MESA_DEBUG</code> is defined, a message will be printed to stdout
whenever an error occurs.
</p>
<p>
@@ -30,12 +30,12 @@
(<code>--buildtype debug</code> for meson, <code>build=debug</code> for scons).
</p>
<p>
In your debugger you can set a breakpoint in _mesa_error() to trap Mesa
errors.
In your debugger you can set a breakpoint in <code>_mesa_error()</code> to trap
Mesa errors.
</p>
<p>
There is a display list printing/debugging facility. See the end of
src/dlist.c for details.
<code>src/dlist.c</code> for details.
</p>
</div>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -29,8 +29,8 @@ To add a new GL extension to Mesa you have to do at least the following.
<ul>
<li>
If glext.h doesn't define the extension, edit include/GL/gl.h and add
code like this:
If <code>glext.h</code> doesn't define the extension, edit
<code>include/GL/gl.h</code> and add code like this:
<pre>
#ifndef GL_EXT_the_extension_name
#define GL_EXT_the_extension_name 1
@@ -41,18 +41,18 @@ To add a new GL extension to Mesa you have to do at least the following.
</pre>
</li>
<li>
In the src/mapi/glapi/gen/ directory, add the new extension functions and
enums to the gl_API.xml file.
In the <code>src/mapi/glapi/gen/</code> directory, add the new extension
functions and enums to the <code>gl_API.xml</code> file.
Then, a bunch of source files must be regenerated by executing the
corresponding Python scripts.
</li>
<li>
Add a new entry to the <code>gl_extensions</code> struct in mtypes.h
if the extension requires driver capabilities not already exposed by
another extension.
Add a new entry to the <code>gl_extensions</code> struct in
<code>mtypes.h</code> if the extension requires driver capabilities not
already exposed by another extension.
</li>
<li>
Add a new entry to the src/mesa/main/extensions_table.h file.
Add a new entry to the <code>src/mesa/main/extensions_table.h</code> file.
</li>
<li>
From this point, the best way to proceed is to find another extension,
@@ -60,18 +60,20 @@ To add a new GL extension to Mesa you have to do at least the following.
as an example.
</li>
<li>
If the new extension adds new GL state, the functions in get.c, enable.c
and attrib.c will most likely require new code.
If the new extension adds new GL state, the functions in
<code>get.c</code>, <code>enable.c</code> and <code>attrib.c</code>
will most likely require new code.
</li>
<li>
To determine if the new extension is active in the current context,
use the auto-generated _mesa_has_##name_str() function defined in
src/mesa/main/extensions.h.
use the auto-generated <code>_mesa_has_##name_str()</code> function
defined in <code>src/mesa/main/extensions.h</code>.
</li>
<li>
The dispatch tests check_table.cpp and dispatch_sanity.cpp
should be updated with details about the new extensions functions. These
tests are run using 'meson test'
The dispatch tests <code>check_table.cpp</code> and
<code>dispatch_sanity.cpp</code> should be updated with details about
the new extensions functions. These tests are run using
<code>meson test</code>.
</li>
</ul>

View File

@@ -2,19 +2,19 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>GL Dispatch in Mesa</title>
<title>GL Dispatch</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>GL Dispatch in Mesa</h1>
<h1>GL Dispatch</h1>
<p>Several factors combine to make efficient dispatch of OpenGL functions
fairly complicated. This document attempts to explain some of the issues
@@ -30,28 +30,28 @@ of the GL related state for the application. Every texture, every buffer
object, every enable, and much, much more is stored in the context. Since
an application can have more than one context, the context to be used is
selected by a window-system dependent function such as
<tt>glXMakeContextCurrent</tt>.</p>
<code>glXMakeContextCurrent</code>.</p>
<p>In environments that implement OpenGL with X-Windows using GLX, every GL
function, including the pointers returned by <tt>glXGetProcAddress</tt>, are
function, including the pointers returned by <code>glXGetProcAddress</code>, are
<em>context independent</em>. This means that no matter what context is
currently active, the same <tt>glVertex3fv</tt> function is used.</p>
currently active, the same <code>glVertex3fv</code> function is used.</p>
<p>This creates the first bit of dispatch complexity. An application can
have two GL contexts. One context is a direct rendering context where
function calls are routed directly to a driver loaded within the
application's address space. The other context is an indirect rendering
context where function calls are converted to GLX protocol and sent to a
server. The same <tt>glVertex3fv</tt> has to do the right thing depending
server. The same <code>glVertex3fv</code> has to do the right thing depending
on which context is current.</p>
<p>Highly optimized drivers or GLX protocol implementations may want to
change the behavior of GL functions depending on current state. For
example, <tt>glFogCoordf</tt> may operate differently depending on whether
example, <code>glFogCoordf</code> may operate differently depending on whether
or not fog is enabled.</p>
<p>In multi-threaded environments, it is possible for each thread to have a
different GL context current. This means that poor old <tt>glVertex3fv</tt>
different GL context current. This means that poor old <code>glVertex3fv</code>
has to know which GL context is current in the thread where it is being
called.</p>
@@ -64,18 +64,18 @@ dispatch table stores pointers to functions that actually implement
specific GL functions. Each time a new context is made current in a thread,
these pointers a updated.</p>
<p>The implementation of functions such as <tt>glVertex3fv</tt> becomes
<p>The implementation of functions such as <code>glVertex3fv</code> becomes
conceptually simple:</p>
<ul>
<li>Fetch the current dispatch table pointer.</li>
<li>Fetch the pointer to the real <tt>glVertex3fv</tt> function from the
<li>Fetch the pointer to the real <code>glVertex3fv</code> function from the
table.</li>
<li>Call the real function.</li>
</ul>
<p>This can be implemented in just a few lines of C code. The file
<tt>src/mesa/glapi/glapitemp.h</tt> contains code very similar to this.</p>
<code>src/mesa/glapi/glapitemp.h</code> contains code very similar to this.</p>
<blockquote>
<table border="1">
@@ -93,9 +93,9 @@ void glVertex3f(GLfloat x, GLfloat y, GLfloat z)
overhead that it adds to every GL function call.</p>
<p>In a multithreaded environment, a naive implementation of
<tt>GET_DISPATCH</tt> involves a call to <tt>pthread_getspecific</tt> or a
<code>GET_DISPATCH</code> involves a call to <code>pthread_getspecific</code> or a
similar function. Mesa provides a wrapper function called
<tt>_glapi_get_dispatch</tt> that is used by default.</p>
<code>_glapi_get_dispatch</code> that is used by default.</p>
<h2>3. Optimizations</h2>
@@ -109,7 +109,7 @@ each can or cannot be used are listed.</p>
<p>The vast majority of OpenGL applications use the API in a single threaded
manner. That is, the application has only one thread that makes calls into
the GL. In these cases, not only do the calls to
<tt>pthread_getspecific</tt> hurt performance, but they are completely
<code>pthread_getspecific</code> hurt performance, but they are completely
unnecessary! It is possible to detect this common case and avoid these
calls.</p>
@@ -118,15 +118,15 @@ of the executing thread. If the same thread ID is always seen, Mesa knows
that the application is, from OpenGL's point of view, single threaded.</p>
<p>As long as an application is single threaded, Mesa stores a pointer to
the dispatch table in a global variable called <tt>_glapi_Dispatch</tt>.
the dispatch table in a global variable called <code>_glapi_Dispatch</code>.
The pointer is also stored in a per-thread location via
<tt>pthread_setspecific</tt>. When Mesa detects that an application has
become multithreaded, <tt>NULL</tt> is stored in <tt>_glapi_Dispatch</tt>.</p>
<code>pthread_setspecific</code>. When Mesa detects that an application has
become multithreaded, <code>NULL</code> is stored in <code>_glapi_Dispatch</code>.</p>
<p>Using this simple mechanism the dispatch functions can detect the
multithreaded case by comparing <tt>_glapi_Dispatch</tt> to <tt>NULL</tt>.
The resulting implementation of <tt>GET_DISPATCH</tt> is slightly more
complex, but it avoids the expensive <tt>pthread_getspecific</tt> call in
multithreaded case by comparing <code>_glapi_Dispatch</code> to <code>NULL</code>.
The resulting implementation of <code>GET_DISPATCH</code> is slightly more
complex, but it avoids the expensive <code>pthread_getspecific</code> call in
the common case.</p>
<blockquote>
@@ -136,7 +136,7 @@ the common case.</p>
(_glapi_Dispatch != NULL) \
? _glapi_Dispatch : pthread_getspecific(&amp;_glapi_Dispatch_key)
</pre></td></tr>
<tr><td>Improved <tt>GET_DISPATCH</tt> Implementation</td></tr></table>
<tr><td>Improved <code>GET_DISPATCH</code> Implementation</td></tr></table>
</blockquote>
<h3>3.2. ELF TLS</h3>
@@ -144,14 +144,14 @@ the common case.</p>
<p>Starting with the 2.4.20 Linux kernel, each thread is allocated an area
of per-thread, global storage. Variables can be put in this area using some
extensions to GCC. By storing the dispatch table pointer in this area, the
expensive call to <tt>pthread_getspecific</tt> and the test of
<tt>_glapi_Dispatch</tt> can be avoided.</p>
expensive call to <code>pthread_getspecific</code> and the test of
<code>_glapi_Dispatch</code> can be avoided.</p>
<p>The dispatch table pointer is stored in a new variable called
<tt>_glapi_tls_Dispatch</tt>. A new variable name is used so that a single
<code>_glapi_tls_Dispatch</code>. A new variable name is used so that a single
libGL can implement both interfaces. This allows the libGL to operate with
direct rendering drivers that use either interface. Once the pointer is
properly declared, <tt>GET_DISPACH</tt> becomes a simple variable
properly declared, <code>GET_DISPACH</code> becomes a simple variable
reference.</p>
<blockquote>
@@ -162,12 +162,12 @@ extern __thread struct _glapi_table *_glapi_tls_Dispatch
#define GET_DISPATCH() _glapi_tls_Dispatch
</pre></td></tr>
<tr><td>TLS <tt>GET_DISPATCH</tt> Implementation</td></tr></table>
<tr><td>TLS <code>GET_DISPATCH</code> Implementation</td></tr></table>
</blockquote>
<p>Use of this path is controlled by the preprocessor define
<tt>GLX_USE_TLS</tt>. Any platform capable of using TLS should use this as
the default dispatch method.</p>
<code>USE_ELF_TLS</code>. Any platform capable of using ELF TLS should use this
as the default dispatch method.</p>
<h3>3.3. Assembly Language Dispatch Stubs</h3>
@@ -185,13 +185,13 @@ ways that the dispatch table pointer can be accessed. There are four
different methods that can be used:</p>
<ol>
<li>Using <tt>_glapi_Dispatch</tt> directly in builds for non-multithreaded
<li>Using <code>_glapi_Dispatch</code> directly in builds for non-multithreaded
environments.</li>
<li>Using <tt>_glapi_Dispatch</tt> and <tt>_glapi_get_dispatch</tt> in
<li>Using <code>_glapi_Dispatch</code> and <code>_glapi_get_dispatch</code> in
multithreaded environments.</li>
<li>Using <tt>_glapi_Dispatch</tt> and <tt>pthread_getspecific</tt> in
<li>Using <code>_glapi_Dispatch</code> and <code>pthread_getspecific</code> in
multithreaded environments.</li>
<li>Using <tt>_glapi_tls_Dispatch</tt> directly in TLS enabled
<li>Using <code>_glapi_tls_Dispatch</code> directly in TLS enabled
multithreaded environments.</li>
</ol>
@@ -204,13 +204,13 @@ terribly relevant.</p>
few preprocessor defines.</p>
<ul>
<li>If <tt>GLX_USE_TLS</tt> is defined, method #3 is used.</li>
<li>If <tt>HAVE_PTHREAD</tt> is defined, method #2 is used.</li>
<li>If <code>USE_ELF_TLS</code> is defined, method #3 is used.</li>
<li>If <code>HAVE_PTHREAD</code> is defined, method #2 is used.</li>
<li>If none of the preceding are defined, method #1 is used.</li>
</ul>
<p>Two different techniques are used to handle the various different cases.
On x86 and SPARC, a macro called <tt>GL_STUB</tt> is used. In the preamble
On x86 and SPARC, a macro called <code>GL_STUB</code> is used. In the preamble
of the assembly source file different implementations of the macro are
selected based on the defined preprocessor variables. The assembly code
then consists of a series of invocations of the macros such as:
@@ -220,7 +220,7 @@ then consists of a series of invocations of the macros such as:
<tr><td><pre>
GL_STUB(Color3fv, _gloffset_Color3fv)
</pre></td></tr>
<tr><td>SPARC Assembly Implementation of <tt>glColor3fv</tt></td></tr></table>
<tr><td>SPARC Assembly Implementation of <code>glColor3fv</code></td></tr></table>
</blockquote>
<p>The benefit of this technique is that changes to the calling pattern
@@ -231,32 +231,32 @@ changed lines in the assembly code.</p>
implementation does not change based on the parameters passed to the
function. For example, since x86 passes all parameters on the stack, no
additional code is needed to save and restore function parameters around a
call to <tt>pthread_getspecific</tt>. Since x86-64 passes parameters in
call to <code>pthread_getspecific</code>. Since x86-64 passes parameters in
registers, varying amounts of code needs to be inserted around the call to
<tt>pthread_getspecific</tt> to save and restore the GL function's
<code>pthread_getspecific</code> to save and restore the GL function's
parameters.</p>
<p>The other technique, used by platforms like x86-64 that cannot use the
first technique, is to insert <tt>#ifdef</tt> within the assembly
first technique, is to insert <code>#ifdef</code> within the assembly
implementation of each function. This makes the assembly file considerably
larger (e.g., 29,332 lines for <tt>glapi_x86-64.S</tt> versus 1,155 lines for
<tt>glapi_x86.S</tt>) and causes simple changes to the function
larger (e.g., 29,332 lines for <code>glapi_x86-64.S</code> versus 1,155 lines for
<code>glapi_x86.S</code>) and causes simple changes to the function
implementation to generate many lines of diffs. Since the assembly files
are typically generated by scripts (see <a href="#autogen">below</a>), this
isn't a significant problem.</p>
<p>Once a new assembly file is created, it must be inserted in the build
system. There are two steps to this. The file must first be added to
<tt>src/mesa/sources</tt>. That gets the file built and linked. The second
step is to add the correct <tt>#ifdef</tt> magic to
<tt>src/mesa/glapi/glapi_dispatch.c</tt> to prevent the C version of the
<code>src/mesa/sources</code>. That gets the file built and linked. The second
step is to add the correct <code>#ifdef</code> magic to
<code>src/mesa/glapi/glapi_dispatch.c</code> to prevent the C version of the
dispatch functions from being built.</p>
<h3 id="fixedsize">3.4. Fixed-Length Dispatch Stubs</h3>
<p>To implement <tt>glXGetProcAddress</tt>, Mesa stores a table that
<p>To implement <code>glXGetProcAddress</code>, Mesa stores a table that
associates function names with pointers to those functions. This table is
stored in <tt>src/mesa/glapi/glprocs.h</tt>. For different reasons on
stored in <code>src/mesa/glapi/glprocs.h</code>. For different reasons on
different platforms, storing all of those pointers is inefficient. On most
platforms, including all known platforms that support TLS, we can avoid this
added overhead.</p>
@@ -267,8 +267,8 @@ calculated by multiplying the size of the dispatch stub by the offset of the
function in the table. This value is then added to the address of the first
dispatch stub.</p>
<p>This path is activated by adding the correct <tt>#ifdef</tt> magic to
<tt>src/mesa/glapi/glapi.c</tt> just before <tt>glprocs.h</tt> is
<p>This path is activated by adding the correct <code>#ifdef</code> magic to
<code>src/mesa/glapi/glapi.c</code> just before <code>glprocs.h</code> is
included.</p>
<h2 id="autogen">4. Automatic Generation of Dispatch Stubs</h2>

View File

@@ -2,19 +2,21 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Getting Mesa</title>
<title>Downloading and Unpacking</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Downloading</h1>
<h1>Downloading and Unpacking</h1>
<h2>Downloading</h2>
<p>
Primary Mesa download site:
@@ -25,23 +27,23 @@ or <a href="https://mesa.freedesktop.org/archive/">mesa.freedesktop.org</a>
<p>
Starting with the first release of 2017, Mesa's version scheme is
year-based. Filenames are in the form <tt>mesa-Y.N.P.tar.gz</tt>, where
<tt>Y</tt> is the year (two digits), <tt>N</tt> is an incremental number
(starting at 0) and <tt>P</tt> is the patch number (0 for the first
year-based. Filenames are in the form <code>mesa-Y.N.P.tar.gz</code>, where
<code>Y</code> is the year (two digits), <code>N</code> is an incremental number
(starting at 0) and <code>P</code> is the patch number (0 for the first
release, 1 for the first patch after that).
</p>
<p>
When a new release is coming, release candidates (betas) may be found
in the same directory, and are recognisable by the
<tt>mesa-Y.N.P-<b>rc</b>X.tar.gz</tt> filename.
<code>mesa-Y.N.P-<b>rc</b>X.tar.gz</code> filename.
</p>
<h1>Unpacking</h1>
<h2>Unpacking</h2>
<p>
Mesa releases are available in two formats: <tt>.tar.xz</tt> and <tt>.tar.gz</tt>.
Mesa releases are available in two formats: <code>.tar.xz</code> and <code>.tar.gz</code>.
</p>
<p>
@@ -56,7 +58,7 @@ To unpack the tarball:
</pre>
<h1>Contents</h1>
<h2>Contents</h2>
<p>
Proceed to the <a href="install.html">compilation and installation
@@ -64,7 +66,7 @@ instructions</a>.
</p>
<h1>Demos, GLUT, and GLU</h1>
<h2>Demos, GLUT, and GLU</h2>
<p>
A package of SGI's GLU library is available

View File

@@ -2,19 +2,19 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa EGL</title>
<title>EGL</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Mesa EGL</h1>
<h1>EGL</h1>
<p>The current version of EGL in Mesa implements EGL 1.4. More information
about EGL can be found at
@@ -86,14 +86,12 @@ and <code>haiku</code>.
The <code>android</code> platform can either be built as a system
component, part of AOSP, using <code>Android.mk</code> files, or
cross-compiled using appropriate options.
The <code>haiku</code> platform can only be built with SCons or Meson.
Unless for special needs, the build system should
select the right platforms automatically.</p>
</dd>
<dt><code>-D gles1=true</code></dt>
<dt><code>-D gles2=true</code></dt>
<dt><code>-D gles1=true</code> and <code>-D gles2=true</code></dt>
<dd>
<p>These options enable OpenGL ES support in OpenGL. The result is one big

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -25,145 +25,206 @@ sometimes be useful for debugging end-user issues.
<h2>LibGL environment variables</h2>
<ul>
<li>LIBGL_DEBUG - If defined debug information will be printed to stderr.
If set to 'verbose' additional information will be printed.
<li>LIBGL_DRIVERS_PATH - colon-separated list of paths to search for DRI drivers
<li>LIBGL_ALWAYS_INDIRECT - if set to `true`, forces an indirect rendering context/connection.
<li>LIBGL_ALWAYS_SOFTWARE - if set to `true`, always use software rendering
<li>LIBGL_NO_DRAWARRAYS - if set to `true`, do not use DrawArrays GLX protocol (for debugging)
<li>LIBGL_SHOW_FPS - print framerate to stdout based on the number of glXSwapBuffers
calls per second.
<li>LIBGL_DRI3_DISABLE - disable DRI3 if set to `true`.
</ul>
<dl>
<dt><code>LIBGL_DEBUG</code></dt>
<dd>If defined debug information will be printed to stderr.
If set to <code>verbose</code> additional information will be
printed.</dd>
<dt><code>LIBGL_DRIVERS_PATH</code></dt>
<dd>colon-separated list of paths to search for DRI drivers</dd>
<dt><code>LIBGL_ALWAYS_INDIRECT</code></dt>
<dd>if set to <code>true</code>, forces an indirect rendering
context/connection.</dd>
<dt><code>LIBGL_ALWAYS_SOFTWARE</code></dt>
<dd>if set to <code>true</code>, always use software rendering</dd>
<dt><code>LIBGL_NO_DRAWARRAYS</code></dt>
<dd>if set to <code>true</code>, do not use DrawArrays GLX protocol
(for debugging)</dd>
<dt><code>LIBGL_SHOW_FPS</code></dt>
<dd>print framerate to stdout based on the number of
<code>glXSwapBuffers</code> calls per second.</dd>
<dt><code>LIBGL_DRI3_DISABLE</code></dt>
<dd>disable DRI3 if set to <code>true</code>.</dd>
</dl>
<h2>Core Mesa environment variables</h2>
<dl>
<dt><code>MESA_NO_ASM</code></dt>
<dd>if set, disables all assembly language optimizations</dd>
<dt><code>MESA_NO_MMX</code></dt>
<dd>if set, disables Intel MMX optimizations</dd>
<dt><code>MESA_NO_3DNOW</code></dt>
<dd>if set, disables AMD 3DNow! optimizations</dd>
<dt><code>MESA_NO_SSE</code></dt>
<dd>if set, disables Intel SSE optimizations</dd>
<dt><code>MESA_NO_ERROR</code></dt>
<dd>if set to 1, error checking is disabled as per <code>KHR_no_error</code>.
This will result in undefined behaviour for invalid use of the api, but
can reduce CPU use for apps that are known to be error free.</dd>
<dt><code>MESA_DEBUG</code></dt>
<dd>if set, error messages are printed to stderr. For example,
if the application generates a <code>GL_INVALID_ENUM</code> error, a
corresponding error message indicating where the error occurred, and
possibly why, will be printed to stderr. For release builds,
<code>MESA_DEBUG</code> defaults to off (no debug output).
<code>MESA_DEBUG</code> accepts the following comma-separated list of
named flags, which adds extra behaviour to just set
<code>MESA_DEBUG=1</code>:
<dl>
<dt><code>silent</code></dt>
<dd>turn off debug messages. Only useful for debug builds.</dd>
<dt><code>flush</code></dt>
<dd>flush after each drawing command</dd>
<dt><code>incomplete_tex</code></dt>
<dd>extra debug messages when a texture is incomplete</dd>
<dt><code>incomplete_fbo</code></dt>
<dd>extra debug messages when a fbo is incomplete</dd>
<dt><code>context</code></dt>
<dd>create a debug context (see <code>GLX_CONTEXT_DEBUG_BIT_ARB</code>)
and print error and performance messages to stderr (or
<code>MESA_LOG_FILE</code>).</dd>
</dl>
</dd>
<dt><code>MESA_LOG_FILE</code></dt>
<dd>specifies a file name for logging all errors, warnings,
etc., rather than stderr</dd>
<dt><code>MESA_TEX_PROG</code></dt>
<dd>if set, implement conventional texture env modes with
fragment programs (intended for developers only)</dd>
<dt><code>MESA_TNL_PROG</code></dt>
<dd>if set, implement conventional vertex transformation operations with
vertex programs (intended for developers only). Setting this variable
automatically sets the <code>MESA_TEX_PROG</code> variable as well.</dd>
<dt><code>MESA_EXTENSION_OVERRIDE</code></dt>
<dd>can be used to enable/disable extensions. A value such as
<code>GL_EXT_foo -GL_EXT_bar</code> will enable the
<code>GL_EXT_foo</code> extension and disable the
<code>GL_EXT_bar</code> extension.</dd>
<dt><code>MESA_EXTENSION_MAX_YEAR</code></dt>
<dd>The <code>GL_EXTENSIONS</code> string returned by Mesa is sorted by
extension year. If this variable is set to year X, only extensions
defined on or before year X will be reported. This is to work-around a
bug in some games where the extension string is copied into a fixed-size
buffer without truncating. If the extension string is too long, the
buffer overrun can cause the game to crash. This is a work-around for
that.</dd>
<dt><code>MESA_GL_VERSION_OVERRIDE</code></dt>
<dd>changes the value returned by
<code>glGetString(GL_VERSION)</code> and possibly the GL API type.
<ul>
<li>MESA_NO_ASM - if set, disables all assembly language optimizations
<li>MESA_NO_MMX - if set, disables Intel MMX optimizations
<li>MESA_NO_3DNOW - if set, disables AMD 3DNow! optimizations
<li>MESA_NO_SSE - if set, disables Intel SSE optimizations
<li>MESA_NO_ERROR - if set to 1, error checking is disabled as per KHR_no_error.
This will result in undefined behaviour for invalid use of the api, but
can reduce CPU use for apps that are known to be error free.</li>
<li>MESA_DEBUG - if set, error messages are printed to stderr. For example,
if the application generates a GL_INVALID_ENUM error, a corresponding error
message indicating where the error occurred, and possibly why, will be
printed to stderr.<br>
For release builds, MESA_DEBUG defaults to off (no debug output).
MESA_DEBUG accepts the following comma-separated list of named
flags, which adds extra behaviour to just set MESA_DEBUG=1:
<ul>
<li>silent - turn off debug messages. Only useful for debug builds.</li>
<li>flush - flush after each drawing command</li>
<li>incomplete_tex - extra debug messages when a texture is incomplete</li>
<li>incomplete_fbo - extra debug messages when a fbo is incomplete</li>
<li>context - create a debug context (see GLX_CONTEXT_DEBUG_BIT_ARB) and
print error and performance messages to stderr (or MESA_LOG_FILE).</li>
</ul>
<li>MESA_LOG_FILE - specifies a file name for logging all errors, warnings,
etc., rather than stderr
<li>MESA_TEX_PROG - if set, implement conventional texture env modes with
fragment programs (intended for developers only)
<li>MESA_TNL_PROG - if set, implement conventional vertex transformation
operations with vertex programs (intended for developers only).
Setting this variable automatically sets the MESA_TEX_PROG variable as well.
<li>MESA_EXTENSION_OVERRIDE - can be used to enable/disable extensions.
A value such as "GL_EXT_foo -GL_EXT_bar" will enable the GL_EXT_foo extension
and disable the GL_EXT_bar extension.
<li>MESA_EXTENSION_MAX_YEAR - The GL_EXTENSIONS string returned by Mesa is sorted
by extension year.
If this variable is set to year X, only extensions defined on or before year
X will be reported.
This is to work-around a bug in some games where the extension string is
copied into a fixed-size buffer without truncating.
If the extension string is too long, the buffer overrun can cause the game
to crash.
This is a work-around for that.
<li>MESA_GL_VERSION_OVERRIDE - changes the value returned by
glGetString(GL_VERSION) and possibly the GL API type.
<ul>
<li>The format should be MAJOR.MINOR[FC|COMPAT]
<li>FC is an optional suffix that indicates a forward compatible
context. This is only valid for versions &gt;= 3.0.
<li>COMPAT is an optional suffix that indicates a compatibility
context or GL_ARB_compatibility support. This is only valid for
versions &gt;= 3.1.
<li>The format should be <code>MAJOR.MINOR[FC|COMPAT]</code>
<li><code>FC</code> is an optional suffix that indicates a forward
compatible context. This is only valid for versions &gt;= 3.0.
<li><code>COMPAT</code> is an optional suffix that indicates a
compatibility context or <code>GL_ARB_compatibility</code> support.
This is only valid for versions &gt;= 3.1.
<li>GL versions &lt;= 3.0 are set to a compatibility (non-Core)
profile
<li>GL versions = 3.1, depending on the driver, it may or may not
have the ARB_compatibility extension enabled.
have the <code>ARB_compatibility</code> extension enabled.
<li>GL versions &gt;= 3.2 are set to a Core profile
<li>Examples: 2.1, 3.0, 3.0FC, 3.1, 3.1FC, 3.1COMPAT, X.Y, X.YFC,
X.YCOMPAT.
<ul>
<li>2.1 - select a compatibility (non-Core) profile with GL
version 2.1.
<li>3.0 - select a compatibility (non-Core) profile with GL
version 3.0.
<li>3.0FC - select a Core+Forward Compatible profile with GL
version 3.0.
<li>3.1 - select GL version 3.1 with GL_ARB_compatibility enabled
per the driver default.
<li>3.1FC - select GL version 3.1 with forward compatibility and
GL_ARB_compatibility disabled.
<li>3.1COMPAT - select GL version 3.1 with GL_ARB_compatibility
enabled.
<li>X.Y - override GL version to X.Y without changing the profile.
<li>X.YFC - select a Core+Forward Compatible profile with GL
version X.Y.
<li>X.YCOMPAT - select a Compatibility profile with GL version
X.Y.
</ul>
<li>Examples:
<dl>
<dt><code>2.1</code></dt>
<dd>select a compatibility (non-Core) profile with GL version 2.1.</dd>
<dt><code>3.0</code></dt>
<dd>select a compatibility (non-Core) profile with GL version 3.0.</dd>
<dt><code>3.0FC</code></dt>
<dd>select a Core+Forward Compatible profile with GL version 3.0.</dd>
<dt><code>3.1</code></dt>
<dd>select GL version 3.1 with <code>GL_ARB_compatibility</code>
enabled per the driver default.</dd>
<dt><code>3.1FC</code></dt>
<dd>select GL version 3.1 with forward compatibility and
<code>GL_ARB_compatibility</code> disabled.</dd>
<dt><code>3.1COMPAT</code></dt>
<dd>select GL version 3.1 with <code>GL_ARB_compatibility</code>
enabled.</dd>
<dt><code>X.Y</code></dt>
<dd>override GL version to X.Y without changing the profile.</dd>
<dt><code>X.YFC</code></dt>
<dd>select a Core+Forward Compatible profile with GL version X.Y.</dd>
<dt><code>X.YCOMPAT</code></dt>
<dd>select a Compatibility profile with GL version X.Y.</dd>
</dl>
<li>Mesa may not really implement all the features of the given
version. (for developers only)
</ul>
<li>MESA_GLES_VERSION_OVERRIDE - changes the value returned by
glGetString(GL_VERSION) for OpenGL ES.
</dd>
<dt><code>MESA_GLES_VERSION_OVERRIDE</code></dt>
<dd>changes the value returned by <code>glGetString(GL_VERSION)</code>
for OpenGL ES.
<ul>
<li> The format should be MAJOR.MINOR
<li> Examples: 2.0, 3.0, 3.1
<li> The format should be <code>MAJOR.MINOR</code>
<li> Examples: <code>2.0</code>, <code>3.0</code>, <code>3.1</code>
<li> Mesa may not really implement all the features of the given version.
(for developers only)
</ul>
<li>MESA_GLSL_VERSION_OVERRIDE - changes the value returned by
glGetString(GL_SHADING_LANGUAGE_VERSION). Valid values are integers, such as
"130". Mesa will not really implement all the features of the given language version
if it's higher than what's normally reported. (for developers only)
<li>MESA_GLSL_CACHE_DISABLE - if set to `true`, disables the GLSL shader cache
<li>MESA_GLSL_CACHE_MAX_SIZE - if set, determines the maximum size of
the on-disk cache of compiled GLSL programs. Should be set to a number
optionally followed by 'K', 'M', or 'G' to specify a size in
kilobytes, megabytes, or gigabytes. By default, gigabytes will be
assumed. And if unset, a maximum size of 1GB will be used. Note: A separate
cache might be created for each architecture that Mesa is installed for on
your system. For example under the default settings you may end up with a 1GB
cache for x86_64 and another 1GB cache for i386.
<li>MESA_GLSL_CACHE_DIR - if set, determines the directory to be used
for the on-disk cache of compiled GLSL programs. If this variable is
not set, then the cache will be stored in $XDG_CACHE_HOME/mesa_shader_cache (if
that variable is set), or else within .cache/mesa_shader_cache within the user's
home directory.
<li>MESA_GLSL - <a href="shading.html#envvars">shading language compiler options</a>
<li>MESA_NO_MINMAX_CACHE - when set, the minmax index cache is globally disabled.
<li>MESA_SHADER_CAPTURE_PATH - see <a href="shading.html#capture">Capturing Shaders</a></li>
<li>MESA_SHADER_DUMP_PATH and MESA_SHADER_READ_PATH - see <a href="shading.html#replacement">Experimenting with Shader Replacements</a></li>
<li>MESA_VK_VERSION_OVERRIDE - changes the Vulkan physical device version
as returned in VkPhysicalDeviceProperties::apiVersion.
</dd>
<dt><code>MESA_GLSL_VERSION_OVERRIDE</code></dt>
<dd>changes the value returned by
<code>glGetString(GL_SHADING_LANGUAGE_VERSION)</code>.
Valid values are integers, such as <code>130</code>. Mesa will not
really implement all the features of the given language version if
it's higher than what's normally reported. (for developers only)
</dd>
<dt><code>MESA_GLSL_CACHE_DISABLE</code></dt>
<dd>if set to <code>true</code>, disables the GLSL shader cache</dd>
<dt><code>MESA_GLSL_CACHE_MAX_SIZE</code></dt>
<dd>if set, determines the maximum size of the on-disk cache of compiled GLSL
programs. Should be set to a number optionally followed by <code>K</code>,
<code>M</code>, or <code>G</code> to specify a size in kilobytes,
megabytes, or gigabytes. By default, gigabytes will be assumed. And if
unset, a maximum size of 1GB will be used. Note: A separate cache might
be created for each architecture that Mesa is installed for on your
system. For example under the default settings you may end up with a 1GB
cache for x86_64 and another 1GB cache for i386.</dd>
<dt><code>MESA_GLSL_CACHE_DIR</code></dt>
<dd>if set, determines the directory to be used for the on-disk cache of
compiled GLSL programs. If this variable is not set, then the cache will
be stored in <code>$XDG_CACHE_HOME/mesa_shader_cache</code> (if that
variable is set), or else within <code>.cache/mesa_shader_cache</code>
within the user's home directory.
</dd>
<dt><code>MESA_GLSL</code></dt>
<dd><a href="shading.html#envvars">shading language compiler options</a></dd>
<dt><code>MESA_NO_MINMAX_CACHE</code></dt>
<dd>when set, the minmax index cache is globally disabled.</dd>
<dt><code>MESA_SHADER_CAPTURE_PATH</code></dt>
<dd>see <a href="shading.html#capture">Capturing Shaders</a></dd>
<dt><code>MESA_SHADER_DUMP_PATH</code> and <code>MESA_SHADER_READ_PATH</code></dt>
<dd>see <a href="shading.html#replacement">Experimenting with Shader Replacements</a></dd>
<dt><code>MESA_VK_VERSION_OVERRIDE</code></dt>
<dd>changes the Vulkan physical device version
as returned in <code>VkPhysicalDeviceProperties::apiVersion</code>.
<ul>
<li>The format should be MAJOR.MINOR[.PATCH]</li>
<li>The format should be <code>MAJOR.MINOR[.PATCH]</code></li>
<li>This will not let you force a version higher than the driver's
instance versionas advertised by vkEnumerateInstanceVersion</li>
instance version as advertised by
<code>vkEnumerateInstanceVersion</code></li>
<li>This can be very useful for debugging but some features may not be
implemented correctly. (For developers only)</li>
</ul>
</li>
</ul>
</dd>
</dl>
<h2>NIR passes enviroment variables</h2>
<p>
The following are only applicable for drivers that uses NIR, as they
modify the behaviour for the common NIR_PASS and NIR_PASS_V macros,
that wrap calls to NIR lowering/optimizations.
</p>
<dl>
<dt><code>NIR_PRINT</code></dt>
<dd>If defined, the resulting NIR shader will be printed out at each succesful NIR lowering/optimization call.</dd>
<dt><code>NIR_TEST_CLONE</code></dt>
<dd>If defined, cloning a NIR shader would be tested at each succesful NIR lowering/optimization call.</dd>
<dt><code>NIR_TEST_SERIALIZE</code></dt>
<dd>If defined, serialize and deserialize a NIR shader would be tested at each succesful NIR lowering/optimization call.</dd>
</dl>
<h2>Mesa Xlib driver environment variables</h2>
@@ -172,80 +233,137 @@ home directory.
The following are only applicable to the Mesa Xlib software driver.
See the <a href="xlibdriver.html">Xlib software driver page</a> for details.
</p>
<ul>
<li>MESA_RGB_VISUAL - specifies the X visual and depth for RGB mode
<li>MESA_CI_VISUAL - specifies the X visual and depth for CI mode
<li>MESA_BACK_BUFFER - specifies how to implement the back color buffer,
either "pixmap" or "ximage"
<li>MESA_GAMMA - gamma correction coefficients for red, green, blue channels
<li>MESA_XSYNC - enable synchronous X behavior (for debugging only)
<li>MESA_GLX_FORCE_CI - if set, force GLX to treat 8bpp visuals as CI visuals
<li>MESA_GLX_FORCE_ALPHA - if set, forces RGB windows to have an alpha channel.
<li>MESA_GLX_DEPTH_BITS - specifies default number of bits for depth buffer.
<li>MESA_GLX_ALPHA_BITS - specifies default number of bits for alpha channel.
</ul>
<dl>
<dt><code>MESA_RGB_VISUAL</code></dt>
<dd>specifies the X visual and depth for RGB mode</dd>
<dt><code>MESA_CI_VISUAL</code></dt>
<dd>specifies the X visual and depth for CI mode</dd>
<dt><code>MESA_BACK_BUFFER</code></dt>
<dd>specifies how to implement the back color buffer, either
<code>pixmap</code> or <code>ximage</code></dd>
<dt><code>MESA_GAMMA</code></dt>
<dd>gamma correction coefficients for red, green, blue channels</dd>
<dt><code>MESA_XSYNC</code></dt>
<dd>enable synchronous X behavior (for debugging only)</dd>
<dt><code>MESA_GLX_FORCE_CI</code></dt>
<dd>if set, force GLX to treat 8bpp visuals as CI visuals</dd>
<dt><code>MESA_GLX_FORCE_ALPHA</code></dt>
<dd>if set, forces RGB windows to have an alpha channel.</dd>
<dt><code>MESA_GLX_DEPTH_BITS</code></dt>
<dd>specifies default number of bits for depth buffer.</dd>
<dt><code>MESA_GLX_ALPHA_BITS</code></dt>
<dd>specifies default number of bits for alpha channel.</dd>
</dl>
<h2>i945/i965 driver environment variables (non-Gallium)</h2>
<ul>
<li>INTEL_NO_HW - if set to 1, prevents batches from being submitted to the hardware.
This is useful for debugging hangs, etc.</li>
<li>INTEL_DEBUG - a comma-separated list of named flags, which do various things:
<ul>
<li>ann - annotate IR in assembly dumps</li>
<li>aub - dump batches into an AUB trace for use with simulation tools</li>
<li>bat - emit batch information</li>
<li>blit - emit messages about blit operations</li>
<li>blorp - emit messages about the blorp operations (blits &amp; clears)</li>
<li>buf - emit messages about buffer objects</li>
<li>clip - emit messages about the clip unit (for old gens, includes the CLIP program)</li>
<li>color - use color in output</li>
<li>cs - dump shader assembly for compute shaders</li>
<li>do32 - generate compute shader SIMD32 programs even if workgroup size doesn't exceed the SIMD16 limit</li>
<li>dri - emit messages about the DRI interface</li>
<li>fbo - emit messages about framebuffers</li>
<li>fs - dump shader assembly for fragment shaders</li>
<li>gs - dump shader assembly for geometry shaders</li>
<li>hex - print instruction hex dump with the disassembly</li>
<li>l3 - emit messages about the new L3 state during transitions</li>
<li>miptree - emit messages about miptrees</li>
<li>no8 - don't generate SIMD8 fragment shader</li>
<li>no16 - suppress generation of 16-wide fragment shaders. useful for debugging broken shaders</li>
<li>nocompact - disable instruction compaction</li>
<li>nodualobj - suppress generation of dual-object geometry shader code</li>
<li>norbc - disable single sampled render buffer compression</li>
<li>optimizer - dump shader assembly to files at each optimization pass and iteration that make progress</li>
<li>perf - emit messages about performance issues</li>
<li>perfmon - emit messages about AMD_performance_monitor</li>
<li>pix - emit messages about pixel operations</li>
<li>prim - emit messages about drawing primitives</li>
<li>reemit - mark all state dirty on each draw call</li>
<li>sf - emit messages about the strips &amp; fans unit (for old gens, includes the SF program)</li>
<li>shader_time - record how much GPU time is spent in each shader</li>
<li>spill_fs - force spilling of all registers in the scalar backend (useful to debug spilling code)</li>
<li>spill_vec4 - force spilling of all registers in the vec4 backend (useful to debug spilling code)</li>
<li>state - emit messages about state flag tracking</li>
<li>submit - emit batchbuffer usage statistics</li>
<li>sync - after sending each batch, emit a message and wait for that batch to finish rendering</li>
<li>tcs - dump shader assembly for tessellation control shaders</li>
<li>tes - dump shader assembly for tessellation evaluation shaders</li>
<li>tex - emit messages about textures.</li>
<li>urb - emit messages about URB setup</li>
<li>vert - emit messages about vertex assembly</li>
<li>vs - dump shader assembly for vertex shaders</li>
</ul>
<li>INTEL_SCALAR_VS (or TCS, TES, GS) - force scalar/vec4 mode for a shader stage (Gen8-9 only)</li>
<li>INTEL_PRECISE_TRIG - if set to 1, true or yes, then the driver prefers
accuracy over performance in trig functions.</li>
</ul>
<dl>
<dt><code>INTEL_NO_HW</code></dt>
<dd>if set to 1, prevents batches from being submitted to the hardware.
This is useful for debugging hangs, etc.</dd>
<dt><code>INTEL_DEBUG</code></dt>
<dd>a comma-separated list of named flags, which do various things:
<dl>
<dt><code>ann</code></dt>
<dd>annotate IR in assembly dumps</dd>
<dt><code>aub</code></dt>
<dd>dump batches into an AUB trace for use with simulation tools</dd>
<dt><code>bat</code></dt>
<dd>emit batch information</dd>
<dt><code>blit</code></dt>
<dd>emit messages about blit operations</dd>
<dt><code>blorp</code></dt>
<dd>emit messages about the blorp operations (blits &amp; clears)</dd>
<dt><code>buf</code></dt>
<dd>emit messages about buffer objects</dd>
<dt><code>clip</code></dt>
<dd>emit messages about the clip unit (for old gens, includes the CLIP program)</dd>
<dt><code>color</code></dt>
<dd>use color in output</dd>
<dt><code>cs</code></dt>
<dd>dump shader assembly for compute shaders</dd>
<dt><code>do32</code></dt>
<dd>generate compute shader SIMD32 programs even if workgroup size doesn't exceed the SIMD16 limit</dd>
<dt><code>dri</code></dt>
<dd>emit messages about the DRI interface</dd>
<dt><code>fbo</code></dt>
<dd>emit messages about framebuffers</dd>
<dt><code>fs</code></dt>
<dd>dump shader assembly for fragment shaders</dd>
<dt><code>gs</code></dt>
<dd>dump shader assembly for geometry shaders</dd>
<dt><code>hex</code></dt>
<dd>print instruction hex dump with the disassembly</dd>
<dt><code>l3</code></dt>
<dd>emit messages about the new L3 state during transitions</dd>
<dt><code>miptree</code></dt>
<dd>emit messages about miptrees</dd>
<dt><code>no8</code></dt>
<dd>don't generate SIMD8 fragment shader</dd>
<dt><code>no16</code></dt>
<dd>suppress generation of 16-wide fragment shaders. useful for debugging broken shaders</dd>
<dt><code>nocompact</code></dt>
<dd>disable instruction compaction</dd>
<dt><code>nodualobj</code></dt>
<dd>suppress generation of dual-object geometry shader code</dd>
<dt><code>norbc</code></dt>
<dd>disable single sampled render buffer compression</dd>
<dt><code>optimizer</code></dt>
<dd>dump shader assembly to files at each optimization pass and iteration that make progress</dd>
<dt><code>perf</code></dt>
<dd>emit messages about performance issues</dd>
<dt><code>perfmon</code></dt>
<dd>emit messages about <code>AMD_performance_monitor</code></dd>
<dt><code>pix</code></dt>
<dd>emit messages about pixel operations</dd>
<dt><code>prim</code></dt>
<dd>emit messages about drawing primitives</dd>
<dt><code>reemit</code></dt>
<dd>mark all state dirty on each draw call</dd>
<dt><code>sf</code></dt>
<dd>emit messages about the strips &amp; fans unit (for old gens, includes the SF program)</dd>
<dt><code>shader_time</code></dt>
<dd>record how much GPU time is spent in each shader</dd>
<dt><code>spill_fs</code></dt>
<dd>force spilling of all registers in the scalar backend (useful to debug spilling code)</dd>
<dt><code>spill_vec4</code></dt>
<dd>force spilling of all registers in the vec4 backend (useful to debug spilling code)</dd>
<dt><code>state</code></dt>
<dd>emit messages about state flag tracking</dd>
<dt><code>submit</code></dt>
<dd>emit batchbuffer usage statistics</dd>
<dt><code>sync</code></dt>
<dd>after sending each batch, emit a message and wait for that batch to finish rendering</dd>
<dt><code>tcs</code></dt>
<dd>dump shader assembly for tessellation control shaders</dd>
<dt><code>tes</code></dt>
<dd>dump shader assembly for tessellation evaluation shaders</dd>
<dt><code>tex</code></dt>
<dd>emit messages about textures.</dd>
<dt><code>urb</code></dt>
<dd>emit messages about URB setup</dd>
<dt><code>vert</code></dt>
<dd>emit messages about vertex assembly</dd>
<dt><code>vs</code></dt>
<dd>dump shader assembly for vertex shaders</dd>
</dl>
</dd>
<dt><code>INTEL_SCALAR_VS</code> (or <code>TCS</code>, <code>TES</code>,
<code>GS</code>)</dt>
<dd>force scalar/vec4 mode for a shader stage (Gen8-9 only)</dd>
<dt><code>INTEL_PRECISE_TRIG</code></dt>
<dd>if set to 1, true or yes, then the driver prefers accuracy over
performance in trig functions.</dd>
</dl>
<h2>Radeon driver environment variables (radeon, r200, and r300g)</h2>
<ul>
<li>RADEON_NO_TCL - if set, disable hardware-accelerated Transform/Clip/Lighting.
</ul>
<dl>
<dt><code>RADEON_NO_TCL</code></dt>
<dd>if set, disable hardware-accelerated Transform/Clip/Lighting.</dd>
</dl>
<h2>EGL environment variables</h2>
@@ -258,122 +376,170 @@ Mesa EGL supports different sets of environment variables. See the
<h2>Gallium environment variables</h2>
<ul>
<li>GALLIUM_HUD - draws various information on the screen, like framerate,
<dl>
<dt><code>GALLIUM_HUD</code></dt>
<dd>draws various information on the screen, like framerate,
cpu load, driver statistics, performance counters, etc.
Set GALLIUM_HUD=help and run e.g. glxgears for more info.
<li>GALLIUM_HUD_PERIOD - sets the hud update rate in seconds (float). Use zero
to update every frame. The default period is 1/2 second.
<li>GALLIUM_HUD_VISIBLE - control default visibility, defaults to true.
<li>GALLIUM_HUD_TOGGLE_SIGNAL - toggle visibility via user specified signal.
Set <code>GALLIUM_HUD=help</code> and run e.g.
<code>glxgears</code> for more info.</dd>
<dt><code>GALLIUM_HUD_PERIOD</code></dt>
<dd>sets the hud update rate in seconds (float). Use zero
to update every frame. The default period is 1/2 second.</dd>
<dt><code>GALLIUM_HUD_VISIBLE</code></dt>
<dd>control default visibility, defaults to true.</dd>
<dt><code>GALLIUM_HUD_TOGGLE_SIGNAL</code></dt>
<dd>toggle visibility via user specified signal.
Especially useful to toggle hud at specific points of application and
disable for unencumbered viewing the rest of the time. For example, set
GALLIUM_HUD_VISIBLE to false and GALLIUM_HUD_TOGGLE_SIGNAL to 10 (SIGUSR1).
Use kill -10 &lt;pid&gt; to toggle the hud as desired.
<li>GALLIUM_HUD_DUMP_DIR - specifies a directory for writing the displayed
hud values into files.
<li>GALLIUM_DRIVER - useful in combination with LIBGL_ALWAYS_SOFTWARE=true for
choosing one of the software renderers "softpipe", "llvmpipe" or "swr".
<li>GALLIUM_LOG_FILE - specifies a file for logging all errors, warnings, etc.
rather than stderr.
<li>GALLIUM_PRINT_OPTIONS - if non-zero, print all the Gallium environment
variables which are used, and their current values.
<li>GALLIUM_DUMP_CPU - if non-zero, print information about the CPU on start-up
<li>TGSI_PRINT_SANITY - if set, do extra sanity checking on TGSI shaders and
print any errors to stderr.
<LI>DRAW_FSE - ???
<LI>DRAW_NO_FSE - ???
<li>DRAW_USE_LLVM - if set to zero, the draw module will not use LLVM to execute
shaders, vertex fetch, etc.
<li>ST_DEBUG - controls debug output from the Mesa/Gallium state tracker.
Setting to "tgsi", for example, will print all the TGSI shaders.
See src/mesa/state_tracker/st_debug.c for other options.
</ul>
<code>GALLIUM_HUD_VISIBLE</code> to <code>false</code> and
<code>GALLIUM_HUD_TOGGLE_SIGNAL</code> to <code>10</code>
(<code>SIGUSR1</code>).
Use <code>kill -10 &lt;pid&gt;</code> to toggle the hud as desired.</dd>
<dt><code>GALLIUM_HUD_DUMP_DIR</code></dt>
<dd>specifies a directory for writing the displayed hud values into files.</dd>
<dt><code>GALLIUM_DRIVER</code></dt>
<dd>useful in combination with <code>LIBGL_ALWAYS_SOFTWARE=true</code> for
choosing one of the software renderers <code>softpipe</code>,
<code>llvmpipe</code> or <code>swr</code>.</dd>
<dt><code>GALLIUM_LOG_FILE</code></dt>
<dd>specifies a file for logging all errors, warnings, etc.
rather than stderr.</dd>
<dt><code>GALLIUM_PRINT_OPTIONS</code></dt>
<dd>if non-zero, print all the Gallium environment variables which are
used, and their current values.</dd>
<dt><code>GALLIUM_DUMP_CPU</code></dt>
<dd>if non-zero, print information about the CPU on start-up</dd>
<dt><code>TGSI_PRINT_SANITY</code></dt>
<dd>if set, do extra sanity checking on TGSI shaders and
print any errors to stderr.</dd>
<dt><code>DRAW_FSE</code></dt>
<dd>???</dd>
<dt><code>DRAW_NO_FSE</code></dt>
<dd>???</dd>
<dt><code>DRAW_USE_LLVM</code></dt>
<dd>if set to zero, the draw module will not use LLVM to execute
shaders, vertex fetch, etc.</dd>
<dt><code>ST_DEBUG</code></dt>
<dd>controls debug output from the Mesa/Gallium state tracker.
Setting to <code>tgsi</code>, for example, will print all the TGSI
shaders. See <code>src/mesa/state_tracker/st_debug.c</code> for other
options.</dd>
</dl>
<h3>Clover state tracker environment variables</h3>
<ul>
<li>CLOVER_EXTRA_BUILD_OPTIONS - allows specifying additional compiler and linker
<dl>
<dt><code>CLOVER_EXTRA_BUILD_OPTIONS</code></dt>
<dd>allows specifying additional compiler and linker
options. Specified options are appended after the options set by the OpenCL
program in clBuildProgram.
<li>CLOVER_EXTRA_COMPILE_OPTIONS - allows specifying additional compiler
program in <code>clBuildProgram</code>.</dd>
<dt><code>CLOVER_EXTRA_COMPILE_OPTIONS</code></dt>
<dd>allows specifying additional compiler
options. Specified options are appended after the options set by the OpenCL
program in clCompileProgram.
<li>CLOVER_EXTRA_LINK_OPTIONS - allows specifying additional linker
program in <code>clCompileProgram</code>.</dd>
<dt><code>CLOVER_EXTRA_LINK_OPTIONS</code></dt>
<dd>allows specifying additional linker
options. Specified options are appended after the options set by the OpenCL
program in clLinkProgram.
</ul>
program in <code>clLinkProgram</code>.</dd>
</dl>
<h3>Softpipe driver environment variables</h3>
<ul>
<li>SOFTPIPE_DUMP_FS - if set, the softpipe driver will print fragment shaders
to stderr
<li>SOFTPIPE_DUMP_GS - if set, the softpipe driver will print geometry shaders
to stderr
<li>SOFTPIPE_NO_RAST - if set, rasterization is no-op'd. For profiling purposes.
<li>SOFTPIPE_USE_LLVM - if set, the softpipe driver will try to use LLVM JIT for
vertex shading processing.
</ul>
<dl>
<dt><code>SOFTPIPE_DUMP_FS</code></dt>
<dd>if set, the softpipe driver will print fragment shaders to stderr</dd>
<dt><code>SOFTPIPE_DUMP_GS</code></dt>
<dd>if set, the softpipe driver will print geometry shaders to stderr</dd>
<dt><code>SOFTPIPE_NO_RAST</code></dt>
<dd>if set, rasterization is no-op'd. For profiling purposes.</dd>
<dt><code>SOFTPIPE_USE_LLVM</code></dt>
<dd>if set, the softpipe driver will try to use LLVM JIT for
vertex shading processing.</dd>
</dl>
<h3>LLVMpipe driver environment variables</h3>
<ul>
<li>LP_NO_RAST - if set LLVMpipe will no-op rasterization
<li>LP_DEBUG - a comma-separated list of debug options is accepted. See the
source code for details.
<li>LP_PERF - a comma-separated list of options to selectively no-op various
parts of the driver. See the source code for details.
<li>LP_NUM_THREADS - an integer indicating how many threads to use for rendering.
<dl>
<dt><code>LP_NO_RAST</code></dt>
<dd>if set LLVMpipe will no-op rasterization</dd>
<dt><code>LP_DEBUG</code></dt>
<dd>a comma-separated list of debug options is accepted. See the
source code for details.</dd>
<dt><code>LP_PERF</code></dt>
<dd>a comma-separated list of options to selectively no-op various
parts of the driver. See the source code for details.</dd>
<dt><code>LP_NUM_THREADS</code></dt>
<dd>an integer indicating how many threads to use for rendering.
Zero turns off threading completely. The default value is the number of CPU
cores present.
</ul>
cores present.</dd>
</dl>
<h3>VMware SVGA driver environment variables</h3>
<ul>
<li>SVGA_FORCE_SWTNL - force use of software vertex transformation
<li>SVGA_NO_SWTNL - don't allow software vertex transformation fallbacks
(will often result in incorrect rendering).
<li>SVGA_DEBUG - for dumping shaders, constant buffers, etc. See the code
for details.
<li>SVGA_EXTRA_LOGGING - if set, enables extra logging to the vmware.log file,
such as the OpenGL program's name and command line arguments.
<li>SVGA_NO_LOGGING - if set, disables logging to the vmware.log file.
This is useful when using Valgrind because it otherwise crashes when
initializing the host log feature.
<li>See the driver code for other, lesser-used variables.
</ul>
<dl>
<dt><code>SVGA_FORCE_SWTNL</code></dt>
<dd>force use of software vertex transformation</dd>
<dt><code>SVGA_NO_SWTNL</code></dt>
<dd>don't allow software vertex transformation fallbacks (will often result
in incorrect rendering).</dd>
<dt><code>SVGA_DEBUG</code></dt>
<dd>for dumping shaders, constant buffers, etc. See the code for
details.</dd>
<dt><code>SVGA_EXTRA_LOGGING</code></dt>
<dd>if set, enables extra logging to the <code>vmware.log</code> file,
such as the OpenGL program's name and command line arguments.</dd>
<dt><code>SVGA_NO_LOGGING</code></dt>
<dd>if set, disables logging to the <code>vmware.log</code> file. This is
useful when using Valgrind because it otherwise crashes when
initializing the host log feature.</dd>
</dl>
<p>See the driver code for other, lesser-used variables.</p>
<h3>WGL environment variables</h3>
<ul>
<li>WGL_SWAP_INTERVAL - to set a swap interval, equivalent to calling
wglSwapIntervalEXT() in an application. If this environment variable
is set, application calls to wglSwapIntervalEXT() will have no effect.
</ul>
<dl>
<dt><code>WGL_SWAP_INTERVAL</code></dt>
<dd>to set a swap interval, equivalent to calling
<code>wglSwapIntervalEXT()</code> in an application. If this
environment variable is set, application calls to
<code>wglSwapIntervalEXT()</code> will have no effect.</dd>
</dl>
<h3>VA-API state tracker environment variables</h3>
<ul>
<li>VAAPI_MPEG4_ENABLED - enable MPEG4 for VA-API, disabled by default.
</ul>
<dl>
<dt><code>VAAPI_MPEG4_ENABLED</code></dt>
<dd>enable MPEG4 for VA-API, disabled by default.</dd>
</dl>
<h3>VC4 driver environment variables</h3>
<ul>
<li>VC4_DEBUG - a comma-separated list of named flags, which do various things:
<ul>
<li>cl - dump command list during creation</li>
<li>qpu - dump generated QPU instructions</li>
<li>qir - dump QPU IR during program compile</li>
<li>nir - dump NIR during program compile</li>
<li>tgsi - dump TGSI during program compile</li>
<li>shaderdb - dump program compile information for shader-db analysis</li>
<li>perf - print during performance-related events</li>
<li>norast - skip actual hardware execution of commands</li>
<li>always_flush - flush after each draw call</li>
<li>always_sync - wait for finish after each flush</li>
<li>dump - write a GPU command stream trace file (VC4 simulator only)</li>
</ul>
</ul>
<dl>
<dt><code>VC4_DEBUG</code></dt>
<dd>a comma-separated list of named flags, which do various things:
<dl>
<dt><code>cl</code></dt>
<dd>dump command list during creation</dd>
<dt><code>qpu</code></dt>
<dd>dump generated QPU instructions</dd>
<dt><code>qir</code></dt>
<dd>dump QPU IR during program compile</dd>
<dt><code>nir</code></dt>
<dd>dump NIR during program compile</dd>
<dt><code>tgsi</code></dt>
<dd>dump TGSI during program compile</dd>
<dt><code>shaderdb</code></dt>
<dd>dump program compile information for shader-db analysis</dd>
<dt><code>perf</code></dt>
<dd>print during performance-related events</dd>
<dt><code>norast</code></dt>
<dd>skip actual hardware execution of commands</dd>
<dt><code>always_flush</code></dt>
<dd>flush after each draw call</dd>
<dt><code>always_sync</code></dt>
<dd>wait for finish after each flush</dd>
<dt><code>dump</code></dt>
<dd>write a GPU command stream trace file (VC4 simulator only)</dd>
</dl>
</dd>
</dl>
<p>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>

View File

@@ -2,23 +2,21 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa FAQ</title>
<title>Frequently Asked Questions</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Mesa Frequently Asked Questions</h1>
<h1>Frequently Asked Questions</h1>
Last updated: 19 September 2018
<br>
<br>
<h2>Index</h2>
<ol>
<li><a href="#part1">High-level Questions and Answers</a></li>
@@ -26,14 +24,10 @@ Last updated: 19 September 2018
<li><a href="#part3">Runtime / Rendering Problems</a></li>
<li><a href="#part4">Developer Questions</a></li>
</ol>
<br>
<br>
<h2 id="part1">1. High-level Questions and Answers</h2>
<h1 id="part1">1. High-level Questions and Answers</h1>
<h2>1.1 What is Mesa?</h2>
<h3>1.1 What is Mesa?</h3>
<p>
Mesa is an open-source implementation of the OpenGL specification.
OpenGL is a programming library for writing interactive 3D applications.
@@ -102,17 +96,17 @@ the Xlib API:
<li>The GLX wire protocol is not supported and there's no OpenGL extension
loaded by the X server.
<li>There is no hardware acceleration.
<li>The OpenGL library, libGL.so, contains everything (the programming API,
the GLX functions and all the rendering code).
<li>The OpenGL library, <code>libGL.so</code>, contains everything (the
programming API, the GLX functions and all the rendering code).
</ul>
<p>
Alternately, Mesa acts as the core for a number of OpenGL hardware drivers
within the DRI (Direct Rendering Infrastructure):
<ul>
<li>The libGL.so library provides the GL and GLX API functions, a GLX
protocol encoder, and a device driver loader.
<li>The device driver modules (such as r200_dri.so) contain a built-in
copy of the core Mesa code.
<li>The <code>libGL.so</code> library provides the GL and GLX API functions,
a GLX protocol encoder, and a device driver loader.
<li>The device driver modules (such as <code>r200_dri.so</code>) contain
a built-in copy of the core Mesa code.
<li>The X server loads the GLX module.
The GLX module decodes incoming GLX protocol and dispatches the commands
to a rendering module.
@@ -132,7 +126,7 @@ Just follow the Mesa <a href="install.html">compilation instructions</a>.
<h2>1.6 Are there other open-source implementations of OpenGL?</h2>
<p>
Yes, SGI's <a href="http://oss.sgi.com/projects/ogl-sample/index.html">
Yes, SGI's <a href="http://web.archive.org/web/20171010115110_/http://oss.sgi.com/projects/ogl-sample/index.html">
OpenGL Sample Implementation (SI)</a> is available.
The SI was written during the time that OpenGL was originally designed.
Unfortunately, development of the SI has stagnated.
@@ -144,8 +138,9 @@ Mesa is much more up to date with modern features and extensions.
an open-source implementation of OpenGL ES for mobile devices.
<p>
<a href="http://www.dsbox.com/minigl.html">miniGL</a>
is a subset of OpenGL for PalmOS devices.
<a href="http://web.archive.org/web/20130830162848/http://www.dsbox.com/minigl.html">miniGL</a>
is a subset of OpenGL for PalmOS devices. The website is gone, but the source
code can still be found on <a href="https://sourceforge.net/projects/minigl/">sourceforge.net</a>.
<p>
<a href="http://bellard.org/TinyGL/">TinyGL</a>
@@ -175,22 +170,16 @@ popular and feature-complete.
</p>
<h2 id="part2">2. Compilation and Installation Problems</h2>
<br>
<br>
<h1 id="part2">2. Compilation and Installation Problems</h1>
<h2>2.1 What's the easiest way to install Mesa?</h2>
<h3>2.1 What's the easiest way to install Mesa?</h3>
<p>
If you're using a Linux-based system, your distro CD most likely already
has Mesa packages (like RPM or DEB) which you can easily install.
</p>
<h2>2.2 I get undefined symbols such as bgnpolygon, v3f, etc...</h2>
<h3>2.2 I get undefined symbols such as bgnpolygon, v3f, etc...</h3>
<p>
You're application is written in IRIS GL, not OpenGL.
IRIS GL was the predecessor to OpenGL and is a different thing (almost)
@@ -199,38 +188,49 @@ Mesa's not the solution.
</p>
<h2>2.3 Where is the GLUT library?</h2>
<h3>2.3 Where is the GLUT library?</h3>
<p>
GLUT (OpenGL Utility Toolkit) is no longer in the separate MesaGLUT-x.y.z.tar.gz file.
GLUT (OpenGL Utility Toolkit) is no longer in the separate
<code>MesaGLUT-x.y.z.tar.gz</code> file.
If you don't already have GLUT installed, you should grab
<a href="http://freeglut.sourceforge.net/">freeglut</a>.
</p>
<h2>2.4 Where is the GLw library?</h2>
<h3>2.4 Where is the GLw library?</h3>
<p>
GLw (OpenGL widget library) is now available from a separate <a href="https://cgit.freedesktop.org/mesa/glw/">git repository</a>. Unless you're using very old Xt/Motif applications with OpenGL, you shouldn't need it.
GLw (OpenGL widget library) is now available from a separate <a href="https://gitlab.freedesktop.org/mesa/glw">git repository</a>. Unless you're using very old Xt/Motif applications with OpenGL, you shouldn't need it.
</p>
<h2>2.5 What's the proper place for the libraries and headers?</h2>
<p>
On Linux-based systems you'll want to follow the
<a href="http://oss.sgi.com/projects/ogl-sample/ABI/index.html">Linux ABI</a> standard.
<a href="https://www.khronos.org/registry/OpenGL/ABI/">Linux ABI</a> standard.
Basically you'll want the following:
</p>
<ul>
<li>/usr/include/GL/gl.h - the main OpenGL header
</li><li>/usr/include/GL/glu.h - the OpenGL GLU (utility) header
</li><li>/usr/include/GL/glx.h - the OpenGL GLX header
</li><li>/usr/include/GL/glext.h - the OpenGL extensions header
</li><li>/usr/include/GL/glxext.h - the OpenGL GLX extensions header
</li><li>/usr/include/GL/osmesa.h - the Mesa off-screen rendering header
</li><li>/usr/lib/libGL.so - a symlink to libGL.so.1
</li><li>/usr/lib/libGL.so.1 - a symlink to libGL.so.1.xyz
</li><li>/usr/lib/libGL.so.xyz - the actual OpenGL/Mesa library. xyz denotes the
<dl>
<dt><code>/usr/include/GL/gl.h</code></dt>
<dd>the main OpenGL header</dd>
<dt><code>/usr/include/GL/glu.h</code></dt>
<dd>the OpenGL GLU (utility) header</dd>
<dt><code>/usr/include/GL/glx.h</code></dt>
<dd>the OpenGL GLX header</dd>
<dt><code>/usr/include/GL/glext.h</code></dt>
<dd>the OpenGL extensions header</dd>
<dt><code>/usr/include/GL/glxext.h</code></dt>
<dd>the OpenGL GLX extensions header</dd>
<dt><code>/usr/include/GL/osmesa.h</code></dt>
<dd>the Mesa off-screen rendering header</dd>
<dt><code>/usr/lib/libGL.so</code></dt>
<dd>a symlink to <code>libGL.so.1</code></dd>
<dt><code>/usr/lib/libGL.so.1</code></dt>
<dd>a symlink to <code>libGL.so.1.xyz</code></dd>
<dt><code>/usr/lib/libGL.so.xyz</code></dt>
<dd>the actual OpenGL/Mesa library. xyz denotes the
Mesa version number.
</li></ul>
</dd>
</dl>
<p>
When configuring Mesa, there are three meson options that affect the install
location that you should take care with: <code>--prefix</code>,
@@ -249,13 +249,11 @@ After determining the correct values for the install location, configure Mesa
with <code>meson configure --prefix=/usr --libdir=xxx -D dri-drivers-path=xxx</code>
and then install with <code>sudo ninja install</code>.
</p>
<br>
<br>
<h1 id="part3">3. Runtime / Rendering Problems</h1>
<h2 id="part3">3. Runtime / Rendering Problems</h2>
<h2>3.1 Rendering is slow / why isn't my graphics hardware being used?</h2>
<h3>3.1 Rendering is slow / why isn't my graphics hardware being used?</h3>
<p>
If Mesa can't use its hardware accelerated drivers it falls back on one of its software renderers.
(eg. classic swrast, softpipe or llvmpipe)
@@ -276,60 +274,57 @@ If your DRI-based driver isn't working, go to the
</p>
<h2>3.2 I'm seeing errors in depth (Z) buffering. Why?</h2>
<h3>3.2 I'm seeing errors in depth (Z) buffering. Why?</h3>
<p>
Make sure the ratio of the far to near clipping planes isn't too great.
Look
<a href="https://www.opengl.org/resources/faq/technical/depthbuffer.htm#0040">here</a>
<a href="https://www.opengl.org/archives/resources/faq/technical/depthbuffer.htm#0040">here</a>
for details.
</p>
<p>
Mesa uses a 16-bit depth buffer by default which is smaller and faster
to clear than a 32-bit buffer but not as accurate.
If you need a deeper you can modify the parameters to
<code> glXChooseVisual</code> in your code.
<code>glXChooseVisual</code> in your code.
</p>
<h2>3.3 Why Isn't depth buffering working at all?</h2>
<h3>3.3 Why Isn't depth buffering working at all?</h3>
<p>
Be sure you're requesting a depth buffered-visual. If you set the MESA_DEBUG
environment variable it will warn you about trying to enable depth testing
when you don't have a depth buffer.
Be sure you're requesting a depth buffered-visual. If you set the
<code>MESA_DEBUG</code> environment variable it will warn you about trying
to enable depth testing when you don't have a depth buffer.
</p>
<p>Specifically, make sure <code>glutInitDisplayMode</code> is being called
with <code>GLUT_DEPTH</code> or <code>glXChooseVisual</code> is being
called with a non-zero value for GLX_DEPTH_SIZE.
called with a non-zero value for <code>GLX_DEPTH_SIZE</code>.
</p>
<p>This discussion applies to stencil buffers, accumulation buffers and
alpha channels too.
</p>
<h2>3.4 Why does glGetString() always return NULL?</h2>
<h3>3.4 Why does <code>glGetString()</code> always return <code>NULL</code>?</h3>
<p>
Be sure you have an active/current OpenGL rendering context before
calling glGetString.
calling <code>glGetString</code>.
</p>
<h2>3.5 GL_POINTS and GL_LINES don't touch the right pixels</h2>
<h3>3.5 <code>GL_POINTS</code> and <code>GL_LINES</code> don't touch the
right pixels</h3>
<p>
If you're trying to draw a filled region by using GL_POINTS or GL_LINES
and seeing holes or gaps it's because of a float-to-int rounding problem.
But this is not a bug.
See Appendix H of the OpenGL Programming Guide - "OpenGL Correctness Tips".
Basically, applying a translation of (0.375, 0.375, 0.0) to your coordinates
will fix the problem.
If you're trying to draw a filled region by using <code>GL_POINTS</code> or
<code>GL_LINES</code> and seeing holes or gaps it's because of a float-to-int
rounding problem. But this is not a bug. See Appendix H of the OpenGL
Programming Guide - "OpenGL Correctness Tips". Basically, applying a
translation of (0.375, 0.375, 0.0) to your coordinates will fix the problem.
</p>
<br>
<br>
<h2 id="part4">4. Developer Questions</h2>
<h1 id="part4">4. Developer Questions</h1>
<h2>4.1 How can I contribute?</h2>
<h3>4.1 How can I contribute?</h3>
<p>
First, join the <a href="lists.html">mesa-dev mailing list</a>.
That's where Mesa development is discussed.
@@ -343,7 +338,7 @@ You should read it.
extensions, writing hardware drivers (for the DRI), and code optimization.
</p>
<h2>4.2 How do I write a new device driver?</h2>
<h3>4.2 How do I write a new device driver?</h3>
<p>
Unfortunately, writing a device driver isn't easy.
It requires detailed understanding of OpenGL, the Mesa code, and your
@@ -367,7 +362,8 @@ the archives) is a good way to get information.
</p>
<h2>4.3 Why isn't GL_EXT_texture_compression_s3tc implemented in Mesa?</h2>
<h3>4.3 Why isn't <code>GL_EXT_texture_compression_s3tc</code> implemented in
Mesa?</h3>
<p>
Oh but it is! Prior to 2nd October 2017, the Mesa project did not include s3tc
support due to intellectual property (IP) and/or patent issues around the s3tc

View File

@@ -127,14 +127,14 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, radeonsi, virgl
- Enhanced per-sample shading DONE ()
- Interpolation functions DONE (softpipe)
- New overload resolution rules DONE (softpipe)
GL_ARB_gpu_shader_fp64 DONE (i965/gen7+, llvmpipe, softpipe)
GL_ARB_gpu_shader_fp64 DONE (i965/gen7+, llvmpipe, softpipe, swr)
GL_ARB_sample_shading DONE (freedreno/a6xx, i965/gen6+, nv50)
GL_ARB_shader_subroutine DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_tessellation_shader DONE (i965/gen7+)
GL_ARB_texture_buffer_object_rgb32 DONE (freedreno, i965/gen6+, llvmpipe, softpipe, swr)
GL_ARB_texture_cube_map_array DONE (i965/gen6+, nv50, llvmpipe, softpipe)
GL_ARB_texture_cube_map_array DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_texture_gather DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_texture_query_lod DONE (freedreno, i965, nv50, llvmpipe, softpipe)
GL_ARB_texture_query_lod DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_transform_feedback2 DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
GL_ARB_transform_feedback3 DONE (i965/gen7+, llvmpipe, softpipe, swr)
@@ -145,15 +145,15 @@ GL 4.1, GLSL 4.10 --- all DONE: i965/gen7+, nvc0, r600, radeonsi, virgl
GL_ARB_get_program_binary DONE (0 or 1 binary formats)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_shader_precision DONE (i965/gen7+, all drivers that support GLSL 4.10)
GL_ARB_vertex_attrib_64bit DONE (i965/gen7+, llvmpipe, softpipe)
GL_ARB_viewport_array DONE (i965, nv50, llvmpipe, softpipe)
GL_ARB_vertex_attrib_64bit DONE (i965/gen7+, llvmpipe, softpipe, swr)
GL_ARB_viewport_array DONE (i965, nv50, llvmpipe, softpipe, swr)
GL 4.2, GLSL 4.20 -- all DONE: i965/gen7+, nvc0, r600, radeonsi, virgl
GL_ARB_texture_compression_bptc DONE (freedreno, i965)
GL_ARB_compressed_texture_pixel_storage DONE (all drivers)
GL_ARB_shader_atomic_counters DONE (freedreno/a5xx+, i965, softpipe)
GL_ARB_shader_atomic_counters DONE (freedreno/a5xx+, i965, llvmpipe, softpipe)
GL_ARB_texture_storage DONE (all drivers)
GL_ARB_transform_feedback_instanced DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_base_instance DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
@@ -171,7 +171,7 @@ GL 4.3, GLSL 4.30 -- all DONE: i965/gen8+, nvc0, r600, radeonsi, virgl
GL_ARB_ES3_compatibility DONE (all drivers that support GLSL 3.30)
GL_ARB_clear_buffer_object DONE (all drivers)
GL_ARB_compute_shader DONE (freedreno/a5xx+, i965, softpipe)
GL_ARB_copy_image DONE (i965, nv50, softpipe, llvmpipe)
GL_ARB_copy_image DONE (i965, nv50, softpipe, llvmpipe, swr)
GL_KHR_debug DONE (all drivers)
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_fragment_layer_viewport DONE (i965, nv50, llvmpipe, softpipe)
@@ -182,9 +182,9 @@ GL 4.3, GLSL 4.30 -- all DONE: i965/gen8+, nvc0, r600, radeonsi, virgl
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_robust_buffer_access_behavior DONE (i965)
GL_ARB_shader_image_size DONE (freedreno/a5xx+, i965, softpipe)
GL_ARB_shader_storage_buffer_object DONE (freedreno/a5xx+, i965, softpipe)
GL_ARB_shader_storage_buffer_object DONE (freedreno/a5xx+, i965, llvmpipe, softpipe)
GL_ARB_stencil_texturing DONE (freedreno, i965/hsw+, nv50, llvmpipe, softpipe, swr)
GL_ARB_texture_buffer_range DONE (freedreno, nv50, i965, softpipe, llvmpipe)
GL_ARB_texture_buffer_range DONE (freedreno, nv50, i965, softpipe, llvmpipe, swr)
GL_ARB_texture_query_levels DONE (all drivers that support GLSL 1.30)
GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample)
GL_ARB_texture_view DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
@@ -209,17 +209,17 @@ GL 4.4, GLSL 4.40 -- all DONE: i965/gen8+, nvc0, r600, radeonsi
GL_ARB_texture_stencil8 DONE (freedreno, i965/hsw+, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_vertex_type_10f_11f_11f_rev DONE (i965, nv50, llvmpipe, softpipe, swr, virgl)
GL 4.5, GLSL 4.50 -- all DONE: nvc0, radeonsi
GL 4.5, GLSL 4.50 -- all DONE: nvc0, radeonsi, r600
GL_ARB_ES3_1_compatibility DONE (i965/hsw+, r600, softpipe, virgl)
GL_ARB_clip_control DONE (freedreno, i965, nv50, r600, llvmpipe, softpipe, swr)
GL_ARB_conditional_render_inverted DONE (freedreno, i965, nv50, r600, llvmpipe, softpipe, swr, virgl)
GL_ARB_cull_distance DONE (i965, nv50, r600, llvmpipe, softpipe, swr, virgl)
GL_ARB_derivative_control DONE (i965, nv50, r600, virgl)
GL_ARB_ES3_1_compatibility DONE (i965/hsw+, softpipe, virgl)
GL_ARB_clip_control DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
GL_ARB_conditional_render_inverted DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_cull_distance DONE (i965, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_derivative_control DONE (i965, nv50, softpipe, virgl)
GL_ARB_direct_state_access DONE (all drivers)
GL_ARB_get_texture_sub_image DONE (all drivers)
GL_ARB_shader_texture_image_samples DONE (i965, nv50, r600, virgl)
GL_ARB_texture_barrier DONE (freedreno, i965, nv50, r600, virgl)
GL_ARB_shader_texture_image_samples DONE (i965, nv50, virgl)
GL_ARB_texture_barrier DONE (freedreno, i965, nv50, virgl)
GL_KHR_context_flush_control DONE (all - but needs GLX/EGL extension to be useful)
GL_KHR_robustness DONE (freedreno, i965)
GL_EXT_shader_integer_mix DONE (all drivers that support GLSL)
@@ -230,7 +230,7 @@ GL 4.6, GLSL 4.60
GL_ARB_indirect_parameters DONE (i965/gen7+, nvc0, radeonsi, virgl)
GL_ARB_pipeline_statistics_query DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
GL_ARB_polygon_offset_clamp DONE (freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, swr, virgl)
GL_ARB_shader_atomic_counter_ops DONE (freedreno/a5xx+, i965/gen7+, nvc0, r600, radeonsi, softpipe, virgl)
GL_ARB_shader_atomic_counter_ops DONE (freedreno/a5xx+, i965/gen7+, nvc0, r600, radeonsi, llvmpipe, softpipe, virgl)
GL_ARB_shader_draw_parameters DONE (i965, nvc0, radeonsi)
GL_ARB_shader_group_vote DONE (i965, nvc0, radeonsi)
GL_ARB_spirv_extensions in progress (Nicolai Hähnle, Ian Romanick)
@@ -249,10 +249,10 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, r600, radeonsi, virgl
GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL)
GL_ARB_framebuffer_no_attachments DONE (freedreno, i965/gen7+, softpipe)
GL_ARB_program_interface_query DONE (all drivers)
GL_ARB_shader_atomic_counters DONE (freedreno/a5xx+, i965/gen7+, softpipe)
GL_ARB_shader_atomic_counters DONE (freedreno/a5xx+, i965/gen7+, llvmpipe, softpipe)
GL_ARB_shader_image_load_store DONE (freedreno/a5xx+, i965/gen7+, softpipe)
GL_ARB_shader_image_size DONE (freedreno/a5xx+, i965/gen7+, softpipe)
GL_ARB_shader_storage_buffer_object DONE (freedreno/a5xx+, i965/gen7+, softpipe)
GL_ARB_shader_storage_buffer_object DONE (freedreno/a5xx+, i965/gen7+, llvmpipe, softpipe)
GL_ARB_shading_language_packing DONE (all drivers)
GL_ARB_separate_shader_objects DONE (all drivers)
GL_ARB_stencil_texturing DONE (freedreno, nv50, llvmpipe, softpipe, swr)
@@ -303,10 +303,10 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
GL_ARB_fragment_shader_interlock DONE (i965)
GL_ARB_gpu_shader_int64 DONE (i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe)
GL_ARB_parallel_shader_compile DONE (all drivers)
GL_ARB_post_depth_coverage DONE (i965, nvc0)
GL_ARB_post_depth_coverage DONE (i965, nvc0, radeonsi)
GL_ARB_robustness_isolation not started
GL_ARB_sample_locations DONE (nvc0)
GL_ARB_seamless_cubemap_per_texture DONE (freedreno, i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
GL_ARB_seamless_cubemap_per_texture DONE (etnaviv/SEAMLESS_CUBE_MAP, freedreno, i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
GL_ARB_shader_ballot DONE (i965/gen8+, nvc0, radeonsi)
GL_ARB_shader_clock DONE (i965/gen7+, nv50, nvc0, r600, radeonsi, virgl)
GL_ARB_shader_stencil_export DONE (i965/gen9+, r600, radeonsi, softpipe, llvmpipe, swr, virgl)

View File

@@ -8,13 +8,13 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Help Wanted / To-Do List</h1>
<h1>Help Wanted</h1>
<p>
We can always use more help with the Mesa project.
@@ -29,11 +29,11 @@ immediately checked into git because not enough people are testing them.
Just applying patches, testing and reporting back is helpful.
<li>
<b>Driver debugging.</b>
There are plenty of open bugs in the <a href="https://bugs.freedesktop.org/describecomponents.cgi?product=Mesa">bug database</a>.
There are plenty of open bugs in the <a href="https://gitlab.freedesktop.org/mesa/mesa/issues">bug database</a>.
<li>
<b>Remove aliasing warnings.</b>
Enable gcc -Wstrict-aliasing=2 -fstrict-aliasing and track down aliasing
issues in the code.
Enable gcc's <code>-Wstrict-aliasing=2 -fstrict-aliasing</code> arguments, and
track down aliasing issues in the code.
<li>
<b>Contribute more tests to
<a href="https://piglit.freedesktop.org/">Piglit</a>.</b>
@@ -48,7 +48,8 @@ You can find some further To-do lists here:
</p>
<ul>
<li><a href="https://gitlab.freedesktop.org/mesa/mesa/blob/master/docs/features.txt">
<b>features.txt</b></a> - Status of OpenGL 3.x / 4.x features in Mesa.</li>
<code>features.txt</code></a> - Status of OpenGL 3.x / 4.x features in
Mesa.</li>
</ul>
<p>
@@ -56,9 +57,9 @@ You can find some further To-do lists here:
</p>
<ul>
<li><a href="https://dri.freedesktop.org/wiki/R600ToDo">
<b>r600g</b></a> - Driver for ATI/AMD R600 - Northern Island.</li>
<code>r600g</code></a> - Driver for ATI/AMD R600 - Northern Island.</li>
<li><a href="https://dri.freedesktop.org/wiki/R300ToDo">
<b>r300g</b></a> - Driver for ATI R300 - R500.</li>
<code>r300g</code></a> - Driver for ATI R300 - R500.</li>
</ul>
<p>

View File

@@ -8,13 +8,80 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>News</h1>
<h2>August 7, 2019</h2>
<p>
<a href="relnotes/19.1.4.html">Mesa 19.1.4</a> is released.
This is a bug-fix release.
</p>
<h2>July 23, 2019</h2>
<p>
<a href="relnotes/19.1.3.html">Mesa 19.1.3</a> is released.
This is a bug-fix release.
</p>
<h2>July 9, 2019</h2>
<p>
<a href="relnotes/19.1.2.html">Mesa 19.1.2</a> is released.
This is a bug-fix release.
</p>
<h2>June 26, 2019</h2>
<p>
<a href="relnotes/19.0.8.html">Mesa 19.0.8</a> is released.
This is an emergency bug fix release. Users of 19.0.7 should updated to 19.0.8
or 19.1.1 immediately.
</p>
<h2>June 25, 2019</h2>
<p>
<a href="relnotes/19.1.1.html">Mesa 19.1.1</a> is released.
This is a bug-fix release.
</p>
<h2>June 24, 2019</h2>
<p>
<a href="relnotes/19.0.7.html">Mesa 19.0.7</a> is released.
This is a bug-fix release.
</p>
<p>
NOTE: It is anticipated that 19.0.7 will be the final release in the
19.0 series. Users of 19.0 are encouraged to migrate to the 19.1
series in order to obtain future fixes.
</p>
<h2>June 11, 2019</h2>
<p>
<a href="relnotes/19.1.0.html">Mesa 19.1.0</a> is released.
This is a new development release. See the release notes for more
information about this release
</p>
<h2>June 5, 2019</h2>
<p>
<a href="relnotes/19.0.6.html">Mesa 19.0.6</a> is released.
This is a bug-fix release.
</p>
<h2>May 21, 2019</h2>
<p>
<a href="relnotes/19.0.5.html">Mesa 19.0.5</a> is released.
This is a bug-fix release.
</p>
<h2>May 9, 2019</h2>
<p>
<a href="relnotes/19.0.4.html">Mesa 19.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>April 24, 2019</h2>
<p>
<a href="relnotes/19.0.3.html">Mesa 19.0.3</a> is released.
@@ -31,7 +98,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/18.3.6.html">Mesa 18.3.6</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 18.3.6 will be the final release in the
18.3 series. Users of 18.3 are encouraged to migrate to the 19.0
series in order to obtain future fixes.
@@ -78,7 +146,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/18.2.8.html">Mesa 18.2.8</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 18.2.8 will be the final release in the
18.2 series. Users of 18.2 are encouraged to migrate to the 18.3
series in order to obtain future fixes.
@@ -137,7 +206,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/18.1.9.html">Mesa 18.1.9</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 18.1.9 will be the final release in the
18.1 series. Users of 18.1 are encouraged to migrate to the 18.2
series in order to obtain future fixes.
@@ -199,7 +269,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/18.0.5.html">Mesa 18.0.5</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 18.0.5 will be the final release in the
18.0 series. Users of 18.0 are encouraged to migrate to the 18.1
series in order to obtain future fixes.
@@ -246,7 +317,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/17.3.9.html">Mesa 17.3.9</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 17.3.9 will be the final release in the
17.3 series. Users of 17.3 are encouraged to migrate to the 18.0
series in order to obtain future fixes.
@@ -305,7 +377,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/17.2.8.html">Mesa 17.2.8</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 17.2.8 will be the final release in the
17.2 series. Users of 17.2 are encouraged to migrate to the 17.3
series in order to obtain future fixes.
@@ -364,7 +437,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/17.1.10.html">Mesa 17.1.10</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 17.1.10 will be the final release in the
17.1 series. Users of 17.1 are encouraged to migrate to the 17.2
series in order to obtain future fixes.
@@ -435,7 +509,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/17.0.7.html">Mesa 17.0.7</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 17.0.7 will be the final release in the 17.0
series. Users of 17.0 are encouraged to migrate to the 17.1 series in order
to obtain future fixes.
@@ -484,7 +559,8 @@ This is a bug-fix release.
<a href="relnotes/17.0.2.html">Mesa 17.0.2</a> are released.
These are bug-fix releases from the 13.0 and 17.0 branches, respectively.
<br>
</p>
<p>
NOTE: It is anticipated that 13.0.6 will be the final release in the 13.0
series. Users of 13.0 are encouraged to migrate to the 17.0 series in order
to obtain future fixes.
@@ -519,7 +595,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/12.0.6.html">Mesa 12.0.6</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: This is an extra release for the 12.0 stable branch, as per developers'
feedback. It is anticipated that 12.0.6 will be the final release in the 12.0
series. Users of 12.0 are encouraged to migrate to the 13.0 series in order
@@ -536,7 +613,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/12.0.5.html">Mesa 12.0.5</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 12.0.5 will be the final release in the 12.0
series. Users of 12.0 are encouraged to migrate to the 13.0 series in order
to obtain future fixes.
@@ -598,7 +676,8 @@ about the release.
<a href="relnotes/11.2.2.html">Mesa 11.2.2</a> are released.
These are bug-fix releases from the 11.1 and 11.2 branches, respectively.
<br>
</p>
<p>
NOTE: It is anticipated that 11.1.4 will be the final release in the 11.1.4
series. Users of 11.1 are encouraged to migrate to the 11.2 series in order
to obtain future fixes.
@@ -629,7 +708,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/11.0.9.html">Mesa 11.0.9</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 11.0.9 will be the final release in the 11.0
series. Users of 11.0 are encouraged to migrate to the 11.1 series in order
to obtain future fixes.
@@ -693,7 +773,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/10.6.9.html">Mesa 10.6.9</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 10.6.9 will be the final release in the 10.6
series. Users of 10.6 are encouraged to migrate to the 11.0 series in order
to obtain future fixes.
@@ -764,7 +845,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/10.5.9.html">Mesa 10.5.9</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: It is anticipated that 10.5.9 will be the final release in the 10.5
series. Users of 10.5 are encouraged to migrate to the 10.6 series in order
to obtain future fixes.
@@ -874,7 +956,8 @@ This is a bug-fix release.
and <a href="relnotes/10.4.2.html">Mesa 10.4.2</a> are released.
These are bug-fix releases from the 10.3 and 10.4 branches, respectively.
<br>
</p>
<p>
NOTE: It is anticipated that 10.3.7 will be the final release in the 10.3
series. Users of 10.3 are encouraged to migrate to the 10.4 series in order
to obtain future fixes.
@@ -925,7 +1008,8 @@ This is a bug-fix release.
and <a href="relnotes/10.3.1.html">Mesa 10.3.1</a> are released.
These are bug-fix releases from the 10.2 and 10.3 branches, respectively.
<br>
</p>
<p>
NOTE: It is anticipated that 10.2.9 will be the final release in the 10.2
series. Users of 10.2 are encouraged to migrate to the 10.3 series in order
to obtain future fixes.
@@ -1037,7 +1121,8 @@ This is a bug-fix release.
<p>
<a href="relnotes/10.0.5.html">Mesa 10.0.5</a> is released.
This is a bug-fix release.
<br>
</p>
<p>
NOTE: Since the 10.1.1 release is being released concurrently, it is
anticipated that 10.0.5 will be the final release in the 10.0
series. Users of 10.0 are encouraged to migrate to the 10.1 series in
@@ -1516,7 +1601,7 @@ with a new test that does over 130 tests of the
shading language and built-in functions.
</p>
<h2>April 2007</h2>
<h2>April 4, 2007</h2>
<p>
Thomas Hellstr&ouml;m of Tungsten Graphics has written a whitepaper
describing the new DRI memory management system.
@@ -1969,7 +2054,7 @@ Mesa 5.0.2 has been released. This is a stable, bug-fix release.
</pre>
<h2>June 2003</h2>
<h2>June 8, 2003</h2>
<p>
Mesa's directory tree has been overhauled.
@@ -2346,7 +2431,7 @@ Here's what's new:</p>
<h2>April 29, 2001</h2>
<p>New Mesa website</p>
<p>Mark Manning produced the new website.<br>Thanks, Mark!</p>
<p>Mark Manning produced the new website. Thanks, Mark!</p>
<h2>February 14, 2001</h2>
@@ -2465,8 +2550,9 @@ just bug fixes.</p>
</pre>
<p>Please report any problems with this release ASAP. Bugs should be filed on the
Mesa3D website at sourceforge.<br>
After 3.2 is wrapped up I hope to release 3.3 beta 1 soon afterward.</p>
Mesa3D website at sourceforge.
</p>
<p>After 3.2 is wrapped up I hope to release 3.3 beta 1 soon afterward.</p>
<p>-- Brian</p>
<h2>December 17, 1999</h2>
@@ -2511,21 +2597,27 @@ ftp, and CVS services aren't fully restored yet. Please be patient.</p>
<p>-Brian</p>
<h2>June 7, 1999</h2>
<p>RPMS of the nVidia RIVA server can be found at <code>ftp://ftp.mesa3d.org/mesa/misc/nVidia/</code>.</p>
<p>RPMS of the nVidia RIVA server can be found at
<a href="ftp://ftp.mesa3d.org/mesa/misc/nVidia/">
ftp://ftp.mesa3d.org/mesa/misc/nVidia/</a>.</p>
<h2>June 2, 1999</h2>
<p><a href="https://www.nvidia.com/">nVidia</a> has released some Linux binaries for
xfree86 3.3.3.1, along with the <b>full source</b>, which includes GLX acceleration
based on Mesa 3.0. They can be downloaded from <code>https://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html</code>.</p>
based on Mesa 3.0. They can be downloaded from
<a href="https://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html">
https://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html</a>.</p>
<h2>May 24, 1999</h2>
<p>Beta 2 of Mesa 3.1 has been make available at <code>ftp://ftp.mesa3d.org/mesa/beta/</code>.
If you are into the quake scene, you may want to try this out, as it contains some
optimizations specifically in the Q3A rendering path.
<p>Beta 2 of Mesa 3.1 has been make available at
<a href="ftp://ftp.mesa3d.org/mesa/beta/">ftp://ftp.mesa3d.org/mesa/beta/</a>. If you are into the
quake scene, you may want to try this out, as it contains some optimizations
specifically in the Q3A rendering path.
<h2>May 13, 1999</h2>
<p>For those interested in the integration of Mesa into XFree86 4.0, Precision Insight
has posted their lowlevel design documents at <code>http://www.precisioninsight.com</code>.</p>
has posted their lowlevel design documents at
<a href="http://www.precisioninsight.com">www.precisioninsight.com</a>.</p>
<h2>May 13, 1999</h2>
<pre>May 1999 - John Carmack of id Software, Inc. has made a donation of
@@ -2551,11 +2643,11 @@ grateful.
<h2>May 1, 1999</h2>
<p>John Carmack made an interesting .plan update yesterday:</p>
<blockquote>
<i>"I put together a document on optimizing OpenGL drivers for Q3 that
should be helpful to the various Linux 3D teams.</i><br>
http://www.quake3arena.com/news/glopt.html"
</blockquote>
<pre>
I put together a document on optimizing OpenGL drivers for Q3 that should be helpful to the various Linux 3D teams.
http://www.quake3arena.com/news/glopt.html
</pre>
<h2>April 7, 1999</h2>
<p>Updated the Mesa contributors section and added links to RPM Mesa packages.</p>
@@ -2565,7 +2657,8 @@ grateful.
<h2>February 16, 1999</h2>
<p><a href="https://www.sgi.com/">SGI</a> releases its
<a href="https://www.sgi.com/software/opensource/glx/">GLX source code</a>.</p>
<a href="http://web.archive.org/web/20040805154836/http://www.sgi.com/software/opensource/glx/download.html">GLX source code</a>.
</p>
<h2>January 22, 1999</h2>
<p><a href="https://www.mesa3d.org">www.mesa3d.org</a> established</p>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -31,13 +31,11 @@
</ol>
<h1 id="prereq-general">1. Prerequisites for building</h1>
<h2 id="prereq-general">1. Prerequisites for building</h2>
<h2>1.1 General</h2>
<h3>1.1 General</h3>
<p>
Build system.
</p>
<h4>Build system</h4>
<ul>
<li><a href="https://mesonbuild.com">meson</a> is required when building on *nix platforms.
@@ -51,6 +49,7 @@ is used when when building ARC.
</ul>
<h4>Compiler</h4>
<p>
The following compilers are known to work, if you know of others or you're
willing to maintain support for other compiler get in touch.
@@ -63,9 +62,8 @@ willing to maintain support for other compiler get in touch.
</ul>
<h4>Third party/extra tools.</h4>
<p>
Third party/extra tools.
<br>
<strong>Note</strong>: These should not be required, when building from a release tarball. If
you think you've spotted a bug let developers know by filing a
<a href="bugs.html">bug report</a>.
@@ -81,14 +79,14 @@ When building with meson 3.5 or newer is required.
Python Mako module is required. Version 0.8.0 or later should work.
</li>
<li>lex / yacc - for building the Mesa IR and GLSL compiler.
<div>
<p>
On Linux systems, flex and bison versions 2.5.35 and 2.4.1, respectively,
(or later) should work.
On Windows with MinGW, install flex and bison with:
<pre>mingw-get install msys-flex msys-bison</pre>
For MSVC on Windows, install
<a href="http://winflexbison.sourceforge.net/">Win flex-bison</a>.
</div>
</p>
</ul>
<p><strong>Note</strong>: Some versions can be buggy (eg. flex 2.6.2) so do try others if things fail.</p>
@@ -114,7 +112,7 @@ the packaging tool used by your distro.
... # others
</pre>
<h1 id="meson">2. Building with meson</h1>
<h2 id="meson">2. Building with meson</h2>
<p>
Meson is the latest build system in mesa, it is currently able to build for
@@ -134,7 +132,7 @@ Please read the <a href="meson.html">detailed meson instructions</a>
for more information
</p>
<h1 id="autoconf">3. Building with autoconf (Linux/Unix/X11)</h1>
<h2 id="autoconf">3. Building with autoconf (Linux/Unix/X11)</h2>
<p>
Autoconf support was removed in Mesa 19.1.0. Please use meson instead.
@@ -142,7 +140,7 @@ for more information
<h1 id="scons">4. Building with SCons (Windows/Linux)</h1>
<h2 id="scons">4. Building with SCons (Windows/Linux)</h2>
<p>
To build Mesa with SCons on Linux or Windows do
@@ -178,7 +176,7 @@ Additional information is available in <a href="README.WIN32">README.WIN32</a>.
<h1 id="android">5. Building with AOSP (Android)</h1>
<h2 id="android">5. Building with AOSP (Android)</h2>
<p>
Currently one can build Mesa for Android as part of the AOSP project, yet
@@ -197,7 +195,7 @@ Android-x86 and/or other resources.
</p>
<h1 id="libs">6. Library Information</h1>
<h2 id="libs">6. Library Information</h2>
<p>
When compilation has finished, look in the top-level <code>lib/</code>
@@ -214,9 +212,8 @@ lrwxrwxrwx 1 brian users 23 Mar 26 07:53 libOSMesa.so.6 -&gt; lib
</pre>
<p>
<b>libGL</b> is the main OpenGL library (i.e. Mesa).
<br>
<b>libOSMesa</b> is the OSMesa (Off-Screen) interface library.
<b>libGL</b> is the main OpenGL library (i.e. Mesa), while <b>libOSMesa</b>
is the OSMesa (Off-Screen) interface library.
</p>
<p>
@@ -235,7 +232,7 @@ versions of libGL and device drivers.
</p>
<h1 id="pkg-config">7. Building OpenGL programs with pkg-config</h1>
<h2 id="pkg-config">7. Building OpenGL programs with pkg-config</h2>
<p>
Running <code>ninja install</code> will install package configuration files
@@ -254,8 +251,6 @@ For example, compiling and linking a GLUT application can be done with:
gcc `pkg-config --cflags --libs glut` mydemo.c -o mydemo
</pre>
<br>
</div>
</body>
</html>

View File

@@ -2,13 +2,13 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Introduction</title>
<title>Introduction</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -50,7 +50,7 @@ systems.
<h1>Project History</h1>
<h2>Project History</h2>
<p>
The Mesa project was originally started by Brian Paul.
@@ -185,7 +185,7 @@ of the OpenGL, OpenGL ES and Vulkan specifications.
<h1>Major Versions</h1>
<h2>Major Versions</h2>
<p>
This is a summary of the major versions of Mesa.
@@ -194,7 +194,7 @@ of the OpenGL specification is implemented.
</p>
<h2>Version 12.x features</h2>
<h3>Version 12.x features</h3>
<p>
Version 12.x of Mesa implements the OpenGL 4.3 API, but not all drivers
support OpenGL 4.3.
@@ -204,21 +204,21 @@ Initial support for Vulkan is also included.
</p>
<h2>Version 11.x features</h2>
<h3>Version 11.x features</h3>
<p>
Version 11.x of Mesa implements the OpenGL 4.1 API, but not all drivers
support OpenGL 4.1.
</p>
<h2>Version 10.x features</h2>
<h3>Version 10.x features</h3>
<p>
Version 10.x of Mesa implements the OpenGL 3.3 API, but not all drivers
support OpenGL 3.3.
</p>
<h2>Version 9.x features</h2>
<h3>Version 9.x features</h3>
<p>
Version 9.x of Mesa implements the OpenGL 3.1 API.
While the driver for Intel Sandy Bridge and Ivy Bridge is the only
@@ -233,7 +233,7 @@ tracker for OpenCL.
</p>
<h2>Version 8.x features</h2>
<h3>Version 8.x features</h3>
<p>
Version 8.x of Mesa implements the OpenGL 3.0 API.
The developers at Intel deserve a lot of credit for implementing most
@@ -242,14 +242,14 @@ the i965 driver.
</p>
<h2>Version 7.x features</h2>
<h3>Version 7.x features</h3>
<p>
Version 7.x of Mesa implements the OpenGL 2.1 API. The main feature
of OpenGL 2.x is the OpenGL Shading Language.
</p>
<h2>Version 6.x features</h2>
<h3>Version 6.x features</h3>
<p>
Version 6.x of Mesa implements the OpenGL 1.5 API with the following
extensions incorporated as standard features:
@@ -289,7 +289,7 @@ OpenGL specification</a> for more details.
<h2>Version 5.x features</h2>
<h3>Version 5.x features</h3>
<p>
Version 5.x of Mesa implements the OpenGL 1.4 API with the following
extensions incorporated as standard features:
@@ -315,7 +315,7 @@ extensions incorporated as standard features:
</ul>
<h2>Version 4.x features</h2>
<h3>Version 4.x features</h3>
<p>
Version 4.x of Mesa implements the OpenGL 1.3 API with the following
@@ -334,7 +334,7 @@ extensions incorporated as standard features:
<li>GL_ARB_transpose_matrix
</ul>
<h2>Version 3.x features</h2>
<h3>Version 3.x features</h3>
<p>
Version 3.x of Mesa implements the OpenGL 1.2 API with the following
@@ -350,7 +350,7 @@ features:
</ul>
<h2>Version 2.x features</h2>
<h3>Version 2.x features</h3>
<p>
Version 2.x of Mesa implements the OpenGL 1.1 API with the following
features.

View File

@@ -2,19 +2,21 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>License / Copyright Information</title>
<title>License and Copyright</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Disclaimer</h1>
<h1>License and Copyright</h1>
<h2>Disclaimer</h2>
<p>
Mesa is a 3-D graphics library with an API which is very similar to
@@ -32,7 +34,7 @@ vendor.
<p>
Please do not refer to the library as <em>MesaGL</em> (for legal
reasons). It's just <em>Mesa</em> or <em>The Mesa 3-D graphics
library</em>. <br>
library</em>.
</p>
<p>
@@ -42,7 +44,7 @@ library</em>. <br>
<h1>License / Copyright Information</h1>
<h2>License / Copyright Information</h2>
<p>
The Mesa distribution consists of several components. Different copyrights
@@ -82,7 +84,7 @@ SOFTWARE.
</pre>
<h1>Attention, Contributors</h1>
<h2>Attention, Contributors</h2>
<p>
When contributing to the Mesa project you must agree to the licensing terms
@@ -92,7 +94,7 @@ and their respective licenses.
</p>
<h1>Mesa Component Licenses</h1>
<h2>Mesa Component Licenses</h2>
<pre>
Component Location License

View File

@@ -2,13 +2,13 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Mailing Lists</title>
<title>Mailing Lists</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -68,14 +68,14 @@ kernels, see the
</p>
<h1>IRC</h1>
<h2>IRC</h2>
<p>join <a href="irc://chat.freenode.net#dri-devel">#dri-devel channel</a>
on <a href="https://webchat.freenode.net/">irc.freenode.net</a>
</p>
<h1>OpenGL Forums</h1>
<h2>OpenGL Forums</h2>
<p>
Here are some other OpenGL-related forums you might find useful:

View File

@@ -2,19 +2,21 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>llvmpipe</title>
<title>Gallium LLVMpipe Driver</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Introduction</h1>
<h1>Gallium LLVMpipe Driver</h1>
<h2>Introduction</h2>
<p>
The Gallium llvmpipe driver is a software rasterizer that uses LLVM to
@@ -28,7 +30,7 @@ It's the fastest software rasterizer for Mesa.
</p>
<h1>Requirements</h1>
<h2>Requirements</h2>
<ul>
<li>
@@ -45,7 +47,7 @@ It's the fastest software rasterizer for Mesa.
built with LLVM version 4.0 or later.
</p>
<p>
See /proc/cpuinfo to know what your CPU supports.
See <code>/proc/cpuinfo</code> to know what your CPU supports.
</p>
</li>
<li>
@@ -71,8 +73,9 @@ It's the fastest software rasterizer for Mesa.
<p>
For Windows you will need to build LLVM from source with MSVC or MINGW
(either natively or through cross compilers) and CMake, and set the LLVM
environment variable to the directory you installed it to.
(either natively or through cross compilers) and CMake, and set the
<code>LLVM</code> environment variable to the directory you installed
it to.
LLVM will be statically linked, so when building on MSVC it needs to be
built with a matching CRT as Mesa, and you'll need to pass
@@ -101,8 +104,8 @@ It's the fastest software rasterizer for Mesa.
</table>
<p>
You can build only the x86 target by passing -DLLVM_TARGETS_TO_BUILD=X86
to cmake.
You can build only the x86 target by passing
<code>-DLLVM_TARGETS_TO_BUILD=X86</code> to cmake.
</p>
</li>
@@ -112,7 +115,7 @@ It's the fastest software rasterizer for Mesa.
</ul>
<h1>Building</h1>
<h2>Building</h2>
To build everything on Linux invoke scons as:
@@ -137,11 +140,12 @@ For Windows the procedure is similar except the target:
</pre>
<h1>Using</h1>
<h2>Using</h2>
<h2>Linux</h2>
<h3>Linux</h3>
<p>On Linux, building will create a drop-in alternative for libGL.so into</p>
<p>On Linux, building will create a drop-in alternative for
<code>libGL.so</code> into</p>
<pre>
build/foo/gallium/targets/libgl-xlib/libGL.so
@@ -151,13 +155,15 @@ or
lib/gallium/libGL.so
</pre>
<p>To use it set the LD_LIBRARY_PATH environment variable accordingly.</p>
<p>To use it set the <code>LD_LIBRARY_PATH</code> environment variable
accordingly.</p>
<p>For performance evaluation pass build=release to scons, and use the corresponding
lib directory without the "-debug" suffix.</p>
<p>For performance evaluation pass <code>build=release</code> to scons,
and use the corresponding lib directory without the <code>-debug</code>
suffix.</p>
<h2>Windows</h2>
<h3>Windows</h3>
<p>
On Windows, building will create
@@ -175,7 +181,9 @@ any OpenGL drivers):
</p>
<ul>
<li><p>copy build/windows-x86-debug/gallium/targets/libgl-gdi/opengl32.dll to C:\Windows\SysWOW64\mesadrv.dll</p></li>
<li><p>copy <code>build/windows-x86-debug/gallium/targets/libgl-gdi/opengl32.dll</code>
to <code>C:\Windows\SysWOW64\mesadrv.dll</code>
</p></li>
<li><p>load this registry settings:</p>
<pre>REGEDIT4
@@ -192,7 +200,7 @@ any OpenGL drivers):
</ul>
<h1>Profiling</h1>
<h2>Profiling</h2>
<p>
To profile llvmpipe you should build as
@@ -206,7 +214,7 @@ This will ensure that frame pointers are used both in C and JIT functions, and
that no tail call optimizations are done by gcc.
</p>
<h2>Linux perf integration</h2>
<h3>Linux perf integration</h3>
<p>
On Linux, it is possible to have symbol resolution of JIT code with <a href="https://perf.wiki.kernel.org/">Linux perf</a>:
@@ -218,27 +226,28 @@ On Linux, it is possible to have symbol resolution of JIT code with <a href="htt
</pre>
<p>
When run inside Linux perf, llvmpipe will create a /tmp/perf-XXXXX.map file with
symbol address table. It also dumps assembly code to /tmp/perf-XXXXX.map.asm,
which can be used by the bin/perf-annotate-jit.py script to produce disassembly of
the generated code annotated with the samples.
When run inside Linux perf, llvmpipe will create a
<code>/tmp/perf-XXXXX.map</code> file with symbol address table. It also
dumps assembly code to <code>/tmp/perf-XXXXX.map.asm</code>, which can be
used by the <code>bin/perf-annotate-jit.py</code> script to produce
disassembly of the generated code annotated with the samples.
</p>
<p>You can obtain a call graph via
<a href="https://github.com/jrfonseca/gprof2dot#linux-perf">Gprof2Dot</a>.</p>
<h1>Unit testing</h1>
<h2>Unit testing</h2>
<p>
Building will also create several unit tests in
build/linux-???-debug/gallium/drivers/llvmpipe:
<code>build/linux-???-debug/gallium/drivers/llvmpipe</code>:
</p>
<ul>
<li> lp_test_blend: blending
<li> lp_test_conv: SIMD vector conversion
<li> lp_test_format: pixel unpacking/packing
<li> <code>lp_test_blend</code>: blending
<li> <code>lp_test_conv</code>: SIMD vector conversion
<li> <code>lp_test_format</code>: pixel unpacking/packing
</ul>
<p>
@@ -250,29 +259,31 @@ for later analysis, e.g.:
</pre>
<h1>Development Notes</h1>
<h2>Development Notes</h2>
<ul>
<li>
When looking at this code for the first time, start in lp_state_fs.c, and
then skim through the lp_bld_* functions called there, and the comments
at the top of the lp_bld_*.c functions.
then skim through the <code>lp_bld_*</code> functions called there, and
the comments at the top of the <code>lp_bld_*.c</code> functions.
</li>
<li>
The driver-independent parts of the LLVM / Gallium code are found in
src/gallium/auxiliary/gallivm/. The filenames and function prefixes
need to be renamed from "lp_bld_" to something else though.
<code>src/gallium/auxiliary/gallivm/</code>. The filenames and function
prefixes need to be renamed from <code>lp_bld_</code> to something else
though.
</li>
<li>
We use LLVM-C bindings for now. They are not documented, but follow the C++
interfaces very closely, and appear to be complete enough for code
generation. See
<a href="https://npcontemplation.blogspot.com/2008/06/secret-of-llvm-c-bindings.html">
this stand-alone example</a>. See the llvm-c/Core.h file for reference.
this stand-alone example</a>. See the <code>llvm-c/Core.h</code> file for
reference.
</li>
</ul>
<h1 id="recommended_reading">Recommended Reading</h1>
<h2 id="recommended_reading">Recommended Reading</h2>
<ul>
<li>

View File

@@ -43,13 +43,8 @@ iframe {
.header {
background: url('gears.png') 15px no-repeat, black url('gears.png') right no-repeat;
padding: 2em;
display: flex;
padding: 1.75rem;
text-align: center;
}
.header h1 {
color: white;
font: x-large sans-serif;
margin: auto;
}

View File

@@ -2,19 +2,19 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Compilation and Installation using Meson</title>
<title>Compilation and Installation Using Meson</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Compilation and Installation using Meson</h1>
<h1>Compilation and Installation Using Meson</h1>
<ul>
<li><a href="#intro">Introduction</a></li>
@@ -44,7 +44,7 @@ or
sudo dnf install meson # Fedora
</pre>
<p><strong>Mesa requires Meson &gt;= 0.45.0 to build.</strong>
<p><strong>Mesa requires Meson &gt;= 0.46.0 to build.</strong>
Some older versions of meson do not check that they are too old and will error
out in odd ways.
@@ -254,6 +254,18 @@ Then configure meson:
</pre>
</dd>
<dd><p>
Meson &lt; 0.49 doesn't support native files, so to specify a custom
<code>llvm-config</code> you need to modify your <code>$PATH</code> (or
<code>%PATH%</code> on windows), which will be searched for
<code>llvm-config</code>, <code>llvm-config<i>$version</i></code>,
and <code>llvm-config-<i>$version</i></code>:
</p>
<pre>
PATH=/path/to/folder/with/llvm-config:$PATH meson build
</pre>
</dd>
<dd><p>
For selecting llvm-config for cross compiling a
<a href="https://mesonbuild.com/Cross-compilation.html#defining-the-environment">"cross file"</a>
@@ -275,13 +287,6 @@ should be used. It uses the same format as the native file above:
See the <a href="#cross-compilation">Cross Compilation</a> section for more information.
</dd>
<dd><p>
For older versions of meson <code>$PATH</code> (or <code>%PATH%</code> on
windows) will be searched for llvm-config (and llvm-config$version and
llvm-config-$version), you can override this environment variable to control
the search: <code>PATH=/path/with/llvm-config:$PATH meson build</code>.
</p></dd>
<dt><code>PKG_CONFIG_PATH</code></dt>
<dd><p>The
<code>pkg-config</code> utility is a hard requirement for configuring and

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -45,13 +45,13 @@ The OSMesa interface may be used with any of three software renderers:
There are several examples of OSMesa in the mesa/demos repository.
</p>
<h1>Building OSMesa</h1>
<h2>Building OSMesa</h2>
<p>
Configure and build Mesa with something like:
<pre>
meson builddir -Dosmesa=gallium -Dgallium-drivers=swrast -Ddri-drivers= -Dvulkan-drivers= -Dprefix=$PWD/builddir/install
meson builddir -Dosmesa=gallium -Dgallium-drivers=swrast -Ddri-drivers=[] -Dvulkan-drivers=[] -Dprefix=$PWD/builddir/install
ninja -C builddir install
</pre>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -55,10 +55,6 @@ Numbers higher than 8 see minimizing gains.
<li>pp_celshade - set to 1 to enable cell shading (a more complex color filter).
</ul>
<br>
<br>
</div>
</body>
</html>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -22,9 +22,8 @@ In general, precompiled Mesa libraries are not available.
<p>
Some Linux distributions closely follow the latest Mesa releases. On others one
has to use unofficial channels.
<br>
There are some general directions:
</p>
<p>There are some general directions:</p>
<ul>
<li>Debian/Ubuntu based distros - PPA: xorg-edgers, oibaf and padoka</li>
<li>Fedora - Corp: erp and che</li>

View File

@@ -2,42 +2,53 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Release calendar</title>
<title>Release Calendar</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Overview</h1>
<h1>Release Calendar</h1>
<h2>Overview</h2>
<p>
Mesa provides feature/development and stable releases.
</p>
<p>
The table below lists the date and release manager that is expected to do the
specific release.
<br>
Regular updates will ensure that the schedule for the current and the
next two feature releases are shown in the table.
<br>
In order to keep the whole releasing team up to date with the tools
used, best practices and other details, the member in charge of the
next feature release will be in constant rotation.
<br>
The way the release schedule works is
explained <a href="releasing.html#schedule" target="_parent">here</a>.
<br>
</p>
<p>
Regular updates will ensure that the schedule for the current and the next two
feature releases are shown in the table.
</p>
<p>
In order to keep the whole releasing team up to date with the tools used, best
practices and other details, the member in charge of the next feature release
will be in constant rotation.
</p>
<p>
The way the release schedule works is explained
<a href="releasing.html#schedule" target="_parent">here</a>.
</p
>
<p>
Take a look <a href="submittingpatches.html#criteria" target="_parent">here</a>
if you'd like to nominate a patch in the next stable release.
</p>
<h1 id="calendar">Calendar</h1>
<h2 id="calendar">Calendar</h2>
<table border="1">
@@ -49,48 +60,23 @@ if you'd like to nominate a patch in the next stable release.
<th>Notes</th>
</tr>
<tr>
<td rowspan="3">19.0</td>
<td>2019-05-07</td>
<td>19.0.4</td>
<td>Dylan Baker</td>
<td>
</tr>
<tr>
<td>2019-05-21</td>
<td>19.0.5</td>
<td>Dylan Baker</td>
<td>
</tr>
<tr>
<td>2019-06-04</td>
<td>19.0.6</td>
<td>Dylan Baker</td>
<td>Last planned 19.0.x release</td>
</tr>
<tr>
<td rowspan="4">19.1</td>
<td>2019-04-30</td>
<td>19.1.0-rc1</td>
<td rowspan="3">19.1</td>
<td>2019-08-20</td>
<td>19.1.5</td>
<td>Juan A. Suarez</td>
<td>
</tr>
<tr>
<td>2019-05-07</td>
<td>19.1.0-rc2</td>
<td>2019-09-03</td>
<td>19.1.6</td>
<td>Juan A. Suarez</td>
<td>
</tr>
<tr>
<td>2019-05-14</td>
<td>19.1.0-rc3</td>
<td>2019-09-17</td>
<td>19.1.7</td>
<td>Juan A. Suarez</td>
<td>
</tr>
<tr>
<td>2019-05-21</td>
<td>19.1.0-rc4</td>
<td>Juan A. Suarez</td>
<td>Last planned RC/Final release</td>
<td>Last planned 19.1.x release</td>
</tr>
<tr>
<td rowspan="4">19.2</td>

View File

@@ -2,20 +2,20 @@
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Releasing process</title>
<title>Releasing Process</title>
<link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Releasing process</h1>
<h1>Releasing Process</h1>
<ul>
<li><a href="#overview">Overview</a>
@@ -31,12 +31,14 @@
</ul>
<h1 id="overview">Overview</h1>
<h2 id="overview">Overview</h2>
<p>
This document uses the convention X.Y.Z for the release number with X.Y being
the stable branch name.
<br>
</p>
<p>
Mesa provides feature and bugfix releases. Former use zero as patch version (Z),
while the latter have a non-zero one.
</p>
@@ -52,12 +54,14 @@ For example:
</pre>
<h1 id="schedule">Release schedule</h1>
<h2 id="schedule">Release schedule</h2>
<p>
Releases should happen on Wednesdays. Delays can occur although those
should be kept to a minimum.
<br>
</p>
<p>
See our <a href="release-calendar.html" target="_parent">calendar</a>
for information about how the release schedule is planned, and the
date and other details for individual releases.
@@ -85,10 +89,14 @@ approximately 48 hours before the actual release.
<p>
Note: There is one or two releases overlap when changing branches. For example:
<br>
</p>
<p>
The final release from the 12.0 series Mesa 12.0.5 will be out around the same
time (or shortly after) 13.0.1 is out.
<br>
</p>
<p>
This also involves that, as a final release may be delayed due to the
need of additional candidates to solve some blocking regression(s),
the release manager might have to update
@@ -97,7 +105,7 @@ additional bug fix releases of the current stable branch.
</p>
<h1 id="pickntest">Cherry-picking and testing</h1>
<h2 id="pickntest">Cherry-picking and testing</h2>
<p>
Commits nominated for the active branch are picked as based on the
@@ -114,7 +122,7 @@ a casual search for terms such as regression, fix, broken and similar.
<p>
Maintainer is also responsible for testing in various possible permutations of
the autoconf and scons build.
the meson and scons build.
</p>
<h2>Cherry-picking and build/check testing</h2>
@@ -167,9 +175,8 @@ good contact point.
<p>
<strong>Note:</strong> If a patch in the current queue needs any additional
fix(es), then they should be squashed together.
<br>
The commit messages and the <code>cherry picked from</code> tags must be preserved.
fix(es), then they should be squashed together. The commit messages and the
&quot;<code>cherry picked from</code>&quot;-tags must be preserved.
</p>
<p>
@@ -223,7 +230,7 @@ system and making some every day's use until the release may be a good
idea too.
</p>
<h1 id="stagingbranch">Staging branch</h1>
<h2 id="stagingbranch">Staging branch</h2>
<p>
A live branch, which contains the currently merge/rejected patches is available
@@ -243,7 +250,7 @@ Notes:
</ul>
<h1 id="branch">Making a branchpoint</h1>
<h2 id="branch">Making a branchpoint</h2>
<p>
A branchpoint is made such that new development can continue in parallel to
@@ -252,9 +259,8 @@ stabilisation and bugfixing.
<p>
Note: Before doing a branch ensure that basic build and <code>meson test</code>
testing is done and there are little to-no issues.
<br>
Ideally all of those should be tackled already.
testing is done and there are little to-no issues. Ideally all of those should
be tackled already.
</p>
<p>
@@ -279,7 +285,7 @@ To setup the branchpoint:
<p>
Now go to
<a href="https://bugs.freedesktop.org/editversions.cgi?action=add&amp;product=Mesa" target="_parent">Bugzilla</a> and add the new Mesa version X.Y.
<a href="https://gitlab.freedesktop.org/mesa/mesa/-/milestones" target="_parent">gitlab</a> and add the new Mesa version X.Y.
</p>
<p>
@@ -293,14 +299,15 @@ Proceed to <a href="#release">release</a> -rc1.
</p>
<h1 id="prerelease">Pre-release announcement</h1>
<h2 id="prerelease">Pre-release announcement</h2>
<p>
It comes shortly after outstanding patches in the respective branch are pushed.
Developers can check, in brief, what's the status of their patches. They,
alongside very early testers, are strongly encouraged to test the branch and
report any regressions.
<br>
</p>
<p>
It is followed by a brief period (normally 24 or 48 hours) before the actual
release is made.
</p>
@@ -330,10 +337,8 @@ Barring reported regressions or objections from developers.
<p>
Patch does not fit the
<a href="submittingpatches.html#criteria" target="_parent">criteria</a> and
is followed by a brief information.
<br>
The release maintainer is human so if you believe you've spotted a mistake do
let them know.
is followed by a brief information. The release maintainer is human so if you
believe you've spotted a mistake do let them know.
</p>
<h2>Format/template</h2>
@@ -445,7 +450,7 @@ Reason: The patch was reverted shortly after it was merged.
</pre>
<h1 id="release">Making a new release</h1>
<h2 id="release">Making a new release</h2>
<p>
These are the instructions for making a new Mesa release.
@@ -507,7 +512,7 @@ So we do a quick 'touch test'
unset LIBGL_DEBUG
unset LIBGL_ALWAYS_SOFTWARE
unset GALLIUM_DRIVER
export VK_ICD_FILENAMES=`pwd`/src/intel/vulkan/dev_icd.json
export VK_ICD_FILENAMES=`pwd`/test/usr/local/share/vulkan/icd.d/intel_icd.x86_64.json
steam steam://rungameid/570 -vconsole -vulkan
unset VK_ICD_FILENAMES
</pre>
@@ -599,7 +604,7 @@ docs/release-calendar.html. Then commit and push:
</pre>
<h1 id="announce">Announce the release</h1>
<h2 id="announce">Announce the release</h2>
<p>
Use the generated template during the releasing process.
@@ -611,7 +616,7 @@ series, if that is the case.
</p>
<h1 id="website">Update the mesa3d.org website</h1>
<h2 id="website">Update the mesa3d.org website</h2>
<p>
As the hosting was moved to freedesktop, git hooks are deployed to update the
@@ -619,14 +624,12 @@ website. Manually check that it is updated 5-10 minutes after the final <code>gi
</p>
<h1 id="bugzilla">Update Bugzilla</h1>
<h2 id="bugzilla">Update Bugzilla</h2>
<p>
Parse through the bugreports as listed in the docs/relnotes/X.Y.Z.html
document.
<br>
If there's outstanding action, close the bug referencing the commit ID which
addresses the bug and mention the Mesa version that has the fix.
document. If there's outstanding action, close the bug referencing the commit
ID which addresses the bug and mention the Mesa version that has the fix.
</p>
<p>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="contents.html"></iframe>
@@ -21,6 +21,16 @@ The release notes summarize what's new or changed in each Mesa release.
</p>
<ul>
<li><a href="relnotes/19.1.4.html">19.1.4 release notes</a>
<li><a href="relnotes/19.1.3.html">19.1.3 release notes</a>
<li><a href="relnotes/19.1.2.html">19.1.2 release notes</a>
<li><a href="relnotes/19.0.8.html">19.0.8 release notes</a>
<li><a href="relnotes/19.1.1.html">19.1.1 release notes</a>
<li><a href="relnotes/19.0.7.html">19.0.7 release notes</a>
<li><a href="relnotes/19.1.0.html">19.1.0 release notes</a>
<li><a href="relnotes/19.0.6.html">19.0.6 release notes</a>
<li><a href="relnotes/19.0.5.html">19.0.5 release notes</a>
<li><a href="relnotes/19.0.4.html">19.0.4 release notes</a>
<li><a href="relnotes/19.0.3.html">19.0.3 release notes</a>
<li><a href="relnotes/19.0.2.html">19.0.2 release notes</a>
<li><a href="relnotes/18.3.6.html">18.3.6 release notes</a>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

View File

@@ -8,7 +8,7 @@
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
The Mesa 3D Graphics Library
</div>
<iframe src="../contents.html"></iframe>

Some files were not shown because too many files have changed in this diff Show More