94389 Commits

Author SHA1 Message Date
Nanley Chery
8e532aa028 anv/cmd_buffer: Disable CCS on gen7 color attachments upfront
The next patch enables the use of CCS_D even when the color attachment
will not be fast-cleared. Catch the gen7 case early to simplify the
changes required.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
9fd1f2aa3c anv/cmd_buffer: Ensure fast-clear values are current
v2: Rewrite functions, change location of synchronization.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
0b16600056 anv/gpu_memcpy: Add a lighter-weight GPU memcpy function
We'll be performing a GPU memcpy in more places to copy small amounts of
data. Add an alternate function that thrashes less state.

v2:
- Make a new function (Jason Ekstrand).
- Move the #define into the function.
v3:
- Update the function name (Jason).
- Update comments.
v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
dcff5ab9f1 anv/cmd_buffer: Restrict fast clears in the GENERAL layout
v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
v3: Allow some fast clears in the GENERAL layout.
v4: Remove extra '||' and adjust line break (Jason Ekstrand).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
9ffe87122b anv/cmd_buffer: Don't partially fast clear image layers
v2: Don't pass in the command buffer (Jason Ekstrand).
v3: Remove an incorrect assertion and an if condition for gen7.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
07cc2ec9db anv/cmd_buffer: Initialize the clear values buffer
v2: Rewrite functions.
v3 (Jason Ekstrand):
- Don't set ResourceMinLOD.
- Fix clamp of level_count.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
88200e87f6 anv/image: Append CCS/MCS with a fast-clear state buffer
v2: Update comments, function signatures, and add assertions.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
325ecffc62 anv/image: Disable CCS if the image doesn't support rendering
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
01db9a74c6 intel/isl: Add surface state clear value information
This will be used to load and store clear values from surface state
objects.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
b178e239dd anv: Transition MCS buffers from the undefined layout
v2: Define MCS buffers with any sample count (Jason)

Cc: <mesa-stable@lists.freedesktop.org>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-07-22 20:12:09 -07:00
Jason Ekstrand
f793c57cc5 intel/isl: Tighten up restrictions for CCS on gen7
It may technically be possible to enable some sort of fast-clear support
for at least the base slice of a 2D array texture on gen7.  However,
it's not documented to work, we've never tried to do it in GL, and we
have no idea what the hardware does if you turn on CCS_D with arrayed
rendering.  Let's just play it safe and disallow it for now.  If someone
really cares that much about gen7 performance, they can come along and
try to get it working later.
2017-07-22 20:12:07 -07:00
Chris Wilson
4aee05b6c6 i965/bufmgr: Add comments about GTT coherency issues.
(Patch written by Ken, but entirely comments written by Chris.)

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-22 19:34:48 -07:00
Kenneth Graunke
0044de931f i965: Drop non-LLC lunacy in the program cache code.
The non-LLC story was a horror show.  We uploaded data via pwrite
(drm_intel_bo_subdata), which would stall if the cache BO was in
use (being read) by the GPU.  Obviously, we wanted to avoid that.
So, we tried to detect whether the buffer was busy, and if so, we'd
allocate a new BO, map the old one read-only (hopefully not stalling),
copy all shaders compiled since the dawn of time to the new buffer,
upload our new one, toss the old BO, and let the state upload code
know that our program cache BO changed.  This was a lot of extra data
copying, and flagging BRW_NEW_PROGRAM_CACHE would also cause a new
STATE_BASE_ADDRESS to be emitted, stalling the entire pipeline.

Not only that, but our rudimentary busy tracking consisted of a flag
set at execbuf time, and not cleared until we threw out the program
cache BO.  So, the first shader upload after any drawing would hit this
"abandon the cache and start over" copying path.

This is largely unnecessary - it's just ancient and crufty code.  We can
use the same persistent mapping paths on all platforms.  On non-ancient
kernels, this will use a write combining map, which should be reasonably
fast.

One aspect that is worse: we do occasionally grow the program cache BO,
and copy the old contents to the newer BO.  This will suffer from UC
readback performance now.  To mitigate this, we use the MOVNTDQA based
streaming memcpy on platforms with SSE 4.1 (all Gen7+ atoms).  Gen4-5
are unfortunately going to be penalized.

v2: Add MOVNTDQA path, rebase on other map flag changes.
v3: Drop cache->bo_used_by_gpu too (caught by Chris Wilson).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
Kenneth Graunke
8bdbc0c5b9 i965: Set MAP_PERSISTENT on program cache buffers.
Chris Wilson pointed out that this mapping really is persistent.

Shouldn't actually have any effect today, but best to set it anyway.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
Kenneth Graunke
2e3d825982 i965: Correctly set MAP_WRITE when creating the LLC program cache map.
Using a read-only mapping is completely bogus - we use this mapping to
write all new shaders to the cache.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
Matt Turner
f37ede40ba i965/bufmgr: Use write-combine mappings where available
Write-combine mappings give much better performance on writes than
uncached access through the GTT.

Improves performance of GFXBench 4's gl_driver2 benchmark at 1024x768
on Apollolake by 3.6086% +/- 0.674193% (n=15).

v2: (by Ken) Rebase on lockless mappings, map_count deletion, valgrind
    updates, potential for CPU/WC maps failing, and other changes.

v3: (by Ken and Chris Wilson)

    (Ken): Rebase on set_domain -> gem_wait
    (Chris): Fix up a failed CPU/WC mmapping with a GTT mapping

    Not all objects will be mappable for direct access by the CPU
    (either using WC/CPU or WC paths), for example, a dmabuf wrapping an
    object on a foreign device or an object wrapping access to stolen
    memory. Since either the physical pages are not known or even do not
    exist, we need to use the mediated, indirect access via the GTT. (If
    one day, the kernel does suddenly start providing mediated access
    via a regular WB/WC mmapping, we no longer need the fallback.)

v4: Avoid falling back for MAP_RAW (Chris).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-22 19:34:42 -07:00
Kenneth Graunke
bdae2ddff8 i965/bufmgr: Skip wait ioctl when not busy.
If the buffer is idle, I915_GEM_WAIT will return immediately,
so we may as well skip the ioctl altogether.  We can't trust the
"idle" flag for external buffers, but for most, it should be fine.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
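
The early-out described in the commit above can be sketched as follows. This is a minimal Python model, not the driver's C code: the `BufferObject` class, its `idle` and `external` fields, and the `wait_ioctl` callback are hypothetical names chosen for illustration, and the real bookkeeping may differ.

```python
from dataclasses import dataclass


@dataclass
class BufferObject:
    """Hypothetical stand-in for a driver buffer object."""
    idle: bool = False      # userspace belief that the GPU is done with the buffer
    external: bool = False  # shared with another process, so `idle` may be stale


def wait_rendering(bo, wait_ioctl):
    """Wait for the GPU, skipping the kernel call when the buffer is known idle.

    Returns True if the (hypothetical) wait ioctl was actually issued.
    """
    # The idle flag cannot be trusted for external buffers, so always ask the kernel.
    if bo.idle and not bo.external:
        return False
    wait_ioctl(bo)
    bo.idle = True  # once the wait returns, the buffer is idle
    return True
```

In this model a second wait on the same private buffer costs no kernel call, while an external buffer always goes through the ioctl.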
Kenneth Graunke
38e2142f39 i965/bufmgr: Explicitly wait instead of using I915_GEM_SET_DOMAIN.
With the advent of asynchronous maps, domain tracking doesn't make a
whole lot of sense.  Buffers can be in use on both the CPU and GPU at
the same time.  In order to avoid blocking, we stopped using set_domain
for asynchronous mappings, which means that the kernel's tracking has
lies.  We can't properly track it in userspace either, as the kernel
can change domains on us spontaneously (for example, when un-swapping).

According to Chris Wilson, I915_GEM_SET_DOMAIN does the following:

1. pins the backing storage (acquiring pages outside of the
   struct_mutex)

2. waits either for read/write access, including inter-device waits

3. updates the domain, clflushing as required

4. marks the object as used (for swapping)

5. turns off FBC/PSR/fancy scanout caching

Item (1) is not terribly important.  Most BOs are recycled via the
BO cache, so they already have pages.  Regardless, we fixed this
via an initial set_domain in the previous patch.

We implement item (2) with I915_GEM_WAIT.  This has one downside:
we'll stall unnecessarily if we do a read-only mapping of a buffer
that the GPU is reading.  I believe this is pretty uncommon.  We
may want to extend the wait ioctl at some point.

Mesa already does item (3) itself.  For cache-coherent buffers (most on
LLC systems), we don't need to do any clflushing - the CPU and GPU views
are coherent.  For non-coherent buffers (most on non-LLC systems), we
currently only use the CPU for read-only maps, and we explicitly clflush
when necessary.

We don't care about item (4)...swapping has already killed performance.
Plus, with async maps, the kernel's domain tracking is already bogus,
so it can't do this accurately regardless.

Item (5) should be okay because we avoid cached maps of scanout buffers.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
Kenneth Graunke
eb1497e968 i965/bufmgr: Allocate BO pages outside of the kernel's locking.
Suggested by Chris Wilson.

v2: Set the write domain to 0 (suggested by Chris).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
Timothy Arceri
d91108b1f4 glsl: rework misleading block layout code
From the ARB_uniform_buffer_object spec:

   ""shared" uniform blocks, the default layout, ..."

This doesn't fix anything as the default layout is already applied
at this point but fixes the misleading code/comment.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-23 10:06:01 +10:00
Timothy Arceri
316b4c9ada glsl: remove placeholder comment
This was added in 2d03f48a65 and seems like it was intended
as a TODO comment in a function stub rather than a useful
code comment.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-23 10:06:01 +10:00
Brian Paul
b4debc0d69 st/mesa: use proper resource target type in st_AllocTextureStorage()
When we validate the texture sample count, pass the correct
pipe_texture_target for the texture, rather than PIPE_TEXTURE_2D.

Also add more comments about MSAA.

No piglit regressions with VMware driver.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-22 13:18:56 -06:00
Brian Paul
aeade86db5 mesa: remove pointless assignments in init_teximage_fields_ms()
The NumSamples and FixedSampleLocation fields are set again later at
the end of the function so these earlier assignments aren't needed.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-22 13:18:56 -06:00
Neha Bhende
1820ef64c9 svga: Limit number of immediates in shader
imm {128.0, -128.0, 2.0, 3.0} is used for lit instruction which
is not used very frequently. So allocate it only if lit instruction is used.

Tested with mtt piglit and mtt glretrace

v2: As per Charmaine's comment

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-22 13:18:56 -06:00
Charmaine Lee
83ca6b9d31 svga: fix constant indices for texcoord scale factors and texture buffer size
This patch fixes the ordering of the constant indices for texcoord scale
factor and texture buffer size to match the order they were added to the
constant buffer in svga_get_extra_constants_common().

Tested with MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-22 13:18:56 -06:00
Neha Bhende
acfb1583a5 svga: fix unnormalized->normalized texture coordinate conversion
Sometimes, converting unnormalized coordinates to normalized
coordinates requires an epsilon value to produce the right texels with
nearest filtering.  Adding 0.0001 to the coordinates when the min/mag
filter is nearest fixes the issue.
Fixes piglit test fbo-blit-scaled-linear

Tested with mtt-piglit, mtt-glretrace

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-22 13:18:56 -06:00
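
To illustrate why a small epsilon can matter with nearest filtering, here is a minimal Python sketch. It is an assumption-laden model, not the svga shader code: `nearest_texel` is a hypothetical helper, the arithmetic runs in double precision rather than the GPU's float format, and the width of 49 is picked only because it exposes the rounding problem in doubles. Only the 0.0001 value comes from the commit message.

```python
import math

NEAREST_EPSILON = 0.0001  # value quoted in the commit message


def nearest_texel(unnormalized, width, epsilon=0.0):
    """Normalize a coordinate, then pick the texel a nearest filter would select."""
    inv_width = 1.0 / width  # scale factor, as a shader constant would hold it
    normalized = (unnormalized + epsilon) * inv_width
    return math.floor(normalized * width)  # scale back up and truncate


WIDTH = 49  # (1.0 / 49) * 49 rounds to just below 1.0 in double precision
without_eps = nearest_texel(1.0, WIDTH)                   # lands on texel 0
with_eps = nearest_texel(1.0, WIDTH, NEAREST_EPSILON)     # lands on texel 1
```

Without the epsilon, the round trip through the normalized range falls just short of the texel boundary and selects the neighbouring texel.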
Brian Paul
dc62ddfb39 svga: only support 4x, 8x, 16x msaa
Skip 2x MSAA, for example, since it's seldom used and just bloats
the list of pixel formats.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-22 13:18:56 -06:00
Brian Paul
922dc27273 mesa: include texture size in error messages
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-22 13:18:56 -06:00
Kenneth Graunke
665fd10396 i965: Support the mesa_no_error driconf option.
This allows us to override contexts to use no_error functionality
even if the applications themselves do not.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 11:42:42 -07:00
Jason Ekstrand
20533e0da7 anv/blorp: Assert isl_surf_init success in do_buffer_copy
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 08:21:27 -07:00
Jason Ekstrand
cf39fb06e3 anv/blorp: Explicitly set row_pitch in do_buffer_copy
We have a very specific row pitch that we want and we don't want ISL to
be changing it on us so just be explicit about it.

Fixes: a40f043034
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 08:20:07 -07:00
Kenneth Graunke
fd199fe4a8 i965: Delete gen8_draw_upload.c
For some reason we left an empty file, rather than deleting it.
2017-07-22 00:42:51 -07:00
Karol Herbst
f98a221f2d nv50/ir: disable mul+add to mad for precise instructions
fixes
    misrendering in Tomb Raider
    KHR-GL44.gpu_shader5.precise_qualifier
    KHR-GL45.gpu_shader5.precise_qualifier

v4: disable opt only for MAD, it's fine for SAD

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-07-21 23:45:18 -04:00
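
The reason a precise expression should not have its multiply and add merged can be shown with a small numeric sketch. This Python model assumes the merged instruction behaves like a fused multiply-add with a single final rounding; whether a given GPU's MAD is fused depends on the hardware, and the function names here are hypothetical.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """Separate multiply and add: the product is rounded before the addition."""
    return a * b + c


def fused_mad(a, b, c):
    """Model of a fused multiply-add: exact arithmetic, one rounding at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

separate = mul_then_add(a, b, c)  # product 1 - 2**-60 rounds to 1.0, so the sum is 0.0
fused = fused_mad(a, b, c)        # keeps the -2**-60 term
```

The two results differ, so a shader that marks an expression as precise would observe a change if the compiler merged the operations.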
Karol Herbst
f9bfc93014 nv50/ir/tgsi: handle precise for most ALU instructions
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-07-21 23:45:18 -04:00
Karol Herbst
1d7c232fbd nv50/ir: add precise field to Instruction
v4: initialize field with NULL

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-07-21 23:45:18 -04:00
Karol Herbst
4ad9e2e17a st/glsl_to_tgsi: don't optimize mul+add to mad if expression is precise
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
c5cbb9a543 gallium/docs: add precise instruction modifier
v4: add comment about intermediate rounding step to MAD

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
4611343bcc tgsi/text: parse _PRECISE modifier
v2: use str_match_no_case to fix _SAT_PRECISE detection
v4: use is_digit_alpha_underscore to match end of mods

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
d0dfdf704d tgsi: populate precise
Only implemented for glsl->tgsi. Other converters just set precise to 0.

v2: remove precise parameter from ureg_tex_insn and ureg_memory_insn

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
28a5e7104e st/glsl_to_tgsi: handle precise modifier
All subexpressions inside an ir_assignment need to be tagged as precise.

v2: make precise handling more global inside the visitor

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
0341aea2f8 tgsi/dump: print _PRECISE modifier on Instructions
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
af22adee4f tgsi: add precise flag to tgsi_instruction
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-07-21 23:45:18 -04:00
Kenneth Graunke
30d6bc470a i965: Set lower_vote_trivial in vector_nir_options_gen6 too.
There's a second struct for Gen6+.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-21 18:09:01 -07:00
Dave Airlie
22bca8ef19 radv: reset non-syncobj semaphore context after wait.
When I ported from libdrm, I forgot to add the line to reset
the sem, we just need to reset the context.

This fixes a regression in DOOM.

Fixes: 9ac1432a57 ("radv: port to new libdrm API.")
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-22 00:03:26 +01:00
Charmaine Lee
5124bf9823 st/mesa: add destroy_drawable interface
With this patch, the st manager will maintain a hash table for
the active framebuffer interface objects. A destroy_drawable interface
is added to allow the state tracker to notify the st manager to remove
the associated framebuffer interface object from the hash table,
so the associated framebuffer and its resources can be deleted
at framebuffers purge time.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101829
Fixes: 147d7fb772 ("st/mesa: add a winsys buffers list in st_context")
Tested-by: Brad King <brad.king@kitware.com>
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-20 17:34:34 -07:00
Dylan Baker
59a141c95a radv: rebase radv_entrypoints_gen.py on anv_entrypoints_gen.py
The two generators forked from each other, and they remain basically the
same. This rebases the radv version on the anv version, but with the
radv changes ported over. The result is that we get rid of the "cat |"
madness and gain mako, correct "generated by" attributions, and write
files out directly.

The only differences between the outputs are whitespace and comments.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-07-21 14:27:02 -07:00
Topi Pohjolainen
bf24c3539e i965/miptree: Clean-up unused
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
f5859b45b1 i965/miptree: Switch remaining surfaces to isl
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
38ddb3bc60 i965/miptree: Drop miptree_array_layout in get_isl_dim_layout()
This was only needed for checking gen6 stencil which is already
using isl. One could delete GEN6_HIZ_STENCIL layout altogether
but that will be gone with the rest after a while anyway.

The dim_layout converter is needed even after transition to isl
when setting up surface states - see brw_emit_surface_state().
Hence dropping the unneeded argument separately.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
61c95c94a0 i965/miptree: Relax size alignment for linear surfaces
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00