Compare commits

...

578 Commits

Author SHA1 Message Date
Emil Velikov
29df4deef2 Update version to 17.2.0-rc3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-07 12:45:40 +01:00
Jason Ekstrand
e4371d14f1 anv: Stop advertising VK_KHX_multiview
We don't want to advertise experimental extensions in actual releases.
However, there's no harm in leaving the code lying around in the tree.
2017-08-05 00:09:26 +01:00
Tomasz Figa
0b2c034f64 st/dri: enable 32-bit RGBX/RGBA formats only on Android
X/GLX can't handle them. This removes almost 500 GLX visuals that were
incorrectly exposed.

This is a less invasive version of Marek's .getCapability series.
Note: the patch is not applicable for master, but only for the 17.2
branch.

Suggested-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f33d8af7aa "st/dri: add 32-bit RGBX/RGBA formats"
[Emil Velikov: commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-08-05 00:09:26 +01:00
Tim Rowley
f4e163094d swr/rast: fix scons gen_knobs.h dependency
Copy/paste error was duplicating a gen_knobs.cpp rule.

Fixes: 5079c277b5 ("swr: [scons] Fix windows build")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit e4a6ae06cf)
2017-08-05 00:09:26 +01:00
Timothy Arceri
4a181e6244 mesa/st: fix conditional jump depends on uninitialised value
Reported by valgrind at:
glsl_to_tgsi_visitor::visit(ir_expression*) (st_glsl_to_tgsi.cpp:1560)

When compiling the Deus Ex shaders.

Fixes: 28a5e7104 ("st/glsl_to_tgsi: handle precise modifier")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 06237fc9e1)
2017-08-05 00:09:26 +01:00
Scott D Phillips
8ef9fe7229 gles: Restore some lost typedefs
GLES/gl.h has historically provided some typedefs that are not
used in the API itself. Restore these typedefs that were lost to
avoid breaking applications.

These seem to be the only typedefs removed in the update.

Fixes: 7fd0817 "Update Khronos-supplied headers"

[Eric: added a big warning to revert this patch when pulling the updated header]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 3db05ed1d1)
2017-08-05 00:09:26 +01:00
Dave Airlie
4f872e62c2 Revert "st_glsl_to_tgsi: rewrite rename registers to use array fully."
This reverts commit 3008161d28,
which caused a regression for VMWare.

The initial code had some recursion in it, that I removed by accident
trying to add back the recursion broke lots of things, take the high
road and revert for now.

Fixes: 3008161d (st_glsl_to_tgsi: rewrite rename registers to use array fully.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b8bea9a050)
2017-08-05 00:09:26 +01:00
Dave Airlie
c5fac38ced radv: handle 10-bit format clamping workaround.
This fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.*
for a2r10g10b10 formats as destination on SI/CIK hardware.

This adds support to the meta program for emitting 10-bit
outputs, and adds 10-bit support to the fragment shader key.

It also only does the int8/10 on SI/CIK.

Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit df61a05019)
2017-08-05 00:09:26 +01:00
Bas Nieuwenhuizen
579ecfd91e radv: Don't underflow non-visible VRAM size.
In some APU situations the reported visible size can be larger than
VRAM size. This properly clamps the value.

Surprisingly both CTS and spec seem to allow a heap type with size 0,
so this seemed like the easiest option to me.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 4ae84efbc5 "radv: Use enum for memory heaps."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 8229706ad8)
2017-08-05 00:09:26 +01:00
Bruce Cherniak
6b279b3271 st/osmesa: add osmesa framebuffer iface hash table per st manager
Commit bbc29393d3 didn't include osmesa state_tracker.  This patch adds
necessary initialization.

Fixes crash in OSMesa initialization.

Created-by: Charmaine Lee <charmainel@vmware.com>
Tested-by: Bruce Cherniak <bruce.cherniak@intel.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 9966c85e01)
2017-08-05 00:09:26 +01:00
Dave Airlie
6efb8d79a9 intel/vec4/gs: reset nr_pull_param if DUAL_INSTANCED compile failed.
If dual object compile fails (as seems to happen with virgl a
fair bit, and does piglit even have any tests for it?), we end up
not restarting the pull params, so we call
vec4_visitor::move_uniform_array_access_to_pull_constant
a second time and it runs over the ends of the alloc.

Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test
running inside virgl on ivybridge.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 271fa3a684)
2017-08-05 00:09:25 +01:00
Chris Wilson
795b712bd7 i965/blit: Remember to include miptree buffer offset in relocs
Remember to add the offset to the start of the buffer in the relocation
or else we write 0xff into random bytes elsewhere.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit fb63c43fd1)
2017-08-05 00:09:25 +01:00
Kenneth Graunke
43a2b178c2 i965: Delete pitch alignment assertion in get_blit_intratile_offset_el.
The cacheline alignment restriction is on the base address; the pitch
can be anything.

Fixes assertion failures when using primus (say, on glxgears, which
creates a 300x300 linear BGRX surface with a pitch of 1200):

intel_blit.c:190: get_blit_intratile_offset_el: Assertion `mt->surf.row_pitch % 64 == 0' failed.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 595a47b829)
2017-08-05 00:09:25 +01:00
Jason Ekstrand
d5def4f5a9 spirv: Fix SpvImageFormatR16ui
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.1 17.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 95c6a97464)
2017-08-05 00:09:25 +01:00
Thomas Hellstrom
f2a60ff20a dri3: Wait for all pending swapbuffers to be scheduled before touching the front
This implements a wait for glXWaitGL, glXCopySubBuffer, dri flush_front and
creation of fake front until all pending SwapBuffers have been committed to
hardware. Among other things this fixes piglit glx-copy-sub-buffers on dri3.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 185ef06fd2)
2017-08-05 00:09:25 +01:00
Nicolai Hähnle
3c8673d420 gallium/radeon: fix ARB_query_buffer_object conversion to boolean
The issue here is that the immediate is treated as a 64-bit value,
and fetching it does not work reliably with swizzles that are different
from xy and zw.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit da83687c4b)
2017-08-05 00:09:25 +01:00
Dave Airlie
381ccaa1cb radeon/ac: use ds_swizzle for derivs on si/cik.
This looks like it's supported since llvm 3.9 at least,
so switch over radeonsi and radv to using it, -pro also
uses this. We can now drop creating lds for these operations
as the ds_swizzle operation doesn't actually write to lds at all.

Acked-by: Marek Olšák <marek.olsak@amd.com>
(stable requested due to fixing radv CIK conformance tests)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit cb6f16dce9)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Conflicts:
	src/amd/common/ac_nir_to_llvm.c
2017-08-05 00:09:25 +01:00
Connor Abbott
2dd6030fbb ac/nir: fix lsb emission
This makes it match radeonsi. The LLVM backend itself will emit the
correct instruction, but LLVM might do incorrect optimizations since it
thinks the output is undefined when the input is 0, even though it's not
supposed to be. We really need a new intrinsic, or for the backend to
become smarter and recognize this pattern.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
(cherry picked from commit 6d731c5651)
2017-08-05 00:09:25 +01:00
Connor Abbott
a90d99f7a5 nir: fix algebraic optimizations
The optimizations are only valid for 32-bit integers. They were
mistakenly firing for 64-bit integers as well.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit de91461575)
2017-08-05 00:09:25 +01:00
Marek Olšák
6806773905 Revert "st/mesa: release sampler views when redefining a texture in st_context_teximage"
This reverts commit 5c1241268b.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101961

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d85802e501)
2017-08-05 00:09:25 +01:00
Nicolai Hähnle
3e872f4b53 radeonsi: ensure that temp array allocas are in the entry block
Otherwise, code generation fails. This has become necessary since some
shaders are wrapped in control flow.

Fixes: 081ac6e5c6 ("radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)")
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 2879a602dd)
2017-08-05 00:09:25 +01:00
Samuel Pitoiset
21ea75b3e9 mesa: fix mismatch when returning 64-bit bindless uniform handles
The slower convert-and-copy process performs a bad conversion
because it converts the value to signed 64-bit integer, but
bindless uniform handles are considered unsigned 64-bit.

This fixes "Check glUniform*() with mixed texture units/handles"
from arb_bindless_texture-uniform piglit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b38c9c57f2)
2017-08-05 00:09:25 +01:00
Emil Velikov
58fe86a6d6 Update version to 17.2.0-rc2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-31 10:52:13 +01:00
Samuel Pitoiset
d466a70532 st/glsl_to_tgsi: fix getting the image type for array of structs
Since array splitting for AoA is disabled, we have to retrieve
the type of the first non-array type when an array of images is
declared inside a structure. Otherwise, it will hit an assert
in glsl_type::sampler_index() because it expects either a sampler
or an image type.

This fixes a regression in the following piglit test:
arb_bindless_texture/compiler/images/arrays-of-struct.frag

Fixes: 57165f2ef8 ("glsl: disable array splitting for AoA")
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f99e9335e2)
2017-07-31 10:26:27 +01:00
Marek Olšák
e62eddcdbe st/mesa: release sampler views when redefining a texture in st_context_teximage
Noticed randomly.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 5c1241268b)
2017-07-31 10:25:55 +01:00
Dave Airlie
6d07e58afb radv: for stencil only set Z tile mode index to same value
On SI this was causing a hang in
dEQP-VK.pipeline.render_to_image.core.2d_array.mipmap.r16g16_sint_s8_uint

This was due to not handling the tile mode index for depth like
I fixed previously for new GPUs.

Fixes: 01d0c5a9 (radv: fix stencil regression since new addrlib import)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 800d162209)
2017-07-31 10:24:42 +01:00
Dave Airlie
f9e563597d virgl: drop precise modifier.
The host doesn't understand this yet, so drop it for now.

Fixes: virgl regressions.

Fixes: af22adee4f (tgsi: add precise flag to tgsi_instruction)
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 554aa09440)
2017-07-31 10:23:09 +01:00
Marek Olšák
5b61ba4432 radeonsi: update dirty_level_mask only when flushing or unbinding framebuffer
This fixes corruption with bindless textures in Dawn Of War 3.

The do_update_surf_dirtiness mechanism was complicated and dirty_level_mask
was only updated after the first draw call. The problem is bindless textures
are checked for decompression every draw call and we would only decompress
after the first draw call. The solution is to set dirtiness after the last
draw call to the framebuffer, so the (unconditional) decompression of
bindless textures happens at the right time.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit f4d095cc65)
2017-07-31 10:21:17 +01:00
Marek Olšák
6625382b1c st/mesa: always unconditionally revalidate main framebuffer after SwapBuffers
This fixes the black Feral launcher window.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101867

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
(cherry picked from commit 7257c171e9)
2017-07-31 10:20:44 +01:00
Nicolai Hähnle
2bca74253d radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)
With merged ESGS shaders, the GS part of a wave may be empty, and the
hardware gets confused if any GS messages are sent from that wave. Since
S_SENDMSG is executed even when EXEC = 0, we have to wrap even
non-monolithic GS shaders in an if-block, so that the entire shader and
hence the S_SENDMSG instructions are skipped in empty waves.

This change is not required for TCS/HS, but applying it there as well
simplifies the logic a bit.

Fixes GL45-CTS.geometry_shader.rendering.rendering.*

v2: ensure that the TCS epilog doesn't run for non-existing patches

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 081ac6e5c6)
2017-07-31 10:20:20 +01:00
Nicolai Hähnle
b36ff2d1f2 radeonsi/gfx9: fix vertex idx in ES with multiple waves per threadgroup
Cc: mesa-stable@lists.freedesktop.org
Reviewed: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 873789002f)
2017-07-31 10:20:12 +01:00
George Kyriazis
99b2613ce1 swr: fix transform feedback logic
The shader that is used to copy vertex data out of the vs/gs shaders to
the user-specified buffer (streamout or SO shader) was not using the
correct offsets.

Adjust the offsets that are used just for the SO shader:
- Make sure that position is handled in the same special way
  as in the vs/gs shaders
- Use the correct offset to be passed in the core
- consolidate register slot mapping logic into one function, since it's
  been calculated in 2 different places (one for calcuating the slot mask,
  and one for the register offsets themselves

Also make room for all attibutes in the backend vertex area.

Fixes:
- all vtk GL2PS tests
- 18 piglit tests (16 ext_transform_feedback tests,
  arb-quads-follow-provoking-vertex and primitive-type gl_points

v2:

- take care of more SGV slots in slot mapping logic
- trim feState.vsVertexSize
- fix GS interface and incorporate GS while calculating vsVertexSize

Note that vsVertexSize is used in the core as the one parameter that
controls vertex size between all stages, so it has to be adjusted appropriately
for the whole vs/gs/fs pipeline.

Also note that GS and SO is not fully implemented.  This will be addressed
later.

fixes:
- fixes total of 20 piglit tests

CC: 17.2 <mesa-stable@lists.freedesktop.org>

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit 194ff5eed1)
2017-07-31 10:19:55 +01:00
Dave Airlie
f9c7549605 radv/ac: port SI TC L1 write corruption fix.
This ports 72e46c988 to radv.
    radeonsi: apply a TC L1 write corruption workaround for SI

Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e77ff11ffe)
2017-07-27 19:54:53 +01:00
Dave Airlie
2ce4f0afd3 radv/ac: realign SI workaround with radeonsi.
This ports: da7453666a
radeonsi: don't apply the Z export bug workaround to Hainan
to radv.

Just noticed in passing.

Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a81e99f50a)
2017-07-27 19:54:52 +01:00
Dave Airlie
546282e8bc radv: only report external semaphore info for opaque fd.
Until we support sync fd, don't report the info.

Fixes CTS dEQP-VK.api.external.semaphore.sync_fd.* from crashing.

Fixes: eaa56eab6 (radv: initial support for shared semaphores (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6cbc8cf178)
2017-07-27 19:54:52 +01:00
Dave Airlie
de55bc8f49 radv: fix buffer views on SI/CIK.
Fixes CTS dEQP-VK.memory.pipeline_barrier.host_write_uniform_texel_buffer.1024
on SI/CIK with radv.

Fixes: f4e499ec (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit ca82ef5ac7)
2017-07-27 19:54:52 +01:00
Daniel Stone
f0b7563a36 st/dri2: Return invalid modifier when no driver support
Always initialise whandle.modifier for DRIImage modifier queries, so if
the driver doesn't support it then we return false for the query.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: d33fe8b84e ("st/dri: enable DRIimage modifier queries")
(cherry picked from commit 45383d32d4)
2017-07-27 19:54:52 +01:00
Daniel Stone
5bee196840 st/dri: Check get-handle return value in queryImage
In the DRIImage queryImage hook, check if resource_get_handle() failed
and return FALSE if so.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b4a18f13ce)
2017-07-27 19:54:52 +01:00
Daniel Stone
2a1792981c egl/wayland: Ignore invalid modifiers
If the underlying driver does not support modifiers, dmabuf will still
advertise formats through the 'modifier' event, but send them with an
invalid modifier. Ignore them if this is the case, rather than passing
them through to the driver.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
(cherry picked from commit dd072cf4b1)
2017-07-27 19:54:52 +01:00
Tim Rowley
a2b7477603 swr/rast: non-regex knob fallback code for gcc < 4.9
gcc prior to 4.9 didn't implement <regex>, causing a startup crash
in the swr knob parameter reading code.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit e21fc2c625)
2017-07-27 19:54:52 +01:00
Dave Airlie
3e777d5cab virgl: encode index buffer offset.
Fixes arb_vertex_buffer_object-combined-vertex-index

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit c4652a0a5b)
2017-07-27 19:54:52 +01:00
Marek Olšák
9bb6aa5794 ac/surface: fix hybrid graphics where APU=GFX9, dGPU=older
v2: don't do it for compressed textures (bpp = 0)

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
(cherry picked from commit 5e81df0f10)
2017-07-27 19:54:52 +01:00
Marek Olšák
90bbcb93b1 radeonsi: decrease the number of compiler threads
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit ed2b3f5c81)
2017-07-27 19:54:52 +01:00
Marek Olšák
529c440dd3 gallium/radeon: make S_FIXED function signed and move it to shared code
This fixes a bug uncovered by:
    2412c4c81e
    util: Make CLAMP turn NaN into MIN.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(cherry picked from commit 433f6f7ac9)
2017-07-27 19:54:52 +01:00
Charmaine Lee
94e0de90ee st/mesa: create framebuffer iface hash table per st manager
With commit 5124bf9823, a framebuffer interface hash table is
created in st_gl_api_create(), which is called in
dri_init_screen_helper() for each screen. When the hash table is
overwritten with multiple calls to st_gl_api_create(), it can cause
race condition. This patch fixes the problem by creating a
framebuffer interface hash table per state tracker manager.

Fixes crash with steam.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101876
Fixes: 5124bf9823 ("st/mesa: add destroy_drawable interface")
Tested-by: Christoph Haag <haagch@frickel.club>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit bbc29393d3)

Squashed with:

st/mesa: fix unconditional return in st_framebuffer_iface_remove

Noticed by James Legg @ Feral.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 914f11e75b)

Squashed with:

st/mesa: Fix inversed test in st_api_destroy_drawable

Fixes a drawable leak.

Fixes: bbc29393d3 ("st/mesa: create framebuffer iface hash table per
                      st manager")
Bugzilla: https://bugs.freedesktop.org/101930
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 57132d126f)
2017-07-27 19:54:24 +01:00
Grigori Goronzy
3180f0fa0d egl: move KHR_no_error vs debug/robustness check further down
We'll fail to flag an error if the context flags appear after the
no-error attribute in the context attribute list.

Delay the check to after attribute parsing to fix this.

Fixes: 4909519a66 ("egl: Add EGL_KHR_create_context_no_error support")
Cc: mesa-stable@lists.freedesktop.org
[Emil Velikov: add fixes/stable tags, commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

(cherry picked from commit 39bf7756b9)
2017-07-27 18:56:45 +01:00
Nicolai Hähnle
e7f14a8b52 radeonsi/gfx9: reduce max threads per block to 1024 on gfx9+
The number of supported waves per thread group has been reduced to 16
with gfx9. Trying to use 32 waves causes hangs, and barriers might
not work correctly with > 16 waves.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a0e6b9a2db)
2017-07-27 18:56:45 +01:00
Nicolai Hähnle
7f5d9d7a6d radeonsi: fix detection of DRAW_INDIRECT_MULTI on SI
The firmware version numbers for SI were wrong. The new numbers are probably
too conservative (we don't have a definitive answer by the firmware team),
but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on
Tahiti (by Gustaw) and on Verde (by myself).

While this is technically adding a feature, it's a feature we thought we had
for a long time. The change is small enough and we're early enough in the 17.2
release cycle that it should still go in.

Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 65fbaab0b7)
2017-07-27 18:56:45 +01:00
Iago Toral Quiroga
3d0960e761 anv: only expose up to 28 vertex attributes
The EU limit of 128 GRFs should allow 32 vertex elements of 4 GRFs.
However, the maximum allowed value of "Vertex URB Entry Read Length"
in SIMD8 is 15. And 15 * 8 = 120 gives us a limit of 30 vertex elements.
Because we also need to reserve a vertex buffer to upload
VertexIndex/InstanceIndex and another to upload DrawID when needed,
we can only expose 28.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 31f1863ace)
2017-07-27 18:56:45 +01:00
Iago Toral Quiroga
bdbd8ab517 anv/cmd_buffer: fix off by one error in assertion
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit a848e693ef)
2017-07-27 18:56:44 +01:00
Kenneth Graunke
47bca2cfa7 i965: Fix = vs == in MCS aux usage assert.
Caught by Coverity (CID 1415680).

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 698636cc97)
Fixes: 0f9b609cf4 ("i965/blorp: Do prepare/finish manually")
2017-07-27 18:54:41 +01:00
Kenneth Graunke
04bb687f04 i965: Fix offset addition in get_isl_surf.
Increase the value, not the pointer to the stack variable.

Caught by Coverity (CID 1415574).  Not shipped in a real release.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit f6e674fa51)
Fixes: 63a43f4161 ("i965: Refactor miptree to isl converter and adjustment")
2017-07-27 18:54:13 +01:00
Eric Anholt
4e0f29ed0b broadcom/vc4: Prefer blit via rendering to the software fallback.
I don't know how I managed to leave this here for so long.  Found when
working on a 1:1 overlapping blit extension for X11.

Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 93fec49a75)
2017-07-27 18:21:27 +01:00
Lionel Landwerlin
0ccb853cc0 i965: perf: flush batchbuffers at the beginning of queries
As Chris commented, it makes more sense to have batch buffer flushes
before the query. Usually applications like frame_retrace do a series
of queries and in that case, with flushes at the end of the queries,
we might still have the first query contained in 2 different batchs.
More generally it would be quite usual to have the query contained in
2 batch buffers because we never now what's the fill rate of the
current batch buffer.

If we move the flushing at the beginning of the queries, it's pretty
much guaranteed that queries will be contained in a single batch
buffer (unless the amount of commands is huge, but then it's only fair
to include reloading request times in the measurements).

Fixes: adafe4b733 ("i965: perf: minimize the chances to spread queries across batchbuffers")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9f439ae120)
2017-07-27 18:21:22 +01:00
Emil Velikov
a455f594bb Update version to 17.2.0-rc1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 16:59:32 +01:00
Emil Velikov
a955622c1a intel/blorp: ship blorp_genX_exec.h within the tarball
Fixes: c9cb37b2a6 ("intel/blorp: Add a partial resolve pass for MCS")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 5d47dd9c2a)
2017-07-24 16:59:32 +01:00
Emil Velikov
33236a306d egl: guard wayland header dep. tracking behind HAVE_PLATFORM_WAYLAND
Otherwise we'll attemt to generate the header even we don't need to.
In that case the dependencies may not be met, leading to build failure.

Fixes: 166852e "configure.ac: rework wayland-protocols handling"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-07-24 14:09:08 +01:00
Emil Velikov
da9e6fdfe2 swrast: add dri2ConfigQueryExtension to the correct extension list
The extension should be in the list as returned by getExtensions().
Seems to have gone unnoticed since close to nobody wants to change the
vblank mode for the software driver.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2017-07-24 10:58:51 +01:00
Emil Velikov
3057ca9a50 wayland-egl: update the SHA1 of the commit introducing v3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:36:30 +01:00
Miguel A. Vico
b6356c023d wayland-egl: Update ABI checker
This change updates wayland-egl-abi-check.c with the latest changes to
wl_egl_window.

Signed-off-by: Miguel A. Vico <mvicomoya@nvidia.com>
Reviewed-by: James Jones <jajones@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:27:56 +01:00
Miguel A. Vico
2d5d61bc49 wayland-egl: Make wl_egl_window a versioned struct
We need wl_egl_window to be a versioned struct in order to keep track of
ABI changes.

This change makes the first member of wl_egl_window the version number.

An heuristic in the wayland driver is added so that we don't break
backwards compatibility:

 - If the first field (version) is an actual pointer, it is an old
   implementation of wl_egl_window, and version points to the wl_surface
   proxy.

 - Else, the first field is the version number, and we have
   wl_egl_window::surface pointing to the wl_surface proxy.

Signed-off-by: Miguel A. Vico <mvicomoya@nvidia.com>
Reviewed-by: James Jones <jajones@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:27:52 +01:00
Miguel A. Vico
63c251e38f egl: Fix _eglPointerIsDereferencable() to ignore page residency
mincore() returns 0 on success, and -1 on failure.  The last parameter
is a vector of bytes with one entry for each page queried.  mincore
returns page residency information in the first bit of each byte in the
vector.

Residency doesn't actually matter when determining whether a pointer is
dereferenceable, so the output vector can be ignored.  What matters is
whether mincore succeeds. See:

  http://man7.org/linux/man-pages/man2/mincore.2.html

Signed-off-by: Miguel A. Vico <mvicomoya@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:27:48 +01:00
Miguel A. Vico
045108938c egl: Move _eglPointerIsDereferencable() to eglglobals.[ch]
Move _eglPointerIsDereferencable() to eglglobals.[ch] and make it a
non-static function so it can be used out of egldisplay.c

Signed-off-by: Miguel A. Vico <mvicomoya@nvidia.com>
Reviewed-by: James Jones <jajones@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:27:43 +01:00
Miguel A. Vico
dad0c5d2d7 wayland-egl: Add wl_egl_window ABI checker
Add a small ABI checker for wl_egl_window so that we can check for
backwards incompatible changes at 'make check' time.

Signed-off-by: Miguel A. Vico <mvicomoya@nvidia.com>
Reviewed-by: James Jones <jajones@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:27:10 +01:00
Emil Velikov
4d53b16f55 swr: use the correct variable for no undefined symbols
The variable name was missing a leading LD_, which resulted in a missing
check for unresolved symbols in the backend binaries.

With the link addressed with earlier patches, we can correct the typo.

Thanks to Laurent for the help spotting this.

v2: Split from a larger patch.

Cc: mesa-stable@lists.freedesktop.org
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Laurent Carlier <lordheavym@gmail.com>
Fixes: 9475251145 "swr: standardize linkage and check for
                             unresolved symbols"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reported-by: Laurent Carlier <lordheavym@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:23:45 +01:00
Emil Velikov
9fd23435c2 swr: don't forget to link KNL/SKX against pthreads
Analogous to previous commit but for the KNL/SKX backends.

Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Laurent Carlier <lordheavym@gmail.com>
Fixes: 1cb5a6061c ("configure/swr: add KNL and SKX architecture targets")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:23:45 +01:00
Emil Velikov
33d397ada5 swr: don't forget to link AVX/AVX2 against pthreads
Seems like the backends have been using pthreads since day one, yet
we've been missing the link.

With later commit we'll fix a typo, hence the libraries will be build
with -Wl,no-undefined, aka failing the build on unresolved symbols.

v2: Split from a larger patch.

Cc: mesa-stable@lists.freedesktop.org
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Laurent Carlier <lordheavym@gmail.com>
Fixes: c6e67f5a93 "gallium/swr: add OpenSWR rasterizer"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 10:23:45 +01:00
Emil Velikov
166852ee95 configure.ac: rework wayland-protocols handling
At dist/distcheck time we need to ensure that all the files and their
respective dependencies are handled.

At the moment we'll bail out as the linux-dmabuf rules are guarded in a
conditional. Move them outside of it and drop the sources from
BUILT_SOURCES.

Thus the files will be generated only as needed, which will happen only
after the wayland-protocols dependency is enforced in configure.ac.

v2: add dependency tracking for the header

Cc: Andres Gomez <agomez@igalia.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-07-24 10:23:41 +01:00
Dave Airlie
feef47bb59 radv: enable sample shading
This calculates ps_iter_samples from the minSampleShading input

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-24 17:45:03 +10:00
Dave Airlie
486472a98d radv: don't set dedicated bit for buffer external memory.
This is an alternate fix for the buffer export dedicated interaction.

Fixes CTS dEQP-VK.api.external.memory.opaque_fd.dedicated.buffer.info

Fixes: b70829708a (radv: Implement VK_KHR_external_memory)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-24 08:30:15 +01:00
Dave Airlie
75392e76ad radv: fix non-0 based layer clears.
If the layer base was > 0, it wasn't getting passed as the start
instance or getting added in the shaders.

Fixes CTS dEQP-VK.api.image_clearing.core.clear_color_attachment.2d_r8_uint_multiple_layers

Fixes: 7e0382fb (radv: add support for layered clears (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-24 17:27:55 +10:00
Dave Airlie
22b59b99cb radv: check enabled device features.
The spec says we should return VK_ERROR_FEATURE_NOT_PRESENT.

Ported from anv.

Fixes CTS test dEQP-VK.api.device_init.create_device_unsupported_features

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-24 08:16:52 +01:00
Dave Airlie
b7cc07432a radv: for external memory imports close the fd on import success
If we get an fd, we need to close it before returning.

Fixes CTS test dEQP-VK.api.external.memory.opaque_fd.dedicated.device_only.import_multiple_times

Fixes: b70829708a (radv: Implement VK_KHR_external_memory)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-24 04:41:36 +01:00
Bas Nieuwenhuizen
daaf7efb93 radv: Don't segfault when exporting an image which hasn't been bound yet.
The image is set on Memory allocation already, but the image doesn't
have to have the BindImageMemory called yet. Luckily, we know offset
within a BO has to be 0 for dedicated allocations, so we can just
use the dummy 0 in the address calaculations.

Fixes CTS test dEQP-VK.api.external.memory.opaque_fd.dedicated.image.export_bind_import_bind

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: b70829708a "radv: Implement VK_KHR_external_memory"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-07-24 01:50:52 +02:00
Bas Nieuwenhuizen
ea08a296fe radv: Handle VK_ATTACHMENT_UNUSED in color attachments.
This just sets them to INVALID COLOR,  instead of shifting the
attachments together.

This also fixes a number of cases where we use it first and only
then check if it is VK_ATTACHMENT_UNUSED.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-07-24 01:50:52 +02:00
Andres Gomez
bfe8134472 broadcom: correct header file in BROADCOM_FILES
This fixes `make distcheck`

> make[3]: *** No rule to make target 'common/v3d_devinfo.h', needed by 'distdir'.  Stop.
> make[3]: Leaving directory '/home/local/mesa/src/broadcom'
> Makefile:945: recipe for target 'distdir' failed
> make[2]: Leaving directory '/home/local/mesa/src'
> make[2]: *** [distdir] Error 1
> make[1]: *** [distdir] Error 1

Fixes: 427bbbb99c ("broadcom: Introduce a header for talking about chip revisions.")
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-24 01:40:05 +03:00
Wladimir J. van der Laan
15a1ceb127 etnaviv: Clear lbl_usage array correctly
Fill the entire array instead of just a quarter. This avoids
crashes with large shaders.
(currently this never causes a problem because shaders larger than 2048/4
instructions are not supported by this driver on any hardware, but it will
cause problems in the future)

Fixes: ec43605189 ("etnaviv: fix shader miscompilation with more than 16 labels")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-07-23 21:52:44 +02:00
Jason Ekstrand
6874b953f6 anv/image: zalloc image views
This allows us to avoid some extra zeroing.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand
a1cad8218e anv/image: Use vk_zalloc instead of an explicit memset
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand
1e32c8303a anv: Separate surface states by layout instead of aux_usage
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand
628bfaf1c6 intel/isl: Add some sanity checks for compressed surfaces
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand
5de4209f91 intel/isl: Add a helper to get a subimage surface
We already have a helper for doing this in BLORP, this just moves the
logic into ISL where we can share it with other components.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand
72bc38cfc5 anv: Get rid of some unused function declarations
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-22 21:41:12 -07:00
Jason Ekstrand
3e57e9494c i965: Enable regular fast-clears (CCS_D) on gen9+
The set of formats which supports CCS_E is actually fairly small on
gen9.  However, everything that supports fast-clears on gen8 also
supports fast-clears on gen9+.  The one very annoying exception is
that blending is broken for non-0/1 clear colors with sRGB formats.
In order to solve that problem, we do a resolve to get rid of the
clear color.  Another option would be to just not fast-clear with
non-0/1 clear colors however non-0/1 + blending + sRGB is uncommon
enough that this shouldn't be a significant performance problem.

This appears to help gl_manhattan31_off by about 2%.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
d4de403f91 intel/isl: Add a helper for determining if a color is 0/1
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
b26b2490e5 intel/blorp: Allow blorp_copy on sRGB formats
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
6c2842f95b i965: Weaken the texture view rules for formats slightly
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
fb86ac94cb intel/isl/format: Add an srgb_to_linear helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
44e9d65757 intel/isl/format: Dedent the template in gen_format_layout.py
This makes it much easier to edit the template and doesn't really dirty
the python all that much.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
dd75edb429 i965/surface_state: Get the aux usage from the miptree code
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
0175077af5 i965/surface_state: Take an isl_aux_usage in emit_surface_state
This commit replaces the generic "flags" parameter with a more explicit
aux usage parameter.  This leads to a lot of duplicated code at the
moment but this will all get cleaned up directly.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
fe38d3e3a4 i965/miptree: Take an isl_format in prepare_texture
This will be a bit more convenient momentarily.  It's also more correct
because it makes prepare_texture take sRGB into account.
2017-07-22 20:59:22 -07:00
Jason Ekstrand
2ccfc0ffdd i965/miptree: Use miptree range helpers in has_color_unresolved
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
1c70c57aed i965/miptree: Allow for accessing a CCS_E image as CCS_D
This requires us to start using the partial clear state.  It makes
things quite a bit more complicated but it's still a fairly
straightforward exercise in diagram following.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
06ef36d319 i965/miptree: Use ISL_AUX_STATE_PARTIAL_CLEAR for CCS_D
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
268ba028dc intel/isl: Add an aux state for "partial clear"
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
d6ee832cbc i965/miptree: Take an aux_usage in prepare/finish
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
e1ce252106 i965/miptree: Refactor some things to use mt->aux_usage
Now that we have this field, it's much easier to switch on it than to
walk an if ladder that checks different things.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
427d248fbd i965/blorp: Use prepare/finish_depth for depth clears
We also simplify the way we handle stencil since we know a priori that
it will have ISL_AUX_USAGE_NONE.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
e8fca3ffde i965/blorp: Use render_aux_usage for color clears
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
4185b982af i965/blorp: Be more accurate about aux usage in blorp_copy
The only real change here is that we now reject clear colors for MCS
with certain formats on gen < 9 because we can't trust that the
reinterpretation will work.  This may cause some MCS partial resolves.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
fc1639e46d i965/blorp: Use texture/render_aux_usage for blits
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
0f9b609cf4 i965/blorp: Do prepare/finish manually
Our attempts to do it automatically are problematic at best.  In order
to really be precise, we need to know both the desired aux usage and
whether or not clear is supported.  The current automatic mechanism
doesn't cover this.  This commit itself is not a functional change since
it just reworks everything to be in terms of a silly helper.  Later
commits will switch things over to more sensible ways of choosing usage.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
baa9e05965 i965/miptree: Rework prepare/finish_render to be in terms of aux_usage
We keep the old and possibly broken method of determining aux usage
intact for now.  Therefore, the only functional change here is that we
may call finish_render a bit more accurately.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
c9314d2c46 i965/miptree: Add a helper for getting the aux usage for texturing
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
d3c01c6a9a i965/miptree: Partially resolve MCS for texture views
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
c8aa5191eb i965/miptree: Add support for partially resolving MCS
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
987c09e044 i965/miptree: Tighten up finish_mcs_write
Multisample surfaces only have a single miplevel so there's no reason to
be passing the extra parameters around.  It only leads to confusion.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
18a69bbc0f i965/miptree: Make aux_state work in terms of logical layers
This commit changes layer_range_length to return locical layers and also
changes the way we allocate the aux_state field to not allocate extra
layers for MCS.  This will be important as we're about to start doing
significantly more detailed tracking of MCS state.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
c9cb37b2a6 intel/blorp: Add a partial resolve pass for MCS
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
efc4c6b702 i965/miptree: Remove some unneeded restrictions
intel_miptree_supports_ccs_e should handle the gen >= 9 requirement and
there's no reason why we can't do CCS_E on window system buffers so long
as we resolve.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
9c09672ad4 i965/miptree: Stop setting FOR_SCANOUT for renderbuffers
Nothing created through intel_miptree_create_for_renderbuffer will ever
be exposed externally so there's no need to set FOR_SCANOUT.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
9ef517276d i965/blorp: Do flushes around depth resolves
It turns out that if you have rendering in-flight with CCS_E enabled and
you go to do a depth resolve without flushing, the CCS data may never
hit the memory.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Jason Ekstrand
36aed7c74c i965/blorp: Use the renderbuffer format for clears
This fixes the Piglit ARB_texture_views rendering-formats test.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 20:59:22 -07:00
Nanley Chery
67027ddf3f anv: Predicate fast-clear resolves
Image layouts only let us know that an image *may* be fast-cleared. For
this reason we can end up with redundant resolves. Testing has shown
that such resolves can measurably hurt performance and that predicating
them can avoid the penalty.

v2:
- Introduce additional resolve state management function (Jason Ekstrand).
- Enable easy retrieval of fast clear state fields.
v3: Use more descriptive field enums (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
8e2729fbb8 intel/blorp: Allow BLORP calls to be predicated
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
be516ba9b1 anv/cmd_buffer: Skip some input attachment transitions
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
597ff919e7 anv: Stop resolving CCS implicitly
With an earlier patch from this series, resolves are additionally
performed on layout transitions. Remove the now unnecessary implicit
resolves within render passes.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
5ba93e6f5a anv: Transition more color buffer layouts
v2: Expound on comment for the pipe controls (Jason Ekstrand).
v3:
- Cast base_layer to uint64_t to avoid overflow.
- Remove "seems" from the pipe control comment.
- Fix clamp of layer_count (Jason Ekstrand).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
a899747eb3 anv/cmd_buffer: Warn about not enabling CCS_E
Use the performance warning infrastructure to provide helpful
information when testing applications.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
9c9f63d1c7 anv/cmd_buffer: Move aux_usage assignment up
For readability, bring the assignment of CCS closer to the assignment of
NONE and MCS.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
62d72bb5d0 anv/cmd_buffer: Always enable CCS_D in render passes
The lifespan of the fast-clear data will surpass the render pass scope.
We need CCS_D to be enabled in order to invalidate blocks previously
marked as cleared and to sample cleared data correctly.

v2: Avoid refactoring.
v3: Allow CCS_D for subpass resolves.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
8e532aa028 anv/cmd_buffer: Disable CCS on gen7 color attachments upfront
The next patch enables the use of CCS_D even when the color attachment
will not be fast-cleared. Catch the gen7 case early to simplify the
changes required.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
9fd1f2aa3c anv/cmd_buffer: Ensure fast-clear values are current
v2: Rewrite functions, change location of synchronization.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:10 -07:00
Nanley Chery
0b16600056 anv/gpu_memcpy: Add a lighter-weight GPU memcpy function
We'll be performing a GPU memcpy in more places to copy small amounts of
data. Add an alternate function that thrashes less state.

v2:
- Make a new function (Jason Ekstrand).
- Move the #define into the function.
v3:
- Update the function name (Jason).
- Update comments.
v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
dcff5ab9f1 anv/cmd_buffer: Restrict fast clears in the GENERAL layout
v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
v3: Allow some fast clears in the GENERAL layout.
v4: Remove extra '||' and adjust line break (Jason Ekstrand).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
9ffe87122b anv/cmd_buffer: Don't partially fast clear image layers
v2: Don't pass in the command buffer (Jason Ekstrand).
v3: Remove an incorrect assertion and an if condition for gen7.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
07cc2ec9db anv/cmd_buffer: Initialize the clear values buffer
v2: Rewrite functions.
v3 (Jason Ekstrand):
- Don't set ResourceMinLOD.
- Fix clamp of level_count.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
88200e87f6 anv/image: Append CCS/MCS with a fast-clear state buffer
v2: Update comments, function signatures, and add assertions.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
325ecffc62 anv/image: Disable CCS if the image doesn't support rendering
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
01db9a74c6 intel/isl: Add surface state clear value information
This will be used to load and store clear values from surface state
objects.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-22 20:12:09 -07:00
Nanley Chery
b178e239dd anv: Transition MCS buffers from the undefined layout
v2: Define MCS buffers with any sample count (Jason)

Cc: <mesa-stable@lists.freedesktop.org>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
2017-07-22 20:12:09 -07:00
Jason Ekstrand
f793c57cc5 intel/isl: Tighten up restrictions for CCS on gen7
It may technically be possible to enable some sort of fast-clear support
for at least the base slice of a 2D array texture on gen7.  However,
it's not documented to work, we've never tried to do it in GL, and we
have no idea what the hardware does if you turn on CCS_D with arrayed
rendering.  Let's just play it safe and disallow it for now.  If someone
really cares that much about gen7 performance, they can come along and
try to get it working later.
2017-07-22 20:12:07 -07:00
Chris Wilson
4aee05b6c6 i965/bufmgr: Add comments about GTT coherency issues.
(Patch written by Ken, but entirely comments written by Chris.)

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-22 19:34:48 -07:00
Kenneth Graunke
0044de931f i965: Drop non-LLC lunacy in the program cache code.
The non-LLC story was a horror show.  We uploaded data via pwrite
(drm_intel_bo_subdata), which would stall if the cache BO was in
use (being read) by the GPU.  Obviously, we wanted to avoid that.
So, we tried to detect whether the buffer was busy, and if so, we'd
allocate a new BO, map the old one read-only (hopefully not stalling),
copy all shaders compiled since the dawn of time to the new buffer,
upload our new one, toss the old BO, and let the state upload code
know that our program cache BO changed.  This was a lot of extra data
copying, and flagging BRW_NEW_PROGRAM_CACHE would also cause a new
STATE_BASE_ADDRESS to be emitted, stalling the entire pipeline.

Not only that, but our rudimentary busy tracking consistented of a flag
set at execbuf time, and not cleared until we threw out the program
cache BO.  So, the first shader upload after any drawing would hit this
"abandon the cache and start over" copying path.

This is largely unnecessary - it's just ancient and crufty code.  We can
use the same persistent mapping paths on all platforms.  On non-ancient
kernels, this will use a write combining map, which should be reasonably
fast.

One aspect that is worse: we do occasionally grow the program cache BO,
and copy the old contents to the newer BO.  This will suffer from UC
readback performance now.  To mitigate this, we use the MOVNTDQA based
streaming memcpy on platforms with SSE 4.1 (all Gen7+ atoms).  Gen4-5
are unfortunately going to be penalized.

v2: Add MOVNTDQA path, rebase on other map flag changes.
v3: Drop cache->bo_used_by_gpu too (caught by Chris Wilson).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
Kenneth Graunke
8bdbc0c5b9 i965: Set MAP_PERSISTENT on program cache buffers.
Chris Wilson pointed out that this mapping really is persistant.

Shouldn't actually have any effect today, but best to set it anyway.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
Kenneth Graunke
2e3d825982 i965: Correctly set MAP_WRITE when creating the LLC program cache map.
Using a read-only mapping is completely bogus - we use this mapping to
write all new shaders to the cache.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
Matt Turner
f37ede40ba i965/bufmgr: Use write-combine mappings where available
Write-combine mappings give much better performance on writes than
uncached access through the GTT.

Improves performance of GFXBench 4's gl_driver2 benchmark at 1024x768
on Apollolake by 3.6086% +/- 0.674193% (n=15).

v2: (by Ken) Rebase on lockless mappings, map_count deletion, valgrind
    updates, potential for CPU/WC maps failing, and other changes.

v3: (by Ken and Chris Wilson)

    (Ken): Rebase on set_domain -> gem_wait
    (Chris): Fix up a failed CPU/WC mmaping with a GTT mapping

    Not all objects will be mappable for direct access by the CPU
    (either using WC/CPU or WC paths), for example, a dmabuf wrapping an
    object on a foreign device or an object wrapping access to stolen
    memory. Since either the physical pages are not known or even do not
    exist, we need to use the mediated, indirect access via the GTT. (If
    one day, the kernel does suddenly start providing mediated access
    via a regular WB/WC mmapping, we no longer need the fallback.)

v4: Avoid falling back for MAP_RAW (Chris).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-22 19:34:42 -07:00
Kenneth Graunke
bdae2ddff8 i965/bufmgr: Skip wait ioctl when not busy.
If the buffer is idle, we I915_GEM_WAIT will return immediately,
so we may as well skip the ioctl altogether.  We can't trust the
"idle" flag for external buffers, but for most, it should be fine.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
Kenneth Graunke
38e2142f39 i965/bufmgr: Explicitly wait instead of using I915_GEM_SET_DOMAIN.
With the advent of asynchronous maps, domain tracking doesn't make a
whole lot of sense.  Buffers can be in use on both the CPU and GPU at
the same time.  In order to avoid blocking, we stopped using set_domain
for asynchronous mappings, which means that the kernel's tracking has
lies.  We can't properly track it in userspace either, as the kernel
can change domains on us spontaneously (for example, when un-swapping).

According to Chris Wilson, I915_GEM_SET_DOMAIN does the following:

1. pins the backing storage (acquiring pages outside of the
   struct_mutex)

2. waits either for read/write access, including inter-device waits

3. updates the domain, clflushing as required

4. marks the object as used (for swapping)

5. turns off FBC/PSR/fancy scanout caching

Item (1) is not terribly important.  Most BOs are recycled via the
BO cache, so they already have pages.  Regardless, we fixed this
via an initial set_domain in the previous patch.

We implement item (2) with I915_GEM_WAIT.  This has one downside:
we'll stall unnecessarily if we do a read-only mapping of a buffer
that the GPU is reading.  I believe this is pretty uncommon.  We
may want to extend the wait ioctl at some point.

Mesa already does item (3) itself.  For cache-coherent buffers (most on
LLC systems), we don't need to do any clflushing - the CPU and GPU views
are coherent.  For non-coherent buffers (most on non-LLC systems), we
currently only use the CPU for read-only maps, and we explicitly clflush
when necessary.

We don't care about item (4)...swapping has already killed performance.
Plus, with async maps, the kernel's domain tracking is already bogus,
so it can't do this accurately regardless.

Item (5) should be okay because we avoid cached maps of scanout buffers.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
Kenneth Graunke
eb1497e968 i965/bufmgr: Allocate BO pages outside of the kernel's locking.
Suggested by Chris Wilson.

v2: Set the write domain to 0 (suggested by Chris).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 19:34:42 -07:00
Timothy Arceri
d91108b1f4 glsl: rework misleading block layout code
From the ARB_uniform_buffer_object spec:

   ""shared" uniform blocks, the default layout, ..."

This doesn't fix anything as the default layout is already applied
at this point but fixes the misleading code/comment.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-23 10:06:01 +10:00
Timothy Arceri
316b4c9ada glsl: remove placeholder comment
This was added in 2d03f48a65 and seems like it was intended
as a TODO comment in a function stub rather than a useful
code comment.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-23 10:06:01 +10:00
Brian Paul
b4debc0d69 st/mesa: use proper resource target type in st_AllocTextureStorage()
When we validate the texture sample count, pass the correct
pipe_texture_target for the texture, rather than PIPE_TEXTURE_2D.

Also add more comments about MSAA.

No piglit regressions with VMware driver.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-22 13:18:56 -06:00
Brian Paul
aeade86db5 mesa: remove pointless assignments in init_teximage_fields_ms()
The NumSamples and FixedSampleLocation fields are set again later at
the end of the function so these earlier assignments aren't needed.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-22 13:18:56 -06:00
Neha Bhende
1820ef64c9 svga: Limit number of immediates in shader
imm {128.0, -128.0, 2.0, 3.0} is used for lit instruction which
is not used very frequently. So allocate it only if lit instruction is used.

Tested with mtt piglit and mtt glretrace

v2: As per Charmaine's comment

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-22 13:18:56 -06:00
Charmaine Lee
83ca6b9d31 svga: fix constant indices for texcoord scale factors and texture buffer size
This patch fixes the ordering of the constant indices for texcoord scale
factor and texture buffer size to match the order they were added to the
constant buffer in svga_get_extra_constants_common().

Tested with MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-22 13:18:56 -06:00
Neha Bhende
acfb1583a5 svga: fix unnormalized->normalized texture coordinate conversion
Sometimes, converting unnormalized coordinates to normalized
coordinates requires an epsilon value to produce the right texels with
nearest filtering.  Adding 0.0001 to the coordinates when the min/mag
filter is nearest fixes the issue.
Fixes piglit test fbo-blit-scaled-linear

Tested with mtt-piglit, mtt-glretrace

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-22 13:18:56 -06:00
Brian Paul
dc62ddfb39 svga: only support 4x, 8x, 16x msaa
Skip 2x MSAA, for example, since it's seldom used and just bloats
the list of pixel formats.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-22 13:18:56 -06:00
Brian Paul
922dc27273 mesa: include texture size in error messages
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-07-22 13:18:56 -06:00
Kenneth Graunke
665fd10396 i965: Support the mesa_no_error driconf option.
This allows us to override contexts to use no_error functionality
even if the applications themselves do not.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-22 11:42:42 -07:00
Jason Ekstrand
20533e0da7 anv/blorp: Assert isl_surf_init success in do_buffer_copy
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 08:21:27 -07:00
Jason Ekstrand
cf39fb06e3 anv/blorp: Explicitly set row_pitch in do_buffer_copy
We have a very specific row pitch that we want and we don't want ISL to
be changing it on us so just be explicit about it.

Fixes: a40f043034
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 08:20:07 -07:00
Kenneth Graunke
fd199fe4a8 i965: Delete gen8_draw_upload.c
For some reason we left an empty file, rather than deleting it.
2017-07-22 00:42:51 -07:00
Karol Herbst
f98a221f2d nv50/ir: disable mul+add to mad for precise instructions
fixes
    missrendering in TombRaider
    KHR-GL44.gpu_shader5.precise_qualifier
    KHR-GL45.gpu_shader5.precise_qualifier

v4: disable opt only for MAD, it's fine for SAD

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-07-21 23:45:18 -04:00
Karol Herbst
f9bfc93014 nv50/ir/tgsi: handle precise for most ALU instructions
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-07-21 23:45:18 -04:00
Karol Herbst
1d7c232fbd nv50/ir: add precise field to Instruction
v4: initialize field with NULL

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2017-07-21 23:45:18 -04:00
Karol Herbst
4ad9e2e17a st/glsl_to_tgsi: don't optimize mul+add to mad if expression is precise
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
c5cbb9a543 gallium/docs: add precise instruction modifier
v4: add comment about intermediate rounding step to MAD

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
4611343bcc tgsi/text: parse _PRECISE modifier
v2: use str_match_no_case to fix _SAT_PRECISE detection
v4: usd is_digit_alpha_underscore to match end of mods

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
d0dfdf704d tgsi: populate precise
Only implemented for glsl->tgsi. Other converters just set precise to 0.

v2: remove precise paramter from ureg_tex_insn and ureg_memory_insn

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
28a5e7104e st/glsl_to_tgsi: handle precise modifier
all subexpression inside an ir_assignment needs to be tagged as precise.

v2: make precise handling more global inside the visitor

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
0341aea2f8 tgsi/dump: print _PRECISE modifier on Instructions
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-21 23:45:18 -04:00
Karol Herbst
af22adee4f tgsi: add precise flag to tgsi_instruction
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-07-21 23:45:18 -04:00
Kenneth Graunke
30d6bc470a i965: Set lower_vote_trivial in vector_nir_options_gen6 too.
There's a second struct for Gen6+.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-21 18:09:01 -07:00
Dave Airlie
22bca8ef19 radv: reset non-syncobj semaphore context after wait.
When I ported from libdrm, I forgot to add the line to reset
the sem, we just need to reset the context.

This fixes a regression in DOOM.

Fixes: 9ac1432a57 ("radv: port to new libdrm API.")
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-22 00:03:26 +01:00
Charmaine Lee
5124bf9823 st/mesa: add destroy_drawable interface
With this patch, the st manager will maintain a hash table for
the active framebuffer interface objects. A destroy_drawable interface
is added to allow the state tracker to notify the st manager to remove
the associated framebuffer interface object from the hash table,
so the associated framebuffer and its resources can be deleted
at framebuffers purge time.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101829
Fixes: 147d7fb772 ("st/mesa: add a winsys buffers list in st_context")
Tested-by: Brad King <brad.king@kitware.com>
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-20 17:34:34 -07:00
Dylan Baker
59a141c95a radv: rebase radv_entrypoints_gen.py on anv_entrypoints_gen.py
The two generators forked from each other, and they remain basically the
same. This rebases the radv version on the anv version, but with the
radv changes ported over. The result is that we get rid of the "cat |"
madness and gain mako, correct "generated by" attributions, and write
files out directly.

The only differences between the output is whitespace and comments.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-07-21 14:27:02 -07:00
Topi Pohjolainen
bf24c3539e i965/miptree: Clean-up unused
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
f5859b45b1 i965/miptree: Switch remaining surfaces to isl
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
38ddb3bc60 i965/miptree: Drop miptree_array_layout in get_isl_dim_layout()
This was only needed for checking gen6 stencil which is already
using isl. One could delete GEN6_HIZ_STENCIL layout altogether
but that will be gone with the rest after a while anyway.

The dim_layout converter is needed even after transition to isl
when setting up surface states - see brw_emit_surface_state().
Hence dropping the unneeded argument separately.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
61c95c94a0 i965/miptree: Relax size alignment for linear surfaces
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
bbd89c1951 i965/miptree: Store compression flag also for isl based
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
f8894fab02 i965/miptree: Check tex image allocation failures
allowing graceful failure instead of crash on assert later on.

This can be hit, for example, on SNB when trying to allocate
8kx8k CUBE_MAP against isl: x-tiled buffer size becomes
2421161984 exceeding the maximum of 1 << 31 == 2147483648.

Another way to hit this on SNB is with multisampling of over
64-bit formats.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
4aea4d6d64 main/teximage: Even on failure use valid format for init()
Otherwise init_teximage_fields_ms() (called by
_mesa_init_teximage_fields()) will always assert as it can't
find valid base format.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
fbfc6a2f67 intel/isl/gen7: Don't allow multisampled surfaces with valign2
There is the same constraintg later on as assert in
isl_gen7_choose_image_alignment_el() so catch it earlier in order
to return error instead of crash.

Needed to avoid crashes with piglits on IVB and HSW:

arb_internalformat_query2.image_format_compatibility_type pname checks
arb_internalformat_query2.all internalformat_<x>_type pname checks
arb_internalformat_query2.max dimensions related pname checks
arb_copy_image.arb_copy_image-formats --samples=2/4/6/8
arb_texture_float.multisample-fast-clear gl_arb_texture_float

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
df9bb8dc05 intel/isl/gen7: Allow msaa with signed integer formats
These formats are already allowed by the i965 GL driver, and the
feature seems to work just fine.

There are tests for multisampled rendering in piglit:
tests/spec/ext_framebuffer_multisample which can be patched to
try 16I/32I in addition to GL_RGBA8I.
IvyBridge passed all tests with all sample numbers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
abb84e3f2d intel/isl/gen7: Allow msaa with 128-bit formats
These formats are already allowed by the i965 GL driver, and the
feature seems to work just fine.

There are tests for multisampled rendering in piglit:
tests/spec/ext_framebuffer_multisample which can be patched to
try GL_RGBA16F/32F/16I/16UI/32I/32UI in addition to GL_RGBA/8I.
IvyBridge passed all tests with all sample numbers and even
with 128-bit formats.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
514d68576d intel/isl: Allow 1D surfaces with compressed formats
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
a40f043034 intel/isl: Align non-tiled horizontally by cache line
in order to support blit engine.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
75f95c710f i965/miptree/gen4: Prepare x-tiled fallback for isl based
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
4ea63fab77 i965/miptree: Prepare non-tiled fallback for isl based
See brw_miptree_choose_tiling().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Topi Pohjolainen
d84f929d85 i965/miptree: Prepare has_color_unresolved() for isl based
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-22 00:14:16 +03:00
Roland Scheidegger
dbde58dd31 gallivm: handle call attributes for llvm < 4.0 in lp_add_function_attr
We had some caller using LLVMAddInstrAttributes, which couldn't be
converted to lp_add_function_attr, because attributes were only handled
for functions in this case, so fix this.
For llvm >= 4.0, this already works correctly.
(radeonsi seems to avoid setting call site attributes prior to llvm 4.0,
the patch then citing it doesn't work when calling intrinsics. But at
least for calling external functions we always used that, albeit only
for actual call attributes, not call parameter attributes, though some
quick test shows llvm seems to handle that as well. The attribute index
is sort of iffy though, since attribute 0 of the call is the actual function,
attribute 1 corresponds to the first parameter of the called function.)
(Verified with GALLIVM_DEBUG=dumpbc plus llvm-dis that the correct
attributes are shown for calls, both for llvm 4.0 and 3.3.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-21 22:46:04 +02:00
Alex Smith
af9d6a8a99 radv: Generate storage image descriptors unconditionally
We can also use storage images internally for resolves, which don't
require TRANSFER_DST usage on the image, so currently we may not create
the needed descriptors.

Just create these descriptors unconditionally.

Fixes: 0e1886efb9 ("radv: Fix descriptors for cube images with VK_IMAGE_USAGE_STORAGE_BIT")
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-22 06:40:29 +10:00
Tim Rowley
d1e7153228 swr/rast: quit using linux-specific gettid()
Linux-specific gettid() syscall shouldn't be used in portable code.
Fix does assume a 1:1 thread:LWP architecture, but works for our
current target platforms and can be revisited later if needed.

Fixes unresolved symbol in linux scons builds.

v2: add comment in code about the 1:1 assumption.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-21 15:37:56 -05:00
Dave Airlie
eaa56eab6d radv: initial support for shared semaphores (v2)
This adds support for sharing semaphores using kernel syncobjects.

Syncobj backed semaphores are used for any semaphore which is
created with external flags, and when a semaphore is imported,
otherwise we use the current non-kernel semaphores.

Temporary imports from syncobj fd are also available, these
just override the current user until the next wait, when the
temp syncobj is dropped.

v2: allocate more chunks upfront, fix off by one after
previous refactor of syncobj setup, remove unnecessary null
check.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-21 21:31:54 +01:00
Dave Airlie
b5670beb31 radv/winsys: add syncobj hooks
This just adds syncobj create/destroy/export/import paths into
the winsys interface.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-21 21:31:54 +01:00
Dave Airlie
80562f2b77 ac/gpu: add code to detect if kernel supports sync objects.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-21 21:31:54 +01:00
Tim Rowley
3e03ecaaf6 swr/rast: fix memory paths for avx512 optimized avx/sse
Source/destination will not be AVX512 aligned, use the
unaligned load/store intrinsics.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-21 15:13:14 -05:00
Tim Rowley
2656a940c2 swr/rast: cache line align hottile buffers
Prevents unalignment crashes with avx512 code on gcc/clang.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-21 15:13:08 -05:00
Tim Rowley
6970f48b6e swr/rast: simdlib changes for clang/gcc
Tested with clang-4.0 and gcc-6.3.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-21 15:12:00 -05:00
Wladimir J. van der Laan
c27cbd88e6 etnaviv: Avoid duplicates in formats table
Remove the following duplicates from the formats table:

- R8G8B8A8_UNORM (V_,_T)
- R8G8B8X8_UNORM (_T,_T)
- DXT3_RGBA (_T,_T)

Only the first has an effect because the _T overrides the V_ initializer,
the latter two were harmless duplications of the same.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-07-21 14:41:07 +02:00
Wladimir J. van der Laan
322b34e57e etnaviv: Add support for ETC2 texture compression
Add support for ETC2 compressed textures in the etnaviv driver.

One step closer towards GL ES 3 support.

For now, treat SRGB and RGB formats the same. It looks like these are
distinguished using a different bit in sampler state, and not part of
the format, but I have not yet been able to confirm this for sure.

(Only enabled on GC3000+ for now, as the GC2000 ETC2 decoder
implementation is buggy and we don't work around that)

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-07-21 12:18:35 +02:00
Wladimir J. van der Laan
c8fe372a15 gallium/util: Implement util_format_is_etc
This is the equivalent of util_format_is_s3tc, but for ETC.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-07-21 12:17:45 +02:00
Chih-Wei Huang
c1a29e104c Android: fix spirv_info.c generation
It's incorrect to use $(LOCAL_PATH) in makefile recipes since it's
changing. The typical way to handle it is to use private variable.
Fortunately in this case we can just simplify them to $^.

See further:
https://patchwork.freedesktop.org/patch/167718/

Also simplify LOCAL_GENERATED_SOURCES.

Fixes: 2dd4e2ec (spirv: Generate spirv_info.c)

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-21 08:48:45 +03:00
Tapani Pälli
b78563f0d0 android: fix libmesa_nir build
current build did not find required include 'spirv_info.h'

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-21 08:47:56 +03:00
Matt Turner
aff108f2fd nir: Optimize find_lsb/imsb/umsb error checks
Two of the ARB_shader_ballot piglit tests hit the find_lsb case,
removing some of the noise allowed me to better debug the test when it
was failing.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-07-20 16:56:50 -07:00
Matt Turner
069bf7c907 i965/fs: Match destination type to size for ballot
No use in taking a 64-bit value when we know the high 32-bits are zero.
2017-07-20 16:56:50 -07:00
Matt Turner
1038d385a9 nir: Reduce destination size of ballot intrinsic when possible
Some hardware, like i965, doesn't support group sizes greater than 32.
In that case, we can reduce the destination size of the ballot
intrinsic, which will simplify our code generation.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
51c1659af8 i965: Enable ARB_shader_ballot on Gen8+
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
782ef30451 i965/fs: Implement ARB_shader_ballot operations
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
8238930510 i965/fs: Do not move MOVs writing the flag outside of control flow
The implementation of ballotARB() will start by zeroing the flags
register. So, a doing something like

        if (gl_SubGroupInvocationARB % 2u == 0u) {
                ... = ballotARB(true);
		[...]
        } else {
                ... = ballotARB(true);
		[...]
	}

(like fs-ballot-if-else.shader_test does) would generate identical MOVs
to the same destination (the flag register!), and we definitely do not
want to pull that out of the control flow.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Francisco Jerez
f1b7c47913 i965/fs: Handle explicit flag sources in flags_read()
The implementations of the ARB_shader_ballot intrinsics will explicitly
read the flag as a source register.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-20 16:56:49 -07:00
Matt Turner
3e7b8f6cd4 nir: Add pass to scalarize read_invocation/read_first_invocation
i965 will want these to be scalar operations.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
43ef75b394 nir: Add system values from ARB_shader_ballot
We already had a channel_num system value, which I'm renaming to
subgroup_invocation to match the rest of the new system values.

Note that while ballotARB(true) will return zeros in the high 32-bits on
systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB
variables do not consider whether channels are enabled. See issue (1) of
ARB_shader_ballot.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
636fe4d1c6 nir: Add intrinsics from ARB_shader_ballot
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
41437f1b77 i965: Enable ARB_shader_group_vote 2017-07-20 16:56:49 -07:00
Matt Turner
ee9fa4ac18 i965/fs: Implement ARB_shader_group_vote operations
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Francisco Jerez
93dc736f4e i965/fs: Handle explicit flag destinations in flags_written()
The implementations of the ARB_shader_group_vote intrinsics will
explicitly write the flag as the destination register.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-20 16:56:49 -07:00
Matt Turner
30b72f4126 i965/vec4: Lower ARB_shader_group_vote intrinsics
I don't expect anyone is going to care about using this in vec4 programs
(vertex/tessellation/geometry on Gen6/7), no one has come up with a good
way to implement it much less test it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
742cc6118a nir: Support lowering vote intrinsics
... trivially (as allowed by the spec!) by reusing the existing
nir_opt_intrinsics code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
d4c9d6a3b2 nir: Add pass to optimize intrinsics
Specifically, constant fold intrinsics from ARB_shader_group_vote, but I
suspect it'll be useful for other things in the future.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Matt Turner
ba2fbbf1c0 nir: Add intrinsics from ARB_shader_group_vote
These are intrinsics rather than opcodes, because they operate across
channels.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:56:49 -07:00
Chris Wilson
0e6ad379dd i965: Rename batch->exec_objects to validation_list
Within i965, we have many different objects and confusingly when
submitting an execbuf we have lists of both our internal objects and a
list of the kernel's drm_i915_gem_exec_object with very similar names.
Rename the kernel's validation list to avoid the collison as it is only
used for interfacing with the kernel and so a peripheral use of
"object".

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-20 16:15:32 -07:00
Kenneth Graunke
8696c3e997 Revert "i965: Call intel_prepare_render() from intel_update_state()"
This reverts commit b7153c3e9f.

The point of that commit was to ensure intel_prepare_render() occurred
before color resolves on the current framebuffer.  In 0673bbfd9b
(i965: Move surface resolves back to draw/dispatch time), Jason moved
brw_predraw_resolve_framebuffer back to draw time, which is already
after a intel_prepare_render() call.  So, this is no longer necessary.

Furthermore, it caused problems.  "mpv" would only display a small
corner of movies, and Android started failing camera CTS tests.

This is because intel_prepare_render() ended up handling DRI2 events
which caused the drawable to be resized at an inopportune time, flagging
ctx->NewState |= _NEW_BUFFERS, but at a point where we've already copied
ctx->NewState, and failed to notice the newly set flag.

The lack of _NEW_BUFFERS caused us to skip 3DSTATE_DRAWING_RECTANGLE,
so the drawing ended up being clipped to an outdated framebuffer size.

Just drop the hack and go back to handling this at the proper time.

Thanks to Matti Hämäläinen (ccr), Tomasz Figa (tfiga), and Tapani Palli
for reporting these issues.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101558
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101704
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2017-07-20 16:10:10 -07:00
Samuel Pitoiset
e87e4f239f mesa: remove useless assert in _mesa_TextureView()
Already checked in _mesa_choose_texture_format().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-07-20 16:14:36 +02:00
Samuel Pitoiset
1ebe4305fd mesa: remove duplicated code around framebuffer_renderbuffer()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-07-20 16:14:34 +02:00
Samuel Pitoiset
0752428a32 mesa: remove one extra check in _mesa_DeleteTextures()
Already checked above.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-07-20 16:14:32 +02:00
Samuel Pitoiset
ca7085061d mesa: make _mesa_generate_texture_mipmap() static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-07-20 16:14:30 +02:00
Samuel Pitoiset
a1819704c8 mesa: inline save_array_object()
No need to check if ID is not 0 because _mesa_HashFindFreeKeyBlock()
can't generate this value.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-07-20 16:14:28 +02:00
Samuel Pitoiset
015c6eba52 mesa: inline remove_array_object()
No need to check if ID is not 0 because _mesa_lookup_vao()
already prevents this to happen.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-07-20 16:14:26 +02:00
Samuel Pitoiset
ea13aa8530 mesa: tidy up _mesa_DeleteVertexArrays()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-07-20 16:14:24 +02:00
Samuel Pitoiset
1c6c42c289 mesa: remove useless assert in texture_storage()
Already checked in _mesa_choose_texture_format().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-07-20 16:14:22 +02:00
Samuel Pitoiset
f95420d74e mesa: pass the 'caller' function to texstorage()
To be consistent with texturestorage().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-07-20 16:14:20 +02:00
Samuel Pitoiset
9f9441535a mesa: make _mesa_texture_storage() static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-07-20 16:14:18 +02:00
Topi Pohjolainen
67b53ee418 i965: Represent depth surfaces with isl
v2 (Jason):
   - s/separate_stencil_surface/make_separate_stencil_surface/
   - drop the check for separate stencil when wrapping an
     existing buffer object with miptree. This is dead code as
     the first needs_separate_stencil() checks is
     MIPTREE_LAYOUT_FOR_BO-flag and says no.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
05232a2361 i965: Drop redundant check for non-tiled depth buffer
Depth buffers are always Y-tiled. In brw_miptree_choose_tiling()
driver opts to use linear buffers for small and 1D but this does
not apply for depth - GL_DEPTH_COMPONENT and GL_DEPTH_STENCIL_EXT
are considered first.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
c4ac0d4949 intel/isl/gen4: Represent cube maps with 3D layout
v2 (Jason): Check for !ISL_SURF_DIM_3D instead of CUBE_BIT.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
f9d3880346 i965/miptree: Prepare 3D surfaces with physical 2D layout
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
ba4d0593f9 i965/miptree: Prepare aux state map for isl based
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
bec048d9e2 i965/miptree: Represent y-tiled stencil copies with isl
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
f69a2ffe44 i965/miptree: Represent w-tiled stencil surfaces with isl
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
c84cb81771 i965/miptree: Prepare compressed offsets for isl based
v2 (Jason): Simply switch to isl_surf_get_image_offset_el()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
40e75aba73 i965/miptree: Add support for imported bo offsets for isl based
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
0f795effe5 i965/fbo: Add support for isl-based miptrees in rb wrapper
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
066dc9335e i965: Prepare image setup from miptree for isl based
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
76a3ce8fa5 i965: Prepare tex, img and rt state emission for isl based miptrees
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
63a43f4161 i965: Refactor miptree to isl converter and adjustment
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
f1caa6194e i965: Prepare tex (sub)image for isl based
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
7e5c8e593b i965/wm: Prepare image surfaces for isl based
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
59bf765c36 i965/wm: Fix number of layers in 3D images
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
07caa5932c i965/miptree: Prepare intel_miptree_copy() for isl based
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
a844e6a8f4 i965: Prepare blit engine for isl based miptrees
v2: Do not concern cpp, pitch and tiling which are already
    transitioned.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
8e1494f139 i965/miptree: Store chars-per-pixel even for isl based
This will significantly reduce chrun when switching remaaining
surface types to isl. After the full transition it will be easier
to calculate on-demand and drop the helper member in miptree.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
b95caac539 i965/miptree: Switch to isl_surf::row_pitch
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
37152a5596 i965/miptree: Take interleaving into account in stencil pitch
This makes intel_mipmap_tree::pitch and isl_surf::row_pitch
semantically equivalent.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
43c3b5b523 i965/miptree: Switch to isl_surf::tiling
v2 (Daniel): Use isl tiling converters instead of introducing local.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
171b72542c intel/isl: Add i915 to isl_tiling converter
v2: s/i915_tiling_to_isl_tiling(/isl_tiling_from_i915_tiling/

Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
d8521b9960 i965/miptree: Use isl_tiling_to_i915_tiling()
and drop local copy.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
a92e6ff315 i965/miptree: Switch to isl_surf::samples
v2 (Jason):
   - Don't trigger miptree re-creation in vain later on with ISL
     based. Core GL uses zero to indicate single sampled while
     ISL uses one - this would cause intel_miptree_match_image()
     to always fail.
   - Now that native miptree is already using sample number of
     one, there is no need for MAX2() when converting to ISL.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
76e2f390f9 i965/miptree: Use num_samples of 1 instead of 0 for single-sampled
Patch moves "assert(brw->num_samples <= 16)" from
emit_3dstate_multisample2() to upload_multisample_state(). Latter
is the only caller of the former and passes "brw->num_samples"
as argument. Therefore it is clearer to assert in the caller.

Possible bug fix in genX(emit_3dstate_multisample2) which
doesn't have a case for num_samples == 0 in the switch
statement.

It should be noted that intel_miptree_map()/unmap() now checks
additionally for "mt->surf.samples == 1" in order to support gen6
stencil which is already transitioned to ISL. This will go away in
next patch when native miptrees start to use isl_surf::samples as
well.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Topi Pohjolainen
0e8b81af7b i965/miptree: Switch to isl_surf::msaa_layout
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-20 11:32:21 +03:00
Bas Nieuwenhuizen
21d777a122 radv: Add support for VK_KHR_variable_pointers.
Just a trivial enable.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-07-20 09:13:01 +02:00
Bas Nieuwenhuizen
31469c0265 radv: Add VK_KHR_storage_buffer_storage_class support.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-07-20 09:13:01 +02:00
Brian Paul
98240f6399 mesa: check API profile for GL_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION
If we have a compat profile context, it means that GL_QUADS[_STRIP] are
supported so this query makes sense.  It's also legal for 3.2 core profile
because of a spec bug.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-07-19 20:09:09 -06:00
Dave Airlie
9ac1432a57 radv: port to new libdrm API.
This bumps the libdrm requirement for amdgpu to the 2.4.82.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-20 01:56:04 +01:00
Dave Airlie
aee382510e radv: introduce some wrapper in cs code to make porting off libdrm_amdgpu easier.
This just introduces a central semaphore info struct, and passes it around,
and introduces some wrappers that will make porting off libdrm_amdgpu easier.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-20 01:55:36 +01:00
Tim Rowley
1cb5a6061c configure/swr: add KNL and SKX architecture targets
Not built by default.  Currently only builds with icc.

v2:
 * document knl,skx possibilities for swr_archs
 * merge with changed loader lib selection code

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-19 15:12:07 -05:00
Tim Rowley
f42186b01d configure/swr: configurable swr architectures
Allow configuration of the SWR architecture depend libraries
we build for with --with-swr-archs.  Maintains current behavior
by defaulting to avx,avx2.

Scons changes made to make it still build and work, but
without the changes for configuring which architectures.

v2:
 * add missing comma for swr_archs default
 * check that at least one architecture is enabled
 * modify loader logic to make it clearer how to add archs

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-19 15:12:07 -05:00
Tim Rowley
131b9f644c gallium/util: fix nondeterministic avx512 detection
cpuid.7 requires cx=0 to select the extended feature leaf.

avx512 detection was using the non-indexed cpuid resulting
in random non-detection of avx512.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-07-19 15:12:07 -05:00
Marek Olšák
19c101f704 drirc: whitelist War Thunder (Wine) for glthread
Nominated by František Zatloukal <zatloukal.frantisek@gmail.com>
2017-07-19 16:14:47 -04:00
Andres Gomez
9375c1d896 travis: add missing wayland-protocols
> checking for WAYLAND... no
>
> configure: error: Package requirements (wayland-client >= 1.11 wayland-server >= 1.11 wayland-protocols >= 1.8) were not met:
>
> No package 'wayland-protocols' found
>
> Consider adjusting the PKG_CONFIG_PATH environment variable if you
> installed software in a non-standard prefix.
>
> Alternatively, you may set the environment variables WAYLAND_CFLAGS
> and WAYLAND_LIBS to avoid the need to call pkg-config.
> See the pkg-config man page for more details.

Also, added extra path to PKG_CONFIG_PATH env variable.

Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-19 22:17:41 +03:00
Chad Versace
5d69052113 anv/image: Fix VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT
We incorrectly detected VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT.  We looked
for the bit in VkImageCreateInfo::usage, but it's actually in
VkImageCreateInfo::flags.

Found by assertion failures while enabling VK_ANDROID_native_buffer.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-19 11:25:50 -07:00
Andres Gomez
80a0c9745c docs: update master's release notes, news and calendar commit
This reflects closer what we are actually doing.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-19 19:10:10 +03:00
Andres Gomez
e6f455646a docs: avoid overwrite of LD_LIBRARY_PATH during basic testing
The LD_LIBRARY_PATH environment variable could be already defined so
we extend it and restore it rather than just overwriting it.

v2:
 - Unset the __old_ld helper variable when we are done with it.
 - Corrected test for and escaping of variables (Eric).

v3: Remove unneeded variable (Emil).

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-19 19:10:10 +03:00
Andres Gomez
8c1d87b251 docs: add instructions to specify LLVM version for basic testing
The "Perform basic testing" and "Use the release.sh script from xorg
util-modular" sections provide some instructions to do so. We add now
some comments in order to use a recent enough LLVM version to run
dist/distcheck and the automake generated binaries.

v2: Suggested the need to define LLVM_CONFIG also before running the
    release.sh script.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-19 19:10:10 +03:00
Eric Engestrom
50d478036a egl: fix line continuation
Trailing space after the backslash meant the rest of the AM_CFLAGS lines
were no longer included.
This has been silently ignored because of the next line starting with
a `-` dash, instructing make to be silent about that line.

Fixes: 02cc359372 "egl/wayland: Use linux-dmabuf interface for buffers"
Cc: Daniel Stone <daniels@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-19 15:51:54 +01:00
Eric Engestrom
21c2aca6b7 gbm: fix typo
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-19 15:51:54 +01:00
Eric Engestrom
8616a9bd35 configure.ac: fix whitespace
Whitespace-only change (`diff -w` is empty).

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-19 15:51:54 +01:00
Lucas Stach
c8a0660ab4 etnaviv: advertise supported dmabuf modifiers
Simply advertise all supported modifiers, independent of the format.
Special formats, like compressed, which don't support all those modifiers
are already culled from the dmabuf format list, as we don't support
the render target binding for them.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-19 16:26:50 +02:00
Lucas Stach
58c3ce071c etnaviv: implement resource creation with modifier
This allows to create buffers with a specific tiling layout, which is primarily
used by GBM to allocate the EGL back buffers with the correct tiling/modifier
for use with the scanout engines.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-19 16:26:50 +02:00
Lucas Stach
d06cfaf4fc etnaviv: fill in modifier in etna_resource_get_handle
This allows the state trackers to know the tiling layout of the
resource and pass this through the various userspace protocols.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-19 16:26:50 +02:00
Lucas Stach
eebf6ee6e9 etnaviv: fold etna_screen_bo_get_handle into etna_resource_get_handle
There is no point in keeping this indirection. Makes the code easier to
follow.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com> (v1)
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-19 16:26:50 +02:00
Lucas Stach
8a44aa5043 etnaviv: implement resource import with modifier
This implements resource import with modifier, deriving the correct
internal layout from the modifier and constructing a render compatible
base resource if needed.

This removes the special cases for DDX and renderonly scanout allocated
buffers, as the linear modifier is enough to trigger correct handling
of those buffers.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-07-19 16:26:49 +02:00
Lucas Stach
605007d5c7 etnaviv: also update textures from external resources
This reworks the logic in etna_update_sampler_source to select the
newest resource view for updating the texture view. This should make
the logic easier to follow and fixes texture updates from imported
dma-bufs.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-07-19 16:26:49 +02:00
Lucas Stach
836d22a2fb etnaviv: increment correct seqno for external resources
If we import a dma-buf with a sampler/pixel pipe incompatible modifier,
the imported buffer will end up in an external resource view. As
resource_changed signals the change of the imported resource, we need
to update the external view seqno, instead of the base resource seqno.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-07-19 16:26:49 +02:00
Lucas Stach
b158ccf1d9 etnaviv: pad scanout buffer size to RS alignment
This fixes failures to import the scanout buffer with screen resolutions
that don't satisfy the RS alignment restrictions, like 1680x1050.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-07-19 16:26:49 +02:00
Lucas Stach
68ec876a25 etnaviv: add helper to work out RS alignment
The minimum RS alignment calculation is needed in various places.
Extract a helper to avoid open-coding the calcuation at every site.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-07-19 16:26:49 +02:00
Lucas Stach
c481880899 renderonly/etnaviv: stop importing resource from renderonly
The current way of importing the resource from renderonly after allocation
is opaque and is taking away control from the driver, which it needs in
order to implement more advanced scenarios than the simple linear scanout
with matching stride alignments.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
2017-07-19 16:26:49 +02:00
Lucas Stach
a9fad437f7 configure.ac: bump required etnaviv libdrm version to 2.4.82
The following changes need the modifier definitions for the Vivante tiled
formats, which are shipped with libdrm 2.4.82.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-07-19 16:26:49 +02:00
Emil Velikov
b359957469 dri/common: use designated initializers for OptConfElems
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-19 13:28:54 +01:00
Tomasz Figa
107b9c70d0 gallium: auxiliary: Fix standalone Android build of u_cpu_detect (v2)
Commit 463b7d0332c5("gallium: Enable ARM NEON CPU detection.")
introduced CPU feature detection based Android cpufeatures library.
Unfortunately it also added an assumption that if PIPE_OS_ANDROID is
defined, the library is also available, which is not true for the
standalone build without using Android build system.

Fix it by defining HAS_ANDROID_CPUFEATURES in Android.mk and replacing
respective #ifdefs to use it instead.

v2:
 - Add a comment explaining why the separate flag is needed (Emil).

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-19 13:28:23 +01:00
Emil Velikov
644ac2b780 egl: propagate EGL_BAD_ATTRIBUTE during EGLImage attr parsing
Earlier commit refactored/split the parsing into separate hunks.
While no functional change was intended, it did not attribute that
different error is set when the attrib. value is incorrect.

Fixes:  3ee2be4113 ("egl: split _eglParseImageAttribList into per
extension functions")
Cc: Michel Dänzer <michel@daenzer.net>
Reported-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-19 13:06:50 +01:00
Emil Velikov
a0755f2e6a swr: remove unneeded fallback strcasecmp define
The last user of the function was removed with earlier commit.

Fixes: 50842e8a93 ("swr: replace gallium->swr format enum conversion")
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-07-19 13:06:50 +01:00
Emil Velikov
8e25e23dae st/dri: list __DRI2_FENCE extension only where needed
The extension should be present (if applicable) in the list returned by
getExtensions(). AFAICT no loader has ever looked for it in
__driDriverExtensions/__driDriverGetExtensions.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2017-07-19 13:06:50 +01:00
Emil Velikov
7791949dad swrast: add dri2ConfigQueryExtension to the correct extension list
The extension should be in the list as returned by getExtensions().
Seems to have gone unnoticed since close to nobody wants to change the
vblank mode for the software driver.

v2: Rebase

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
2017-07-19 13:06:50 +01:00
Emil Velikov
225e45d45c radeon: remove local vblank_mode option
Analogous to previous commits.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2017-07-19 13:06:50 +01:00
Emil Velikov
6ba3fd2d6d i915: remove local vblank_mode option
Analogous to previous commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2017-07-19 13:06:50 +01:00
Emil Velikov
b655205ff2 i965: remove local vblank_mode option
The option is only queried from the loader, which has access to the
dri common code in src/mesa/drivers/dri/common/.

One could grant the loader access to brw_config_options but even
then, having the same option in both places is not a good idea.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2017-07-19 13:06:50 +01:00
Gwan-gyeong Mun
3f6cc931eb egl/dri2: remove unused buffer_count variable
It removes unused buffer_count variable from dri2_egl_surface.
And it polishes the assert of dri2_drm_get_buffers_with_format().

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-19 13:06:50 +01:00
Gwan-gyeong Mun
faada25f47 egl/drm: Format code in platform_drm.c according to style guide.
This is a tiny housekeeping patch which does the following:
  * Limit lines to 78 or fewer characters.
According to the mesa coding style guidelines.

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-19 13:06:50 +01:00
Gwan-gyeong Mun
7c89585551 egl/drm: add going out of the loop when the designated buffer is found
Because the color_buffers have a each unique bo, if the designated buffer is
found, release_buffer() can go out the loop which seaches the buffer.

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-19 13:06:50 +01:00
Gwan-gyeong Mun
89505f7ead gbm: fix typo in doxygen comment
This fixes the misspelling of gbm_bo_import api param.

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-19 13:06:50 +01:00
Daniel Stone
46dace14ff egl: Add MKDIR_GEN definition
Adding linux-dmabuf Wayland protocol files as generated did the right
thing, by prepending $(MKDIR_GEN) so autotools didn't try to write into
a build directory which didn't yet exist.

Unfortunately MKDIR_GEN needs to be defined in every Makefile it's used
in (which we do now), or alternately defined and substituted in
configure.ac (which we don't do), and src/egl/ didn't actually have it
from either method. As unset variables expand to nothing, it was
silently being skipped.

Copy & paste the defintion to make sure drivers/dri2/ exists before we
try to generate files into it.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Nick Sarnie <commendsarnex@gmail.com>
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
2017-07-19 13:05:02 +01:00
Kenneth Graunke
2412c4c81e util: Make CLAMP turn NaN into MIN.
The previous implementation of CLAMP() allowed NaN to pass through
unscathed, by failing both comparisons.  NaN isn't exactly a value
between MIN and MAX, which can break the assumptions of many callers.

This patch changes CLAMP to convert NaN to MIN, arbitrarily.  Callers
that need NaN to be handled in a specific manner should probably open
code something, or use a macro specifically designed to do that.

Section 2.3.4.1 of the OpenGL 4.5 spec says:

   "Any representable floating-point value is legal as input to a GL
    command that requires floating-point data. The result of providing a
    value that is not a floating-point number to such a command is
    unspecified, but must not lead to GL interruption or termination.
    In IEEE arithmetic, for example, providing a negative zero or a
    denormalized number to a GL command yields predictable results,
    while providing a NaN or an infinity yields unspecified results."

While CLAMP may apply to more than just GL inputs, it seems reasonable
to follow those rules, and allow MIN as an "unspecified result".

This prevents assertion failures in i965 when running the games
"XCOM: Enemy Unknown" and "XCOM: Enemy Within", which call

   glTexEnv(GL_TEXTURE_FILTER_CONTROL_EXT, GL_TEXTURE_LOD_BIAS_EXT,
            -nan(0x7ffff3));

presumably unintentionally.  i965 clamps the LOD bias to be in range,
and asserts that it's in the proper range when converting to fixed
point.  NaN is not, so it crashed.  We'd like to at least avoid that.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-07-18 23:48:46 -07:00
Kenneth Graunke
0320bb2c6c nir: Use nir_src_copy instead of direct assignments.
If the source is an indirect register, there is ralloc'd data.  Copying
with a direct assignment will copy the pointer, but the data will still
belong to the old instruction's memory context.  Since we're lowering
and throwing away instructions, that could free the data by mistake.

Instead, use nir_src_copy, which properly handles this.

This is admittedly not a common case, so I think the bug is real,
but unlikely to be hit.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-18 23:44:50 -07:00
Timothy Arceri
57165f2ef8 glsl: disable array splitting for AoA
While it produces functioning code the pass creates worse code
for arrays of arrays. See the comment added in this patch for more
detail.

V2: skip splitting of AoA of matrices too.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-19 11:06:23 +10:00
Timothy Arceri
3f0fb23b03 nir: fix nir_opt_copy_prop_vars() for arrays of arrays
Previously we only incremented the guide for a single
dimension/wildcard.

V2: rework logic to avoid code duplication

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
2017-07-19 11:06:23 +10:00
Jason Ekstrand
ecf91898e0 nir/vars_to_ssa: Handle missing struct members in foreach_deref_node
This can happen if, for instance, you have an array of structs and there
are both direct and wildcard references to the same struct and some
members only have direct or only have indirect.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
2017-07-19 11:06:23 +10:00
Kenneth Graunke
d9015b1eab i965/blorp: Use the return value of brw_emit_reloc.
This guarantees that the value written in the batch matches the
value recorded in the relocation entry.

(Chris Wilson wrote an identical patch as well.)
2017-07-18 15:53:33 -07:00
Kenneth Graunke
77844406d5 i965: Delete dead brw_program_reloc function.
Rafael eliminated the last use of brw_program_reloc recently.
2017-07-18 15:45:26 -07:00
Rafael Antognolli
d883ec0400 i965: Convert WM_STATE to genxml on gen4-5.
The code doesn't get exactly a lot simpler but at least it is in a single
place, and we delete more than we add.

Another good point is that you get rid of struct brw_wm_unit_state
which was a third mechanism for encoding GEN state. We used to have
GENXML, manual packing and these bitfield structs. Now we're down to
just GENXML and some manual packing. (Khristian)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-18 15:45:26 -07:00
Rafael Antognolli
e490382326 i965: Convert CLIP_STATE to genxml.
Add the code into its own function and atom, since almost nothing is
shared with GEN >= 6.

v2: Split GEN <=5 and GEN >= 6 into separate functions (Ken).
v3: Minor tidying by Ken.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-18 15:45:26 -07:00
Daniel Stone
02cc359372 egl/wayland: Use linux-dmabuf interface for buffers
When available, use the zwp_linux_dambuf_v1 interface to create buffers,
which allows multiple planes and buffer modifiers to be used.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 22:16:21 +01:00
Daniel Stone
cfaca5742e egl/wayland: Remove duplicate wl_buffer creation code
Now create_wl_buffer is generic enough, we can use it for the
EGL_WL_create_wayland_buffer_from_image extension.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 22:16:21 +01:00
Daniel Stone
6595c69951 egl/wayland: Remove more surface specifics from create_wl_buffer
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 22:16:21 +01:00
Daniel Stone
c4a1c7a2eb egl/wayland: Make create_wl_buffer more generic
Remove surface-specific code from create_wl_buffer, so it's now just a
generic translation from DRIimage to wl_buffer.

Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 22:16:20 +01:00
Daniel Stone
7f157a21f1 gbm: Remove is_planar_format dead code
This was only used in create_dumb() to blacklist planar formats.
However, the start of the function already whitelists ARGB8888 (cursor)
and XRGB8888 (scanout), and nothing else. So this entire function can be
removed.

Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 22:16:20 +01:00
Daniel Stone
7ac09e0c55 gbm: Check harder for supported formats
Luckily no-one really used the is_format_supported() call, because it
only supported three formats.

Also, since buffers with alpha can be displayed on planes, stop banning
them from use.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 22:16:20 +01:00
Daniel Stone
2ede894384 gbm: Pull out FourCC <-> DRIimage format table
Rather than duplicated (yet asymmetric) open-coded tables, pull them out
to a common structure.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 22:16:20 +01:00
Daniel Stone
6f8d8b17a1 gbm: Axe buffer import format conversion table
Wayland buffers coming from wl_drm use the WL_DRM_FORMAT_* enums, which
are identical to GBM_FORMAT_*. Similarly, FD imports do not need to
convert between GBM and DRI FourCC, since they are (almost) completely
compatible.

This widens the formats accepted by gbm_bo_import() when importing
wl_buffers; previously, only XRGB8888, ARGB8888, RGB565 and YUYV were
supported.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 22:16:20 +01:00
Topi Pohjolainen
28ccf8587e i965/gen4: Set tile offsets to zero after depth rebase
Current logic calls intel_renderbuffer_set_draw_offset() which in
turn tries to calculate x and y offset against layer/level settings
that are against the original miptree actually having sufficient
levels/layers. This returns correctly x=0 y=0 regardless of the given
layer/level only because one calls intel_miptree_get_image_offset()
which goes and consults miptree offset table which in turn luckily
contains entries for max-mipmap levels, all initialised to zero even
in case of non-mipmapped.

This patch stops consulting the table and simply sets the draw
offsets to zero that are compatible with the single slice miptree
backing the renderbuffer.
This prepares for ISL based miptrees that calculate offsets
on-demand and do not tolerate levels beyond what the miptree has.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:36:13 +03:00
Topi Pohjolainen
7507563291 i965: Refactor check for separate stencil
v2 (Jason): s/needs_stencil/needs_separate_stencil/

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:36:13 +03:00
Topi Pohjolainen
0926fb69a4 intel/blorp/gen4: Drop cube map flag for single face copy
This will falsely trigger an assert on number of layers once
isl is used for 3D layouts of Gen4 cube maps.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:36:13 +03:00
Topi Pohjolainen
91608e4ac1 i965/wm: Use level offsets directly
dropping dependency to slice table.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:36:13 +03:00
Topi Pohjolainen
5ff1d76caa i965: Use offset helper in intel_readpixels_tiled_memcpy()
providing support for isl based.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:36:13 +03:00
Topi Pohjolainen
b8d63f50ee i965/miptree: Pass flags instead of explicit tiling to surface creator
allowing one to use isl tiling filter.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:36:13 +03:00
Topi Pohjolainen
f23599fa5b i965/miptree: Add pitch override for imported buffer objects
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:36:13 +03:00
Topi Pohjolainen
d401bd3e8f i965/miptree: Stop setting total_width/height for existing bo
Now that image surface vertical slice calculator doesn't depend
on total_height, total dimensions are only needed when new buffer
objects are created. Therefore one can safely ignore them when
miptrees are created for already exisiting buffer objects.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:36:13 +03:00
Topi Pohjolainen
0a287e4501 i965/wm: Use isl for filling tex image parameters
This helps to drop dependency to miptree::total_height which is
used in brw_miptree_get_vertical_slice_pitch().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:36:03 +03:00
Topi Pohjolainen
4733891e51 intel/isl: Take 3D surfaces into account in image params
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:35:44 +03:00
Topi Pohjolainen
2309363868 i965/miptree: Check for miptree_create() failures
Rest of the function assumes it always succeeds.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:35:03 +03:00
Topi Pohjolainen
0320d9bd84 i965/miptree: Do not rely on msaa type to decide if aux is needed
Once the driver moves to ISL both compressed and uncompressed have
the same type. One needs to tell them apart by other means. This
can be done by checking the existence of mcs_buf.

There is a short period of time within intel_miptree_create()
where mcs_buf doesn't exist yet (between calls to
intel_miptree_create_layout() and intel_miptree_alloc_mcs()).
First compute_msaa_layout() makes the decision if compression is
to be used and sets the msaa_layout type. Then based on the type
one sets aux_usage and finally decides if mcs_buf is needed.

This patch duplicates the logic in compute_msaa_layout() and uses
that to make the decision on aux_usage and mcs_buf allocation.
Most of the original logic in compute_msaa_layout() will be gone
in later patch leaving only one version.

Elsewhere only brw_populate_sampler_prog_key_data() needs to know
if compression is used based on the msaa_type. This is now
replaced with consideration for number of samples and existence
of mcs_buf. All other occurrences consider CMS || UMS which can
be represented using single the type of ISL_MSAA_LAYOUT_ARRAY
without any tweaks.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:35:03 +03:00
Topi Pohjolainen
d86244fb16 i965: Make irb::mt_layer logical instead of physical
same as irb::layer_count. In case of copies and blits msaa
surfacas already fall to blorp which natively works with logical
slices.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:35:03 +03:00
Topi Pohjolainen
b9400b7ecd i965/tex: Use offset helper instead of accessing table directly
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:35:03 +03:00
Topi Pohjolainen
ee97b78a3e i965: Mark read-only args as const in intel_miptree_supports_hiz()
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:35:03 +03:00
Topi Pohjolainen
38f3d03ea9 i965/miptree: Use > 1 instead of > 0 to check for multisampling
Checking against zero currently works as single sampling is
represented with zero. Once one moves to isl single sampling
really has sample number of one.

This keeps later patches simpler.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:35:03 +03:00
Topi Pohjolainen
8fd18642e7 i965/miptree: Set refcount before failing via _release()
Otherwise one wraps uint to UINT_MAX via -1.

Fixes: 3cf470f2b6 ("i965: Add isl based miptree creator")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-18 21:35:03 +03:00
Kenneth Graunke
c2bb39d8d6 build: Add $(top_srcdir)/src/compiler/spirv to AM_CPPFLAGS
Generated C files try to include spirv_info.h.  For in-tree builds,
the header is in the same directory, so it just works.  For out-of-tree
builds, we need to look for it in srcdir rather than builddir.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101831
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 11:14:47 -07:00
Marek Olšák
ecec21add2 radeonsi: add back the USE_MININUM_PRIORITY flag to the low-prio compiler queue
Accidentally removed in 9f320e0a38.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-18 13:13:34 -04:00
Jason Ekstrand
2f52d2dc97 compiler/spirv: Add a .gitignore and ignore spirv_info.c 2017-07-18 09:49:13 -07:00
Jason Ekstrand
cd9fd68a50 anv: Advertise support for VK_KHR_variable_pointers
We don't support the general version yet because that requires us to
lower shared variables up-front in SPIR-V -> NIR.  This shouldn't be a
whole lot of work but it's not something we support today.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:13 -07:00
Jason Ekstrand
bc9319583a anv: Advertise support for VK_KHR_storage_buffer_storage_class
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:13 -07:00
Jason Ekstrand
f2fe74a462 nir/spirv: Add support for SPV_KHR_variable_pointers
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00
Jason Ekstrand
182950ceaf nir/spirv: Add a helper for pushing SSA values
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00
Jason Ekstrand
868456fbf7 nir/spirv: Implement OpPtrAccessChain for buffers
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00
Jason Ekstrand
a968889237 spirv/nir: Add some useful asserts for type decorations
Now that vtn_type has piles of unions, we should assert sanity before
setting fields that may stomp others.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00
Jason Ekstrand
999918bd01 spirv: Add support for the StorageBuffer storage class
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00
Ian Romanick
2dd4e2ece3 spirv: Generate spirv_info.c
The old table based spirv_*_to_string functions would return NULL for
any values "inside" the table that didn't have entries.  The tables also
needed to be updated by hand each time a new spirv.h was imported.
Generate the file instead.

v2: Make this script work more like src/mesa/main/format_fallback.py.
Suggested by Jason.  Remove SCons supports.  Suggested by Jason and
Emil.  Put all the build work in Makefile.nir.am in lieu of adding a new
Makefile.spirv.am.  Suggested by Emil.  Add support for Android builds
based on code provided by Emil.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-18 09:43:12 -07:00
Ian Romanick
de765ec9dc spirv: Import the lastest 1.0.2 JSON from Khronos
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-18 09:43:12 -07:00
Jason Ekstrand
7141e8105a spirv: Import the latest 1.2 header from Khronos
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-18 09:43:12 -07:00
Brian Paul
9d8ebf1c77 mesa: whitespace fixes in get.c
Remove trailing whitespace.
Replace tabs with spaces.
Trivial.
2017-07-18 08:32:29 -06:00
Brian Paul
3d49fcb3e5 mesa: fix GL_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION_EXT query
This query is not allowed in GL core profile 3.3 and later (since
GL_QUADS and GL_QUAD_STRIP are disallowed).  The query was (mistakenly)
supported in GL 3.2.  This fixes the glGet error test accordingly.

Reviewed-by: Neha Bhende<bhenden@vmware.com>
2017-07-18 08:32:29 -06:00
Eric Engestrom
a522ce9977 vulkan/util: fix typo in comment
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-18 13:56:04 +01:00
Samuel Pitoiset
838b9c21d4 mapi: add missing no_error tag to glBlitNamedFramebuffer()
Fixes: 6fedb31785 ("mesa: add KHR_no_error support for glBlitNamedFramebuffer()")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-07-18 10:07:34 +02:00
Alex Smith
f25c7f9f3e radv: Set the RADEON_SURF_OPTIMIZE_FOR_SPACE flag for images
This looks like a regression from df30123794 ("radv: use
ac_compute_surface"). Before that, the opt4Space addrlib flag was set
to true unless the image has FMASK (ac_compute_surface will similarly
only set that flag for images without FMASK).

This saves multiple gigabytes of VRAM on one of our games, and brings
its VRAM utilisation on RADV in line with AMDGPU-PRO and NVIDIA.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-18 16:18:35 +10:00
Dave Airlie
687d241559 radv: don't shadow meta_va.
Coverity warned about dead code below, as meta_va was being shadowed.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-18 16:17:28 +10:00
Kenneth Graunke
795848c232 i965: Delete brw_sf_state.c again
"...and stay dead!"

Rafael deleted this file in c2b5a26dc2
(i965: Convert SF_STATE to genxml.) but Marek accidentally brought it
back in commit e7a091936f (mesa: replace
ctx->Polygon._FrontBit with a helper function) when resolving conflicts.

It's not actually even compiled, but it's still here trolling people
into thinking it still exists and needs patching.
2017-07-17 22:46:19 -07:00
Connor Abbott
91dd2ca99f ac/nir: rewrite shared variable handling (v2)
Translate the NIR variables directly to LLVM instead of lowering to a
TGSI-style giant array of vec4's and then back to a variable. This
should fix indirect dereferences, make shared variables more tightly
packed, and make LLVM's alias analysis more precise. This should fix an
upcoming Feral title, which has a compute shader that was failing to
compile because the extra padding made us run out of LDS space.

v2: Combine the previous two patches into one, only use this for shared
variables for now until LLVM becomes smarter.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Alex Smith <asmith@feralinteractive.com>
2017-07-17 14:16:03 -07:00
Jason Ekstrand
7947d05f84 i965: Check if the modifier is supported in select_best_modifier
Otherwise, if a client gave us a list of modifiers that contained a
modifier we understand but which is not supported on the hardware, we
might return that one and then fail to create the image.

Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-17 13:48:38 -07:00
Jason Ekstrand
ec4364d57e i965: Rework the modifier info map
This commit splits the mapping in half.  The modifier_infos table now
only contains the modifier and the since_gen field.  The tiling bits
have been moved into a table in tiling_to_modifier as that's the only
place it was ever used.  The modifier_is_supported function now takes a
devinfo and does the since_gen check.

Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-17 13:48:38 -07:00
Jason Ekstrand
f44171ef62 i965/surface_state: Remove the mcs_buf->offset == 0 restriction
This assert was removed in b0cc55f298 but
got added back in 1a43d774b6, probably by
accident.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-17 13:48:38 -07:00
Jason Ekstrand
828c437078 intel/isl: Add a row_pitch parameter to surf_get_ccs_surf
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-17 13:48:38 -07:00
Jason Ekstrand
766784ef82 i965/miptree: Use BO_ALLOC_ZEROED for CCS_E buffers
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-17 13:48:38 -07:00
Jason Ekstrand
cbee2d1102 i965/screen: Allocate ZEROED BOs for images
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-17 13:48:38 -07:00
Jason Ekstrand
fb0caadc2a i965/bufmgr: Add a BO_ALLOC_ZEROED flag
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-17 13:48:38 -07:00
Jason Ekstrand
14570ecf63 i965/miptree: Replace is_lossless_compressed with mt->aux_usage checks
Now that we have an actual aux_usage field, we no longer need the
complex logic of is_lossless_compressed in order to figure out if a
miptree is CCS_E compressed.  As a side-effect, there is not longer any
need to overload MSAA_LAYOUT_CMS for CCS_E and we can stop doing so.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-17 13:48:38 -07:00
Jason Ekstrand
67143a5037 i965/miptree: Allocate HiZ up-front
HiZ, like MCS and CCS_E, can compress more than just clear colors so we
want it turned on whenever the miptree is being used as a depth
attachment.  It's theoretically possible for someone to create a depth
texture, upload data with glTexSubImage2D, and texture from it without
ever binding it as a depth target.  If this happens, we would end up
wasting a bit of space by allocating a HiZ surface we never use.
However, this is rather unlikely out side of test cases, so we're better
off just allocating it up-front.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-17 13:48:38 -07:00
Jason Ekstrand
138316cc99 i965/miptree: Add an intel_tiling_supports_hiz helper
We need this split for the same reason that we need the split for CCS:
intel_miptree_supports_hiz is called *before* we choose the actual
tiling.  Adding a tiling_supports_hiz helper lets choose_aux_usage
more accurately decide whether or not to enable hiz.  In particular,
this prevents us from enabling HiZ on linear depth buffers.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-17 13:48:38 -07:00
Jason Ekstrand
e6b8877a54 i965/miptree: Gather initial aux allocation into a single function
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-17 13:48:38 -07:00
Charmaine Lee
d8f51bfcbf st/mesa: init winsys buffers list only if context creation succeeds
Fixes piglit test crash when context creation fails.

v2: As suggested by Brian, move the init to st_create_context_priv()

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-11 22:46:55 -07:00
Sinclair Yeh
ed45e8db3c winsys/svga/drm: Enable import/export fence FD
Enable the capability if the DRM supports it.

Hook up mechanism to send and receive fence FD from the DRM.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-17 10:09:25 -06:00
Sinclair Yeh
d554f72c41 winsys/svga/drm: Connect winsys-side fence_* functions
Connect fence_get_fd, fence_create_fd, and fence_server_sync.

Implement the required functions in vmw_fence module.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-17 10:09:25 -06:00
Sinclair Yeh
56a6e890f3 drivers/svga: Connect driver-side fence_* functions
Connect fence_get_fd, fence_create_fd, and fence_server_sync.
Return PIPE_CAP_NATIVE_FENCE_FD capability based on what the
winsys reports

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-17 10:09:25 -06:00
Sinclair Yeh
4da543e30a winsys/svga/drm: Create winsys interface for Fence FD
The new interfaces will be used to enable
EGL_ANDROID_native_fence_sync.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-17 10:09:25 -06:00
Sinclair Yeh
2431cccad1 winsys/svga/drm: Prepare to support fence fd
Make the fields and flags available.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-17 10:09:25 -06:00
Sinclair Yeh
65175df601 drivers/svga, winsys/svga/drm: Thread through timeout for fence_finish
The timeout parameter is required to implement
EGL_ANDROID_native_fence_sync.

v2
* Replaced default timeout from 0 to PIPE_TIMEOUT_INFINITE
* Add more documentation to the new timeout parameter

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-17 10:09:25 -06:00
Brian Paul
9ee86d6db7 svga: whitespace clean-up in svga_winsys.h
Trivial.
2017-07-17 10:09:25 -06:00
Brian Paul
6f4923bd38 svga: add some const qualifiers
Trivial.
2017-07-17 10:06:01 -06:00
Brian Paul
589f546256 svga: add comment about 'extra' constant locations
Trivial.
2017-07-17 10:06:00 -06:00
Jason Ekstrand
c5700ed72e anv/image: Add INPUT_ATTACHMENT to the list of required usages
From the Vulkan 1.0.53 spec VU for vkCreateImageView:

    "image must have been created with a usage value containing at least
    one of VK_IMAGE_USAGE_SAMPLED_BIT, VK_IMAGE_USAGE_STORAGE_BIT,
    VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
    VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, or
    VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT"

We were missing VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT from out list.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-07-17 08:18:46 -07:00
Jason Ekstrand
cbdfd1daa2 anv: Stop leaking the no_aux sampler surface state
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-07-17 08:18:46 -07:00
Jason Ekstrand
bd41564746 anv/cmd_buffer: Properly handle render passes with 0 attachments
We were early returning and never created the NULL surface state.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: James Legg <jlegg@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
2017-07-17 08:18:46 -07:00
Marek Olšák
c62809171c radeonsi/gfx9: add VM fault dmesg parser support
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:57:34 -04:00
Marek Olšák
9f320e0a38 radeonsi: automatically resize shader compiler thread queues when they are full
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:57:29 -04:00
Marek Olšák
4cae274116 radeonsi: prevent a deadlock in util_queue_add_job with too many GL contexts
If the queue is full, util_queue_add_job will wait while bo_fence_lock is
held.

It pb_slab wants to reuse a buffer, it will lock the pb_slab mutex and
try to check BO fence busyness, but it has to wait for bo_fence_lock to get
released. Both bo_fence_lock and pb_slab mutex are locked now.

When the CS thread unreferences and releases a suballocated buffer,
it will try to lock the pb_slab mutex and has to wait. The CS thread
can't finish its job in order to free a queue slot and unblock
util_queue_add_job ==> deadlock.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:57:25 -04:00
Marek Olšák
59ad769770 util/u_queue: add an option to resize the queue when it's full
Consider the following situation:
  mtx_lock(mutex);
  do_something();
  util_queue_add_job(...);
  mtx_unlock(mutex);

If the queue is full, util_queue_add_job will wait for a free slot.
If the job which is currently being executed tries to lock the mutex,
it will be stuck forever, because util_queue_add_job is stuck.

The deadlock can be trivially resolved by increasing the queue size
(reallocating the queue) in util_queue_add_job if the queue is full.
Then util_queue_add_job becomes wait-free.

radeonsi will use it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:57:20 -04:00
Marek Olšák
465bb47d6f radeonsi: expose ARB_timer_query unconditionally
clock_crystal_freq is always non-zero now.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:57:17 -04:00
Marek Olšák
3d1a576fa6 ac/gpu_info: if clock crystal frequency is 0, print an error and set 1
During bring-up, this is often 0. Prevent automatic disablement of
ARB_timer_query and demotion of the OpenGL version to 3.2 by setting
a non-zero frequency. Print an error message instead.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:56:59 -04:00
Marek Olšák
d0963ef084 radeonsi/gfx9: don't read back non-existent register SRBM_STATUS2
It looks like there is no way to monitor SDMA busyness on GFX9.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:56:56 -04:00
Marek Olšák
5fb80a1e84 radeonsi: prevent a crash with DBG_CHECK_VM and u_threaded_context
by setting PIPE_CONTEXT_DEBUG in the caller

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:56:51 -04:00
Marek Olšák
ddbd2f4c54 ac/surface/gfx9: flags.texture currently refers to TC-compatible HTILE
This should lead to better MSAA performance on GFX9.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:56:46 -04:00
Marek Olšák
ffa7ec9e22 radeonsi: simplify computation of tessellation offchip buffers
This is overly cautious, but better safe than sorry.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:55:07 -04:00
Marek Olšák
facfab28fe radeonsi/gfx9: add workarounds to avoid VGPR indexing completely
For inputs and outputs, indirect indexing is lowered by the GLSL compiler.
For temporaries, use alloca and disable the "promote-alloca" pass.

In the future, we could switch all codepaths to alloca permanently and
just rely on the "promote-alloca" pass.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák
93391ac478 radeonsi: emit param exports after position exports
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák
9d9ffc8475 radeonsi: move building parameter exports into a separate function
Both loops now look simple.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák
4e30fb4ecc radeonsi: don't use info.num_inputs when it's unused
For clarity. It's only used by color interpolation.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák
f8d6dd9b3d radeonsi: add si_build_fs_interp helper
This is much simpler.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák
4560f2b90a radeonsi: merge si_llvm_get_amdgpu_target into ac_get_llvm_target
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák
c351037d6c gallivm: inline gallivm_init_llvm_targets
there is only one user.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák
ece0c0439f radeonsi: don't call gallivm_init_llvm_targets
It's for initializing the native (x86) target.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák
d308460586 gallium/radeon: reallocate suballocated buffers when exported
This should fix exports of suballocated buffers.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Marek Olšák
5b555854cc gallium/radeon: flush the context after in-place texture realloc before export
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 10:50:39 -04:00
Mark Thompson
63dcfed81f st/va: Fix scaling list ordering for H.265
Mesa here requires the scaling lists in diagonal scan order, but
VAAPI passes them in raster scan order.  Therefore, rearrange the
elements when copying.

v2: Move scan tables to vl_zscan.c.
    Fix type in size assertion.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-07-17 15:24:56 +01:00
Emil Velikov
4168c162c5 radv: advertise v6 of the wayland surface extension
Jason updated the Khronos spec to explicitly state that Wayland surfaces
must support VK_PRESENT_MODE_MAILBOX_KHR.

ANV did so since day one (back in 2015)

Cc: mesa-stable@lists.freedesktop.org
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-17 15:24:48 +01:00
Emil Velikov
43c188f970 anv: advertise v6 of the wayland surface extension
Jason updated the Khronos spec to explicitly state that Wayland surfaces
must support VK_PRESENT_MODE_MAILBOX_KHR.

ANV did so since day one (back in 2015)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-17 15:24:32 +01:00
Emil Velikov
647b5a18df i965: use strtol to convert the integer deviceID override
One can override the deviceID, by setting the INTEL_DEVID_OVERRIDE
variable. A few symbolic names or a numerical value for the actual
device ID is accepted.

At the same time we're using strtod (string to double) to convert the
string to a decimal numeral. A seeming thinko, made by the original
commit that introduces the code in libdrm_intel and got here with the
import.

Fixes: 514db96c11 ("i965: Import libdrm_intel.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-17 15:23:49 +01:00
Marek Olšák
f9d5611617 gallium/u_blitter: don't use TXF for scaled blits
There seems to be a rounding difference with F2I vs nearest filtering.
The precise problem in the rounding is unknown.

This fixes an incorrect output with OpenMAX encoding.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 15:47:30 +02:00
Lionel Landwerlin
59adde0eab anv: ensure device name contains terminating character
v2: Use sizeof() (Chris)

CID: 1415113
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-07-17 14:36:38 +01:00
Lionel Landwerlin
f03f893cb8 i965: miptree: silence coverity warning
This probably can't happen, but we're better off with initialized
variables.

CID: 1415114
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-07-17 14:36:38 +01:00
Marek Olšák
0d190913bf mesa: flag _NEW_TEXTURE_OBJECT for GL_TEXTURE_LOD_BIAS_EXT
Only the compatibility profile can set it.
It was done incorrectly when we split _NEW_TEXTURE.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-17 15:27:14 +02:00
Kenneth Graunke
32c79cdacc meta: Actually initialize ImmutableLevels to 1.
Otherwise, ImmutableLevels is 0, which is an illegal value.  Later,
_mesa_meta_setup_sampler will use _mesa_texture_parameteriv to set

   texObj->MaxLevel = CLAMP(params[0], texObj->BaseLevel,
                            texObj->ImmutableLevels - 1);

which turns into a completely bogus CLAMP(value, 0, -1)...where the
upper bound is smaller than the lower bound.  This ends up being -1
today due to the way CLAMP is implemented, which is a bogus MaxLevel.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-07-17 01:37:51 -07:00
Kenneth Graunke
6374288b62 dri: Make classic drivers allow __DRI_CTX_FLAG_NO_ERROR.
Grigori recently added EGL_KHR_create_context_no_error support,
which causes EGL to pass a new __DRI_CTX_FLAG_NO_ERROR flag to
drivers when requesting an appropriate context mode.

driContextSetFlags() will already handle it properly for us, but the
classic drivers all have code to explicitly balk at unknown flags.  We
need to let it through or they'll fail to create a no_error context.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
2017-07-17 01:37:51 -07:00
Samuel Pitoiset
c745beaf10 ddebug: fix parsing of the pipelined mode
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-17 10:28:45 +02:00
Dave Airlie
9ee67467c9 radv: predicate cmask eliminate when using DCC.
When using DCC some clear values don't require a cmask eliminate
step. This patch adds support for black and black with alpha 1,
there are other values, but I don't have access to a comprehensive list.

This works by setting the cmask eliminate predicate when doing the
fast clear, and later when doing the cmask elimination making sure
the draws are predicated.

This increases the fps on Sascha Willems deferred.

Tonga: 580fps->670fps on a Tonga PRO card.
Polaris 730->850fps

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-17 01:44:43 +01:00
Dave Airlie
8eed291c2c radv/clear: add r32g32b32a32 fast clear support (v2)
We can only fast clear 128-bit images if the r/g/b channels
are the same, and we are using DCC.

For DCC we'll bail out on translate if this isn't true,
and we catch cmask clears explicitly.

v2: remove 64-bit block (Bas), add uint32 as well.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-17 01:44:25 +01:00
Dave Airlie
acf1e132af amd/addrlib: fix typo in api name.
This fixes the misspelling of ALIGNMENTS in addrlib.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-17 01:44:14 +01:00
Dave Airlie
f8d5b377c8 radv: set cb base tile swizzles for MRT speedups (v4)
This patch uses addrlib to workout the tile swizzles according
to the surface index. It seems to produce the same values as
amdgpu-pro for the deferred test.

v2: don't apply swizzle to CMASK. the eg docs don't mention
it, and we clearly don't align cmask for that.
v3: disable surf index for dedicated images, as these will
most likely be shared, and I don't think the metadata has
space for this info in it yet.
v4: update for shareable images, rename combined_swizzle
to tile_swizzle

This gets the deferred demo from 730->950fps on my rx480.
(dcc cmask elim predication patches get it further)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-17 01:43:41 +01:00
Dave Airlie
b86f86f55c radv: allow clear merging for depth/stencil with no care stencil
Some of the Sascha Willems demos pick a D32/S8 format for the depth
buffer, then do a LOAD_OP_CLEAR/LOAD_OP_DONT_CARE on it, which means
we don't get to merge the undefined->depth and clear htile transitions.

This add the stencil aspect to the pending clears if there is a depth
clear pending and the stencil aspect is don't care.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-17 01:16:59 +01:00
Bas Nieuwenhuizen
373f707fbb radv: Remove NV dedicated alloc extension.
To not confuse apps in thinking it might be faster.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
2017-07-15 20:10:43 +02:00
Bas Nieuwenhuizen
515da29360 radv: Use the KHR dedicated alloc for the WSI.
NV isn't valid for external images anymore.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 6ddc64b93e "radv: Add support for VK_KHR_dedicated_allocation."
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
2017-07-15 20:10:25 +02:00
Jason Ekstrand
b70829708a radv: Implement VK_KHR_external_memory
This effectively reverts commit 43a171878bb4b5aedb36a.  Technically,
VK_KHR_get_memory_requirements2 and VK_KHR_dedicated_allocation are
required for the KHR version but this at least restores the removed
functionality.  This patch builds but has received zero testing.

Acked-by: Dave Airlie <airlied@redhat.com>
2017-07-15 08:59:38 -07:00
Bas Nieuwenhuizen
6ddc64b93e radv: Add support for VK_KHR_dedicated_allocation.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-07-15 08:59:38 -07:00
Bas Nieuwenhuizen
97931f0297 radv: Add support for VK_KHR_get_memory_requirements2.
Fished the SparseImage call out of the headers as the spec missed
the definition.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-07-15 08:59:38 -07:00
Jason Ekstrand
0ee8d81718 anv: Implement VK_KHR_external_memory_*
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-15 08:59:38 -07:00
Jason Ekstrand
c02da9cad6 anv: Implement VK_KHR_dedicated_allocation
We always recommend sub-allocation and don't do anything special for
dedicated allocations.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-15 08:59:38 -07:00
Jason Ekstrand
8c82aa5f43 anv: Implement VK_KHR_get_memory_requirements2
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-15 08:59:38 -07:00
Jason Ekstrand
5b57bdc1cf anv: Advertise version 1.0.54
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-15 08:59:38 -07:00
Jason Ekstrand
227debdc92 vulkan: Update to the new 1.0.54 spec XML and headers
There is one small ANV change here because we used the
VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX enum in the BO cache and that had
to be updated to have the _KHR suffix.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-15 08:59:38 -07:00
Jason Ekstrand
3b95e03b2c radv: Drop support for VK_KHX_external_semaphore_*
These have been formally deprecated by Khronos never to be shipped
again.  The KHR versions should be implemented/used instead.

Acked-by: Dave Airlie <airlied@redhat.com>
2017-07-15 08:58:55 -07:00
Jason Ekstrand
dc179aa123 anv: Drop support for VK_KHX_external_semaphore_*
These have been formally deprecated by Khronos never to be shipped
again.  The KHR versions should be implemented/used instead.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-15 08:58:51 -07:00
Jason Ekstrand
4ac94d0dee anv: Drop support for VK_KHX_external_memory_*
These have been formally deprecated by Khronos never to be shipped
again.  The KHR versions should be implemented/used instead.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-14 22:12:39 -07:00
Matt Turner
5ffe0c9e1b i965: Compile with -msse2 (instead of -msse2)
Ian noted that were were two Pentium 4 Extreme Edition LGA 775 CPUs, and
they only have SSE2.
2017-07-14 22:01:11 -07:00
Matt Turner
6b05c080f2 i965: Compile with -msse3
All CPUs that can be paired with a GPU supported by i965_dri.so supports
SSE3. This allows us to ensure that some vectorized version of the tiled
memcpy path is enabled on 32-bit systems.

This also ensures that __builtin_ia32_clflush is always usable.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101774
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-14 16:54:43 -07:00
Kenneth Graunke
c7af6d2690 egl: Fix predecence problem when setting __DRI_CTX_FLAG_NO_ERROR
This accidentally set __DRI_CTX_FLAG_NO_ERROR whenever any flags were
present.  Just needs extra parenthesis.

Fixes: 4909519a66 (egl: Add EGL_KHR_create_context_no_error support)

Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2017-07-14 15:25:44 -07:00
Petr Sebor
b317cc1b3c drirc: whitelist glthread for Euro Truck Simulator 2 2017-07-14 23:56:37 +02:00
Petr Sebor
483aa3997d drirc: whitelist glthread for American Truck Simulator 2017-07-14 23:56:37 +02:00
Grigori Goronzy
d063168514 mesa/marshal: fix Windows build
This was broken by commit 1ad24faa.

Reported by AppVeyor:
https://ci.appveyor.com/project/mesa3d/mesa/build/4918
2017-07-14 23:46:21 +02:00
Edmondo Tommasina
952f21bc1e drirc: whitelist glthread for The Witcher 2
Performance delta on AMD Phenom II X3 720 / RX 470

    The Witcher 2: +18%

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-07-14 23:41:59 +02:00
Edmondo Tommasina
9a9b9882a5 drirc: whitelist glthread for Civilization 5
Performance delta on AMD Phenom II X3 720

    Civilization 5: +28%

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-07-14 23:41:59 +02:00
Tim Rowley
818209118c swr: JitManager runtime determination of architecture
Fixes performance regression from f50aa21456 - was forcing internal
code generation to target AVX (no gather, etc).

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-14 15:09:22 -05:00
Andres Gomez
25d43cd656 docs: update calendar, add news item and link release notes for 17.1.5
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-07-14 22:30:08 +03:00
Andres Gomez
7ad2f7078d docs: add sha256 checksums for 17.1.5
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-07-14 22:30:08 +03:00
Andres Gomez
ea417c4c64 docs: add release notes for 17.1.5
Signed-off-by: Andres Gomez <agomez@igalia.com>
2017-07-14 22:30:08 +03:00
Grigori Goronzy
8d980bf920 st/mesa: Add KHR_no_error toggle to driconf
Allows applications to be whitelisted.

v2: Remove misguided DRI common part.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-14 21:23:44 +02:00
Grigori Goronzy
4909519a66 egl: Add EGL_KHR_create_context_no_error support
This only adds the EGL side, needs to be plumbed into Mesa frontend.

v2: Add check for extension availability.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-14 21:23:44 +02:00
Grigori Goronzy
2bbe235053 st/mesa: Add support for KHR_no_error flag
Add a new context flag and plumb it through the various layers of the
context creation code to set up dispatch tables for the no-error mode.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-14 21:23:40 +02:00
Grigori Goronzy
7299e82fa4 dri: Add KHR_no_error DRI extension
This basic extension allows usage of the __DRI_CTX_FLAG_NO_ERROR flag.
This includes support code for classic Mesa drivers to switch on the
no-error mode if the flag is set.

v2: Move to common DRI code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-14 21:20:31 +02:00
Grigori Goronzy
cfbf60b0c2 mesa/marshal: fix glNamedBufferData with NULL data
The semantics are similar to glBufferData.

Tested-by: Marc Dietrich <marvin24@gmx.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-14 21:20:31 +02:00
Grigori Goronzy
1ad24faa11 mesa/marshal: add marshalling for glClearBuffer*
Add async marshalling/unmarshalling for all glClearBuffer variants.
These entry points are commonly used in general and Alien Isolation
specifically uses glClearBufferiv. Slightly reduces the number of
thread synchronizations with glthread in that game.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-14 21:20:31 +02:00
Grigori Goronzy
8036198c0f mesa/marshal: extract ClearBuffer helpers
Extract clear buffer helper functions in preparation for adding
marshal/unmarshal functions for the various glClearBuffer variants.

v2: Fix command size.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-14 21:20:08 +02:00
Christoph Haag
98514e9959 gallium/hud: use double values for all graphs
The fps graph for example calculates the fps as double with small
variations based on when query_new_value() is called, which causes
many values to be truncated on the cast to uint64_t.

The HUD internally stores the values as double, so just use double
everywhere instead of fixing this with rounding. Using doubles also
allows the hud to show small variations instead of being clamped to
discrete values.

v2: Don't print decimals in the dump file when not necessary
Signed-off-by: Christoph Haag <haagch+mesadev@frickel.club>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-07-14 17:34:39 +02:00
Lucas Stach
7e426ef6ec Revert "etnaviv: add support for snorm textures"
This reverts commit d8b2ccdb88, which causes priglit regressions on GPUs
with SNORM support. We'll have another try at enabling this feature after
the 17.2 branchpoint.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-07-14 17:21:50 +02:00
Wladimir J. van der Laan
1d05cec205 etnaviv: reset indexed rendering information when not rendering indexed
A dangling bo object would result in memory corruption while loading a
level in ioquake3_opengl2.

Fixes: 330d0607ed (gallium: remove pipe_index_buffer and set_index_buffer)
Suggested-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-07-14 17:19:42 +02:00
Wladimir J. van der Laan
bb2498a7f6 etnaviv: Use the correct LOG instruction on GC3000
GC3000 has a new LOG instruction, similar to the new SIN and COS instructions.

Generate the new instruction sequence when appropriate; there are
two occasions, as part of LIT and the generator for the LG2
instruction itself.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-07-14 17:15:41 +02:00
Lucas Stach
bccd21ee88 etnaviv: flush source TS before resolve
If we blit from a rendertarget or a depthstencil buffer there might still
be dirty data in the TS buffer which needs to be flushed out.

Fixes missing shadow tiles in glmark2 shadow.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
2017-07-14 17:13:12 +02:00
Philipp Zabel
e9b3381715 etnaviv: flush color cache and depth cache together before resolves
Before resolving a rendertarget or a depth/stencil resource into a
texture, flush both the color cache and the depth cache together.

It is unclear whether this is necessary for the following stall to
work properly, or whether the depth flush just adds enough time
for the color cache flush to finish before the resolver is started,
but this change removes artifacts that otherwise appear if a texture
is sampled directly after rendering into it.

The test case is a simple QML scene graph with a QtWebEngine based
WebView rendered on top of a blue background:

	import QtQuick 2.0
	import QtQuick.Window 2.2
	import QtWebView 1.1

	Window {
		Rectangle {
			id: background
			anchors.fill: parent
			color: "blue"
		}

		WebView {
			id: webView
			anchors.fill: parent
		}

		Component.onCompleted: {
			webView.url = "<some animated website>"
		}
	}

If the website is animated, the WebView renders the site contents into
texture tiles and immediately afterwards samples from them to draw the
tiles into the Qt renderbuffer. Without this patch, a small irregular
triangle in the lower right of each browser tile appears solid blue, as
if the texture sampler samples zeroes instead of the website contents,
and the previously rendered blue Rectangle shows through.

Other attempts such as adding a pipeline stall before the color flush or
a TS cache flush afterwards or flushing multiple times, with stalls
before and after each flush, have shown no effect.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2017-07-14 17:12:36 +02:00
Lucas Stach
a98c1fbd9b st/mesa: handle stfbi being NULL on entry of st_framebuffer_reuse_or_create
Apparently this can happen. Just bail out early in that case, as all the called
functions return NULL in that case.

Fixes weston-terminal for me.

Fixes: 147d7fb772 ("st/mesa: add a winsys buffers list in st_context")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-14 17:12:00 +02:00
Daniel Stone
5295df63ad egl/wayland: Use MIN2 for wl_drm version
Use a slightly more explicit version cap for binding wl_drm, so we can
add other interfaces with different versioning schemes later.

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-14 14:14:05 +01:00
Daniel Stone
4b8ef27e84 egl/wayland: Fix whitespace damage
Convert tabs to spaces, fix misalignments.

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-14 14:14:05 +01:00
Daniel Stone
2b895475f6 util: Remove u_math from u_vector
u_vector.h doesn't actually use anything from u_math, but it does mean
everyone has to pull in src/gallium/auxiliary/util includes.

Just remove it, adding a <string.h> include to u_vector.c to cover
memcpy.

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-14 14:14:05 +01:00
Eric Engestrom
8821ef4be1 configure: only install khrplatform.h if needed
khrplatform.h is only used by EGL and GLES; let's only install it when
one of those is enabled.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jussi Kukkonen <jussi.kukkonen@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-14 13:23:54 +01:00
Eric Engestrom
b50b4b6f84 scons: split out check_header() helper
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-14 13:23:54 +01:00
Juan A. Suarez Romero
5cd4ece34e anv/pipeline: do not use BITFIELD64_BIT()
In the previous commit, forgot to apply v2 suggestions.

Fixes: 28d0c38 (anv/pipeline: use unsigned long long constant to check
enable vertex inputs)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-07-14 10:33:19 +00:00
Juan A. Suarez Romero
28d0c38d85 anv/pipeline: use unsigned long long constant to check enable vertex inputs
When initializing the ANV pipeline, one of the tasks is checking which
vertex inputs are enabled. This is done by checking if the enabled bits
in inputs_read.

But the mask to use is computed doing `(1 << (VERT_ATTRIB_GENERIC0 +
desc->location))`. The problem here is that if location is 15 or
greater, the sum is 32 or greater. But C is handling 1 as a 32-bit
integer, which means the displaced bit is out of range and thus the full
value is 0.

Thus, use 1ull, which is an unsigned long long value.

This fixes:
dEQP-VK.pipeline.vertex_input.max_attributes.16_attributes.binding_one_to_one.interleaved

v2: use 1ull instead of BITFIELD64_BIT() (Matt Turner)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
2017-07-14 08:09:18 +00:00
Kenneth Graunke
b2da123801 i965: Use pushed UBO data in the scalar backend.
This actually takes advantage of the newly pushed UBO data, avoiding
pull loads.

Improves performance in GLBenchmark Manhattan 3.1 by:

   HSW: ~1%, BDW/SKL/KBL GT2: 3-4%, SKL GT4: 7-8%, APL: 4-5%.
   (thanks to Eero Tamminen for these numbers)

shader-db results on Skylake, ignoring programs with spill/fill changes:

   total instructions in shared programs: 13963994 -> 13651893 (-2.24%)
   instructions in affected programs: 4250328 -> 3938227 (-7.34%)
   helped: 28527
   HURT: 0

   total cycles in shared programs: 179808608 -> 172535170 (-4.05%)
   cycles in affected programs: 79720410 -> 72446972 (-9.12%)
   helped: 26951
   HURT: 1248

   LOST:   46
   GAINED: 21

Many "Deus Ex: Mankind Divided" shaders which already spilled end up
spill a lot more (about 240 programs hurt, 9 helped).  The cycle
estimator suggests this is still overall a win (-0.23% in cycle counts)
presumably because we trade pull loads for fills.

v2: Drop "PULL" environment variable left in for initial debugging
    (caught by Matt).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 20:18:54 -07:00
Kenneth Graunke
c9ef27e77b i965: Factor out push locations.
With UBOs, the answer of "have we decided to push this uniform" gets
a bit more complicated - for one, we have multiple surfaces.  This
patch refactors things so we can add the new code in a single place.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 20:18:54 -07:00
Kenneth Graunke
4f586cd8f1 i965: Push UBO data, but don't use it just yet.
This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets,
and updates the compiler to know that there's extra payload data, so
things continue working.  However, it still issues pull loads for all
data.  I wanted to separate the two aspects for greater bisectability.

v2: Update for new intel_bufferobj_buffer parameter.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 20:18:30 -07:00
Kenneth Graunke
6834b1ebe3 i965: Pad buffer objects by 2kB in robust contexts to avoid OOB access.
This is an annoyingly big hammer, but it seems less mean than disabling
UBO pushing, and I'm not sure what else to do.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 19:56:49 -07:00
Kenneth Graunke
67fec96452 i965: Stop re-uploading push constants after URB reconfiguration.
Previously we would re-upload the constant data to the batchbuffer,
then re-emit the packets.  We only need to do the last step (causing
the existing data in the batchbuffer to be re-uploaded to the push
constant staging area in the L3).

Now that we've separated the two, it's pretty easy to accomplish.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 19:56:49 -07:00
Kenneth Graunke
29603ae208 i965: Separate uploading push constant data from the pointer packets.
I hope to upload UBO via 3DSTATE_CONSTANT_XS packets, in addition to
normal uniforms.  In order to do that, I'll need to re-emit the packets
when UBOs change.  But I don't want to re-copy the regular uniform data
to the batchbuffer every time.

This patch separates out the data uploading from the packet submission.
We're running low on dirty bits, so I made the new atom happen on every
draw call, and added a flag to stage_state indicating that we want the
packet for that stage emitted.

I would have preferred to do this outside the atom system, but it has
to happen between the uploading of push constant data and the binding
table upload.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 19:56:49 -07:00
Kenneth Graunke
a18ab92d3c i965: Introduce a BRW_NEW_DRAW_CALL dirty bit.
This allows us to have atoms which are signalled on every draw call.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 19:56:49 -07:00
Kenneth Graunke
24891d7c05 i965: Store per-stage push constant BO pointers.
Right now, we always upload new push constant data, and immediately
emit 3DSTATE_CONSTANT_* packets.  We call intel_upload_space and store
the resulting BO pointer in brw->curbe.curbe_bo.  We read that when
emitting the packets.  This works today, but is fragile - it depends on
upload and packet emission being interleaved.

If we instead were to upload all the data, then emit all the packets,
then upload BO wrapping will get us into trouble.  For example, the VS
constants may land in one upload BO, but the FS constants may not fit
and land in a second upload BO.  Uploading FS constants would overwrite
the brw->curbe.curbe_bo pointer, so when we emitted 3DSTATE_CONSTANT_VS,
we'd get the wrong BO.

I intend to separate out this code in a future commit, so I need to fix
this.  To fix it, we simply store a per-stage BO pointer.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 19:56:49 -07:00
Kenneth Graunke
6d28c6e52c i965: Select ranges of UBO data to be uploaded as push constants.
This adds a NIR pass that decides which portions of UBOS we should
upload as push constants, rather than pull constants.

v2: Switch to uint16_t for the UBO block number, because we may
    have a lot of them in Vulkan (suggested by Jason).  Add more
    comments about bitfield trickery (requested by Matt).

v3: Skip vec4 stages for now...I haven't finished wiring up support
    in the vec4 backend, and so pushing the data but not using it
    will just be wasteful.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 19:56:49 -07:00
Kenneth Graunke
2a5e4f15ef i965: Require a UBO offset alignment of 32 bytes.
Soon, we're going to start providing UBO data to shaders as push
constants, rather than requiring them to issue pull loads.  The
3DSTATE_CONSTANT_* commands require 32 byte aligned pointers.

So, we need to increase this from 16 to 32.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 19:56:49 -07:00
Kenneth Graunke
8ec5a4e4a4 i965: Switch to absolute addressing for constant buffer 0.
By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic
state base address.  This makes it unusable for pushing UBOs.  I'd like
to be able to use all four push buffers.

There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake)
which controls whether buffer 0 is relative to dynamic state base
address, or simply a normal pointer.  Setting that gives us full
flexibility.

We can't currently write this on Haswell and earlier, and will need
to update the kernel command parser, and then do the whole version
checking song and dance.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-07-13 19:56:49 -07:00
Kenneth Graunke
86bd3fd864 i965: Use async maps for BufferSubData to regions with no valid data.
When writing a region of a buffer via glBufferSubData(), we can write
the data asynchronously if the destination doesn't contain any data.
Even if it's busy, the data was undefined, so the new data is fine too.

Removes all stall avoidance blits on BufferSubData calls in
"Total War: WARHAMMER" on my Skylake GT4.

Decreases the number of stall avoidance blits in Manhattan 3.1:
- Skylake GT4: -18.3544% +/- 6.76483% (n=13)
- Apollolake:  -12.1095% +/- 5.24458% (n=13)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-13 16:58:17 -07:00
Kenneth Graunke
5f223648f2 i965: Track a range of the buffer which contains valid data.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-13 16:58:17 -07:00
Kenneth Graunke
f47612dafb i965: Add a "write" parameter to intel_bufferobj_buffer.
This doesn't do anything yet, but soon we'll want to know whether an
access to a buffer section may write that data, or simply reads it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-13 16:58:17 -07:00
Rafael Antognolli
9a9c7e452b i965: Convert GS_STATE to genxml.
Merge the code with gen6+ 3DSTATE_GS, and delete brw_gs_state.c,
together with brw_gs_unit_state.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-13 15:39:49 -07:00
Rafael Antognolli
2936388205 i965: Prepare gs_state emitting code to include gen4-5.
Since we always call brw_batch_emit anyways, we can hopefully make things
simpler by calling it only once, and then branching inside its body. This
can be helpful when bringing the gen4-5 code into this function.

Additionally, check for GEN_GEN == 6 instead of < 7 in cases that won't apply
to lower gens.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-13 15:39:49 -07:00
Rafael Antognolli
9a2cca929f i965: Remove upload_gs_state_for_tf.
This function only emits a particular case of 3DSTATE_GS. Instead, we can do
that inside genX(upload_gs_state), and later reuse part of that code for
emitting gen4-5 state.

There's the additional benefit of allowing us to remove gen6_gs_state.c, which
was only left because of this function.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-13 15:39:49 -07:00
Rafael Antognolli
ad7663b838 i965: Convert BLEND_CONSTANT_COLOR state to genxml.
It's a very simple conversion, and it allows us to delete brw_cc.c.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-13 15:39:49 -07:00
Rafael Antognolli
5d48710981 i965: Convert CC state on gen4-5 to genxml.
Use set_blend_entry_bits and set_depth_stencil_bits to fill most of the
color calc struct, and then manually update the rest.

v2:
   - Always check for depth_irb (Ken)
   - Always set Backface Stencil Ref (Ken)
   - Always set alpha reference value (Ken)

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-13 15:39:49 -07:00
Rafael Antognolli
b0052aa46f i965: Move color calc code around a bit.
This makes the code more consistent accross generations.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-13 15:39:49 -07:00
Rafael Antognolli
5246d3eb83 i965: Check for alpha channel just like in gen6+.
gen6+ uses _mesa_base_format_has_channel() to check for the alpha
channel, while gen4-5 use ctx->DrawBuffer->Visual.alphaBits. By using
_mesa_base_format_has_channel() here we keep the same behavior accross
all gen.

While initially both ways of checking the alpha channel seemed correct
to me, this change also seems to fix fbo-blending-formats piglit test on
gen4.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-13 15:39:49 -07:00
Rafael Antognolli
1d2d3dbc8a i965: Make a helper function for blend entry related state.
Add a helper function to reuse code that fills blend entry related
state, and make genX(upload_blend_state) use it. This function can later
be used by gen4-5 color calc state to set the blend related bits.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-13 15:39:49 -07:00
Kenneth Graunke
e84cb56f48 i965: Make a helper function for depth/stencil related state.
Gen4-5 basically glue DEPTH_STENCIL_STATE, COLOR_CALC_STATE, and
BLEND_STATE together into a single COLOR_CALC_STATE structure.

By making a helper function, we'll be able to reuse it when filling
out Gen4-5 COLOR_CALC_STATE without replicating any actual logic.

We use generation-defined typedef to handle the polymorphism.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-07-13 15:39:49 -07:00
Lionel Landwerlin
6131a1ae40 aubinator: don't leak fd of opened aubfile
CID: 1373563
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-07-13 22:50:50 +01:00
Lionel Landwerlin
d1bd731e30 anv: don't use strcpy for copying strings
CID: 1358935
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-07-13 22:50:47 +01:00
Lionel Landwerlin
226fae7849 intel/compiler: no need to check unsigned is >= 0
CID: 1338342
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-07-13 22:50:45 +01:00
Lionel Landwerlin
7c4daf8c37 i965: fix missing NULL return if allocation fails
CID: 1250585
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-07-13 22:50:41 +01:00
Lionel Landwerlin
95c917668c intel/compiler: don't check unsigned is >= 0
CID: 1224468
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-07-13 22:50:38 +01:00
Lionel Landwerlin
fd8e8fdbfe i965: check pointer before dereferencing it
Check that irb isn't NULL before accessing irb->Base.Base.NumSamples.

CID: 1026046
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-07-13 22:50:35 +01:00
Lionel Landwerlin
b02d136b5e i965: map_gtt: check mapping address before adding offset
The NULL check might fail if offset isn't 0.

CID: 971379
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-07-13 22:50:32 +01:00
Lionel Landwerlin
a25a533458 intel/compiler: remove check unsigned is >= 0
By definition unsigned are always >= 0.

CID: 742212
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-07-13 22:50:29 +01:00
Lionel Landwerlin
19869d6091 isl: use 64bit arithmetic to compute size
If we allow the size to be more than 2^32, then we should compute it
in 64bit arithmetic otherwise we might run into overflow issues.

CID: 1412892, 1412891
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-07-13 22:50:26 +01:00
Connor Abbott
4df93a54f1 nir/lower_io_to_temporaries: don't set compact on shadow vars
The compact flag doesn't make sense on local variables, since the
packing on them is up to the driver. This fixes nir_validate assertions
in some cases, particularly when lower_io_to_temporaries is used on
per-vertex inputs/outputs.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-07-13 14:45:25 -07:00
Connor Abbott
99ff7a9f1f nir: don't segfault when printing variables with no name
While normally we give variables whose name field is NULL a temporary
name when called from nir_print_shader(), when we were calling from
nir_print_instr() we never bothered, meaning that we just segfaulted
when trying to print out instructions with such a variable. Since
nir_print_instr() is meant to be called while debugging, we don't need
to bother too much about giving a consistent name, but we don't want to
crash in the middle of debugging.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-07-13 14:40:23 -07:00
Jason Ekstrand
add72599d9 i965/urb: Trigger upload_urb on NEW_BLORP
It's a bit rare, but blorp can trigger a urb reconfiguration.  When
that happens, we need to re-upload the URB config.  Previoulsy blorp
would set BRW_NEW_URB_SIZE, but this is a pretty big hammer as it
would cause back-to-black blorp operations to reconfigure both times.
Using BRW_NEW_BLORP is a small, more accurate hammer.

v2 (idr): Sort BRW_NEW_ tokens to match brw_recalculate_urb_fence and
gen6_urb.

v3 (idr): Don't whack BRW_NEW_URB_SIZE in blorp.  Suggested by Jason.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-13 13:41:09 -07:00
Kenneth Graunke
42c64b5f87 mesa: Return GL_INVALID_ENUM for bogus TEXTURE_SRGB_DECODE_EXT params.
Fixes dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.srgb_decode_samplerparameter{f,fv,i,Iiv,Iuiv,iv}.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-07-13 13:00:58 -07:00
Marek Olšák
f33d8af7aa st/dri: add 32-bit RGBX/RGBA formats
Add support for 32-bit RGBX/RGBA formats which are required for Android.

The original patch (commit ccdcf91104) was reverted (commit
c0c6ca40a2) in mesa as it broke GLX resulting in swapped colors. Based
on further investigation by Chad Versace, moving the RGBX/RGBA configs
to the end is enough to prevent breaking GLX.

The handling of RGBA/RGBX in dri_fill_st_visual is a fix from Marek
Olšák.

Cc: Eric Anholt <eric@anholt.net>
Cc: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-07-13 14:36:47 -05:00
Eric Anholt
5a9fb2eabc broadcom/vc4: Add more packets to the v2.1 XML.
These will be used to replace vc4_cl_dump.c's hand-written dumping.
2017-07-13 11:30:42 -07:00
Eric Anholt
427bbbb99c broadcom: Introduce a header for talking about chip revisions.
This will be used by the VC5 driver and various shared VC4/VC5 tooling,
like the XML decoder.
2017-07-13 11:28:28 -07:00
Eric Anholt
fd37ce6bec broadcom/genxml: Use the same "gen" attr for HW version as Intel does.
This will let us reuse their tools more easily.
2017-07-13 11:28:28 -07:00
Eric Anholt
ee170c9d83 broadcom/genxml: Support unpacking fixed-point fractional values.
This was an oversight in the original XML support, because unpacking
wasn't used much.  The new XML-based CL dumper will want it, though.
2017-07-13 11:28:28 -07:00
Michel Dänzer
655a32f729 st/mesa: Handle st_framebuffer_create returning NULL
st_framebuffer_create returns NULL if stfbi == NULL or
st_framebuffer_add_renderbuffer returns false for the colour buffer.

Fixes Xorg crashing on startup using glamor on radeonsi.

Fixes: 147d7fb772 ("st/mesa: add a winsys buffers list in st_context")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101775
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-13 09:26:20 -06:00
Tim Rowley
254fa3dbf5 swr/rast: Fix use of KNL-only intrinsics in SKX build
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-13 08:47:10 -05:00
Tim Rowley
4c185dd3b3 swr/rast: Fix build warnings when using the Intel compiler
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-13 08:47:10 -05:00
Tim Rowley
bbc3b5c0dc swr/rast: SIMD16 Frontend - Fix USE_SIMD16_FRONTEND build
Previous check-ins without testing with USE_SIMD16_FRONTEND have
introduced regressions. This fixes the build, not the regressions.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-13 08:47:10 -05:00
Tim Rowley
640ea4d9a1 swr/rast: Removing unneeded MSVC warning pragma
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-13 08:47:10 -05:00
Tim Rowley
185b37f641 swr/rast: Add support for read-only render targets
Core will ensure hot tiles are loaded for read and write render targets,
and will skip all output merger for read-only render targets.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-13 08:47:10 -05:00
Tim Rowley
d8ebcad540 swr/rast: Support render target mask instead of render target count
WIP to support read-only render targets.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-13 08:47:10 -05:00
Alejandro Piñeiro
57671025b0 egl: remove unused err variable
Fixes: 81e95924ea ("egl: call _eglError within _eglParseImageAttribList")

Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-13 13:36:37 +02:00
Nicolai Hähnle
c22e3c5373 radeonsi/gfx9: fix crash building monolithic merged ES-GS shader
Forwarding from the ES prolog to the ES just barely exceeds the current
maximum array size when 16 vertex attributes are used. Give it a decent
bump to account for merged shaders having up to 32 user SGPRs.

Fixes a crash in GL45-CTS.multi_bind.draw_bind_vertex_buffers.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-13 13:01:15 +02:00
Thomas Hellstrom
81fb154777 loader/dri3: Use dri3_find_back in loader_dri3_swap_buffers_msc
If the application hasn't done any drawing since the last call, we
would reuse the same back buffer which was used for the previous swap,
which may not have completed yet. This could result in various issues
such as tearing or application hangs.

In the normal case, the behaviour is unchanged.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97957
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101683
Cc: mesa-stable@lists.freedesktop.org

[Michel Dänzer: Make Thomas' fix from bugzilla actually work as
 intended, write commit log]
2017-07-13 16:49:28 +09:00
Jason Ekstrand
c3b5c2ca19 i965/screen: Drop get_tiled_height
It's no longer used.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
b29c364f0d i965/screen: Use ISL for doing image import checks
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
76f683e91d i965/screen: Use ISL for allocating image BOs
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
5b3363e3f1 intel/isl: Add a helper to convert tilings from ISL to i915
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
a668ba9c18 intel/isl: Add basic modifier introspection
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
285242e674 i965: Add an isl_device to intel_screen
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
06c95f1282 i965/miptree: Move CCS allocation into create_for_dri_image
Any form of CCS on gen9+ only works on Y-tiled images.  The only caller
of create_for_bo which uses Y-tiled BOs is create_for_dri_image.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
b3a44ae7a4 i965: Use create_for_dri_image in intel_update_image_buffer
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
90d93755d1 i965/miptree: Add support for window system images to create_for_dri_image
We want to start using create_for_dri_image for all miptrees created
from __DRIimage, including those which come from a window system.  In
order to allow for fast clears to still work on window system buffers,
we need to allow for creating aux surfaces.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
2dd4e2348f i965/miptree: Add a colorspace parameter to create_for_dri_image
The __DRI_FORMAT enums are all UNORM but we will frequently want sRGB
when creating miptrees for renderbuffers.  This lets us specify.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
14ce44a7bc main/formats: Add a get_linear_format_srgb helper
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
361eb1c6e7 main/formats: Autogenerate _mesa_get_srgb_format_linear
Due to the wonders of autogeneration, this new version covers a few
formats that the old version was missing:

    MESA_FORMAT_SRGB8_ALPHA8_ASTC_3x3x3
    MESA_FORMAT_SRGB8_ALPHA8_ASTC_4x3x3
    MESA_FORMAT_SRGB8_ALPHA8_ASTC_4x4x3
    MESA_FORMAT_SRGB8_ALPHA8_ASTC_4x4x4
    MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x4x4
    MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5x4
    MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5x5
    MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x5x5
    MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x6x5
    MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x6x6

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Ben Widawsky
34e1ccbfbe i965/miptree: Allocate mt earlier in update winsys
Later commits require intel_update_image_buffer() to have control over
the miptree creation.   However, intel_update_winsys_renderbuffer_miptree()
currently  creates it based on the given buffer object. This patch moves
the creation to the caller side.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Ben Widawsky
aadd37298c i965/miptree: Add a return for updating of winsys
There is nothing particularly useful to do currently if the update
fails, but there is no point carrying on either. As a result, this has a
behavior change.

v2: Make the return type a bool (Topi)

v3: Don't leak the bo if update_winsys_renderbuffer fails. (Jason)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
30cfed57ce i965: Use miptree_create_for_dri_image in image_target_renderbuffer_storage
This does make a tiny functional change in that we now also test for
whether or not the format supports texturing and not just rendering.
However, this should have no practical effect as all renderbuffers use
texturable formats.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
091965760d i965/miptree: Set level_x/h in create_for_dri_image
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
4bf140576a i965/miptree: Add tile_x/y to total_width/height
This is what we do in intel_image_target_renderbuffer_storage and it
makes more sense than stomping them.  Because the image gets created as
a 2D image with one miplevel, they should already be equal to the
provided width/height.  Adding the tile offset makes some sense
depending on how you interpret the fields.

The only place these fields are used for in state setup is to set up the
image parameters we pass into shaders.  There may be issues here if you
try to use image_load_store on something pulled in from EGL but that's
probably broken already.  This just makes it consistently broken.

Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
947b72ab5d i965/miptree: Pass the offset into create_for_bo in create_for_dri_image
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-12 21:15:46 -07:00
Jason Ekstrand
72e7a6b0b6 i965: Move the DRIimage -> miptree code to intel_mipmap_tree.c
This is mostly a direct port.  The only bit of refactoring that was done
was to make creating a planar miptree be an early return from the
non-planar case.  Alternatively, we could have three functions: two
helpers and a main function to just call the right helper.  Making the
planar case an early return seemed cleaner.

Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-07-12 21:15:46 -07:00
Ilia Mirkin
3645268748 nv50/ir: fix threads calculation for non-compute shaders
We were using the "cp" union fields, which are only valid for compute
shaders. The threads calculation affects the available GPRs, so just
pick a small number for other shader types to avoid limiting available
registers.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-07-12 22:09:59 -04:00
Ilia Mirkin
87028f8639 freedreno/ir3: fix load_front_face conversion
The comments are correct - we get -1 and 0. However by adding 1, we
convert this into 0,1. This mostly works for conditionals, but when
negated, this will yield the wrong result. Instead just negate the
values (as they are backwards -- -1 means back instead of front).

Fixes tests/shaders/glsl-fs-frontfacing-not.shader_test and
dEQP-GLES3.functional.shaders.builtin_variable.frontfacing on A530.

The latter also tested on A306 by Rob Clark.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-07-12 19:30:46 -04:00
Alex Smith
0e1886efb9 radv: Fix descriptors for cube images with VK_IMAGE_USAGE_STORAGE_BIT
If a cube image has VK_IMAGE_USAGE_STORAGE_BIT set, the type in an image
view's descriptor was set to a 2D array (and a few other fields adjusted
accordingly). This is correct when the image view is actually bound as a
storage image, but not when bound as a sampled image. In that case the
type should be set as a cube.

Fix by generating 2 sets of descriptors at view creation time for both
storage and non-storage usage, and then choose between them based on
descriptor type when writing descriptor sets.

v2: Generate storage descriptors for images with TRANSFER_DST, since
    those may be used as storage images internally.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-07-13 00:21:20 +02:00
Alex Smith
4d5c0c189d radv: Fix possible invalid free of dynamic descriptors
This free was left in after dynamic descriptors were changed to not be
allocated separately from the descriptor set, and can cause a crash.

Fixes: 39644fa40a ("radv: Don't allocate dynamic descriptors separately")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-07-13 00:21:20 +02:00
Bruce Cherniak
02735e6cf8 swr: Add path to draw directly from client memory without copy.
If size of client memory copy is too large, don't copy. The draw will
access user-buffer directly and then block.  This is faster and more
efficient than queuing many large client draws.

Applications that still use large client arrays benefit from this.  VMD
is an example.

The threshold for this path defaults to 32KB.  This value can be
overridden by setting environment variable SWR_CLIENT_COPY_LIMIT.

v2: Use #define for default value, rather than hard-coded constant.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-07-12 16:56:40 -05:00
Bruce Cherniak
1520a06607 swr: Move environment config options into separate function.
Moved reading of environment config options out of
swr_create_screen_internal, into a separate swr_validate_env_options.
This is to keep from cluttering create_screen.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-07-12 16:56:40 -05:00
Bruce Cherniak
5bd9554f3d swr: Remove hard-coded constant and "todo" comment.
Removed the hard-coded constant in favor of a #define.  Also removed
TODO comment.  The constant value doesn't need an environment
configurable option.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2017-07-12 16:56:40 -05:00
Rob Herring
7a7a84c8db Android: Fix vc4 build since XML changes.
Since commit 7f80a9ff13 ("vc4: Introduce XML-based packet header
generation like Intel's."), the vc4 build on Android is broken:

out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'v3d_packet_helpers.h' file not found
external/mesa3d/src/gallium/drivers/vc4/vc4_cl_dump.c:28:10: fatal error: 'vc4_packet.h' file not found

The path of the generated header needs to be fixed since we build out of
tree.

Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-07-12 16:47:10 -05:00
Charmaine Lee
147d7fb772 st/mesa: add a winsys buffers list in st_context
Commit a5e733c6b5 fixes the dangling
framebuffer object by unreferencing the window system draw/read buffers
when context is released. However this can prematurely destroy the
resources associated with these window system buffers. The problem is
reproducible with Turbine Demo running with VMware driver. In this case,
the depth buffer content was lost when the context is rebound to a
drawable.

To prevent premature destroy of the resources associated with
window system buffers, this patch maintains a list of these buffers in
the context, making sure the reference counts of these buffers will not
reach zero until the associated framebuffer interface objects no
longer exist. This also helps to avoid unnecessary destruction and
re-construction of the resources associated with the framebuffer.

Fixes VMware bug 1909807.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-07-11 19:40:17 -07:00
Kenneth Graunke
76acbd07fc i965: Drop bogus pthread_mutex_unlock in map_gtt error path.
The locking was supposed to go away in commit 314647c4c2
(i965: Drop global bufmgr lock from brw_bo_map_* functions.), but
this lone unlock remains.

I'm guessing I messed this up when splitting up Chris's patch.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-07-12 12:39:10 -07:00
Anuj Phogat
0a56c5f3f1 intel/compiler: Don't use opt_sampler_eot() optimization on gen10+
This optimization has been removed on gen10+.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-12 11:27:31 -07:00
Eric Anholt
84ed8b67c5 vc4: Set shareable BOs as T tiled if possible
X11 and GL compositor performance on VC4 has been terrible because of our
SHARED-usage buffers all being forced to linear.  This swaps SHARED &&
!LINEAR buffers over to being tiled.

This is an expected win for all GL compositors during rendering (a full
copy of each shared texture per draw call), allows X11 to be used with
decent performance without a GL compositor, and improves X11 windowed
swapbuffers performance as well.  It also halves the memory usage of
shared buffers that get textured from.  The only cost should be idle
systems with a scanout-only buffer that isn't flagged as LINEAR, in which
case the memory bandwidth cost of scanout goes up ~25%.

This implements the EGL_EXT_image_dma_buf_import_modifiers extension,
supporting the VC4 T_TILED modifier.

v2: Added modifier support to resource creation/import, and
    advertisement (by daniels).
v3: Fix old-kernel fallback path, fix compiler error and warnings, and
    comment touchups (by anholt).

Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-12 10:58:33 -07:00
Eric Anholt
bb466a996f vc4: Use vc4_setup_slices for resource import
Rather than open-coding populating the first slice inside resource
import, use vc4_setup_slices to do it for us.

v2: Rebase on VC4_DEBUG=surf change

Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-12 10:58:33 -07:00
Eric Anholt
111b6b77cb vc4: Make the miptree debug code available under VC4_DEBUG=surf
I kept flipping the bool on for debug, so let's just make it available.

Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-12 10:58:33 -07:00
Eric Anholt
a2d87a0019 vc4: Switch back to using a local copy of vc4_drm.h.
Needing to get our uapi header from libdrm has only complicated things.
Follow intel's lead and drop our requirement for it.

Generated from the same commit mentioned in the README.

v2: Update Android.mk as well, move vc4_drm.h reference for distcheck.

Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-07-12 10:58:33 -07:00
Eric Anholt
5d6271c6a5 intel: Move the DRM uapi headers to a non-Intel location.
I want to remove vc4's dependency on headers from libdrm as well, but
storing multiple copies of drm_fourcc.h in our tree would be silly.

v2: Update Android.mk as well, move distcheck drm*.h references to
    top-level noinst_HEADERS.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Rob Herring <robh@kernel.org>
2017-07-12 10:58:33 -07:00
Eric Anholt
2aec62a45b vc4: Remove a stale comment.
The kernel hasn't been synchronous in a couple of years, plus there was
synchronization code right there.
2017-07-12 10:58:33 -07:00
Jason Ekstrand
8e3d9c5d09 anv: Round u_vector element sizes to a power of two
This fixes 32-bit builds of the driver.  Commit 08413a81b9
changed things so that we now put struct anv_states in the u_vector for
binding tables.  On 64-bit builds, sizeof(struct anv_state) is a power
of two but it isn't on 32-bit builds.

Fixes: 08413a81b9
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2017-07-12 10:34:13 -07:00
Brian Paul
5e5f251db1 svga: whitespace, formatting fixes in svga_swtnl_backend.c 2017-07-12 10:58:14 -06:00
Brian Paul
f2b59f6c02 svga: whitespace, formatting fixes in svga_swtnl_draw.c 2017-07-12 10:58:14 -06:00
Brian Paul
183d4193b8 svga: whitespace, formatting fixes in svga_swtnl_state.c 2017-07-12 10:58:13 -06:00
Brian Paul
f62bc96dd6 svga: move comment, declaration in svga_init_shader_key_common()
put the comment before the relevant code.  Move declaration of
swizzle_tab var to where it's used.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-12 10:58:08 -06:00
Brian Paul
33eedd081e draw: whitespace, formatting fixes in draw_vs_exec.c
Trivial.
2017-07-12 10:58:07 -06:00
Brian Paul
8871c3ccf6 draw: s/unsigned/enum tgsi_semantic/
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-12 10:58:02 -06:00
Emil Velikov
459274144d travis: lower SWR requirement to GCC 4.8, aka std=c++11
With ealier commit we relaxed the requirement from C++14 to C++11.
Update the build script so that it

Cc: Tim Rowley <timothy.o.rowley@intel.com
Fixes: 0b80b02502 ("swr: relax c++ requirement from c++14 to c++11")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-12 15:46:25 +01:00
Emil Velikov
432f8bff5a docs: update HTTP -> HTTPS reference to reflect reality
The link recently got updated to https.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-12 15:45:30 +01:00
Emil Velikov
4506a74cc6 egl: set KHR_gl_texture_3D_image only when the requirements are met.
DRI_IMAGE's createImageFromTexture is used to implement the extension,
so we should check for it prior to advertising.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-12 15:45:27 +01:00
Emil Velikov
962110fa57 egl: enhance KHR_gl_image extensions checks
Drop the (duplicate) top-level check in dri2_create_image_khr() and add
the respective checks in dri2_create_image_khr_{texture,renderbuffer}

v2: use unreachable instead of assert in dri2_create_image_khr_texture

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-12 15:44:26 +01:00
Emil Velikov
a2ae8e6076 egl: don't set modifier if no modifiers are available
If no modifiers are available, the variable will never be used. Thus
there's no point in initialising it.

Cc: Varad Gautam <varad.gautam@collabora.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-12 15:43:15 +01:00
Emil Velikov
4d8191fd00 egl: check for extensions' presence during attr parsing
If the respective extension is not supported, one should return
EGL_BAD_PARAMETER as mentioned in earlier commits.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-12 15:43:12 +01:00
Emil Velikov
cd859452e9 egl: add width/height as EXT_image_dma_buf_import attrs
Although not listed amongst the initial EGL_LINUX_DRM_FOURCC_EXT and
friends list, the spec reads

   ... Required attributes and their values are as
   follows:

    * EGL_WIDTH & EGL_HEIGHT: The logical dimensions of the buffer in pixels

    * EGL_LINUX_DRM_FOURCC_EXT: The pixel format of the buffer, as specified
      by drm_fourcc.h and used as the pixel_format parameter of the
      drm_mode_fb_cmd2 ioctl.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-12 15:43:09 +01:00
Emil Velikov
d13dcca2c2 egl: polish EXT_image_dma_buf_import attr parsing
Simplify the existing if/else + temporary variable into if (foo) return
X.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-12 15:43:05 +01:00
Emil Velikov
448f70e366 egl: simplify EXT_image_dma_buf_import_modifiers attr parsing
Move the common extension check at the top.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-12 15:42:59 +01:00
Emil Velikov
3ee2be4113 egl: split _eglParseImageAttribList into per extension functions
Will allow us to simplify existing code and make further improvements
short and simple.

No functional change intended.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-12 15:42:54 +01:00
Emil Velikov
81e95924ea egl: call _eglError within _eglParseImageAttribList
As per EGL_KHR_image_base:

   If an attribute specified in <attrib_list> is not one of the
   attributes listed in Table bbb, the error EGL_BAD_PARAMETER is
   generated.

We should set the error as opposed to simply log it.

Currently we have a partial solution, whereby only some of the callers
call _eglError().

Since that has proven to be less robust, simply set the error by the
function itself and change the return type to EGLBoolean, updating the
callers.

So now the code is slightly simpler. Plus the follow-up fixes will be
easier to manage.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-12 15:42:51 +01:00
Emil Velikov
9365ff4b88 egl: move eglCreateDRMImageMESA's malloc later
Don't bother allocating any memory until we're finished parsing and
sanitising all the attributes.

As a nice side effect we now consistently set eglError when any of
the attrib/values are not correct.

Strangely enough the spec does not mention _anything_ about what error
should be set where, even if the implementation already sets the odd
one.

Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-07-12 15:42:03 +01:00
Brian Paul
f7e78abdf4 svga: fix texture swizzle writemasking
Commit bfe1e7737a changed how texture swizzles are set up.
This exposed a latent bug in the VMware driver: we were ignoring
the texture instruction's writemask when applying the 0 and 1
swizzle terms.

This wasn't caught by the Piglit texture swizzle test because it
only exercises fixed function (no write masking).

Fixes issues seen with ETQW apitrace.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-07-11 15:43:36 -06:00
Chris Wilson
cead51a0c6 i965: Use VALGRIND_MAKE_MEM_x in place of MALLOCLIKE/FREELIKE
Valgrind doesn't actually implement VALGRIND_FREELIKE_BLOCK as the
exact inverse of VALGRIND_MALLOCLIKE_BLOCK. It makes the block
inaccessible, but still leaves it defined in its allocation tracker i.e.
it will report the mmap as lost despite the call to FREELIKE!

Instead of treating the mmap as an allocation, treat it as changing the
access bits upon the memory, i.e. that it becomes defined (because of
the buffer objects always contain valid content from the user's
perspective) upon mmap and inaccessible upon munmap. This makes memcheck
happy without leaving it thinking there is a very large leak.

Finally for consistency, we treat all the mmap/munmap paths the same
even though valgrind can intercept the regular mmap used for GTT. We
could move this in the drm_mmap/drm_munmap macros, but that quickly
looks ugly given the desire for those to support different OSes, but I
didn't try that hard!

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-07-11 14:07:47 -07:00
Kenneth Graunke
314879f7fe i965: Fix asynchronous mappings on !LLC platforms.
When using a read-only CPU mapping, we may encounter stale buffer
contents.  For example, the Piglit primitive-restart test offers the
following scenario:

   1. Read data via a CPU map.
   2. Destroy that buffer.
   3. Create a new buffer - obtaining the same one via the BO cache.
   4. Call BufferSubData, which does a GTT map with MAP_WRITE | MAP_ASYNC.
      (We avoid set_domain for async mappings, so no flushing occurs.)
   5. Read data via a CPU map.
      (Without explicit clflushing, this will contain data from step 1!)

Otherwise, everything ought to work, keeping in mind that we never use
CPU maps for writing - just read-only CPU maps.

This restores the performance gains after Matt's revert in commit
71651b3139.

v2: Do the invalidate later, and even when asking for a brand new map.
v3: Add more comments from Chris.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-07-11 13:26:53 -07:00
Kenneth Graunke
20104f1926 i965: Don't use PREAD for glGetBufferSubData().
Just map the buffer and memcpy.  This will do a CPU mmap, which should
be reasonably efficient, and doing this gives us full control over the
domains and caching instead of leaving it to the kernel.

This prevents regressions on Braswell in the next commit.  Specifically
GL45-CTS.shader_atomic_counters.basic-buffer-operations.  Because async
maps start skipping set-domain, the pread thought everything was nicely
still in the CPU domain, and returned stale data.

v2: Use _mesa_error_no_memory() if the map fails instead of crashing.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-07-11 13:26:46 -07:00
Tim Rowley
f50aa21456 swr: build driver proper separate from rasterizer
swr used to build and link the rasterizer to the driver, and to support
multiple architectures we needed to have multiple versions of the
driver/rasterizer combination, which needed to link in much of mesa.

Changing to having one instance of the driver and just building
architecture specific versions of the rasterizer gives a large reduction
in disk space.

libGL.so        6464 Kb ->  7000 Kb
libswrAVX.so   10068 Kb ->  5432 Kb
libswrAVX2.so   9828 Kb ->  5200 Kb

Total          26360 Kb -> 17632 Kb

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-07-11 13:38:20 -05:00
Tim Rowley
50cd222116 swr: switch to using SwrGetInterface api table
Use the SWR rasterizer API through the table returned from
SwrGetInterface rather than referencing the functions directly.
This will allow us to move to a model of having the driver dynamically
load the appropriate swr architecture library.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-11 13:38:20 -05:00
George Kyriazis
27c5568de3 swr/rast: make SWR_VISIBLE attribute work for windows
Needed to expose SwrGetInterface

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-07-11 13:37:57 -05:00
Lionel Landwerlin
9d681a7a18 i965: perf: use new subslices numbers from device info
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2017-07-11 16:14:57 +01:00
Lionel Landwerlin
384aaa4d3f intel: add number of subslices to device info
We could have used a single integer to store that value, but
Cannonlake has different number of subslices per slice depending on
the GT.

v2: Add CFL subslice numbers (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2017-07-11 16:14:57 +01:00
Ben Widawsky
25c1a7cc7a i965: Use already existing eu_total
Reduces IOCTL calls by 1, and provides a centralized place to override
such configurations if we have a need to do so.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-07-11 16:14:57 +01:00
Chris Wilson
618be8cc1a i965: Resolve framebuffers before signaling the fence
From KHR_fence_sync:

  When the condition of the sync object is satisfied by the fence
  command, the sync is signaled by the associated client API context,
  causing any eglClientWaitSyncKHR commands (see below) blocking on
  <sync> to unblock. The only condition currently supported is
  EGL_SYNC_PRIOR_COMMANDS_COMPLETE_KHR, which is satisfied by
  completion of the fence command corresponding to the sync object,
  and all preceding commands in the associated client API context's
  command stream. The sync object will not be signaled until all
  effects from these commands on the client API's internal and
  framebuffer state are fully realized. No other state is affected by
  execution of the fence command.

If clients are passing the fence fd (from EGL_ANDROID_native_fence_sync)
to a compositor, that fence must only be signaled once the framebuffer
is resolved and not before as is currently the case.

v2: fixup assert to use GL_SYNC_GPU_COMMANDS_COMPLETE (Chad)

Reported-by: Sergi Granell <xerpi.g.12@gmail.com>
Fixes: c636284ee8 ("i965/sync: Implement DRI2_Fence extension")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sergi Granell <xerpi.g.12@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Chad Versace <chadversary@chromium.org>
Cc: Daniel Stone <daniels@collabora.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2017-07-11 15:46:58 +01:00
Brian Paul
bf7a4f4441 svga: s/unsigned/enum tgsi_texture_type/
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-07-11 08:09:14 -06:00
Brian Paul
1d82674969 svga: s/unsigned/enum tgsi_swizzle
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-07-11 08:09:14 -06:00
Brian Paul
3effacf172 svga: s/unsigned/enum tgsi_interpolate_mode/
And s/unsigned/enum tgsi_interpolate_loc/

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-07-11 08:09:14 -06:00
Brian Paul
9330112b35 svga: s/unsigned/enum tgsi_file_type/
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-07-11 08:09:14 -06:00
Brian Paul
1b5e88becd svga: s/unsigned/enum tgsi_semantic/
Makes gdb debugging a little nicer.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-07-11 08:09:14 -06:00
442 changed files with 20850 additions and 9750 deletions

View File

@@ -11,6 +11,7 @@ env:
global:
- XORG_RELEASES=http://xorg.freedesktop.org/releases/individual
- XCB_RELEASES=http://xcb.freedesktop.org/dist
- WAYLAND_RELEASES=http://wayland.freedesktop.org/releases
- XORGMACROS_VERSION=util-macros-1.19.0
- GLPROTO_VERSION=glproto-1.4.17
- DRI2PROTO_VERSION=dri2proto-2.8
@@ -23,7 +24,8 @@ env:
- LIBVDPAU_VERSION=libvdpau-1.1
- LIBVA_VERSION=libva-1.6.2
- LIBWAYLAND_VERSION=wayland-1.11.1
- PKG_CONFIG_PATH=$HOME/prefix/lib/pkgconfig
- WAYLAND_PROTOCOLS_VERSION=wayland-protocols-1.8
- PKG_CONFIG_PATH=$HOME/prefix/lib/pkgconfig:$HOME/prefix/share/pkgconfig
- LD_LIBRARY_PATH="$HOME/prefix/lib:$LD_LIBRARY_PATH"
matrix:
@@ -57,8 +59,8 @@ matrix:
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=3.9
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- OVERRIDE_CC="gcc-5"
- OVERRIDE_CXX="g++-5"
- OVERRIDE_CC="gcc-4.8"
- OVERRIDE_CXX="g++-4.8"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx --disable-gallium-osmesa"
@@ -67,13 +69,11 @@ matrix:
addons:
apt:
sources:
- ubuntu-toolchain-r-test
- llvm-toolchain-trusty-3.9
packages:
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# From sources above
- g++-5
- llvm-3.9-dev
# Common
- xz-utils
@@ -250,19 +250,17 @@ matrix:
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
# Keep it symmetrical to the make build. There's no actual SWR, yet.
- SCONS_CHECK_COMMAND="true"
- OVERRIDE_CC="gcc-5"
- OVERRIDE_CXX="g++-5"
- OVERRIDE_CC="gcc-4.8"
- OVERRIDE_CXX="g++-4.8"
addons:
apt:
sources:
- ubuntu-toolchain-r-test
- llvm-toolchain-trusty-3.9
packages:
- scons
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# From sources above
- g++-5
- llvm-3.9-dev
# Common
- xz-utils
@@ -340,10 +338,14 @@ install:
- tar -jxvf $LIBVA_VERSION.tar.bz2
- (cd $LIBVA_VERSION && ./configure --prefix=$HOME/prefix --disable-wayland --disable-dummy-driver && make install)
- wget http://wayland.freedesktop.org/releases/$LIBWAYLAND_VERSION.tar.xz
- wget $WAYLAND_RELEASES/$LIBWAYLAND_VERSION.tar.xz
- tar -axvf $LIBWAYLAND_VERSION.tar.xz
- (cd $LIBWAYLAND_VERSION && ./configure --prefix=$HOME/prefix --enable-libraries --without-host-scanner --disable-documentation --disable-dtd-validation && make install)
- wget $WAYLAND_RELEASES/$WAYLAND_PROTOCOLS_VERSION.tar.xz
- tar -axvf $WAYLAND_PROTOCOLS_VERSION.tar.xz
- (cd $WAYLAND_PROTOCOLS_VERSION && ./configure --prefix=$HOME/prefix && make install)
# Generate the header since one is missing on the Travis instance
- mkdir -p linux
- printf "%s\n" \

View File

@@ -62,6 +62,11 @@ noinst_HEADERS = \
include/c99_compat.h \
include/c99_math.h \
include/c11 \
include/drm-uapi/drm.h \
include/drm-uapi/drm_fourcc.h \
include/drm-uapi/drm_mode.h \
include/drm-uapi/i915_drm.h \
include/drm-uapi/vc4_drm.h \
include/D3D9 \
include/GL/wglext.h \
include/HaikuGL \

View File

@@ -1 +1 @@
17.2.0-devel
17.2.0-rc3

View File

@@ -74,13 +74,12 @@ AC_SUBST([OPENCL_VERSION])
# in the first entry.
LIBDRM_REQUIRED=2.4.75
LIBDRM_RADEON_REQUIRED=2.4.71
LIBDRM_AMDGPU_REQUIRED=2.4.81
LIBDRM_AMDGPU_REQUIRED=2.4.82
LIBDRM_INTEL_REQUIRED=2.4.75
LIBDRM_NVVIEUX_REQUIRED=2.4.66
LIBDRM_NOUVEAU_REQUIRED=2.4.66
LIBDRM_FREEDRENO_REQUIRED=2.4.74
LIBDRM_VC4_REQUIRED=2.4.69
LIBDRM_ETNAVIV_REQUIRED=2.4.80
LIBDRM_ETNAVIV_REQUIRED=2.4.82
dnl Versions for external dependencies
DRI2PROTO_REQUIRED=2.8
@@ -89,6 +88,7 @@ LIBOMXIL_BELLAGIO_REQUIRED=0.0
LIBVA_REQUIRED=0.38.0
VDPAU_REQUIRED=1.1
WAYLAND_REQUIRED=1.11
WAYLAND_PROTOCOLS_REQUIRED=1.8
XCB_REQUIRED=1.9.3
XCBDRI2_REQUIRED=1.8
XCBGLX_REQUIRED=1.8.1
@@ -287,9 +287,9 @@ if test "x$GCC" = xyes; then
CFLAGS="$CFLAGS -Wall"
if test "x$USE_GNU99" = xyes; then
CFLAGS="$CFLAGS -std=gnu99"
CFLAGS="$CFLAGS -std=gnu99"
else
CFLAGS="$CFLAGS -std=c99"
CFLAGS="$CFLAGS -std=c99"
fi
# Enable -Werror=implicit-function-declaration and
@@ -301,9 +301,9 @@ if test "x$GCC" = xyes; then
CFLAGS="$CFLAGS -Werror=implicit-function-declaration"
CFLAGS="$CFLAGS -Werror=missing-prototypes"
AC_LINK_IFELSE([AC_LANG_PROGRAM()],
AC_MSG_RESULT([yes]),
[CFLAGS="$save_CFLAGS -Wmissing-prototypes";
AC_MSG_RESULT([no])])
AC_MSG_RESULT([yes]),
[CFLAGS="$save_CFLAGS -Wmissing-prototypes";
AC_MSG_RESULT([no])])
# Enable -fvisibility=hidden if using a gcc that supports it
save_CFLAGS="$CFLAGS"
@@ -311,7 +311,7 @@ if test "x$GCC" = xyes; then
VISIBILITY_CFLAGS="-fvisibility=hidden"
CFLAGS="$CFLAGS $VISIBILITY_CFLAGS"
AC_LINK_IFELSE([AC_LANG_PROGRAM()], AC_MSG_RESULT([yes]),
[VISIBILITY_CFLAGS=""; AC_MSG_RESULT([no])])
[VISIBILITY_CFLAGS=""; AC_MSG_RESULT([no])])
# Restore CFLAGS; VISIBILITY_CFLAGS are added to it where needed.
CFLAGS=$save_CFLAGS
@@ -333,10 +333,10 @@ if test "x$GCC" = xyes; then
AC_MSG_CHECKING([whether $CC supports -Werror=vla])
CFLAGS="$CFLAGS -Werror=vla"
AC_LINK_IFELSE([AC_LANG_PROGRAM()],
[MSVC2013_COMPAT_CFLAGS="$MSVC2013_COMPAT_CFLAGS -Werror=vla";
MSVC2013_COMPAT_CXXFLAGS="$MSVC2013_COMPAT_CXXFLAGS -Werror=vla";
AC_MSG_RESULT([yes])],
AC_MSG_RESULT([no]))
[MSVC2013_COMPAT_CFLAGS="$MSVC2013_COMPAT_CFLAGS -Werror=vla";
MSVC2013_COMPAT_CXXFLAGS="$MSVC2013_COMPAT_CXXFLAGS -Werror=vla";
AC_MSG_RESULT([yes])],
AC_MSG_RESULT([no]))
CFLAGS="$save_CFLAGS"
fi
if test "x$GXX" = xyes; then
@@ -349,7 +349,7 @@ if test "x$GXX" = xyes; then
CXXFLAGS="$CXXFLAGS $VISIBILITY_CXXFLAGS"
AC_LANG_PUSH([C++])
AC_LINK_IFELSE([AC_LANG_PROGRAM()], AC_MSG_RESULT([yes]),
[VISIBILITY_CXXFLAGS="" ; AC_MSG_RESULT([no])])
[VISIBILITY_CXXFLAGS="" ; AC_MSG_RESULT([no])])
AC_LANG_POP([C++])
# Restore CXXFLAGS; VISIBILITY_CXXFLAGS are added to it where needed.
@@ -1292,6 +1292,9 @@ AM_CONDITIONAL(HAVE_OPENGL_ES2, test "x$enable_gles2" = xyes)
AM_CONDITIONAL(NEED_OPENGL_COMMON, test "x$enable_opengl" = xyes -o \
"x$enable_gles1" = xyes -o \
"x$enable_gles2" = xyes)
AM_CONDITIONAL(NEED_KHRPLATFORM, test "x$enable_egl" = xyes -o \
"x$enable_gles1" = xyes -o \
"x$enable_gles2" = xyes)
# Validate GLX options
if test "x$enable_glx" = xyes; then
@@ -1413,7 +1416,7 @@ AC_SUBST([OSMESA_LIB])
PKG_CHECK_MODULES([LIBDRM], [libdrm >= $LIBDRM_REQUIRED],
[have_libdrm=yes], [have_libdrm=no])
if test "x$have_libdrm" = xyes; then
DEFINES="$DEFINES -DHAVE_LIBDRM"
DEFINES="$DEFINES -DHAVE_LIBDRM"
fi
require_libdrm() {
@@ -1678,50 +1681,59 @@ if test "x$WAYLAND_SCANNER" = x; then
AC_PATH_PROG([WAYLAND_SCANNER], [wayland-scanner], [:])
fi
PKG_CHECK_EXISTS([wayland-protocols >= $WAYLAND_PROTOCOLS_REQUIRED], [have_wayland_protocols=yes], [have_wayland_protocols=no])
if test "x$have_wayland_protocols" = xyes; then
ac_wayland_protocols_pkgdatadir=`$PKG_CONFIG --variable=pkgdatadir wayland-protocols`
fi
AC_SUBST(WAYLAND_PROTOCOLS_DATADIR, $ac_wayland_protocols_pkgdatadir)
# Do per platform setups and checks
platforms=`IFS=', '; echo $with_platforms`
for plat in $platforms; do
case "$plat" in
wayland)
case "$plat" in
wayland)
PKG_CHECK_MODULES([WAYLAND], [wayland-client >= $WAYLAND_REQUIRED wayland-server >= $WAYLAND_REQUIRED])
PKG_CHECK_MODULES([WAYLAND], [wayland-client >= $WAYLAND_REQUIRED wayland-server >= $WAYLAND_REQUIRED])
if test "x$WAYLAND_SCANNER" = "x:"; then
AC_MSG_ERROR([wayland-scanner is needed to compile the wayland platform])
fi
DEFINES="$DEFINES -DHAVE_WAYLAND_PLATFORM"
;;
if test "x$WAYLAND_SCANNER" = "x:"; then
AC_MSG_ERROR([wayland-scanner is needed to compile the wayland platform])
fi
if test "x$have_wayland_protocols" = xno; then
AC_MSG_ERROR([wayland-protocols >= $WAYLAND_PROTOCOLS_REQUIRED is needed to compile the wayland platform])
fi
DEFINES="$DEFINES -DHAVE_WAYLAND_PLATFORM"
;;
x11)
PKG_CHECK_MODULES([XCB_DRI2], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED xcb-xfixes])
DEFINES="$DEFINES -DHAVE_X11_PLATFORM"
;;
x11)
PKG_CHECK_MODULES([XCB_DRI2], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED xcb-xfixes])
DEFINES="$DEFINES -DHAVE_X11_PLATFORM"
;;
drm)
test "x$enable_gbm" = "xno" &&
AC_MSG_ERROR([EGL platform drm needs gbm])
DEFINES="$DEFINES -DHAVE_DRM_PLATFORM"
;;
drm)
test "x$enable_gbm" = "xno" &&
AC_MSG_ERROR([EGL platform drm needs gbm])
DEFINES="$DEFINES -DHAVE_DRM_PLATFORM"
;;
surfaceless)
DEFINES="$DEFINES -DHAVE_SURFACELESS_PLATFORM"
;;
surfaceless)
DEFINES="$DEFINES -DHAVE_SURFACELESS_PLATFORM"
;;
android)
PKG_CHECK_MODULES([ANDROID], [cutils hardware sync])
DEFINES="$DEFINES -DHAVE_ANDROID_PLATFORM"
;;
android)
PKG_CHECK_MODULES([ANDROID], [cutils hardware sync])
DEFINES="$DEFINES -DHAVE_ANDROID_PLATFORM"
;;
*)
AC_MSG_ERROR([platform '$plat' does not exist])
;;
esac
*)
AC_MSG_ERROR([platform '$plat' does not exist])
;;
esac
case "$plat" in
wayland|drm|surfaceless)
require_libdrm "Platform $plat"
;;
esac
case "$plat" in
wayland|drm|surfaceless)
require_libdrm "Platform $plat"
;;
esac
done
if test "x$enable_glx" != xno; then
@@ -2103,7 +2115,7 @@ if test -n "$with_gallium_drivers" -a "x$with_gallium_drivers" != xswrast; then
fi
if test "x$enable_vdpau" = xauto -a "x$have_vdpau_platform" = xyes; then
PKG_CHECK_EXISTS([vdpau >= $VDPAU_REQUIRED], [enable_vdpau=yes], [enable_vdpau=no])
PKG_CHECK_EXISTS([vdpau >= $VDPAU_REQUIRED], [enable_vdpau=yes], [enable_vdpau=no])
fi
if test "x$enable_omx" = xauto -a "x$have_omx_platform" = xyes; then
@@ -2347,6 +2359,15 @@ AC_ARG_WITH([d3d-libdir],
[D3D_DRIVER_INSTALL_DIR="${libdir}/d3d"])
AC_SUBST([D3D_DRIVER_INSTALL_DIR])
dnl Architectures to build SWR library for
AC_ARG_WITH([swr-archs],
[AS_HELP_STRING([--with-swr-archs@<:@=DIRS...@:>@],
[comma delimited swr architectures list, e.g.
"avx,avx2,knl,skx" @<:@default="avx,avx2"@:>@])],
[with_swr_archs="$withval"],
[with_swr_archs="avx,avx2"])
dnl
dnl r300 doesn't strictly require LLVM, but for performance reasons we
dnl highly recommend LLVM usage. So require it at least on x86 and x86_64
@@ -2494,16 +2515,50 @@ if test -n "$with_gallium_drivers"; then
SWR_AVX_CXXFLAGS
AC_SUBST([SWR_AVX_CXXFLAGS])
swr_require_cxx_feature_flags "AVX2" "defined(__AVX2__)" \
",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2" \
SWR_AVX2_CXXFLAGS
AC_SUBST([SWR_AVX2_CXXFLAGS])
swr_archs=`IFS=', '; echo $with_swr_archs`
for arch in $swr_archs; do
case "x$arch" in
xavx)
HAVE_SWR_AVX=yes
;;
xavx2)
swr_require_cxx_feature_flags "AVX2" "defined(__AVX2__)" \
",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2" \
SWR_AVX2_CXXFLAGS
AC_SUBST([SWR_AVX2_CXXFLAGS])
HAVE_SWR_AVX2=yes
;;
xknl)
swr_require_cxx_feature_flags "KNL" "defined(__AVX512F__) && defined(__AVX512ER__)" \
",-march=knl,-xMIC-AVX512" \
SWR_KNL_CXXFLAGS
AC_SUBST([SWR_KNL_CXXFLAGS])
HAVE_SWR_KNL=yes
;;
xskx)
swr_require_cxx_feature_flags "SKX" "defined(__AVX512F__) && defined(__AVX512BW__)" \
",-march=skylake-avx512,-xCORE-AVX512" \
SWR_SKX_CXXFLAGS
AC_SUBST([SWR_SKX_CXXFLAGS])
HAVE_SWR_SKX=yes
;;
*)
AC_MSG_ERROR([unknown SWR build architecture '$arch'])
;;
esac
done
if test "x$HAVE_SWR_AVX" != xyes -a \
"x$HAVE_SWR_AVX2" != xyes -a \
"x$HAVE_SWR_KNL" != xyes -a \
"x$HAVE_SWR_SKX" != xyes -a; then
AC_MSG_ERROR([swr enabled but no swr architectures selected])
fi
HAVE_GALLIUM_SWR=yes
;;
xvc4)
HAVE_GALLIUM_VC4=yes
PKG_CHECK_MODULES([VC4], [libdrm >= $LIBDRM_VC4_REQUIRED libdrm_vc4 >= $LIBDRM_VC4_REQUIRED])
require_libdrm "vc4"
PKG_CHECK_MODULES([SIMPENROSE], [simpenrose],
@@ -2537,6 +2592,11 @@ if test "x$enable_llvm" = "xyes" -a "$with_gallium_drivers"; then
llvm_add_default_components "gallium"
fi
AM_CONDITIONAL(HAVE_SWR_AVX, test "x$HAVE_SWR_AVX" = xyes)
AM_CONDITIONAL(HAVE_SWR_AVX2, test "x$HAVE_SWR_AVX2" = xyes)
AM_CONDITIONAL(HAVE_SWR_KNL, test "x$HAVE_SWR_KNL" = xyes)
AM_CONDITIONAL(HAVE_SWR_SKX, test "x$HAVE_SWR_SKX" = xyes)
dnl We need to validate some needed dependencies for renderonly drivers.
if test "x$HAVE_GALLIUM_ETNAVIV" != xyes -a "x$HAVE_GALLIUM_IMX" = xyes ; then
@@ -2711,18 +2771,18 @@ AC_ARG_ENABLE(valgrind,
[Build mesa with valgrind support (default: auto)])],
[VALGRIND=$enableval], [VALGRIND=auto])
if test "x$VALGRIND" != xno; then
PKG_CHECK_MODULES(VALGRIND, [valgrind], [have_valgrind=yes], [have_valgrind=no])
PKG_CHECK_MODULES(VALGRIND, [valgrind], [have_valgrind=yes], [have_valgrind=no])
fi
AC_MSG_CHECKING([whether to enable Valgrind support])
if test "x$VALGRIND" = xauto; then
VALGRIND="$have_valgrind"
VALGRIND="$have_valgrind"
fi
if test "x$VALGRIND" = "xyes"; then
if ! test "x$have_valgrind" = xyes; then
AC_MSG_ERROR([Valgrind support required but not present])
fi
AC_DEFINE([HAVE_VALGRIND], 1, [Use valgrind intrinsics to suppress false warnings])
if ! test "x$have_valgrind" = xyes; then
AC_MSG_ERROR([Valgrind support required but not present])
fi
AC_DEFINE([HAVE_VALGRIND], 1, [Use valgrind intrinsics to suppress false warnings])
fi
AC_MSG_RESULT([$VALGRIND])
@@ -2743,116 +2803,116 @@ CXXFLAGS="$CXXFLAGS $USER_CXXFLAGS"
dnl Substitute the config
AC_CONFIG_FILES([Makefile
src/Makefile
src/amd/Makefile
src/amd/vulkan/Makefile
src/broadcom/Makefile
src/compiler/Makefile
src/egl/Makefile
src/egl/main/egl.pc
src/egl/wayland/wayland-drm/Makefile
src/egl/wayland/wayland-egl/Makefile
src/egl/wayland/wayland-egl/wayland-egl.pc
src/gallium/Makefile
src/gallium/auxiliary/Makefile
src/gallium/auxiliary/pipe-loader/Makefile
src/gallium/drivers/freedreno/Makefile
src/gallium/drivers/ddebug/Makefile
src/gallium/drivers/i915/Makefile
src/gallium/drivers/llvmpipe/Makefile
src/gallium/drivers/noop/Makefile
src/gallium/drivers/nouveau/Makefile
src/gallium/drivers/pl111/Makefile
src/gallium/drivers/r300/Makefile
src/gallium/drivers/r600/Makefile
src/gallium/drivers/radeon/Makefile
src/gallium/drivers/radeonsi/Makefile
src/gallium/drivers/rbug/Makefile
src/gallium/drivers/softpipe/Makefile
src/gallium/drivers/svga/Makefile
src/gallium/drivers/swr/Makefile
src/gallium/drivers/trace/Makefile
src/gallium/drivers/etnaviv/Makefile
src/gallium/drivers/imx/Makefile
src/gallium/drivers/vc4/Makefile
src/gallium/drivers/virgl/Makefile
src/gallium/state_trackers/clover/Makefile
src/gallium/state_trackers/dri/Makefile
src/gallium/state_trackers/glx/xlib/Makefile
src/gallium/state_trackers/nine/Makefile
src/gallium/state_trackers/omx/Makefile
src/gallium/state_trackers/osmesa/Makefile
src/gallium/state_trackers/va/Makefile
src/gallium/state_trackers/vdpau/Makefile
src/gallium/state_trackers/xa/Makefile
src/gallium/state_trackers/xvmc/Makefile
src/gallium/targets/d3dadapter9/Makefile
src/gallium/targets/d3dadapter9/d3d.pc
src/gallium/targets/dri/Makefile
src/gallium/targets/libgl-xlib/Makefile
src/gallium/targets/omx/Makefile
src/gallium/targets/opencl/Makefile
src/gallium/targets/opencl/mesa.icd
src/gallium/targets/osmesa/Makefile
src/gallium/targets/osmesa/osmesa.pc
src/gallium/targets/pipe-loader/Makefile
src/gallium/targets/va/Makefile
src/gallium/targets/vdpau/Makefile
src/gallium/targets/xa/Makefile
src/gallium/targets/xa/xatracker.pc
src/gallium/targets/xvmc/Makefile
src/gallium/tests/trivial/Makefile
src/gallium/tests/unit/Makefile
src/gallium/winsys/etnaviv/drm/Makefile
src/gallium/winsys/imx/drm/Makefile
src/gallium/winsys/freedreno/drm/Makefile
src/gallium/winsys/i915/drm/Makefile
src/gallium/winsys/nouveau/drm/Makefile
src/gallium/winsys/pl111/drm/Makefile
src/gallium/winsys/radeon/drm/Makefile
src/gallium/winsys/amdgpu/drm/Makefile
src/gallium/winsys/svga/drm/Makefile
src/gallium/winsys/sw/dri/Makefile
src/gallium/winsys/sw/kms-dri/Makefile
src/gallium/winsys/sw/null/Makefile
src/gallium/winsys/sw/wrapper/Makefile
src/gallium/winsys/sw/xlib/Makefile
src/gallium/winsys/vc4/drm/Makefile
src/gallium/winsys/virgl/drm/Makefile
src/gallium/winsys/virgl/vtest/Makefile
src/gbm/Makefile
src/gbm/main/gbm.pc
src/glx/Makefile
src/glx/apple/Makefile
src/glx/tests/Makefile
src/glx/windows/Makefile
src/glx/windows/windowsdriproto.pc
src/gtest/Makefile
src/intel/Makefile
src/loader/Makefile
src/mapi/Makefile
src/mapi/es1api/glesv1_cm.pc
src/mapi/es2api/glesv2.pc
src/mapi/glapi/gen/Makefile
src/mesa/Makefile
src/mesa/gl.pc
src/mesa/drivers/dri/dri.pc
src/mesa/drivers/dri/common/Makefile
src/mesa/drivers/dri/common/xmlpool/Makefile
src/mesa/drivers/dri/i915/Makefile
src/mesa/drivers/dri/i965/Makefile
src/mesa/drivers/dri/Makefile
src/mesa/drivers/dri/nouveau/Makefile
src/mesa/drivers/dri/r200/Makefile
src/mesa/drivers/dri/radeon/Makefile
src/mesa/drivers/dri/swrast/Makefile
src/mesa/drivers/osmesa/Makefile
src/mesa/drivers/osmesa/osmesa.pc
src/mesa/drivers/x11/Makefile
src/mesa/main/tests/Makefile
src/util/Makefile
src/util/tests/hash_table/Makefile
src/vulkan/Makefile])
src/Makefile
src/amd/Makefile
src/amd/vulkan/Makefile
src/broadcom/Makefile
src/compiler/Makefile
src/egl/Makefile
src/egl/main/egl.pc
src/egl/wayland/wayland-drm/Makefile
src/egl/wayland/wayland-egl/Makefile
src/egl/wayland/wayland-egl/wayland-egl.pc
src/gallium/Makefile
src/gallium/auxiliary/Makefile
src/gallium/auxiliary/pipe-loader/Makefile
src/gallium/drivers/freedreno/Makefile
src/gallium/drivers/ddebug/Makefile
src/gallium/drivers/i915/Makefile
src/gallium/drivers/llvmpipe/Makefile
src/gallium/drivers/noop/Makefile
src/gallium/drivers/nouveau/Makefile
src/gallium/drivers/pl111/Makefile
src/gallium/drivers/r300/Makefile
src/gallium/drivers/r600/Makefile
src/gallium/drivers/radeon/Makefile
src/gallium/drivers/radeonsi/Makefile
src/gallium/drivers/rbug/Makefile
src/gallium/drivers/softpipe/Makefile
src/gallium/drivers/svga/Makefile
src/gallium/drivers/swr/Makefile
src/gallium/drivers/trace/Makefile
src/gallium/drivers/etnaviv/Makefile
src/gallium/drivers/imx/Makefile
src/gallium/drivers/vc4/Makefile
src/gallium/drivers/virgl/Makefile
src/gallium/state_trackers/clover/Makefile
src/gallium/state_trackers/dri/Makefile
src/gallium/state_trackers/glx/xlib/Makefile
src/gallium/state_trackers/nine/Makefile
src/gallium/state_trackers/omx/Makefile
src/gallium/state_trackers/osmesa/Makefile
src/gallium/state_trackers/va/Makefile
src/gallium/state_trackers/vdpau/Makefile
src/gallium/state_trackers/xa/Makefile
src/gallium/state_trackers/xvmc/Makefile
src/gallium/targets/d3dadapter9/Makefile
src/gallium/targets/d3dadapter9/d3d.pc
src/gallium/targets/dri/Makefile
src/gallium/targets/libgl-xlib/Makefile
src/gallium/targets/omx/Makefile
src/gallium/targets/opencl/Makefile
src/gallium/targets/opencl/mesa.icd
src/gallium/targets/osmesa/Makefile
src/gallium/targets/osmesa/osmesa.pc
src/gallium/targets/pipe-loader/Makefile
src/gallium/targets/va/Makefile
src/gallium/targets/vdpau/Makefile
src/gallium/targets/xa/Makefile
src/gallium/targets/xa/xatracker.pc
src/gallium/targets/xvmc/Makefile
src/gallium/tests/trivial/Makefile
src/gallium/tests/unit/Makefile
src/gallium/winsys/etnaviv/drm/Makefile
src/gallium/winsys/imx/drm/Makefile
src/gallium/winsys/freedreno/drm/Makefile
src/gallium/winsys/i915/drm/Makefile
src/gallium/winsys/nouveau/drm/Makefile
src/gallium/winsys/pl111/drm/Makefile
src/gallium/winsys/radeon/drm/Makefile
src/gallium/winsys/amdgpu/drm/Makefile
src/gallium/winsys/svga/drm/Makefile
src/gallium/winsys/sw/dri/Makefile
src/gallium/winsys/sw/kms-dri/Makefile
src/gallium/winsys/sw/null/Makefile
src/gallium/winsys/sw/wrapper/Makefile
src/gallium/winsys/sw/xlib/Makefile
src/gallium/winsys/vc4/drm/Makefile
src/gallium/winsys/virgl/drm/Makefile
src/gallium/winsys/virgl/vtest/Makefile
src/gbm/Makefile
src/gbm/main/gbm.pc
src/glx/Makefile
src/glx/apple/Makefile
src/glx/tests/Makefile
src/glx/windows/Makefile
src/glx/windows/windowsdriproto.pc
src/gtest/Makefile
src/intel/Makefile
src/loader/Makefile
src/mapi/Makefile
src/mapi/es1api/glesv1_cm.pc
src/mapi/es2api/glesv2.pc
src/mapi/glapi/gen/Makefile
src/mesa/Makefile
src/mesa/gl.pc
src/mesa/drivers/dri/dri.pc
src/mesa/drivers/dri/common/Makefile
src/mesa/drivers/dri/common/xmlpool/Makefile
src/mesa/drivers/dri/i915/Makefile
src/mesa/drivers/dri/i965/Makefile
src/mesa/drivers/dri/Makefile
src/mesa/drivers/dri/nouveau/Makefile
src/mesa/drivers/dri/r200/Makefile
src/mesa/drivers/dri/radeon/Makefile
src/mesa/drivers/dri/swrast/Makefile
src/mesa/drivers/osmesa/Makefile
src/mesa/drivers/osmesa/osmesa.pc
src/mesa/drivers/x11/Makefile
src/mesa/main/tests/Makefile
src/util/Makefile
src/util/tests/hash_table/Makefile
src/vulkan/Makefile])
AC_OUTPUT
@@ -2976,6 +3036,11 @@ else
echo " HUD lmsensors: yes"
fi
echo ""
if test "x$HAVE_GALLIUM_SWR" != x; then
echo " SWR archs: $swr_archs"
fi
dnl Libraries
echo ""
echo " Shared libs: $enable_shared"

View File

@@ -20,7 +20,7 @@
Primary Mesa download site:
<a href="ftp://ftp.freedesktop.org/pub/mesa/">ftp.freedesktop.org</a> (FTP)
or <a href="https://mesa.freedesktop.org/archive/">mesa.freedesktop.org</a>
(HTTP).
(HTTPS).
</p>
<p>

View File

@@ -292,10 +292,10 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
GL_ARB_sample_locations not started
GL_ARB_seamless_cubemap_per_texture DONE (i965, nvc0, radeonsi, r600, softpipe, swr)
GL_ARB_shader_atomic_counter_ops DONE (i965/gen7+, nvc0, radeonsi, softpipe)
GL_ARB_shader_ballot DONE (nvc0, radeonsi)
GL_ARB_shader_ballot DONE (i965/gen8+, nvc0, radeonsi)
GL_ARB_shader_clock DONE (i965/gen7+, nv50, nvc0, radeonsi)
GL_ARB_shader_draw_parameters DONE (i965, nvc0, radeonsi)
GL_ARB_shader_group_vote DONE (nvc0, radeonsi)
GL_ARB_shader_group_vote DONE (i965, nvc0, radeonsi)
GL_ARB_shader_stencil_export DONE (i965/gen9+, radeonsi, softpipe, llvmpipe, swr)
GL_ARB_shader_viewport_layer_array DONE (i965/gen6+, nvc0, radeonsi)
GL_ARB_sparse_buffer DONE (radeonsi/CIK+)

View File

@@ -16,6 +16,12 @@
<h1>News</h1>
<h2>July 14, 2017</h2>
<p>
<a href="relnotes/17.1.5.html">Mesa 17.1.5</a> is released.
This is a bug-fix release.
</p>
<h2>June 30, 2017</h2>
<p>
<a href="relnotes/17.1.4.html">Mesa 17.1.4</a> is released.

View File

@@ -39,13 +39,7 @@ if you'd like to nominate a patch in the next stable release.
<th>Notes</th>
</tr>
<tr>
<td rowspan="4">17.1</td>
<td>2017-07-14</td>
<td>17.1.5</td>
<td>Andres Gomez</td>
<td></td>
</tr>
<tr>
<td rowspan="3">17.1</td>
<td>2017-07-28</td>
<td>17.1.6</td>
<td>Emil Velikov</td>

View File

@@ -24,7 +24,6 @@
<li><a href="#branch">Making a branchpoint</a>
<li><a href="#prerelease">Pre-release announcement</a>
<li><a href="#release">Making a new release</a>
<li><a href="#calendar">Update the calendar</a>
<li><a href="#announce">Announce the release</a>
<li><a href="#website">Update the mesa3d.org website</a>
<li><a href="#bugzilla">Update Bugzilla</a>
@@ -437,6 +436,8 @@ Here is one solution that I've been using.
chmod 755 -fR $__build_root; rm -rf $__build_root
mkdir -p $__build_root &amp;&amp; cd $__build_root
# For the distcheck, you may want to specify which LLVM to use:
# export LLVM_CONFIG=/usr/lib/llvm-3.9/bin/llvm-config
$__mesa_root/autogen.sh &amp;&amp; make -j2 distcheck
# Build check the tarballs (scons, linux)
@@ -445,18 +446,22 @@ Here is one solution that I've been using.
cd .. &amp;&amp; rm -rf mesa-$__version
# Build check the tarballs (scons, windows/mingw)
# You may need to unset LLVM if you set it before:
# unset LLVM_CONFIG
tar -xaf mesa-$__version.tar.xz &amp;&amp; cd mesa-$__version
scons platform=windows toolchain=crossmingw
cd .. &amp;&amp; rm -rf mesa-$__version
# Test the automake binaries
tar -xaf mesa-$__version.tar.xz &amp;&amp; cd mesa-$__version
# You may want to specify which LLVM to use:
./configure \
--with-dri-drivers=i965,swrast \
--with-gallium-drivers=swrast \
--with-vulkan-drivers=intel \
--enable-llvm-shared-libs \
--enable-llvm \
--with-llvm-prefix=/usr/lib/llvm-3.9 \
--enable-glx-tls \
--enable-gbm \
--enable-egl \
@@ -466,7 +471,8 @@ Here is one solution that I've been using.
__glxgears_cmd='glxgears 2>&amp;1 | grep -v "configuration file"'
__es2info_cmd='es2_info 2>&amp;1 | egrep "GL_VERSION|GL_RENDERER|.*dri\.so"'
__es2gears_cmd='es2gears_x11 2>&amp;1 | grep -v "configuration file"'
export LD_LIBRARY_PATH=`pwd`/test/usr/local/lib/
test "x$LD_LIBRARY_PATH" != 'x' &amp;&amp; __old_ld="$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH=`pwd`/test/usr/local/lib/:"${__old_ld}"
export LIBGL_DRIVERS_PATH=`pwd`/test/usr/local/lib/dri/
export LIBGL_DEBUG=verbose
eval $__glxinfo_cmd
@@ -486,6 +492,7 @@ Here is one solution that I've been using.
eval $__es2gears_cmd
# Smoke test DOTA2
unset LD_LIBRARY_PATH
test "x$__old_ld" != 'x' &amp;&amp; export LD_LIBRARY_PATH="$__old_ld" &amp;&amp; unset __old_ld
unset LIBGL_DRIVERS_PATH
unset LIBGL_DEBUG
unset LIBGL_ALWAYS_SOFTWARE
@@ -540,6 +547,8 @@ Start the release process.
</p>
<pre>
# For the dist/distcheck, you may want to specify which LLVM to use:
# export LLVM_CONFIG=/usr/lib/llvm-3.9/bin/llvm-config
../relative/path/to/release.sh . # append --dist if you've already done distcheck above
</pre>
@@ -566,23 +575,17 @@ Something like the following steps will do the trick:
</pre>
<p>
Also, edit docs/relnotes.html to add a link to the new release notes, and edit
docs/index.html to add a news entry. Then commit and push:
Also, edit docs/relnotes.html to add a link to the new release notes,
edit docs/index.html to add a news entry, and remove the version from
docs/release-calendar.html. Then commit and push:
</p>
<pre>
git commit -as -m "docs: add news item and link release notes for X.Y.Z"
git commit -as -m "docs: update calendar, add news item and link release notes for X.Y.Z"
git push origin master X.Y
</pre>
<h1 id="calendar">Update the calendar</h1>
<p>
Remove the version from the <a href="release-calendar.html" target="_parent">calendar</a>.
</p>
<h1 id="announce">Announce the release</h1>
<p>

View File

@@ -21,6 +21,7 @@ The release notes summarize what's new or changed in each Mesa release.
</p>
<ul>
<li><a href="relnotes/17.1.5.html">17.1.5 release notes</a>
<li><a href="relnotes/17.1.4.html">17.1.4 release notes</a>
<li><a href="relnotes/17.1.3.html">17.1.3 release notes</a>
<li><a href="relnotes/17.1.2.html">17.1.2 release notes</a>

203
docs/relnotes/17.1.5.html Normal file
View File

@@ -0,0 +1,203 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 17.1.5 Release Notes / July 14, 2017</h1>
<p>
Mesa 17.1.5 is a bug fix release which fixes bugs found since the 17.1.4 release.
</p>
<p>
Mesa 17.1.5 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
7e3eeee8f9c28052796eb18133c2be12c38ba34864cc496382a2fa20c29b0317 mesa-17.1.5.tar.gz
378516b171712687aace4c7ea8b37c85895231d7a6d61e1e27362cf6034fded9 mesa-17.1.5.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100242">Bug 100242</a> - radeon buffer allocation failure during startup of Factorio</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101657">Bug 101657</a> - strtod.c:32:10: fatal error: xlocale.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101666">Bug 101666</a> - bitfieldExtract is marked as a built-in function on OpenGL ES 3.0, but was added in OpenGL ES 3.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101703">Bug 101703</a> - No stencil buffer allocated when requested by GLUT</li>
</ul>
<h2>Changes</h2>
<p>Aaron Watry (1):</p>
<ul>
<li>radeon/winsys: Limit max allocation size to 70% of VRAM</li>
</ul>
<p>Aleksander Morgado (2):</p>
<ul>
<li>etnaviv: fix refcnt initialization in etna_screen</li>
<li>etnaviv: don't dereference etna_resource pointer if allocation fails</li>
</ul>
<p>Alex Smith (2):</p>
<ul>
<li>ac/nir: Use correct LLVM intrinsics for atomic ops on imageBuffers</li>
<li>ac/nir: Fix ordering of parameters for image atomic cmpswap intrinsics</li>
</ul>
<p>Andres Gomez (3):</p>
<ul>
<li>docs: add sha256 checksums for 17.1.4</li>
<li>cherry-ignore: i965: Fix anisotropic filtering for mag filter</li>
<li>Update version to 17.1.5</li>
</ul>
<p>Anuj Phogat (2):</p>
<ul>
<li>intel/isl: Use uint64_t to store total surface size</li>
<li>intel/isl: Add the maximum surface size limit</li>
</ul>
<p>Brian Paul (3):</p>
<ul>
<li>draw: check for line_width != 1.0f in validate_pipeline()</li>
<li>svga: clamp device line width to at least 1 to fix HWv8 line stippling</li>
<li>svga: fix PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE value</li>
</ul>
<p>Bruce Cherniak (1):</p>
<ul>
<li>swr: Limit memory held by defer deleted resources.</li>
</ul>
<p>Chandu Babu N (1):</p>
<ul>
<li>st/va: Fix leak in VAAPI subpictures</li>
</ul>
<p>Charmaine Lee (1):</p>
<ul>
<li>svga: fixed surface size to include array size</li>
</ul>
<p>Connor Abbott (2):</p>
<ul>
<li>spirv: fix OpBitcast when the src and dst bitsize are different (v3)</li>
<li>ac/nir: implement 64-bit packing and unpacking</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>glsl: gl_Max{Vertex,Fragment}UniformComponents exist in all desktop GL versions</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>glsl: check if any of the named builtins are available first</li>
</ul>
<p>James Legg (2):</p>
<ul>
<li>ac/nir: Make intrinsic_name buffer long enough</li>
<li>spirv: Fix reaching unreachable for compare exchange on images</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>nir/spirv: Use the type from the deref for atomics</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>glsl: do not call link_xfb_stride_layout_qualifiers() for fragment shaders</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>i965: Use true AA line distance on G45/Ironlake.</li>
<li>i965: Always set AALINEDISTANCE_TRUE on Sandybridge.</li>
</ul>
<p>Lucas Stach (1):</p>
<ul>
<li>etnaviv: fix shader miscompilation with more than 16 labels</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>gallium/radeon: fix a possible crash for buffer exports</li>
</ul>
<p>Neha Bhende (1):</p>
<ul>
<li>svga: loop over box.depth for ReadBack_image on each slice</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>winsys/radeon: only call pb_slabs_reclaim when slabs are actually used</li>
</ul>
<p>Olivier Lauffenburger (1):</p>
<ul>
<li>st/wgl: improve selection of pixel format</li>
</ul>
<p>Philipp Zabel (1):</p>
<ul>
<li>st/mesa: release EGLImage on EGLImageTarget* error</li>
</ul>
<p>Plamena Manolova (1):</p>
<ul>
<li>mesa/main: Move NULL pointer check.</li>
</ul>
<p>Tim Rowley (2):</p>
<ul>
<li>swr/rast: _mm*_undefined_* implementations for gcc&lt;4.9</li>
<li>swr/rast: Correctly allocate SWR_STATS memory as cacheline aligned</li>
</ul>
<p>Tomasz Figa (1):</p>
<ul>
<li>intel: common: Fix link failure with standalone Android build</li>
</ul>
<p>Vinson Lee (1):</p>
<ul>
<li>scons: Check for xlocale.h before defining HAVE_XLOCALE_H.</li>
</ul>
</div>
</body>
</html>

View File

@@ -46,6 +46,8 @@ Note: some of the new features are only available with certain drivers.
<ul>
<li>GL_ARB_bindless_texture on radeonsi</li>
<li>GL_ARB_post_depth_coverage on nvc0 (GM200+)</li>
<li>GL_ARB_shader_ballot on i965/gen8+</li>
<li>GL_ARB_shader_group_vote on i965 (with a no-op vec4 implementation)</li>
<li>GL_ARB_shader_viewport_layer_array on nvc0 (GM200+)</li>
<li>GL_AMD_vertex_shader_layer on nvc0 (GM200+)</li>
<li>GL_AMD_vertex_shader_viewport_index on nvc0 (GM200+)</li>

View File

@@ -1049,6 +1049,12 @@ struct __DRIdri2LoaderExtensionRec {
*/
#define __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS 0x00000004
/**
* \requires __DRI2_NO_ERROR.
*
*/
#define __DRI_CTX_FLAG_NO_ERROR 0x00000008
/**
* \name Context reset strategies.
*/
@@ -1611,6 +1617,19 @@ struct __DRIrobustnessExtensionRec {
__DRIextension base;
};
/**
* No-error context driver extension.
*
* Existence of this extension means the driver can accept the
* __DRI_CTX_FLAG_NO_ERROR flag.
*/
#define __DRI2_NO_ERROR "DRI_NoError"
#define __DRI2_NO_ERROR_VERSION 1
typedef struct __DRInoErrorExtensionRec {
__DRIextension base;
} __DRInoErrorExtension;
/**
* DRI config options extension.
*

View File

@@ -50,9 +50,22 @@ extern "C" {
#ifndef GL_VERSION_ES_CM_1_0
#define GL_VERSION_ES_CM_1_0 1
/*
* XXX: Temporary fix; needs to be reverted as part of the next
* header update.
* For more details:
* https://github.com/KhronosGroup/OpenGL-Registry/pull/76
* https://lists.freedesktop.org/archives/mesa-dev/2017-June/161647.html
*/
#include <KHR/khrplatform.h>
typedef khronos_int8_t GLbyte;
typedef khronos_float_t GLclampf;
typedef short GLshort;
typedef unsigned short GLushort;
typedef void GLvoid;
typedef unsigned int GLenum;
#include <KHR/khrplatform.h>
typedef khronos_float_t GLfloat;
typedef khronos_int32_t GLfixed;
typedef unsigned int GLuint;

View File

@@ -104,7 +104,6 @@ GL_API void GL_APIENTRY glBlendEquationOES (GLenum mode);
#ifndef GL_OES_byte_coordinates
#define GL_OES_byte_coordinates 1
typedef khronos_int8_t GLbyte;
#endif /* GL_OES_byte_coordinates */
#ifndef GL_OES_compressed_ETC1_RGB8_sub_texture
@@ -128,7 +127,6 @@ typedef khronos_int8_t GLbyte;
#ifndef GL_OES_draw_texture
#define GL_OES_draw_texture 1
typedef short GLshort;
#define GL_TEXTURE_CROP_RECT_OES 0x8B9D
typedef void (GL_APIENTRYP PFNGLDRAWTEXSOESPROC) (GLshort x, GLshort y, GLshort z, GLshort width, GLshort height);
typedef void (GL_APIENTRYP PFNGLDRAWTEXIOESPROC) (GLint x, GLint y, GLint z, GLint width, GLint height);
@@ -409,7 +407,6 @@ GL_API GLbitfield GL_APIENTRY glQueryMatrixxOES (GLfixed *mantissa, GLint *expon
#ifndef GL_OES_single_precision
#define GL_OES_single_precision 1
typedef khronos_float_t GLclampf;
typedef void (GL_APIENTRYP PFNGLCLEARDEPTHFOESPROC) (GLclampf depth);
typedef void (GL_APIENTRYP PFNGLCLIPPLANEFOESPROC) (GLenum plane, const GLfloat *equation);
typedef void (GL_APIENTRYP PFNGLDEPTHRANGEFOESPROC) (GLclampf n, GLclampf f);

318
include/drm-uapi/vc4_drm.h Normal file
View File

@@ -0,0 +1,318 @@
/*
* Copyright © 2014-2015 Broadcom
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#ifndef _VC4_DRM_H_
#define _VC4_DRM_H_
#include "drm.h"
#if defined(__cplusplus)
extern "C" {
#endif
#define DRM_VC4_SUBMIT_CL 0x00
#define DRM_VC4_WAIT_SEQNO 0x01
#define DRM_VC4_WAIT_BO 0x02
#define DRM_VC4_CREATE_BO 0x03
#define DRM_VC4_MMAP_BO 0x04
#define DRM_VC4_CREATE_SHADER_BO 0x05
#define DRM_VC4_GET_HANG_STATE 0x06
#define DRM_VC4_GET_PARAM 0x07
#define DRM_VC4_SET_TILING 0x08
#define DRM_VC4_GET_TILING 0x09
#define DRM_IOCTL_VC4_SUBMIT_CL DRM_IOWR(DRM_COMMAND_BASE + DRM_VC4_SUBMIT_CL, struct drm_vc4_submit_cl)
#define DRM_IOCTL_VC4_WAIT_SEQNO DRM_IOWR(DRM_COMMAND_BASE + DRM_VC4_WAIT_SEQNO, struct drm_vc4_wait_seqno)
#define DRM_IOCTL_VC4_WAIT_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_VC4_WAIT_BO, struct drm_vc4_wait_bo)
#define DRM_IOCTL_VC4_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_VC4_CREATE_BO, struct drm_vc4_create_bo)
#define DRM_IOCTL_VC4_MMAP_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_VC4_MMAP_BO, struct drm_vc4_mmap_bo)
#define DRM_IOCTL_VC4_CREATE_SHADER_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_VC4_CREATE_SHADER_BO, struct drm_vc4_create_shader_bo)
#define DRM_IOCTL_VC4_GET_HANG_STATE DRM_IOWR(DRM_COMMAND_BASE + DRM_VC4_GET_HANG_STATE, struct drm_vc4_get_hang_state)
#define DRM_IOCTL_VC4_GET_PARAM DRM_IOWR(DRM_COMMAND_BASE + DRM_VC4_GET_PARAM, struct drm_vc4_get_param)
#define DRM_IOCTL_VC4_SET_TILING DRM_IOWR(DRM_COMMAND_BASE + DRM_VC4_SET_TILING, struct drm_vc4_set_tiling)
#define DRM_IOCTL_VC4_GET_TILING DRM_IOWR(DRM_COMMAND_BASE + DRM_VC4_GET_TILING, struct drm_vc4_get_tiling)
struct drm_vc4_submit_rcl_surface {
__u32 hindex; /* Handle index, or ~0 if not present. */
__u32 offset; /* Offset to start of buffer. */
/*
* Bits for either render config (color_write) or load/store packet.
* Bits should all be 0 for MSAA load/stores.
*/
__u16 bits;
#define VC4_SUBMIT_RCL_SURFACE_READ_IS_FULL_RES (1 << 0)
__u16 flags;
};
/**
* struct drm_vc4_submit_cl - ioctl argument for submitting commands to the 3D
* engine.
*
* Drivers typically use GPU BOs to store batchbuffers / command lists and
* their associated state. However, because the VC4 lacks an MMU, we have to
* do validation of memory accesses by the GPU commands. If we were to store
* our commands in BOs, we'd need to do uncached readback from them to do the
* validation process, which is too expensive. Instead, userspace accumulates
* commands and associated state in plain memory, then the kernel copies the
* data to its own address space, and then validates and stores it in a GPU
* BO.
*/
struct drm_vc4_submit_cl {
/* Pointer to the binner command list.
*
* This is the first set of commands executed, which runs the
* coordinate shader to determine where primitives land on the screen,
* then writes out the state updates and draw calls necessary per tile
* to the tile allocation BO.
*/
__u64 bin_cl;
/* Pointer to the shader records.
*
* Shader records are the structures read by the hardware that contain
* pointers to uniforms, shaders, and vertex attributes. The
* reference to the shader record has enough information to determine
* how many pointers are necessary (fixed number for shaders/uniforms,
* and an attribute count), so those BO indices into bo_handles are
* just stored as __u32s before each shader record passed in.
*/
__u64 shader_rec;
/* Pointer to uniform data and texture handles for the textures
* referenced by the shader.
*
* For each shader state record, there is a set of uniform data in the
* order referenced by the record (FS, VS, then CS). Each set of
* uniform data has a __u32 index into bo_handles per texture
* sample operation, in the order the QPU_W_TMUn_S writes appear in
* the program. Following the texture BO handle indices is the actual
* uniform data.
*
* The individual uniform state blocks don't have sizes passed in,
* because the kernel has to determine the sizes anyway during shader
* code validation.
*/
__u64 uniforms;
__u64 bo_handles;
/* Size in bytes of the binner command list. */
__u32 bin_cl_size;
/* Size in bytes of the set of shader records. */
__u32 shader_rec_size;
/* Number of shader records.
*
* This could just be computed from the contents of shader_records and
* the address bits of references to them from the bin CL, but it
* keeps the kernel from having to resize some allocations it makes.
*/
__u32 shader_rec_count;
/* Size in bytes of the uniform state. */
__u32 uniforms_size;
/* Number of BO handles passed in (size is that times 4). */
__u32 bo_handle_count;
/* RCL setup: */
__u16 width;
__u16 height;
__u8 min_x_tile;
__u8 min_y_tile;
__u8 max_x_tile;
__u8 max_y_tile;
struct drm_vc4_submit_rcl_surface color_read;
struct drm_vc4_submit_rcl_surface color_write;
struct drm_vc4_submit_rcl_surface zs_read;
struct drm_vc4_submit_rcl_surface zs_write;
struct drm_vc4_submit_rcl_surface msaa_color_write;
struct drm_vc4_submit_rcl_surface msaa_zs_write;
__u32 clear_color[2];
__u32 clear_z;
__u8 clear_s;
__u32 pad:24;
#define VC4_SUBMIT_CL_USE_CLEAR_COLOR (1 << 0)
__u32 flags;
/* Returned value of the seqno of this render job (for the
* wait ioctl).
*/
__u64 seqno;
};
/**
* struct drm_vc4_wait_seqno - ioctl argument for waiting for
* DRM_VC4_SUBMIT_CL completion using its returned seqno.
*
* timeout_ns is the timeout in nanoseconds, where "0" means "don't
* block, just return the status."
*/
struct drm_vc4_wait_seqno {
__u64 seqno;
__u64 timeout_ns;
};
/**
* struct drm_vc4_wait_bo - ioctl argument for waiting for
* completion of the last DRM_VC4_SUBMIT_CL on a BO.
*
* This is useful for cases where multiple processes might be
* rendering to a BO and you want to wait for all rendering to be
* completed.
*/
struct drm_vc4_wait_bo {
__u32 handle;
__u32 pad;
__u64 timeout_ns;
};
/**
* struct drm_vc4_create_bo - ioctl argument for creating VC4 BOs.
*
* There are currently no values for the flags argument, but it may be
* used in a future extension.
*/
struct drm_vc4_create_bo {
__u32 size;
__u32 flags;
/** Returned GEM handle for the BO. */
__u32 handle;
__u32 pad;
};
/**
* struct drm_vc4_mmap_bo - ioctl argument for mapping VC4 BOs.
*
* This doesn't actually perform an mmap. Instead, it returns the
* offset you need to use in an mmap on the DRM device node. This
* means that tools like valgrind end up knowing about the mapped
* memory.
*
* There are currently no values for the flags argument, but it may be
* used in a future extension.
*/
struct drm_vc4_mmap_bo {
/** Handle for the object being mapped. */
__u32 handle;
__u32 flags;
/** offset into the drm node to use for subsequent mmap call. */
__u64 offset;
};
/**
* struct drm_vc4_create_shader_bo - ioctl argument for creating VC4
* shader BOs.
*
* Since allowing a shader to be overwritten while it's also being
* executed from would allow privlege escalation, shaders must be
* created using this ioctl, and they can't be mmapped later.
*/
struct drm_vc4_create_shader_bo {
/* Size of the data argument. */
__u32 size;
/* Flags, currently must be 0. */
__u32 flags;
/* Pointer to the data. */
__u64 data;
/** Returned GEM handle for the BO. */
__u32 handle;
/* Pad, must be 0. */
__u32 pad;
};
struct drm_vc4_get_hang_state_bo {
__u32 handle;
__u32 paddr;
__u32 size;
__u32 pad;
};
/**
* struct drm_vc4_hang_state - ioctl argument for collecting state
* from a GPU hang for analysis.
*/
struct drm_vc4_get_hang_state {
/** Pointer to array of struct drm_vc4_get_hang_state_bo. */
__u64 bo;
/**
* On input, the size of the bo array. Output is the number
* of bos to be returned.
*/
__u32 bo_count;
__u32 start_bin, start_render;
__u32 ct0ca, ct0ea;
__u32 ct1ca, ct1ea;
__u32 ct0cs, ct1cs;
__u32 ct0ra0, ct1ra0;
__u32 bpca, bpcs;
__u32 bpoa, bpos;
__u32 vpmbase;
__u32 dbge;
__u32 fdbgo;
__u32 fdbgb;
__u32 fdbgr;
__u32 fdbgs;
__u32 errstat;
/* Pad that we may save more registers into in the future. */
__u32 pad[16];
};
#define DRM_VC4_PARAM_V3D_IDENT0 0
#define DRM_VC4_PARAM_V3D_IDENT1 1
#define DRM_VC4_PARAM_V3D_IDENT2 2
#define DRM_VC4_PARAM_SUPPORTS_BRANCHES 3
#define DRM_VC4_PARAM_SUPPORTS_ETC1 4
#define DRM_VC4_PARAM_SUPPORTS_THREADED_FS 5
struct drm_vc4_get_param {
__u32 param;
__u32 pad;
__u64 value;
};
struct drm_vc4_get_tiling {
__u32 handle;
__u32 flags;
__u64 modifier;
};
struct drm_vc4_set_tiling {
__u32 handle;
__u32 flags;
__u64 modifier;
};
#if defined(__cplusplus)
}
#endif
#endif /* _VC4_DRM_H_ */

File diff suppressed because it is too large Load Diff

View File

@@ -145,6 +145,17 @@ def check_cc(env, cc, expr, cpp_opt = '-E'):
sys.stdout.write(' %s\n' % ['no', 'yes'][int(bool(result))])
return result
def check_header(env, header):
'''Check if the header exist'''
conf = SCons.Script.Configure(env)
have_header = False
if conf.CheckHeader(header):
have_header = True
env = conf.Finish()
return have_header
def check_prog(env, prog):
"""Check whether this program exists."""
@@ -325,10 +336,8 @@ def generate(env):
'GLX_INDIRECT_RENDERING',
]
conf = SCons.Script.Configure(env)
if conf.CheckHeader('xlocale.h'):
if check_header(env, 'xlocale.h'):
cppdefines += ['HAVE_XLOCALE_H']
env = conf.Finish()
if platform == 'windows':
cppdefines += [

View File

@@ -1054,7 +1054,7 @@ ADDR_E_RETURNCODE ADDR_API AddrComputePrtInfo(
*/
ADDR_E_RETURNCODE ADDR_API AddrGetMaxAlignments(
ADDR_HANDLE hLib, ///< address lib handle
ADDR_GET_MAX_ALINGMENTS_OUTPUT* pOut) ///< [out] output structure
ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut) ///< [out] output structure
{
Addr::Lib* pLib = Lib::GetLib(hLib);

View File

@@ -2295,17 +2295,17 @@ ADDR_E_RETURNCODE ADDR_API AddrComputeDccInfo(
/**
****************************************************************************************************
* ADDR_GET_MAX_ALINGMENTS_OUTPUT
* ADDR_GET_MAX_ALIGNMENTS_OUTPUT
*
* @brief
* Output structure of AddrGetMaxAlignments
****************************************************************************************************
*/
typedef struct _ADDR_GET_MAX_ALINGMENTS_OUTPUT
typedef struct _ADDR_GET_MAX_ALIGNMENTS_OUTPUT
{
UINT_32 size; ///< Size of this structure in bytes
UINT_64 baseAlign; ///< Maximum base alignment in bytes
} ADDR_GET_MAX_ALINGMENTS_OUTPUT;
} ADDR_GET_MAX_ALIGNMENTS_OUTPUT;
/**
****************************************************************************************************
@@ -2317,7 +2317,7 @@ typedef struct _ADDR_GET_MAX_ALINGMENTS_OUTPUT
*/
ADDR_E_RETURNCODE ADDR_API AddrGetMaxAlignments(
ADDR_HANDLE hLib,
ADDR_GET_MAX_ALINGMENTS_OUTPUT* pOut);
ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut);

View File

@@ -356,14 +356,14 @@ Lib* Lib::GetLib(
****************************************************************************************************
*/
ADDR_E_RETURNCODE Lib::GetMaxAlignments(
ADDR_GET_MAX_ALINGMENTS_OUTPUT* pOut ///< [out] output structure
ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut ///< [out] output structure
) const
{
ADDR_E_RETURNCODE returnCode = ADDR_OK;
if (GetFillSizeFieldsFlags() == TRUE)
{
if (pOut->size != sizeof(ADDR_GET_MAX_ALINGMENTS_OUTPUT))
if (pOut->size != sizeof(ADDR_GET_MAX_ALIGNMENTS_OUTPUT))
{
returnCode = ADDR_PARAMSIZEMISMATCH;
}

View File

@@ -169,14 +169,14 @@ public:
BOOL_32 GetExportNorm(const ELEM_GETEXPORTNORM_INPUT* pIn) const;
ADDR_E_RETURNCODE GetMaxAlignments(ADDR_GET_MAX_ALINGMENTS_OUTPUT* pOut) const;
ADDR_E_RETURNCODE GetMaxAlignments(ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut) const;
protected:
Lib(); // Constructor is protected
Lib(const Client* pClient);
/// Pure virtual function to get max alignments
virtual ADDR_E_RETURNCODE HwlGetMaxAlignments(ADDR_GET_MAX_ALINGMENTS_OUTPUT* pOut) const = 0;
virtual ADDR_E_RETURNCODE HwlGetMaxAlignments(ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut) const = 0;
//
// Initialization

View File

@@ -663,7 +663,7 @@ ADDR_E_RETURNCODE Gfx9Lib::HwlComputeDccInfo(
************************************************************************************************************************
*/
ADDR_E_RETURNCODE Gfx9Lib::HwlGetMaxAlignments(
ADDR_GET_MAX_ALINGMENTS_OUTPUT* pOut ///< [out] output structure
ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut ///< [out] output structure
) const
{
pOut->baseAlign = HwlComputeSurfaceBaseAlign(ADDR_SW_64KB);

View File

@@ -374,7 +374,7 @@ protected:
private:
virtual ADDR_E_RETURNCODE HwlGetMaxAlignments(
ADDR_GET_MAX_ALINGMENTS_OUTPUT* pOut) const;
ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut) const;
virtual BOOL_32 HwlInitGlobalParams(
const ADDR_CREATE_INPUT* pCreateIn);

View File

@@ -2177,7 +2177,7 @@ VOID CiLib::HwlPadDimensions(
****************************************************************************************************
*/
ADDR_E_RETURNCODE CiLib::HwlGetMaxAlignments(
ADDR_GET_MAX_ALINGMENTS_OUTPUT* pOut ///< [out] output structure
ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut ///< [out] output structure
) const
{
const UINT_32 pipes = HwlGetPipes(&m_tileTable[0].info);

View File

@@ -168,7 +168,7 @@ protected:
const ADDR_COMPUTE_HTILE_ADDRFROMCOORD_INPUT* pIn,
ADDR_COMPUTE_HTILE_ADDRFROMCOORD_OUTPUT* pOut) const;
virtual ADDR_E_RETURNCODE HwlGetMaxAlignments(ADDR_GET_MAX_ALINGMENTS_OUTPUT* pOut) const;
virtual ADDR_E_RETURNCODE HwlGetMaxAlignments(ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut) const;
virtual VOID HwlPadDimensions(
AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags,

View File

@@ -3483,7 +3483,7 @@ VOID SiLib::HwlSelectTileMode(
****************************************************************************************************
*/
ADDR_E_RETURNCODE SiLib::HwlGetMaxAlignments(
ADDR_GET_MAX_ALINGMENTS_OUTPUT* pOut ///< [out] output structure
ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut ///< [out] output structure
) const
{
const UINT_32 pipes = HwlGetPipes(&m_tileTable[0].info);

View File

@@ -245,7 +245,7 @@ protected:
return TRUE;
}
virtual ADDR_E_RETURNCODE HwlGetMaxAlignments(ADDR_GET_MAX_ALINGMENTS_OUTPUT* pOut) const;
virtual ADDR_E_RETURNCODE HwlGetMaxAlignments(ADDR_GET_MAX_ALIGNMENTS_OUTPUT* pOut) const;
virtual VOID HwlComputeSurfaceAlignmentsMacroTiled(
AddrTileMode tileMode, UINT_32 bpp, ADDR_SURFACE_FLAGS flags,

View File

@@ -84,6 +84,14 @@ static unsigned cik_get_num_tile_pipes(struct amdgpu_gpu_info *info)
}
}
static bool has_syncobj(int fd)
{
uint64_t value;
if (drmGetCap(fd, DRM_CAP_SYNCOBJ, &value))
return false;
return value ? true : false;
}
bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
struct radeon_info *info,
struct amdgpu_gpu_info *amdinfo)
@@ -258,8 +266,13 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
info->vce_fw_version =
vce.available_rings ? vce_version : 0;
info->has_userptr = true;
info->has_syncobj = has_syncobj(fd);
info->num_render_backends = amdinfo->rb_pipes;
info->clock_crystal_freq = amdinfo->gpu_counter_freq;
if (!info->clock_crystal_freq) {
fprintf(stderr, "amdgpu: clock crystal frequency is 0, timestamps will be wrong\n");
info->clock_crystal_freq = 1;
}
info->tcc_cache_line_size = 64; /* TC L2 line size on GCN */
if (info->chip_class == GFX9) {
info->num_tile_pipes = 1 << G_0098F8_NUM_PIPES(amdinfo->gb_addr_cfg);

View File

@@ -76,6 +76,7 @@ struct radeon_info {
uint32_t drm_minor;
uint32_t drm_patchlevel;
bool has_userptr;
bool has_syncobj;
/* Shader cores. */
uint32_t r600_max_quad_pipes; /* wave size / 16 */

View File

@@ -795,21 +795,21 @@ ac_build_ddxy(struct ac_llvm_context *ctx,
bool has_ds_bpermute,
uint32_t mask,
int idx,
LLVMValueRef lds,
LLVMValueRef val)
{
LLVMValueRef thread_id, tl, trbl, tl_tid, trbl_tid, args[2];
LLVMValueRef tl, trbl, args[2];
LLVMValueRef result;
thread_id = ac_get_thread_id(ctx);
tl_tid = LLVMBuildAnd(ctx->builder, thread_id,
LLVMConstInt(ctx->i32, mask, false), "");
trbl_tid = LLVMBuildAdd(ctx->builder, tl_tid,
LLVMConstInt(ctx->i32, idx, false), "");
if (has_ds_bpermute) {
LLVMValueRef thread_id, tl_tid, trbl_tid;
thread_id = ac_get_thread_id(ctx);
tl_tid = LLVMBuildAnd(ctx->builder, thread_id,
LLVMConstInt(ctx->i32, mask, false), "");
trbl_tid = LLVMBuildAdd(ctx->builder, tl_tid,
LLVMConstInt(ctx->i32, idx, false), "");
args[0] = LLVMBuildMul(ctx->builder, tl_tid,
LLVMConstInt(ctx->i32, 4, false), "");
args[1] = val;
@@ -827,15 +827,42 @@ ac_build_ddxy(struct ac_llvm_context *ctx,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_CONVERGENT);
} else {
LLVMValueRef store_ptr, load_ptr0, load_ptr1;
uint32_t masks[2];
store_ptr = ac_build_gep0(ctx, lds, thread_id);
load_ptr0 = ac_build_gep0(ctx, lds, tl_tid);
load_ptr1 = ac_build_gep0(ctx, lds, trbl_tid);
switch (mask) {
case AC_TID_MASK_TOP_LEFT:
masks[0] = 0x8000;
if (idx == 1)
masks[1] = 0x8055;
else
masks[1] = 0x80aa;
LLVMBuildStore(ctx->builder, val, store_ptr);
tl = LLVMBuildLoad(ctx->builder, load_ptr0, "");
trbl = LLVMBuildLoad(ctx->builder, load_ptr1, "");
break;
case AC_TID_MASK_TOP:
masks[0] = 0x8044;
masks[1] = 0x80ee;
break;
case AC_TID_MASK_LEFT:
masks[0] = 0x80a0;
masks[1] = 0x80f5;
break;
}
args[0] = val;
args[1] = LLVMConstInt(ctx->i32, masks[0], false);
tl = ac_build_intrinsic(ctx,
"llvm.amdgcn.ds.swizzle", ctx->i32,
args, 2,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_CONVERGENT);
args[1] = LLVMConstInt(ctx->i32, masks[1], false);
trbl = ac_build_intrinsic(ctx,
"llvm.amdgcn.ds.swizzle", ctx->i32,
args, 2,
AC_FUNC_ATTR_READNONE |
AC_FUNC_ATTR_CONVERGENT);
}
tl = LLVMBuildBitCast(ctx->builder, tl, ctx->f32, "");

View File

@@ -173,7 +173,6 @@ ac_build_ddxy(struct ac_llvm_context *ctx,
bool has_ds_bpermute,
uint32_t mask,
int idx,
LLVMValueRef lds,
LLVMValueRef val);
#define AC_SENDMSG_GS 2

View File

@@ -40,21 +40,23 @@ static void ac_init_llvm_target()
LLVMInitializeAMDGPUTargetMC();
LLVMInitializeAMDGPUAsmPrinter();
/*
* Workaround for bug in llvm 4.0 that causes image intrinsics
/* For inline assembly. */
LLVMInitializeAMDGPUAsmParser();
/* Workaround for bug in llvm 4.0 that causes image intrinsics
* to disappear.
* https://reviews.llvm.org/D26348
*/
#if HAVE_LLVM >= 0x0400
const char *argv[2] = {"mesa", "-simplifycfg-sink-common=false"};
LLVMParseCommandLineOptions(2, argv, NULL);
#endif
if (HAVE_LLVM >= 0x0400) {
/* "mesa" is the prefix for error messages */
const char *argv[2] = { "mesa", "-simplifycfg-sink-common=false" };
LLVMParseCommandLineOptions(2, argv, NULL);
}
}
static once_flag ac_init_llvm_target_once_flag = ONCE_FLAG_INIT;
static LLVMTargetRef ac_get_llvm_target(const char *triple)
LLVMTargetRef ac_get_llvm_target(const char *triple)
{
LLVMTargetRef target = NULL;
char *err_message = NULL;

View File

@@ -60,6 +60,7 @@ enum ac_target_machine_options {
};
LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family, enum ac_target_machine_options tm_options);
LLVMTargetRef ac_get_llvm_target(const char *triple);
void ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes);
bool ac_is_sgpr_param(LLVMValueRef param);
void ac_add_function_attr(LLVMContextRef ctx, LLVMValueRef function,

View File

@@ -65,6 +65,7 @@ struct nir_to_llvm_context {
struct hash_table *defs;
struct hash_table *phis;
struct hash_table *vars;
LLVMValueRef descriptor_sets[AC_UD_MAX_SETS];
LLVMValueRef ring_offsets;
@@ -154,7 +155,6 @@ struct nir_to_llvm_context {
LLVMValueRef inputs[RADEON_LLVM_MAX_INPUTS * 4];
LLVMValueRef outputs[RADEON_LLVM_MAX_OUTPUTS * 4];
LLVMValueRef shared_memory;
uint64_t input_mask;
uint64_t output_mask;
int num_locals;
@@ -387,23 +387,6 @@ static LLVMTypeRef const_array(LLVMTypeRef elem_type, int num_elements)
CONST_ADDR_SPACE);
}
static LLVMValueRef get_shared_memory_ptr(struct nir_to_llvm_context *ctx,
int idx,
LLVMTypeRef type)
{
LLVMValueRef offset;
LLVMValueRef ptr;
int addr_space;
offset = LLVMConstInt(ctx->i32, idx * 16, false);
ptr = ctx->shared_memory;
ptr = LLVMBuildGEP(ctx->builder, ptr, &offset, 1, "");
addr_space = LLVMGetPointerAddressSpace(LLVMTypeOf(ptr));
ptr = LLVMBuildBitCast(ctx->builder, ptr, LLVMPointerType(type, addr_space), "");
return ptr;
}
static LLVMTypeRef to_integer_type_scalar(struct ac_llvm_context *ctx, LLVMTypeRef t)
{
if (t == ctx->f16 || t == ctx->i16)
@@ -1195,7 +1178,17 @@ static LLVMValueRef emit_find_lsb(struct ac_llvm_context *ctx,
*/
LLVMConstInt(ctx->i1, 1, false),
};
return ac_build_intrinsic(ctx, "llvm.cttz.i32", ctx->i32, params, 2, AC_FUNC_ATTR_READNONE);
LLVMValueRef lsb = ac_build_intrinsic(ctx, "llvm.cttz.i32", ctx->i32,
params, 2,
AC_FUNC_ATTR_READNONE);
/* TODO: We need an intrinsic to skip this conditional. */
/* Check for zero: */
return LLVMBuildSelect(ctx->builder, LLVMBuildICmp(ctx->builder,
LLVMIntEQ, src0,
ctx->i32_0, ""),
LLVMConstInt(ctx->i32, -1, 0), lsb, "");
}
static LLVMValueRef emit_ifind_msb(struct ac_llvm_context *ctx,
@@ -1460,11 +1453,6 @@ static LLVMValueRef emit_ddxy(struct nir_to_llvm_context *ctx,
int idx;
LLVMValueRef result;
if (!ctx->lds && !ctx->has_ds_bpermute)
ctx->lds = LLVMAddGlobalInAddressSpace(ctx->module,
LLVMArrayType(ctx->i32, 64),
"ddxy_lds", LOCAL_ADDR_SPACE);
if (op == nir_op_fddx_fine || op == nir_op_fddx)
mask = AC_TID_MASK_LEFT;
else if (op == nir_op_fddy_fine || op == nir_op_fddy)
@@ -1481,7 +1469,7 @@ static LLVMValueRef emit_ddxy(struct nir_to_llvm_context *ctx,
idx = 2;
result = ac_build_ddxy(&ctx->ac, ctx->has_ds_bpermute,
mask, idx, ctx->lds,
mask, idx,
src0);
return result;
}
@@ -2905,6 +2893,45 @@ load_gs_input(struct nir_to_llvm_context *ctx,
return result;
}
static LLVMValueRef
build_gep_for_deref(struct nir_to_llvm_context *ctx,
nir_deref_var *deref)
{
struct hash_entry *entry = _mesa_hash_table_search(ctx->vars, deref->var);
assert(entry->data);
LLVMValueRef val = entry->data;
nir_deref *tail = deref->deref.child;
while (tail != NULL) {
LLVMValueRef offset;
switch (tail->deref_type) {
case nir_deref_type_array: {
nir_deref_array *array = nir_deref_as_array(tail);
offset = LLVMConstInt(ctx->i32, array->base_offset, 0);
if (array->deref_array_type ==
nir_deref_array_type_indirect) {
offset = LLVMBuildAdd(ctx->builder, offset,
get_src(ctx,
array->indirect),
"");
}
break;
}
case nir_deref_type_struct: {
nir_deref_struct *deref_struct =
nir_deref_as_struct(tail);
offset = LLVMConstInt(ctx->i32,
deref_struct->index, 0);
break;
}
default:
unreachable("bad deref type");
}
val = ac_build_gep0(&ctx->ac, val, offset);
tail = tail->child;
}
return val;
}
static LLVMValueRef visit_load_var(struct nir_to_llvm_context *ctx,
nir_intrinsic_instr *instr)
{
@@ -2966,6 +2993,14 @@ static LLVMValueRef visit_load_var(struct nir_to_llvm_context *ctx,
}
}
break;
case nir_var_shared: {
LLVMValueRef address = build_gep_for_deref(ctx,
instr->variables[0]);
LLVMValueRef val = LLVMBuildLoad(ctx->builder, address, "");
return LLVMBuildBitCast(ctx->builder, val,
get_def_type(ctx, &instr->dest.ssa),
"");
}
case nir_var_shader_out:
if (ctx->stage == MESA_SHADER_TESS_CTRL)
return load_tcs_output(ctx, instr);
@@ -2988,23 +3023,6 @@ static LLVMValueRef visit_load_var(struct nir_to_llvm_context *ctx,
}
}
break;
case nir_var_shared: {
LLVMValueRef ptr = get_shared_memory_ptr(ctx, idx, ctx->i32);
LLVMValueRef derived_ptr;
if (indir_index)
indir_index = LLVMBuildMul(ctx->builder, indir_index, LLVMConstInt(ctx->i32, 4, false), "");
for (unsigned chan = 0; chan < ve; chan++) {
LLVMValueRef index = LLVMConstInt(ctx->i32, chan, false);
if (indir_index)
index = LLVMBuildAdd(ctx->builder, index, indir_index, "");
derived_ptr = LLVMBuildGEP(ctx->builder, ptr, &index, 1, "");
values[chan] = LLVMBuildLoad(ctx->builder, derived_ptr, "");
}
break;
}
default:
unreachable("unhandle variable mode");
}
@@ -3105,24 +3123,32 @@ visit_store_var(struct nir_to_llvm_context *ctx,
}
break;
case nir_var_shared: {
LLVMValueRef ptr = get_shared_memory_ptr(ctx, idx, ctx->i32);
if (indir_index)
indir_index = LLVMBuildMul(ctx->builder, indir_index, LLVMConstInt(ctx->i32, 4, false), "");
for (unsigned chan = 0; chan < 8; chan++) {
if (!(writemask & (1 << chan)))
continue;
LLVMValueRef index = LLVMConstInt(ctx->i32, chan, false);
LLVMValueRef derived_ptr;
if (indir_index)
index = LLVMBuildAdd(ctx->builder, index, indir_index, "");
value = llvm_extract_elem(ctx, src, chan);
derived_ptr = LLVMBuildGEP(ctx->builder, ptr, &index, 1, "");
LLVMBuildStore(ctx->builder,
to_integer(&ctx->ac, value), derived_ptr);
int writemask = instr->const_index[0];
LLVMValueRef address = build_gep_for_deref(ctx,
instr->variables[0]);
LLVMValueRef val = get_src(ctx, instr->src[0]);
unsigned components =
glsl_get_vector_elements(
nir_deref_tail(&instr->variables[0]->deref)->type);
if (writemask == (1 << components) - 1) {
val = LLVMBuildBitCast(
ctx->builder, val,
LLVMGetElementType(LLVMTypeOf(address)), "");
LLVMBuildStore(ctx->builder, val, address);
} else {
for (unsigned chan = 0; chan < 4; chan++) {
if (!(writemask & (1 << chan)))
continue;
LLVMValueRef ptr =
LLVMBuildStructGEP(ctx->builder,
address, chan, "");
LLVMValueRef src = llvm_extract_elem(ctx, val,
chan);
src = LLVMBuildBitCast(
ctx->builder, src,
LLVMGetElementType(LLVMTypeOf(ptr)), "");
LLVMBuildStore(ctx->builder, src, ptr);
}
}
break;
}
@@ -3379,7 +3405,10 @@ static void visit_image_store(struct nir_to_llvm_context *ctx,
char intrinsic_name[64];
const nir_variable *var = instr->variables[0]->var;
const struct glsl_type *type = glsl_without_array(var->type);
LLVMValueRef glc = ctx->i1false;
bool force_glc = ctx->options->chip_class == SI;
if (force_glc)
glc = ctx->i1true;
if (ctx->stage == MESA_SHADER_FRAGMENT)
ctx->shader_info->fs.writes_memory = true;
@@ -3389,7 +3418,7 @@ static void visit_image_store(struct nir_to_llvm_context *ctx,
params[2] = LLVMBuildExtractElement(ctx->builder, get_src(ctx, instr->src[0]),
LLVMConstInt(ctx->i32, 0, false), ""); /* vindex */
params[3] = LLVMConstInt(ctx->i32, 0, false); /* voffset */
params[4] = ctx->i1false; /* glc */
params[4] = glc; /* glc */
params[5] = ctx->i1false; /* slc */
ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.buffer.store.format.v4f32", ctx->voidt,
params, 6, 0);
@@ -3397,7 +3426,6 @@ static void visit_image_store(struct nir_to_llvm_context *ctx,
bool is_da = glsl_sampler_type_is_array(type) ||
glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_CUBE;
LLVMValueRef da = is_da ? ctx->i1true : ctx->i1false;
LLVMValueRef glc = ctx->i1false;
LLVMValueRef slc = ctx->i1false;
params[0] = to_float(&ctx->ac, get_src(ctx, instr->src[2]));
@@ -3604,9 +3632,8 @@ static LLVMValueRef visit_var_atomic(struct nir_to_llvm_context *ctx,
const nir_intrinsic_instr *instr)
{
LLVMValueRef ptr, result;
int idx = instr->variables[0]->var->data.driver_location;
LLVMValueRef src = get_src(ctx, instr->src[0]);
ptr = get_shared_memory_ptr(ctx, idx, ctx->i32);
ptr = build_gep_for_deref(ctx, instr->variables[0]);
if (instr->intrinsic == nir_intrinsic_var_atomic_comp_swap) {
LLVMValueRef src1 = get_src(ctx, instr->src[1]);
@@ -5005,6 +5032,68 @@ handle_shader_output_decl(struct nir_to_llvm_context *ctx,
ctx->output_mask |= mask_attribs;
}
static LLVMTypeRef
glsl_base_to_llvm_type(struct nir_to_llvm_context *ctx,
enum glsl_base_type type)
{
switch (type) {
case GLSL_TYPE_INT:
case GLSL_TYPE_UINT:
case GLSL_TYPE_BOOL:
case GLSL_TYPE_SUBROUTINE:
return ctx->i32;
case GLSL_TYPE_FLOAT: /* TODO handle mediump */
return ctx->f32;
case GLSL_TYPE_INT64:
case GLSL_TYPE_UINT64:
return ctx->i64;
case GLSL_TYPE_DOUBLE:
return ctx->f64;
default:
unreachable("unknown GLSL type");
}
}
static LLVMTypeRef
glsl_to_llvm_type(struct nir_to_llvm_context *ctx,
const struct glsl_type *type)
{
if (glsl_type_is_scalar(type)) {
return glsl_base_to_llvm_type(ctx, glsl_get_base_type(type));
}
if (glsl_type_is_vector(type)) {
return LLVMVectorType(
glsl_base_to_llvm_type(ctx, glsl_get_base_type(type)),
glsl_get_vector_elements(type));
}
if (glsl_type_is_matrix(type)) {
return LLVMArrayType(
glsl_to_llvm_type(ctx, glsl_get_column_type(type)),
glsl_get_matrix_columns(type));
}
if (glsl_type_is_array(type)) {
return LLVMArrayType(
glsl_to_llvm_type(ctx, glsl_get_array_element(type)),
glsl_get_length(type));
}
assert(glsl_type_is_struct(type));
LLVMTypeRef member_types[glsl_get_length(type)];
for (unsigned i = 0; i < glsl_get_length(type); i++) {
member_types[i] =
glsl_to_llvm_type(ctx,
glsl_get_struct_field(type, i));
}
return LLVMStructTypeInContext(ctx->context, member_types,
glsl_get_length(type), false);
}
static void
setup_locals(struct nir_to_llvm_context *ctx,
struct nir_function *func)
@@ -5028,6 +5117,20 @@ setup_locals(struct nir_to_llvm_context *ctx,
}
}
static void
setup_shared(struct nir_to_llvm_context *ctx,
struct nir_shader *nir)
{
nir_foreach_variable(variable, &nir->shared) {
LLVMValueRef shared =
LLVMAddGlobalInAddressSpace(
ctx->module, glsl_to_llvm_type(ctx, variable->type),
variable->name ? variable->name : "",
LOCAL_ADDR_SPACE);
_mesa_hash_table_insert(ctx->vars, variable, shared);
}
}
static LLVMValueRef
emit_float_saturate(struct ac_llvm_context *ctx, LLVMValueRef v, float lo, float hi)
{
@@ -5082,6 +5185,7 @@ si_llvm_init_export_args(struct nir_to_llvm_context *ctx,
unsigned index = target - V_008DFC_SQ_EXP_MRT;
unsigned col_format = (ctx->options->key.fs.col_format >> (4 * index)) & 0xf;
bool is_int8 = (ctx->options->key.fs.is_int8 >> index) & 1;
bool is_int10 = (ctx->options->key.fs.is_int10 >> index) & 1;
switch(col_format) {
case V_028714_SPI_SHADER_ZERO:
@@ -5159,11 +5263,13 @@ si_llvm_init_export_args(struct nir_to_llvm_context *ctx,
break;
case V_028714_SPI_SHADER_UINT16_ABGR: {
LLVMValueRef max = LLVMConstInt(ctx->i32, is_int8 ? 255 : 65535, 0);
LLVMValueRef max_rgb = LLVMConstInt(ctx->i32,
is_int8 ? 255 : is_int10 ? 1023 : 65535, 0);
LLVMValueRef max_alpha = !is_int10 ? max_rgb : LLVMConstInt(ctx->i32, 3, 0);
for (unsigned chan = 0; chan < 4; chan++) {
val[chan] = to_integer(&ctx->ac, values[chan]);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntULT, val[chan], max);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntULT, val[chan], chan == 3 ? max_alpha : max_rgb);
}
args->compr = 1;
@@ -5173,14 +5279,18 @@ si_llvm_init_export_args(struct nir_to_llvm_context *ctx,
}
case V_028714_SPI_SHADER_SINT16_ABGR: {
LLVMValueRef max = LLVMConstInt(ctx->i32, is_int8 ? 127 : 32767, 0);
LLVMValueRef min = LLVMConstInt(ctx->i32, is_int8 ? -128 : -32768, 0);
LLVMValueRef max_rgb = LLVMConstInt(ctx->i32,
is_int8 ? 127 : is_int10 ? 511 : 32767, 0);
LLVMValueRef min_rgb = LLVMConstInt(ctx->i32,
is_int8 ? -128 : is_int10 ? -512 : -32768, 0);
LLVMValueRef max_alpha = !is_int10 ? max_rgb : ctx->i32one;
LLVMValueRef min_alpha = !is_int10 ? min_rgb : LLVMConstInt(ctx->i32, -2, 0);
/* Clamp. */
for (unsigned chan = 0; chan < 4; chan++) {
val[chan] = to_integer(&ctx->ac, values[chan]);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntSLT, val[chan], max);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntSGT, val[chan], min);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntSLT, val[chan], chan == 3 ? max_alpha : max_rgb);
val[chan] = emit_minmax_int(&ctx->ac, LLVMIntSGT, val[chan], chan == 3 ? min_alpha : min_rgb);
}
args->compr = 1;
@@ -5719,10 +5829,11 @@ si_export_mrt_z(struct nir_to_llvm_context *ctx,
args.enabled_channels |= 0x4;
}
/* SI (except OLAND) has a bug that it only looks
/* SI (except OLAND and HAINAN) has a bug that it only looks
* at the X writemask component. */
if (ctx->options->chip_class == SI &&
ctx->options->family != CHIP_OLAND)
ctx->options->family != CHIP_OLAND &&
ctx->options->family != CHIP_HAINAN)
args.enabled_channels |= 0x1;
ac_build_export(&ctx->ac, &args);
@@ -5820,15 +5931,6 @@ handle_shader_outputs_post(struct nir_to_llvm_context *ctx)
}
}
static void
handle_shared_compute_var(struct nir_to_llvm_context *ctx,
struct nir_variable *variable, uint32_t *offset, int idx)
{
unsigned size = glsl_count_attribute_slots(variable->type, false);
variable->data.driver_location = *offset;
*offset += size;
}
static void ac_llvm_finalize_module(struct nir_to_llvm_context * ctx)
{
LLVMPassManagerRef passmgr;
@@ -5985,29 +6087,7 @@ LLVMModuleRef ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
create_function(&ctx);
if (nir->stage == MESA_SHADER_COMPUTE) {
int num_shared = 0;
nir_foreach_variable(variable, &nir->shared)
num_shared++;
if (num_shared) {
int idx = 0;
uint32_t shared_size = 0;
LLVMValueRef var;
LLVMTypeRef i8p = LLVMPointerType(ctx.i8, LOCAL_ADDR_SPACE);
nir_foreach_variable(variable, &nir->shared) {
handle_shared_compute_var(&ctx, variable, &shared_size, idx);
idx++;
}
shared_size *= 16;
var = LLVMAddGlobalInAddressSpace(ctx.module,
LLVMArrayType(ctx.i8, shared_size),
"compute_lds",
LOCAL_ADDR_SPACE);
LLVMSetAlignment(var, 4);
ctx.shared_memory = LLVMBuildBitCast(ctx.builder, var, i8p, "");
}
} else if (nir->stage == MESA_SHADER_GEOMETRY) {
if (nir->stage == MESA_SHADER_GEOMETRY) {
ctx.gs_next_vertex = ac_build_alloca(&ctx, ctx.i32, "gs_next_vertex");
ctx.gs_max_out_vertices = nir->info.gs.vertices_out;
@@ -6033,11 +6113,16 @@ LLVMModuleRef ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
_mesa_key_pointer_equal);
ctx.phis = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
_mesa_key_pointer_equal);
ctx.vars = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
_mesa_key_pointer_equal);
func = (struct nir_function *)exec_list_get_head(&nir->functions);
setup_locals(&ctx, func);
if (nir->stage == MESA_SHADER_COMPUTE)
setup_shared(&ctx, nir);
visit_cf_list(&ctx, &func->impl->body);
phi_post_pass(&ctx);
@@ -6050,6 +6135,7 @@ LLVMModuleRef ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
free(ctx.locals);
ralloc_free(ctx.defs);
ralloc_free(ctx.phis);
ralloc_free(ctx.vars);
if (nir->stage == MESA_SHADER_GEOMETRY) {
unsigned addclip = ctx.num_output_clips + ctx.num_output_culls > 4;

View File

@@ -57,6 +57,7 @@ struct ac_tcs_variant_key {
struct ac_fs_variant_key {
uint32_t col_format;
uint32_t is_int8;
uint32_t is_int10;
};
union ac_shader_variant_key {

View File

@@ -157,7 +157,7 @@ ADDR_HANDLE amdgpu_addr_create(const struct radeon_info *info,
ADDR_CREATE_OUTPUT addrCreateOutput = {0};
ADDR_REGISTER_VALUE regValue = {0};
ADDR_CREATE_FLAGS createFlags = {{0}};
ADDR_GET_MAX_ALINGMENTS_OUTPUT addrGetMaxAlignmentsOutput = {0};
ADDR_GET_MAX_ALIGNMENTS_OUTPUT addrGetMaxAlignmentsOutput = {0};
ADDR_E_RETURNCODE addrRet;
addrCreateInput.size = sizeof(ADDR_CREATE_INPUT);
@@ -257,6 +257,18 @@ static int gfx6_compute_level(ADDR_HANDLE addrlib,
AddrSurfInfoIn->width = u_minify(config->info.width, level);
AddrSurfInfoIn->height = u_minify(config->info.height, level);
/* Make GFX6 linear surfaces compatible with GFX9 for hybrid graphics,
* because GFX9 needs linear alignment of 256 bytes.
*/
if (config->info.levels == 1 &&
AddrSurfInfoIn->tileMode == ADDR_TM_LINEAR_ALIGNED &&
AddrSurfInfoIn->bpp) {
unsigned alignment = 256 / (AddrSurfInfoIn->bpp / 8);
assert(util_is_power_of_two(AddrSurfInfoIn->bpp));
AddrSurfInfoIn->width = align(AddrSurfInfoIn->width, alignment);
}
if (config->is_3d)
AddrSurfInfoIn->numSlices = u_minify(config->info.depth, level);
else if (config->is_cube)
@@ -692,6 +704,20 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
surf->htile_size *= 2;
surf->is_linear = surf->u.legacy.level[0].mode == RADEON_SURF_MODE_LINEAR_ALIGNED;
/* workout base swizzle */
if (!(surf->flags & RADEON_SURF_Z_OR_SBUFFER)) {
ADDR_COMPUTE_BASE_SWIZZLE_INPUT AddrBaseSwizzleIn = {0};
ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT AddrBaseSwizzleOut = {0};
AddrBaseSwizzleIn.surfIndex = config->info.surf_index;
AddrBaseSwizzleIn.tileIndex = AddrSurfInfoIn.tileIndex;
AddrBaseSwizzleIn.macroModeIndex = AddrSurfInfoOut.macroModeIndex;
AddrBaseSwizzleIn.pTileInfo = AddrSurfInfoOut.pTileInfo;
AddrBaseSwizzleIn.tileMode = AddrSurfInfoOut.tileMode;
AddrComputeBaseSwizzle(addrlib, &AddrBaseSwizzleIn, &AddrBaseSwizzleOut);
surf->u.legacy.tile_swizzle = AddrBaseSwizzleOut.tileSwizzle;
}
return 0;
}
@@ -947,7 +973,9 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
AddrSurfInfoIn.flags.color = !(surf->flags & RADEON_SURF_Z_OR_SBUFFER);
AddrSurfInfoIn.flags.depth = (surf->flags & RADEON_SURF_ZBUFFER) != 0;
AddrSurfInfoIn.flags.display = (surf->flags & RADEON_SURF_SCANOUT) != 0;
AddrSurfInfoIn.flags.texture = 1;
/* flags.texture currently refers to TC-compatible HTILE */
AddrSurfInfoIn.flags.texture = AddrSurfInfoIn.flags.color ||
surf->flags & RADEON_SURF_TC_COMPATIBLE_HTILE;
AddrSurfInfoIn.flags.opt4space = 1;
AddrSurfInfoIn.numMipLevels = config->info.levels;

View File

@@ -97,6 +97,7 @@ struct legacy_surf_layout {
unsigned depth_adjusted:1;
unsigned stencil_adjusted:1;
uint8_t tile_swizzle;
struct legacy_surf_level level[RADEON_SURF_MAX_LEVELS];
struct legacy_surf_level stencil_level[RADEON_SURF_MAX_LEVELS];
uint8_t tiling_index[RADEON_SURF_MAX_LEVELS];
@@ -194,6 +195,7 @@ struct ac_surf_info {
uint32_t width;
uint32_t height;
uint32_t depth;
uint32_t surf_index;
uint8_t samples;
uint8_t levels;
uint16_t array_size;

View File

@@ -107,13 +107,11 @@ libvulkan_radeon_la_SOURCES = $(VULKAN_GEM_FILES)
vulkan_api_xml = $(top_srcdir)/src/vulkan/registry/vk.xml
radv_entrypoints.h : radv_entrypoints_gen.py $(vulkan_api_xml)
$(AM_V_GEN) cat $(vulkan_api_xml) |\
$(PYTHON2) $(srcdir)/radv_entrypoints_gen.py header > $@
radv_entrypoints.c : radv_entrypoints_gen.py $(vulkan_api_xml)
$(AM_V_GEN) cat $(vulkan_api_xml) |\
$(PYTHON2) $(srcdir)/radv_entrypoints_gen.py code > $@
radv_entrypoints.c: radv_entrypoints_gen.py $(vulkan_api_xml)
$(MKDIR_GEN)
$(AM_V_GEN)$(PYTHON2) $(srcdir)/radv_entrypoints_gen.py \
--xml $(vulkan_api_xml) --outdir $(builddir)
radv_entrypoints.h: radv_entrypoints.c
vk_format_table.c: vk_format_table.py \
vk_format_parse.py \

View File

@@ -1117,6 +1117,35 @@ radv_load_depth_clear_regs(struct radv_cmd_buffer *cmd_buffer,
radeon_emit(cmd_buffer->cs, 0);
}
/*
*with DCC some colors don't require CMASK elimiation before being
* used as a texture. This sets a predicate value to determine if the
* cmask eliminate is required.
*/
void
radv_set_dcc_need_cmask_elim_pred(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
bool value)
{
uint64_t pred_val = value;
uint64_t va = cmd_buffer->device->ws->buffer_get_va(image->bo);
va += image->offset + image->dcc_pred_offset;
if (!image->surface.dcc_size)
return;
cmd_buffer->device->ws->cs_add_buffer(cmd_buffer->cs, image->bo, 8);
radeon_emit(cmd_buffer->cs, PKT3(PKT3_WRITE_DATA, 4, 0));
radeon_emit(cmd_buffer->cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_PFP));
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
radeon_emit(cmd_buffer->cs, pred_val);
radeon_emit(cmd_buffer->cs, pred_val >> 32);
}
void
radv_set_color_clear_regs(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
@@ -1179,7 +1208,13 @@ radv_emit_framebuffer_state(struct radv_cmd_buffer *cmd_buffer)
struct radv_framebuffer *framebuffer = cmd_buffer->state.framebuffer;
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
for (i = 0; i < subpass->color_count; ++i) {
for (i = 0; i < 8; ++i) {
if (i >= subpass->color_count || subpass->color_attachments[i].attachment == VK_ATTACHMENT_UNUSED) {
radeon_set_context_reg(cmd_buffer->cs, R_028C70_CB_COLOR0_INFO + i * 0x3C,
S_028C70_FORMAT(V_028C70_COLOR_INVALID));
continue;
}
int idx = subpass->color_attachments[i].attachment;
struct radv_attachment_info *att = &framebuffer->attachments[idx];
@@ -1191,10 +1226,6 @@ radv_emit_framebuffer_state(struct radv_cmd_buffer *cmd_buffer)
radv_load_color_clear_regs(cmd_buffer, att->attachment->image, i);
}
for (i = subpass->color_count; i < 8; i++)
radeon_set_context_reg(cmd_buffer->cs, R_028C70_CB_COLOR0_INFO + i * 0x3C,
S_028C70_FORMAT(V_028C70_COLOR_INVALID));
if(subpass->depth_stencil_attachment.attachment != VK_ATTACHMENT_UNUSED) {
int idx = subpass->depth_stencil_attachment.attachment;
VkImageLayout layout = subpass->depth_stencil_attachment.layout;
@@ -1768,8 +1799,9 @@ radv_cmd_buffer_set_subpass(struct radv_cmd_buffer *cmd_buffer,
radv_subpass_barrier(cmd_buffer, &subpass->start_barrier);
for (unsigned i = 0; i < subpass->color_count; ++i) {
radv_handle_subpass_image_transition(cmd_buffer,
subpass->color_attachments[i]);
if (subpass->color_attachments[i].attachment != VK_ATTACHMENT_UNUSED)
radv_handle_subpass_image_transition(cmd_buffer,
subpass->color_attachments[i]);
}
for (unsigned i = 0; i < subpass->input_count; ++i) {
@@ -1824,6 +1856,9 @@ radv_cmd_state_setup_attachments(struct radv_cmd_buffer *cmd_buffer,
if ((att_aspects & VK_IMAGE_ASPECT_DEPTH_BIT) &&
att->load_op == VK_ATTACHMENT_LOAD_OP_CLEAR) {
clear_aspects |= VK_IMAGE_ASPECT_DEPTH_BIT;
if ((att_aspects & VK_IMAGE_ASPECT_STENCIL_BIT) &&
att->stencil_load_op == VK_ATTACHMENT_LOAD_OP_DONT_CARE)
clear_aspects |= VK_IMAGE_ASPECT_STENCIL_BIT;
}
if ((att_aspects & VK_IMAGE_ASPECT_STENCIL_BIT) &&
att->stencil_load_op == VK_ATTACHMENT_LOAD_OP_CLEAR) {

View File

@@ -317,7 +317,6 @@ radv_descriptor_set_create(struct radv_device *device,
}
if (pool->size - offset < layout_size) {
vk_free2(&device->alloc, NULL, set->dynamic_descriptors);
vk_free2(&device->alloc, NULL, set);
return vk_error(VK_ERROR_OUT_OF_POOL_MEMORY_KHR);
}
@@ -603,11 +602,18 @@ write_image_descriptor(struct radv_device *device,
struct radv_cmd_buffer *cmd_buffer,
unsigned *dst,
struct radeon_winsys_bo **buffer_list,
VkDescriptorType descriptor_type,
const VkDescriptorImageInfo *image_info)
{
RADV_FROM_HANDLE(radv_image_view, iview, image_info->imageView);
memcpy(dst, iview->descriptor, 8 * 4);
memcpy(dst + 8, iview->fmask_descriptor, 8 * 4);
if (descriptor_type == VK_DESCRIPTOR_TYPE_STORAGE_IMAGE) {
memcpy(dst, iview->storage_descriptor, 8 * 4);
memcpy(dst + 8, iview->storage_fmask_descriptor, 8 * 4);
} else {
memcpy(dst, iview->descriptor, 8 * 4);
memcpy(dst + 8, iview->fmask_descriptor, 8 * 4);
}
if (cmd_buffer)
device->ws->cs_add_buffer(cmd_buffer->cs, iview->bo, 7);
@@ -620,12 +626,13 @@ write_combined_image_sampler_descriptor(struct radv_device *device,
struct radv_cmd_buffer *cmd_buffer,
unsigned *dst,
struct radeon_winsys_bo **buffer_list,
VkDescriptorType descriptor_type,
const VkDescriptorImageInfo *image_info,
bool has_sampler)
{
RADV_FROM_HANDLE(radv_sampler, sampler, image_info->sampler);
write_image_descriptor(device, cmd_buffer, dst, buffer_list, image_info);
write_image_descriptor(device, cmd_buffer, dst, buffer_list, descriptor_type, image_info);
/* copy over sampler state */
if (has_sampler)
memcpy(dst + 16, sampler->state, 16);
@@ -696,10 +703,12 @@ void radv_update_descriptor_sets(
case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:
case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT:
write_image_descriptor(device, cmd_buffer, ptr, buffer_list,
writeset->descriptorType,
writeset->pImageInfo + j);
break;
case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
write_combined_image_sampler_descriptor(device, cmd_buffer, ptr, buffer_list,
writeset->descriptorType,
writeset->pImageInfo + j,
!binding_layout->immutable_samplers_offset);
if (copy_immutable_samplers) {
@@ -866,10 +875,12 @@ void radv_update_descriptor_set_with_template(struct radv_device *device,
case VK_DESCRIPTOR_TYPE_STORAGE_IMAGE:
case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT:
write_image_descriptor(device, cmd_buffer, pDst, buffer_list,
templ->entry[i].descriptor_type,
(struct VkDescriptorImageInfo *) pSrc);
break;
case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:
write_combined_image_sampler_descriptor(device, cmd_buffer, pDst, buffer_list,
templ->entry[i].descriptor_type,
(struct VkDescriptorImageInfo *) pSrc,
templ->entry[i].has_sampler);
if (templ->entry[i].immutable_samplers)

View File

@@ -91,7 +91,7 @@ static const VkExtensionProperties instance_extensions[] = {
#ifdef VK_USE_PLATFORM_WAYLAND_KHR
{
.extensionName = VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME,
.specVersion = 5,
.specVersion = 6,
},
#endif
{
@@ -99,7 +99,11 @@ static const VkExtensionProperties instance_extensions[] = {
.specVersion = 1,
},
{
.extensionName = VK_KHX_EXTERNAL_MEMORY_CAPABILITIES_EXTENSION_NAME,
.extensionName = VK_KHR_EXTERNAL_MEMORY_CAPABILITIES_EXTENSION_NAME,
.specVersion = 1,
},
{
.extensionName = VK_KHR_EXTERNAL_SEMAPHORE_CAPABILITIES_EXTENSION_NAME,
.specVersion = 1,
},
};
@@ -138,15 +142,37 @@ static const VkExtensionProperties common_device_extensions[] = {
.specVersion = 1,
},
{
.extensionName = VK_NV_DEDICATED_ALLOCATION_EXTENSION_NAME,
.extensionName = VK_KHR_GET_MEMORY_REQUIREMENTS_2_EXTENSION_NAME,
.specVersion = 1,
},
{
.extensionName = VK_KHX_EXTERNAL_MEMORY_EXTENSION_NAME,
.extensionName = VK_KHR_DEDICATED_ALLOCATION_EXTENSION_NAME,
.specVersion = 1,
},
{
.extensionName = VK_KHX_EXTERNAL_MEMORY_FD_EXTENSION_NAME,
.extensionName = VK_KHR_EXTERNAL_MEMORY_EXTENSION_NAME,
.specVersion = 1,
},
{
.extensionName = VK_KHR_EXTERNAL_MEMORY_FD_EXTENSION_NAME,
.specVersion = 1,
},
{
.extensionName = VK_KHR_STORAGE_BUFFER_STORAGE_CLASS_EXTENSION_NAME,
.specVersion = 1,
},
{
.extensionName = VK_KHR_VARIABLE_POINTERS_EXTENSION_NAME,
.specVersion = 1,
},
};
static const VkExtensionProperties ext_sema_device_extensions[] = {
{
.extensionName = VK_KHR_EXTERNAL_SEMAPHORE_EXTENSION_NAME,
.specVersion = 1,
},
{
.extensionName = VK_KHR_EXTERNAL_SEMAPHORE_FD_EXTENSION_NAME,
.specVersion = 1,
},
};
@@ -300,6 +326,15 @@ radv_physical_device_init(struct radv_physical_device *device,
if (result != VK_SUCCESS)
goto fail;
if (device->rad_info.has_syncobj) {
result = radv_extensions_register(instance,
&device->extensions,
ext_sema_device_extensions,
ARRAY_SIZE(ext_sema_device_extensions));
if (result != VK_SUCCESS)
goto fail;
}
fprintf(stderr, "WARNING: radv is not a conformant vulkan implementation, testing use only.\n");
device->name = get_chip_name(device->rad_info.family);
@@ -535,7 +570,7 @@ void radv_GetPhysicalDeviceFeatures(
.independentBlend = true,
.geometryShader = !is_gfx9,
.tessellationShader = !is_gfx9,
.sampleRateShading = false,
.sampleRateShading = true,
.dualSrcBlend = true,
.logicOp = true,
.multiDrawIndirect = true,
@@ -581,6 +616,18 @@ void radv_GetPhysicalDeviceFeatures2KHR(
VkPhysicalDevice physicalDevice,
VkPhysicalDeviceFeatures2KHR *pFeatures)
{
vk_foreach_struct(ext, pFeatures->pNext) {
switch (ext->sType) {
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VARIABLE_POINTER_FEATURES_KHR: {
VkPhysicalDeviceVariablePointerFeaturesKHR *features = (void *)ext;
features->variablePointersStorageBuffer = true;
features->variablePointers = false;
break;
}
default:
break;
}
}
return radv_GetPhysicalDeviceFeatures(physicalDevice, &pFeatures->features);
}
@@ -746,8 +793,8 @@ void radv_GetPhysicalDeviceProperties2KHR(
properties->maxPushDescriptors = MAX_PUSH_DESCRIPTORS;
break;
}
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES_KHX: {
VkPhysicalDeviceIDPropertiesKHX *properties = (VkPhysicalDeviceIDPropertiesKHX*)ext;
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES_KHR: {
VkPhysicalDeviceIDPropertiesKHR *properties = (VkPhysicalDeviceIDPropertiesKHR*)ext;
radv_device_get_cache_uuid(0, properties->driverUUID);
memcpy(properties->deviceUUID, pdevice->device_uuid, VK_UUID_SIZE);
properties->deviceLUIDValid = false;
@@ -881,15 +928,17 @@ void radv_GetPhysicalDeviceMemoryProperties(
};
STATIC_ASSERT(RADV_MEM_HEAP_COUNT <= VK_MAX_MEMORY_HEAPS);
uint64_t visible_vram_size = MIN2(physical_device->rad_info.vram_size,
physical_device->rad_info.vram_vis_size);
pMemoryProperties->memoryHeapCount = RADV_MEM_HEAP_COUNT;
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_VRAM] = (VkMemoryHeap) {
.size = physical_device->rad_info.vram_size -
physical_device->rad_info.vram_vis_size,
visible_vram_size,
.flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
};
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_VRAM_CPU_ACCESS] = (VkMemoryHeap) {
.size = physical_device->rad_info.vram_vis_size,
.size = visible_vram_size,
.flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
};
pMemoryProperties->memoryHeaps[RADV_MEM_HEAP_GTT] = (VkMemoryHeap) {
@@ -998,6 +1047,19 @@ VkResult radv_CreateDevice(
return vk_error(VK_ERROR_EXTENSION_NOT_PRESENT);
}
/* Check enabled features */
if (pCreateInfo->pEnabledFeatures) {
VkPhysicalDeviceFeatures supported_features;
radv_GetPhysicalDeviceFeatures(physicalDevice, &supported_features);
VkBool32 *supported_feature = (VkBool32 *)&supported_features;
VkBool32 *enabled_feature = (VkBool32 *)pCreateInfo->pEnabledFeatures;
unsigned num_features = sizeof(VkPhysicalDeviceFeatures) / sizeof(VkBool32);
for (uint32_t i = 0; i < num_features; i++) {
if (enabled_feature[i] && !supported_feature[i])
return vk_error(VK_ERROR_FEATURE_NOT_PRESENT);
}
}
device = vk_alloc2(&physical_device->instance->alloc, pAllocator,
sizeof(*device), 8,
VK_SYSTEM_ALLOCATION_SCOPE_DEVICE);
@@ -1861,6 +1923,89 @@ fail:
return VK_ERROR_OUT_OF_DEVICE_MEMORY;
}
static VkResult radv_alloc_sem_counts(struct radv_winsys_sem_counts *counts,
int num_sems,
const VkSemaphore *sems,
bool reset_temp)
{
int syncobj_idx = 0, sem_idx = 0;
if (num_sems == 0)
return VK_SUCCESS;
for (uint32_t i = 0; i < num_sems; i++) {
RADV_FROM_HANDLE(radv_semaphore, sem, sems[i]);
if (sem->temp_syncobj || sem->syncobj)
counts->syncobj_count++;
else
counts->sem_count++;
}
if (counts->syncobj_count) {
counts->syncobj = (uint32_t *)malloc(sizeof(uint32_t) * counts->syncobj_count);
if (!counts->syncobj)
return VK_ERROR_OUT_OF_HOST_MEMORY;
}
if (counts->sem_count) {
counts->sem = (struct radeon_winsys_sem **)malloc(sizeof(struct radeon_winsys_sem *) * counts->sem_count);
if (!counts->sem) {
free(counts->syncobj);
return VK_ERROR_OUT_OF_HOST_MEMORY;
}
}
for (uint32_t i = 0; i < num_sems; i++) {
RADV_FROM_HANDLE(radv_semaphore, sem, sems[i]);
if (sem->temp_syncobj) {
counts->syncobj[syncobj_idx++] = sem->temp_syncobj;
if (reset_temp) {
/* after we wait on a temp import - drop it */
sem->temp_syncobj = 0;
}
}
else if (sem->syncobj)
counts->syncobj[syncobj_idx++] = sem->syncobj;
else {
assert(sem->sem);
counts->sem[sem_idx++] = sem->sem;
}
}
return VK_SUCCESS;
}
void radv_free_sem_info(struct radv_winsys_sem_info *sem_info)
{
free(sem_info->wait.syncobj);
free(sem_info->wait.sem);
free(sem_info->signal.syncobj);
free(sem_info->signal.sem);
}
VkResult radv_alloc_sem_info(struct radv_winsys_sem_info *sem_info,
int num_wait_sems,
const VkSemaphore *wait_sems,
int num_signal_sems,
const VkSemaphore *signal_sems)
{
VkResult ret;
memset(sem_info, 0, sizeof(*sem_info));
ret = radv_alloc_sem_counts(&sem_info->wait, num_wait_sems, wait_sems, true);
if (ret)
return ret;
ret = radv_alloc_sem_counts(&sem_info->signal, num_signal_sems, signal_sems, false);
if (ret)
radv_free_sem_info(sem_info);
/* caller can override these */
sem_info->cs_emit_wait = true;
sem_info->cs_emit_signal = true;
return ret;
}
VkResult radv_QueueSubmit(
VkQueue _queue,
uint32_t submitCount,
@@ -1911,16 +2056,22 @@ VkResult radv_QueueSubmit(
bool do_flush = !i || pSubmits[i].pWaitDstStageMask;
bool can_patch = !do_flush;
uint32_t advance;
struct radv_winsys_sem_info sem_info;
result = radv_alloc_sem_info(&sem_info,
pSubmits[i].waitSemaphoreCount,
pSubmits[i].pWaitSemaphores,
pSubmits[i].signalSemaphoreCount,
pSubmits[i].pSignalSemaphores);
if (result != VK_SUCCESS)
return result;
if (!pSubmits[i].commandBufferCount) {
if (pSubmits[i].waitSemaphoreCount || pSubmits[i].signalSemaphoreCount) {
ret = queue->device->ws->cs_submit(ctx, queue->queue_idx,
&queue->device->empty_cs[queue->queue_family_index],
1, NULL, NULL,
(struct radeon_winsys_sem **)pSubmits[i].pWaitSemaphores,
pSubmits[i].waitSemaphoreCount,
(struct radeon_winsys_sem **)pSubmits[i].pSignalSemaphores,
pSubmits[i].signalSemaphoreCount,
&sem_info,
false, base_fence);
if (ret) {
radv_loge("failed to submit CS %d\n", i);
@@ -1928,6 +2079,7 @@ VkResult radv_QueueSubmit(
}
fence_emitted = true;
}
radv_free_sem_info(&sem_info);
continue;
}
@@ -1952,18 +2104,16 @@ VkResult radv_QueueSubmit(
for (uint32_t j = 0; j < pSubmits[i].commandBufferCount + do_flush; j += advance) {
advance = MIN2(max_cs_submission,
pSubmits[i].commandBufferCount + do_flush - j);
bool b = j == 0;
bool e = j + advance == pSubmits[i].commandBufferCount + do_flush;
if (queue->device->trace_bo)
*queue->device->trace_id_ptr = 0;
sem_info.cs_emit_wait = j == 0;
sem_info.cs_emit_signal = j + advance == pSubmits[i].commandBufferCount + do_flush;
ret = queue->device->ws->cs_submit(ctx, queue->queue_idx, cs_array + j,
advance, initial_preamble_cs, continue_preamble_cs,
(struct radeon_winsys_sem **)pSubmits[i].pWaitSemaphores,
b ? pSubmits[i].waitSemaphoreCount : 0,
(struct radeon_winsys_sem **)pSubmits[i].pSignalSemaphores,
e ? pSubmits[i].signalSemaphoreCount : 0,
&sem_info,
can_patch, base_fence);
if (ret) {
@@ -1984,16 +2134,19 @@ VkResult radv_QueueSubmit(
}
}
}
radv_free_sem_info(&sem_info);
free(cs_array);
}
if (fence) {
if (!fence_emitted)
if (!fence_emitted) {
struct radv_winsys_sem_info sem_info = {0};
ret = queue->device->ws->cs_submit(ctx, queue->queue_idx,
&queue->device->empty_cs[queue->queue_family_index],
1, NULL, NULL, NULL, 0, NULL, 0,
1, NULL, NULL, &sem_info,
false, base_fence);
}
fence->submitted = true;
}
@@ -2089,10 +2242,10 @@ VkResult radv_AllocateMemory(
return VK_SUCCESS;
}
const VkImportMemoryFdInfoKHX *import_info =
vk_find_struct_const(pAllocateInfo->pNext, IMPORT_MEMORY_FD_INFO_KHX);
const VkDedicatedAllocationMemoryAllocateInfoNV *dedicate_info =
vk_find_struct_const(pAllocateInfo->pNext, DEDICATED_ALLOCATION_MEMORY_ALLOCATE_INFO_NV);
const VkImportMemoryFdInfoKHR *import_info =
vk_find_struct_const(pAllocateInfo->pNext, IMPORT_MEMORY_FD_INFO_KHR);
const VkMemoryDedicatedAllocateInfoKHR *dedicate_info =
vk_find_struct_const(pAllocateInfo->pNext, MEMORY_DEDICATED_ALLOCATE_INFO_KHR);
mem = vk_alloc2(&device->alloc, pAllocator, sizeof(*mem), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
@@ -2109,14 +2262,16 @@ VkResult radv_AllocateMemory(
if (import_info) {
assert(import_info->handleType ==
VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX);
VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
mem->bo = device->ws->buffer_from_fd(device->ws, import_info->fd,
NULL, NULL);
if (!mem->bo) {
result = VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX;
result = VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR;
goto fail;
} else
} else {
close(import_info->fd);
goto out_success;
}
}
uint64_t alloc_size = align_u64(pAllocateInfo->allocationSize, 4096);
@@ -2241,6 +2396,29 @@ void radv_GetBufferMemoryRequirements(
pMemoryRequirements->size = align64(buffer->size, pMemoryRequirements->alignment);
}
void radv_GetBufferMemoryRequirements2KHR(
VkDevice device,
const VkBufferMemoryRequirementsInfo2KHR* pInfo,
VkMemoryRequirements2KHR* pMemoryRequirements)
{
radv_GetBufferMemoryRequirements(device, pInfo->buffer,
&pMemoryRequirements->memoryRequirements);
vk_foreach_struct(ext, pMemoryRequirements->pNext) {
switch (ext->sType) {
case VK_STRUCTURE_TYPE_MEMORY_DEDICATED_REQUIREMENTS_KHR: {
VkMemoryDedicatedRequirementsKHR *req =
(VkMemoryDedicatedRequirementsKHR *) ext;
req->requiresDedicatedAllocation = false;
req->prefersDedicatedAllocation = req->requiresDedicatedAllocation;
break;
}
default:
break;
}
}
}
void radv_GetImageMemoryRequirements(
VkDevice device,
VkImage _image,
@@ -2254,6 +2432,31 @@ void radv_GetImageMemoryRequirements(
pMemoryRequirements->alignment = image->alignment;
}
void radv_GetImageMemoryRequirements2KHR(
VkDevice device,
const VkImageMemoryRequirementsInfo2KHR* pInfo,
VkMemoryRequirements2KHR* pMemoryRequirements)
{
radv_GetImageMemoryRequirements(device, pInfo->image,
&pMemoryRequirements->memoryRequirements);
RADV_FROM_HANDLE(radv_image, image, pInfo->image);
vk_foreach_struct(ext, pMemoryRequirements->pNext) {
switch (ext->sType) {
case VK_STRUCTURE_TYPE_MEMORY_DEDICATED_REQUIREMENTS_KHR: {
VkMemoryDedicatedRequirementsKHR *req =
(VkMemoryDedicatedRequirementsKHR *) ext;
req->requiresDedicatedAllocation = image->shareable;
req->prefersDedicatedAllocation = req->requiresDedicatedAllocation;
break;
}
default:
break;
}
}
}
void radv_GetImageSparseMemoryRequirements(
VkDevice device,
VkImage image,
@@ -2263,6 +2466,15 @@ void radv_GetImageSparseMemoryRequirements(
stub();
}
void radv_GetImageSparseMemoryRequirements2KHR(
VkDevice device,
const VkImageSparseMemoryRequirementsInfo2KHR* pInfo,
uint32_t* pSparseMemoryRequirementCount,
VkSparseImageMemoryRequirements2KHR* pSparseMemoryRequirements)
{
stub();
}
void radv_GetDeviceMemoryCommitment(
VkDevice device,
VkDeviceMemory memory,
@@ -2364,6 +2576,7 @@ radv_sparse_image_opaque_bind_memory(struct radv_device *device,
bool fence_emitted = false;
for (uint32_t i = 0; i < bindInfoCount; ++i) {
struct radv_winsys_sem_info sem_info;
for (uint32_t j = 0; j < pBindInfo[i].bufferBindCount; ++j) {
radv_sparse_buffer_bind_memory(queue->device,
pBindInfo[i].pBufferBinds + j);
@@ -2374,19 +2587,28 @@ radv_sparse_image_opaque_bind_memory(struct radv_device *device,
pBindInfo[i].pImageOpaqueBinds + j);
}
VkResult result;
result = radv_alloc_sem_info(&sem_info,
pBindInfo[i].waitSemaphoreCount,
pBindInfo[i].pWaitSemaphores,
pBindInfo[i].signalSemaphoreCount,
pBindInfo[i].pSignalSemaphores);
if (result != VK_SUCCESS)
return result;
if (pBindInfo[i].waitSemaphoreCount || pBindInfo[i].signalSemaphoreCount) {
queue->device->ws->cs_submit(queue->hw_ctx, queue->queue_idx,
&queue->device->empty_cs[queue->queue_family_index],
1, NULL, NULL,
(struct radeon_winsys_sem **)pBindInfo[i].pWaitSemaphores,
pBindInfo[i].waitSemaphoreCount,
(struct radeon_winsys_sem **)pBindInfo[i].pSignalSemaphores,
pBindInfo[i].signalSemaphoreCount,
&sem_info,
false, base_fence);
fence_emitted = true;
if (fence)
fence->submitted = true;
}
radv_free_sem_info(&sem_info);
}
if (fence && !fence_emitted) {
@@ -2523,13 +2745,38 @@ VkResult radv_CreateSemaphore(
VkSemaphore* pSemaphore)
{
RADV_FROM_HANDLE(radv_device, device, _device);
struct radeon_winsys_sem *sem;
const VkExportSemaphoreCreateInfoKHR *export =
vk_find_struct_const(pCreateInfo->pNext, EXPORT_SEMAPHORE_CREATE_INFO_KHR);
VkExternalSemaphoreHandleTypeFlagsKHR handleTypes =
export ? export->handleTypes : 0;
sem = device->ws->create_sem(device->ws);
struct radv_semaphore *sem = vk_alloc2(&device->alloc, pAllocator,
sizeof(*sem), 8,
VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
if (!sem)
return VK_ERROR_OUT_OF_HOST_MEMORY;
*pSemaphore = radeon_winsys_sem_to_handle(sem);
sem->temp_syncobj = 0;
/* create a syncobject if we are going to export this semaphore */
if (handleTypes) {
assert (device->physical_device->rad_info.has_syncobj);
assert (handleTypes == VK_EXTERNAL_FENCE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
int ret = device->ws->create_syncobj(device->ws, &sem->syncobj);
if (ret) {
vk_free2(&device->alloc, pAllocator, sem);
return VK_ERROR_OUT_OF_HOST_MEMORY;
}
sem->sem = NULL;
} else {
sem->sem = device->ws->create_sem(device->ws);
if (!sem->sem) {
vk_free2(&device->alloc, pAllocator, sem);
return VK_ERROR_OUT_OF_HOST_MEMORY;
}
sem->syncobj = 0;
}
*pSemaphore = radv_semaphore_to_handle(sem);
return VK_SUCCESS;
}
@@ -2539,11 +2786,15 @@ void radv_DestroySemaphore(
const VkAllocationCallbacks* pAllocator)
{
RADV_FROM_HANDLE(radv_device, device, _device);
RADV_FROM_HANDLE(radeon_winsys_sem, sem, _semaphore);
RADV_FROM_HANDLE(radv_semaphore, sem, _semaphore);
if (!_semaphore)
return;
device->ws->destroy_sem(sem);
if (sem->syncobj)
device->ws->destroy_syncobj(device->ws, sem->syncobj);
else
device->ws->destroy_sem(sem->sem);
vk_free2(&device->alloc, pAllocator, sem);
}
VkResult radv_CreateEvent(
@@ -2753,7 +3004,8 @@ radv_initialise_color_surface(struct radv_device *device,
}
cb->cb_color_base = va >> 8;
if (device->physical_device->rad_info.chip_class < GFX9)
cb->cb_color_base |= iview->image->surface.u.legacy.tile_swizzle;
/* CMASK variables */
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
va += iview->image->cmask.offset;
@@ -2762,6 +3014,8 @@ radv_initialise_color_surface(struct radv_device *device,
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
va += iview->image->dcc_offset;
cb->cb_dcc_base = va >> 8;
if (device->physical_device->rad_info.chip_class < GFX9)
cb->cb_dcc_base |= iview->image->surface.u.legacy.tile_swizzle;
uint32_t max_slice = radv_surface_layer_count(iview);
cb->cb_color_view = S_028C6C_SLICE_START(iview->base_layer) |
@@ -2777,6 +3031,8 @@ radv_initialise_color_surface(struct radv_device *device,
if (iview->image->fmask.size) {
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask.offset;
cb->cb_color_fmask = va >> 8;
if (device->physical_device->rad_info.chip_class < GFX9)
cb->cb_color_fmask |= iview->image->surface.u.legacy.tile_swizzle;
} else {
cb->cb_color_fmask = cb->cb_color_base;
}
@@ -2992,6 +3248,8 @@ radv_initialise_ds_surface(struct radv_device *device,
ds->db_z_info |= S_028040_TILE_MODE_INDEX(tile_mode_index);
tile_mode_index = si_tile_mode_index(iview->image, level, true);
ds->db_stencil_info |= S_028044_TILE_MODE_INDEX(tile_mode_index);
if (stencil_only)
ds->db_z_info |= S_028040_TILE_MODE_INDEX(tile_mode_index);
}
ds->db_depth_size = S_028058_PITCH_TILE_MAX((level_info->nblk_x / 8) - 1) |
@@ -3291,16 +3549,18 @@ vk_icdNegotiateLoaderICDInterfaceVersion(uint32_t *pSupportedVersion)
return VK_SUCCESS;
}
VkResult radv_GetMemoryFdKHX(VkDevice _device,
VkDeviceMemory _memory,
VkExternalMemoryHandleTypeFlagsKHX handleType,
VkResult radv_GetMemoryFdKHR(VkDevice _device,
const VkMemoryGetFdInfoKHR *pGetFdInfo,
int *pFD)
{
RADV_FROM_HANDLE(radv_device, device, _device);
RADV_FROM_HANDLE(radv_device_memory, memory, _memory);
RADV_FROM_HANDLE(radv_device_memory, memory, pGetFdInfo->memory);
assert(pGetFdInfo->sType == VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR);
/* We support only one handle type. */
assert(handleType == VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX);
assert(pGetFdInfo->handleType ==
VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
bool ret = radv_get_memory_fd(device, memory, pFD);
if (ret == false)
@@ -3308,10 +3568,10 @@ VkResult radv_GetMemoryFdKHX(VkDevice _device,
return VK_SUCCESS;
}
VkResult radv_GetMemoryFdPropertiesKHX(VkDevice _device,
VkExternalMemoryHandleTypeFlagBitsKHX handleType,
VkResult radv_GetMemoryFdPropertiesKHR(VkDevice _device,
VkExternalMemoryHandleTypeFlagBitsKHR handleType,
int fd,
VkMemoryFdPropertiesKHX *pMemoryFdProperties)
VkMemoryFdPropertiesKHR *pMemoryFdProperties)
{
/* The valid usage section for this function says:
*
@@ -3319,5 +3579,63 @@ VkResult radv_GetMemoryFdPropertiesKHX(VkDevice _device,
*
* Since we only handle opaque handles for now, there are no FD properties.
*/
return VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX;
return VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR;
}
VkResult radv_ImportSemaphoreFdKHR(VkDevice _device,
const VkImportSemaphoreFdInfoKHR *pImportSemaphoreFdInfo)
{
RADV_FROM_HANDLE(radv_device, device, _device);
RADV_FROM_HANDLE(radv_semaphore, sem, pImportSemaphoreFdInfo->semaphore);
uint32_t syncobj_handle = 0;
assert(pImportSemaphoreFdInfo->handleType == VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
int ret = device->ws->import_syncobj(device->ws, pImportSemaphoreFdInfo->fd, &syncobj_handle);
if (ret != 0)
return VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR;
if (pImportSemaphoreFdInfo->flags & VK_SEMAPHORE_IMPORT_TEMPORARY_BIT_KHR) {
sem->temp_syncobj = syncobj_handle;
} else {
sem->syncobj = syncobj_handle;
}
close(pImportSemaphoreFdInfo->fd);
return VK_SUCCESS;
}
VkResult radv_GetSemaphoreFdKHR(VkDevice _device,
const VkSemaphoreGetFdInfoKHR *pGetFdInfo,
int *pFd)
{
RADV_FROM_HANDLE(radv_device, device, _device);
RADV_FROM_HANDLE(radv_semaphore, sem, pGetFdInfo->semaphore);
int ret;
uint32_t syncobj_handle;
assert(pGetFdInfo->handleType == VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
if (sem->temp_syncobj)
syncobj_handle = sem->temp_syncobj;
else
syncobj_handle = sem->syncobj;
ret = device->ws->export_syncobj(device->ws, syncobj_handle, pFd);
if (ret)
return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR);
return VK_SUCCESS;
}
void radv_GetPhysicalDeviceExternalSemaphorePropertiesKHR(
VkPhysicalDevice physicalDevice,
const VkPhysicalDeviceExternalSemaphoreInfoKHR* pExternalSemaphoreInfo,
VkExternalSemaphorePropertiesKHR* pExternalSemaphoreProperties)
{
if (pExternalSemaphoreInfo->handleType == VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR) {
pExternalSemaphoreProperties->exportFromImportedHandleTypes = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;
pExternalSemaphoreProperties->compatibleHandleTypes = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;
pExternalSemaphoreProperties->externalSemaphoreFeatures = VK_EXTERNAL_SEMAPHORE_FEATURE_EXPORTABLE_BIT_KHR |
VK_EXTERNAL_SEMAPHORE_FEATURE_IMPORTABLE_BIT_KHR;
} else {
pExternalSemaphoreProperties->exportFromImportedHandleTypes = 0;
pExternalSemaphoreProperties->compatibleHandleTypes = 0;
pExternalSemaphoreProperties->externalSemaphoreFeatures = 0;
}
}

View File

@@ -1,6 +1,6 @@
# coding=utf-8
#
# Copyright © 2015 Intel Corporation
# Copyright © 2015, 2017 Intel Corporation
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
@@ -22,29 +22,41 @@
# IN THE SOFTWARE.
#
import sys
import xml.etree.ElementTree as ET
import argparse
import functools
import os
import textwrap
import xml.etree.cElementTree as et
max_api_version = 1.0
from mako.template import Template
supported_extensions = [
'VK_AMD_draw_indirect_count',
'VK_NV_dedicated_allocation',
'VK_KHR_descriptor_update_template',
'VK_KHR_get_physical_device_properties2',
'VK_KHR_incremental_present',
'VK_KHR_maintenance1',
'VK_KHR_push_descriptor',
'VK_KHR_sampler_mirror_clamp_to_edge',
'VK_KHR_shader_draw_parameters',
'VK_KHR_surface',
'VK_KHR_swapchain',
'VK_KHR_wayland_surface',
'VK_KHR_xcb_surface',
'VK_KHR_xlib_surface',
'VK_KHX_external_memory_capabilities',
'VK_KHX_external_memory',
'VK_KHX_external_memory_fd',
MAX_API_VERSION = 1.0
SUPPORTED_EXTENSIONS = [
'VK_AMD_draw_indirect_count',
'VK_NV_dedicated_allocation',
'VK_KHR_descriptor_update_template',
'VK_KHR_get_physical_device_properties2',
'VK_KHR_incremental_present',
'VK_KHR_maintenance1',
'VK_KHR_push_descriptor',
'VK_KHR_sampler_mirror_clamp_to_edge',
'VK_KHR_shader_draw_parameters',
'VK_KHR_surface',
'VK_KHR_swapchain',
'VK_KHR_wayland_surface',
'VK_KHR_xcb_surface',
'VK_KHR_xlib_surface',
'VK_KHR_get_memory_requirements2',
'VK_KHR_dedicated_allocation',
'VK_KHR_external_memory_capabilities',
'VK_KHR_external_memory',
'VK_KHR_external_memory_fd',
'VK_KHR_storage_buffer_storage_class',
'VK_KHR_variable_pointers',
'VK_KHR_external_semaphore_capabilities',
'VK_KHR_external_semaphore',
'VK_KHR_external_semaphore_fd',
]
# We generate a static hash table for entry point lookup
@@ -52,54 +64,204 @@ supported_extensions = [
# function and a power-of-two size table. The prime numbers are determined
# experimentally.
none = 0xffff
hash_size = 256
u32_mask = 2**32 - 1
hash_mask = hash_size - 1
TEMPLATE_H = Template(textwrap.dedent("""\
/* This file generated from ${filename}, don't edit directly. */
prime_factor = 5024183
prime_step = 19
struct radv_dispatch_table {
union {
void *entrypoints[${len(entrypoints)}];
struct {
% for _, name, _, _, _, guard in entrypoints:
% if guard is not None:
#ifdef ${guard}
PFN_vk${name} ${name};
#else
void *${name};
# endif
% else:
PFN_vk${name} ${name};
% endif
% endfor
};
};
};
def hash(name):
h = 0;
for c in name:
h = (h * prime_factor + ord(c)) & u32_mask
% for type_, name, args, num, h, guard in entrypoints:
% if guard is not None:
#ifdef ${guard}
% endif
${type_} radv_${name}(${args});
% if guard is not None:
#endif // ${guard}
% endif
% endfor
"""), output_encoding='utf-8')
return h
TEMPLATE_C = Template(textwrap.dedent(u"""\
/*
* Copyright © 2015 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
def print_guard_start(guard):
if guard is not None:
print "#ifdef {0}".format(guard)
/* This file generated from ${filename}, don't edit directly. */
def print_guard_end(guard):
if guard is not None:
print "#endif // {0}".format(guard)
#include "radv_private.h"
opt_header = False
opt_code = False
struct radv_entrypoint {
uint32_t name;
uint32_t hash;
};
/* We use a big string constant to avoid lots of reloctions from the entry
* point table to lots of little strings. The entries in the entry point table
* store the index into this big string.
*/
static const char strings[] =
% for _, name, _, _, _, _ in entrypoints:
"vk${name}\\0"
% endfor
;
static const struct radv_entrypoint entrypoints[] = {
% for _, _, _, num, h, _ in entrypoints:
{ ${offsets[num]}, ${'{:0=#8x}'.format(h)} },
% endfor
};
/* Weak aliases for all potential implementations. These will resolve to
* NULL if they're not defined, which lets the resolve_entrypoint() function
* either pick the correct entry point.
*/
% for layer in ['radv']:
% for type_, name, args, _, _, guard in entrypoints:
% if guard is not None:
#ifdef ${guard}
% endif
${type_} ${layer}_${name}(${args}) __attribute__ ((weak));
% if guard is not None:
#endif // ${guard}
% endif
% endfor
const struct radv_dispatch_table ${layer}_layer = {
% for _, name, args, _, _, guard in entrypoints:
% if guard is not None:
#ifdef ${guard}
% endif
.${name} = ${layer}_${name},
% if guard is not None:
#endif // ${guard}
% endif
% endfor
};
% endfor
static void * __attribute__ ((noinline))
radv_resolve_entrypoint(uint32_t index)
{
return radv_layer.entrypoints[index];
}
/* Hash table stats:
* size ${hash_size} entries
* collisions entries:
% for i in xrange(10):
* ${i}${'+' if i == 9 else ''} ${collisions[i]}
% endfor
*/
#define none ${'{:#x}'.format(none)}
static const uint16_t map[] = {
% for i in xrange(0, hash_size, 8):
% for j in xrange(i, i + 8):
## This is 6 because the 0x is counted in the length
% if mapping[j] & 0xffff == 0xffff:
none,
% else:
${'{:0=#6x}'.format(mapping[j] & 0xffff)},
% endif
% endfor
% endfor
};
void *
radv_lookup_entrypoint(const char *name)
{
static const uint32_t prime_factor = ${prime_factor};
static const uint32_t prime_step = ${prime_step};
const struct radv_entrypoint *e;
uint32_t hash, h, i;
const char *p;
hash = 0;
for (p = name; *p; p++)
hash = hash * prime_factor + *p;
h = hash;
do {
i = map[h & ${hash_mask}];
if (i == none)
return NULL;
e = &entrypoints[i];
h += prime_step;
} while (e->hash != hash);
if (strcmp(name, strings + e->name) != 0)
return NULL;
return radv_resolve_entrypoint(i);
}"""), output_encoding='utf-8')
NONE = 0xffff
HASH_SIZE = 256
U32_MASK = 2**32 - 1
HASH_MASK = HASH_SIZE - 1
PRIME_FACTOR = 5024183
PRIME_STEP = 19
def cal_hash(name):
"""Calculate the same hash value that Mesa will calculate in C."""
return functools.reduce(
lambda h, c: (h * PRIME_FACTOR + ord(c)) & U32_MASK, name, 0)
if (sys.argv[1] == "header"):
opt_header = True
sys.argv.pop()
elif (sys.argv[1] == "code"):
opt_code = True
sys.argv.pop()
# Extract the entry points from the registry
def get_entrypoints(doc, entrypoints_to_defines):
"""Extract the entry points from the registry."""
entrypoints = []
enabled_commands = set()
for feature in doc.findall('./feature'):
assert feature.attrib['api'] == 'vulkan'
if float(feature.attrib['number']) > max_api_version:
if float(feature.attrib['number']) > MAX_API_VERSION:
continue
for command in feature.findall('./require/command'):
enabled_commands.add(command.attrib['name'])
for extension in doc.findall('.extensions/extension'):
if extension.attrib['name'] not in supported_extensions:
if extension.attrib['name'] not in SUPPORTED_EXTENSIONS:
continue
assert extension.attrib['supported'] == 'vulkan'
@@ -115,219 +277,78 @@ def get_entrypoints(doc, entrypoints_to_defines):
continue
shortname = fullname[2:]
params = map(lambda p: "".join(p.itertext()), command.findall('./param'))
params = (''.join(p.itertext()) for p in command.findall('./param'))
params = ', '.join(params)
if fullname in entrypoints_to_defines:
guard = entrypoints_to_defines[fullname]
else:
guard = None
entrypoints.append((type, shortname, params, index, hash(fullname), guard))
guard = entrypoints_to_defines.get(fullname)
entrypoints.append((type, shortname, params, index, cal_hash(fullname), guard))
index += 1
return entrypoints
# Maps entry points to extension defines
def get_entrypoints_defines(doc):
"""Maps entry points to extension defines."""
entrypoints_to_defines = {}
extensions = doc.findall('./extensions/extension')
for extension in extensions:
define = extension.get('protect')
entrypoints = extension.findall('./require/command')
for entrypoint in entrypoints:
fullname = entrypoint.get('name')
for extension in doc.findall('./extensions/extension[@protect]'):
define = extension.attrib['protect']
for entrypoint in extension.findall('./require/command'):
fullname = entrypoint.attrib['name']
entrypoints_to_defines[fullname] = define
return entrypoints_to_defines
doc = ET.parse(sys.stdin)
entrypoints = get_entrypoints(doc, get_entrypoints_defines(doc))
# For outputting entrypoints.h we generate a radv_EntryPoint() prototype
# per entry point.
def gen_code(entrypoints):
"""Generate the C code."""
i = 0
offsets = []
for _, name, _, _, _, _ in entrypoints:
offsets.append(i)
i += 2 + len(name) + 1
if opt_header:
print "/* This file generated from vk_gen.py, don't edit directly. */\n"
print "struct radv_dispatch_table {"
print " union {"
print " void *entrypoints[%d];" % len(entrypoints)
print " struct {"
for type, name, args, num, h, guard in entrypoints:
if guard is not None:
print "#ifdef {0}".format(guard)
print " PFN_vk{0} {0};".format(name)
print "#else"
print " void *{0};".format(name)
print "#endif"
mapping = [NONE] * HASH_SIZE
collisions = [0] * 10
for _, name, _, num, h, _ in entrypoints:
level = 0
while mapping[h & HASH_MASK] != NONE:
h = h + PRIME_STEP
level = level + 1
if level > 9:
collisions[9] += 1
else:
print " PFN_vk{0} {0};".format(name)
print " };\n"
print " };\n"
print "};\n"
collisions[level] += 1
mapping[h & HASH_MASK] = num
for type, name, args, num, h, guard in entrypoints:
print_guard_start(guard)
print "%s radv_%s(%s);" % (type, name, args)
print_guard_end(guard)
exit()
return TEMPLATE_C.render(entrypoints=entrypoints,
offsets=offsets,
collisions=collisions,
mapping=mapping,
hash_mask=HASH_MASK,
prime_step=PRIME_STEP,
prime_factor=PRIME_FACTOR,
none=NONE,
hash_size=HASH_SIZE,
filename=os.path.basename(__file__))
def main():
parser = argparse.ArgumentParser()
parser.add_argument('--outdir', help='Where to write the files.',
required=True)
parser.add_argument('--xml', help='Vulkan API XML file.', required=True)
args = parser.parse_args()
print """/*
* Copyright © 2015 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
doc = et.parse(args.xml)
entrypoints = get_entrypoints(doc, get_entrypoints_defines(doc))
/* DO NOT EDIT! This is a generated file. */
with open(os.path.join(args.outdir, 'radv_entrypoints.h'), 'wb') as f:
f.write(TEMPLATE_H.render(entrypoints=entrypoints,
filename=os.path.basename(__file__)))
with open(os.path.join(args.outdir, 'radv_entrypoints.c'), 'wb') as f:
f.write(gen_code(entrypoints))
#include "radv_private.h"
struct radv_entrypoint {
uint32_t name;
uint32_t hash;
};
/* We use a big string constant to avoid lots of reloctions from the entry
* point table to lots of little strings. The entries in the entry point table
* store the index into this big string.
*/
static const char strings[] ="""
offsets = []
i = 0;
for type, name, args, num, h, guard in entrypoints:
print " \"vk%s\\0\"" % name
offsets.append(i)
i += 2 + len(name) + 1
print " ;"
# Now generate the table of all entry points
print "\nstatic const struct radv_entrypoint entrypoints[] = {"
for type, name, args, num, h, guard in entrypoints:
print " { %5d, 0x%08x }," % (offsets[num], h)
print "};\n"
print """
/* Weak aliases for all potential implementations. These will resolve to
* NULL if they're not defined, which lets the resolve_entrypoint() function
* either pick the correct entry point.
*/
"""
for layer in [ "radv" ]:
for type, name, args, num, h, guard in entrypoints:
print_guard_start(guard)
print "%s %s_%s(%s) __attribute__ ((weak));" % (type, layer, name, args)
print_guard_end(guard)
print "\nconst struct radv_dispatch_table %s_layer = {" % layer
for type, name, args, num, h, guard in entrypoints:
print_guard_start(guard)
print " .%s = %s_%s," % (name, layer, name)
print_guard_end(guard)
print "};\n"
print """
static void * __attribute__ ((noinline))
radv_resolve_entrypoint(uint32_t index)
{
return radv_layer.entrypoints[index];
}
"""
# Now generate the hash table used for entry point look up. This is a
# uint16_t table of entry point indices. We use 0xffff to indicate an entry
# in the hash table is empty.
map = [none for f in xrange(hash_size)]
collisions = [0 for f in xrange(10)]
for type, name, args, num, h, guard in entrypoints:
level = 0
while map[h & hash_mask] != none:
h = h + prime_step
level = level + 1
if level > 9:
collisions[9] += 1
else:
collisions[level] += 1
map[h & hash_mask] = num
print "/* Hash table stats:"
print " * size %d entries" % hash_size
print " * collisions entries"
for i in xrange(10):
if (i == 9):
plus = "+"
else:
plus = " "
print " * %2d%s %4d" % (i, plus, collisions[i])
print " */\n"
print "#define none 0x%04x\n" % none
print "static const uint16_t map[] = {"
for i in xrange(0, hash_size, 8):
print " ",
for j in xrange(i, i + 8):
if map[j] & 0xffff == 0xffff:
print " none,",
else:
print "0x%04x," % (map[j] & 0xffff),
print
print "};"
# Finally we generate the hash table lookup function. The hash function and
# linear probing algorithm matches the hash table generated above.
print """
void *
radv_lookup_entrypoint(const char *name)
{
static const uint32_t prime_factor = %d;
static const uint32_t prime_step = %d;
const struct radv_entrypoint *e;
uint32_t hash, h, i;
const char *p;
hash = 0;
for (p = name; *p; p++)
hash = hash * prime_factor + *p;
h = hash;
do {
i = map[h & %d];
if (i == none)
return NULL;
e = &entrypoints[i];
h += prime_step;
} while (e->hash != hash);
if (strcmp(name, strings + e->name) != 0)
return NULL;
return radv_resolve_entrypoint(i);
}
""" % (prime_factor, prime_step, hash_mask)
if __name__ == '__main__':
main()

View File

@@ -977,6 +977,27 @@ bool radv_format_pack_clear_color(VkFormat format,
clear_vals[0] = float3_to_r11g11b10f(value->float32);
clear_vals[1] = 0;
break;
case VK_FORMAT_R32G32B32A32_SFLOAT:
if (value->float32[0] != value->float32[1] ||
value->float32[0] != value->float32[2])
return false;
clear_vals[0] = fui(value->float32[0]);
clear_vals[1] = fui(value->float32[3]);
break;
case VK_FORMAT_R32G32B32A32_UINT:
if (value->uint32[0] != value->uint32[1] ||
value->uint32[0] != value->uint32[2])
return false;
clear_vals[0] = value->uint32[0];
clear_vals[1] = value->uint32[3];
break;
case VK_FORMAT_R32G32B32A32_SINT:
if (value->int32[0] != value->int32[1] ||
value->int32[0] != value->int32[2])
return false;
clear_vals[0] = value->int32[0];
clear_vals[1] = value->int32[3];
break;
default:
fprintf(stderr, "failed to fast clear %d\n", format);
return false;
@@ -1144,21 +1165,21 @@ VkResult radv_GetPhysicalDeviceImageFormatProperties(
static void
get_external_image_format_properties(const VkPhysicalDeviceImageFormatInfo2KHR *pImageFormatInfo,
VkExternalMemoryPropertiesKHX *external_properties)
VkExternalMemoryPropertiesKHR *external_properties)
{
VkExternalMemoryFeatureFlagBitsKHX flags = 0;
VkExternalMemoryHandleTypeFlagsKHX export_flags = 0;
VkExternalMemoryHandleTypeFlagsKHX compat_flags = 0;
VkExternalMemoryFeatureFlagBitsKHR flags = 0;
VkExternalMemoryHandleTypeFlagsKHR export_flags = 0;
VkExternalMemoryHandleTypeFlagsKHR compat_flags = 0;
switch (pImageFormatInfo->type) {
case VK_IMAGE_TYPE_2D:
flags = VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT_KHX|VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT_KHX|VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT_KHX;
compat_flags = export_flags = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX;
flags = VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT_KHR|VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT_KHR|VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT_KHR;
compat_flags = export_flags = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;
break;
default:
break;
}
*external_properties = (VkExternalMemoryPropertiesKHX) {
*external_properties = (VkExternalMemoryPropertiesKHR) {
.externalMemoryFeatures = flags,
.exportFromImportedHandleTypes = export_flags,
.compatibleHandleTypes = compat_flags,
@@ -1171,8 +1192,8 @@ VkResult radv_GetPhysicalDeviceImageFormatProperties2KHR(
VkImageFormatProperties2KHR *base_props)
{
RADV_FROM_HANDLE(radv_physical_device, physical_device, physicalDevice);
const VkPhysicalDeviceExternalImageFormatInfoKHX *external_info = NULL;
VkExternalImageFormatPropertiesKHX *external_props = NULL;
const VkPhysicalDeviceExternalImageFormatInfoKHR *external_info = NULL;
VkExternalImageFormatPropertiesKHR *external_props = NULL;
VkResult result;
result = radv_get_image_format_properties(physical_device, base_info,
@@ -1183,7 +1204,7 @@ VkResult radv_GetPhysicalDeviceImageFormatProperties2KHR(
/* Extract input structs */
vk_foreach_struct_const(s, base_info->pNext) {
switch (s->sType) {
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_EXTERNAL_IMAGE_FORMAT_INFO_KHX:
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_EXTERNAL_IMAGE_FORMAT_INFO_KHR:
external_info = (const void *) s;
break;
default:
@@ -1194,7 +1215,7 @@ VkResult radv_GetPhysicalDeviceImageFormatProperties2KHR(
/* Extract output structs */
vk_foreach_struct(s, base_props->pNext) {
switch (s->sType) {
case VK_STRUCTURE_TYPE_EXTERNAL_IMAGE_FORMAT_PROPERTIES_KHX:
case VK_STRUCTURE_TYPE_EXTERNAL_IMAGE_FORMAT_PROPERTIES_KHR:
external_props = (void *) s;
break;
default:
@@ -1205,12 +1226,12 @@ VkResult radv_GetPhysicalDeviceImageFormatProperties2KHR(
/* From the Vulkan 1.0.42 spec:
*
* If handleType is 0, vkGetPhysicalDeviceImageFormatProperties2KHR will
* behave as if VkPhysicalDeviceExternalImageFormatInfoKHX was not
* present and VkExternalImageFormatPropertiesKHX will be ignored.
* behave as if VkPhysicalDeviceExternalImageFormatInfoKHR was not
* present and VkExternalImageFormatPropertiesKHR will be ignored.
*/
if (external_info && external_info->handleType != 0) {
switch (external_info->handleType) {
case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX:
case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR:
get_external_image_format_properties(base_info, &external_props->externalMemoryProperties);
break;
default:
@@ -1222,7 +1243,7 @@ VkResult radv_GetPhysicalDeviceImageFormatProperties2KHR(
* VK_ERROR_FORMAT_NOT_SUPPORTED.
*/
result = vk_errorf(VK_ERROR_FORMAT_NOT_SUPPORTED,
"unsupported VkExternalMemoryTypeFlagBitsKHX 0x%x",
"unsupported VkExternalMemoryTypeFlagBitsKHR 0x%x",
external_info->handleType);
goto fail;
}
@@ -1269,25 +1290,24 @@ void radv_GetPhysicalDeviceSparseImageFormatProperties2KHR(
*pPropertyCount = 0;
}
void radv_GetPhysicalDeviceExternalBufferPropertiesKHX(
void radv_GetPhysicalDeviceExternalBufferPropertiesKHR(
VkPhysicalDevice physicalDevice,
const VkPhysicalDeviceExternalBufferInfoKHX *pExternalBufferInfo,
VkExternalBufferPropertiesKHX *pExternalBufferProperties)
const VkPhysicalDeviceExternalBufferInfoKHR *pExternalBufferInfo,
VkExternalBufferPropertiesKHR *pExternalBufferProperties)
{
VkExternalMemoryFeatureFlagBitsKHX flags = 0;
VkExternalMemoryHandleTypeFlagsKHX export_flags = 0;
VkExternalMemoryHandleTypeFlagsKHX compat_flags = 0;
VkExternalMemoryFeatureFlagBitsKHR flags = 0;
VkExternalMemoryHandleTypeFlagsKHR export_flags = 0;
VkExternalMemoryHandleTypeFlagsKHR compat_flags = 0;
switch(pExternalBufferInfo->handleType) {
case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX:
flags = VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT_KHX |
VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT_KHX |
VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT_KHX;
compat_flags = export_flags = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX;
case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR:
flags = VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT_KHR |
VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT_KHR;
compat_flags = export_flags = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR;
break;
default:
break;
}
pExternalBufferProperties->externalMemoryProperties = (VkExternalMemoryPropertiesKHX) {
pExternalBufferProperties->externalMemoryProperties = (VkExternalMemoryPropertiesKHR) {
.externalMemoryFeatures = flags,
.exportFromImportedHandleTypes = export_flags,
.compatibleHandleTypes = compat_flags,

View File

@@ -27,10 +27,12 @@
#include "radv_private.h"
#include "vk_format.h"
#include "vk_util.h"
#include "radv_radeon_winsys.h"
#include "sid.h"
#include "gfx9d.h"
#include "util/debug.h"
#include "util/u_atomic.h"
static unsigned
radv_choose_tiling(struct radv_device *Device,
const struct radv_image_create_info *create_info)
@@ -107,6 +109,7 @@ radv_init_surface(struct radv_device *device,
surface->flags |= RADEON_SURF_SBUFFER;
surface->flags |= RADEON_SURF_HAS_TILE_MODE_INDEX;
surface->flags |= RADEON_SURF_OPTIMIZE_FOR_SPACE;
if ((pCreateInfo->usage & (VK_IMAGE_USAGE_TRANSFER_SRC_BIT |
VK_IMAGE_USAGE_STORAGE_BIT)) ||
@@ -178,6 +181,11 @@ radv_make_buffer_descriptor(struct radv_device *device,
state[0] = va;
state[1] = S_008F04_BASE_ADDRESS_HI(va >> 32) |
S_008F04_STRIDE(stride);
if (device->physical_device->rad_info.chip_class < VI && stride) {
range /= stride;
}
state[2] = range;
state[3] = S_008F0C_DST_SEL_X(radv_map_swizzle(desc->swizzle[0])) |
S_008F0C_DST_SEL_Y(radv_map_swizzle(desc->swizzle[1])) |
@@ -195,7 +203,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
unsigned block_width, bool is_stencil,
uint32_t *state)
{
uint64_t gpu_address = device->ws->buffer_get_va(image->bo) + image->offset;
uint64_t gpu_address = image->bo ? device->ws->buffer_get_va(image->bo) + image->offset : 0;
uint64_t va = gpu_address;
unsigned pitch = base_level_info->nblk_x * block_width;
enum chip_class chip_class = device->physical_device->rad_info.chip_class;
@@ -209,6 +217,8 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
va += base_level_info->offset;
state[0] = va >> 8;
if (chip_class < GFX9)
state[0] |= image->surface.u.legacy.tile_swizzle;
state[1] &= C_008F14_BASE_ADDRESS_HI;
state[1] |= S_008F14_BASE_ADDRESS_HI(va >> 40);
state[3] |= S_008F1C_TILING_INDEX(si_tile_mode_index(image, base_level,
@@ -219,12 +229,13 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
state[6] &= C_008F28_COMPRESSION_EN;
state[7] = 0;
if (image->surface.dcc_size && first_level < image->surface.num_dcc_levels) {
uint64_t meta_va = gpu_address + image->dcc_offset;
meta_va = gpu_address + image->dcc_offset;
if (chip_class <= VI)
meta_va += base_level_info->dcc_offset;
state[6] |= S_008F28_COMPRESSION_EN(1);
state[7] = meta_va >> 8;
if (chip_class < GFX9)
state[7] |= image->surface.u.legacy.tile_swizzle;
}
}
@@ -325,7 +336,7 @@ static unsigned gfx9_border_color_swizzle(const unsigned char swizzle[4])
static void
si_make_texture_descriptor(struct radv_device *device,
struct radv_image *image,
bool sampler,
bool is_storage_image,
VkImageViewType view_type,
VkFormat vk_format,
const VkComponentMapping *mapping,
@@ -362,7 +373,7 @@ si_make_texture_descriptor(struct radv_device *device,
}
type = radv_tex_dim(image->type, view_type, image->info.array_size, image->info.samples,
(image->usage & VK_IMAGE_USAGE_STORAGE_BIT));
is_storage_image);
if (type == V_008F1C_SQ_RSRC_IMG_1D_ARRAY) {
height = 1;
depth = image->info.array_size;
@@ -472,6 +483,8 @@ si_make_texture_descriptor(struct radv_device *device,
}
fmask_state[0] = va >> 8;
if (device->physical_device->rad_info.chip_class < GFX9)
fmask_state[0] |= image->surface.u.legacy.tile_swizzle;
fmask_state[1] = S_008F14_BASE_ADDRESS_HI(va >> 40) |
S_008F14_DATA_FORMAT_GFX6(fmask_format) |
S_008F14_NUM_FORMAT_GFX6(num_format);
@@ -526,7 +539,7 @@ radv_query_opaque_metadata(struct radv_device *device,
md->metadata[1] = si_get_bo_metadata_word1(device);
si_make_texture_descriptor(device, image, true,
si_make_texture_descriptor(device, image, false,
(VkImageViewType)image->type, image->vk_format,
&fixedmapping, 0, image->info.levels - 1, 0,
image->info.array_size,
@@ -705,12 +718,16 @@ static void
radv_image_alloc_cmask(struct radv_device *device,
struct radv_image *image)
{
uint32_t clear_value_size = 0;
radv_image_get_cmask_info(device, image, &image->cmask);
image->cmask.offset = align64(image->size, image->cmask.alignment);
/* + 8 for storing the clear values */
image->clear_value_offset = image->cmask.offset + image->cmask.size;
image->size = image->cmask.offset + image->cmask.size + 8;
if (!image->clear_value_offset) {
image->clear_value_offset = image->cmask.offset + image->cmask.size;
clear_value_size = 8;
}
image->size = image->cmask.offset + image->cmask.size + clear_value_size;
image->alignment = MAX2(image->alignment, image->cmask.alignment);
}
@@ -719,9 +736,10 @@ radv_image_alloc_dcc(struct radv_device *device,
struct radv_image *image)
{
image->dcc_offset = align64(image->size, image->surface.dcc_alignment);
/* + 8 for storing the clear values */
/* + 16 for storing the clear values + dcc pred */
image->clear_value_offset = image->dcc_offset + image->surface.dcc_size;
image->size = image->dcc_offset + image->surface.dcc_size + 8;
image->dcc_pred_offset = image->clear_value_offset + 8;
image->size = image->dcc_offset + image->surface.dcc_size + 16;
image->alignment = MAX2(image->alignment, image->surface.dcc_alignment);
}
@@ -783,12 +801,18 @@ radv_image_create(VkDevice _device,
image->exclusive = pCreateInfo->sharingMode == VK_SHARING_MODE_EXCLUSIVE;
if (pCreateInfo->sharingMode == VK_SHARING_MODE_CONCURRENT) {
for (uint32_t i = 0; i < pCreateInfo->queueFamilyIndexCount; ++i)
if (pCreateInfo->pQueueFamilyIndices[i] == VK_QUEUE_FAMILY_EXTERNAL_KHX)
if (pCreateInfo->pQueueFamilyIndices[i] == VK_QUEUE_FAMILY_EXTERNAL_KHR)
image->queue_family_mask |= (1u << RADV_MAX_QUEUE_FAMILIES) - 1u;
else
image->queue_family_mask |= 1u << pCreateInfo->pQueueFamilyIndices[i];
}
image->shareable = vk_find_struct_const(pCreateInfo->pNext,
EXTERNAL_MEMORY_IMAGE_CREATE_INFO_KHR) != NULL;
if (!vk_format_is_depth(pCreateInfo->format) && !create_info->scanout && !image->shareable) {
image->info.surf_index = p_atomic_inc_return(&device->image_mrt_offset_counter) - 1;
}
radv_init_surface(device, &image->surface, create_info);
device->ws->surface_init(device->ws, &image->info, &image->surface);
@@ -834,6 +858,50 @@ radv_image_create(VkDevice _device,
return VK_SUCCESS;
}
static void
radv_image_view_make_descriptor(struct radv_image_view *iview,
struct radv_device *device,
const VkImageViewCreateInfo* pCreateInfo,
bool is_storage_image)
{
RADV_FROM_HANDLE(radv_image, image, pCreateInfo->image);
const VkImageSubresourceRange *range = &pCreateInfo->subresourceRange;
bool is_stencil = iview->aspect_mask == VK_IMAGE_ASPECT_STENCIL_BIT;
uint32_t blk_w;
uint32_t *descriptor;
uint32_t *fmask_descriptor;
if (is_storage_image) {
descriptor = iview->storage_descriptor;
fmask_descriptor = iview->storage_fmask_descriptor;
} else {
descriptor = iview->descriptor;
fmask_descriptor = iview->fmask_descriptor;
}
assert(image->surface.blk_w % vk_format_get_blockwidth(image->vk_format) == 0);
blk_w = image->surface.blk_w / vk_format_get_blockwidth(image->vk_format) * vk_format_get_blockwidth(iview->vk_format);
si_make_texture_descriptor(device, image, is_storage_image,
iview->type,
iview->vk_format,
&pCreateInfo->components,
0, radv_get_levelCount(image, range) - 1,
range->baseArrayLayer,
range->baseArrayLayer + radv_get_layerCount(image, range) - 1,
iview->extent.width,
iview->extent.height,
iview->extent.depth,
descriptor,
fmask_descriptor);
si_set_mutable_tex_desc_fields(device, image,
is_stencil ? &image->surface.u.legacy.stencil_level[range->baseMipLevel]
: &image->surface.u.legacy.level[range->baseMipLevel],
range->baseMipLevel,
range->baseMipLevel,
blk_w, is_stencil, descriptor);
}
void
radv_image_view_init(struct radv_image_view *iview,
struct radv_device *device,
@@ -841,8 +909,7 @@ radv_image_view_init(struct radv_image_view *iview,
{
RADV_FROM_HANDLE(radv_image, image, pCreateInfo->image);
const VkImageSubresourceRange *range = &pCreateInfo->subresourceRange;
uint32_t blk_w;
bool is_stencil = false;
switch (image->type) {
case VK_IMAGE_TYPE_1D:
case VK_IMAGE_TYPE_2D:
@@ -862,7 +929,6 @@ radv_image_view_init(struct radv_image_view *iview,
iview->aspect_mask = pCreateInfo->subresourceRange.aspectMask;
if (iview->aspect_mask == VK_IMAGE_ASPECT_STENCIL_BIT) {
is_stencil = true;
iview->vk_format = vk_format_stencil_only(iview->vk_format);
} else if (iview->aspect_mask == VK_IMAGE_ASPECT_DEPTH_BIT) {
iview->vk_format = vk_format_depth_only(iview->vk_format);
@@ -879,30 +945,12 @@ radv_image_view_init(struct radv_image_view *iview,
iview->extent.height = round_up_u32(iview->extent.height * vk_format_get_blockheight(iview->vk_format),
vk_format_get_blockheight(image->vk_format));
assert(image->surface.blk_w % vk_format_get_blockwidth(image->vk_format) == 0);
blk_w = image->surface.blk_w / vk_format_get_blockwidth(image->vk_format) * vk_format_get_blockwidth(iview->vk_format);
iview->base_layer = range->baseArrayLayer;
iview->layer_count = radv_get_layerCount(image, range);
iview->base_mip = range->baseMipLevel;
si_make_texture_descriptor(device, image, false,
iview->type,
iview->vk_format,
&pCreateInfo->components,
0, radv_get_levelCount(image, range) - 1,
range->baseArrayLayer,
range->baseArrayLayer + radv_get_layerCount(image, range) - 1,
iview->extent.width,
iview->extent.height,
iview->extent.depth,
iview->descriptor,
iview->fmask_descriptor);
si_set_mutable_tex_desc_fields(device, image,
is_stencil ? &image->surface.u.legacy.stencil_level[range->baseMipLevel]
: &image->surface.u.legacy.level[range->baseMipLevel],
range->baseMipLevel,
range->baseMipLevel,
blk_w, is_stencil, iview->descriptor);
radv_image_view_make_descriptor(iview, device, pCreateInfo, false);
radv_image_view_make_descriptor(iview, device, pCreateInfo, true);
}
bool radv_layout_has_htile(const struct radv_image *image,
@@ -938,7 +986,7 @@ unsigned radv_image_queue_family_mask(const struct radv_image *image, uint32_t f
{
if (!image->exclusive)
return image->queue_family_mask;
if (family == VK_QUEUE_FAMILY_EXTERNAL_KHX)
if (family == VK_QUEUE_FAMILY_EXTERNAL_KHR)
return (1u << RADV_MAX_QUEUE_FAMILIES) - 1u;
if (family == VK_QUEUE_FAMILY_IGNORED)
return 1u << queue_family;

View File

@@ -695,6 +695,8 @@ static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM,
VK_FORMAT_R8G8B8A8_UINT,
VK_FORMAT_R8G8B8A8_SINT,
VK_FORMAT_A2R10G10B10_UINT_PACK32,
VK_FORMAT_A2R10G10B10_SINT_PACK32,
VK_FORMAT_R16G16B16A16_UNORM,
VK_FORMAT_R16G16B16A16_SNORM,
VK_FORMAT_R16G16B16A16_UINT,

View File

@@ -1134,6 +1134,8 @@ static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM,
VK_FORMAT_R8G8B8A8_UINT,
VK_FORMAT_R8G8B8A8_SINT,
VK_FORMAT_A2R10G10B10_UINT_PACK32,
VK_FORMAT_A2R10G10B10_SINT_PACK32,
VK_FORMAT_R16G16B16A16_UNORM,
VK_FORMAT_R16G16B16A16_SNORM,
VK_FORMAT_R16G16B16A16_UINT,

View File

@@ -81,8 +81,10 @@ build_color_shaders(struct nir_shader **out_vs,
vs_out_layer->data.location = VARYING_SLOT_LAYER;
vs_out_layer->data.interpolation = INTERP_MODE_FLAT;
nir_ssa_def *inst_id = nir_load_system_value(&vs_b, nir_intrinsic_load_instance_id, 0);
nir_ssa_def *base_instance = nir_load_system_value(&vs_b, nir_intrinsic_load_base_instance, 0);
nir_store_var(&vs_b, vs_out_layer, inst_id, 0x1);
nir_ssa_def *layer_id = nir_iadd(&vs_b, inst_id, base_instance);
nir_store_var(&vs_b, vs_out_layer, layer_id, 0x1);
*out_vs = vs_b.shader;
*out_fs = fs_b.shader;
@@ -398,7 +400,7 @@ emit_color_clear(struct radv_cmd_buffer *cmd_buffer,
radv_CmdSetScissor(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &clear_rect->rect);
radv_CmdDraw(cmd_buffer_h, 3, clear_rect->layerCount, 0, 0);
radv_CmdDraw(cmd_buffer_h, 3, clear_rect->layerCount, 0, clear_rect->baseArrayLayer);
radv_cmd_buffer_set_subpass(cmd_buffer, subpass, false);
}
@@ -439,7 +441,10 @@ build_depthstencil_shader(struct nir_shader **out_vs, struct nir_shader **out_fs
vs_out_layer->data.location = VARYING_SLOT_LAYER;
vs_out_layer->data.interpolation = INTERP_MODE_FLAT;
nir_ssa_def *inst_id = nir_load_system_value(&vs_b, nir_intrinsic_load_instance_id, 0);
nir_store_var(&vs_b, vs_out_layer, inst_id, 0x1);
nir_ssa_def *base_instance = nir_load_system_value(&vs_b, nir_intrinsic_load_base_instance, 0);
nir_ssa_def *layer_id = nir_iadd(&vs_b, inst_id, base_instance);
nir_store_var(&vs_b, vs_out_layer, layer_id, 0x1);
*out_vs = vs_b.shader;
*out_fs = fs_b.shader;
@@ -654,7 +659,7 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
radv_CmdSetScissor(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1, &clear_rect->rect);
radv_CmdDraw(cmd_buffer_h, 3, clear_rect->layerCount, 0, 0);
radv_CmdDraw(cmd_buffer_h, 3, clear_rect->layerCount, 0, clear_rect->baseArrayLayer);
}
static bool
@@ -749,6 +754,8 @@ static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM,
VK_FORMAT_R8G8B8A8_UINT,
VK_FORMAT_R8G8B8A8_SINT,
VK_FORMAT_A2R10G10B10_UINT_PACK32,
VK_FORMAT_A2R10G10B10_SINT_PACK32,
VK_FORMAT_R16G16B16A16_UNORM,
VK_FORMAT_R16G16B16A16_SNORM,
VK_FORMAT_R16G16B16A16_UINT,
@@ -856,6 +863,83 @@ fail:
return res;
}
static void vi_get_fast_clear_parameters(VkFormat format,
const VkClearColorValue *clear_value,
uint32_t* reset_value,
bool *can_avoid_fast_clear_elim)
{
bool values[4] = {};
int extra_channel;
bool main_value = false;
bool extra_value = false;
int i;
*can_avoid_fast_clear_elim = false;
*reset_value = 0x20202020U;
const struct vk_format_description *desc = vk_format_description(format);
if (format == VK_FORMAT_B10G11R11_UFLOAT_PACK32 ||
format == VK_FORMAT_R5G6B5_UNORM_PACK16 ||
format == VK_FORMAT_B5G6R5_UNORM_PACK16)
extra_channel = -1;
else if (desc->layout == VK_FORMAT_LAYOUT_PLAIN) {
if (radv_translate_colorswap(format, false) <= 1)
extra_channel = desc->nr_channels - 1;
else
extra_channel = 0;
} else
return;
for (i = 0; i < 4; i++) {
int index = desc->swizzle[i] - VK_SWIZZLE_X;
if (desc->swizzle[i] < VK_SWIZZLE_X ||
desc->swizzle[i] > VK_SWIZZLE_W)
continue;
if (desc->channel[i].pure_integer &&
desc->channel[i].type == VK_FORMAT_TYPE_SIGNED) {
/* Use the maximum value for clamping the clear color. */
int max = u_bit_consecutive(0, desc->channel[i].size - 1);
values[i] = clear_value->int32[i] != 0;
if (clear_value->int32[i] != 0 && MIN2(clear_value->int32[i], max) != max)
return;
} else if (desc->channel[i].pure_integer &&
desc->channel[i].type == VK_FORMAT_TYPE_UNSIGNED) {
/* Use the maximum value for clamping the clear color. */
unsigned max = u_bit_consecutive(0, desc->channel[i].size);
values[i] = clear_value->uint32[i] != 0U;
if (clear_value->uint32[i] != 0U && MIN2(clear_value->uint32[i], max) != max)
return;
} else {
values[i] = clear_value->float32[i] != 0.0F;
if (clear_value->float32[i] != 0.0F && clear_value->float32[i] != 1.0F)
return;
}
if (index == extra_channel)
extra_value = values[i];
else
main_value = values[i];
}
for (int i = 0; i < 4; ++i)
if (values[i] != main_value &&
desc->swizzle[i] - VK_SWIZZLE_X != extra_channel &&
desc->swizzle[i] >= VK_SWIZZLE_X &&
desc->swizzle[i] <= VK_SWIZZLE_W)
return;
*can_avoid_fast_clear_elim = true;
if (main_value)
*reset_value |= 0x80808080U;
if (extra_value)
*reset_value |= 0x40404040U;
return;
}
static bool
emit_fast_color_clear(struct radv_cmd_buffer *cmd_buffer,
const VkClearAttachment *clear_att,
@@ -881,8 +965,6 @@ emit_fast_color_clear(struct radv_cmd_buffer *cmd_buffer,
if (!radv_layout_can_fast_clear(iview->image, image_layout, radv_image_queue_family_mask(iview->image, cmd_buffer->queue_family_index, cmd_buffer->queue_family_index)))
goto fail;
if (vk_format_get_blocksizebits(iview->image->vk_format) > 64)
goto fail;
/* don't fast clear 3D */
if (iview->image->type == VK_IMAGE_TYPE_3D)
@@ -932,10 +1014,23 @@ emit_fast_color_clear(struct radv_cmd_buffer *cmd_buffer,
RADV_CMD_FLAG_FLUSH_AND_INV_CB_META;
/* clear cmask buffer */
if (iview->image->surface.dcc_size) {
uint32_t reset_value;
bool can_avoid_fast_clear_elim;
vi_get_fast_clear_parameters(iview->image->vk_format,
&clear_value, &reset_value,
&can_avoid_fast_clear_elim);
radv_fill_buffer(cmd_buffer, iview->image->bo,
iview->image->offset + iview->image->dcc_offset,
iview->image->surface.dcc_size, 0x20202020);
iview->image->surface.dcc_size, reset_value);
radv_set_dcc_need_cmask_elim_pred(cmd_buffer, iview->image,
!can_avoid_fast_clear_elim);
} else {
if (iview->image->surface.bpe > 8) {
/* 128 bit formats not supported */
return false;
}
radv_fill_buffer(cmd_buffer, iview->image->bo,
iview->image->offset + iview->image->cmask.offset,
iview->image->cmask.size, 0);
@@ -992,7 +1087,8 @@ subpass_needs_clear(const struct radv_cmd_buffer *cmd_buffer)
ds = cmd_state->subpass->depth_stencil_attachment.attachment;
for (uint32_t i = 0; i < cmd_state->subpass->color_count; ++i) {
uint32_t a = cmd_state->subpass->color_attachments[i].attachment;
if (cmd_state->attachments[a].pending_clear_aspects) {
if (a != VK_ATTACHMENT_UNUSED &&
cmd_state->attachments[a].pending_clear_aspects) {
return true;
}
}
@@ -1032,7 +1128,8 @@ radv_cmd_buffer_clear_subpass(struct radv_cmd_buffer *cmd_buffer)
for (uint32_t i = 0; i < cmd_state->subpass->color_count; ++i) {
uint32_t a = cmd_state->subpass->color_attachments[i].attachment;
if (!cmd_state->attachments[a].pending_clear_aspects)
if (a == VK_ATTACHMENT_UNUSED ||
!cmd_state->attachments[a].pending_clear_aspects)
continue;
assert(cmd_state->attachments[a].pending_clear_aspects ==

View File

@@ -334,6 +334,20 @@ emit_fast_clear_flush(struct radv_cmd_buffer *cmd_buffer,
RADV_CMD_FLAG_FLUSH_AND_INV_CB_META);
}
static void
radv_emit_set_predication_state_from_image(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image, bool value)
{
uint64_t va = 0;
if (value) {
va = cmd_buffer->device->ws->buffer_get_va(image->bo) + image->offset;
va += image->dcc_pred_offset;
}
si_emit_set_predication_state(cmd_buffer, va);
}
/**
*/
void
@@ -351,6 +365,10 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
radv_meta_save_pass(&saved_pass_state, cmd_buffer);
radv_meta_save_graphics_reset_vport_scissor_novertex(&saved_state, cmd_buffer);
if (image->surface.dcc_size) {
radv_emit_set_predication_state_from_image(cmd_buffer, image, true);
cmd_buffer->state.predicating = true;
}
for (uint32_t layer = 0; layer < layer_count; ++layer) {
struct radv_image_view iview;
@@ -413,6 +431,10 @@ radv_fast_clear_flush_image_inplace(struct radv_cmd_buffer *cmd_buffer,
&cmd_buffer->pool->alloc);
}
if (image->surface.dcc_size) {
cmd_buffer->state.predicating = false;
radv_emit_set_predication_state_from_image(cmd_buffer, image, false);
}
radv_meta_restore(&saved_state, cmd_buffer);
radv_meta_restore_pass(&saved_pass_state, cmd_buffer);
}

View File

@@ -560,6 +560,11 @@ radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer)
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
if (src_att.attachment == VK_ATTACHMENT_UNUSED ||
dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
struct radv_image *src_img = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment->image;
@@ -582,10 +587,13 @@ radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer)
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
if (src_att.attachment == VK_ATTACHMENT_UNUSED ||
dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
if (dst_img->surface.dcc_size) {
radv_initialize_dcc(cmd_buffer, dst_img, 0xffffffff);
cmd_buffer->state.attachments[dest_att.attachment].current_layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

View File

@@ -447,11 +447,14 @@ radv_cmd_buffer_resolve_subpass_cs(struct radv_cmd_buffer *cmd_buffer)
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
if (src_att.attachment == VK_ATTACHMENT_UNUSED ||
dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
struct radv_image_view *src_iview = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment;
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
if (dst_img->surface.dcc_size) {
radv_initialize_dcc(cmd_buffer, dst_img, 0xffffffff);
cmd_buffer->state.attachments[dest_att.attachment].current_layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

View File

@@ -160,6 +160,8 @@ static VkFormat pipeline_formats[] = {
VK_FORMAT_R8G8B8A8_UNORM,
VK_FORMAT_R8G8B8A8_UINT,
VK_FORMAT_R8G8B8A8_SINT,
VK_FORMAT_A2R10G10B10_UINT_PACK32,
VK_FORMAT_A2R10G10B10_SINT_PACK32,
VK_FORMAT_R16G16B16A16_UNORM,
VK_FORMAT_R16G16B16A16_SNORM,
VK_FORMAT_R16G16B16A16_UINT,
@@ -618,11 +620,14 @@ radv_cmd_buffer_resolve_subpass_fs(struct radv_cmd_buffer *cmd_buffer)
for (uint32_t i = 0; i < subpass->color_count; ++i) {
VkAttachmentReference src_att = subpass->color_attachments[i];
VkAttachmentReference dest_att = subpass->resolve_attachments[i];
if (src_att.attachment == VK_ATTACHMENT_UNUSED ||
dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_image_view *dest_iview = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment;
struct radv_image *dst_img = dest_iview->image;
struct radv_image_view *src_iview = cmd_buffer->state.framebuffer->attachments[src_att.attachment].attachment;
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
if (dst_img->surface.dcc_size) {
radv_initialize_dcc(cmd_buffer, dst_img, 0xffffffff);

View File

@@ -230,6 +230,7 @@ radv_shader_compile_to_nir(struct radv_device *device,
.image_write_without_format = true,
.tessellation = true,
.int64 = true,
.variable_pointers = true,
};
entry_point = spirv_to_nir(spirv, module->size / 4,
spec_entries, num_spec_entries,
@@ -1034,14 +1035,17 @@ radv_pipeline_compute_spi_color_formats(struct radv_pipeline *pipeline,
unsigned col_format = 0;
for (unsigned i = 0; i < (single_cb_enable ? 1 : subpass->color_count); ++i) {
struct radv_render_pass_attachment *attachment;
unsigned cf;
attachment = pass->attachments + subpass->color_attachments[i].attachment;
if (subpass->color_attachments[i].attachment == VK_ATTACHMENT_UNUSED) {
cf = V_028714_SPI_SHADER_ZERO;
} else {
struct radv_render_pass_attachment *attachment = pass->attachments + subpass->color_attachments[i].attachment;
cf = si_choose_spi_color_format(attachment->format,
blend_enable & (1 << i),
blend_need_alpha & (1 << i));
cf = si_choose_spi_color_format(attachment->format,
blend_enable & (1 << i),
blend_need_alpha & (1 << i));
}
col_format |= cf << (4 * i);
}
@@ -1063,31 +1067,51 @@ format_is_int8(VkFormat format)
desc->channel[channel].size == 8;
}
static bool
format_is_int10(VkFormat format)
{
const struct vk_format_description *desc = vk_format_description(format);
if (desc->nr_channels != 4)
return false;
for (unsigned i = 0; i < 4; i++) {
if (desc->channel[i].pure_integer && desc->channel[i].size == 10)
return true;
}
return false;
}
unsigned radv_format_meta_fs_key(VkFormat format)
{
unsigned col_format = si_choose_spi_color_format(format, false, false) - 1;
bool is_int8 = format_is_int8(format);
bool is_int10 = format_is_int10(format);
return col_format + (is_int8 ? 3 : 0);
return col_format + (is_int8 ? 3 : is_int10 ? 5 : 0);
}
static unsigned
radv_pipeline_compute_is_int8(const VkGraphicsPipelineCreateInfo *pCreateInfo)
static void
radv_pipeline_compute_get_int_clamp(const VkGraphicsPipelineCreateInfo *pCreateInfo,
unsigned *is_int8, unsigned *is_int10)
{
RADV_FROM_HANDLE(radv_render_pass, pass, pCreateInfo->renderPass);
struct radv_subpass *subpass = pass->subpasses + pCreateInfo->subpass;
unsigned is_int8 = 0;
*is_int8 = 0;
*is_int10 = 0;
for (unsigned i = 0; i < subpass->color_count; ++i) {
struct radv_render_pass_attachment *attachment;
if (subpass->color_attachments[i].attachment == VK_ATTACHMENT_UNUSED)
continue;
attachment = pass->attachments + subpass->color_attachments[i].attachment;
if (format_is_int8(attachment->format))
is_int8 |= 1 << i;
*is_int8 |= 1 << i;
if (format_is_int10(attachment->format))
*is_int10 |= 1 << i;
}
return is_int8;
}
static void
@@ -1342,7 +1366,9 @@ radv_pipeline_init_multisample_state(struct radv_pipeline *pipeline,
else
ms->num_samples = 1;
if (pipeline->shaders[MESA_SHADER_FRAGMENT]->info.fs.force_persample) {
if (vkms && vkms->sampleShadingEnable) {
ps_iter_samples = ceil(vkms->minSampleShading * ms->num_samples);
} else if (pipeline->shaders[MESA_SHADER_FRAGMENT]->info.fs.force_persample) {
ps_iter_samples = ms->num_samples;
}
@@ -2044,9 +2070,11 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
}
if (modules[MESA_SHADER_FRAGMENT]) {
union ac_shader_variant_key key;
union ac_shader_variant_key key = {0};
key.fs.col_format = pipeline->graphics.blend.spi_shader_col_format;
key.fs.is_int8 = radv_pipeline_compute_is_int8(pCreateInfo);
if (pipeline->device->physical_device->rad_info.chip_class < VI)
radv_pipeline_compute_get_int_clamp(pCreateInfo, &key.fs.is_int8, &key.fs.is_int10);
const VkPipelineShaderStageCreateInfo *stage = pStages[MESA_SHADER_FRAGMENT];

View File

@@ -84,7 +84,7 @@ typedef uint32_t xcb_window_t;
#define MAX_PUSH_DESCRIPTORS 32
#define MAX_DYNAMIC_BUFFERS 16
#define MAX_SAMPLES_LOG2 4
#define NUM_META_FS_KEYS 11
#define NUM_META_FS_KEYS 13
#define RADV_MAX_DRM_DEVICES 8
#define NUM_DEPTH_CLEAR_PIPELINES 3
@@ -547,6 +547,8 @@ struct radv_device {
/* Backup in-memory cache to be used if the app doesn't provide one */
struct radv_pipeline_cache * mem_cache;
uint32_t image_mrt_offset_counter;
};
struct radv_device_memory {
@@ -869,7 +871,7 @@ void si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
bool is_mec,
enum radv_cmd_flush_bits flush_bits);
void si_emit_cache_flush(struct radv_cmd_buffer *cmd_buffer);
void si_emit_set_pred(struct radv_cmd_buffer *cmd_buffer, uint64_t va);
void si_emit_set_predication_state(struct radv_cmd_buffer *cmd_buffer, uint64_t va);
void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
uint64_t src_va, uint64_t dest_va,
uint64_t size);
@@ -912,6 +914,9 @@ void radv_set_color_clear_regs(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
int idx,
uint32_t color_values[2]);
void radv_set_dcc_need_cmask_elim_pred(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image,
bool value);
void radv_fill_buffer(struct radv_cmd_buffer *cmd_buffer,
struct radeon_winsys_bo *bo,
uint64_t offset, uint64_t size, uint32_t value);
@@ -1205,6 +1210,8 @@ struct radv_image {
bool exclusive;
unsigned queue_family_mask;
bool shareable;
/* Set when bound */
struct radeon_winsys_bo *bo;
VkDeviceSize offset;
@@ -1215,6 +1222,7 @@ struct radv_image {
struct radv_fmask_info fmask;
struct radv_cmask_info cmask;
uint32_t clear_value_offset;
uint32_t dcc_pred_offset;
};
/* Whether the image has a htile that is known consistent with the contents of
@@ -1276,6 +1284,12 @@ struct radv_image_view {
uint32_t descriptor[8];
uint32_t fmask_descriptor[8];
/* Descriptor for use as a storage image as opposed to a sampled image.
* This has a few differences for cube maps (e.g. type).
*/
uint32_t storage_descriptor[8];
uint32_t storage_fmask_descriptor[8];
};
struct radv_image_create_info {
@@ -1456,6 +1470,20 @@ struct radv_query_pool {
uint32_t pipeline_stats_mask;
};
struct radv_semaphore {
/* use a winsys sem for non-exportable */
struct radeon_winsys_sem *sem;
uint32_t syncobj;
uint32_t temp_syncobj;
};
VkResult radv_alloc_sem_info(struct radv_winsys_sem_info *sem_info,
int num_wait_sems,
const VkSemaphore *wait_sems,
int num_signal_sems,
const VkSemaphore *signal_sems);
void radv_free_sem_info(struct radv_winsys_sem_info *sem_info);
void
radv_update_descriptor_sets(struct radv_device *device,
struct radv_cmd_buffer *cmd_buffer,
@@ -1549,6 +1577,6 @@ RADV_DEFINE_NONDISP_HANDLE_CASTS(radv_query_pool, VkQueryPool)
RADV_DEFINE_NONDISP_HANDLE_CASTS(radv_render_pass, VkRenderPass)
RADV_DEFINE_NONDISP_HANDLE_CASTS(radv_sampler, VkSampler)
RADV_DEFINE_NONDISP_HANDLE_CASTS(radv_shader_module, VkShaderModule)
RADV_DEFINE_NONDISP_HANDLE_CASTS(radeon_winsys_sem, VkSemaphore)
RADV_DEFINE_NONDISP_HANDLE_CASTS(radv_semaphore, VkSemaphore)
#endif /* RADV_PRIVATE_H */

View File

@@ -131,9 +131,23 @@ struct radeon_bo_metadata {
uint32_t metadata[64];
};
uint32_t syncobj_handle;
struct radeon_winsys_bo;
struct radeon_winsys_fence;
struct radeon_winsys_sem;
struct radv_winsys_sem_counts {
uint32_t syncobj_count;
uint32_t sem_count;
uint32_t *syncobj;
struct radeon_winsys_sem **sem;
};
struct radv_winsys_sem_info {
bool cs_emit_signal;
bool cs_emit_wait;
struct radv_winsys_sem_counts wait;
struct radv_winsys_sem_counts signal;
};
struct radeon_winsys {
void (*destroy)(struct radeon_winsys *ws);
@@ -191,10 +205,7 @@ struct radeon_winsys {
unsigned cs_count,
struct radeon_winsys_cs *initial_preamble_cs,
struct radeon_winsys_cs *continue_preamble_cs,
struct radeon_winsys_sem **wait_sem,
unsigned wait_sem_count,
struct radeon_winsys_sem **signal_sem,
unsigned signal_sem_count,
struct radv_winsys_sem_info *sem_info,
bool can_patch,
struct radeon_winsys_fence *fence);
@@ -221,9 +232,17 @@ struct radeon_winsys {
bool absolute,
uint64_t timeout);
/* old semaphores - non shareable */
struct radeon_winsys_sem *(*create_sem)(struct radeon_winsys *ws);
void (*destroy_sem)(struct radeon_winsys_sem *sem);
/* new shareable sync objects */
int (*create_syncobj)(struct radeon_winsys *ws, uint32_t *handle);
void (*destroy_syncobj)(struct radeon_winsys *ws, uint32_t handle);
int (*export_syncobj)(struct radeon_winsys *ws, uint32_t syncobj, int *fd);
int (*import_syncobj)(struct radeon_winsys *ws, int fd, uint32_t *syncobj);
};
static inline void radeon_emit(struct radeon_winsys_cs *cs, uint32_t value)

View File

@@ -185,8 +185,8 @@ radv_wsi_image_create(VkDevice device_h,
VkDeviceMemory memory_h;
const VkDedicatedAllocationMemoryAllocateInfoNV ded_alloc = {
.sType = VK_STRUCTURE_TYPE_DEDICATED_ALLOCATION_MEMORY_ALLOCATE_INFO_NV,
const VkMemoryDedicatedAllocateInfoKHR ded_alloc = {
.sType = VK_STRUCTURE_TYPE_MEMORY_DEDICATED_ALLOCATE_INFO_KHR,
.pNext = NULL,
.buffer = VK_NULL_HANDLE,
.image = image_h
@@ -442,7 +442,6 @@ VkResult radv_AcquireNextImageKHR(
fence->submitted = true;
fence->signalled = true;
}
return result;
}
@@ -452,7 +451,6 @@ VkResult radv_QueuePresentKHR(
{
RADV_FROM_HANDLE(radv_queue, queue, _queue);
VkResult result = VK_SUCCESS;
const VkPresentRegionsKHR *regions =
vk_find_struct_const(pPresentInfo->pNext, PRESENT_REGIONS_KHR);
@@ -461,6 +459,20 @@ VkResult radv_QueuePresentKHR(
struct radeon_winsys_cs *cs;
const VkPresentRegionKHR *region = NULL;
VkResult item_result;
struct radv_winsys_sem_info sem_info;
item_result = radv_alloc_sem_info(&sem_info,
pPresentInfo->waitSemaphoreCount,
pPresentInfo->pWaitSemaphores,
0,
NULL);
if (pPresentInfo->pResults != NULL)
pPresentInfo->pResults[i] = item_result;
result = result == VK_SUCCESS ? item_result : result;
if (item_result != VK_SUCCESS) {
radv_free_sem_info(&sem_info);
continue;
}
assert(radv_device_from_handle(swapchain->device) == queue->device);
if (swapchain->fences[0] == VK_NULL_HANDLE) {
@@ -472,8 +484,10 @@ VkResult radv_QueuePresentKHR(
if (pPresentInfo->pResults != NULL)
pPresentInfo->pResults[i] = item_result;
result = result == VK_SUCCESS ? item_result : result;
if (item_result != VK_SUCCESS)
if (item_result != VK_SUCCESS) {
radv_free_sem_info(&sem_info);
continue;
}
} else {
radv_ResetFences(radv_device_to_handle(queue->device),
1, &swapchain->fences[0]);
@@ -487,11 +501,12 @@ VkResult radv_QueuePresentKHR(
RADV_FROM_HANDLE(radv_fence, fence, swapchain->fences[0]);
struct radeon_winsys_fence *base_fence = fence->fence;
struct radeon_winsys_ctx *ctx = queue->hw_ctx;
queue->device->ws->cs_submit(ctx, queue->queue_idx,
&cs,
1, NULL, NULL,
(struct radeon_winsys_sem **)pPresentInfo->pWaitSemaphores,
pPresentInfo->waitSemaphoreCount, NULL, 0, false, base_fence);
&sem_info,
false, base_fence);
fence->submitted = true;
if (regions && regions->pRegions)
@@ -504,8 +519,10 @@ VkResult radv_QueuePresentKHR(
if (pPresentInfo->pResults != NULL)
pPresentInfo->pResults[i] = item_result;
result = result == VK_SUCCESS ? item_result : result;
if (item_result != VK_SUCCESS)
if (item_result != VK_SUCCESS) {
radv_free_sem_info(&sem_info);
continue;
}
VkFence last = swapchain->fences[2];
swapchain->fences[2] = swapchain->fences[1];
@@ -517,6 +534,7 @@ VkResult radv_QueuePresentKHR(
1, &last, true, 1);
}
radv_free_sem_info(&sem_info);
}
return VK_SUCCESS;

View File

@@ -1129,8 +1129,9 @@ si_emit_cache_flush(struct radv_cmd_buffer *cmd_buffer)
cmd_buffer->state.flush_bits = 0;
}
/* sets the CP predication state using a boolean stored at va */
void
si_emit_set_pred(struct radv_cmd_buffer *cmd_buffer, uint64_t va)
si_emit_set_predication_state(struct radv_cmd_buffer *cmd_buffer, uint64_t va)
{
uint32_t val = 0;

View File

@@ -89,6 +89,14 @@ static int ring_to_hw_ip(enum ring_type ring)
}
}
static int radv_amdgpu_signal_sems(struct radv_amdgpu_ctx *ctx,
uint32_t ip_type,
uint32_t ring,
struct radv_winsys_sem_info *sem_info);
static int radv_amdgpu_cs_submit(struct radv_amdgpu_ctx *ctx,
struct amdgpu_cs_request *request,
struct radv_winsys_sem_info *sem_info);
static void radv_amdgpu_request_to_fence(struct radv_amdgpu_ctx *ctx,
struct radv_amdgpu_fence *fence,
struct amdgpu_cs_request *req)
@@ -647,6 +655,7 @@ static void radv_assign_last_submit(struct radv_amdgpu_ctx *ctx,
static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
int queue_idx,
struct radv_winsys_sem_info *sem_info,
struct radeon_winsys_cs **cs_array,
unsigned cs_count,
struct radeon_winsys_cs *initial_preamble_cs,
@@ -703,7 +712,7 @@ static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
ibs[0] = ((struct radv_amdgpu_cs*)initial_preamble_cs)->ib;
}
r = amdgpu_cs_submit(ctx->ctx, 0, &request, 1);
r = radv_amdgpu_cs_submit(ctx, &request, sem_info);
if (r) {
if (r == -ENOMEM)
fprintf(stderr, "amdgpu: Not enough memory for command submission.\n");
@@ -724,6 +733,7 @@ static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
int queue_idx,
struct radv_winsys_sem_info *sem_info,
struct radeon_winsys_cs **cs_array,
unsigned cs_count,
struct radeon_winsys_cs *initial_preamble_cs,
@@ -735,7 +745,7 @@ static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
struct radv_amdgpu_fence *fence = (struct radv_amdgpu_fence *)_fence;
amdgpu_bo_list_handle bo_list;
struct amdgpu_cs_request request;
bool emit_signal_sem = sem_info->cs_emit_signal;
assert(cs_count);
for (unsigned i = 0; i < cs_count;) {
@@ -775,7 +785,8 @@ static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
}
}
r = amdgpu_cs_submit(ctx->ctx, 0, &request, 1);
sem_info->cs_emit_signal = (i == cs_count - cnt) ? emit_signal_sem : false;
r = radv_amdgpu_cs_submit(ctx, &request, sem_info);
if (r) {
if (r == -ENOMEM)
fprintf(stderr, "amdgpu: Not enough memory for command submission.\n");
@@ -801,6 +812,7 @@ static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
static int radv_amdgpu_winsys_cs_submit_sysmem(struct radeon_winsys_ctx *_ctx,
int queue_idx,
struct radv_winsys_sem_info *sem_info,
struct radeon_winsys_cs **cs_array,
unsigned cs_count,
struct radeon_winsys_cs *initial_preamble_cs,
@@ -815,6 +827,7 @@ static int radv_amdgpu_winsys_cs_submit_sysmem(struct radeon_winsys_ctx *_ctx,
amdgpu_bo_list_handle bo_list;
struct amdgpu_cs_request request;
uint32_t pad_word = 0xffff1000U;
bool emit_signal_sem = sem_info->cs_emit_signal;
if (radv_amdgpu_winsys(ws)->info.chip_class == SI)
pad_word = 0x80000000;
@@ -880,7 +893,8 @@ static int radv_amdgpu_winsys_cs_submit_sysmem(struct radeon_winsys_ctx *_ctx,
request.ibs = &ib;
request.fence_info = radv_set_cs_fence(ctx, cs0->hw_ip, queue_idx);
r = amdgpu_cs_submit(ctx->ctx, 0, &request, 1);
sem_info->cs_emit_signal = (i == cs_count - cnt) ? emit_signal_sem : false;
r = radv_amdgpu_cs_submit(ctx, &request, sem_info);
if (r) {
if (r == -ENOMEM)
fprintf(stderr, "amdgpu: Not enough memory for command submission.\n");
@@ -911,39 +925,27 @@ static int radv_amdgpu_winsys_cs_submit(struct radeon_winsys_ctx *_ctx,
unsigned cs_count,
struct radeon_winsys_cs *initial_preamble_cs,
struct radeon_winsys_cs *continue_preamble_cs,
struct radeon_winsys_sem **wait_sem,
unsigned wait_sem_count,
struct radeon_winsys_sem **signal_sem,
unsigned signal_sem_count,
struct radv_winsys_sem_info *sem_info,
bool can_patch,
struct radeon_winsys_fence *_fence)
{
struct radv_amdgpu_cs *cs = radv_amdgpu_cs(cs_array[0]);
struct radv_amdgpu_ctx *ctx = radv_amdgpu_ctx(_ctx);
int ret;
int i;
for (i = 0; i < wait_sem_count; i++) {
amdgpu_semaphore_handle sem = (amdgpu_semaphore_handle)wait_sem[i];
amdgpu_cs_wait_semaphore(ctx->ctx, cs->hw_ip, 0, queue_idx,
sem);
}
assert(sem_info);
if (!cs->ws->use_ib_bos) {
ret = radv_amdgpu_winsys_cs_submit_sysmem(_ctx, queue_idx, cs_array,
ret = radv_amdgpu_winsys_cs_submit_sysmem(_ctx, queue_idx, sem_info, cs_array,
cs_count, initial_preamble_cs, continue_preamble_cs, _fence);
} else if (can_patch && cs_count > AMDGPU_CS_MAX_IBS_PER_SUBMIT && cs->ws->batchchain) {
ret = radv_amdgpu_winsys_cs_submit_chained(_ctx, queue_idx, cs_array,
ret = radv_amdgpu_winsys_cs_submit_chained(_ctx, queue_idx, sem_info, cs_array,
cs_count, initial_preamble_cs, continue_preamble_cs, _fence);
} else {
ret = radv_amdgpu_winsys_cs_submit_fallback(_ctx, queue_idx, cs_array,
ret = radv_amdgpu_winsys_cs_submit_fallback(_ctx, queue_idx, sem_info, cs_array,
cs_count, initial_preamble_cs, continue_preamble_cs, _fence);
}
for (i = 0; i < signal_sem_count; i++) {
amdgpu_semaphore_handle sem = (amdgpu_semaphore_handle)signal_sem[i];
amdgpu_cs_signal_semaphore(ctx->ctx, cs->hw_ip, 0, queue_idx,
sem);
}
radv_amdgpu_signal_sems(ctx, cs->hw_ip, queue_idx, sem_info);
return ret;
}
@@ -1042,19 +1044,202 @@ static bool radv_amdgpu_ctx_wait_idle(struct radeon_winsys_ctx *rwctx,
static struct radeon_winsys_sem *radv_amdgpu_create_sem(struct radeon_winsys *_ws)
{
int ret;
amdgpu_semaphore_handle sem;
ret = amdgpu_cs_create_semaphore(&sem);
if (ret)
struct amdgpu_cs_fence *sem = CALLOC_STRUCT(amdgpu_cs_fence);
if (!sem)
return NULL;
return (struct radeon_winsys_sem *)sem;
}
static void radv_amdgpu_destroy_sem(struct radeon_winsys_sem *_sem)
{
amdgpu_semaphore_handle sem = (amdgpu_semaphore_handle)_sem;
amdgpu_cs_destroy_semaphore(sem);
struct amdgpu_cs_fence *sem = (struct amdgpu_cs_fence *)_sem;
FREE(sem);
}
static int radv_amdgpu_signal_sems(struct radv_amdgpu_ctx *ctx,
uint32_t ip_type,
uint32_t ring,
struct radv_winsys_sem_info *sem_info)
{
for (unsigned i = 0; i < sem_info->signal.sem_count; i++) {
struct amdgpu_cs_fence *sem = (struct amdgpu_cs_fence *)(sem_info->signal.sem)[i];
if (sem->context)
return -EINVAL;
*sem = ctx->last_submission[ip_type][ring].fence;
}
return 0;
}
static struct drm_amdgpu_cs_chunk_sem *radv_amdgpu_cs_alloc_syncobj_chunk(struct radv_winsys_sem_counts *counts,
struct drm_amdgpu_cs_chunk *chunk, int chunk_id)
{
struct drm_amdgpu_cs_chunk_sem *syncobj = malloc(sizeof(struct drm_amdgpu_cs_chunk_sem) * counts->syncobj_count);
if (!syncobj)
return NULL;
for (unsigned i = 0; i < counts->syncobj_count; i++) {
struct drm_amdgpu_cs_chunk_sem *sem = &syncobj[i];
sem->handle = counts->syncobj[i];
}
chunk->chunk_id = chunk_id;
chunk->length_dw = sizeof(struct drm_amdgpu_cs_chunk_sem) / 4 * counts->syncobj_count;
chunk->chunk_data = (uint64_t)(uintptr_t)syncobj;
return syncobj;
}
static int radv_amdgpu_cs_submit(struct radv_amdgpu_ctx *ctx,
struct amdgpu_cs_request *request,
struct radv_winsys_sem_info *sem_info)
{
int r;
int num_chunks;
int size;
bool user_fence;
struct drm_amdgpu_cs_chunk *chunks;
struct drm_amdgpu_cs_chunk_data *chunk_data;
struct drm_amdgpu_cs_chunk_dep *sem_dependencies = NULL;
struct drm_amdgpu_cs_chunk_sem *wait_syncobj = NULL, *signal_syncobj = NULL;
int i;
struct amdgpu_cs_fence *sem;
user_fence = (request->fence_info.handle != NULL);
size = request->number_of_ibs + (user_fence ? 2 : 1) + 3;
chunks = alloca(sizeof(struct drm_amdgpu_cs_chunk) * size);
size = request->number_of_ibs + (user_fence ? 1 : 0);
chunk_data = alloca(sizeof(struct drm_amdgpu_cs_chunk_data) * size);
num_chunks = request->number_of_ibs;
for (i = 0; i < request->number_of_ibs; i++) {
struct amdgpu_cs_ib_info *ib;
chunks[i].chunk_id = AMDGPU_CHUNK_ID_IB;
chunks[i].length_dw = sizeof(struct drm_amdgpu_cs_chunk_ib) / 4;
chunks[i].chunk_data = (uint64_t)(uintptr_t)&chunk_data[i];
ib = &request->ibs[i];
chunk_data[i].ib_data._pad = 0;
chunk_data[i].ib_data.va_start = ib->ib_mc_address;
chunk_data[i].ib_data.ib_bytes = ib->size * 4;
chunk_data[i].ib_data.ip_type = request->ip_type;
chunk_data[i].ib_data.ip_instance = request->ip_instance;
chunk_data[i].ib_data.ring = request->ring;
chunk_data[i].ib_data.flags = ib->flags;
}
if (user_fence) {
i = num_chunks++;
chunks[i].chunk_id = AMDGPU_CHUNK_ID_FENCE;
chunks[i].length_dw = sizeof(struct drm_amdgpu_cs_chunk_fence) / 4;
chunks[i].chunk_data = (uint64_t)(uintptr_t)&chunk_data[i];
amdgpu_cs_chunk_fence_info_to_data(&request->fence_info,
&chunk_data[i]);
}
if (sem_info->wait.syncobj_count && sem_info->cs_emit_wait) {
wait_syncobj = radv_amdgpu_cs_alloc_syncobj_chunk(&sem_info->wait,
&chunks[num_chunks],
AMDGPU_CHUNK_ID_SYNCOBJ_IN);
if (!wait_syncobj) {
r = -ENOMEM;
goto error_out;
}
num_chunks++;
if (sem_info->wait.sem_count == 0)
sem_info->cs_emit_wait = false;
}
if (sem_info->wait.sem_count && sem_info->cs_emit_wait) {
sem_dependencies = malloc(sizeof(struct drm_amdgpu_cs_chunk_dep) * sem_info->wait.sem_count);
if (!sem_dependencies) {
r = -ENOMEM;
goto error_out;
}
int sem_count = 0;
for (unsigned j = 0; j < sem_info->wait.sem_count; j++) {
sem = (struct amdgpu_cs_fence *)sem_info->wait.sem[j];
if (!sem->context)
continue;
struct drm_amdgpu_cs_chunk_dep *dep = &sem_dependencies[sem_count++];
amdgpu_cs_chunk_fence_to_dep(sem, dep);
sem->context = NULL;
}
i = num_chunks++;
/* dependencies chunk */
chunks[i].chunk_id = AMDGPU_CHUNK_ID_DEPENDENCIES;
chunks[i].length_dw = sizeof(struct drm_amdgpu_cs_chunk_dep) / 4 * sem_count;
chunks[i].chunk_data = (uint64_t)(uintptr_t)sem_dependencies;
sem_info->cs_emit_wait = false;
}
if (sem_info->signal.syncobj_count && sem_info->cs_emit_signal) {
signal_syncobj = radv_amdgpu_cs_alloc_syncobj_chunk(&sem_info->signal,
&chunks[num_chunks],
AMDGPU_CHUNK_ID_SYNCOBJ_OUT);
if (!signal_syncobj) {
r = -ENOMEM;
goto error_out;
}
num_chunks++;
}
r = amdgpu_cs_submit_raw(ctx->ws->dev,
ctx->ctx,
request->resources,
num_chunks,
chunks,
&request->seq_no);
error_out:
free(sem_dependencies);
free(wait_syncobj);
free(signal_syncobj);
return r;
}
static int radv_amdgpu_create_syncobj(struct radeon_winsys *_ws,
uint32_t *handle)
{
struct radv_amdgpu_winsys *ws = radv_amdgpu_winsys(_ws);
return amdgpu_cs_create_syncobj(ws->dev, handle);
}
static void radv_amdgpu_destroy_syncobj(struct radeon_winsys *_ws,
uint32_t handle)
{
struct radv_amdgpu_winsys *ws = radv_amdgpu_winsys(_ws);
amdgpu_cs_destroy_syncobj(ws->dev, handle);
}
static int radv_amdgpu_export_syncobj(struct radeon_winsys *_ws,
uint32_t syncobj,
int *fd)
{
struct radv_amdgpu_winsys *ws = radv_amdgpu_winsys(_ws);
return amdgpu_cs_export_syncobj(ws->dev, syncobj, fd);
}
static int radv_amdgpu_import_syncobj(struct radeon_winsys *_ws,
int fd,
uint32_t *syncobj)
{
struct radv_amdgpu_winsys *ws = radv_amdgpu_winsys(_ws);
return amdgpu_cs_import_syncobj(ws->dev, fd, syncobj);
}
void radv_amdgpu_cs_init_functions(struct radv_amdgpu_winsys *ws)
@@ -1075,5 +1260,9 @@ void radv_amdgpu_cs_init_functions(struct radv_amdgpu_winsys *ws)
ws->base.destroy_fence = radv_amdgpu_destroy_fence;
ws->base.create_sem = radv_amdgpu_create_sem;
ws->base.destroy_sem = radv_amdgpu_destroy_sem;
ws->base.create_syncobj = radv_amdgpu_create_syncobj;
ws->base.destroy_syncobj = radv_amdgpu_destroy_syncobj;
ws->base.export_syncobj = radv_amdgpu_export_syncobj;
ws->base.import_syncobj = radv_amdgpu_import_syncobj;
ws->base.fence_wait = radv_amdgpu_fence_wait;
}

View File

@@ -37,7 +37,7 @@ $(intermediates)/dummy.c:
$(hide) touch $@
# This is the list of auto-generated files headers
LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/, $(BROADCOM_GENXML_GENERATED_FILES))
LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/broadcom/, $(BROADCOM_GENXML_GENERATED_FILES))
define header-gen
@mkdir -p $(dir $@)
@@ -45,13 +45,13 @@ define header-gen
$(hide) $(PRIVATE_SCRIPT) $(PRIVATE_SCRIPT_FLAGS) $(PRIVATE_XML) > $@
endef
$(intermediates)/cle/v3d_packet_v21_pack.h: PRIVATE_SCRIPT := $(MESA_PYTHON2) $(LOCAL_PATH)/cle/gen_pack_header.py
$(intermediates)/cle/v3d_packet_v21_pack.h: PRIVATE_XML := $(LOCAL_PATH)/cle/v3d_packet_v21.xml
$(intermediates)/cle/v3d_packet_v21_pack.h: $(LOCAL_PATH)/cle/v3d_packet_v21.xml $(LOCAL_PATH)/cle/gen_pack_header.py
$(intermediates)/broadcom/cle/v3d_packet_v21_pack.h: PRIVATE_SCRIPT := $(MESA_PYTHON2) $(LOCAL_PATH)/cle/gen_pack_header.py
$(intermediates)/broadcom/cle/v3d_packet_v21_pack.h: PRIVATE_XML := $(LOCAL_PATH)/cle/v3d_packet_v21.xml
$(intermediates)/broadcom/cle/v3d_packet_v21_pack.h: $(LOCAL_PATH)/cle/v3d_packet_v21.xml $(LOCAL_PATH)/cle/gen_pack_header.py
$(call header-gen)
LOCAL_EXPORT_C_INCLUDE_DIRS := \
$(MESA_TOP)/src/broadcom \
$(MESA_TOP)/src/broadcom/cle \
$(intermediates)
include $(MESA_COMMON_MK)

View File

@@ -8,5 +8,6 @@ BROADCOM_GENXML_XML_FILES = \
BROADCOM_FILES = \
cle/v3d_packet_helpers.h \
common/v3d_device_info.h \
$()

View File

@@ -326,6 +326,11 @@ class Group(object):
if field.type != "mbo":
convert = None
args = []
args.append('cl')
args.append(str(start + field.start))
args.append(str(start + field.end))
if field.type == "address":
convert = "__gen_unpack_address"
elif field.type == "uint":
@@ -339,17 +344,17 @@ class Group(object):
elif field.type == "offset":
convert = "__gen_unpack_offset"
elif field.type == 'ufixed':
args.append(str(field.fractional_size))
convert = "__gen_unpack_ufixed"
elif field.type == 'sfixed':
args.append(str(field.fractional_size))
convert = "__gen_unpack_sfixed"
else:
print("/* unhandled field %s, type %s */\n" % (name, field.type))
s = None
print(" values->%s = %s(cl, %s, %s);" % \
(field.name, convert, \
start + field.start, start + field.end))
print(" values->%s = %s(%s);" % \
(field.name, convert, ', '.join(args)))
class Value(object):
def __init__(self, attrs):
@@ -378,8 +383,8 @@ class Parser(object):
def start_element(self, name, attrs):
if name == "vcxml":
self.platform = "V3D {}".format(attrs["ver"])
self.ver = attrs["ver"].replace('.', '')
self.platform = "V3D {}".format(attrs["gen"])
self.ver = attrs["gen"].replace('.', '')
print(pack_header % {'license': license, 'platform': self.platform, 'guard': self.gen_guard()})
elif name in ("packet", "struct", "register"):
default_field = None

View File

@@ -176,6 +176,22 @@ __gen_unpack_sint(const uint8_t *restrict cl, uint32_t start, uint32_t end)
return (val << (64 - size)) >> (64 - size);
}
static inline float
__gen_unpack_sfixed(const uint8_t *restrict cl, uint32_t start, uint32_t end,
uint32_t fractional_size)
{
int32_t bits = __gen_unpack_sint(cl, start, end);
return (float)bits / (1 << fractional_size);
}
static inline float
__gen_unpack_ufixed(const uint8_t *restrict cl, uint32_t start, uint32_t end,
uint32_t fractional_size)
{
int32_t bits = __gen_unpack_uint(cl, start, end);
return (float)bits / (1 << fractional_size);
}
static inline float
__gen_unpack_float(const uint8_t *restrict cl, uint32_t start, uint32_t end)
{

View File

@@ -1,4 +1,4 @@
<vcxml ver="2.1">
<vcxml gen="2.1">
<packet name="Halt" code="0"/>
<packet name="NOP" code="1"/>
<packet name="Flush" code="4" cl="B"/>
@@ -31,6 +31,76 @@
<field name="Disable Color Buffer read" size="1" start="0" type="bool"/>
</packet>
<packet name="Store Tile Buffer General" code="28" cl="R">
<field name="Memory base address of frame/tile dump buffer" size="32" start="16" type="address"/>
<field name="Last Tile of Frame" size="1" start="19" type="bool"/>
<field name="Disable VG-Mask buffer dump" size="1" start="18" type="bool"/>
<field name="Disable Z/Stencil buffer dump" size="1" start="17" type="bool"/>
<field name="Disable Color buffer dump" size="1" start="16" type="bool"/>
<field name="Disable VG-Mask buffer clear on store/dump" size="1" start="15" type="bool"/>
<field name="Disable Z/Stencil buffer clear on store/dump" size="1" start="14" type="bool"/>
<field name="Disable Color buffer clear on store/dump" size="1" start="13" type="bool"/>
<field name="Pixel Color Format" size="2" start="8" type="uint">
<value name="rgba8888" value="0"/>
<value name="bgr565 dithered" value="1"/>
<value name="bgr565 no dither" value="2"/>
</field>
<field name="Mode" size="2" start="6" type="uint">
<value name="Sample 0" value="0"/>
<value name="Decimate x4" value="1"/>
<value name="Decimate x16" value="2"/>
</field>
<field name="Format" size="2" start="4" type="uint">
<value name="Raster" value="0"/>
<value name="T" value="1"/>
<value name="LT" value="2"/>
</field>
<field name="Buffer to Store" size="3" start="0" type="uint">
<value name="None" value="0"/>
<value name="Color" value="1"/>
<value name="Z/stencil" value="2"/>
<value name="Z" value="3"/>
<value name="VG-Mask" value="4"/>
</field>
</packet>
<packet name="Load Tile Buffer General" code="29" cl="R">
<field name="Memory base address of frame/tile dump buffer" size="32" start="16" type="address"/>
<field name="Disable VG-Mask buffer load" size="1" start="18" type="bool"/>
<field name="Disable Z/Stencil buffer load" size="1" start="17" type="bool"/>
<field name="Disable Color buffer load" size="1" start="16" type="bool"/>
<field name="Pixel Color Format" size="2" start="8" type="uint">
<value name="rgba8888" value="0"/>
<value name="bgr565 dithered" value="1"/>
<value name="bgr565 no dither" value="2"/>
</field>
<field name="Mode" size="2" start="6" type="uint">
<value name="Sample 0" value="0"/>
<value name="Decimate x4" value="1"/>
<value name="Decimate x16" value="2"/>
</field>
<field name="Format" size="2" start="4" type="uint">
<value name="Raster" value="0"/>
<value name="T" value="1"/>
<value name="LT" value="2"/>
</field>
<field name="Buffer to Store" size="3" start="0" type="uint">
<value name="None" value="0"/>
<value name="Color" value="1"/>
<value name="Z/stencil" value="2"/>
<value name="Z" value="3"/>
<value name="VG-Mask" value="4"/>
</field>
</packet>
<packet name="Indexed Primitive List" code="32">
<field name="Maximum Index" size="32" start="72" type="uint"/>
<field name="Address of Indices List" size="32" start="40" type="uint"/>
@@ -191,6 +261,42 @@
</packet>
<packet name="Tile Rendering Mode Configuration" code="113" cl="R">
<field name="Double-buffer in non-ms mode" size="1" start="76" type="bool"/>
<field name="Early-Z/Early-Cov disable" size="1" start="75" type="bool"/>
<field name="Early-Z Update Direction GT/GE" size="1" start="74" type="bool"/>
<field name="Select Coverage Mode" size="1" start="73" type="bool"/>
<field name="Enable VG Mask Buffer" size="1" start="72" type="bool"/>
<field name="Memory Format" size="2" start="70" type="uint">
<value name="Raster" value="0"/>
<value name="T" value="1"/>
<value name="LT" value="2"/>
</field>
<field name="Decimate Mode" size="2" start="68" type="uint"/>
<field name="Non-HDR Frame Buffer Color Format" size="2" start="66" type="uint">
<value name="rendering config bgr565 dithered" value="0"/>
<value name="rendering config rgba8888" value="1"/>
<value name="rendering config bgr565 no dither" value="2"/>
</field>
<field name="Tile Buffer 64-bit Color Depth" size="1" start="65" type="bool"/>
<field name="Multisample Mode (4x)" size="1" start="64" type="bool"/>
<field name="Height (pixels)" size="16" start="48" type="uint"/>
<field name="Width (pixels)" size="16" start="32" type="uint"/>
<field name="Memory Address" size="32" start="0" type="address"/>
</packet>
<packet name="Tile Coordinates" code="115" cl="R">
<field name="Tile Row Number" size="8" start="8" type="uint"/>
<field name="Tile Column Number" size="8" start="0" type="uint"/>
</packet>
<packet name="Gem Relocations" code="254" cl="B">
<field name="buffer 1" size="32" start="32" type="uint"/>
<field name="buffer 0" size="32" start="0" type="uint"/>
</packet>
<struct name="Shader Record">
<field name="Fragment Shader is single threaded" size="1" start="0" type="bool"/>
<field name="Point Size included in shaded vertex data" size="1" start="1" type="bool"/>

View File

@@ -0,0 +1,39 @@
/*
* Copyright © 2016 Broadcom
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#ifndef V3D_CHIP_H
#define V3D_CHIP_H
#include <stdint.h>
/**
* Struct for tracking features of the V3D chip. This is where we'll store
* boolean flags for features in a specific version, but for now it's just the
* version
*/
struct v3d_device_info {
/** Simple V3D version: major * 10 + minor */
uint8_t ver;
};
#endif

View File

@@ -41,7 +41,7 @@ LOCAL_EXPORT_C_INCLUDE_DIRS += \
$(MESA_TOP)/src/compiler/nir
LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/, \
$(NIR_GENERATED_FILES))
$(NIR_GENERATED_FILES) $(SPIRV_GENERATED_FILES))
# Modules using libmesa_nir must set LOCAL_GENERATED_SOURCES to this
MESA_GEN_NIR_H := $(addprefix $(call local-generated-sources-dir)/, \
@@ -94,3 +94,7 @@ nir_opt_algebraic_deps := \
$(intermediates)/nir/nir_opt_algebraic.c: $(nir_opt_algebraic_deps)
@mkdir -p $(dir $@)
$(hide) $(MESA_PYTHON2) $(nir_opt_algebraic_gen) $< > $@
$(intermediates)/spirv/spirv_info.c: $(LOCAL_PATH)/spirv/spirv_info_c.py $(LOCAL_PATH)/spirv/spirv.core.grammar.json
@mkdir -p $(dir $@)
$(hide) $(MESA_PYTHON2) $^ $@ || ($(RM) $@; false)

View File

@@ -37,6 +37,7 @@ LOCAL_SRC_FILES := \
LOCAL_C_INCLUDES := \
$(MESA_TOP)/src/mapi \
$(MESA_TOP)/src/mesa \
$(MESA_TOP)/src/compiler/spirv \
$(MESA_TOP)/src/gallium/include \
$(MESA_TOP)/src/gallium/auxiliary

View File

@@ -33,6 +33,7 @@ AM_CPPFLAGS = \
-I$(top_srcdir)/src/compiler/glsl/glcpp\
-I$(top_builddir)/src/compiler/nir \
-I$(top_srcdir)/src/compiler/nir \
-I$(top_srcdir)/src/compiler/spirv \
-I$(top_srcdir)/src/gallium/include \
-I$(top_srcdir)/src/gallium/auxiliary \
-I$(top_srcdir)/src/gtest/include \

View File

@@ -29,6 +29,7 @@ nir_libnir_la_LIBADD = \
nir_libnir_la_SOURCES = \
$(NIR_FILES) \
$(SPIRV_FILES) \
$(SPIRV_GENERATED_FILES) \
$(NIR_GENERATED_FILES)
nir/nir_builder_opcodes.h: nir/nir_opcodes.py nir/nir_builder_opcodes_h.py
@@ -51,6 +52,10 @@ nir/nir_opt_algebraic.c: nir/nir_opt_algebraic.py nir/nir_algebraic.py
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/nir/nir_opt_algebraic.py > $@ || ($(RM) $@; false)
spirv/spirv_info.c: spirv/spirv_info_c.py spirv/spirv.core.grammar.json
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/spirv/spirv_info_c.py $(srcdir)/spirv/spirv.core.grammar.json $@ || ($(RM) $@; false)
noinst_PROGRAMS += spirv2nir
spirv2nir_SOURCES = \
@@ -91,8 +96,13 @@ nir_tests_control_flow_tests_LDADD = \
TESTS += nir/tests/control_flow_tests
BUILT_SOURCES += $(NIR_GENERATED_FILES)
CLEANFILES += $(NIR_GENERATED_FILES)
BUILT_SOURCES += \
$(NIR_GENERATED_FILES) \
$(SPIRV_GENERATED_FILES)
CLEANFILES += \
$(NIR_GENERATED_FILES) \
$(SPIRV_GENERATED_FILES)
EXTRA_DIST += \
nir/nir_algebraic.py \
@@ -104,4 +114,6 @@ EXTRA_DIST += \
nir/nir_opt_algebraic.py \
nir/tests \
nir/README \
spirv/spirv_info_c.py \
spirv/spirv.core.grammar.json \
SConscript.nir

View File

@@ -229,6 +229,7 @@ NIR_FILES = \
nir/nir_lower_passthrough_edgeflags.c \
nir/nir_lower_patch_vertices.c \
nir/nir_lower_phis_to_scalar.c \
nir/nir_lower_read_invocation_to_scalar.c \
nir/nir_lower_regs_to_ssa.c \
nir/nir_lower_returns.c \
nir/nir_lower_samplers.c \
@@ -254,6 +255,7 @@ NIR_FILES = \
nir/nir_opt_gcm.c \
nir/nir_opt_global_to_local.c \
nir/nir_opt_if.c \
nir/nir_opt_intrinsics.c \
nir/nir_opt_loop_unroll.c \
nir/nir_opt_move_comparisons.c \
nir/nir_opt_peephole_select.c \
@@ -277,12 +279,14 @@ NIR_FILES = \
nir/nir_worklist.c \
nir/nir_worklist.h
SPIRV_GENERATED_FILES = \
spirv/spirv_info.c
SPIRV_FILES = \
spirv/GLSL.std.450.h \
spirv/nir_spirv.h \
spirv/spirv.h \
spirv/spirv_info.h \
spirv/spirv_info.c \
spirv/spirv_to_nir.c \
spirv/vtn_alu.c \
spirv/vtn_cfg.c \

View File

@@ -7677,21 +7677,17 @@ ast_interface_block::hir(exec_list *instructions,
"invalid qualifier for block",
this->block_name);
/* The ast_interface_block has a list of ast_declarator_lists. We
* need to turn those into ir_variables with an association
* with this uniform block.
*/
enum glsl_interface_packing packing;
if (this->layout.flags.q.shared) {
packing = GLSL_INTERFACE_PACKING_SHARED;
if (this->layout.flags.q.std140) {
packing = GLSL_INTERFACE_PACKING_STD140;
} else if (this->layout.flags.q.packed) {
packing = GLSL_INTERFACE_PACKING_PACKED;
} else if (this->layout.flags.q.std430) {
packing = GLSL_INTERFACE_PACKING_STD430;
} else {
/* The default layout is std140.
/* The default layout is shared.
*/
packing = GLSL_INTERFACE_PACKING_STD140;
packing = GLSL_INTERFACE_PACKING_SHARED;
}
ir_variable_mode var_mode;

View File

@@ -799,6 +799,24 @@ nir_visitor::visit(ir_call *ir)
case ir_intrinsic_shared_atomic_comp_swap:
op = nir_intrinsic_shared_atomic_comp_swap;
break;
case ir_intrinsic_vote_any:
op = nir_intrinsic_vote_any;
break;
case ir_intrinsic_vote_all:
op = nir_intrinsic_vote_all;
break;
case ir_intrinsic_vote_eq:
op = nir_intrinsic_vote_eq;
break;
case ir_intrinsic_ballot:
op = nir_intrinsic_ballot;
break;
case ir_intrinsic_read_invocation:
op = nir_intrinsic_read_invocation;
break;
case ir_intrinsic_read_first_invocation:
op = nir_intrinsic_read_first_invocation;
break;
default:
unreachable("not reached");
}
@@ -1135,6 +1153,55 @@ nir_visitor::visit(ir_call *ir)
nir_builder_instr_insert(&b, &instr->instr);
break;
}
case nir_intrinsic_vote_any:
case nir_intrinsic_vote_all:
case nir_intrinsic_vote_eq: {
nir_ssa_dest_init(&instr->instr, &instr->dest, 1, 32, NULL);
instr->variables[0] = evaluate_deref(&instr->instr, ir->return_deref);
ir_rvalue *value = (ir_rvalue *) ir->actual_parameters.get_head();
instr->src[0] = nir_src_for_ssa(evaluate_rvalue(value));
nir_builder_instr_insert(&b, &instr->instr);
break;
}
case nir_intrinsic_ballot: {
nir_ssa_dest_init(&instr->instr, &instr->dest,
ir->return_deref->type->vector_elements, 64, NULL);
ir_rvalue *value = (ir_rvalue *) ir->actual_parameters.get_head();
instr->src[0] = nir_src_for_ssa(evaluate_rvalue(value));
nir_builder_instr_insert(&b, &instr->instr);
break;
}
case nir_intrinsic_read_invocation: {
nir_ssa_dest_init(&instr->instr, &instr->dest,
ir->return_deref->type->vector_elements, 32, NULL);
instr->num_components = ir->return_deref->type->vector_elements;
ir_rvalue *value = (ir_rvalue *) ir->actual_parameters.get_head();
instr->src[0] = nir_src_for_ssa(evaluate_rvalue(value));
ir_rvalue *invocation = (ir_rvalue *) ir->actual_parameters.get_head()->next;
instr->src[1] = nir_src_for_ssa(evaluate_rvalue(invocation));
nir_builder_instr_insert(&b, &instr->instr);
break;
}
case nir_intrinsic_read_first_invocation: {
nir_ssa_dest_init(&instr->instr, &instr->dest,
ir->return_deref->type->vector_elements, 32, NULL);
instr->num_components = ir->return_deref->type->vector_elements;
ir_rvalue *value = (ir_rvalue *) ir->actual_parameters.get_head();
instr->src[0] = nir_src_for_ssa(evaluate_rvalue(value));
nir_builder_instr_insert(&b, &instr->instr);
break;
}
default:
unreachable("not reached");
}

View File

@@ -140,6 +140,29 @@ ir_array_reference_visitor::get_variable_entry(ir_variable *var)
if (var->type->is_unsized_array())
return NULL;
/* FIXME: arrays of arrays are not handled correctly by this pass so we
* skip it for now. While the pass will create functioning code it actually
* produces worse code.
*
* For example the array:
*
* int[3][2] a;
*
* ends up being split up into:
*
* int[3][2] a_0;
* int[3][2] a_1;
* int[3][2] a_2;
*
* And we end up referencing each of these new arrays for example:
*
* a[0][1] will be turned into a_0[0][1]
* a[1][0] will be turned into a_1[1][0]
* a[2][0] will be turned into a_2[2][0]
*/
if (var->type->is_array() && var->type->fields.array->is_array())
return NULL;
foreach_in_list(variable_entry, entry, &this->variable_list) {
if (entry->var == var)
return entry;

View File

@@ -1908,6 +1908,20 @@ nir_intrinsic_from_system_value(gl_system_value val)
return nir_intrinsic_load_helper_invocation;
case SYSTEM_VALUE_VIEW_INDEX:
return nir_intrinsic_load_view_index;
case SYSTEM_VALUE_SUBGROUP_SIZE:
return nir_intrinsic_load_subgroup_size;
case SYSTEM_VALUE_SUBGROUP_INVOCATION:
return nir_intrinsic_load_subgroup_invocation;
case SYSTEM_VALUE_SUBGROUP_EQ_MASK:
return nir_intrinsic_load_subgroup_eq_mask;
case SYSTEM_VALUE_SUBGROUP_GE_MASK:
return nir_intrinsic_load_subgroup_ge_mask;
case SYSTEM_VALUE_SUBGROUP_GT_MASK:
return nir_intrinsic_load_subgroup_gt_mask;
case SYSTEM_VALUE_SUBGROUP_LE_MASK:
return nir_intrinsic_load_subgroup_le_mask;
case SYSTEM_VALUE_SUBGROUP_LT_MASK:
return nir_intrinsic_load_subgroup_lt_mask;
default:
unreachable("system value does not directly correspond to intrinsic");
}
@@ -1961,6 +1975,20 @@ nir_system_value_from_intrinsic(nir_intrinsic_op intrin)
return SYSTEM_VALUE_HELPER_INVOCATION;
case nir_intrinsic_load_view_index:
return SYSTEM_VALUE_VIEW_INDEX;
case SYSTEM_VALUE_SUBGROUP_SIZE:
return nir_intrinsic_load_subgroup_size;
case SYSTEM_VALUE_SUBGROUP_INVOCATION:
return nir_intrinsic_load_subgroup_invocation;
case nir_intrinsic_load_subgroup_eq_mask:
return SYSTEM_VALUE_SUBGROUP_EQ_MASK;
case nir_intrinsic_load_subgroup_ge_mask:
return SYSTEM_VALUE_SUBGROUP_GE_MASK;
case nir_intrinsic_load_subgroup_gt_mask:
return SYSTEM_VALUE_SUBGROUP_GT_MASK;
case nir_intrinsic_load_subgroup_le_mask:
return SYSTEM_VALUE_SUBGROUP_LE_MASK;
case nir_intrinsic_load_subgroup_lt_mask:
return SYSTEM_VALUE_SUBGROUP_LT_MASK;
default:
unreachable("intrinsic doesn't produce a system value");
}

View File

@@ -1821,6 +1821,9 @@ typedef struct nir_shader_compiler_options {
bool lower_extract_byte;
bool lower_extract_word;
bool lower_vote_trivial;
bool lower_subgroup_masks;
/**
* Does the driver support real 32-bit integers? (Otherwise, integers
* are simulated by floats.)
@@ -1840,6 +1843,8 @@ typedef struct nir_shader_compiler_options {
*/
bool use_interpolated_input_intrinsics;
unsigned max_subgroup_size;
unsigned max_unroll_iterations;
} nir_shader_compiler_options;
@@ -2429,7 +2434,7 @@ bool nir_move_vec_src_uses_to_dest(nir_shader *shader);
bool nir_lower_vec_to_movs(nir_shader *shader);
bool nir_lower_alu_to_scalar(nir_shader *shader);
bool nir_lower_load_const_to_scalar(nir_shader *shader);
bool nir_lower_read_invocation_to_scalar(nir_shader *shader);
bool nir_lower_phis_to_scalar(nir_shader *shader);
void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask);
@@ -2644,6 +2649,8 @@ bool nir_opt_gcm(nir_shader *shader, bool value_number);
bool nir_opt_if(nir_shader *shader);
bool nir_opt_intrinsics(nir_shader *shader);
bool nir_opt_loop_unroll(nir_shader *shader, nir_variable_mode indirect_mask);
bool nir_opt_move_comparisons(nir_shader *shader);

View File

@@ -93,6 +93,19 @@ BARRIER(memory_barrier)
*/
INTRINSIC(shader_clock, 0, ARR(0), true, 2, 0, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
/*
* Shader ballot intrinsics with semantics analogous to the
*
* ballotARB()
* readInvocationARB()
* readFirstInvocationARB()
*
* GLSL functions from ARB_shader_ballot.
*/
INTRINSIC(ballot, 1, ARR(1), true, 1, 0, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
INTRINSIC(read_invocation, 2, ARR(0, 1), true, 0, 0, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
INTRINSIC(read_first_invocation, 1, ARR(0), true, 0, 0, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
/*
* Memory barrier with semantics analogous to the compute shader
* groupMemoryBarrier(), memoryBarrierAtomicCounter(), memoryBarrierBuffer(),
@@ -107,6 +120,11 @@ BARRIER(memory_barrier_shared)
/** A conditional discard, with a single boolean source. */
INTRINSIC(discard_if, 1, ARR(1), false, 0, 0, 0, xx, xx, xx, 0)
/** ARB_shader_group_vote intrinsics */
INTRINSIC(vote_any, 1, ARR(1), true, 1, 1, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
INTRINSIC(vote_all, 1, ARR(1), true, 1, 1, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
INTRINSIC(vote_eq, 1, ARR(1), true, 1, 1, 0, xx, xx, xx, NIR_INTRINSIC_CAN_ELIMINATE)
/**
* Basic Geometry Shader intrinsics.
*
@@ -326,10 +344,16 @@ SYSTEM_VALUE(work_group_id, 3, 0, xx, xx, xx)
SYSTEM_VALUE(user_clip_plane, 4, 1, UCP_ID, xx, xx)
SYSTEM_VALUE(num_work_groups, 3, 0, xx, xx, xx)
SYSTEM_VALUE(helper_invocation, 1, 0, xx, xx, xx)
SYSTEM_VALUE(channel_num, 1, 0, xx, xx, xx)
SYSTEM_VALUE(alpha_ref_float, 1, 0, xx, xx, xx)
SYSTEM_VALUE(layer_id, 1, 0, xx, xx, xx)
SYSTEM_VALUE(view_index, 1, 0, xx, xx, xx)
SYSTEM_VALUE(subgroup_size, 1, 0, xx, xx, xx)
SYSTEM_VALUE(subgroup_invocation, 1, 0, xx, xx, xx)
SYSTEM_VALUE(subgroup_eq_mask, 1, 0, xx, xx, xx)
SYSTEM_VALUE(subgroup_ge_mask, 1, 0, xx, xx, xx)
SYSTEM_VALUE(subgroup_gt_mask, 1, 0, xx, xx, xx)
SYSTEM_VALUE(subgroup_le_mask, 1, 0, xx, xx, xx)
SYSTEM_VALUE(subgroup_lt_mask, 1, 0, xx, xx, xx)
/* Blend constant color values. Float values are clamped. */
SYSTEM_VALUE(blend_const_color_r_float, 1, 0, xx, xx, xx)

View File

@@ -155,7 +155,7 @@ lower_instr(nir_intrinsic_instr *instr,
* instruction.
*/
for (unsigned i = 0; i < nir_intrinsic_infos[instr->intrinsic].num_srcs; i++)
new_instr->src[i + 1] = instr->src[i];
nir_src_copy(&new_instr->src[i + 1], &instr->src[i], new_instr);
if (instr->dest.is_ssa) {
nir_ssa_dest_init(&new_instr->instr, &new_instr->dest,

View File

@@ -115,7 +115,7 @@ lower_instr(nir_intrinsic_instr *instr, unsigned ssbo_offset, nir_builder *b)
/* remapped to ssbo_atomic_add: { buffer_idx, offset, +1 } */
temp = nir_imm_int(b, +1);
new_instr->src[0] = nir_src_for_ssa(buffer);
new_instr->src[1] = instr->src[0];
nir_src_copy(&new_instr->src[1], &instr->src[0], new_instr);
new_instr->src[2] = nir_src_for_ssa(temp);
break;
case nir_intrinsic_atomic_counter_dec:
@@ -123,21 +123,21 @@ lower_instr(nir_intrinsic_instr *instr, unsigned ssbo_offset, nir_builder *b)
/* NOTE semantic difference so we adjust the return value below */
temp = nir_imm_int(b, -1);
new_instr->src[0] = nir_src_for_ssa(buffer);
new_instr->src[1] = instr->src[0];
nir_src_copy(&new_instr->src[1], &instr->src[0], new_instr);
new_instr->src[2] = nir_src_for_ssa(temp);
break;
case nir_intrinsic_atomic_counter_read:
/* remapped to load_ssbo: { buffer_idx, offset } */
new_instr->src[0] = nir_src_for_ssa(buffer);
new_instr->src[1] = instr->src[0];
nir_src_copy(&new_instr->src[1], &instr->src[0], new_instr);
break;
default:
/* remapped to ssbo_atomic_x: { buffer_idx, offset, data, (compare)? } */
new_instr->src[0] = nir_src_for_ssa(buffer);
new_instr->src[1] = instr->src[0];
new_instr->src[2] = instr->src[1];
nir_src_copy(&new_instr->src[1], &instr->src[0], new_instr);
nir_src_copy(&new_instr->src[2], &instr->src[1], new_instr);
if (op == nir_intrinsic_ssbo_atomic_comp_swap)
new_instr->src[3] = instr->src[2];
nir_src_copy(&new_instr->src[3], &instr->src[2], new_instr);
break;
}

View File

@@ -49,7 +49,7 @@ lower_load_input_to_scalar(nir_builder *b, nir_intrinsic_instr *intr)
nir_intrinsic_set_base(chan_intr, nir_intrinsic_base(intr));
nir_intrinsic_set_component(chan_intr, nir_intrinsic_component(intr) + i);
/* offset */
chan_intr->src[0] = intr->src[0];
nir_src_copy(&chan_intr->src[0], &intr->src[0], chan_intr);
nir_builder_instr_insert(b, &chan_intr->instr);
@@ -84,7 +84,7 @@ lower_store_output_to_scalar(nir_builder *b, nir_intrinsic_instr *intr)
/* value */
chan_intr->src[0] = nir_src_for_ssa(nir_channel(b, value, i));
/* offset */
chan_intr->src[1] = intr->src[1];
nir_src_copy(&chan_intr->src[1], &intr->src[1], chan_intr);
nir_builder_instr_insert(b, &chan_intr->instr);
}

View File

@@ -141,6 +141,7 @@ create_shadow_temp(struct lower_io_state *state, nir_variable *var)
temp->data.mode = nir_var_global;
temp->data.read_only = false;
temp->data.fb_fetch_output = false;
temp->data.compact = false;
return nvar;
}

View File

@@ -0,0 +1,112 @@
/*
* Copyright © 2017 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include "nir.h"
#include "nir_builder.h"
/** @file nir_lower_read_invocation_to_scalar.c
*
* Replaces nir_intrinsic_read_invocation/nir_intrinsic_read_first_invocation
* operations with num_components != 1 with individual per-channel operations.
*/
static void
lower_read_invocation_to_scalar(nir_builder *b, nir_intrinsic_instr *intrin)
{
b->cursor = nir_before_instr(&intrin->instr);
nir_ssa_def *value = nir_ssa_for_src(b, intrin->src[0], intrin->num_components);
nir_ssa_def *reads[4];
for (unsigned i = 0; i < intrin->num_components; i++) {
nir_intrinsic_instr *chan_intrin =
nir_intrinsic_instr_create(b->shader, intrin->intrinsic);
nir_ssa_dest_init(&chan_intrin->instr, &chan_intrin->dest,
1, intrin->dest.ssa.bit_size, NULL);
chan_intrin->num_components = 1;
/* value */
chan_intrin->src[0] = nir_src_for_ssa(nir_channel(b, value, i));
/* invocation */
if (intrin->intrinsic == nir_intrinsic_read_invocation)
nir_src_copy(&chan_intrin->src[1], &intrin->src[1], chan_intrin);
nir_builder_instr_insert(b, &chan_intrin->instr);
reads[i] = &chan_intrin->dest.ssa;
}
nir_ssa_def_rewrite_uses(&intrin->dest.ssa,
nir_src_for_ssa(nir_vec(b, reads,
intrin->num_components)));
nir_instr_remove(&intrin->instr);
}
static bool
nir_lower_read_invocation_to_scalar_impl(nir_function_impl *impl)
{
bool progress = false;
nir_builder b;
nir_builder_init(&b, impl);
nir_foreach_block(block, impl) {
nir_foreach_instr_safe(instr, block) {
if (instr->type != nir_instr_type_intrinsic)
continue;
nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
if (intrin->num_components == 1)
continue;
switch (intrin->intrinsic) {
case nir_intrinsic_read_invocation:
case nir_intrinsic_read_first_invocation:
lower_read_invocation_to_scalar(&b, intrin);
progress = true;
break;
default:
break;
}
}
}
if (progress) {
nir_metadata_preserve(impl, nir_metadata_block_index |
nir_metadata_dominance);
}
return progress;
}
bool
nir_lower_read_invocation_to_scalar(nir_shader *shader)
{
bool progress = false;
nir_foreach_function(function, shader) {
if (function->impl)
progress |= nir_lower_read_invocation_to_scalar_impl(function->impl);
}
return progress;
}

View File

@@ -116,6 +116,20 @@ convert_block(nir_block *block, nir_builder *b)
nir_load_base_instance(b));
break;
case SYSTEM_VALUE_SUBGROUP_EQ_MASK:
case SYSTEM_VALUE_SUBGROUP_GE_MASK:
case SYSTEM_VALUE_SUBGROUP_GT_MASK:
case SYSTEM_VALUE_SUBGROUP_LE_MASK:
case SYSTEM_VALUE_SUBGROUP_LT_MASK: {
nir_intrinsic_op op =
nir_intrinsic_from_system_value(var->data.location);
nir_intrinsic_instr *load = nir_intrinsic_instr_create(b->shader, op);
nir_ssa_dest_init(&load->instr, &load->dest, 1, 64, NULL);
nir_builder_instr_insert(b, &load->instr);
sysval = &load->dest.ssa;
break;
}
default:
break;
}

View File

@@ -245,8 +245,12 @@ foreach_deref_node_worker(struct deref_node *node, nir_deref *deref,
case nir_deref_type_struct: {
nir_deref_struct *str = nir_deref_as_struct(deref->child);
return foreach_deref_node_worker(node->children[str->index],
deref->child, cb, state);
if (node->children[str->index] &&
!foreach_deref_node_worker(node->children[str->index],
deref->child, cb, state))
return false;
return true;
}
default:

View File

@@ -250,8 +250,8 @@ optimizations = [
(('ishr', a, 0), a),
(('ushr', 0, a), 0),
(('ushr', a, 0), a),
(('iand', 0xff, ('ushr', a, 24)), ('ushr', a, 24)),
(('iand', 0xffff, ('ushr', a, 16)), ('ushr', a, 16)),
(('iand', 0xff, ('ushr@32', a, 24)), ('ushr', a, 24)),
(('iand', 0xffff, ('ushr@32', a, 16)), ('ushr', a, 16)),
# Exponential/logarithmic identities
(('~fexp2', ('flog2', a)), a), # 2^lg2(a) = a
(('~flog2', ('fexp2', a)), a), # lg2(2^a) = a
@@ -357,6 +357,17 @@ optimizations = [
(('~fadd', '#a', ('fadd', b, '#c')), ('fadd', ('fadd', a, c), b)),
(('iadd', '#a', ('iadd', b, '#c')), ('iadd', ('iadd', a, c), b)),
# By definition...
(('bcsel', ('ige', ('find_lsb', a), 0), ('find_lsb', a), -1), ('find_lsb', a)),
(('bcsel', ('ige', ('ifind_msb', a), 0), ('ifind_msb', a), -1), ('ifind_msb', a)),
(('bcsel', ('ige', ('ufind_msb', a), 0), ('ufind_msb', a), -1), ('ufind_msb', a)),
(('bcsel', ('ine', a, 0), ('find_lsb', a), -1), ('find_lsb', a)),
(('bcsel', ('ine', a, 0), ('ifind_msb', a), -1), ('ifind_msb', a)),
(('bcsel', ('ine', a, 0), ('ufind_msb', a), -1), ('ufind_msb', a)),
(('bcsel', ('ine', a, -1), ('ifind_msb', a), -1), ('ifind_msb', a)),
# Misc. lowering
(('fmod@32', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 'options->lower_fmod32'),
(('fmod@64', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 'options->lower_fmod64'),

View File

@@ -469,8 +469,8 @@ specialize_wildcards(nir_deref_var *deref,
nir_deref_var *ret = nir_deref_var_create(mem_ctx, deref->var);
nir_deref *deref_tail = deref->deref.child;
nir_deref *guide_tail = guide->deref.child;
nir_deref *spec_tail = specific->deref.child;
nir_deref *guide_tail = &guide->deref;
nir_deref *spec_tail = &specific->deref;
nir_deref *ret_tail = &ret->deref;
while (deref_tail) {
switch (deref_tail->deref_type) {
@@ -495,14 +495,14 @@ specialize_wildcards(nir_deref_var *deref,
* the entry deref to find its corresponding wildcard and fill
* this slot in with the value from the src.
*/
while (guide_tail) {
while (guide_tail->child) {
guide_tail = guide_tail->child;
spec_tail = spec_tail->child;
if (guide_tail->deref_type == nir_deref_type_array &&
nir_deref_as_array(guide_tail)->deref_array_type ==
nir_deref_array_type_wildcard)
break;
guide_tail = guide_tail->child;
spec_tail = spec_tail->child;
}
nir_deref_array *spec_arr = nir_deref_as_array(spec_tail);

View File

@@ -0,0 +1,144 @@
/*
* Copyright © 2017 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include "nir.h"
#include "nir_builder.h"
/**
* \file nir_opt_intrinsics.c
*/
static bool
opt_intrinsics_impl(nir_function_impl *impl)
{
nir_builder b;
nir_builder_init(&b, impl);
bool progress = false;
nir_foreach_block(block, impl) {
nir_foreach_instr_safe(instr, block) {
if (instr->type != nir_instr_type_intrinsic)
continue;
nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
nir_ssa_def *replacement = NULL;
b.cursor = nir_before_instr(instr);
switch (intrin->intrinsic) {
case nir_intrinsic_vote_any:
case nir_intrinsic_vote_all: {
nir_const_value *val = nir_src_as_const_value(intrin->src[0]);
if (!val && !b.shader->options->lower_vote_trivial)
continue;
replacement = nir_ssa_for_src(&b, intrin->src[0], 1);
break;
}
case nir_intrinsic_vote_eq: {
nir_const_value *val = nir_src_as_const_value(intrin->src[0]);
if (!val && !b.shader->options->lower_vote_trivial)
continue;
replacement = nir_imm_int(&b, NIR_TRUE);
break;
}
case nir_intrinsic_ballot: {
assert(b.shader->options->max_subgroup_size != 0);
if (b.shader->options->max_subgroup_size > 32 ||
intrin->dest.ssa.bit_size <= 32)
continue;
nir_intrinsic_instr *ballot =
nir_intrinsic_instr_create(b.shader, nir_intrinsic_ballot);
nir_ssa_dest_init(&ballot->instr, &ballot->dest, 1, 32, NULL);
nir_src_copy(&ballot->src[0], &intrin->src[0], ballot);
nir_builder_instr_insert(&b, &ballot->instr);
replacement = nir_pack_64_2x32_split(&b,
&ballot->dest.ssa,
nir_imm_int(&b, 0));
break;
}
case nir_intrinsic_load_subgroup_eq_mask:
case nir_intrinsic_load_subgroup_ge_mask:
case nir_intrinsic_load_subgroup_gt_mask:
case nir_intrinsic_load_subgroup_le_mask:
case nir_intrinsic_load_subgroup_lt_mask: {
if (!b.shader->options->lower_subgroup_masks)
break;
nir_ssa_def *count = nir_load_subgroup_invocation(&b);
switch (intrin->intrinsic) {
case nir_intrinsic_load_subgroup_eq_mask:
replacement = nir_ishl(&b, nir_imm_int64(&b, 1ull), count);
break;
case nir_intrinsic_load_subgroup_ge_mask:
replacement = nir_ishl(&b, nir_imm_int64(&b, ~0ull), count);
break;
case nir_intrinsic_load_subgroup_gt_mask:
replacement = nir_ishl(&b, nir_imm_int64(&b, ~1ull), count);
break;
case nir_intrinsic_load_subgroup_le_mask:
replacement = nir_inot(&b, nir_ishl(&b, nir_imm_int64(&b, ~1ull), count));
break;
case nir_intrinsic_load_subgroup_lt_mask:
replacement = nir_inot(&b, nir_ishl(&b, nir_imm_int64(&b, ~0ull), count));
break;
default:
unreachable("you seriously can't tell this is unreachable?");
}
break;
}
default:
break;
}
if (!replacement)
continue;
nir_ssa_def_rewrite_uses(&intrin->dest.ssa,
nir_src_for_ssa(replacement));
nir_instr_remove(instr);
nir_metadata_preserve(impl, nir_metadata_block_index |
nir_metadata_dominance);
progress = true;
}
}
return progress;
}
bool
nir_opt_intrinsics(nir_shader *shader)
{
bool progress = false;
nir_foreach_function(function, shader) {
if (function->impl)
progress |= opt_intrinsics_impl(function->impl);
}
return false;
}

View File

@@ -257,7 +257,7 @@ static const char *
get_var_name(nir_variable *var, print_state *state)
{
if (state->ht == NULL)
return var->name;
return var->name ? var->name : "unnamed";
assert(state->syms);

1
src/compiler/spirv/.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
/spirv_info.c

View File

@@ -51,6 +51,7 @@ struct nir_spirv_supported_extensions {
bool image_write_without_format;
bool int64;
bool multiview;
bool variable_pointers;
};
nir_function *spirv_to_nir(const uint32_t *words, size_t word_count,

File diff suppressed because it is too large Load Diff

View File

@@ -50,12 +50,12 @@
typedef unsigned int SpvId;
#define SPV_VERSION 0x10100
#define SPV_REVISION 6
#define SPV_VERSION 0x10200
#define SPV_REVISION 1
static const unsigned int SpvMagicNumber = 0x07230203;
static const unsigned int SpvVersion = 0x00010100;
static const unsigned int SpvRevision = 6;
static const unsigned int SpvVersion = 0x00010200;
static const unsigned int SpvRevision = 1;
static const unsigned int SpvOpCodeMask = 0xffff;
static const unsigned int SpvWordCountShift = 16;
@@ -65,6 +65,7 @@ typedef enum SpvSourceLanguage_ {
SpvSourceLanguageGLSL = 2,
SpvSourceLanguageOpenCL_C = 3,
SpvSourceLanguageOpenCL_CPP = 4,
SpvSourceLanguageHLSL = 5,
SpvSourceLanguageMax = 0x7fffffff,
} SpvSourceLanguage;
@@ -129,6 +130,10 @@ typedef enum SpvExecutionMode_ {
SpvExecutionModeFinalizer = 34,
SpvExecutionModeSubgroupSize = 35,
SpvExecutionModeSubgroupsPerWorkgroup = 36,
SpvExecutionModeSubgroupsPerWorkgroupId = 37,
SpvExecutionModeLocalSizeId = 38,
SpvExecutionModeLocalSizeHintId = 39,
SpvExecutionModePostDepthCoverage = 4446,
SpvExecutionModeMax = 0x7fffffff,
} SpvExecutionMode;
@@ -145,6 +150,7 @@ typedef enum SpvStorageClass_ {
SpvStorageClassPushConstant = 9,
SpvStorageClassAtomicCounter = 10,
SpvStorageClassImage = 11,
SpvStorageClassStorageBuffer = 12,
SpvStorageClassMax = 0x7fffffff,
} SpvStorageClass;
@@ -383,6 +389,9 @@ typedef enum SpvDecoration_ {
SpvDecorationInputAttachmentIndex = 43,
SpvDecorationAlignment = 44,
SpvDecorationMaxByteOffset = 45,
SpvDecorationAlignmentId = 46,
SpvDecorationMaxByteOffsetId = 47,
SpvDecorationExplicitInterpAMD = 4999,
SpvDecorationOverrideCoverageNV = 5248,
SpvDecorationPassthroughNV = 5250,
SpvDecorationViewportRelativeNV = 5252,
@@ -442,6 +451,13 @@ typedef enum SpvBuiltIn_ {
SpvBuiltInDrawIndex = 4426,
SpvBuiltInDeviceIndex = 4438,
SpvBuiltInViewIndex = 4440,
SpvBuiltInBaryCoordNoPerspAMD = 4992,
SpvBuiltInBaryCoordNoPerspCentroidAMD = 4993,
SpvBuiltInBaryCoordNoPerspSampleAMD = 4994,
SpvBuiltInBaryCoordSmoothAMD = 4995,
SpvBuiltInBaryCoordSmoothCentroidAMD = 4996,
SpvBuiltInBaryCoordSmoothSampleAMD = 4997,
SpvBuiltInBaryCoordPullModelAMD = 4998,
SpvBuiltInViewportMaskNV = 5253,
SpvBuiltInSecondaryPositionNV = 5257,
SpvBuiltInSecondaryViewportMaskNV = 5258,
@@ -632,12 +648,19 @@ typedef enum SpvCapability_ {
SpvCapabilitySubgroupBallotKHR = 4423,
SpvCapabilityDrawParameters = 4427,
SpvCapabilitySubgroupVoteKHR = 4431,
SpvCapabilityStorageBuffer16BitAccess = 4433,
SpvCapabilityStorageUniformBufferBlock16 = 4433,
SpvCapabilityStorageUniform16 = 4434,
SpvCapabilityUniformAndStorageBuffer16BitAccess = 4434,
SpvCapabilityStoragePushConstant16 = 4435,
SpvCapabilityStorageInputOutput16 = 4436,
SpvCapabilityDeviceGroup = 4437,
SpvCapabilityMultiView = 4439,
SpvCapabilityVariablePointersStorageBuffer = 4441,
SpvCapabilityVariablePointers = 4442,
SpvCapabilityAtomicStorageOps = 4445,
SpvCapabilitySampleMaskPostDepthCoverage = 4447,
SpvCapabilityImageGatherBiasLodAMD = 5009,
SpvCapabilitySampleMaskOverrideCoverageNV = 5249,
SpvCapabilityGeometryShaderPassthroughNV = 5251,
SpvCapabilityShaderViewportIndexLayerNV = 5254,
@@ -952,12 +975,22 @@ typedef enum SpvOp_ {
SpvOpNamedBarrierInitialize = 328,
SpvOpMemoryNamedBarrier = 329,
SpvOpModuleProcessed = 330,
SpvOpExecutionModeId = 331,
SpvOpDecorateId = 332,
SpvOpSubgroupBallotKHR = 4421,
SpvOpSubgroupFirstInvocationKHR = 4422,
SpvOpSubgroupAllKHR = 4428,
SpvOpSubgroupAnyKHR = 4429,
SpvOpSubgroupAllEqualKHR = 4430,
SpvOpSubgroupReadInvocationKHR = 4432,
SpvOpGroupIAddNonUniformAMD = 5000,
SpvOpGroupFAddNonUniformAMD = 5001,
SpvOpGroupFMinNonUniformAMD = 5002,
SpvOpGroupUMinNonUniformAMD = 5003,
SpvOpGroupSMinNonUniformAMD = 5004,
SpvOpGroupFMaxNonUniformAMD = 5005,
SpvOpGroupUMaxNonUniformAMD = 5006,
SpvOpGroupSMaxNonUniformAMD = 5007,
SpvOpMax = 0x7fffffff,
} SpvOp;

View File

@@ -1,156 +0,0 @@
/*
* Copyright © 2016 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
* IN THE SOFTWARE.
*/
#include "spirv_info.h"
#include "util/macros.h"
#define CAPABILITY(cap) [SpvCapability##cap] = #cap
static const char * const capability_to_string[] = {
CAPABILITY(Matrix),
CAPABILITY(Shader),
CAPABILITY(Geometry),
CAPABILITY(Tessellation),
CAPABILITY(Addresses),
CAPABILITY(Linkage),
CAPABILITY(Kernel),
CAPABILITY(Vector16),
CAPABILITY(Float16Buffer),
CAPABILITY(Float16),
CAPABILITY(Float64),
CAPABILITY(Int64),
CAPABILITY(Int64Atomics),
CAPABILITY(ImageBasic),
CAPABILITY(ImageReadWrite),
CAPABILITY(ImageMipmap),
CAPABILITY(Pipes),
CAPABILITY(Groups),
CAPABILITY(DeviceEnqueue),
CAPABILITY(LiteralSampler),
CAPABILITY(AtomicStorage),
CAPABILITY(Int16),
CAPABILITY(TessellationPointSize),
CAPABILITY(GeometryPointSize),
CAPABILITY(ImageGatherExtended),
CAPABILITY(StorageImageMultisample),
CAPABILITY(UniformBufferArrayDynamicIndexing),
CAPABILITY(SampledImageArrayDynamicIndexing),
CAPABILITY(StorageBufferArrayDynamicIndexing),
CAPABILITY(StorageImageArrayDynamicIndexing),
CAPABILITY(ClipDistance),
CAPABILITY(CullDistance),
CAPABILITY(ImageCubeArray),
CAPABILITY(SampleRateShading),
CAPABILITY(ImageRect),
CAPABILITY(SampledRect),
CAPABILITY(GenericPointer),
CAPABILITY(Int8),
CAPABILITY(InputAttachment),
CAPABILITY(SparseResidency),
CAPABILITY(MinLod),
CAPABILITY(Sampled1D),
CAPABILITY(Image1D),
CAPABILITY(SampledCubeArray),
CAPABILITY(SampledBuffer),
CAPABILITY(ImageBuffer),
CAPABILITY(ImageMSArray),
CAPABILITY(StorageImageExtendedFormats),
CAPABILITY(ImageQuery),
CAPABILITY(DerivativeControl),
CAPABILITY(InterpolationFunction),
CAPABILITY(TransformFeedback),
CAPABILITY(GeometryStreams),
CAPABILITY(StorageImageReadWithoutFormat),
CAPABILITY(StorageImageWriteWithoutFormat),
CAPABILITY(MultiViewport),
CAPABILITY(SubgroupDispatch),
CAPABILITY(NamedBarrier),
CAPABILITY(PipeStorage),
CAPABILITY(SubgroupBallotKHR),
CAPABILITY(DrawParameters),
};
const char *
spirv_capability_to_string(SpvCapability cap)
{
if (cap < ARRAY_SIZE(capability_to_string))
return capability_to_string[cap];
else
return "unknown";
}
#define DECORATION(dec) [SpvDecoration##dec] = #dec
static const char * const decoration_to_string[] = {
DECORATION(RelaxedPrecision),
DECORATION(SpecId),
DECORATION(Block),
DECORATION(BufferBlock),
DECORATION(RowMajor),
DECORATION(ColMajor),
DECORATION(ArrayStride),
DECORATION(MatrixStride),
DECORATION(GLSLShared),
DECORATION(GLSLPacked),
DECORATION(CPacked),
DECORATION(BuiltIn),
DECORATION(NoPerspective),
DECORATION(Flat),
DECORATION(Patch),
DECORATION(Centroid),
DECORATION(Sample),
DECORATION(Invariant),
DECORATION(Restrict),
DECORATION(Aliased),
DECORATION(Volatile),
DECORATION(Constant),
DECORATION(Coherent),
DECORATION(NonWritable),
DECORATION(NonReadable),
DECORATION(Uniform),
DECORATION(SaturatedConversion),
DECORATION(Stream),
DECORATION(Location),
DECORATION(Component),
DECORATION(Index),
DECORATION(Binding),
DECORATION(DescriptorSet),
DECORATION(Offset),
DECORATION(XfbBuffer),
DECORATION(XfbStride),
DECORATION(FuncParamAttr),
DECORATION(FPRoundingMode),
DECORATION(FPFastMathMode),
DECORATION(LinkageAttributes),
DECORATION(NoContraction),
DECORATION(InputAttachmentIndex),
DECORATION(Alignment),
DECORATION(MaxByteOffset),
};
const char *
spirv_decoration_to_string(SpvDecoration dec)
{
if (dec < ARRAY_SIZE(decoration_to_string))
return decoration_to_string[dec];
else
return "unknown";
}

View File

@@ -0,0 +1,82 @@
COPYRIGHT = """\
/*
* Copyright (C) 2017 Intel Corporation
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*/
"""
import argparse
import json
from sys import stdout
from mako.template import Template
def collect_data(spirv, kind):
for x in spirv["operand_kinds"]:
if x["kind"] == kind:
operands = x
break
# There are some duplicate values in some of the tables (thanks guys!), so
# filter them out.
last_value = -1
values = []
for x in operands["enumerants"]:
if x["value"] != last_value:
last_value = x["value"]
values.append(x["enumerant"])
return (kind, values)
def parse_args():
p = argparse.ArgumentParser()
p.add_argument("json")
p.add_argument("out")
return p.parse_args()
TEMPLATE = Template(COPYRIGHT + """\
#include "spirv_info.h"
% for kind,values in info:
const char *
spirv_${kind.lower()}_to_string(Spv${kind} v)
{
switch (v) {
% for name in values:
case Spv${kind}${name}: return "Spv${kind}${name}";
% endfor
case Spv${kind}Max: break; /* silence warnings about unhandled enums. */
}
return "unknown";
}
% endfor
""")
if __name__ == "__main__":
pargs = parse_args()
spirv_info = json.JSONDecoder().decode(open(pargs.json, "r").read())
capabilities = collect_data(spirv_info, "Capability")
decorations = collect_data(spirv_info, "Decoration")
with open(pargs.out, 'w') as f:
f.write(TEMPLATE.render(info=[capabilities, decorations]))

View File

@@ -185,6 +185,13 @@ vtn_ssa_value(struct vtn_builder *b, uint32_t value_id)
case vtn_value_type_ssa:
return val->ssa;
case vtn_value_type_pointer:
assert(val->pointer->ptr_type && val->pointer->ptr_type->type);
struct vtn_ssa_value *ssa =
vtn_create_ssa_value(b, val->pointer->ptr_type->type);
ssa->def = vtn_pointer_to_ssa(b, val->pointer);
return ssa;
default:
unreachable("Invalid type for an SSA value");
}
@@ -599,12 +606,17 @@ type_decoration_cb(struct vtn_builder *b,
switch (dec->decoration) {
case SpvDecorationArrayStride:
assert(type->base_type == vtn_base_type_matrix ||
type->base_type == vtn_base_type_array ||
type->base_type == vtn_base_type_pointer);
type->stride = dec->literals[0];
break;
case SpvDecorationBlock:
assert(type->base_type == vtn_base_type_struct);
type->block = true;
break;
case SpvDecorationBufferBlock:
assert(type->base_type == vtn_base_type_struct);
type->buffer_block = true;
break;
case SpvDecorationGLSLShared:
@@ -709,7 +721,7 @@ translate_image_format(SpvImageFormat format)
case SpvImageFormatRg32ui: return 0x823C; /* GL_RG32UI */
case SpvImageFormatRg16ui: return 0x823A; /* GL_RG16UI */
case SpvImageFormatRg8ui: return 0x8238; /* GL_RG8UI */
case SpvImageFormatR16ui: return 0x823A; /* GL_RG16UI */
case SpvImageFormatR16ui: return 0x8234; /* GL_R16UI */
case SpvImageFormatR8ui: return 0x8232; /* GL_R8UI */
default:
assert(!"Invalid image format");
@@ -856,9 +868,16 @@ vtn_handle_type(struct vtn_builder *b, SpvOp opcode,
vtn_value(b, w[3], vtn_value_type_type)->type;
val->type->base_type = vtn_base_type_pointer;
val->type->type = NULL;
val->type->storage_class = storage_class;
val->type->deref = deref_type;
if (storage_class == SpvStorageClassUniform ||
storage_class == SpvStorageClassStorageBuffer) {
/* These can actually be stored to nir_variables and used as SSA
* values so they need a real glsl_type.
*/
val->type->type = glsl_vector_type(GLSL_TYPE_UINT, 2);
}
break;
}
@@ -1374,6 +1393,7 @@ static void
vtn_handle_function_call(struct vtn_builder *b, SpvOp opcode,
const uint32_t *w, unsigned count)
{
struct vtn_type *res_type = vtn_value(b, w[1], vtn_value_type_type)->type;
struct nir_function *callee =
vtn_value(b, w[3], vtn_value_type_function)->func->impl->function;
@@ -1381,7 +1401,8 @@ vtn_handle_function_call(struct vtn_builder *b, SpvOp opcode,
for (unsigned i = 0; i < call->num_params; i++) {
unsigned arg_id = w[4 + i];
struct vtn_value *arg = vtn_untyped_value(b, arg_id);
if (arg->value_type == vtn_value_type_pointer) {
if (arg->value_type == vtn_value_type_pointer &&
arg->pointer->ptr_type->type == NULL) {
nir_deref_var *d = vtn_pointer_to_deref(b, arg->pointer);
call->params[i] = nir_deref_var_clone(d, call);
} else {
@@ -1397,6 +1418,7 @@ vtn_handle_function_call(struct vtn_builder *b, SpvOp opcode,
}
nir_variable *out_tmp = NULL;
assert(res_type->type == callee->return_type);
if (!glsl_type_is_void(callee->return_type)) {
out_tmp = nir_local_variable_create(b->impl, callee->return_type,
"out_tmp");
@@ -1408,8 +1430,7 @@ vtn_handle_function_call(struct vtn_builder *b, SpvOp opcode,
if (glsl_type_is_void(callee->return_type)) {
vtn_push_value(b, w[2], vtn_value_type_undef);
} else {
struct vtn_value *retval = vtn_push_value(b, w[2], vtn_value_type_ssa);
retval->ssa = vtn_local_load(b, call->return_deref);
vtn_push_ssa(b, w[2], res_type, vtn_local_load(b, call->return_deref));
}
}
@@ -2763,6 +2784,11 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
spv_check_supported(multiview, cap);
break;
case SpvCapabilityVariablePointersStorageBuffer:
case SpvCapabilityVariablePointers:
spv_check_supported(variable_pointers, cap);
break;
default:
unreachable("Unhandled capability");
}
@@ -3063,6 +3089,7 @@ vtn_handle_body_instruction(struct vtn_builder *b, SpvOp opcode,
case SpvOpCopyMemory:
case SpvOpCopyMemorySized:
case SpvOpAccessChain:
case SpvOpPtrAccessChain:
case SpvOpInBoundsAccessChain:
case SpvOpArrayLength:
vtn_handle_variables(b, opcode, w, count);
@@ -3146,6 +3173,19 @@ vtn_handle_body_instruction(struct vtn_builder *b, SpvOp opcode,
break;
}
case SpvOpSelect: {
/* Handle OpSelect up-front here because it needs to be able to handle
* pointers and not just regular vectors and scalars.
*/
struct vtn_type *res_type = vtn_value(b, w[1], vtn_value_type_type)->type;
struct vtn_ssa_value *ssa = vtn_create_ssa_value(b, res_type->type);
ssa->def = nir_bcsel(&b->nb, vtn_ssa_value(b, w[3])->def,
vtn_ssa_value(b, w[4])->def,
vtn_ssa_value(b, w[5])->def);
vtn_push_ssa(b, w[2], res_type, ssa);
break;
}
case SpvOpSNegate:
case SpvOpFNegate:
case SpvOpNot:
@@ -3203,7 +3243,6 @@ vtn_handle_body_instruction(struct vtn_builder *b, SpvOp opcode,
case SpvOpBitwiseOr:
case SpvOpBitwiseXor:
case SpvOpBitwiseAnd:
case SpvOpSelect:
case SpvOpIEqual:
case SpvOpFOrdEqual:
case SpvOpFUnordEqual:

View File

@@ -52,7 +52,8 @@ vtn_cfg_handle_prepass_instruction(struct vtn_builder *b, SpvOp opcode,
func->num_params = func_type->length;
func->params = ralloc_array(b->shader, nir_parameter, func->num_params);
for (unsigned i = 0; i < func->num_params; i++) {
if (func_type->params[i]->base_type == vtn_base_type_pointer) {
if (func_type->params[i]->base_type == vtn_base_type_pointer &&
func_type->params[i]->type == NULL) {
func->params[i].type = func_type->params[i]->deref->type;
} else {
func->params[i].type = func_type->params[i]->type;
@@ -82,7 +83,7 @@ vtn_cfg_handle_prepass_instruction(struct vtn_builder *b, SpvOp opcode,
assert(b->func_param_idx < b->func->impl->num_params);
nir_variable *param = b->func->impl->params[b->func_param_idx++];
if (type->base_type == vtn_base_type_pointer) {
if (type->base_type == vtn_base_type_pointer && type->type == NULL) {
struct vtn_variable *vtn_var = rzalloc(b, struct vtn_variable);
vtn_var->type = type->deref;
vtn_var->var = param;
@@ -112,12 +113,12 @@ vtn_cfg_handle_prepass_instruction(struct vtn_builder *b, SpvOp opcode,
val->pointer = vtn_pointer_for_variable(b, vtn_var, type);
} else {
/* We're a regular SSA value. */
struct vtn_value *val = vtn_push_value(b, w[2], vtn_value_type_ssa);
struct vtn_ssa_value *param_ssa =
vtn_local_load(b, nir_deref_var_create(b, param));
struct vtn_value *val = vtn_push_ssa(b, w[2], type, param_ssa);
/* Name the parameter so it shows up nicely in NIR */
param->name = ralloc_strdup(param, val->name);
val->ssa = vtn_local_load(b, nir_deref_var_create(b, param));
}
break;
}
@@ -504,14 +505,13 @@ vtn_handle_phis_first_pass(struct vtn_builder *b, SpvOp opcode,
* algorithm all over again. It's easier if we just let
* lower_vars_to_ssa do that for us instead of repeating it here.
*/
struct vtn_value *val = vtn_push_value(b, w[2], vtn_value_type_ssa);
struct vtn_type *type = vtn_value(b, w[1], vtn_value_type_type)->type;
nir_variable *phi_var =
nir_local_variable_create(b->nb.impl, type->type, "phi");
_mesa_hash_table_insert(b->phi_table, w, phi_var);
val->ssa = vtn_local_load(b, nir_deref_var_create(b, phi_var));
vtn_push_ssa(b, w[2], type,
vtn_local_load(b, nir_deref_var_create(b, phi_var)));
return true;
}

View File

@@ -220,15 +220,15 @@ struct vtn_type {
/* Specifies the length of complex types. */
unsigned length;
/* for arrays, matrices and pointers, the array stride */
unsigned stride;
union {
/* Members for scalar, vector, and array-like types */
struct {
/* for arrays, the vtn_type for the elements of the array */
struct vtn_type *array_element;
/* for arrays and matrices, the array stride */
unsigned stride;
/* for matrices, whether the matrix is stored row-major */
bool row_major:1;
@@ -308,6 +308,11 @@ struct vtn_access_link {
struct vtn_access_chain {
uint32_t length;
/** Whether or not to treat the base pointer as an array. This is only
* true if this access chain came from an OpPtrAccessChain.
*/
bool ptr_as_array;
/** Struct elements and array offsets.
*
* This is an array of 1 so that it can conveniently be created on the
@@ -364,6 +369,13 @@ struct vtn_pointer {
struct nir_ssa_def *offset;
};
static inline bool
vtn_pointer_uses_ssa_offset(struct vtn_pointer *ptr)
{
return ptr->mode == vtn_variable_mode_ubo ||
ptr->mode == vtn_variable_mode_ssbo;
}
struct vtn_variable {
enum vtn_variable_mode mode;
@@ -496,6 +508,12 @@ struct vtn_builder {
bool has_loop_continue;
};
nir_ssa_def *
vtn_pointer_to_ssa(struct vtn_builder *b, struct vtn_pointer *ptr);
struct vtn_pointer *
vtn_pointer_from_ssa(struct vtn_builder *b, nir_ssa_def *ssa,
struct vtn_type *ptr_type);
static inline struct vtn_value *
vtn_push_value(struct vtn_builder *b, uint32_t value_id,
enum vtn_value_type value_type)
@@ -508,6 +526,21 @@ vtn_push_value(struct vtn_builder *b, uint32_t value_id,
return &b->values[value_id];
}
static inline struct vtn_value *
vtn_push_ssa(struct vtn_builder *b, uint32_t value_id,
struct vtn_type *type, struct vtn_ssa_value *ssa)
{
struct vtn_value *val;
if (type->base_type == vtn_base_type_pointer) {
val = vtn_push_value(b, value_id, vtn_value_type_pointer);
val->pointer = vtn_pointer_from_ssa(b, ssa->def, type);
} else {
val = vtn_push_value(b, value_id, vtn_value_type_ssa);
val->ssa = ssa;
}
return val;
}
static inline struct vtn_value *
vtn_untyped_value(struct vtn_builder *b, uint32_t value_id)
{

Some files were not shown because too many files have changed in this diff Show More