Compare commits

...

23 Commits

Author SHA1 Message Date
Dylan Baker
9e3d9c4e13 docs: Add release notes for 19.0.5 2019-05-21 14:10:20 -07:00
Dylan Baker
bec0a67629 bump version to 19.0.5 2019-05-21 09:18:03 -07:00
Caio Marcelo de Oliveira Filho
e64fc93148 nir: Fix clone of nir_variable state slots
When num_state_slots is 0, don't create the array.  This was
triggering the following assert when running vkcube with
NIR_TEST_CLONE=1

    vkcube: ../src/compiler/nir/nir_split_per_member_structs.c:66:
    split_variable: Assertion `var->state_slots == NULL' failed.

Fixes: 9fbd390dd4 "nir: Add support for cloning shaders"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 005cc9ae37)
2019-05-21 09:11:22 -07:00
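The guard described in this commit can be sketched outside of NIR as follows. This is a minimal illustration only: `demo_slot`, `demo_var`, and `demo_var_clone` are hypothetical stand-ins, and plain `malloc` is assumed in place of Mesa's ralloc allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for nir_state_slot / nir_variable. */
struct demo_slot { int tokens[4]; };
struct demo_var {
    unsigned num_slots;
    struct demo_slot *slots;
};

/* Clone src into dst. The slot array is only allocated when there is
 * something to copy, so a variable without slots keeps a NULL pointer. */
static int demo_var_clone(struct demo_var *dst, const struct demo_var *src)
{
    assert(dst != NULL && src != NULL);

    dst->num_slots = src->num_slots;
    dst->slots = NULL;

    if (src->num_slots) {
        dst->slots = malloc(src->num_slots * sizeof(*dst->slots));
        if (!dst->slots)
            return -1;
        memcpy(dst->slots, src->slots,
               src->num_slots * sizeof(*dst->slots));
    }
    return 0;
}
```

With this shape, later code that checks for a NULL slot pointer sees the same state on a clone as on the original.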
Charmaine Lee
1302f20ddb mesa: unreference current winsys buffers when unbinding winsys buffers
This fixes surface leak when no winsys buffers are bound.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 12bf7cfecf)
2019-05-21 09:11:16 -07:00
Charmaine Lee
d1443da4f0 st/mesa: purge framebuffers with current context after unbinding winsys buffers
With commit c89e8470e5, framebuffers are purged after unbinding the context,
but that change also introduced heap corruption when running the Rhino application
on the VMware svga device. Instead of purging the framebuffers after the context
is unbound, this patch first unbinds the winsys buffers, then purges the framebuffers
with the current context, and finally unbinds the context.

This fixes heap corruption.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit b480adfa5e)
2019-05-21 09:11:03 -07:00
Eric Engestrom
03cb07168f meson: expose glapi through osmesa
Suggested-by: Pierre Guillou <pierre.guillou@lip6.fr>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109659
Fixes: f121a669c7 "meson: build gallium based osmesa"
Fixes: cbbd5bb889 "meson: build classic osmesa"
Cc: Brian Paul <brianp@vmware.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
(cherry picked from commit ccb8ea7acf)
2019-05-20 09:52:56 -07:00
Ian Romanick
dfe2258cc1 Revert "nir: add late opt to turn inot/b2f combos back to bcsel"
This reverts commit 7acc865226.

With these optimizations in place, the extra constant folding added in
the next commit extends some live ranges of 0.0 and ±1.0 constants, and
that causes several hundred shaders to have more spills and fills.

I believe this optimization was made basically irrelevant by 7725d60938
"intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))".

All Gen7.5+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17225303 -> 17224634 (<.01%)
instructions in affected programs: 879402 -> 878733 (-0.08%)
helped: 679
HURT: 1
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45%
95% mean confidence interval for instructions value: -1.02 -0.95
95% mean confidence interval for instructions %-change: -0.26% -0.22%
Instructions are helped.

total cycles in shared programs: 360842595 -> 360828542 (<.01%)
cycles in affected programs: 110443594 -> 110429541 (-0.01%)
helped: 389
HURT: 265
helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28
helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11%
HURT stats (abs)   min: 1 max: 7614 x̄: 185.96 x̃: 48
HURT stats (rel)   min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10%
95% mean confidence interval for cycles value: -75.65 32.67
95% mean confidence interval for cycles %-change: -0.49% -0.06%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 12159 -> 12161 (0.02%)
spills in affected programs: 13 -> 15 (15.38%)
helped: 0
HURT: 1

total fills in shared programs: 25207 -> 25208 (<.01%)
fills in affected programs: 25 -> 26 (4.00%)
helped: 0
HURT: 1

Ivy Bridge
total instructions in shared programs: 12082019 -> 12082013 (<.01%)
instructions in affected programs: 1033 -> 1027 (-0.58%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.78% -0.45%
Instructions are helped.

total cycles in shared programs: 179849270 -> 179849157 (<.01%)
cycles in affected programs: 4735 -> 4622 (-2.39%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18
helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36%
95% mean confidence interval for cycles value: -82.73 26.23
95% mean confidence interval for cycles %-change: -7.98% 2.28%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10882750 -> 10882748 (<.01%)
instructions in affected programs: 266 -> 264 (-0.75%)
helped: 2
HURT: 0

Iron Lake
total cycles in shared programs: 188609440 -> 188609448 (<.01%)
cycles in affected programs: 4320 -> 4328 (0.19%)
helped: 0
HURT: 2

GM45
total cycles in shared programs: 129016868 -> 129016872 (<.01%)
cycles in affected programs: 2302 -> 2306 (0.17%)
helped: 0
HURT: 1

Reviewed-by: Matt Turner <mattst88@gmail.com>
(cherry picked from commit d2a9ba03e3)
Conflicts resolved by Dylan

Conflicts:
	src/compiler/nir/nir_opt_algebraic.py
2019-05-17 15:30:00 -07:00
Dylan Baker
0ed91c772d cherry-ignore: Add more 19.1 patches 2019-05-17 15:28:12 -07:00
Gert Wollny
cab826d5a8 Revert "softpipe/buffer: load only as many components as the the buffer resource type provides"
This reverts commit 865b9ddae4.

The buffer always reports format PIPE_FORMAT_R8_UNORM so with this patch only
one component would be supported. The original issue is still relevant, but
the fix should be different.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 0f598ed7b3)
2019-05-17 15:24:45 -07:00
Jason Ekstrand
93d278a73a anv: Only consider minSampleShading when sampleShadingEnable is set
From the Vulkan 1.1.107 spec:

    Sample shading is enabled for a graphics pipeline:

      - If the interface of the fragment shader entry point of the
        graphics pipeline includes an input variable decorated with
        SampleId or SamplePosition. In this case minSampleShadingFactor
        takes the value 1.0.

      - Else if the sampleShadingEnable member of the
        VkPipelineMultisampleStateCreateInfo structure specified when
        creating the graphics pipeline is set to VK_TRUE. In this case
        minSampleShadingFactor takes the value of
        VkPipelineMultisampleStateCreateInfo::minSampleShading.

    Otherwise, sample shading is considered disabled.

In other words, if sampleShadingEnable is set to VK_FALSE, we should
ignore minSampleShading.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 1c92358bd8)
2019-05-17 15:20:39 -07:00
Samuel Pitoiset
7be21f6575 radv: add a workaround for Monster Hunter World and LLVM 7&8
The load/store optimizer pass doesn't handle WaW hazards correctly
and this is the root cause of the reflection issue with Monster
Hunter World. AFAIK, it's the only game that is affected by this
issue.

This is fixed with LLVM r361008, but we need a workaround for older
LLVM versions unfortunately.

Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit d7501834cd)
2019-05-17 15:20:14 -07:00
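The workaround plumbing (application name, debug flag, LLVM feature string) can be sketched in isolation. The names `DEMO_TM_*`, `demo_options_for_app`, and `demo_build_features` are hypothetical; only the application name, the version check against 0x900, and the feature-string fragments are taken from the diff in this comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option bits, loosely modeled on ac_target_machine_options. */
enum {
    DEMO_TM_SISCHED           = 1u << 0,
    DEMO_TM_NO_LOAD_STORE_OPT = 1u << 1,
};

/* Pick per-application options. llvm_version uses the 0xMMmm encoding
 * seen in the diff (e.g. 0x0800 for LLVM 8). */
static unsigned demo_options_for_app(const char *name, int llvm_version)
{
    assert(name != NULL);

    if (!strcmp(name, "MonsterHunterWorld.exe") && llvm_version < 0x900)
        return DEMO_TM_NO_LOAD_STORE_OPT;
    return 0;
}

/* Build the feature string. Every optional fragment needs its own %s,
 * which is why the real format string gained one when the flag was added. */
static int demo_build_features(char *out, size_t size, unsigned opts)
{
    return snprintf(out, size, "+DumpCode%s%s",
                    opts & DEMO_TM_SISCHED ? ",+si-scheduler" : "",
                    opts & DEMO_TM_NO_LOAD_STORE_OPT ? ",-load-store-opt" : "");
}
```

Keeping the version check next to the application match means the workaround disappears on its own once the build uses a fixed LLVM.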
Nanley Chery
e8d9b33986 anv: Fix some depth buffer sampling cases on ICL+
Don't attempt sampling with HiZ if the sampler lacks support for it. On
ICL, the HW docs state that sampling with HiZ is not supported and that
instances of AUX_HIZ in the RENDER_SURFACE_STATE object will be
interpreted as AUX_NONE.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
(cherry picked from commit 629806b55b)
2019-05-17 15:20:07 -07:00
Caio Marcelo de Oliveira Filho
fe7f221ddb nir: Fix nir_opt_idiv_const when negatives are involved
First, allow the case for negative powers of two.  Then ensure that we
use the absolute value of the non-constant value to calculate the
quotient -- this was hinted in the code by the name 'uq'.

This fixes an issue when 'd' is positive and 'n' is negative.  The
ishr will propagate the negative sign and we'll use nir_ineg() again,
incorrectly.

v2: First version used only ishr, but that isn't sufficient, since it
    never can produce a zero as a result.  (Jason)
    Allow negative powers of two.  (Caio)

Fixes: 74492ebad9 "nir: Add a pass for lowering integer division by constants"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 8a995f2b5e)
2019-05-17 15:20:00 -07:00
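The arithmetic behind this fix can be sketched in plain C. This is an illustration, not the NIR pass itself: `demo_idiv_pow2` is a hypothetical helper that divides by a constant whose absolute value is a power of two, and the INT32_MIN overflow corner is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Divide n by d, where |d| is a power of two, rounding toward zero like
 * C's `/`. The quotient is computed on absolute values with a logical
 * shift and the sign is applied afterwards. An arithmetic shift of n
 * alone rounds toward negative infinity, so it could never turn -1 / 2
 * into 0. Assumes n != INT32_MIN. */
static int32_t demo_idiv_pow2(int32_t n, int32_t d)
{
    uint32_t abs_d = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;
    assert(abs_d != 0 && (abs_d & (abs_d - 1)) == 0);

    /* log2 of the divisor magnitude */
    unsigned shift = 0;
    while ((abs_d >> shift) != 1u)
        shift++;

    uint32_t abs_n = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;
    uint32_t uq = abs_n >> shift;           /* unsigned quotient */

    /* The result is negative when exactly one operand is negative. */
    bool neg = (n < 0) != (d < 0);
    return neg ? -(int32_t)uq : (int32_t)uq;
}
```

For example, `demo_idiv_pow2(-7, 2)` yields -3 and `demo_idiv_pow2(-1, 2)` yields 0, matching C's truncating division.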
Jason Ekstrand
0d645c98f2 intel/fs/ra: Stop adding RA interference to too many SENDS nodes
We only have one node per VGRF so this was adding way too much
interference.  No idea how we didn't catch this before.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311100 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355468050 -> 355543197 (0.02%)
    cycles in affected programs: 2472492 -> 2547639 (3.04%)
    helped: 17
    HURT: 20

Fixes: 014edff0d2 "intel/fs: Add interference between SENDS sources"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 096ad8a809)
2019-05-17 15:19:23 -07:00
Jason Ekstrand
1621a5ab55 intel/fs/ra: Only add dest interference to sources that exist
Fixes: 83dedb6354 "i965: Add src/dst interference for certain"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 88cac12230)
2019-05-17 15:19:16 -07:00
Gert Wollny
2274d43fa6 softpipe/buffer: load only as many components as the the buffer resource type provides
Otherwise we risk reading past the end of the buffer.

In addition, change the loop counters to unsigned to be consistent
with the types.

Fixes: afa8707ba9
    softpipe: add SSBO/shader atomics support.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 865b9ddae4)
2019-05-17 15:19:10 -07:00
Józef Kucia
5cac14f77a radv: clear vertex bindings while resetting command buffer
Only vertex inputs accessed by vertex shader must have valid buffers
bound.

Signed-off-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 5010436e09 "radv: bail out when binding the same vertex buffers"
(cherry picked from commit 24af0f1318)
2019-05-13 11:27:45 -07:00
Marek Olšák
4f8992efac st/mesa: fix 2 crashes in st_tgsi_lower_yuv
src/mesa/state_tracker/st_tgsi_lower_yuv.c:68: void reg_dst(struct
 tgsi_full_dst_register *, const struct tgsi_full_dst_register *, unsigned
 int): assertion "dst->Register.WriteMask" failed

The second crash was due to insufficient allocated size for TGSI
instructions.

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
(cherry picked from commit 83435e748f)
2019-05-13 11:27:40 -07:00
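The first crash can be illustrated with a small model of write-mask handling. This is a sketch under assumptions: `DEMO_MASK_*`, `demo_restrict_mask`, and `demo_emit_components` are hypothetical, and the assertion only mimics the one quoted above.

```c
#include <assert.h>

enum {
    DEMO_MASK_X = 1 << 0,
    DEMO_MASK_Y = 1 << 1,
    DEMO_MASK_Z = 1 << 2,
    DEMO_MASK_W = 1 << 3,
};

/* Narrow a destination mask to one component. Like the assertion in the
 * crash report, this refuses to produce an empty write mask. */
static unsigned demo_restrict_mask(unsigned dst_mask, unsigned component)
{
    unsigned mask = dst_mask & component;
    assert(mask != 0);
    return mask;
}

/* Emit one "instruction" per component, but only for components the
 * destination actually writes. Returns the number emitted. */
static unsigned demo_emit_components(unsigned dst_mask, unsigned emitted[4])
{
    static const unsigned components[4] = {
        DEMO_MASK_X, DEMO_MASK_Y, DEMO_MASK_Z, DEMO_MASK_W,
    };
    unsigned count = 0;

    for (unsigned i = 0; i < 4; i++) {
        if (dst_mask & components[i])
            emitted[count++] = demo_restrict_mask(dst_mask, components[i]);
    }
    return count;
}
```

Without the `if`, a destination that writes only `.xz` would reach `demo_restrict_mask` for `.y` and trip the assertion.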
Kenneth Graunke
aaf9a11f79 i965: Fix memory leaks in brw_upload_cs_work_groups_surface().
This was taking a reference to the 64kB upload buffer and never
returning it, leaking a reference each time this atom triggered.

This leaked lots of 64kB upload BOs, eventually running us out
of VMA space.  This would usually happen when using mpv to watch a
movie, after 20-40 minutes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110134
Fixes: 63d7b33f51 i965/cs: Setup surface binding for gl_NumWorkGroups
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit 3f60810de0)
2019-05-13 11:27:24 -07:00
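The reference-counting pattern behind this leak can be sketched as below. Everything here is a simplified, hypothetical model (`demo_bo`, `demo_upload`, `demo_surface_set`, `demo_emit_surface`) and not the i965 API; it only assumes that the upload helper hands the caller a reference and that the consumer takes its own.

```c
#include <assert.h>
#include <stddef.h>

struct demo_bo { int refcount; };

static void demo_bo_ref(struct demo_bo *bo) { bo->refcount++; }

static void demo_bo_unref(struct demo_bo *bo)
{
    assert(bo->refcount > 0);
    bo->refcount--;
}

/* Returns the upload buffer with one new reference owned by the caller. */
static struct demo_bo *demo_upload(struct demo_bo *upload_bo)
{
    demo_bo_ref(upload_bo);
    return upload_bo;
}

/* The consumer keeps its own reference and releases the previous one. */
struct demo_surface { struct demo_bo *bo; };

static void demo_surface_set(struct demo_surface *surf, struct demo_bo *bo)
{
    demo_bo_ref(bo);
    if (surf->bo)
        demo_bo_unref(surf->bo);
    surf->bo = bo;
}

/* Use the persistent buffer if there is one, otherwise upload. The
 * reference obtained from the upload must be dropped once the surface
 * holds its own; skipping that step leaks one reference per call. */
static void demo_emit_surface(struct demo_surface *surf,
                              struct demo_bo *upload_bo,
                              struct demo_bo *persistent)
{
    struct demo_bo *bo = persistent ? persistent : demo_upload(upload_bo);

    demo_surface_set(surf, bo);

    if (bo != persistent)
        demo_bo_unref(bo);
}
```

In this model the upload buffer's count stays constant no matter how often `demo_emit_surface` runs, which is the property the fix restores.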
Dylan Baker
1221311fb1 cherry-ignore: add patches for panfrost
there is no panfrost in 19.0
2019-05-10 10:31:22 -07:00
Leo Liu
1ad8a0e751 winsys/amdgpu: add VCN JPEG to no user fence group
There is no user fence for JPEG; the bug triggers the
kernel WARN_ON(flags & AMDGPU_FENCE_FLAG_64BIT).

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit ceba9ff294)
2019-05-10 10:30:02 -07:00
Lionel Landwerlin
f4ab855312 anv: Use corresponding type from the vector allocation
We didn't notice this issue much because the two structs share a similar
layout, except for the additional fields...

We ran into that issue in Anv:

==15236== Invalid write of size 8
==15236==    at 0x8CF3939C: anv_state_table_expand_range (anv_allocator.c:211)
==15236==    by 0x8CF394D5: anv_state_table_grow (anv_allocator.c:264)
==15236==    by 0x8CF3967E: anv_state_table_add (anv_allocator.c:312)
==15236==    by 0x8CF3B13C: anv_state_pool_alloc_no_vg (anv_allocator.c:1167)
==15236==    by 0x8CF3B2B0: anv_state_pool_alloc (anv_allocator.c:1190)
==15236==    by 0x8CF60871: alloc_surface_state (anv_image.c:1122)
==15236==    by 0x8CF61FF9: anv_CreateImageView (anv_image.c:1519)
==15236==    by 0x8BCBD2ED: vkCreateImageView (trampoline.c:1358)
==15236==  Address 0x8994ef10 is 0 bytes after a block of size 128 alloc'd
==15236==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15236==    by 0x8D2578E6: u_vector_init (u_vector.c:47)
==15236==    by 0x8CF3929A: anv_state_table_init (anv_allocator.c:168)
==15236==    by 0x8CF3A99A: anv_state_pool_init (anv_allocator.c:921)
==15236==    by 0x8CF56517: anv_CreateDevice (anv_device.c:1909)
==15236==    by 0x8BCB4FBA: terminator_CreateDevice (loader.c:6073)
==15236==    by 0x8DD2CB3D: ??? (in /home/djdeath/.steam/ubuntu12_64/libVkLayer_steam_fossilize.so)
==15236==    by 0x8DF4D241: vkCreateDevice (in /home/djdeath/.steam/ubuntu12_64/steamoverlayvulkanlayer.so)
==15236==    by 0x8BCB35C6: loader_create_device_chain (loader.c:5449)
==15236==    by 0x8BCBC230: vkCreateDevice (trampoline.c:838)

v2: Rename mmap_cleanups to avoid confusion (Caio)

v3: s/fail_mmap_cleanups/fail_cleanups/ (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110648
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit f2f6ac1c08)
2019-05-10 10:29:53 -07:00
Dylan Baker
ab41ddb671 docs: Add SHA256 sums for mesa 19.0.4 2019-05-09 13:45:19 -07:00
25 changed files with 250 additions and 80 deletions

@@ -1 +1 @@
19.0.4
19.0.5

@@ -31,3 +31,10 @@ b031c643491a92a5574c7a4bd659df33f2d89bb6
# These were de-nominated since they don't apply nicley
88105375c978f9de82af8c654051e5aa16d61614
c9358621276ae49162e58d4a16fe37abda6a347f
# These are only for 19.1
c3538ab5702ceeead284c2b5f9e700f3082c8135
d2aa65eb1892f7b300ac24560f9dbda6b600b5a7
78e35df52aa2f7d770f929a0866a0faa89c261a9
0f1b070bad34c46c4bcc6c679fa533bf6b4b79e5
ad2b4aa37806779bdfc15d704940136c3db21eb4

@@ -31,7 +31,8 @@ Compatibility contexts may report a lower version depending on each driver.
<h2>SHA256 checksums</h2>
<pre>
TBD
de361c76bf7aae09219f571b9ae77a34864a1cd9f6ba24c845b18b3cd5e4b9a2 mesa-19.0.4.tar.gz
39f9f32f448d77388ef817c6098d50eb0c1595815ce7e895dec09dd68774ce47 mesa-19.0.4.tar.xz
</pre>

136
docs/relnotes/19.0.5.html Normal file

@@ -0,0 +1,136 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 19.0.5 Release Notes / May 21, 2019</h1>
<p>
Mesa 19.0.5 is a bug fix release which fixes bugs found since the 19.0.4 release.
</p>
<p>
Mesa 19.0.5 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD
</pre>
<h2>New features</h2>
<p>N/A</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109659">Bug 109659</a> - Missing OpenGL symbols in OSMesa Gallium when building with meson</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110134">Bug 110134</a> - SIGSEGV while playing large hevc video in mpv</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110648">Bug 110648</a> - Dota2 will not open using vulkan since 19.0 series</li>
</ul>
<h2>Changes</h2>
<p>Caio Marcelo de Oliveira Filho (2):</p>
<ul>
<li>nir: Fix nir_opt_idiv_const when negatives are involved</li>
<li>nir: Fix clone of nir_variable state slots</li>
</ul>
<p>Charmaine Lee (2):</p>
<ul>
<li>st/mesa: purge framebuffers with current context after unbinding winsys buffers</li>
<li>mesa: unreference current winsys buffers when unbinding winsys buffers</li>
</ul>
<p>Dylan Baker (4):</p>
<ul>
<li>docs: Add SHA256 sums for mesa 19.0.4</li>
<li>cherry-ignore: add patches for panfrost</li>
<li>cherry-ignore: Add more 19.1 patches</li>
<li>bump version to 19.0.5</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>meson: expose glapi through osmesa</li>
</ul>
<p>Gert Wollny (2):</p>
<ul>
<li>softpipe/buffer: load only as many components as the the buffer resource type provides</li>
<li>Revert "softpipe/buffer: load only as many components as the the buffer resource type provides"</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>Revert "nir: add late opt to turn inot/b2f combos back to bcsel"</li>
</ul>
<p>Jason Ekstrand (3):</p>
<ul>
<li>intel/fs/ra: Only add dest interference to sources that exist</li>
<li>intel/fs/ra: Stop adding RA interference to too many SENDS nodes</li>
<li>anv: Only consider minSampleShading when sampleShadingEnable is set</li>
</ul>
<p>Józef Kucia (1):</p>
<ul>
<li>radv: clear vertex bindings while resetting command buffer</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Fix memory leaks in brw_upload_cs_work_groups_surface().</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>winsys/amdgpu: add VCN JPEG to no user fence group</li>
</ul>
<p>Lionel Landwerlin (1):</p>
<ul>
<li>anv: Use corresponding type from the vector allocation</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>st/mesa: fix 2 crashes in st_tgsi_lower_yuv</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>anv: Fix some depth buffer sampling cases on ICL+</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radv: add a workaround for Monster Hunter World and LLVM 7&amp;8</li>
</ul>
</div>
</body>
</html>

@@ -151,13 +151,14 @@ static LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family,
LLVMTargetRef target = ac_get_llvm_target(triple);
snprintf(features, sizeof(features),
"+DumpCode,-fp32-denormals,+fp64-denormals%s%s%s%s%s",
"+DumpCode,-fp32-denormals,+fp64-denormals%s%s%s%s%s%s",
HAVE_LLVM >= 0x0800 ? "" : ",+vgpr-spilling",
tm_options & AC_TM_SISCHED ? ",+si-scheduler" : "",
tm_options & AC_TM_FORCE_ENABLE_XNACK ? ",+xnack" : "",
tm_options & AC_TM_FORCE_DISABLE_XNACK ? ",-xnack" : "",
tm_options & AC_TM_PROMOTE_ALLOCA_TO_SCRATCH ? ",-promote-alloca" : "");
tm_options & AC_TM_PROMOTE_ALLOCA_TO_SCRATCH ? ",-promote-alloca" : "",
tm_options & AC_TM_NO_LOAD_STORE_OPT ? ",-load-store-opt" : "");
LLVMTargetMachineRef tm = LLVMCreateTargetMachine(
target,
triple,

@@ -65,6 +65,7 @@ enum ac_target_machine_options {
AC_TM_CHECK_IR = (1 << 5),
AC_TM_ENABLE_GLOBAL_ISEL = (1 << 6),
AC_TM_CREATE_LOW_OPT = (1 << 7),
AC_TM_NO_LOAD_STORE_OPT = (1 << 8),
};
enum ac_float_mode {

@@ -301,7 +301,6 @@ radv_cmd_buffer_destroy(struct radv_cmd_buffer *cmd_buffer)
static VkResult
radv_reset_cmd_buffer(struct radv_cmd_buffer *cmd_buffer)
{
cmd_buffer->device->ws->cs_reset(cmd_buffer->cs);
list_for_each_entry_safe(struct radv_cmd_buffer_upload, up,
@@ -326,6 +325,8 @@ radv_reset_cmd_buffer(struct radv_cmd_buffer *cmd_buffer)
cmd_buffer->record_result = VK_SUCCESS;
memset(cmd_buffer->vertex_bindings, 0, sizeof(cmd_buffer->vertex_bindings));
for (unsigned i = 0; i < VK_PIPELINE_BIND_POINT_RANGE_SIZE; i++) {
cmd_buffer->descriptors[i].dirty = 0;
cmd_buffer->descriptors[i].valid = 0;

@@ -51,6 +51,7 @@ enum {
RADV_DEBUG_CHECKIR = 0x200000,
RADV_DEBUG_NOTHREADLLVM = 0x400000,
RADV_DEBUG_NOBINNING = 0x800000,
RADV_DEBUG_NO_LOAD_STORE_OPT = 0x1000000,
};
enum {

@@ -466,6 +466,7 @@ static const struct debug_control radv_debug_options[] = {
{"checkir", RADV_DEBUG_CHECKIR},
{"nothreadllvm", RADV_DEBUG_NOTHREADLLVM},
{"nobinning", RADV_DEBUG_NOBINNING},
{"noloadstoreopt", RADV_DEBUG_NO_LOAD_STORE_OPT},
{NULL, 0}
};
@@ -511,6 +512,13 @@ radv_handle_per_app_options(struct radv_instance *instance,
} else if (!strcmp(name, "DOOM_VFR")) {
/* Work around a Doom VFR game bug */
instance->debug_flags |= RADV_DEBUG_NO_DYNAMIC_BOUNDS;
} else if (!strcmp(name, "MonsterHunterWorld.exe")) {
/* Workaround for a WaW hazard when LLVM moves/merges
* load/store memory operations.
* See https://reviews.llvm.org/D61313
*/
if (HAVE_LLVM < 0x900)
instance->debug_flags |= RADV_DEBUG_NO_LOAD_STORE_OPT;
}
}

@@ -612,6 +612,8 @@ shader_variant_create(struct radv_device *device,
tm_options |= AC_TM_SISCHED;
if (options->check_ir)
tm_options |= AC_TM_CHECK_IR;
if (device->instance->debug_flags & RADV_DEBUG_NO_LOAD_STORE_OPT)
tm_options |= AC_TM_NO_LOAD_STORE_OPT;
thread_compiler = !(device->instance->debug_flags & RADV_DEBUG_NOTHREADLLVM);
radv_init_llvm_once();

@@ -151,9 +151,11 @@ nir_variable_clone(const nir_variable *var, nir_shader *shader)
nvar->name = ralloc_strdup(nvar, var->name);
nvar->data = var->data;
nvar->num_state_slots = var->num_state_slots;
nvar->state_slots = ralloc_array(nvar, nir_state_slot, var->num_state_slots);
memcpy(nvar->state_slots, var->state_slots,
var->num_state_slots * sizeof(nir_state_slot));
if (var->num_state_slots) {
nvar->state_slots = ralloc_array(nvar, nir_state_slot, var->num_state_slots);
memcpy(nvar->state_slots, var->state_slots,
var->num_state_slots * sizeof(nir_state_slot));
}
if (var->constant_initializer) {
nvar->constant_initializer =
nir_constant_clone(var->constant_initializer, nvar);

@@ -929,9 +929,6 @@ late_optimizations = [
(('fdot4', a, b), ('fdot_replicated4', a, b), 'options->fdot_replicates'),
(('fdph', a, b), ('fdph_replicated', a, b), 'options->fdot_replicates'),
(('b2f(is_used_more_than_once)', ('inot', 'a@1')), ('bcsel', a, 0.0, 1.0)),
(('fneg(is_used_more_than_once)', ('b2f', ('inot', 'a@1'))), ('bcsel', a, -0.0, -1.0)),
# we do these late so that we don't get in the way of creating ffmas
(('fmin', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', b)), ('fadd', c, ('fmin', a, b))),
(('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', b)), ('fadd', c, ('fmax', a, b))),

@@ -65,15 +65,17 @@ build_umod(nir_builder *b, nir_ssa_def *n, uint64_t d)
static nir_ssa_def *
build_idiv(nir_builder *b, nir_ssa_def *n, int64_t d)
{
uint64_t abs_d = d < 0 ? -d : d;
if (d == 0) {
return nir_imm_intN_t(b, 0, n->bit_size);
} else if (d == 1) {
return n;
} else if (d == -1) {
return nir_ineg(b, n);
} else if (util_is_power_of_two_or_zero64(d)) {
uint64_t abs_d = d < 0 ? -d : d;
nir_ssa_def *uq = nir_ishr(b, n, nir_imm_int(b, util_logbase2_64(abs_d)));
} else if (util_is_power_of_two_or_zero64(abs_d)) {
nir_ssa_def *uq = nir_ushr(b, nir_iabs(b, n),
nir_imm_int(b, util_logbase2_64(abs_d)));
nir_ssa_def *n_neg = nir_ilt(b, n, nir_imm_intN_t(b, 0, n->bit_size));
nir_ssa_def *neg = d < 0 ? nir_inot(b, n_neg) : n_neg;
return nir_bcsel(b, neg, nir_ineg(b, uq), uq);

@@ -116,22 +116,6 @@ is_not_const(nir_alu_instr *instr, unsigned src, UNUSED unsigned num_components,
return !nir_src_is_const(instr->src[src].src);
}
static inline bool
is_used_more_than_once(nir_alu_instr *instr)
{
bool zero_if_use = list_empty(&instr->dest.dest.ssa.if_uses);
bool zero_use = list_empty(&instr->dest.dest.ssa.uses);
if (zero_use && zero_if_use)
return false;
else if (zero_use && list_is_singular(&instr->dest.dest.ssa.if_uses))
return false;
else if (zero_if_use && list_is_singular(&instr->dest.dest.ssa.uses))
return false;
return true;
}
static inline bool
is_used_once(nir_alu_instr *instr)
{

@@ -43,9 +43,9 @@ libosmesa = shared_library(
inc_gallium_drivers,
],
link_depends : osmesa_link_deps,
link_whole : [libosmesa_st],
link_whole : [libosmesa_st, libglapi_static],
link_with : [
libmesa_gallium, libgallium, libglapi_static, libws_null, osmesa_link_with,
libmesa_gallium, libgallium, libws_null, osmesa_link_with,
],
dependencies : [
dep_selinux, dep_thread, dep_clock, dep_unwind,

@@ -386,7 +386,8 @@ static bool amdgpu_cs_has_user_fence(struct amdgpu_cs_context *cs)
cs->ib[IB_MAIN].ip_type != AMDGPU_HW_IP_VCE &&
cs->ib[IB_MAIN].ip_type != AMDGPU_HW_IP_UVD_ENC &&
cs->ib[IB_MAIN].ip_type != AMDGPU_HW_IP_VCN_DEC &&
cs->ib[IB_MAIN].ip_type != AMDGPU_HW_IP_VCN_ENC;
cs->ib[IB_MAIN].ip_type != AMDGPU_HW_IP_VCN_ENC &&
cs->ib[IB_MAIN].ip_type != AMDGPU_HW_IP_VCN_JPEG;
}
static bool amdgpu_cs_has_chaining(struct amdgpu_cs *cs)

@@ -591,7 +591,7 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
*/
foreach_block_and_inst(block, fs_inst, inst, cfg) {
if (inst->dst.file == VGRF && inst->has_source_and_destination_hazard()) {
for (unsigned i = 0; i < 3; i++) {
for (unsigned i = 0; i < inst->sources; i++) {
if (inst->src[i].file == VGRF) {
ra_add_node_interference(g, inst->dst.nr, inst->src[i].nr);
}
@@ -710,14 +710,9 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
if (inst->opcode == SHADER_OPCODE_SEND && inst->ex_mlen > 0 &&
inst->src[2].file == VGRF &&
inst->src[3].file == VGRF &&
inst->src[2].nr != inst->src[3].nr) {
for (unsigned i = 0; i < inst->mlen; i++) {
for (unsigned j = 0; j < inst->ex_mlen; j++) {
ra_add_node_interference(g, inst->src[2].nr + i,
inst->src[3].nr + j);
}
}
}
inst->src[2].nr != inst->src[3].nr)
ra_add_node_interference(g, inst->src[2].nr,
inst->src[3].nr);
}
}

@@ -165,7 +165,7 @@ anv_state_table_init(struct anv_state_table *table,
goto fail_fd;
}
if (!u_vector_init(&table->mmap_cleanups,
if (!u_vector_init(&table->cleanups,
round_to_power_of_two(sizeof(struct anv_state_table_cleanup)),
128)) {
result = vk_error(VK_ERROR_INITIALIZATION_FAILED);
@@ -179,12 +179,12 @@ anv_state_table_init(struct anv_state_table *table,
uint32_t initial_size = initial_entries * ANV_STATE_ENTRY_SIZE;
result = anv_state_table_expand_range(table, initial_size);
if (result != VK_SUCCESS)
goto fail_mmap_cleanups;
goto fail_cleanups;
return VK_SUCCESS;
fail_mmap_cleanups:
u_vector_finish(&table->mmap_cleanups);
fail_cleanups:
u_vector_finish(&table->cleanups);
fail_fd:
close(table->fd);
@@ -195,7 +195,7 @@ static VkResult
anv_state_table_expand_range(struct anv_state_table *table, uint32_t size)
{
void *map;
struct anv_mmap_cleanup *cleanup;
struct anv_state_table_cleanup *cleanup;
/* Assert that we only ever grow the pool */
assert(size >= table->state.end);
@@ -204,11 +204,11 @@ anv_state_table_expand_range(struct anv_state_table *table, uint32_t size)
if (size > BLOCK_POOL_MEMFD_SIZE)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
cleanup = u_vector_add(&table->mmap_cleanups);
cleanup = u_vector_add(&table->cleanups);
if (!cleanup)
return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
*cleanup = ANV_MMAP_CLEANUP_INIT;
*cleanup = ANV_STATE_TABLE_CLEANUP_INIT;
/* Just leak the old map until we destroy the pool. We can't munmap it
* without races or imposing locking on the block allocate fast path. On
@@ -272,12 +272,12 @@ anv_state_table_finish(struct anv_state_table *table)
{
struct anv_state_table_cleanup *cleanup;
u_vector_foreach(cleanup, &table->mmap_cleanups) {
u_vector_foreach(cleanup, &table->cleanups) {
if (cleanup->map)
munmap(cleanup->map, cleanup->size);
}
u_vector_finish(&table->mmap_cleanups);
u_vector_finish(&table->cleanups);
close(table->fd);
}

@@ -377,12 +377,12 @@ populate_wm_prog_key(const struct gen_device_info *devinfo,
* harmless to compute it and then let dead-code take care of it.
*/
if (ms_info->rasterizationSamples > 1) {
key->persample_interp =
key->persample_interp = ms_info->sampleShadingEnable &&
(ms_info->minSampleShading * ms_info->rasterizationSamples) > 1;
key->multisample_fbo = true;
}
key->frag_coord_adds_sample_pos = ms_info->sampleShadingEnable;
key->frag_coord_adds_sample_pos = key->persample_interp;
}
}

@@ -742,7 +742,7 @@ struct anv_state_table {
struct anv_free_entry *map;
uint32_t size;
struct anv_block_state state;
struct u_vector mmap_cleanups;
struct u_vector cleanups;
};
struct anv_state_pool {
@@ -3062,7 +3062,13 @@ anv_can_sample_with_hiz(const struct gen_device_info * const devinfo,
if (!(image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT))
return false;
if (devinfo->gen < 8)
/* Allow this feature on BDW even though it is disabled in the BDW devinfo
* struct. There's documentation which suggests that this feature actually
* reduces performance on BDW, but it has only been observed to help so
* far. Sampling fast-cleared blocks on BDW must also be handled with care
* (see depth_stencil_attachment_compute_aux_usage() for more info).
*/
if (devinfo->gen != 8 && !devinfo->has_sample_with_hiz)
return false;
return image->samples == 1;

@@ -1681,6 +1681,11 @@ brw_upload_cs_work_groups_surface(struct brw_context *brw)
ISL_FORMAT_RAW,
3 * sizeof(GLuint), 1,
RELOC_WRITE);
/* The state buffer now holds a reference to our upload, drop ours. */
if (bo != brw->compute.num_work_groups_bo)
brw_bo_unreference(bo);
brw->ctx.NewDriverState |= BRW_NEW_SURFACES;
}
}

@@ -33,7 +33,8 @@ libosmesa = shared_library(
include_directories : [
inc_include, inc_src, inc_mapi, inc_mesa, inc_gallium, inc_gallium_aux,
],
link_with : [libmesa_classic, libglapi_static, osmesa_link_with],
link_whole : libglapi_static,
link_with : [libmesa_classic, osmesa_link_with],
dependencies : [dep_thread, dep_selinux],
version : '8.0.0',
install : true,

@@ -1760,6 +1760,10 @@ _mesa_make_current( struct gl_context *newCtx,
check_init_viewport(newCtx, drawBuffer->Width, drawBuffer->Height);
}
else {
_mesa_reference_framebuffer(&newCtx->WinSysDrawBuffer, NULL);
_mesa_reference_framebuffer(&newCtx->WinSysReadBuffer, NULL);
}
if (newCtx->FirstTimeCurrent) {
handle_first_current(newCtx);

@@ -1105,10 +1105,17 @@ st_api_make_current(struct st_api *stapi, struct st_context_iface *stctxi,
else {
GET_CURRENT_CONTEXT(ctx);
ret = _mesa_make_current(NULL, NULL, NULL);
if (ctx)
if (ctx) {
/* Before releasing the context, release its associated
* winsys buffers first. Then purge the context's winsys buffers list
* to free the resources of any winsys buffers that no longer have
* an existing drawable.
*/
ret = _mesa_make_current(ctx, NULL, NULL);
st_framebuffers_purge(ctx->st);
}
ret = _mesa_make_current(NULL, NULL, NULL);
}
return ret;

@@ -269,31 +269,39 @@ yuv_to_rgb(struct tgsi_transform_context *tctx,
tctx->emit_instruction(tctx, &inst);
/* DP3 dst.x, tmpA, imm[0] */
inst = dp3_instruction();
reg_dst(&inst.Dst[0], dst, TGSI_WRITEMASK_X);
reg_src(&inst.Src[0], &ctx->tmp[A].src, SWIZ(X, Y, Z, W));
reg_src(&inst.Src[1], &ctx->imm[0], SWIZ(X, Y, Z, W));
tctx->emit_instruction(tctx, &inst);
if (dst->Register.WriteMask & TGSI_WRITEMASK_X) {
inst = dp3_instruction();
reg_dst(&inst.Dst[0], dst, TGSI_WRITEMASK_X);
reg_src(&inst.Src[0], &ctx->tmp[A].src, SWIZ(X, Y, Z, W));
reg_src(&inst.Src[1], &ctx->imm[0], SWIZ(X, Y, Z, W));
tctx->emit_instruction(tctx, &inst);
}
/* DP3 dst.y, tmpA, imm[1] */
inst = dp3_instruction();
reg_dst(&inst.Dst[0], dst, TGSI_WRITEMASK_Y);
reg_src(&inst.Src[0], &ctx->tmp[A].src, SWIZ(X, Y, Z, W));
reg_src(&inst.Src[1], &ctx->imm[1], SWIZ(X, Y, Z, W));
tctx->emit_instruction(tctx, &inst);
if (dst->Register.WriteMask & TGSI_WRITEMASK_Y) {
inst = dp3_instruction();
reg_dst(&inst.Dst[0], dst, TGSI_WRITEMASK_Y);
reg_src(&inst.Src[0], &ctx->tmp[A].src, SWIZ(X, Y, Z, W));
reg_src(&inst.Src[1], &ctx->imm[1], SWIZ(X, Y, Z, W));
tctx->emit_instruction(tctx, &inst);
}
/* DP3 dst.z, tmpA, imm[2] */
inst = dp3_instruction();
reg_dst(&inst.Dst[0], dst, TGSI_WRITEMASK_Z);
reg_src(&inst.Src[0], &ctx->tmp[A].src, SWIZ(X, Y, Z, W));
reg_src(&inst.Src[1], &ctx->imm[2], SWIZ(X, Y, Z, W));
tctx->emit_instruction(tctx, &inst);
if (dst->Register.WriteMask & TGSI_WRITEMASK_Z) {
inst = dp3_instruction();
reg_dst(&inst.Dst[0], dst, TGSI_WRITEMASK_Z);
reg_src(&inst.Src[0], &ctx->tmp[A].src, SWIZ(X, Y, Z, W));
reg_src(&inst.Src[1], &ctx->imm[2], SWIZ(X, Y, Z, W));
tctx->emit_instruction(tctx, &inst);
}
/* MOV dst.w, imm[0].x */
inst = mov_instruction();
reg_dst(&inst.Dst[0], dst, TGSI_WRITEMASK_W);
reg_src(&inst.Src[0], &ctx->imm[3], SWIZ(_, _, _, W));
tctx->emit_instruction(tctx, &inst);
if (dst->Register.WriteMask & TGSI_WRITEMASK_W) {
inst = mov_instruction();
reg_dst(&inst.Dst[0], dst, TGSI_WRITEMASK_W);
reg_src(&inst.Src[0], &ctx->imm[3], SWIZ(_, _, _, W));
tctx->emit_instruction(tctx, &inst);
}
}
static void
@@ -434,7 +442,7 @@ st_tgsi_lower_yuv(const struct tgsi_token *tokens, unsigned free_slots,
/* TODO better job of figuring out how many extra tokens we need..
* this is a pain about tgsi_transform :-/
*/
newlen = tgsi_num_tokens(tokens) + 120;
newlen = tgsi_num_tokens(tokens) + 300;
newtoks = tgsi_alloc_tokens(newlen);
if (!newtoks)
return NULL;