Compare commits

41 Commits

Author SHA1 Message Date
Dylan Baker
7419e553db VERSION: bump for 21.0.2 release 2021-04-07 09:35:30 -07:00
Dylan Baker
ebe8cfc3ec docs: add release notes for 21.0.2 2021-04-07 09:35:07 -07:00
Boyuan Zhang
759ce9f053 frontend/va/image: add pipe flush for vlVaPutImage
To fix a synchronization issue between the multimedia queue and the gfx queue.
Adding a flush call makes the multimedia queue wait for the contents of the gfx
command buffer to be executed, for the case where there is a dependency
between these two queues.

Fixes: 2f50dea218 ("radeonsi: always use a staging texture for linear 1D textures in VRAM")
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 27209e63ea)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9995>
2021-04-06 18:55:42 +00:00
Dave Airlie
a3a2783237 drisw: move zink down the list below the sw drivers.
We don't ever want drisw path picking zink as the driver,
we can revisit this when the penny wrapper work gets further
along.

This selection causes systems with nvidia/intel dual-gpus
to try and pick the intel gpu for rendering in the nvidia
context if there is no nvidia GL driver or accel doesn't work.

This is a partial revert of the original commit.

Fixes: 4a3b42a717 ("drisw: Prefer hardware-layered sw-winsys drivers over pure sw")
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9816>
(cherry picked from commit 3e1698fe1b)
2021-04-06 09:41:56 -07:00
Bas Nieuwenhuizen
b96b1db389 radv: Flush caches for shader read operations.
As part of the fmask expand we very much read from the images as
well ...

Fixes: 8f8d72af55 ("radv: Use access helpers for flushing with meta operations.")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10042>
(cherry picked from commit 57511d1458)
2021-04-06 09:41:56 -07:00
Pierre-Eric Pelloux-Prayer
882d47fae4 mesa/st: fix st_nir_lower_tex_src_plane arguments
st_nir_lower_tex_src_plane expects a mask, not a boolean.

CC: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9931>
(cherry picked from commit 72c54713aa)
2021-04-06 09:41:56 -07:00
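A mask carries one bit per sampler. A boolean `true` converted to an unsigned mask is just `1u`, so it can only ever select sampler 0. The helpers below are hypothetical and do not reproduce the st_nir_lower_tex_src_plane signature. They only illustrate why a boolean cannot stand in for a mask:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is sampler `unit` selected by a per-sampler bitmask? */
static bool sampler_in_mask(uint32_t mask, unsigned unit)
{
   return unit < 32 && ((mask >> unit) & 1u) != 0;
}

/* Hypothetical helper: build a per-sampler bitmask from a list of units. */
static uint32_t mask_from_units(const unsigned *units, unsigned count)
{
   uint32_t mask = 0;
   for (unsigned i = 0; i < count; i++) {
      if (units[i] < 32)
         mask |= UINT32_C(1) << units[i];
   }
   return mask;
}
```

Suppose samplers 2 and 5 need lowering (mask 0x24). Passing `true` would select sampler 0 and skip both of them.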
Pierre-Eric Pelloux-Prayer
1665f478ac nir/lower_tex: ignore texture_index if tex_instr has deref src
texture_index is meaningless when a tex_instr has deref src.
Use var->data.binding instead.

This fixes the incorrect lowering on radeonsi where the same
lowering steps were applied to all tex_instr based on the needs
of the first one (since texture_index is always 0).

CC: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9931>
(cherry picked from commit bc438c91d9)
2021-04-06 09:41:55 -07:00
Adrian Ratiu
6a0f0a34fe docs: docker: minor stale documentation fix
Commits like the following changed the script names and distro tag
but didn't update the documentation. We do not explicitly mention
script names because they will likely change in the future but the
distro tag is less likely to change because it is shared with the
upstream ci-templates repo.

Fixes: af7dca3560 ("ci: Update the ci-templates commit.")
Fixes: 506e9d5fc7 ("gitlab-ci: Rename container install scripts to ...")
Fixes: c6c7652753 ("gitlab-ci: Organize images using new REPO_SUFFIX ...")
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9781>
(cherry picked from commit 8371b75241)
2021-04-06 09:41:55 -07:00
Marek Olšák
11585bb003 radeonsi: disable sparse buffers on gfx7-8
Cc: 20.3 21.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9795>
(cherry picked from commit 8ea685dfc0)
2021-04-06 09:41:55 -07:00
Marek Olšák
816fd2cf5f ac/llvm: don't set unsupported xnack options to fix LLVM crashes on gfx6-8
LLVM prints an error if xnack is unsupported and it uses a global stream
object that is not thread-safe. Since Mesa uses multiple threads to compile
shaders, there is a small chance that it will crash.

Just don't set any xnack options to use LLVM defaults.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4439

Cc: 20.3 21.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9795>
(cherry picked from commit ac78b12e23)
2021-04-06 09:41:55 -07:00
Dylan Baker
a1328ea781 .pick_status.json: Update to 1e0a69afa7 2021-04-06 09:41:55 -07:00
Tapani Pälli
c5c7d6a05a iris: clamp PointWidth in 3DSTATE_SF like i965 does
The values match how MinimumPointWidth and MaximumPointWidth are set up. This
fixes an assert hit in debug builds when packing the struct with a value
too large for genxml.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9942>
(cherry picked from commit b2af419391)
2021-04-06 09:41:55 -07:00
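The fix clamps the point width before it is packed into 3DSTATE_SF. The sketch below is minimal and rests on two assumptions: the field is unsigned 8.3 fixed point, and its range runs from 0.125 to 255.875. Both the limits and the packing helper are illustrative. They are not taken from the iris source or from genxml.

```c
#include <stdint.h>

/* Assumed limits: the smallest non-zero and the largest value of an unsigned
 * 8.3 fixed-point field. Illustrative only, not read from the iris source or
 * the genxml definition of 3DSTATE_SF. */
#define POINT_WIDTH_MIN 0.125f
#define POINT_WIDTH_MAX 255.875f

/* Clamp a requested point width into the packable range. The first test is
 * written as a negated comparison so that NaN also ends up at the minimum. */
static float clamp_point_width(float width)
{
   if (!(width >= POINT_WIDTH_MIN))
      return POINT_WIDTH_MIN;
   if (width > POINT_WIDTH_MAX)
      return POINT_WIDTH_MAX;
   return width;
}

/* Pack the clamped width as unsigned 8.3 fixed point (value * 8, rounded to
 * nearest). After clamping the result always fits in 11 bits. */
static uint32_t pack_point_width_u8_3(float width)
{
   return (uint32_t)(clamp_point_width(width) * 8.0f + 0.5f);
}
```

Without the clamp, a width of 1000.0 would pack to 8000. That does not fit in 11 bits, which is the kind of overflow a genxml packing assert catches in debug builds.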
Charmaine Lee
99a47874de gallivm: increase size of texture target enum bitfield
Need to bump up the size of texture target bitfield for MSVC.

Fixes: 0ce7c4a7c9 ("gallivm: Use the proper enum for the texture target bitfield.")

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9928>
(cherry picked from commit a442e3ff55)
2021-04-06 09:41:55 -07:00
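Some background on why MSVC needs the wider field. MSVC treats bit-fields declared with an enum type as signed. An N-bit field therefore only holds values up to 2^(N-1) - 1 there. GCC and Clang typically treat such a field as unsigned when the enum has no negative enumerators.

The sketch below uses a hypothetical nine-value enum, loosely modeled on a texture-target enum, to show the extra bit this costs. It says nothing about the actual width used in gallivm.

```c
/* Hypothetical enum with nine values (0..8); not Gallium's definition. */
enum tex_target {
   TEX_BUFFER,
   TEX_1D,
   TEX_2D,
   TEX_3D,
   TEX_CUBE,
   TEX_RECT,
   TEX_1D_ARRAY,
   TEX_2D_ARRAY,
   TEX_CUBE_ARRAY, /* == 8, the largest value that must be stored */
};

/* Bits needed to store 0..max_value in an unsigned bit-field. */
static unsigned bits_needed_unsigned(unsigned max_value)
{
   unsigned bits = 1;
   while (bits < 32 && (max_value >> bits) != 0)
      bits++;
   return bits;
}

/* Bits needed when the compiler treats the field as signed (as MSVC does for
 * enum-typed bit-fields): one extra bit for the sign. */
static unsigned bits_needed_signed(unsigned max_value)
{
   return bits_needed_unsigned(max_value) + 1;
}

/* An explicitly signed 5-bit field mimicking MSVC's behavior: it holds
 * -16..15, so 8 survives a round trip. A 4-bit signed field holds only -8..7. */
struct sampler_key {
   signed int target : 5;
};
```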
Dylan Baker
ed60dec381 .pick_status.json: Update to fb5615af40 2021-04-06 09:41:55 -07:00
Erik Faye-Lund
d30cea2b9b compiler/glsl: avoid null-pointer deref
When we encounter a bindless image here, lower_deref returns a
NULL-pointer, and calling record_images_used will try to dereference
that NULL-pointer.

So let's dig out the var from the source instruction instead of the
result of the lowering.

Fixes: 5910c938a2 ("nir/glsl: gather bitmask of images used by program")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9895>
(cherry picked from commit 89a04a54c4)
2021-03-30 11:06:52 -07:00
Icecream95
5bcbe14854 pipe-loader,gallium/drm: Fix the kmsro pipe_loader target
Include drm_helper.h to define the driver descriptor again, but with a
new define GALLIUM_KMSRO_ONLY to disable defining descriptors for the
drivers that kmsro uses.

Fixes clinfo on Panfrost.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4002
Fixes: 9ec28b8d22 ("gallium/drm: Deduplicate screen creation for the dynamic (clover) pipe loader.")
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9380>
(cherry picked from commit 06a883cfe5)
2021-03-30 09:43:06 -07:00
Lionel Landwerlin
da38b604e3 intel/fs/copy_prop: check stride constraints with actual final type
In some cases we will change the type of the destination register of
an instruction. This is the type we should use to verify that we're
allowed to do the replacement.

Otherwise we can hit restrictions on CHV and upcoming Xe-Hp for
instance where the copy propagation transforms this :

send(16) (mlen: 2) vgrf10:UD, 0u, 0u, vgrf35:D, null:UD
mov(16) vgrf11:UW, vgrf10<2>:UW
mov(16) vgrf12:UW, vgrf10+0.2<2>:UW
mov(16) vgrf15:HF, |vgrf11|:HF
mov(16) vgrf16:HF, |vgrf12|:HF
mov(8) vgrf41<2>:UW, vgrf15+0.0:UW group0
mov(8) vgrf42<2>:UW, vgrf15+0.16:UW group8
mov(8) vgrf45<2>:UW, vgrf16+0.0:UW group0
mov(8) vgrf46<2>:UW, vgrf16+0.16:UW group8

into this :

send(16) (mlen: 2) vgrf10:UD, 0u, 0u, vgrf35:D, null:UD
mov(8) vgrf41<2>:HF, |vgrf10+0.0|<2>:HF group0
mov(8) vgrf42<2>:HF, |vgrf10+1.0|<2>:HF group8
mov(8) vgrf45<2>:HF, |vgrf10+0.2|<2>:HF group0
mov(8) vgrf46<2>:HF, |vgrf10+1.2|<2>:HF group8

Because of the floating point use, stride and offsets should be the
same.

v2: Fix final destination type selection (Curro)

v3: constify (Curro)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9832>
(cherry picked from commit aa53665fda)
2021-03-30 09:43:05 -07:00
Gert Wollny
540172fa43 r600: don't set an index_bias for indirect draw calls
The indirect draw call already encodes the index bias so that no
additional encoding in the hardware is needed in this case.

This fixes a regression with a number of tests from
   dEQP-GLES31.functional.draw_indirect.random.*

Fixes: c6c532faa8
  "gallium/u_vbuf: use updated pipe_draw_start_count while using draw_vbo"

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9877>
(cherry picked from commit acdf1a1234)
2021-03-30 09:43:04 -07:00
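For an indexed indirect draw, the bias lives in the indirect record itself, as `baseVertex` in the layout the OpenGL (ES) specification defines for glDrawElementsIndirect. A minimal sketch of the vertex-fetch arithmetic follows. `fetched_vertex` is a hypothetical helper, not r600 code.

```c
#include <stdint.h>

/* Layout of an indexed indirect draw record as specified for
 * glDrawElementsIndirect. */
typedef struct {
   uint32_t count;
   uint32_t instanceCount;
   uint32_t firstIndex;
   int32_t  baseVertex;
   uint32_t baseInstance; /* reservedMustBeZero in OpenGL ES 3.1 */
} DrawElementsIndirectCommand;

/* Vertex index fetched for the i-th element of an indirect draw: the bias
 * comes from the command record, not from CPU-side draw state. */
static int64_t fetched_vertex(const uint32_t *indices,
                              const DrawElementsIndirectCommand *cmd,
                              uint32_t i)
{
   return (int64_t)indices[cmd->firstIndex + i] + cmd->baseVertex;
}
```

Programming a separate index bias on top of this would presumably offset every fetched vertex a second time. That is the extra encoding the commit removes for indirect draws.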
Dylan Baker
ca86b94e55 .pick_status.json: Update to 3c64c090e0 2021-03-30 09:42:48 -07:00
Dave Airlie
fe9e25b29a util: rework AMD cpu L3 cache affinity code.
This changes how the L3 cache affinity code works out the affinity
masks. It works better with multi-CPU systems and should also be
capable of handling big/little type situations if they appear in
the future.

It now iterates over all CPU cores, gets the core count for each
CPU, and works out the L3_ID from the physical CPU ID, and
the current core's L3 cache. It then tracks how many L3 caches
it has seen and reallocates the affinity masks for each one.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4496
Fixes: d8ea509965 ("util: completely rewrite and do AMD Zen L3 cache pinning correctly")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9782>
(cherry picked from commit 11d2db17c5)
2021-03-29 10:08:35 -07:00
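The grouping described above works in three steps. It assigns a global L3_ID per (physical CPU, L3 cache) pair. It grows the mask array whenever a new L3 cache shows up. It then sets one bit per core.

A minimal sketch of that grouping follows. The real code derives the inputs from CPUID. Here they are passed in as hypothetical arrays, and every name is illustrative rather than Mesa's.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 256 /* CPUs beyond this are ignored in the sketch */
#define MASK_WORDS (MAX_CPUS / 64)

struct l3_affinity {
   unsigned num_L3_caches;
   uint64_t (*masks)[MASK_WORDS]; /* one CPU mask per L3 cache */
   int cpu_to_L3[MAX_CPUS];
};

/* package_id[cpu]: physical CPU (socket) of each core; l3_index[cpu]: which
 * L3 cache of that socket the core uses. Returns 0 on success, -1 on
 * allocation failure; the caller frees a->masks in both cases. */
static int build_l3_affinity(struct l3_affinity *a, const unsigned *package_id,
                             const unsigned *l3_index, unsigned num_cpus)
{
   unsigned seen_package[MAX_CPUS], seen_l3[MAX_CPUS];
   a->num_L3_caches = 0;
   a->masks = NULL;

   for (unsigned cpu = 0; cpu < num_cpus && cpu < MAX_CPUS; cpu++) {
      unsigned id;
      /* Look for an L3 cache already seen with the same (socket, index). */
      for (id = 0; id < a->num_L3_caches; id++) {
         if (seen_package[id] == package_id[cpu] && seen_l3[id] == l3_index[cpu])
            break;
      }
      if (id == a->num_L3_caches) {
         /* New L3 cache: grow the mask array by one zeroed entry. */
         void *grown = realloc(a->masks, (id + 1) * sizeof(*a->masks));
         if (!grown)
            return -1;
         a->masks = grown;
         memset(a->masks[id], 0, sizeof(a->masks[id]));
         seen_package[id] = package_id[cpu];
         seen_l3[id] = l3_index[cpu];
         a->num_L3_caches++;
      }
      a->cpu_to_L3[cpu] = (int)id;
      a->masks[id][cpu / 64] |= UINT64_C(1) << (cpu % 64);
   }
   return 0;
}
```

Keying on the (socket, L3) pair, rather than on the per-socket L3 index alone, keeps caches on different sockets apart on multi-CPU systems.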
Mike Blumenkrantz
b6123cd4d5 lavapipe: fix array texture region copies
these need to use different struct members for copying array textures

the buffer2image variants are already doing the right thing

Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9761>
(cherry picked from commit dfe9bfef9b)
2021-03-29 10:08:30 -07:00
Dylan Baker
f5444d504a .pick_status.json: Update to ee14bec09a 2021-03-29 10:08:20 -07:00
Tony Wasserka
5de93ffed8 aco/isel: Don't emit unsupported i16<->f16 conversion opcodes on GFX6/7
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: b86305bb57 ("nir/algebraic: collapse conversion opcodes (many patterns)")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4357
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
(cherry picked from commit 436922c84a)
2021-03-26 09:52:47 -07:00
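On GFX6/7, the diff further down widens 8- and 16-bit integers to 32 bits. It then converts through f32 (v_cvt_f32_i32 or v_cvt_f32_u32, followed by v_cvt_f16_f32). This should not change results. A 32-bit float has a 24-bit significand, so every 16-bit integer converts to f32 exactly, and the only rounding left is the final f32 to f16 step.

A minimal CPU-side check of that argument is below. The helper names are hypothetical and are not ACO functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low `bits` (8 or 16) bits of value to 32 bits, as the
 * signed widening step does for i2f16. */
static int32_t sign_extend_to_32(uint32_t value, unsigned bits)
{
   uint32_t sign = UINT32_C(1) << (bits - 1);
   uint32_t magnitude_bits = value & (sign - 1);
   return (int32_t)magnitude_bits - ((value & sign) ? (int32_t)sign : 0);
}

/* True if every `bits`-wide integer, widened to 32 bits, survives the
 * conversion to f32 unchanged (signed path for i2f16, unsigned for u2f16). */
static bool widened_int_to_f32_is_exact(unsigned bits)
{
   for (uint32_t raw = 0; raw < (UINT32_C(1) << bits); raw++) {
      int32_t s = sign_extend_to_32(raw, bits);
      if ((int32_t)(float)s != s)
         return false;
      if ((uint32_t)(float)raw != raw)
         return false;
   }
   return true;
}
```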
Simon Ser
9a439ebcac Revert "egl: Don't add hardware device if there is no render node v2."
This reverts commit 5743a36b2b.

Now that _eglAddDevice is always called with the correct software
hint, no need to bail out if the device doesn't have a render node.
On split render/display SoCs, the DRM device won't have a render
node, yet rendering is hardware-accelerated (via kmsro).

Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 5743a36b2b ("egl: Don't add hardware device if there is no render node v2.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4178
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9697>
(cherry picked from commit 1d349a6484)
2021-03-26 09:52:45 -07:00
Simon Ser
f1ec9335a8 egl: only take render nodes into account when listing DRM devices
We don't want to expose an EGL device for display-only DRM devices
(like VKMS). For these DRM devices we have a separate software-rendering
device (the first in the list, always present).

There is a similar check in _eglAddDRMDevice, however it will be
removed in a future commit to allow split render/display devices
to be properly added. We can't figure out whether we're on a split
render/display system before loading the driver.

Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 5743a36b2b ("egl: Don't add hardware device if there is no render node v2.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9697>
(cherry picked from commit e39d72aec2)
2021-03-26 09:52:44 -07:00
Simon Ser
4e4962b464 egl: fix software flag in _eglAddDevice call on DRM
On the EGL DRM platform, call _eglAddDevice with the software flag
set if GBM has loaded a software driver. This allows _eglAddDevice
to distinguish between llvmpipe and kmsro.

This is important on split render/display SoCs: we don't want to
advertise EGL_MESA_device_software on these systems.

Completely drop disp->Options.ForceSoftware, because GBM is
responsible for choosing software rendering and doesn't take this
hint into account.

Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 5743a36b2b ("egl: Don't add hardware device if there is no render node v2.")
References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4178
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9697>
(cherry picked from commit 08a51770bd)
2021-03-26 09:52:43 -07:00
Pierre-Eric Pelloux-Prayer
bca2aa6e48 mesa/st: fix lower_tex_src_plane in multiple samplers scenario
"plane[0].i32" is the plane being lowered, it's not the sampler we're looking
for.

It worked when there's a single sampler because, e.g. for NV12, plane[0].i32 for
the UV plane would be 1 and the added ":uv" sampler would also land at binding
point 1.

Fixes: 079e5f73d7 ("mesa/st: rewrite src var when lowering tex_src_plane")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9812>
(cherry picked from commit 6298347ec7)
2021-03-26 09:52:43 -07:00
Dylan Baker
d4e0e7c0f0 .pick_status.json: Update to a7c0cf500b 2021-03-26 09:52:35 -07:00
Icecream95
ffd661d50b panfrost: Disable early-z when alpha test is used
Fixes rendering artefacts in Minetest on Midgard.

Fixes: 275277a2b4 ("panfrost: Implement alpha testing natively")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9676>
(cherry picked from commit ae62fb3737)

Conflicts:
	src/gallium/drivers/panfrost/pan_cmdstream.c
2021-03-25 11:00:53 -07:00
Mike Blumenkrantz
a6a79fb31e lavapipe: fix CmdCopyQueryPoolResults for partial pipeline statistics queries
if this isn't a query for all pipeline statistics, the bits that are
set need to be individually copied in increasing order

Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9813>
(cherry picked from commit 4ad5bfd1bd)
2021-03-25 10:38:01 -07:00
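The Vulkan specification writes the results of a pipeline statistics query in order of increasing bit value of the enabled statistic flags. A partial query is therefore a packed copy of only the enabled counters.

A minimal sketch follows. It assumes a hypothetical `all_stats[]` array with one counter per flag bit. lavapipe's internal storage may be laid out differently.

```c
#include <stdint.h>

/* Copy only the enabled pipeline statistics into dst, packed and ordered by
 * increasing bit position. all_stats[] holds one counter per statistic bit
 * (hypothetical layout). Returns the number of values written. */
static unsigned copy_enabled_stats(uint64_t *dst, const uint64_t *all_stats,
                                   unsigned num_stats, uint32_t enabled_mask)
{
   unsigned n = 0;
   for (unsigned bit = 0; bit < num_stats && bit < 32; bit++) {
      if (enabled_mask & (UINT32_C(1) << bit))
         dst[n++] = all_stats[bit];
   }
   return n;
}
```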
Mike Blumenkrantz
090239c244 util/bitscan: add u_foreach_bit macros
this is a standardized (and very slightly improved for usability) version
of the macro that has been copied into every vulkan driver

includes fixup from Rob Clark <robclark@freedesktop.org>

Reviewed-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9191>
(cherry picked from commit e7c7150d63)
2021-03-25 10:37:59 -07:00
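The commit message does not show the macro itself. For illustration only, here is the usual shape of such a bit-iteration macro. It is a hypothetical `foreach_set_bit`, not Mesa's u_foreach_bit definition. It visits set bits in increasing order and evaluates its mask argument once.

```c
#include <stdint.h>

/* Index of the lowest set bit of a non-zero value (portable stand-in for a
 * count-trailing-zeros builtin). */
static inline unsigned lowest_set_bit(uint32_t v)
{
   unsigned i = 0;
   while (!(v & 1u)) {
      v >>= 1;
      i++;
   }
   return i;
}

/* Hypothetical macro: iterates b over the set bits of dword in increasing
 * order. The mask is copied once, and each pass clears its lowest set bit. */
#define foreach_set_bit(b, dword)                                  \
   for (uint32_t b##_remaining = (dword), b = 0;                   \
        b##_remaining && ((b = lowest_set_bit(b##_remaining)), 1); \
        b##_remaining &= b##_remaining - 1)

/* Example user: write the indices of the set bits to out[] and return how
 * many there were. */
static unsigned collect_set_bits(uint32_t mask, unsigned *out)
{
   unsigned n = 0;
   foreach_set_bit(b, mask)
      out[n++] = b;
   return n;
}
```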
Rhys Perry
2ac46f95bd aco: implement image_deref_samples
It used to be that this intrinsic was never created and texture
instructions were always used.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 50881d59e6 ("compiler/spirv: fix image sample queries")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9686>
(cherry picked from commit 27e2f82f17)
2021-03-25 10:36:42 -07:00
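The get_image_samples() helper in the aco_isel diff further down reads dword 3 of the image descriptor. That dword holds a 4-bit log2 sample count at bit 16 and a 4-bit resource type at bit 28. Types 14 and above are treated as multisampled. With robust buffer access enabled, an all-zero dword 1 marks a null descriptor, and 0 is returned instead of 1.

Below is a CPU-side model of that instruction sequence. The names are hypothetical and the code is not part of ACO.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors s_bfe_u32: extract `width` (< 32) bits starting at `offset`. */
static uint32_t bfe_u32(uint32_t value, unsigned offset, unsigned width)
{
   return (value >> offset) & ((UINT32_C(1) << width) - 1);
}

/* Model of the get_image_samples() sequence; dword1 and dword3 are the
 * second and fourth descriptor dwords. */
static uint32_t image_samples(uint32_t dword1, uint32_t dword3,
                              bool robust_buffer_access)
{
   uint32_t samples_log2 = bfe_u32(dword3, 16, 4);
   uint32_t samples = UINT32_C(1) << samples_log2;
   uint32_t type = bfe_u32(dword3, 28, 4);

   uint32_t default_sample = 1;
   if (robust_buffer_access)
      default_sample = dword1 > 0 ? 1 : 0; /* all-zero dword1: null descriptor */

   /* Types 14 and above are treated as multisampled. */
   return type >= 14 ? samples : default_sample;
}
```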
Mike Blumenkrantz
f0b620307e lavapipe: use the passed offset for CmdCopyQueryPoolResults
this avoids overwriting buffer[0] on every copy

Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9813>
(cherry picked from commit e20aebb83c)
2021-03-25 10:36:42 -07:00
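vkCmdCopyQueryPoolResults places the result for the i-th copied query at `dstOffset + i * stride` in the destination buffer. Ignoring the offset sends the first result to byte 0 on every call.

A minimal sketch of the intended placement is below. It assumes 64-bit results (VK_QUERY_RESULT_64_BIT) and a CPU-visible buffer, and both helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of the i-th copied query's result. */
static size_t query_result_offset(size_t dst_offset, size_t stride, uint32_t i)
{
   return dst_offset + (size_t)i * stride;
}

/* Write 64-bit results into a CPU-visible destination buffer. */
static void copy_results(uint8_t *buffer, size_t dst_offset, size_t stride,
                         const uint64_t *results, uint32_t query_count)
{
   for (uint32_t i = 0; i < query_count; i++)
      memcpy(buffer + query_result_offset(dst_offset, stride, i),
             &results[i], sizeof(uint64_t));
}
```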
Dylan Baker
8d32c55d93 .pick_status.json: Mark 75951a44ee as backported 2021-03-25 10:36:42 -07:00
Mike Blumenkrantz
2733a9c712 util/set: stop leaking u32 key sets which pass a mem ctx
Fixes: 10a7682413 ("util: add _mesa_set_create_u32_keys where keys are not pointers")

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9810>
(cherry picked from commit 5ecad3cb44)
2021-03-25 09:43:56 -07:00
Dylan Baker
3260a85b5c .pick_status.json: Update to 8e43abcd2c 2021-03-25 09:43:56 -07:00
Michel Dänzer
aa8bff051e Revert "glsl/test: Don't run whitespace tests in parallel"
This reverts commit c60cea0daa.

Didn't have the intended effect, and slowed down the meson test run.

Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9528>
(cherry picked from commit 5057f14cba)
2021-03-24 15:51:13 -07:00
Michel Dänzer
8d9ec9cd11 intel/tools: Use subprocess.Popen to read output directly from a pipe
Instead of using tempfiles to communicate between child & parent
process. The latter sometimes resulted in hitting the meson timeout if
there was high filesystem pressure.

Fixes: ccaa5b034f "intel/tools: rewrite run-test.sh in python"
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9528>
(cherry picked from commit 05bf12ccb6)
2021-03-24 15:51:13 -07:00
Dave Airlie
e37442f1b8 lavapipe: fix templated descriptor updates
The template path was buggy but CTS only tested it with Vulkan 1.1 enabled.

It was just missing the dstArrayElement offset.

Fixes: 41f7fa273d ("lavapipe: add support for VK_KHR_descriptor_update_template")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9675>
(cherry picked from commit 833847603b)
2021-03-24 15:51:06 -07:00
Dylan Baker
770b0185ab .pick_status.json: Update to 9be24c89c8 2021-03-24 15:51:01 -07:00
Dylan Baker
63267e018d docs: Add 21.0.1 hashes 2021-03-24 15:50:15 -07:00
38 changed files with 5713 additions and 184 deletions

File diff suppressed because it is too large

@@ -1 +1 @@
-21.0.1
+21.0.2

@@ -8,10 +8,9 @@ VK-GL-CTS, on the shared GitLab runners provided by `freedesktop
 Software architecture
 ---------------------
 
-The Docker containers are rebuilt from the debian-install.sh script
-when DEBIAN\_TAG is changed in .gitlab-ci.yml, and
-debian-test-install.sh when DEBIAN\_ARM64\_TAG is changed in
-.gitlab-ci.yml. The resulting images are around 500MB, and are
+The Docker containers are rebuilt using the shell scripts under
+.gitlab-ci/container/ when the FDO\_DISTRIBUTION\_TAG changes in
+.gitlab-ci.yml. The resulting images are around 1 GB, and are
 expected to change approximately weekly (though an individual
 developer working on them may produce many more images while trying to
 come up with a working MR!).

@@ -19,7 +19,7 @@ SHA256 checksum
 
 ::
 
-    TBD.
+    379fc984459394f2ab2d84049efdc3a659869dc1328ce72ef0598506611712bb mesa-21.0.1.tar.xz
 
 
 New features

docs/relnotes/21.0.2.rst

@@ -0,0 +1,135 @@
+Mesa 21.0.2 Release Notes / 2021-04-07
+======================================
+
+Mesa 21.0.2 is a bug fix release which fixes bugs found since the 21.0.1 release.
+
+Mesa 21.0.2 implements the OpenGL 4.6 API, but the version reported by
+glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
+glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
+Some drivers don't support all the features required in OpenGL 4.6. OpenGL
+4.6 is **only** available if requested at context creation.
+Compatibility contexts may report a lower version depending on each driver.
+
+Mesa 21.0.2 implements the Vulkan 1.2 API, but the version reported by
+the apiVersion property of the VkPhysicalDeviceProperties struct
+depends on the particular driver being used.
+
+SHA256 checksum
+---------------
+
+::
+
+    TBD.
+
+
+New features
+------------
+
+- None
+
+
+Bug fixes
+---------
+
+- warning: xnack 'Off' was requested for a processor that does not support it! \[AMD VEGAM with LLVM 12.0.0\]
+- Clover doesn't work for kmsro drivers
+- util cpu detection breaks on 128-core AMD machines
+- ACO error with GCN 1 GPU
+- kmsro advertises EGL_MESA_device_software
+
+
+Changes
+-------
+
+Adrian Ratiu (1):
+
+- docs: docker: minor stale documentation fix
+
+Bas Nieuwenhuizen (1):
+
+- radv: Flush caches for shader read operations.
+
+Boyuan Zhang (1):
+
+- frontend/va/image: add pipe flush for vlVaPutImage
+
+Charmaine Lee (1):
+
+- gallivm: increase size of texture target enum bitfield
+
+Dave Airlie (3):
+
+- lavapipe: fix templated descriptor updates
+- util: rework AMD cpu L3 cache affinity code.
+- drisw: move zink down the list below the sw drivers.
+
+Dylan Baker (9):
+
+- docs: Add 21.0.1 hashes
+- .pick_status.json: Update to 9be24c89c8c298069eaa3ff600ba556b9a4557e9
+- .pick_status.json: Update to 8e43abcd2c29366d77fff804a7845b61fb97ca5c
+- .pick_status.json: Mark 75951a44ee9f25d29865f3dd60cdf3b8ce3f7f0c as backported
+- .pick_status.json: Update to a7c0cf500b335069bfe480c947b26052335f897e
+- .pick_status.json: Update to ee14bec09a92e4363ef916d00d4d9baecfb09fa9
+- .pick_status.json: Update to 3c64c090e0d2250d7ee880550f8cbeac0052c8d9
+- .pick_status.json: Update to fb5615af40a5878b127827f80f4185df63933f34
+- .pick_status.json: Update to 1e0a69afa72c61e5f5841db3e5e7f6bb846a0fab
+
+Erik Faye-Lund (1):
+
+- compiler/glsl: avoid null-pointer deref
+
+Gert Wollny (1):
+
+- r600: don't set an index_bias for indirect draw calls
+
+Icecream95 (2):
+
+- panfrost: Disable early-z when alpha test is used
+- pipe-loader,gallium/drm: Fix the kmsro pipe_loader target
+
+Lionel Landwerlin (1):
+
+- intel/fs/copy_prop: check stride constraints with actual final type
+
+Marek Olšák (2):
+
+- ac/llvm: don't set unsupported xnack options to fix LLVM crashes on gfx6-8
+- radeonsi: disable sparse buffers on gfx7-8
+
+Michel Dänzer (2):
+
+- intel/tools: Use subprocess.Popen to read output directly from a pipe
+- Revert "glsl/test: Don't run whitespace tests in parallel"
+
+Mike Blumenkrantz (5):
+
+- util/set: stop leaking u32 key sets which pass a mem ctx
+- lavapipe: use the passed offset for CmdCopyQueryPoolResults
+- util/bitscan: add u_foreach_bit macros
+- lavapipe: fix CmdCopyQueryPoolResults for partial pipeline statistics queries
+- lavapipe: fix array texture region copies
+
+Pierre-Eric Pelloux-Prayer (3):
+
+- mesa/st: fix lower_tex_src_plane in multiple samplers scenario
+- nir/lower_tex: ignore texture_index if tex_instr has deref src
+- mesa/st: fix st_nir_lower_tex_src_plane arguments
+
+Rhys Perry (1):
+
+- aco: implement image_deref_samples
+
+Simon Ser (3):
+
+- egl: fix software flag in \_eglAddDevice call on DRM
+- egl: only take render nodes into account when listing DRM devices
+- Revert "egl: Don't add hardware device if there is no render node v2."
+
+Tapani Pälli (1):
+
+- iris: clamp PointWidth in 3DSTATE_SF like i965 does
+
+Tony Wasserka (1):
+
+- aco/isel: Don't emit unsupported i16<->f16 conversion opcodes on GFX6/7

@@ -2444,11 +2444,24 @@ void visit_alu_instr(isel_context *ctx, nir_alu_instr *instr)
    case nir_op_i2f16: {
       assert(dst.regClass() == v2b);
       Temp src = get_alu_src(ctx, instr->src[0]);
-      if (instr->src[0].src.ssa->bit_size == 8)
-         src = convert_int(ctx, bld, src, 8, 16, true);
-      else if (instr->src[0].src.ssa->bit_size == 64)
+      const unsigned input_size = instr->src[0].src.ssa->bit_size;
+      if (input_size <= 16) {
+         /* Expand integer to the size expected by the uint→float converter used below */
+         unsigned target_size = (ctx->program->chip_class >= GFX8 ? 16 : 32);
+         if (input_size != target_size) {
+            src = convert_int(ctx, bld, src, input_size, target_size, true);
+         }
+      } else if (input_size == 64) {
          src = convert_int(ctx, bld, src, 64, 32, false);
-      bld.vop1(aco_opcode::v_cvt_f16_i16, Definition(dst), src);
+      }
+      if (ctx->program->chip_class >= GFX8) {
+         bld.vop1(aco_opcode::v_cvt_f16_i16, Definition(dst), src);
+      } else {
+         /* GFX7 and earlier do not support direct f16⟷i16 conversions */
+         src = bld.vop1(aco_opcode::v_cvt_f32_i32, bld.def(v1), src);
+         bld.vop1(aco_opcode::v_cvt_f16_f32, Definition(dst), src);
+      }
       break;
    }
    case nir_op_i2f32: {
@@ -2483,11 +2496,24 @@ void visit_alu_instr(isel_context *ctx, nir_alu_instr *instr)
    case nir_op_u2f16: {
       assert(dst.regClass() == v2b);
       Temp src = get_alu_src(ctx, instr->src[0]);
-      if (instr->src[0].src.ssa->bit_size == 8)
-         src = convert_int(ctx, bld, src, 8, 16, false);
-      else if (instr->src[0].src.ssa->bit_size == 64)
+      const unsigned input_size = instr->src[0].src.ssa->bit_size;
+      if (input_size <= 16) {
+         /* Expand integer to the size expected by the uint→float converter used below */
+         unsigned target_size = (ctx->program->chip_class >= GFX8 ? 16 : 32);
+         if (input_size != target_size) {
+            src = convert_int(ctx, bld, src, input_size, target_size, false);
+         }
+      } else if (input_size == 64) {
          src = convert_int(ctx, bld, src, 64, 32, false);
-      bld.vop1(aco_opcode::v_cvt_f16_u16, Definition(dst), src);
+      }
+      if (ctx->program->chip_class >= GFX8) {
+         bld.vop1(aco_opcode::v_cvt_f16_u16, Definition(dst), src);
+      } else {
+         /* GFX7 and earlier do not support direct f16⟷u16 conversions */
+         src = bld.vop1(aco_opcode::v_cvt_f32_u32, bld.def(v1), src);
+         bld.vop1(aco_opcode::v_cvt_f16_f32, Definition(dst), src);
+      }
       break;
    }
    case nir_op_u2f32: {
@@ -2524,22 +2550,46 @@ void visit_alu_instr(isel_context *ctx, nir_alu_instr *instr)
    }
    case nir_op_f2i8:
    case nir_op_f2i16: {
-      if (instr->src[0].src.ssa->bit_size == 16)
-         emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_i16_f16, dst);
-      else if (instr->src[0].src.ssa->bit_size == 32)
+      if (instr->src[0].src.ssa->bit_size == 16) {
+         if (ctx->program->chip_class >= GFX8) {
+            emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_i16_f16, dst);
+         } else {
+            /* GFX7 and earlier do not support direct f16⟷i16 conversions */
+            Temp tmp = bld.tmp(v1);
+            emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_f32_f16, tmp);
+            tmp = bld.vop1(aco_opcode::v_cvt_i32_f32, bld.def(v1), tmp);
+            tmp = convert_int(ctx, bld, tmp, 32, 16, false, (dst.type() == RegType::sgpr) ? Temp() : dst);
+            if (dst.type() == RegType::sgpr) {
+               bld.pseudo(aco_opcode::p_as_uniform, Definition(dst), tmp);
+            }
+         }
+      } else if (instr->src[0].src.ssa->bit_size == 32) {
          emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_i32_f32, dst);
-      else
+      } else {
          emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_i32_f64, dst);
+      }
       break;
    }
    case nir_op_f2u8:
    case nir_op_f2u16: {
-      if (instr->src[0].src.ssa->bit_size == 16)
-         emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_u16_f16, dst);
-      else if (instr->src[0].src.ssa->bit_size == 32)
+      if (instr->src[0].src.ssa->bit_size == 16) {
+         if (ctx->program->chip_class >= GFX8) {
+            emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_u16_f16, dst);
+         } else {
+            /* GFX7 and earlier do not support direct f16⟷u16 conversions */
+            Temp tmp = bld.tmp(v1);
+            emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_f32_f16, tmp);
+            tmp = bld.vop1(aco_opcode::v_cvt_u32_f32, bld.def(v1), tmp);
+            tmp = convert_int(ctx, bld, tmp, 32, 16, false, (dst.type() == RegType::sgpr) ? Temp() : dst);
+            if (dst.type() == RegType::sgpr) {
+               bld.pseudo(aco_opcode::p_as_uniform, Definition(dst), tmp);
+            }
+         }
+      } else if (instr->src[0].src.ssa->bit_size == 32) {
          emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_u32_f32, dst);
-      else
+      } else {
          emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_u32_f64, dst);
+      }
       break;
    }
    case nir_op_f2i32: {
@@ -6456,6 +6506,37 @@ void visit_image_size(isel_context *ctx, nir_intrinsic_instr *instr)
    emit_split_vector(ctx, dst, instr->dest.ssa.num_components);
 }
 
+void get_image_samples(isel_context *ctx, Definition dst, Temp resource)
+{
+   Builder bld(ctx->program, ctx->block);
+   Temp dword3 = emit_extract_vector(ctx, resource, 3, s1);
+   Temp samples_log2 = bld.sop2(aco_opcode::s_bfe_u32, bld.def(s1), bld.def(s1, scc), dword3, Operand(16u | 4u<<16));
+   Temp samples = bld.sop2(aco_opcode::s_lshl_b32, bld.def(s1), bld.def(s1, scc), Operand(1u), samples_log2);
+   Temp type = bld.sop2(aco_opcode::s_bfe_u32, bld.def(s1), bld.def(s1, scc), dword3, Operand(28u | 4u<<16 /* offset=28, width=4 */));
+   Operand default_sample = Operand(1u);
+   if (ctx->options->robust_buffer_access) {
+      /* Extract the second dword of the descriptor, if it's
+       * all zero, then it's a null descriptor.
+       */
+      Temp dword1 = emit_extract_vector(ctx, resource, 1, s1);
+      Temp is_non_null_descriptor = bld.sopc(aco_opcode::s_cmp_gt_u32, bld.def(s1, scc), dword1, Operand(0u));
+      default_sample = Operand(is_non_null_descriptor);
+   }
+   Temp is_msaa = bld.sopc(aco_opcode::s_cmp_ge_u32, bld.def(s1, scc), type, Operand(14u));
+   bld.sop2(aco_opcode::s_cselect_b32, dst, samples, default_sample, bld.scc(is_msaa));
+}
+
+void visit_image_samples(isel_context *ctx, nir_intrinsic_instr *instr)
+{
+   Builder bld(ctx->program, ctx->block);
+   Temp dst = get_ssa_temp(ctx, &instr->dest.ssa);
+   Temp resource = get_sampler_desc(ctx, nir_instr_as_deref(instr->src[0].ssa->parent_instr), ACO_DESC_IMAGE, NULL, true, false);
+   get_image_samples(ctx, Definition(dst), resource);
+}
+
 void visit_load_ssbo(isel_context *ctx, nir_intrinsic_instr *instr)
 {
    Builder bld(ctx->program, ctx->block);
@@ -8060,6 +8141,9 @@ void visit_intrinsic(isel_context *ctx, nir_intrinsic_instr *instr)
    case nir_intrinsic_image_deref_size:
       visit_image_size(ctx, instr);
       break;
+   case nir_intrinsic_image_deref_samples:
+      visit_image_samples(ctx, instr);
+      break;
    case nir_intrinsic_load_ssbo:
       visit_load_ssbo(ctx, instr);
       break;
@@ -9006,25 +9090,7 @@ void visit_tex(isel_context *ctx, nir_tex_instr *instr)
       return get_buffer_size(ctx, resource, get_ssa_temp(ctx, &instr->dest.ssa), true);
    if (instr->op == nir_texop_texture_samples) {
-      Temp dword3 = emit_extract_vector(ctx, resource, 3, s1);
-      Temp samples_log2 = bld.sop2(aco_opcode::s_bfe_u32, bld.def(s1), bld.def(s1, scc), dword3, Operand(16u | 4u<<16));
-      Temp samples = bld.sop2(aco_opcode::s_lshl_b32, bld.def(s1), bld.def(s1, scc), Operand(1u), samples_log2);
-      Temp type = bld.sop2(aco_opcode::s_bfe_u32, bld.def(s1), bld.def(s1, scc), dword3, Operand(28u | 4u<<16 /* offset=28, width=4 */));
-      Operand default_sample = Operand(1u);
-      if (ctx->options->robust_buffer_access) {
-         /* Extract the second dword of the descriptor, if it's
-          * all zero, then it's a null descriptor.
-          */
-         Temp dword1 = emit_extract_vector(ctx, resource, 1, s1);
-         Temp is_non_null_descriptor = bld.sopc(aco_opcode::s_cmp_gt_u32, bld.def(s1, scc), dword1, Operand(0u));
-         default_sample = Operand(is_non_null_descriptor);
-      }
-      Temp is_msaa = bld.sopc(aco_opcode::s_cmp_ge_u32, bld.def(s1, scc), type, Operand(14u));
-      bld.sop2(aco_opcode::s_cselect_b32, Definition(get_ssa_temp(ctx, &instr->dest.ssa)),
-               samples, default_sample, bld.scc(is_msaa));
+      get_image_samples(ctx, Definition(get_ssa_temp(ctx, &instr->dest.ssa)), resource);
       return;
    }

@@ -799,6 +799,7 @@ void init_context(isel_context *ctx, nir_shader *shader)
          case nir_intrinsic_read_invocation:
          case nir_intrinsic_first_invocation:
          case nir_intrinsic_ballot:
+         case nir_intrinsic_image_deref_samples:
             type = RegType::sgpr;
             break;
          case nir_intrinsic_load_sample_id:

@@ -194,13 +194,11 @@ static LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family,
    const char *triple = (tm_options & AC_TM_SUPPORTS_SPILL) ? "amdgcn-mesa-mesa3d" : "amdgcn--";
    LLVMTargetRef target = ac_get_llvm_target(triple);
-   snprintf(features, sizeof(features), "+DumpCode%s%s%s%s%s",
+   snprintf(features, sizeof(features), "+DumpCode%s%s%s",
             LLVM_VERSION_MAJOR >= 11 ? "" : ",-fp32-denormals,+fp64-denormals",
             family >= CHIP_NAVI10 && !(tm_options & AC_TM_WAVE32)
                ? ",+wavefrontsize64,-wavefrontsize32"
                : "",
-            family <= CHIP_NAVI14 && tm_options & AC_TM_FORCE_ENABLE_XNACK ? ",+xnack" : "",
-            family <= CHIP_NAVI14 && tm_options & AC_TM_FORCE_DISABLE_XNACK ? ",-xnack" : "",
             tm_options & AC_TM_PROMOTE_ALLOCA_TO_SCRATCH ? ",-promote-alloca" : "");
 
    LLVMTargetMachineRef tm =

@@ -62,8 +62,6 @@ enum ac_func_attr
enum ac_target_machine_options
{
AC_TM_SUPPORTS_SPILL = (1 << 0),
AC_TM_FORCE_ENABLE_XNACK = (1 << 1),
AC_TM_FORCE_DISABLE_XNACK = (1 << 2),
AC_TM_PROMOTE_ALLOCA_TO_SCRATCH = (1 << 3),
AC_TM_CHECK_IR = (1 << 4),
AC_TM_ENABLE_GLOBAL_ISEL = (1 << 5),

@@ -114,7 +114,9 @@ radv_expand_fmask_image_inplace(struct radv_cmd_buffer *cmd_buffer,
radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer),
VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
cmd_buffer->state.flush_bits |= radv_dst_access_flush(cmd_buffer, VK_ACCESS_SHADER_WRITE_BIT, image);
cmd_buffer->state.flush_bits |=
radv_dst_access_flush(cmd_buffer, VK_ACCESS_SHADER_READ_BIT |
VK_ACCESS_SHADER_WRITE_BIT, image);
for (unsigned l = 0; l < radv_get_layerCount(image, subresourceRange); l++) {
struct radv_image_view iview;

@@ -120,9 +120,10 @@ remove_struct_derefs_prep(nir_deref_instr **p, char **name,
static void
record_images_used(struct shader_info *info,
nir_deref_instr *deref)
nir_intrinsic_instr *instr)
{
nir_variable *var = nir_deref_instr_get_variable(deref);
nir_variable *var =
nir_deref_instr_get_variable(nir_src_as_deref(instr->src[0]));
/* Structs have been lowered already, so get_aoa_size is sufficient. */
const unsigned size =
@@ -302,7 +303,7 @@ lower_intrinsic(nir_intrinsic_instr *instr,
nir_deref_instr *deref =
lower_deref(b, state, nir_src_as_deref(instr->src[0]));
record_images_used(&state->shader->info, deref);
record_images_used(&state->shader->info, instr);
/* don't lower bindless: */
if (!deref)

@@ -86,13 +86,6 @@ if with_any_opengl and with_tests and host_machine.system() != 'windows'
modes += ['valgrind']
endif
# For some unfathomable reason, three out of these four tests often time out
# when running within CI. On the assumption that there is some
# parallelisation badness happening rather than the non-UNIX tests entering
# infinite loops, try just marking them as serial-only.
#
# This should have a negligible impact on runtime since they are quick to
# execute.
foreach m : modes
test(
'glcpp test (@0@)'.format(m),
@@ -104,7 +97,6 @@ if with_any_opengl and with_tests and host_machine.system() != 'windows'
],
suite : ['compiler', 'glcpp'],
timeout: 60,
is_parallel: false,
)
endforeach
endif

@@ -287,16 +287,17 @@ static void
convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex,
nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v,
nir_ssa_def *a,
const nir_lower_tex_options *options)
const nir_lower_tex_options *options,
unsigned texture_index)
{
const float *offset_vals;
const nir_const_value_3_4 *m;
assert((options->bt709_external & options->bt2020_external) == 0);
if (options->bt709_external & (1 << tex->texture_index)) {
if (options->bt709_external & (1u << texture_index)) {
m = &bt709_csc_coeffs;
offset_vals = bt709_csc_offsets;
} else if (options->bt2020_external & (1 << tex->texture_index)) {
} else if (options->bt2020_external & (1u << texture_index)) {
m = &bt2020_csc_coeffs;
offset_vals = bt2020_csc_offsets;
} else {
@@ -327,7 +328,8 @@ convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex,
static void
lower_y_uv_external(nir_builder *b, nir_tex_instr *tex,
const nir_lower_tex_options *options)
const nir_lower_tex_options *options,
unsigned texture_index)
{
b->cursor = nir_after_instr(&tex->instr);
@@ -339,12 +341,14 @@ lower_y_uv_external(nir_builder *b, nir_tex_instr *tex,
nir_channel(b, uv, 0),
nir_channel(b, uv, 1),
nir_imm_float(b, 1.0f),
options);
options,
texture_index);
}
static void
lower_y_u_v_external(nir_builder *b, nir_tex_instr *tex,
const nir_lower_tex_options *options)
const nir_lower_tex_options *options,
unsigned texture_index)
{
b->cursor = nir_after_instr(&tex->instr);
@@ -357,12 +361,14 @@ lower_y_u_v_external(nir_builder *b, nir_tex_instr *tex,
nir_channel(b, u, 0),
nir_channel(b, v, 0),
nir_imm_float(b, 1.0f),
options);
options,
texture_index);
}
static void
lower_yx_xuxv_external(nir_builder *b, nir_tex_instr *tex,
const nir_lower_tex_options *options)
const nir_lower_tex_options *options,
unsigned texture_index)
{
b->cursor = nir_after_instr(&tex->instr);
@@ -374,12 +380,14 @@ lower_yx_xuxv_external(nir_builder *b, nir_tex_instr *tex,
nir_channel(b, xuxv, 1),
nir_channel(b, xuxv, 3),
nir_imm_float(b, 1.0f),
options);
options,
texture_index);
}
static void
lower_xy_uxvx_external(nir_builder *b, nir_tex_instr *tex,
const nir_lower_tex_options *options)
const nir_lower_tex_options *options,
unsigned texture_index)
{
b->cursor = nir_after_instr(&tex->instr);
@@ -391,12 +399,14 @@ lower_xy_uxvx_external(nir_builder *b, nir_tex_instr *tex,
nir_channel(b, uxvx, 0),
nir_channel(b, uxvx, 2),
nir_imm_float(b, 1.0f),
options);
options,
texture_index);
}
static void
lower_ayuv_external(nir_builder *b, nir_tex_instr *tex,
const nir_lower_tex_options *options)
const nir_lower_tex_options *options,
unsigned texture_index)
{
b->cursor = nir_after_instr(&tex->instr);
@@ -407,12 +417,14 @@ lower_ayuv_external(nir_builder *b, nir_tex_instr *tex,
nir_channel(b, ayuv, 1),
nir_channel(b, ayuv, 0),
nir_channel(b, ayuv, 3),
options);
options,
texture_index);
}
static void
lower_xyuv_external(nir_builder *b, nir_tex_instr *tex,
const nir_lower_tex_options *options)
const nir_lower_tex_options *options,
unsigned texture_index)
{
b->cursor = nir_after_instr(&tex->instr);
@@ -423,12 +435,14 @@ lower_xyuv_external(nir_builder *b, nir_tex_instr *tex,
nir_channel(b, xyuv, 1),
nir_channel(b, xyuv, 0),
nir_imm_float(b, 1.0f),
options);
options,
texture_index);
}
static void
lower_yuv_external(nir_builder *b, nir_tex_instr *tex,
const nir_lower_tex_options *options)
const nir_lower_tex_options *options,
unsigned texture_index)
{
b->cursor = nir_after_instr(&tex->instr);
@@ -439,7 +453,8 @@ lower_yuv_external(nir_builder *b, nir_tex_instr *tex,
nir_channel(b, yuv, 1),
nir_channel(b, yuv, 2),
nir_imm_float(b, 1.0f),
options);
options,
texture_index);
}
/*
@@ -1052,38 +1067,45 @@ nir_lower_tex_block(nir_block *block, nir_builder *b,
progress = true;
}
if ((1 << tex->texture_index) & options->lower_y_uv_external) {
lower_y_uv_external(b, tex, options);
unsigned texture_index = tex->texture_index;
int tex_index = nir_tex_instr_src_index(tex, nir_tex_src_texture_deref);
if (tex_index >= 0) {
nir_deref_instr *deref = nir_src_as_deref(tex->src[tex_index].src);
texture_index = nir_deref_instr_get_variable(deref)->data.binding;
}
if ((1u << texture_index) & options->lower_y_uv_external) {
lower_y_uv_external(b, tex, options, texture_index);
progress = true;
}
if ((1 << tex->texture_index) & options->lower_y_u_v_external) {
lower_y_u_v_external(b, tex, options);
if ((1u << texture_index) & options->lower_y_u_v_external) {
lower_y_u_v_external(b, tex, options, texture_index);
progress = true;
}
if ((1 << tex->texture_index) & options->lower_yx_xuxv_external) {
lower_yx_xuxv_external(b, tex, options);
if ((1u << texture_index) & options->lower_yx_xuxv_external) {
lower_yx_xuxv_external(b, tex, options, texture_index);
progress = true;
}
if ((1 << tex->texture_index) & options->lower_xy_uxvx_external) {
lower_xy_uxvx_external(b, tex, options);
if ((1u << texture_index) & options->lower_xy_uxvx_external) {
lower_xy_uxvx_external(b, tex, options, texture_index);
progress = true;
}
if ((1 << tex->texture_index) & options->lower_ayuv_external) {
lower_ayuv_external(b, tex, options);
if ((1u << texture_index) & options->lower_ayuv_external) {
lower_ayuv_external(b, tex, options, texture_index);
progress = true;
}
if ((1 << tex->texture_index) & options->lower_xyuv_external) {
lower_xyuv_external(b, tex, options);
if ((1u << texture_index) & options->lower_xyuv_external) {
lower_xyuv_external(b, tex, options, texture_index);
progress = true;
}
if ((1 << tex->texture_index) & options->lower_yuv_external) {
lower_yuv_external(b, tex, options);
lower_yuv_external(b, tex, options, texture_index);
progress = true;
}
@@ -1097,7 +1119,7 @@ nir_lower_tex_block(nir_block *block, nir_builder *b,
progress = true;
}
if (((1 << tex->texture_index) & options->swizzle_result) &&
if (((1u << texture_index) & options->swizzle_result) &&
!nir_tex_instr_is_query(tex) &&
!(tex->is_shadow && tex->is_new_style_shadow)) {
swizzle_result(b, tex, options->swizzles[tex->texture_index]);
@@ -1105,7 +1127,7 @@ nir_lower_tex_block(nir_block *block, nir_builder *b,
}
/* should be after swizzle so we know which channels are rgb: */
if (((1 << tex->texture_index) & options->lower_srgb) &&
if (((1u << texture_index) & options->lower_srgb) &&
!nir_tex_instr_is_query(tex) && !tex->is_shadow) {
linearize_srgb_result(b, tex);
progress = true;

@@ -718,7 +718,7 @@ dri2_initialize_drm(_EGLDisplay *disp)
goto cleanup;
}
dev = _eglAddDevice(dri2_dpy->fd, disp->Options.ForceSoftware);
dev = _eglAddDevice(dri2_dpy->fd, dri2_dpy->gbm_dri->software);
if (!dev) {
err = "DRI2: failed to find EGLDevice";
goto cleanup;

@@ -109,9 +109,9 @@ static int
_eglAddDRMDevice(drmDevicePtr device, _EGLDevice **out_dev)
{
_EGLDevice *dev;
const int wanted_nodes = 1 << DRM_NODE_RENDER | 1 << DRM_NODE_PRIMARY;
if ((device->available_nodes & wanted_nodes) != wanted_nodes)
if ((device->available_nodes & (1 << DRM_NODE_PRIMARY |
1 << DRM_NODE_RENDER)) == 0)
return -1;
dev = _eglGlobal.DeviceList;
@@ -274,6 +274,9 @@ _eglRefreshDeviceList(void)
num_devs = drmGetDevices2(0, devices, ARRAY_SIZE(devices));
for (int i = 0; i < num_devs; i++) {
if (!(devices[i]->available_nodes & (1 << DRM_NODE_RENDER)))
continue;
ret = _eglAddDRMDevice(devices[i], NULL);
/* Device is not added - error or already present */

@@ -169,7 +169,7 @@ struct lp_static_texture_state
unsigned swizzle_a:3;
/* pipe_texture's state */
enum pipe_texture_target target:4; /**< PIPE_TEXTURE_* */
enum pipe_texture_target target:5; /**< PIPE_TEXTURE_* */
unsigned pot_width:1; /**< is the width a power of two? */
unsigned pot_height:1;
unsigned pot_depth:1;

@@ -60,6 +60,15 @@ const struct drm_driver_descriptor descriptor_name = { \
#endif
#ifdef GALLIUM_KMSRO_ONLY
#undef GALLIUM_V3D
#undef GALLIUM_VC4
#undef GALLIUM_FREEDRENO
#undef GALLIUM_ETNAVIV
#undef GALLIUM_PANFROST
#undef GALLIUM_LIMA
#endif
#ifdef GALLIUM_I915
#include "i915/drm/i915_drm_public.h"
#include "i915/i915_public.h"

@@ -81,9 +81,6 @@ sw_screen_create(struct sw_winsys *winsys)
UNUSED bool only_sw = env_var_as_boolean("LIBGL_ALWAYS_SOFTWARE", false);
const char *drivers[] = {
debug_get_option("GALLIUM_DRIVER", ""),
#if defined(GALLIUM_ZINK)
only_sw ? "" : "zink",
#endif
#if defined(GALLIUM_D3D12)
only_sw ? "" : "d3d12",
#endif
@@ -95,6 +92,9 @@ sw_screen_create(struct sw_winsys *winsys)
#endif
#if defined(GALLIUM_SWR)
"swr",
#endif
#if defined(GALLIUM_ZINK)
only_sw ? "" : "zink",
#endif
};

@@ -86,9 +86,6 @@ sw_screen_create(struct sw_winsys *winsys)
UNUSED bool only_sw = env_var_as_boolean("LIBGL_ALWAYS_SOFTWARE", false);
const char *drivers[] = {
debug_get_option("GALLIUM_DRIVER", ""),
#if defined(GALLIUM_ZINK)
only_sw ? "" : "zink",
#endif
#if defined(GALLIUM_D3D12)
only_sw ? "" : "d3d12",
#endif
@@ -100,6 +97,9 @@ sw_screen_create(struct sw_winsys *winsys)
#endif
#if defined(GALLIUM_SWR)
"swr",
#endif
#if defined(GALLIUM_ZINK)
only_sw ? "" : "zink",
#endif
};

@@ -1761,7 +1761,7 @@ iris_create_rasterizer_state(struct pipe_context *ctx,
sf.SmoothPointEnable = (state->point_smooth || state->multisample) &&
!state->point_quad_rasterization;
sf.PointWidthSource = state->point_size_per_vertex ? Vertex : State;
sf.PointWidth = state->point_size;
sf.PointWidth = CLAMP(state->point_size, 0.125f, 255.875f);
if (state->flatshade_first) {
sf.TriangleFanProvokingVertexSelect = 1;

@@ -436,7 +436,8 @@ panfrost_prepare_midgard_fs_state(struct panfrost_context *ctx,
} else {
/* Reasons to disable early-Z from a shader perspective */
bool late_z = fs->can_discard || fs->writes_global ||
fs->writes_depth || fs->writes_stencil;
fs->writes_depth || fs->writes_stencil ||
(zsa->alpha_func != MALI_FUNC_ALWAYS);
/* If either depth or stencil is enabled, discard matters */
bool zs_enabled =

@@ -2210,7 +2210,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info
}
index_bias = info->index_bias;
} else {
index_bias = draws[0].start;
index_bias = indirect ? 0 : draws[0].start;
}
/* Set the index offset and primitive restart. */

@@ -229,7 +229,9 @@ static int si_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
return LLVM_VERSION_MAJOR < 9 && !sscreen->info.has_unaligned_shader_loads;
case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE:
return sscreen->info.has_sparse_vm_mappings ? RADEON_SPARSE_PAGE_SIZE : 0;
/* Gfx8 (Polaris11) hangs, so don't enable this on Gfx8 and older chips. */
return sscreen->info.chip_class >= GFX9 &&
sscreen->info.has_sparse_vm_mappings ? RADEON_SPARSE_PAGE_SIZE : 0;
case PIPE_CAP_UMA:
case PIPE_CAP_PREFER_IMM_ARRAYS_AS_CONSTBUF:

@@ -140,8 +140,6 @@ void si_init_compiler(struct si_screen *sscreen, struct ac_llvm_compiler *compil
enum ac_target_machine_options tm_options =
(sscreen->debug_flags & DBG(GISEL) ? AC_TM_ENABLE_GLOBAL_ISEL : 0) |
(sscreen->info.chip_class <= GFX8 ? AC_TM_FORCE_DISABLE_XNACK :
sscreen->info.chip_class <= GFX10 ? AC_TM_FORCE_ENABLE_XNACK : 0) |
(!sscreen->llvm_has_working_vgpr_indexing ? AC_TM_PROMOTE_ALLOCA_TO_SCRATCH : 0) |
(sscreen->debug_flags & DBG(CHECK_IR) ? AC_TM_CHECK_IR : 0) |
(create_low_opt_compiler ? AC_TM_CREATE_LOW_OPT : 0);

@@ -567,11 +567,12 @@ void lvp_UpdateDescriptorSetWithTemplate(VkDevice _device,
struct lvp_descriptor *desc =
&set->descriptors[bind_layout->descriptor_index];
for (j = 0; j < entry->descriptorCount; ++j) {
unsigned idx = j + entry->dstArrayElement;
switch (entry->descriptorType) {
case VK_DESCRIPTOR_TYPE_SAMPLER: {
LVP_FROM_HANDLE(lvp_sampler, sampler,
*(VkSampler *)pSrc);
desc[j] = (struct lvp_descriptor) {
desc[idx] = (struct lvp_descriptor) {
.type = VK_DESCRIPTOR_TYPE_SAMPLER,
.info.sampler = sampler,
};
@@ -579,7 +580,7 @@ void lvp_UpdateDescriptorSetWithTemplate(VkDevice _device,
}
case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER: {
VkDescriptorImageInfo *info = (VkDescriptorImageInfo *)pSrc;
desc[j] = (struct lvp_descriptor) {
desc[idx] = (struct lvp_descriptor) {
.type = entry->descriptorType,
.info.iview = lvp_image_view_from_handle(info->imageView),
.info.sampler = lvp_sampler_from_handle(info->sampler),
@@ -591,7 +592,7 @@ void lvp_UpdateDescriptorSetWithTemplate(VkDevice _device,
case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT: {
LVP_FROM_HANDLE(lvp_image_view, iview,
((VkDescriptorImageInfo *)pSrc)->imageView);
desc[j] = (struct lvp_descriptor) {
desc[idx] = (struct lvp_descriptor) {
.type = entry->descriptorType,
.info.iview = iview,
};
@@ -601,7 +602,7 @@ void lvp_UpdateDescriptorSetWithTemplate(VkDevice _device,
case VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER: {
LVP_FROM_HANDLE(lvp_buffer_view, bview,
*(VkBufferView *)pSrc);
desc[j] = (struct lvp_descriptor) {
desc[idx] = (struct lvp_descriptor) {
.type = entry->descriptorType,
.info.buffer_view = bview,
};
@@ -613,7 +614,7 @@ void lvp_UpdateDescriptorSetWithTemplate(VkDevice _device,
case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC: {
VkDescriptorBufferInfo *info = (VkDescriptorBufferInfo *)pSrc;
desc[j] = (struct lvp_descriptor) {
desc[idx] = (struct lvp_descriptor) {
.type = entry->descriptorType,
.info.offset = info->offset,
.info.buffer = lvp_buffer_from_handle(info->buffer),

@@ -1722,16 +1722,24 @@ static void handle_copy_image(struct lvp_cmd_buffer_entry *cmd,
struct pipe_box src_box;
src_box.x = copycmd->regions[i].srcOffset.x;
src_box.y = copycmd->regions[i].srcOffset.y;
src_box.z = copycmd->regions[i].srcOffset.z + copycmd->regions[i].srcSubresource.baseArrayLayer;
src_box.width = copycmd->regions[i].extent.width;
src_box.height = copycmd->regions[i].extent.height;
src_box.depth = copycmd->regions[i].extent.depth;
if (copycmd->src->bo->target == PIPE_TEXTURE_3D) {
src_box.depth = copycmd->regions[i].extent.depth;
src_box.z = copycmd->regions[i].srcOffset.z;
} else {
src_box.depth = copycmd->regions[i].srcSubresource.layerCount;
src_box.z = copycmd->regions[i].srcSubresource.baseArrayLayer;
}
unsigned dstz = copycmd->dst->bo->target == PIPE_TEXTURE_3D ?
copycmd->regions[i].dstOffset.z :
copycmd->regions[i].dstSubresource.baseArrayLayer;
state->pctx->resource_copy_region(state->pctx, copycmd->dst->bo,
copycmd->regions[i].dstSubresource.mipLevel,
copycmd->regions[i].dstOffset.x,
copycmd->regions[i].dstOffset.y,
copycmd->regions[i].dstOffset.z + copycmd->regions[i].dstSubresource.baseArrayLayer,
dstz,
copycmd->src->bo,
copycmd->regions[i].srcSubresource.mipLevel,
&src_box);
@@ -2096,7 +2104,7 @@ static void handle_copy_query_pool_results(struct lvp_cmd_buffer_entry *cmd,
struct lvp_query_pool *pool = copycmd->pool;
for (unsigned i = copycmd->first_query; i < copycmd->first_query + copycmd->query_count; i++) {
unsigned offset = copycmd->dst->offset + (copycmd->stride * (i - copycmd->first_query));
unsigned offset = copycmd->dst_offset + copycmd->dst->offset + (copycmd->stride * (i - copycmd->first_query));
if (pool->queries[i]) {
if (copycmd->flags & VK_QUERY_RESULT_WITH_AVAILABILITY_BIT)
state->pctx->get_query_result_resource(state->pctx,
@@ -2106,21 +2114,35 @@ static void handle_copy_query_pool_results(struct lvp_cmd_buffer_entry *cmd,
-1,
copycmd->dst->bo,
offset + (copycmd->flags & VK_QUERY_RESULT_64_BIT ? 8 : 4));
state->pctx->get_query_result_resource(state->pctx,
pool->queries[i],
copycmd->flags & VK_QUERY_RESULT_WAIT_BIT,
copycmd->flags & VK_QUERY_RESULT_64_BIT ? PIPE_QUERY_TYPE_U64 : PIPE_QUERY_TYPE_U32,
0,
copycmd->dst->bo,
offset);
if (pool->type == VK_QUERY_TYPE_PIPELINE_STATISTICS) {
unsigned num_results = 0;
unsigned result_size = copycmd->flags & VK_QUERY_RESULT_64_BIT ? 8 : 4;
u_foreach_bit(bit, pool->pipeline_stats)
state->pctx->get_query_result_resource(state->pctx,
pool->queries[i],
copycmd->flags & VK_QUERY_RESULT_WAIT_BIT,
copycmd->flags & VK_QUERY_RESULT_64_BIT ? PIPE_QUERY_TYPE_U64 : PIPE_QUERY_TYPE_U32,
bit,
copycmd->dst->bo,
offset + num_results++ * result_size);
} else {
state->pctx->get_query_result_resource(state->pctx,
pool->queries[i],
copycmd->flags & VK_QUERY_RESULT_WAIT_BIT,
copycmd->flags & VK_QUERY_RESULT_64_BIT ? PIPE_QUERY_TYPE_U64 : PIPE_QUERY_TYPE_U32,
0,
copycmd->dst->bo,
offset);
}
} else {
/* if no queries emitted yet, just reset the buffer to 0 so avail is reported correctly */
if (copycmd->flags & VK_QUERY_RESULT_WITH_AVAILABILITY_BIT) {
struct pipe_transfer *src_t;
uint32_t *map;
struct pipe_box box = {};
box.width = copycmd->stride * copycmd->query_count;
struct pipe_box box = {0};
box.x = offset;
box.width = copycmd->stride;
box.height = 1;
box.depth = 1;
map = state->pctx->transfer_map(state->pctx,

@@ -696,6 +696,7 @@ vlVaPutImage(VADriverContextP ctx, VASurfaceID surface, VAImageID image,
}
}
}
drv->pipe->flush(drv->pipe, NULL, 0);
mtx_unlock(&drv->mutex);
return VA_STATUS_SUCCESS;

@@ -2,3 +2,5 @@
#include "target-helpers/inline_debug_helper.h"
#include "frontend/drm_driver.h"
#include "kmsro/drm/kmsro_drm_public.h"
#define GALLIUM_KMSRO_ONLY
#include "target-helpers/drm_helper.h"

@@ -486,10 +486,13 @@ dri_screen_create_sw(struct gbm_dri_device *dri)
return -errno;
ret = dri_screen_create_dri2(dri, driver_name);
if (ret == 0)
if (ret != 0)
ret = dri_screen_create_swrast(dri);
if (ret != 0)
return ret;
return dri_screen_create_swrast(dri);
dri->software = true;
return 0;
}
static const struct gbm_dri_visual gbm_dri_visuals_table[] = {

@@ -63,6 +63,7 @@ struct gbm_dri_device {
void *driver;
char *driver_name; /* Name of the DRI module, without the _dri suffix */
bool software; /* A software driver was loaded */
__DRIscreen *screen;
__DRIcontext *context;

@@ -367,7 +367,8 @@ is_logic_op(enum opcode opcode)
}
static bool
can_take_stride(fs_inst *inst, unsigned arg, unsigned stride,
can_take_stride(fs_inst *inst, brw_reg_type dst_type,
unsigned arg, unsigned stride,
const gen_device_info *devinfo)
{
if (stride > 4)
@@ -377,9 +378,9 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned stride,
* of the corresponding channel of the destination, and the provided stride
* would break this restriction.
*/
if (has_dst_aligned_region_restriction(devinfo, inst) &&
if (has_dst_aligned_region_restriction(devinfo, inst, dst_type) &&
!(type_sz(inst->src[arg].type) * stride ==
type_sz(inst->dst.type) * inst->dst.stride ||
type_sz(dst_type) * inst->dst.stride ||
stride == 0))
return false;
@@ -528,10 +529,15 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
if (instruction_requires_packed_data(inst) && entry_stride != 1)
return false;
const brw_reg_type dst_type = (has_source_modifiers &&
entry->dst.type != inst->src[arg].type) ?
entry->dst.type : inst->dst.type;
/* Bail if the result of composing both strides would exceed the
* hardware limit.
*/
if (!can_take_stride(inst, arg, entry_stride * inst->src[arg].stride,
if (!can_take_stride(inst, dst_type, arg,
entry_stride * inst->src[arg].stride,
devinfo))
return false;

@@ -549,7 +549,8 @@ is_unordered(const fs_inst *inst)
*/
static inline bool
has_dst_aligned_region_restriction(const gen_device_info *devinfo,
const fs_inst *inst)
const fs_inst *inst,
brw_reg_type dst_type)
{
const brw_reg_type exec_type = get_exec_type(inst);
/* Even though the hardware spec claims that "integer DWord multiply"
@@ -563,13 +564,20 @@ has_dst_aligned_region_restriction(const gen_device_info *devinfo,
(inst->opcode == BRW_OPCODE_MAD &&
MIN2(type_sz(inst->src[1].type), type_sz(inst->src[2].type)) >= 4));
if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
if (type_sz(dst_type) > 4 || type_sz(exec_type) > 4 ||
(type_sz(exec_type) == 4 && is_dword_multiply))
return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
else
return false;
}
static inline bool
has_dst_aligned_region_restriction(const gen_device_info *devinfo,
const fs_inst *inst)
{
return has_dst_aligned_region_restriction(devinfo, inst, inst->dst.type);
}
/**
* Return whether the LOAD_PAYLOAD instruction is a plain copy of bits from
* the specified register file into a VGRF.

@@ -7,7 +7,6 @@ import os
import pathlib
import subprocess
import sys
import tempfile
# The meson version handles windows paths better, but if it's not available
# fall back to shlex
@@ -37,18 +36,17 @@ success = True
for asm_file in args.gen_folder.glob('*.asm'):
expected_file = asm_file.stem + '.expected'
expected_path = args.gen_folder / expected_file
out_path = tempfile.NamedTemporaryFile()
try:
command = i965_asm + [
'--type', 'hex',
'--gen', args.gen_name,
'--output', out_path.name,
asm_file
]
subprocess.run(command,
stdout=subprocess.DEVNULL,
stderr=subprocess.STDOUT)
with subprocess.Popen(command,
stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL) as cmd:
lines_after = [line.decode('ascii') for line in cmd.stdout.readlines()]
except OSError as e:
if e.errno == errno.ENOEXEC:
print('Skipping due to inability to run host binaries.',
@@ -58,7 +56,6 @@ for asm_file in args.gen_folder.glob('*.asm'):
with expected_path.open() as f:
lines_before = f.readlines()
lines_after = [line.decode('ascii') for line in out_path]
diff = ''.join(difflib.unified_diff(lines_before, lines_after,
expected_file, asm_file.stem + '.out'))

@@ -139,7 +139,7 @@ lower_tex_src_plane_block(nir_builder *b, lower_tex_src_state *state, nir_block
if (tex_index >= 0 && samp_index >= 0) {
b->cursor = nir_before_instr(&tex->instr);
nir_variable* samp = find_sampler(state, plane[0].i32);
nir_variable* samp = find_sampler(state, tex->sampler_index);
assert(samp);
nir_deref_instr *tex_deref_instr = nir_build_deref_var(b, samp);

@@ -1321,7 +1321,7 @@ st_create_fp_variant(struct st_context *st,
key->external.lower_yuv)) {
NIR_PASS_V(state.ir.nir, st_nir_lower_tex_src_plane,
~stfp->Base.SamplersUsed,
key->external.lower_nv12 || key->external.lower_xy_uxvx ||
key->external.lower_nv12 | key->external.lower_xy_uxvx |
key->external.lower_yx_xuxv,
key->external.lower_iyuv);
finalize = true;

@@ -104,6 +104,11 @@ u_bit_scan(unsigned *mask)
return i;
}
#define u_foreach_bit(b, dword) \
for (uint32_t __dword = (dword), b; \
((b) = ffs(__dword) - 1, __dword); \
__dword &= ~(1 << (b)))
static inline int
u_bit_scan64(uint64_t *mask)
{
@@ -112,6 +117,11 @@ u_bit_scan64(uint64_t *mask)
return i;
}
#define u_foreach_bit64(b, dword) \
for (uint64_t __dword = (dword), b; \
((b) = ffsll(__dword) - 1, __dword); \
__dword &= ~(1ull << (b)))
/* Determine if an unsigned value is a power of two.
*
* \note

@@ -165,7 +165,7 @@ key_u32_equals(const void *a, const void *b)
struct set *
_mesa_set_create_u32_keys(void *mem_ctx)
{
return _mesa_set_create(NULL, key_u32_hash, key_u32_equals);
return _mesa_set_create(mem_ctx, key_u32_hash, key_u32_equals);
}
struct set *

@@ -444,20 +444,14 @@ get_cpu_topology(void)
util_cpu_caps.family < CPU_AMD_LAST) {
uint32_t regs[4];
/* Query the L3 cache count. */
cpuid_count(0x8000001D, 3, regs);
unsigned cache_level = (regs[0] >> 5) & 0x7;
unsigned cores_per_L3 = ((regs[0] >> 14) & 0xfff) + 1;
if (cache_level != 3 || cores_per_L3 == util_cpu_caps.nr_cpus)
return;
uint32_t saved_mask[UTIL_MAX_CPUS / 32] = {0};
uint32_t mask[UTIL_MAX_CPUS / 32] = {0};
uint32_t allowed_mask[UTIL_MAX_CPUS / 32] = {0};
uint32_t apic_id[UTIL_MAX_CPUS];
bool saved = false;
uint32_t L3_found[UTIL_MAX_CPUS] = {0};
uint32_t num_L3_caches = 0;
util_affinity_mask *L3_affinity_masks = NULL;
/* Query APIC IDs from each CPU core.
*
* An APIC ID is a logical ID of the CPU with respect to the cache
@@ -484,39 +478,58 @@ get_cpu_topology(void)
!saved ? saved_mask : NULL,
util_cpu_caps.num_cpu_mask_bits)) {
saved = true;
allowed_mask[i / 32] |= cpu_bit;
/* Query the APIC ID of the current core. */
cpuid(0x00000001, regs);
apic_id[i] = regs[1] >> 24;
unsigned apic_id = regs[1] >> 24;
/* Query the total core count for the CPU */
uint32_t core_count = 1;
if (regs[3] & (1 << 28))
core_count = (regs[1] >> 16) & 0xff;
core_count = util_next_power_of_two(core_count);
/* Query the L3 cache count. */
cpuid_count(0x8000001D, 3, regs);
unsigned cache_level = (regs[0] >> 5) & 0x7;
unsigned cores_per_L3 = ((regs[0] >> 14) & 0xfff) + 1;
if (cache_level != 3)
continue;
unsigned local_core_id = apic_id & (core_count - 1);
unsigned phys_id = (apic_id & ~(core_count - 1)) >> util_logbase2(core_count);
unsigned local_l3_cache_index = local_core_id / util_next_power_of_two(cores_per_L3);
#define L3_ID(p, i) (p << 16 | i << 1 | 1);
unsigned l3_id = L3_ID(phys_id, local_l3_cache_index);
int idx = -1;
for (unsigned c = 0; c < num_L3_caches; c++) {
if (L3_found[c] == l3_id) {
idx = c;
break;
}
}
if (idx == -1) {
idx = num_L3_caches;
L3_found[num_L3_caches++] = l3_id;
L3_affinity_masks = realloc(L3_affinity_masks, sizeof(util_affinity_mask) * num_L3_caches);
if (!L3_affinity_masks)
return;
memset(&L3_affinity_masks[num_L3_caches - 1], 0, sizeof(util_affinity_mask));
}
util_cpu_caps.cpu_to_L3[i] = idx;
L3_affinity_masks[idx][i / 32] |= cpu_bit;
}
mask[i / 32] = 0;
}
util_cpu_caps.num_L3_caches = num_L3_caches;
util_cpu_caps.L3_affinity_mask = L3_affinity_masks;
if (saved) {
/* We succeeded in using at least one CPU. */
util_cpu_caps.num_L3_caches = util_cpu_caps.nr_cpus / cores_per_L3;
util_cpu_caps.cores_per_L3 = cores_per_L3;
util_cpu_caps.L3_affinity_mask = calloc(sizeof(util_affinity_mask),
util_cpu_caps.num_L3_caches);
for (unsigned i = 0; i < util_cpu_caps.nr_cpus && i < UTIL_MAX_CPUS;
i++) {
uint32_t cpu_bit = 1u << (i % 32);
if (allowed_mask[i / 32] & cpu_bit) {
/* Each APIC ID bit represents a topology level, so we need
* to round up to the next power of two.
*/
unsigned L3_index = apic_id[i] /
util_next_power_of_two(cores_per_L3);
util_cpu_caps.L3_affinity_mask[L3_index][i / 32] |= cpu_bit;
util_cpu_caps.cpu_to_L3[i] = L3_index;
}
}
if (debug_get_option_dump_cpu()) {
fprintf(stderr, "CPU <-> L3 cache mapping:\n");
for (unsigned i = 0; i < util_cpu_caps.num_L3_caches; i++) {