Compare commits

...

26 Commits

Author SHA1 Message Date
Dylan Baker
6bbbef9699 VERSION: bump for 20.0.0-rc3 2020-02-13 10:18:44 -08:00
Samuel Pitoiset
fa0dcef2ef nir: do not use De Morgan's Law rules for flt and fge
In presence of NaNs, "!(flt(a, b) && flt(c, d))" is NOT EQUAL
to "fge(a, b) || fge(c, d)". These optimizations are unsafe for
apps that rely on NaN behaviour.

pipeline-db (GFX9/LLVM):
Totals from affected shaders:
SGPRS: 3176 -> 3136 (-1.26 %)
VGPRS: 2188 -> 2144 (-2.01 %)
Spilled SGPRs: 227 -> 169 (-25.55 %)
Code Size: 150572 -> 151800 (0.82 %) bytes
Max Waves: 307 -> 310 (0.98 %)

pipeline-db (GFX9/ACO):
Totals from affected shaders:
SGPRS: 18744 -> 18744 (0.00 %)
VGPRS: 15576 -> 15580 (0.03 %)
Spilled SGPRs: 164 -> 164 (0.00 %)
Code Size: 1573012 -> 1576492 (0.22 %) bytes
Max Waves: 1534 -> 1532 (-0.13 %)

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2127
Fixes: d1ed4ffe0b ("nir: Use De Morgan's Law on logic compounded comparisons")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3696>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3696>
(cherry picked from commit 8e77280774)
2020-02-11 09:49:15 -08:00
Samuel Pitoiset
4558bdb95a aco: fix creating v_madak if v_mad_f32 has two sgpr literals
Do not ignore that src1 can be a sgpr.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2435
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3759>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3759>
(cherry picked from commit ddd767387f)
2020-02-11 09:49:15 -08:00
Vinson Lee
75ea9c808d panfrost: Remove unused anonymous enum variables.
This patch fix these build errors with GCC 10.

/usr/bin/ld: src/gallium/drivers/panfrost/libpanfrost.a(pan_resource.c.o):src/panfrost/midgard/midgard_compile.h:52: multiple definition of `pan_sysval'; src/gallium/drivers/panfrost/libpanfrost.a(pan_screen.c.o):src/panfrost/midgard/midgard_compile.h:52: first defined here
/usr/bin/ld: src/gallium/drivers/panfrost/libpanfrost.a(pan_resource.c.o):src/panfrost/midgard/midgard_compile.h:68: multiple definition of `pan_special_attributes'; src/gallium/drivers/panfrost/libpanfrost.a(pan_screen.c.o):src/panfrost/midgard/midgard_compile.h:68: first defined here

Fixes: 7e8de5a707 ("panfrost: Implement system values")
Fixes: 306800d747 ("pan/midgard: Lower gl_VertexID/gl_InstanceID to attributes")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3752>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3752>
(cherry picked from commit 63345a3596)
2020-02-11 09:49:14 -08:00
Eric Anholt
e5f13bca20 Revert "gallium: Fix big-endian addressing of non-bitmask array formats."
This reverts the functional part of commit
d17ff2f7f1, leaving the unit test for
mesa/pipe agreement on what's an array.

The issue is that the util_channel_desc.shift values on array formats are
not used for bit addressing in memory, they're bit addressing within a
word treating a pixel of the format as a native type, as seen by
llvmpipe's use of the values to do shifts (see
lp_build_unpack_arith_rgba_aos() for example).  This means the values are
nonsensical for 3-byte RGB, but then llvmpipe doesn't expose those formats
so it works out.

I still want to clean up our big-endian format handling at some point, but
let's fix the s390x regression first, sort out our format unit tests in
CI, then be able to refactor with confidence.

Fixes: d17ff2f7f1 ("gallium: Fix big-endian addressing of non-bitmask array formats.")
Closes: #2472
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3721>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3721>
(cherry picked from commit 1886dbfe73)
2020-02-11 09:49:14 -08:00
Marek Olšák
f93c8d8598 radeonsi: fix the DCC MSAA bug workaround
Cc: 19.3 20.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3646>
(cherry picked from commit fbb27eebc8)
2020-02-11 09:49:13 -08:00
Neha Bhende
c4e1dd07eb svga: Use pipe_shader_state_from_tgsi to set shader state
Use pipe_shader_state_from_tgsi() to set shader state for transformed
shader so that we get all correct data for respective shader state.

This fixes several regressed glretrace, piglit crashes found during merging
upsteam mesa

Fixes: bf12bc2dd7 (draw: add nir info gathering and building support)

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
(cherry picked from commit 144561dc5e)
2020-02-11 09:49:12 -08:00
Neha Bhende
f86e27156d svga: fix size of format_conversion_table[]
Since we are now using sparse matrix for format_conversion_table,
we have to make sure we have last entry in table which gives the
sense of required size of format_conversion_table

Fixes: 84db6ba7 ("svga: Drop unsupported formats from the format table")

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
(cherry picked from commit 470e73e7f8)
2020-02-11 09:49:12 -08:00
Dylan Baker
9724b0f32c .pick_status.json: Update to 2303762735 2020-02-11 09:49:08 -08:00
Samuel Pitoiset
a3bd400c14 aco: fix waiting for scalar stores before "writing back" data on GFX8-GFX9
Seems required also on GFX8-GFX9 to achieve correct behaviour. This
is an undocumented behaviour but it makes real sense to me.

pipeline-db on GFX9:
Totals from affected shaders:
SGPRS: 1018 -> 1018 (0.00 %)
VGPRS: 516 -> 516 (0.00 %)
Code Size: 40516 -> 40636 (0.30 %) bytes
Max Waves: 280 -> 280 (0.00 %)

This fixes some sort of sun flickering with Assassins Creed Origins.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2488
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3750>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3750>
(cherry picked from commit 34fd894e42)
2020-02-10 09:05:28 -08:00
Georg Lehmann
32dc7fff47 Vulkan overlay: use the corresponding image index for each swapchain
pImageIndices should be a pointer to the current image index
otherwise every swapchain but the first one could have a wrong image index

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3741>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3741>
(cherry picked from commit 7283c33b98)
2020-02-10 09:05:28 -08:00
Marek Olšák
027f9c887c radeonsi: don't report that multi-plane formats are supported
Fixes: a554b45d - st/mesa: don't lower YUV when driver supports it natively
Closes: #2376

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3632>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3632>
(cherry picked from commit 35961b10da)
2020-02-10 09:05:27 -08:00
Hyunjun Ko
f3f4751851 freedreno/ir3: put the conversion back for half const to the right place.
The previous commit leads to match immed values unexpectedly.

This makes constlen for each shader including bvert wrong.
Also fixes atan2 for mediump deqp tests.

Fixes: cbd1f47433 ("freedreno/ir3: convert back to 32-bit values for half constant registers.")

v2: Move conversion up above fabs/fneg modifier handling as well.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
(cherry picked from commit 260bd32b58)
2020-02-10 09:05:27 -08:00
Dylan Baker
a25c7674aa .pick_status.json: Update to 689817c9df 2020-02-10 09:05:25 -08:00
Georg Lehmann
1d17f42732 Vulkan Overlay: Don't try to change the image layout to present twice
The render pass already does the transition.
The pipeline barrier is still needed to transfer the queue family ownership.

Fixes: 320b0f66c2 ("vulkan/overlay: bounce image back to present layout")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3740>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3740>
(cherry picked from commit f239bb8020)
2020-02-07 09:21:52 -08:00
Samuel Pitoiset
e393404ff1 aco: do not use ds_{read,write}2 on GFX6
According to LLVM, these instructions have a bounds checking bug.
LLVM only uses them on GFX7+.

This fixes broken geometry in Assassins Creed Origins.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2489
Fixes: 4a553212fa ("radv: enable ACO support for GFX6")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3746>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3746>
(cherry picked from commit 4b978cd950)
2020-02-07 09:21:52 -08:00
Tapani Pälli
1f8db81632 intel/vec4: fix valgrind errors with vf_values array
Fixes valgrind errors introduced since commit a8ec4082.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2346
Fixes: a8ec4082 ("nir+vtn: vec8+vec16 support")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3691>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3691>
(cherry picked from commit da76dfb515)
2020-02-07 09:21:51 -08:00
Rhys Perry
8f29aaa2cf aco: fix gfx10_wave64_bpermute
Since 9254fb4fc7, the pass replaced the SCC clobber with the scalar
identity temporary. Just skip most of the temporary setup, since we don't
need it for gfx10_wave64_bpermute.

Although shuffles are disabled on GFX10, Detroit: Become Human seems to
use them anyway.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 9254fb4fc7 ('aco: don't use a scalar
       temporary for reductions on GFX10')

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3683>
(cherry picked from commit 20eb1acb6f)
2020-02-07 09:21:49 -08:00
Georg Lehmann
1e0cc313ba Correctly wait in the fragment stage until all semaphores are signaled
This fixes two issues:
- a crash if the application uses more than one semaphore for presenting because the driver expects one stage per semaphore
- the swapchain image could be not ready yet if the semaphores aren't signaled, #946 is possible related

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3718>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3718>
(cherry picked from commit 1c79afd946)
2020-02-07 09:21:47 -08:00
Thomas Hellstrom
d96f0faacf svga: Fix banded DMA upload
A previous commit ("winsys/svga: Limit the maximum DMA hardware buffer
size") made banded DMA transfer kick in when transfering gnome-shell
window contents under gnome-shell / wayland. This uncovered a bug where
we assumed that banded DMA transfers always occur to the top (y=0) of the
surface.
Fix this by taking the destination y offset into account.

Cc: 19.2 19.3 20.0 <mesa-stable@lists.freedesktop.org>
Fixes: 287c94ea49 ("Squashed commit of the following:")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3733>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3733>
(cherry picked from commit 451cf228d5)
2020-02-07 09:21:46 -08:00
Lionel Landwerlin
203710e94c anv: set MOCS on push constants
v2: Also set MOCS on 3DSTATE_CONSTANT_ALL (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 67d2cb3e93 ("anv: Add get_push_range_address() helper.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3732>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3732>
(cherry picked from commit f9febfae41)
2020-02-07 09:21:45 -08:00
Rafael Antognolli
d189ab9fcc intel: Load the driver even if I915_PARAM_REVISION is not found.
This param is only available starting on kernel 4.1. Use a default
value of 0 if it is not found instead.

v2: Update commit message (Lionel)

Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Mark Janes <mark.a.janes@intel.com>
Fixes: 96e1c945f2 ("i965: Move device info initialization to common
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3727>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3727>
(cherry picked from commit 4aa7af9e9a)
2020-02-07 09:21:44 -08:00
Vinson Lee
bd934ff613 swr: Fix GCC 4.9 checks.
Fixes: f0a22956be ("swr/rast: _mm*_undefined_* implementations for gcc<4.9")
Fixes: e21fc2c625 ("swr/rast: non-regex knob fallback code for gcc < 4.9")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
(cherry picked from commit deb2bbf57e)
2020-02-07 09:21:44 -08:00
James Xiong
51f7d81dd2 gallium: let the pipe drivers decide the supported modifiers
fixes: ac0219cc5b ("gallium: dmabuf support for yuv formats that are not natively supported")

Signed-off-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3527>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3527>
(cherry picked from commit 205ce0bea5)
2020-02-07 09:21:43 -08:00
Timur Kristóf
06a9d51f27 aco/optimizer: Don't combine uniform bool s_and to s_andn2.
Fixes: 8a32f57fff

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3714>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3714>
(cherry picked from commit 4d34abd15c)
2020-02-07 09:21:43 -08:00
Dylan Baker
419c992e65 .pick_status.json: Update to d8bae10bfe 2020-02-07 09:21:42 -08:00
21 changed files with 1348 additions and 96 deletions

File diff suppressed because it is too large Load Diff

View File

@@ -1 +1 @@
20.0.0-rc2
20.0.0-rc3

View File

@@ -374,13 +374,16 @@ wait_imm kill(Instruction* instr, wait_ctx& ctx)
imm.combine(parse_wait_instr(ctx, instr));
if (ctx.chip_class >= GFX10) {
/* Seems to be required on GFX10 to achieve correct behaviour.
* It shouldn't cost anything anyways since we're about to do s_endpgm.
*/
if (ctx.lgkm_cnt && instr->opcode == aco_opcode::s_dcache_wb)
imm.lgkm = 0;
/* It's required to wait for scalar stores before "writing back" data.
* It shouldn't cost anything anyways since we're about to do s_endpgm.
*/
if (ctx.lgkm_cnt && instr->opcode == aco_opcode::s_dcache_wb) {
assert(ctx.chip_class >= GFX8);
imm.lgkm = 0;
}
if (ctx.chip_class >= GFX10) {
/* GFX10: A store followed by a load at the same address causes a problem because
* the load doesn't load the correct values unless we wait for the store first.
* This is NOT mitigated by an s_nop.

View File

@@ -2657,6 +2657,7 @@ void load_lds(isel_context *ctx, unsigned elem_size_bytes, Temp dst,
unsigned total_bytes = num_components * elem_size_bytes;
std::array<Temp, NIR_MAX_VEC_COMPONENTS> result;
bool large_ds_read = ctx->options->chip_class >= GFX7;
bool usable_read2 = ctx->options->chip_class >= GFX7;
while (bytes_read < total_bytes) {
unsigned todo = total_bytes - bytes_read;
@@ -2668,7 +2669,7 @@ void load_lds(isel_context *ctx, unsigned elem_size_bytes, Temp dst,
if (todo >= 16 && aligned16 && large_ds_read) {
op = aco_opcode::ds_read_b128;
todo = 16;
} else if (todo >= 16 && aligned8) {
} else if (todo >= 16 && aligned8 && usable_read2) {
op = aco_opcode::ds_read2_b64;
read2 = true;
todo = 16;
@@ -2678,7 +2679,7 @@ void load_lds(isel_context *ctx, unsigned elem_size_bytes, Temp dst,
} else if (todo >= 8 && aligned8) {
op = aco_opcode::ds_read_b64;
todo = 8;
} else if (todo >= 8) {
} else if (todo >= 8 && usable_read2) {
op = aco_opcode::ds_read2_b32;
read2 = true;
todo = 8;
@@ -2783,6 +2784,7 @@ void ds_write_helper(isel_context *ctx, Operand m, Temp address, Temp data, unsi
Builder bld(ctx->program, ctx->block);
unsigned bytes_written = 0;
bool large_ds_write = ctx->options->chip_class >= GFX7;
bool usable_write2 = ctx->options->chip_class >= GFX7;
while (bytes_written < total_size * 4) {
unsigned todo = total_size * 4 - bytes_written;
@@ -2795,7 +2797,7 @@ void ds_write_helper(isel_context *ctx, Operand m, Temp address, Temp data, unsi
if (todo >= 16 && aligned16 && large_ds_write) {
op = aco_opcode::ds_write_b128;
size = 4;
} else if (todo >= 16 && aligned8) {
} else if (todo >= 16 && aligned8 && usable_write2) {
op = aco_opcode::ds_write2_b64;
write2 = true;
size = 4;
@@ -2805,7 +2807,7 @@ void ds_write_helper(isel_context *ctx, Operand m, Temp address, Temp data, unsi
} else if (todo >= 8 && aligned8) {
op = aco_opcode::ds_write_b64;
size = 2;
} else if (todo >= 8) {
} else if (todo >= 8 && usable_write2) {
op = aco_opcode::ds_write2_b32;
write2 = true;
size = 2;

View File

@@ -1844,6 +1844,9 @@ bool combine_salu_n2(opt_ctx& ctx, aco_ptr<Instruction>& instr)
if (instr->definitions[1].isTemp() && ctx.uses[instr->definitions[1].tempId()])
return false;
if (instr->definitions[0].isTemp() && ctx.info[instr->definitions[0].tempId()].is_uniform_bool())
return false;
for (unsigned i = 0; i < 2; i++) {
Instruction *op2_instr = follow_operand(ctx, instr->operands[i]);
if (!op2_instr || (op2_instr->opcode != aco_opcode::s_not_b32 && op2_instr->opcode != aco_opcode::s_not_b64))
@@ -2601,7 +2604,7 @@ void select_instruction(opt_ctx &ctx, aco_ptr<Instruction>& instr)
continue;
/* if one of the operands is sgpr, we cannot add a literal somewhere else on pre-GFX10 or operands other than the 1st */
if (instr->operands[i].getTemp().type() == RegType::sgpr && (i > 0 || ctx.program->chip_class < GFX10)) {
if (ctx.info[instr->operands[i].tempId()].is_literal()) {
if (!sgpr_used && ctx.info[instr->operands[i].tempId()].is_literal()) {
literal_uses = ctx.uses[instr->operands[i].tempId()];
literal_idx = i;
} else {

View File

@@ -114,6 +114,11 @@ void setup_reduce_temp(Program* program)
}
}
if (op == gfx10_wave64_bpermute) {
instr->operands[1] = Operand(reduceTmp);
continue;
}
/* same as before, except for the vector temporary instead of the reduce temporary */
unsigned cluster_size = static_cast<Pseudo_reduction_instruction *>(instr)->cluster_size;
bool need_vtmp = op == imul32 || op == fadd64 || op == fmul64 ||
@@ -121,7 +126,7 @@ void setup_reduce_temp(Program* program)
op == umax64 || op == imin64 || op == imax64 ||
op == imul64;
if (program->chip_class >= GFX10 && cluster_size == 64 && op != gfx10_wave64_bpermute)
if (program->chip_class >= GFX10 && cluster_size == 64)
need_vtmp = true;
if (program->chip_class >= GFX10 && op == iadd64)
need_vtmp = true;

View File

@@ -1279,7 +1279,7 @@ for bit_size in [8, 16, 32, 64]:
('bcsel', ('ilt', a, ('isub', a, b)), intmin, ('isub', a, b))), 'options->lower_add_sat'),
]
invert = OrderedDict([('feq', 'fne'), ('fne', 'feq'), ('fge', 'flt'), ('flt', 'fge')])
invert = OrderedDict([('feq', 'fne'), ('fne', 'feq')])
for left, right in itertools.combinations_with_replacement(invert.keys(), 2):
optimizations.append((('inot', ('ior(is_used_once)', (left, a, b), (right, c, d))),

View File

@@ -305,6 +305,12 @@ lower_immed(struct ir3_cp_ctx *ctx, struct ir3_register *reg, unsigned new_flags
reg = ir3_reg_clone(ctx->shader, reg);
/* Half constant registers seems to handle only 32-bit values
* within floating-point opcodes. So convert back to 32-bit values.
*/
if (f_opcode && (new_flags & IR3_REG_HALF))
reg->uim_val = fui(_mesa_half_to_float(reg->uim_val));
/* in some cases, there are restrictions on (abs)/(neg) plus const..
* so just evaluate those and clear the flags:
*/
@@ -350,12 +356,6 @@ lower_immed(struct ir3_cp_ctx *ctx, struct ir3_register *reg, unsigned new_flags
swiz = i % 4;
idx = i / 4;
/* Half constant registers seems to handle only 32-bit values
* within floating-point opcodes. So convert back to 32-bit values. */
if (f_opcode && (new_flags & IR3_REG_HALF)) {
reg->uim_val = fui(_mesa_half_to_float(reg->uim_val));
}
const_state->immediates[idx].val[swiz] = reg->uim_val;
const_state->immediates_count = idx + 1;
const_state->immediate_idx++;

View File

@@ -685,7 +685,7 @@ static void si_bind_blend_state(struct pipe_context *ctx, void *state)
if (old_blend->cb_target_mask != blend->cb_target_mask ||
old_blend->dual_src_blend != blend->dual_src_blend ||
(old_blend->blend_enable_4bit != blend->blend_enable_4bit &&
(old_blend->dcc_msaa_corruption_4bit != blend->dcc_msaa_corruption_4bit &&
sctx->framebuffer.nr_samples >= 2 &&
sctx->screen->dcc_msaa_allowed))
si_mark_atom_dirty(sctx, &sctx->atoms.s.cb_render_state);
@@ -2242,13 +2242,6 @@ static bool si_is_format_supported(struct pipe_screen *screen,
return false;
}
if (util_format_get_num_planes(format) >= 2) {
return util_format_planar_is_supported(screen, format, target,
sample_count,
storage_sample_count,
usage);
}
if (MAX2(1, sample_count) < MAX2(1, storage_sample_count))
return false;

View File

@@ -207,6 +207,8 @@ static const struct vgpu10_format_entry format_conversion_table[] =
[ PIPE_FORMAT_L32_SINT ] = { SVGA3D_FORMAT_INVALID, SVGA3D_FORMAT_INVALID, SVGA3D_R32_SINT, TF_XXX1 },
[ PIPE_FORMAT_L32A32_SINT ] = { SVGA3D_FORMAT_INVALID, SVGA3D_FORMAT_INVALID, SVGA3D_R32G32_SINT, TF_XXXY },
[ PIPE_FORMAT_R10G10B10A2_UINT ] = { SVGA3D_R10G10B10A2_UINT, SVGA3D_R10G10B10A2_UINT, SVGA3D_R10G10B10A2_UINT, 0 },
/* Must specify following entry to give the sense of size of format_conversion_table[] */
[ PIPE_FORMAT_COUNT ] = {SVGA3D_FORMAT_INVALID, SVGA3D_FORMAT_INVALID, SVGA3D_FORMAT_INVALID, 0 },
};

View File

@@ -133,25 +133,26 @@ svga_transfer_dma(struct svga_context *svga,
}
}
else {
int y, h, srcy;
int y, h, y_max;
unsigned blockheight =
util_format_get_blockheight(st->base.resource->format);
h = st->hw_nblocksy * blockheight;
srcy = 0;
y_max = st->box.y + st->box.h;
for (y = 0; y < st->box.h; y += h) {
for (y = st->box.y; y < y_max; y += h) {
unsigned offset, length;
void *hw, *sw;
if (y + h > st->box.h)
h = st->box.h - y;
if (y + h > y_max)
h = y_max - y;
/* Transfer band must be aligned to pixel block boundaries */
assert(y % blockheight == 0);
assert(h % blockheight == 0);
offset = y * st->base.stride / blockheight;
/* First band starts at the top of the SW buffer. */
offset = (y - st->box.y) * st->base.stride / blockheight;
length = h * st->base.stride / blockheight;
sw = (uint8_t *) st->swbuf + offset;
@@ -159,9 +160,9 @@ svga_transfer_dma(struct svga_context *svga,
if (transfer == SVGA3D_WRITE_HOST_VRAM) {
unsigned usage = PIPE_TRANSFER_WRITE;
/* Wait for the previous DMAs to complete */
/* TODO: keep one DMA (at half the size) in the background */
if (y) {
/* Don't write to an in-flight DMA buffer. Synchronize or
* discard in-flight storage. */
if (y != st->box.y) {
svga_context_flush(svga, NULL);
usage |= PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE;
}
@@ -177,7 +178,7 @@ svga_transfer_dma(struct svga_context *svga,
svga_transfer_dma_band(svga, st, transfer,
st->box.x, y, st->box.z,
st->box.w, h, st->box.d,
0, srcy, 0, flags);
0, 0, 0, flags);
/*
* Prevent the texture contents to be discarded on the next band

View File

@@ -131,7 +131,7 @@ emulate_point_sprite(struct svga_context *svga,
tgsi_dump(new_tokens, 0);
}
templ.tokens = new_tokens;
pipe_shader_state_from_tgsi(&templ, new_tokens);
templ.stream_output.num_outputs = 0;
if (streamout) {

View File

@@ -43,7 +43,7 @@
//========================================================
void KnobBase::autoExpandEnvironmentVariables(std::string& text)
{
#if (__GNUC__) && (GCC_VERSION < 409000)
#if (__GNUC__) && (GCC_VERSION < 40900)
// <regex> isn't implemented prior to gcc-4.9.0
// unix style variable replacement
size_t start;

View File

@@ -182,7 +182,7 @@ static INLINE void _mm256_storeu2_m128i(__m128i* hi, __m128i* lo, __m256i a)
}
// gcc prior to 4.9 doesn't have _mm*_undefined_*
#if (__GNUC__) && (GCC_VERSION < 409000)
#if (__GNUC__) && (GCC_VERSION < 40900)
#define _mm_undefined_si128 _mm_setzero_si128
#define _mm256_undefined_ps _mm256_setzero_ps
#endif

View File

@@ -1395,15 +1395,11 @@ dri2_query_dma_buf_modifiers(__DRIscreen *_screen, int fourcc, int max,
(pscreen->is_format_supported(pscreen, format, screen->target, 0, 0,
PIPE_BIND_RENDER_TARGET) ||
pscreen->is_format_supported(pscreen, format, screen->target, 0, 0,
PIPE_BIND_SAMPLER_VIEW))) {
PIPE_BIND_SAMPLER_VIEW) ||
dri2_yuv_dma_buf_supported(screen, map))) {
pscreen->query_dmabuf_modifiers(pscreen, format, max, modifiers,
external_only, count);
return true;
} else if (dri2_yuv_dma_buf_supported(screen, map)) {
*count = 1;
if (modifiers)
modifiers[0] = DRM_FORMAT_MOD_NONE;
return true;
}
return false;
}

View File

@@ -1030,7 +1030,8 @@ try_immediate_source(const nir_alu_instr *instr, src_reg *op,
} else {
uint8_t vf_values[4] = { 0, 0, 0, 0 };
for (unsigned i = 0; i < NIR_MAX_VEC_COMPONENTS; i++) {
for (unsigned i = 0; i < ARRAY_SIZE(vf_values); i++) {
if (op[idx].abs)
f[i] = fabs(f[i]);

View File

@@ -1447,7 +1447,7 @@ gen_get_device_info_from_fd(int fd, struct gen_device_info *devinfo)
return false;
if (!getparam(fd, I915_PARAM_REVISION, &devinfo->revision))
return false;
devinfo->revision = 0;
if (!query_topology(devinfo, fd)) {
if (devinfo->gen >= 10) {

View File

@@ -2833,6 +2833,10 @@ cmd_buffer_emit_push_constant(struct anv_cmd_buffer *cmd_buffer,
const struct anv_pipeline_bind_map *bind_map =
&pipeline->shaders[stage]->bind_map;
#if GEN_GEN >= 12
c.MOCS = cmd_buffer->device->isl_dev.mocs.internal;
#endif
#if GEN_GEN >= 8 || GEN_IS_HASWELL
/* The Skylake PRM contains the following restriction:
*
@@ -2893,6 +2897,7 @@ cmd_buffer_emit_push_constant_all(struct anv_cmd_buffer *cmd_buffer,
if (count == 0) {
anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_CONSTANT_ALL), c) {
c.ShaderUpdateEnable = shader_mask;
c.MOCS = cmd_buffer->device->isl_dev.mocs.internal;
}
return;
}
@@ -2923,7 +2928,8 @@ cmd_buffer_emit_push_constant_all(struct anv_cmd_buffer *cmd_buffer,
dw = anv_batch_emitn(&cmd_buffer->batch, num_dwords,
GENX(3DSTATE_CONSTANT_ALL),
.ShaderUpdateEnable = shader_mask,
.PointerBufferMask = buffers);
.PointerBufferMask = buffers,
.MOCS = cmd_buffer->device->isl_dev.mocs.internal);
for (int i = 0; i < count; i++) {
const struct anv_push_range *range = &bind_map->push_ranges[i];

View File

@@ -49,7 +49,7 @@ enum {
PAN_SYSVAL_SSBO = 4,
PAN_SYSVAL_NUM_WORK_GROUPS = 5,
PAN_SYSVAL_SAMPLER = 7,
} pan_sysval;
};
#define PAN_TXS_SYSVAL_ID(texidx, dim, is_array) \
((texidx) | ((dim) << 7) | ((is_array) ? (1 << 9) : 0))
@@ -65,7 +65,7 @@ enum {
PAN_VERTEX_ID = 16,
PAN_INSTANCE_ID = 17,
PAN_MAX_ATTRIBUTE
} pan_special_attributes;
};
typedef struct {
int work_register_count;

View File

@@ -379,27 +379,16 @@ def parse(filename):
channel.shift = le_shift
le_shift += channel.size
be_shift = 0
for channel in be_channels[3::-1]:
channel.shift = be_shift
be_shift += channel.size
assert le_shift == be_shift
for i in range(4):
assert (le_swizzles[i] != SWIZZLE_NONE) == (be_swizzles[i] != SWIZZLE_NONE)
format = Format(name, layout, block_width, block_height, block_depth, le_channels, le_swizzles, be_channels, be_swizzles, colorspace)
if format.is_array() and not format.is_bitmask():
# Formats accessed as arrays by the pack functions (R32G32_FLOAT or
# R8G8B8_UNORM, for example) should not be channel-ordering-reversed
# for BE.
# Note that __eq__ on channels ignores .shift!
assert(format.be_channels == format.le_channels)
assert(format.be_swizzles == format.le_swizzles)
format.be_channels = format.le_channels
else:
be_shift = 0
for channel in format.be_channels[3::-1]:
channel.shift = be_shift
be_shift += channel.size
assert le_shift == be_shift
formats.append(format)
return formats

View File

@@ -1319,43 +1319,56 @@ static struct overlay_draw *render_swapchain_display(struct swapchain_data *data
device_data->vtable.CmdEndRenderPass(draw->command_buffer);
/* Bounce the image to display back to present layout. */
imb.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
imb.pNext = nullptr;
imb.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
imb.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
imb.oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
imb.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
imb.image = data->images[image_index];
imb.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
imb.subresourceRange.baseMipLevel = 0;
imb.subresourceRange.levelCount = 1;
imb.subresourceRange.baseArrayLayer = 0;
imb.subresourceRange.layerCount = 1;
imb.srcQueueFamilyIndex = device_data->graphic_queue->family_index;
imb.dstQueueFamilyIndex = present_queue->family_index;
device_data->vtable.CmdPipelineBarrier(draw->command_buffer,
VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT,
VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT,
0, /* dependency flags */
0, nullptr, /* memory barriers */
0, nullptr, /* buffer memory barriers */
1, &imb); /* image memory barriers */
if (device_data->graphic_queue->family_index != present_queue->family_index)
{
/* Transfer the image back to the present queue family
* image layout was already changed to present by the render pass
*/
imb.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
imb.pNext = nullptr;
imb.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
imb.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
imb.oldLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
imb.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
imb.image = data->images[image_index];
imb.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
imb.subresourceRange.baseMipLevel = 0;
imb.subresourceRange.levelCount = 1;
imb.subresourceRange.baseArrayLayer = 0;
imb.subresourceRange.layerCount = 1;
imb.srcQueueFamilyIndex = device_data->graphic_queue->family_index;
imb.dstQueueFamilyIndex = present_queue->family_index;
device_data->vtable.CmdPipelineBarrier(draw->command_buffer,
VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT,
VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT,
0, /* dependency flags */
0, nullptr, /* memory barriers */
0, nullptr, /* buffer memory barriers */
1, &imb); /* image memory barriers */
}
device_data->vtable.EndCommandBuffer(draw->command_buffer);
VkPipelineStageFlags *stages_wait = (VkPipelineStageFlags*) malloc(sizeof(VkPipelineStageFlags) * n_wait_semaphores);
for (unsigned i = 0; i < n_wait_semaphores; i++)
{
// wait in the fragment stage until the swapchain image is ready
stages_wait[i] = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
}
VkSubmitInfo submit_info = {};
VkPipelineStageFlags stage_wait = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
submit_info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submit_info.commandBufferCount = 1;
submit_info.pCommandBuffers = &draw->command_buffer;
submit_info.pWaitDstStageMask = &stage_wait;
submit_info.pWaitDstStageMask = stages_wait;
submit_info.waitSemaphoreCount = n_wait_semaphores;
submit_info.pWaitSemaphores = wait_semaphores;
submit_info.signalSemaphoreCount = 1;
submit_info.pSignalSemaphores = &draw->semaphore;
device_data->vtable.QueueSubmit(device_data->graphic_queue->queue, 1, &submit_info, draw->fence);
free(stages_wait);
return draw;
}
@@ -1890,15 +1903,18 @@ static VkResult overlay_QueuePresentKHR(
struct swapchain_data *swapchain_data =
FIND(struct swapchain_data, swapchain);
uint32_t image_index = pPresentInfo->pImageIndices[i];
before_present(swapchain_data,
queue_data,
pPresentInfo->pWaitSemaphores,
pPresentInfo->waitSemaphoreCount,
pPresentInfo->pImageIndices[i]);
image_index);
VkPresentInfoKHR present_info = *pPresentInfo;
present_info.swapchainCount = 1;
present_info.pSwapchains = &swapchain;
present_info.pImageIndices = &image_index;
uint64_t ts0 = os_time_get();
result = queue_data->device->vtable.QueuePresentKHR(queue, &present_info);
@@ -1910,11 +1926,13 @@ static VkResult overlay_QueuePresentKHR(
VkSwapchainKHR swapchain = pPresentInfo->pSwapchains[i];
struct swapchain_data *swapchain_data =
FIND(struct swapchain_data, swapchain);
uint32_t image_index = pPresentInfo->pImageIndices[i];
VkPresentInfoKHR present_info = *pPresentInfo;
present_info.swapchainCount = 1;
present_info.pSwapchains = &swapchain;
uint32_t image_index = pPresentInfo->pImageIndices[i];
present_info.pImageIndices = &image_index;
struct overlay_draw *draw = before_present(swapchain_data,
queue_data,