VERSION: bump for 20.0.0-rc3

nir: do not use De Morgan's Law rules for flt and fge
In presence of NaNs, "!(flt(a, b) && flt(c, d))" is NOT EQUAL to "fge(a, b) || fge(c, d)". These optimizations are unsafe for apps that rely on NaN behaviour. pipeline-db (GFX9/LLVM): Totals from affected shaders: SGPRS: 3176 -> 3136 (-1.26 %) VGPRS: 2188 -> 2144 (-2.01 %) Spilled SGPRs: 227 -> 169 (-25.55 %) Code Size: 150572 -> 151800 (0.82 %) bytes Max Waves: 307 -> 310 (0.98 %) pipeline-db (GFX9/ACO): Totals from affected shaders: SGPRS: 18744 -> 18744 (0.00 %) VGPRS: 15576 -> 15580 (0.03 %) Spilled SGPRs: 164 -> 164 (0.00 %) Code Size: 1573012 -> 1576492 (0.22 %) bytes Max Waves: 1534 -> 1532 (-0.13 %) Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2127 Fixes: d1ed4ffe0b ("nir: Use De Morgan's Law on logic compounded comparisons") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3696> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3696> (cherry picked from commit 8e77280774)
2020-02-13 10:18:44 -08:00 · 2020-02-11 09:49:15 -08:00 · 2020-02-11 09:49:15 -08:00 · 2020-02-11 09:49:14 -08:00 · 2020-02-11 09:49:14 -08:00 · 2020-02-11 09:49:13 -08:00
21 changed files with 1348 additions and 96 deletions
--- a/.pick_status.json
+++ b/.pick_status.json
--- a/2
+++ b/2
@@ -1 +1 @@
-20.0.0-rc2
+20.0.0-rc3
--- a/src/amd/compiler/aco_insert_waitcnt.cpp
+++ b/src/amd/compiler/aco_insert_waitcnt.cpp
@@ -374,13 +374,16 @@ wait_imm kill(Instruction* instr, wait_ctx& ctx)

   imm.combine(parse_wait_instr(ctx, instr));

-   if (ctx.chip_class >= GFX10) {
-      /* Seems to be required on GFX10 to achieve correct behaviour.
-       * It shouldn't cost anything anyways since we're about to do s_endpgm.
-       */
-      if (ctx.lgkm_cnt && instr->opcode == aco_opcode::s_dcache_wb)
-         imm.lgkm = 0;

+   /* It's required to wait for scalar stores before "writing back" data.
+    * It shouldn't cost anything anyways since we're about to do s_endpgm.
+    */
+   if (ctx.lgkm_cnt && instr->opcode == aco_opcode::s_dcache_wb) {
+      assert(ctx.chip_class >= GFX8);
+      imm.lgkm = 0;
+   }
+
+   if (ctx.chip_class >= GFX10) {
      /* GFX10: A store followed by a load at the same address causes a problem because
       * the load doesn't load the correct values unless we wait for the store first.
       * This is NOT mitigated by an s_nop.
--- a/src/amd/compiler/aco_instruction_selection.cpp
+++ b/src/amd/compiler/aco_instruction_selection.cpp
@@ -2657,6 +2657,7 @@ void load_lds(isel_context *ctx, unsigned elem_size_bytes, Temp dst,
   unsigned total_bytes = num_components * elem_size_bytes;
   std::array<Temp, NIR_MAX_VEC_COMPONENTS> result;
   bool large_ds_read = ctx->options->chip_class >= GFX7;
+   bool usable_read2 = ctx->options->chip_class >= GFX7;

   while (bytes_read < total_bytes) {
      unsigned todo = total_bytes - bytes_read;
@@ -2668,7 +2669,7 @@ void load_lds(isel_context *ctx, unsigned elem_size_bytes, Temp dst,
      if (todo >= 16 && aligned16 && large_ds_read) {
         op = aco_opcode::ds_read_b128;
         todo = 16;
-      } else if (todo >= 16 && aligned8) {
+      } else if (todo >= 16 && aligned8 && usable_read2) {
         op = aco_opcode::ds_read2_b64;
         read2 = true;
         todo = 16;
@@ -2678,7 +2679,7 @@ void load_lds(isel_context *ctx, unsigned elem_size_bytes, Temp dst,
      } else if (todo >= 8 && aligned8) {
         op = aco_opcode::ds_read_b64;
         todo = 8;
-      } else if (todo >= 8) {
+      } else if (todo >= 8 && usable_read2) {
         op = aco_opcode::ds_read2_b32;
         read2 = true;
         todo = 8;
@@ -2783,6 +2784,7 @@ void ds_write_helper(isel_context *ctx, Operand m, Temp address, Temp data, unsi
   Builder bld(ctx->program, ctx->block);
   unsigned bytes_written = 0;
   bool large_ds_write = ctx->options->chip_class >= GFX7;
+   bool usable_write2 = ctx->options->chip_class >= GFX7;

   while (bytes_written < total_size * 4) {
      unsigned todo = total_size * 4 - bytes_written;
@@ -2795,7 +2797,7 @@ void ds_write_helper(isel_context *ctx, Operand m, Temp address, Temp data, unsi
      if (todo >= 16 && aligned16 && large_ds_write) {
         op = aco_opcode::ds_write_b128;
         size = 4;
-      } else if (todo >= 16 && aligned8) {
+      } else if (todo >= 16 && aligned8 && usable_write2) {
         op = aco_opcode::ds_write2_b64;
         write2 = true;
         size = 4;
@@ -2805,7 +2807,7 @@ void ds_write_helper(isel_context *ctx, Operand m, Temp address, Temp data, unsi
      } else if (todo >= 8 && aligned8) {
         op = aco_opcode::ds_write_b64;
         size = 2;
-      } else if (todo >= 8) {
+      } else if (todo >= 8 && usable_write2) {
         op = aco_opcode::ds_write2_b32;
         write2 = true;
         size = 2;
--- a/src/amd/compiler/aco_optimizer.cpp
+++ b/src/amd/compiler/aco_optimizer.cpp
@@ -1844,6 +1844,9 @@ bool combine_salu_n2(opt_ctx& ctx, aco_ptr<Instruction>& instr)
   if (instr->definitions[1].isTemp() && ctx.uses[instr->definitions[1].tempId()])
      return false;

+   if (instr->definitions[0].isTemp() && ctx.info[instr->definitions[0].tempId()].is_uniform_bool())
+      return false;
+
   for (unsigned i = 0; i < 2; i++) {
      Instruction *op2_instr = follow_operand(ctx, instr->operands[i]);
      if (!op2_instr || (op2_instr->opcode != aco_opcode::s_not_b32 && op2_instr->opcode != aco_opcode::s_not_b64))
@@ -2601,7 +2604,7 @@ void select_instruction(opt_ctx &ctx, aco_ptr<Instruction>& instr)
               continue;
            /* if one of the operands is sgpr, we cannot add a literal somewhere else on pre-GFX10 or operands other than the 1st */
            if (instr->operands[i].getTemp().type() == RegType::sgpr && (i > 0 || ctx.program->chip_class < GFX10)) {
-               if (ctx.info[instr->operands[i].tempId()].is_literal()) {
+               if (!sgpr_used && ctx.info[instr->operands[i].tempId()].is_literal()) {
                  literal_uses = ctx.uses[instr->operands[i].tempId()];
                  literal_idx = i;
               } else {
--- a/src/amd/compiler/aco_reduce_assign.cpp
+++ b/src/amd/compiler/aco_reduce_assign.cpp
@@ -114,6 +114,11 @@ void setup_reduce_temp(Program* program)
            }
         }

+         if (op == gfx10_wave64_bpermute) {
+            instr->operands[1] = Operand(reduceTmp);
+            continue;
+         }
+
         /* same as before, except for the vector temporary instead of the reduce temporary */
         unsigned cluster_size = static_cast<Pseudo_reduction_instruction *>(instr)->cluster_size;
         bool need_vtmp = op == imul32 || op == fadd64 || op == fmul64 ||
@@ -121,7 +126,7 @@ void setup_reduce_temp(Program* program)
                          op == umax64 || op == imin64 || op == imax64 ||
                          op == imul64;

-         if (program->chip_class >= GFX10 && cluster_size == 64 && op != gfx10_wave64_bpermute)
+         if (program->chip_class >= GFX10 && cluster_size == 64)
            need_vtmp = true;
         if (program->chip_class >= GFX10 && op == iadd64)
            need_vtmp = true;
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -1279,7 +1279,7 @@ for bit_size in [8, 16, 32, 64]:
                                ('bcsel', ('ilt', a, ('isub', a, b)), intmin, ('isub', a, b))), 'options->lower_add_sat'),
   ]

-invert = OrderedDict([('feq', 'fne'), ('fne', 'feq'), ('fge', 'flt'), ('flt', 'fge')])
+invert = OrderedDict([('feq', 'fne'), ('fne', 'feq')])

 for left, right in itertools.combinations_with_replacement(invert.keys(), 2):
   optimizations.append((('inot', ('ior(is_used_once)', (left, a, b), (right, c, d))),
--- a/src/freedreno/ir3/ir3_cp.c
+++ b/src/freedreno/ir3/ir3_cp.c
@@ -305,6 +305,12 @@ lower_immed(struct ir3_cp_ctx *ctx, struct ir3_register *reg, unsigned new_flags

 	reg = ir3_reg_clone(ctx->shader, reg);

+	/* Half constant registers seems to handle only 32-bit values
+	 * within floating-point opcodes. So convert back to 32-bit values.
+	 */
+	if (f_opcode && (new_flags & IR3_REG_HALF))
+		reg->uim_val = fui(_mesa_half_to_float(reg->uim_val));
+
 	/* in some cases, there are restrictions on (abs)/(neg) plus const..
 	 * so just evaluate those and clear the flags:
 	 */
@@ -350,12 +356,6 @@ lower_immed(struct ir3_cp_ctx *ctx, struct ir3_register *reg, unsigned new_flags
 		swiz = i % 4;
 		idx  = i / 4;

-		/* Half constant registers seems to handle only 32-bit values
-		 * within floating-point opcodes. So convert back to 32-bit values. */
-		if (f_opcode && (new_flags & IR3_REG_HALF)) {
-			reg->uim_val = fui(_mesa_half_to_float(reg->uim_val));
-		}
-
 		const_state->immediates[idx].val[swiz] = reg->uim_val;
 		const_state->immediates_count = idx + 1;
 		const_state->immediate_idx++;
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -685,7 +685,7 @@ static void si_bind_blend_state(struct pipe_context *ctx, void *state)

 	if (old_blend->cb_target_mask != blend->cb_target_mask ||
 	    old_blend->dual_src_blend != blend->dual_src_blend ||
-	    (old_blend->blend_enable_4bit != blend->blend_enable_4bit &&
+	    (old_blend->dcc_msaa_corruption_4bit != blend->dcc_msaa_corruption_4bit &&
 	     sctx->framebuffer.nr_samples >= 2 &&
 	     sctx->screen->dcc_msaa_allowed))
 		si_mark_atom_dirty(sctx, &sctx->atoms.s.cb_render_state);
@@ -2242,13 +2242,6 @@ static bool si_is_format_supported(struct pipe_screen *screen,
 		return false;
 	}

-	if (util_format_get_num_planes(format) >= 2) {
-		return util_format_planar_is_supported(screen, format, target,
-						       sample_count,
-						       storage_sample_count,
-						       usage);
-	}
-
 	if (MAX2(1, sample_count) < MAX2(1, storage_sample_count))
 		return false;

--- a/src/gallium/drivers/svga/svga_format.c
+++ b/src/gallium/drivers/svga/svga_format.c
@@ -207,6 +207,8 @@ static const struct vgpu10_format_entry format_conversion_table[] =
   [ PIPE_FORMAT_L32_SINT ] =              { SVGA3D_FORMAT_INVALID,      SVGA3D_FORMAT_INVALID,       SVGA3D_R32_SINT,             TF_XXX1 },
   [ PIPE_FORMAT_L32A32_SINT ] =           { SVGA3D_FORMAT_INVALID,      SVGA3D_FORMAT_INVALID,       SVGA3D_R32G32_SINT,          TF_XXXY },
   [ PIPE_FORMAT_R10G10B10A2_UINT ] =      { SVGA3D_R10G10B10A2_UINT,    SVGA3D_R10G10B10A2_UINT,     SVGA3D_R10G10B10A2_UINT,     0 },
+   /* Must specify following entry to give the sense of size of format_conversion_table[] */
+   [ PIPE_FORMAT_COUNT ] = {SVGA3D_FORMAT_INVALID, SVGA3D_FORMAT_INVALID,    SVGA3D_FORMAT_INVALID,       0 },
 };


--- a/src/gallium/drivers/svga/svga_resource_texture.c
+++ b/src/gallium/drivers/svga/svga_resource_texture.c
@@ -133,25 +133,26 @@ svga_transfer_dma(struct svga_context *svga,
      }
   }
   else {
-      int y, h, srcy;
+      int y, h, y_max;
      unsigned blockheight =
         util_format_get_blockheight(st->base.resource->format);

      h = st->hw_nblocksy * blockheight;
-      srcy = 0;
+      y_max = st->box.y + st->box.h;

-      for (y = 0; y < st->box.h; y += h) {
+      for (y = st->box.y; y < y_max; y += h) {
         unsigned offset, length;
         void *hw, *sw;

-         if (y + h > st->box.h)
-            h = st->box.h - y;
+         if (y + h > y_max)
+            h = y_max - y;

         /* Transfer band must be aligned to pixel block boundaries */
         assert(y % blockheight == 0);
         assert(h % blockheight == 0);

-         offset = y * st->base.stride / blockheight;
+         /* First band starts at the top of the SW buffer. */
+         offset = (y - st->box.y) * st->base.stride / blockheight;
         length = h * st->base.stride / blockheight;

         sw = (uint8_t *) st->swbuf + offset;
@@ -159,9 +160,9 @@ svga_transfer_dma(struct svga_context *svga,
         if (transfer == SVGA3D_WRITE_HOST_VRAM) {
            unsigned usage = PIPE_TRANSFER_WRITE;

-            /* Wait for the previous DMAs to complete */
-            /* TODO: keep one DMA (at half the size) in the background */
-            if (y) {
+            /* Don't write to an in-flight DMA buffer. Synchronize or
+             * discard in-flight storage. */
+            if (y != st->box.y) {
               svga_context_flush(svga, NULL);
               usage |= PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE;
            }
@@ -177,7 +178,7 @@ svga_transfer_dma(struct svga_context *svga,
         svga_transfer_dma_band(svga, st, transfer,
                                st->box.x, y, st->box.z,
                                st->box.w, h, st->box.d,
-                                0, srcy, 0, flags);
+                                0, 0, 0, flags);

         /*
          * Prevent the texture contents to be discarded on the next band
--- a/src/gallium/drivers/svga/svga_state_tgsi_transform.c
+++ b/src/gallium/drivers/svga/svga_state_tgsi_transform.c
@@ -131,7 +131,7 @@ emulate_point_sprite(struct svga_context *svga,
         tgsi_dump(new_tokens, 0);
      }

-      templ.tokens = new_tokens;
+      pipe_shader_state_from_tgsi(&templ, new_tokens);
      templ.stream_output.num_outputs = 0;

      if (streamout) {
--- a/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_knobs.cpp
+++ b/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_knobs.cpp
@@ -43,7 +43,7 @@
 //========================================================
 void KnobBase::autoExpandEnvironmentVariables(std::string& text)
 {
-#if (__GNUC__) && (GCC_VERSION < 409000)
+#if (__GNUC__) && (GCC_VERSION < 40900)
    // <regex> isn't implemented prior to gcc-4.9.0
    // unix style variable replacement
    size_t start;
--- a/src/gallium/drivers/swr/rasterizer/common/os.h
+++ b/src/gallium/drivers/swr/rasterizer/common/os.h
@@ -182,7 +182,7 @@ static INLINE void _mm256_storeu2_m128i(__m128i* hi, __m128i* lo, __m256i a)
 }

 // gcc prior to 4.9 doesn't have _mm*_undefined_*
-#if (__GNUC__) && (GCC_VERSION < 409000)
+#if (__GNUC__) && (GCC_VERSION < 40900)
 #define _mm_undefined_si128 _mm_setzero_si128
 #define _mm256_undefined_ps _mm256_setzero_ps
 #endif
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -1395,15 +1395,11 @@ dri2_query_dma_buf_modifiers(__DRIscreen *_screen, int fourcc, int max,
       (pscreen->is_format_supported(pscreen, format, screen->target, 0, 0,
                                     PIPE_BIND_RENDER_TARGET) ||
        pscreen->is_format_supported(pscreen, format, screen->target, 0, 0,
-                                     PIPE_BIND_SAMPLER_VIEW))) {
+                                     PIPE_BIND_SAMPLER_VIEW) ||
+        dri2_yuv_dma_buf_supported(screen, map))) {
      pscreen->query_dmabuf_modifiers(pscreen, format, max, modifiers,
                                      external_only, count);
      return true;
-   } else if (dri2_yuv_dma_buf_supported(screen, map)) {
-      *count = 1;
-      if (modifiers)
-         modifiers[0] = DRM_FORMAT_MOD_NONE;
-      return true;
   }
   return false;
 }
--- a/src/intel/compiler/brw_vec4_nir.cpp
+++ b/src/intel/compiler/brw_vec4_nir.cpp
@@ -1030,7 +1030,8 @@ try_immediate_source(const nir_alu_instr *instr, src_reg *op,
      } else {
         uint8_t vf_values[4] = { 0, 0, 0, 0 };

-         for (unsigned i = 0; i < NIR_MAX_VEC_COMPONENTS; i++) {
+         for (unsigned i = 0; i < ARRAY_SIZE(vf_values); i++) {
+
            if (op[idx].abs)
               f[i] = fabs(f[i]);

--- a/src/intel/dev/gen_device_info.c
+++ b/src/intel/dev/gen_device_info.c
@@ -1447,7 +1447,7 @@ gen_get_device_info_from_fd(int fd, struct gen_device_info *devinfo)
      return false;

   if (!getparam(fd, I915_PARAM_REVISION, &devinfo->revision))
-       return false;
+      devinfo->revision = 0;

   if (!query_topology(devinfo, fd)) {
      if (devinfo->gen >= 10) {
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -2833,6 +2833,10 @@ cmd_buffer_emit_push_constant(struct anv_cmd_buffer *cmd_buffer,
         const struct anv_pipeline_bind_map *bind_map =
            &pipeline->shaders[stage]->bind_map;

+#if GEN_GEN >= 12
+         c.MOCS = cmd_buffer->device->isl_dev.mocs.internal;
+#endif
+
 #if GEN_GEN >= 8 || GEN_IS_HASWELL
         /* The Skylake PRM contains the following restriction:
          *
@@ -2893,6 +2897,7 @@ cmd_buffer_emit_push_constant_all(struct anv_cmd_buffer *cmd_buffer,
   if (count == 0) {
      anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_CONSTANT_ALL), c) {
         c.ShaderUpdateEnable = shader_mask;
+         c.MOCS = cmd_buffer->device->isl_dev.mocs.internal;
      }
      return;
   }
@@ -2923,7 +2928,8 @@ cmd_buffer_emit_push_constant_all(struct anv_cmd_buffer *cmd_buffer,
   dw = anv_batch_emitn(&cmd_buffer->batch, num_dwords,
                        GENX(3DSTATE_CONSTANT_ALL),
                        .ShaderUpdateEnable = shader_mask,
-                        .PointerBufferMask = buffers);
+                        .PointerBufferMask = buffers,
+                        .MOCS = cmd_buffer->device->isl_dev.mocs.internal);

   for (int i = 0; i < count; i++) {
      const struct anv_push_range *range = &bind_map->push_ranges[i];
--- a/src/panfrost/midgard/midgard_compile.h
+++ b/src/panfrost/midgard/midgard_compile.h
@@ -49,7 +49,7 @@ enum {
        PAN_SYSVAL_SSBO = 4,
        PAN_SYSVAL_NUM_WORK_GROUPS = 5,
        PAN_SYSVAL_SAMPLER = 7,
-} pan_sysval;
+};

 #define PAN_TXS_SYSVAL_ID(texidx, dim, is_array)          \
 	((texidx) | ((dim) << 7) | ((is_array) ? (1 << 9) : 0))
@@ -65,7 +65,7 @@ enum {
        PAN_VERTEX_ID   = 16,
        PAN_INSTANCE_ID = 17,
        PAN_MAX_ATTRIBUTE
-} pan_special_attributes;
+};

 typedef struct {
        int work_register_count;
--- a/src/util/format/u_format_parse.py
+++ b/src/util/format/u_format_parse.py
@@ -379,27 +379,16 @@ def parse(filename):
            channel.shift = le_shift
            le_shift += channel.size

+        be_shift = 0
+        for channel in be_channels[3::-1]:
+            channel.shift = be_shift
+            be_shift += channel.size
+
+        assert le_shift == be_shift
        for i in range(4):
            assert (le_swizzles[i] != SWIZZLE_NONE) == (be_swizzles[i] != SWIZZLE_NONE)

        format = Format(name, layout, block_width, block_height, block_depth, le_channels, le_swizzles, be_channels, be_swizzles, colorspace)
-
-        if format.is_array() and not format.is_bitmask():
-            # Formats accessed as arrays by the pack functions (R32G32_FLOAT or
-            # R8G8B8_UNORM, for example) should not be channel-ordering-reversed
-            # for BE.
-            # Note that __eq__ on channels ignores .shift!
-            assert(format.be_channels == format.le_channels)
-            assert(format.be_swizzles == format.le_swizzles)
-            format.be_channels = format.le_channels
-        else:
-            be_shift = 0
-            for channel in format.be_channels[3::-1]:
-                channel.shift = be_shift
-                be_shift += channel.size
-
-            assert le_shift == be_shift
-
        formats.append(format)
    return formats

--- a/src/vulkan/overlay-layer/overlay.cpp
+++ b/src/vulkan/overlay-layer/overlay.cpp
@@ -1319,43 +1319,56 @@ static struct overlay_draw *render_swapchain_display(struct swapchain_data *data

   device_data->vtable.CmdEndRenderPass(draw->command_buffer);

-   /* Bounce the image to display back to present layout. */
-   imb.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
-   imb.pNext = nullptr;
-   imb.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
-   imb.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
-   imb.oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
-   imb.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
-   imb.image = data->images[image_index];
-   imb.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
-   imb.subresourceRange.baseMipLevel = 0;
-   imb.subresourceRange.levelCount = 1;
-   imb.subresourceRange.baseArrayLayer = 0;
-   imb.subresourceRange.layerCount = 1;
-   imb.srcQueueFamilyIndex = device_data->graphic_queue->family_index;
-   imb.dstQueueFamilyIndex = present_queue->family_index;
-   device_data->vtable.CmdPipelineBarrier(draw->command_buffer,
-                                          VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT,
-                                          VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT,
-                                          0,          /* dependency flags */
-                                          0, nullptr, /* memory barriers */
-                                          0, nullptr, /* buffer memory barriers */
-                                          1, &imb);   /* image memory barriers */
+   if (device_data->graphic_queue->family_index != present_queue->family_index)
+   {
+      /* Transfer the image back to the present queue family
+       * image layout was already changed to present by the render pass 
+       */
+      imb.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
+      imb.pNext = nullptr;
+      imb.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
+      imb.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
+      imb.oldLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
+      imb.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
+      imb.image = data->images[image_index];
+      imb.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+      imb.subresourceRange.baseMipLevel = 0;
+      imb.subresourceRange.levelCount = 1;
+      imb.subresourceRange.baseArrayLayer = 0;
+      imb.subresourceRange.layerCount = 1;
+      imb.srcQueueFamilyIndex = device_data->graphic_queue->family_index;
+      imb.dstQueueFamilyIndex = present_queue->family_index;
+      device_data->vtable.CmdPipelineBarrier(draw->command_buffer,
+                                             VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT,
+                                             VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT,
+                                             0,          /* dependency flags */
+                                             0, nullptr, /* memory barriers */
+                                             0, nullptr, /* buffer memory barriers */
+                                             1, &imb);   /* image memory barriers */
+   }

   device_data->vtable.EndCommandBuffer(draw->command_buffer);

+   VkPipelineStageFlags *stages_wait = (VkPipelineStageFlags*) malloc(sizeof(VkPipelineStageFlags) * n_wait_semaphores);
+   for (unsigned i = 0; i < n_wait_semaphores; i++)
+   {
+      // wait in the fragment stage until the swapchain image is ready
+      stages_wait[i] = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
+   }
+   
   VkSubmitInfo submit_info = {};
-   VkPipelineStageFlags stage_wait = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
   submit_info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
   submit_info.commandBufferCount = 1;
   submit_info.pCommandBuffers = &draw->command_buffer;
-   submit_info.pWaitDstStageMask = &stage_wait;
+   submit_info.pWaitDstStageMask = stages_wait;
   submit_info.waitSemaphoreCount = n_wait_semaphores;
   submit_info.pWaitSemaphores = wait_semaphores;
   submit_info.signalSemaphoreCount = 1;
   submit_info.pSignalSemaphores = &draw->semaphore;

   device_data->vtable.QueueSubmit(device_data->graphic_queue->queue, 1, &submit_info, draw->fence);
+   
+   free(stages_wait);

   return draw;
 }
@@ -1890,15 +1903,18 @@ static VkResult overlay_QueuePresentKHR(
         struct swapchain_data *swapchain_data =
            FIND(struct swapchain_data, swapchain);

+         uint32_t image_index = pPresentInfo->pImageIndices[i];
+
         before_present(swapchain_data,
                        queue_data,
                        pPresentInfo->pWaitSemaphores,
                        pPresentInfo->waitSemaphoreCount,
-                        pPresentInfo->pImageIndices[i]);
+                        image_index);

         VkPresentInfoKHR present_info = *pPresentInfo;
         present_info.swapchainCount = 1;
         present_info.pSwapchains = &swapchain;
+         present_info.pImageIndices = &image_index;

         uint64_t ts0 = os_time_get();
         result = queue_data->device->vtable.QueuePresentKHR(queue, &present_info);
@@ -1910,11 +1926,13 @@ static VkResult overlay_QueuePresentKHR(
         VkSwapchainKHR swapchain = pPresentInfo->pSwapchains[i];
         struct swapchain_data *swapchain_data =
            FIND(struct swapchain_data, swapchain);
+
+         uint32_t image_index = pPresentInfo->pImageIndices[i];
+
         VkPresentInfoKHR present_info = *pPresentInfo;
         present_info.swapchainCount = 1;
         present_info.pSwapchains = &swapchain;
-
-         uint32_t image_index = pPresentInfo->pImageIndices[i];
+         present_info.pImageIndices = &image_index;

         struct overlay_draw *draw = before_present(swapchain_data,
                                                    queue_data,
Author	SHA1	Message	Date
Dylan Baker	6bbbef9699	VERSION: bump for 20.0.0-rc3	2020-02-13 10:18:44 -08:00
Samuel Pitoiset	fa0dcef2ef	nir: do not use De Morgan's Law rules for flt and fge In presence of NaNs, "!(flt(a, b) && flt(c, d))" is NOT EQUAL to "fge(a, b) \|\| fge(c, d)". These optimizations are unsafe for apps that rely on NaN behaviour. pipeline-db (GFX9/LLVM): Totals from affected shaders: SGPRS: 3176 -> 3136 (-1.26 %) VGPRS: 2188 -> 2144 (-2.01 %) Spilled SGPRs: 227 -> 169 (-25.55 %) Code Size: 150572 -> 151800 (0.82 %) bytes Max Waves: 307 -> 310 (0.98 %) pipeline-db (GFX9/ACO): Totals from affected shaders: SGPRS: 18744 -> 18744 (0.00 %) VGPRS: 15576 -> 15580 (0.03 %) Spilled SGPRs: 164 -> 164 (0.00 %) Code Size: 1573012 -> 1576492 (0.22 %) bytes Max Waves: 1534 -> 1532 (-0.13 %) Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2127 Fixes: `d1ed4ffe0b` ("nir: Use De Morgan's Law on logic compounded comparisons") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3696> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3696> (cherry picked from commit `8e77280774`)	2020-02-11 09:49:15 -08:00
Samuel Pitoiset	4558bdb95a	aco: fix creating v_madak if v_mad_f32 has two sgpr literals Do not ignore that src1 can be a sgpr. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2435 Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3759> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3759> (cherry picked from commit `ddd767387f`)	2020-02-11 09:49:15 -08:00
Vinson Lee	75ea9c808d	panfrost: Remove unused anonymous enum variables. This patch fix these build errors with GCC 10. /usr/bin/ld: src/gallium/drivers/panfrost/libpanfrost.a(pan_resource.c.o):src/panfrost/midgard/midgard_compile.h:52: multiple definition of `pan_sysval'; src/gallium/drivers/panfrost/libpanfrost.a(pan_screen.c.o):src/panfrost/midgard/midgard_compile.h:52: first defined here /usr/bin/ld: src/gallium/drivers/panfrost/libpanfrost.a(pan_resource.c.o):src/panfrost/midgard/midgard_compile.h:68: multiple definition of `pan_special_attributes'; src/gallium/drivers/panfrost/libpanfrost.a(pan_screen.c.o):src/panfrost/midgard/midgard_compile.h:68: first defined here Fixes: `7e8de5a707` ("panfrost: Implement system values") Fixes: `306800d747` ("pan/midgard: Lower gl_VertexID/gl_InstanceID to attributes") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3752> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3752> (cherry picked from commit `63345a3596`)	2020-02-11 09:49:14 -08:00
Eric Anholt	e5f13bca20	Revert "gallium: Fix big-endian addressing of non-bitmask array formats." This reverts the functional part of commit `d17ff2f7f1`, leaving the unit test for mesa/pipe agreement on what's an array. The issue is that the util_channel_desc.shift values on array formats are not used for bit addressing in memory, they're bit addressing within a word treating a pixel of the format as a native type, as seen by llvmpipe's use of the values to do shifts (see lp_build_unpack_arith_rgba_aos() for example). This means the values are nonsensical for 3-byte RGB, but then llvmpipe doesn't expose those formats so it works out. I still want to clean up our big-endian format handling at some point, but let's fix the s390x regression first, sort out our format unit tests in CI, then be able to refactor with confidence. Fixes: `d17ff2f7f1` ("gallium: Fix big-endian addressing of non-bitmask array formats.") Closes: #2472 Acked-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3721> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3721> (cherry picked from commit `1886dbfe73`)	2020-02-11 09:49:14 -08:00
Marek Olšák	f93c8d8598	radeonsi: fix the DCC MSAA bug workaround Cc: 19.3 20.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3646> (cherry picked from commit `fbb27eebc8`)	2020-02-11 09:49:13 -08:00
Neha Bhende	c4e1dd07eb	svga: Use pipe_shader_state_from_tgsi to set shader state Use pipe_shader_state_from_tgsi() to set shader state for transformed shader so that we get all correct data for respective shader state. This fixes several regressed glretrace, piglit crashes found during merging upsteam mesa Fixes: `bf12bc2dd7` (draw: add nir info gathering and building support) Reviewed-by: Charmaine Lee <charmainel@vmware.com> (cherry picked from commit `144561dc5e`)	2020-02-11 09:49:12 -08:00
Neha Bhende	f86e27156d	svga: fix size of format_conversion_table[] Since we are now using sparse matrix for format_conversion_table, we have to make sure we have last entry in table which gives the sense of required size of format_conversion_table Fixes: `84db6ba7` ("svga: Drop unsupported formats from the format table") Reviewed-by: Charmaine Lee <charmainel@vmware.com> (cherry picked from commit `470e73e7f8`)	2020-02-11 09:49:12 -08:00
Dylan Baker	9724b0f32c	.pick_status.json: Update to `2303762735`	2020-02-11 09:49:08 -08:00
Samuel Pitoiset	a3bd400c14	aco: fix waiting for scalar stores before "writing back" data on GFX8-GFX9 Seems required also on GFX8-GFX9 to achieve correct behaviour. This is an undocumented behaviour but it makes real sense to me. pipeline-db on GFX9: Totals from affected shaders: SGPRS: 1018 -> 1018 (0.00 %) VGPRS: 516 -> 516 (0.00 %) Code Size: 40516 -> 40636 (0.30 %) bytes Max Waves: 280 -> 280 (0.00 %) This fixes some sort of sun flickering with Assassins Creed Origins. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2488 Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3750> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3750> (cherry picked from commit `34fd894e42`)	2020-02-10 09:05:28 -08:00
Georg Lehmann	32dc7fff47	Vulkan overlay: use the corresponding image index for each swapchain pImageIndices should be a pointer to the current image index otherwise every swapchain but the first one could have a wrong image index Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3741> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3741> (cherry picked from commit `7283c33b98`)	2020-02-10 09:05:28 -08:00
Marek Olšák	027f9c887c	radeonsi: don't report that multi-plane formats are supported Fixes: `a554b45d` - st/mesa: don't lower YUV when driver supports it natively Closes: #2376 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3632> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3632> (cherry picked from commit `35961b10da`)	2020-02-10 09:05:27 -08:00
Hyunjun Ko	f3f4751851	freedreno/ir3: put the conversion back for half const to the right place. The previous commit leads to match immed values unexpectedly. This makes constlen for each shader including bvert wrong. Also fixes atan2 for mediump deqp tests. Fixes: `cbd1f47433` ("freedreno/ir3: convert back to 32-bit values for half constant registers.") v2: Move conversion up above fabs/fneg modifier handling as well. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737> (cherry picked from commit `260bd32b58`)	2020-02-10 09:05:27 -08:00
Dylan Baker	a25c7674aa	.pick_status.json: Update to `689817c9df`	2020-02-10 09:05:25 -08:00
Georg Lehmann	1d17f42732	Vulkan Overlay: Don't try to change the image layout to present twice The render pass already does the transition. The pipeline barrier is still needed to transfer the queue family ownership. Fixes: `320b0f66c2` ("vulkan/overlay: bounce image back to present layout") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3740> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3740> (cherry picked from commit `f239bb8020`)	2020-02-07 09:21:52 -08:00
Samuel Pitoiset	e393404ff1	aco: do not use ds_{read,write}2 on GFX6 According to LLVM, these instructions have a bounds checking bug. LLVM only uses them on GFX7+. This fixes broken geometry in Assassins Creed Origins. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2489 Fixes: `4a553212fa` ("radv: enable ACO support for GFX6") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3746> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3746> (cherry picked from commit `4b978cd950`)	2020-02-07 09:21:52 -08:00
Tapani Pälli	1f8db81632	intel/vec4: fix valgrind errors with vf_values array Fixes valgrind errors introduced since commit `a8ec4082`. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2346 Fixes: `a8ec4082` ("nir+vtn: vec8+vec16 support") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3691> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3691> (cherry picked from commit `da76dfb515`)	2020-02-07 09:21:51 -08:00
Rhys Perry	8f29aaa2cf	aco: fix gfx10_wave64_bpermute Since `9254fb4fc7`, the pass replaced the SCC clobber with the scalar identity temporary. Just skip most of the temporary setup, since we don't need it for gfx10_wave64_bpermute. Although shuffles are disabled on GFX10, Detroit: Become Human seems to use them anyway. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Fixes: `9254fb4fc7` ('aco: don't use a scalar temporary for reductions on GFX10') Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3683> (cherry picked from commit `20eb1acb6f`)	2020-02-07 09:21:49 -08:00
Georg Lehmann	1e0cc313ba	Correctly wait in the fragment stage until all semaphores are signaled This fixes two issues: - a crash if the application uses more than one semaphore for presenting because the driver expects one stage per semaphore - the swapchain image could be not ready yet if the semaphores aren't signaled, #946 is possible related Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3718> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3718> (cherry picked from commit `1c79afd946`)	2020-02-07 09:21:47 -08:00
Thomas Hellstrom	d96f0faacf	svga: Fix banded DMA upload A previous commit ("winsys/svga: Limit the maximum DMA hardware buffer size") made banded DMA transfer kick in when transfering gnome-shell window contents under gnome-shell / wayland. This uncovered a bug where we assumed that banded DMA transfers always occur to the top (y=0) of the surface. Fix this by taking the destination y offset into account. Cc: 19.2 19.3 20.0 <mesa-stable@lists.freedesktop.org> Fixes: `287c94ea49` ("Squashed commit of the following:") Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3733> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3733> (cherry picked from commit `451cf228d5`)	2020-02-07 09:21:46 -08:00
Lionel Landwerlin	203710e94c	anv: set MOCS on push constants v2: Also set MOCS on 3DSTATE_CONSTANT_ALL (Ken) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `67d2cb3e93` ("anv: Add get_push_range_address() helper.") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3732> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3732> (cherry picked from commit `f9febfae41`)	2020-02-07 09:21:45 -08:00
Rafael Antognolli	d189ab9fcc	intel: Load the driver even if I915_PARAM_REVISION is not found. This param is only available starting on kernel 4.1. Use a default value of 0 if it is not found instead. v2: Update commit message (Lionel) Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Mark Janes <mark.a.janes@intel.com> Fixes: `96e1c945f2` ("i965: Move device info initialization to common Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3727> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3727> (cherry picked from commit `4aa7af9e9a`)	2020-02-07 09:21:44 -08:00
Vinson Lee	bd934ff613	swr: Fix GCC 4.9 checks. Fixes: `f0a22956be` ("swr/rast: _mm_undefined_ implementations for gcc<4.9") Fixes: `e21fc2c625` ("swr/rast: non-regex knob fallback code for gcc < 4.9") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Jan Zielinski <jan.zielinski@intel.com> (cherry picked from commit `deb2bbf57e`)	2020-02-07 09:21:44 -08:00
James Xiong	51f7d81dd2	gallium: let the pipe drivers decide the supported modifiers fixes: `ac0219cc5b` ("gallium: dmabuf support for yuv formats that are not natively supported") Signed-off-by: James Xiong <james.xiong@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3527> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3527> (cherry picked from commit `205ce0bea5`)	2020-02-07 09:21:43 -08:00
Timur Kristóf	06a9d51f27	aco/optimizer: Don't combine uniform bool s_and to s_andn2. Fixes: `8a32f57fff` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3714> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3714> (cherry picked from commit `4d34abd15c`)	2020-02-07 09:21:43 -08:00
Dylan Baker	419c992e65	.pick_status.json: Update to `d8bae10bfe`	2020-02-07 09:21:42 -08:00