VERSION: bump for 21.0.2 release

docs: add release notes for 21.0.2
frontend/va/image: add pipe flush for vlVaPutImage
2021-04-07 09:35:30 -07:00 · 2021-04-07 09:35:07 -07:00 · 2021-04-06 18:55:42 +00:00 · 2021-04-06 09:41:56 -07:00 · 2021-04-06 09:41:56 -07:00 · 2021-04-06 09:41:56 -07:00
38 changed files with 5713 additions and 184 deletions
--- a/.pick_status.json
+++ b/.pick_status.json
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-21.0.1
+21.0.2
--- a/docs/ci/docker.rst
+++ b/docs/ci/docker.rst
@@ -8,10 +8,9 @@ VK-GL-CTS, on the shared GitLab runners provided by `freedesktop
 Software architecture
 ---------------------

-The Docker containers are rebuilt from the debian-install.sh script
-when DEBIAN\_TAG is changed in .gitlab-ci.yml, and
-debian-test-install.sh when DEBIAN\_ARM64\_TAG is changed in
-.gitlab-ci.yml.  The resulting images are around 500MB, and are
+The Docker containers are rebuilt using the shell scripts under
+.gitlab-ci/container/ when the FDO\_DISTRIBUTION\_TAG changes in
+.gitlab-ci.yml. The resulting images are around 1 GB, and are
 expected to change approximately weekly (though an individual
 developer working on them may produce many more images while trying to
 come up with a working MR!).
--- a/docs/relnotes/21.0.1.rst
+++ b/docs/relnotes/21.0.1.rst
@@ -19,7 +19,7 @@ SHA256 checksum

 ::

-    TBD.
+    379fc984459394f2ab2d84049efdc3a659869dc1328ce72ef0598506611712bb  mesa-21.0.1.tar.xz


 New features
--- a/docs/relnotes/21.0.2.rst
+++ b/docs/relnotes/21.0.2.rst
@@ -0,0 +1,135 @@
+Mesa 21.0.2 Release Notes / 2021-04-07
+======================================
+
+Mesa 21.0.2 is a bug fix release which fixes bugs found since the 21.0.1 release.
+
+Mesa 21.0.2 implements the OpenGL 4.6 API, but the version reported by
+glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
+glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
+Some drivers don't support all the features required in OpenGL 4.6. OpenGL
+4.6 is **only** available if requested at context creation.
+Compatibility contexts may report a lower version depending on each driver.
+
+Mesa 21.0.2 implements the Vulkan 1.2 API, but the version reported by
+the apiVersion property of the VkPhysicalDeviceProperties struct
+depends on the particular driver being used.
+
+SHA256 checksum
+---------------
+
+::
+
+    TBD.
+
+
+New features
+------------
+
+- None
+
+
+Bug fixes
+---------
+
+- warning: xnack 'Off' was requested for a processor that does not support it! \[AMD VEGAM with LLVM 12.0.0\]
+- Clover doesn't work for kmsro drivers
+- util cpu detection breaks on 128-core AMD machines
+- ACO error with GCN 1 GPU
+- kmsro advertises EGL_MESA_device_software
+
+
+Changes
+-------
+
+Adrian Ratiu (1):
+
+- docs: docker: minor stale documentation fix
+
+Bas Nieuwenhuizen (1):
+
+- radv: Flush caches for shader read operations.
+
+Boyuan Zhang (1):
+
+- frontend/va/image: add pipe flush for vlVaPutImage
+
+Charmaine Lee (1):
+
+- gallivm: increase size of texture target enum bitfield
+
+Dave Airlie (3):
+
+- lavapipe: fix templated descriptor updates
+- util: rework AMD cpu L3 cache affinity code.
+- drisw: move zink down the list below the sw drivers.
+
+Dylan Baker (9):
+
+- docs: Add 21.0.1 hashes
+- .pick_status.json: Update to 9be24c89c8c298069eaa3ff600ba556b9a4557e9
+- .pick_status.json: Update to 8e43abcd2c29366d77fff804a7845b61fb97ca5c
+- .pick_status.json: Mark 75951a44ee9f25d29865f3dd60cdf3b8ce3f7f0c as backported
+- .pick_status.json: Update to a7c0cf500b335069bfe480c947b26052335f897e
+- .pick_status.json: Update to ee14bec09a92e4363ef916d00d4d9baecfb09fa9
+- .pick_status.json: Update to 3c64c090e0d2250d7ee880550f8cbeac0052c8d9
+- .pick_status.json: Update to fb5615af40a5878b127827f80f4185df63933f34
+- .pick_status.json: Update to 1e0a69afa72c61e5f5841db3e5e7f6bb846a0fab
+
+Erik Faye-Lund (1):
+
+- compiler/glsl: avoid null-pointer deref
+
+Gert Wollny (1):
+
+- r600: don't set an index_bias for indirect draw calls
+
+Icecream95 (2):
+
+- panfrost: Disable early-z when alpha test is used
+- pipe-loader,gallium/drm: Fix the kmsro pipe_loader target
+
+Lionel Landwerlin (1):
+
+- intel/fs/copy_prop: check stride constraints with actual final type
+
+Marek Olšák (2):
+
+- ac/llvm: don't set unsupported xnack options to fix LLVM crashes on gfx6-8
+- radeonsi: disable sparse buffers on gfx7-8
+
+Michel Dänzer (2):
+
+- intel/tools: Use subprocess.Popen to read output directly from a pipe
+- Revert "glsl/test: Don't run whitespace tests in parallel"
+
+Mike Blumenkrantz (5):
+
+- util/set: stop leaking u32 key sets which pass a mem ctx
+- lavapipe: use the passed offset for CmdCopyQueryPoolResults
+- util/bitscan: add u_foreach_bit macros
+- lavapipe: fix CmdCopyQueryPoolResults for partial pipeline statistics queries
+- lavapipe: fix array texture region copies
+
+Pierre-Eric Pelloux-Prayer (3):
+
+- mesa/st: fix lower_tex_src_plane in multiple samplers scenario
+- nir/lower_tex: ignore texture_index if tex_instr has deref src
+- mesa/st: fix st_nir_lower_tex_src_plane arguments
+
+Rhys Perry (1):
+
+- aco: implement image_deref_samples
+
+Simon Ser (3):
+
+- egl: fix software flag in \_eglAddDevice call on DRM
+- egl: only take render nodes into account when listing DRM devices
+- Revert "egl: Don't add hardware device if there is no render node v2."
+
+Tapani Pälli (1):
+
+- iris: clamp PointWidth in 3DSTATE_SF like i965 does
+
+Tony Wasserka (1):
+
+- aco/isel: Don't emit unsupported i16<->f16 conversion opcodes on GFX6/7
--- a/src/amd/compiler/aco_instruction_selection.cpp
+++ b/src/amd/compiler/aco_instruction_selection.cpp
@@ -2444,11 +2444,24 @@ void visit_alu_instr(isel_context *ctx, nir_alu_instr *instr)
   case nir_op_i2f16: {
      assert(dst.regClass() == v2b);
      Temp src = get_alu_src(ctx, instr->src[0]);
-      if (instr->src[0].src.ssa->bit_size == 8)
-         src = convert_int(ctx, bld, src, 8, 16, true);
-      else if (instr->src[0].src.ssa->bit_size == 64)
+      const unsigned input_size = instr->src[0].src.ssa->bit_size;
+      if (input_size <= 16) {
+         /* Expand integer to the size expected by the uint→float converter used below */
+         unsigned target_size = (ctx->program->chip_class >= GFX8 ? 16 : 32);
+         if (input_size != target_size) {
+            src = convert_int(ctx, bld, src, input_size, target_size, true);
+         }
+      } else if (input_size == 64) {
         src = convert_int(ctx, bld, src, 64, 32, false);
-      bld.vop1(aco_opcode::v_cvt_f16_i16, Definition(dst), src);
+      }
+
+      if (ctx->program->chip_class >= GFX8) {
+         bld.vop1(aco_opcode::v_cvt_f16_i16, Definition(dst), src);
+      } else {
+         /* GFX7 and earlier do not support direct f16⟷i16 conversions */
+         src = bld.vop1(aco_opcode::v_cvt_f32_i32, bld.def(v1), src);
+         bld.vop1(aco_opcode::v_cvt_f16_f32, Definition(dst), src);
+      }
      break;
   }
   case nir_op_i2f32: {
@@ -2483,11 +2496,24 @@ void visit_alu_instr(isel_context *ctx, nir_alu_instr *instr)
   case nir_op_u2f16: {
      assert(dst.regClass() == v2b);
      Temp src = get_alu_src(ctx, instr->src[0]);
-      if (instr->src[0].src.ssa->bit_size == 8)
-         src = convert_int(ctx, bld, src, 8, 16, false);
-      else if (instr->src[0].src.ssa->bit_size == 64)
+      const unsigned input_size = instr->src[0].src.ssa->bit_size;
+      if (input_size <= 16) {
+         /* Expand integer to the size expected by the uint→float converter used below */
+         unsigned target_size = (ctx->program->chip_class >= GFX8 ? 16 : 32);
+         if (input_size != target_size) {
+            src = convert_int(ctx, bld, src, input_size, target_size, false);
+         }
+      } else if (input_size == 64) {
         src = convert_int(ctx, bld, src, 64, 32, false);
-      bld.vop1(aco_opcode::v_cvt_f16_u16, Definition(dst), src);
+      }
+
+      if (ctx->program->chip_class >= GFX8) {
+         bld.vop1(aco_opcode::v_cvt_f16_u16, Definition(dst), src);
+      } else {
+         /* GFX7 and earlier do not support direct f16⟷u16 conversions */
+         src = bld.vop1(aco_opcode::v_cvt_f32_u32, bld.def(v1), src);
+         bld.vop1(aco_opcode::v_cvt_f16_f32, Definition(dst), src);
+      }
      break;
   }
   case nir_op_u2f32: {
@@ -2524,22 +2550,46 @@ void visit_alu_instr(isel_context *ctx, nir_alu_instr *instr)
   }
   case nir_op_f2i8:
   case nir_op_f2i16: {
-      if (instr->src[0].src.ssa->bit_size == 16)
-         emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_i16_f16, dst);
-      else if (instr->src[0].src.ssa->bit_size == 32)
+      if (instr->src[0].src.ssa->bit_size == 16) {
+         if (ctx->program->chip_class >= GFX8) {
+            emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_i16_f16, dst);
+         } else {
+            /* GFX7 and earlier do not support direct f16⟷i16 conversions */
+            Temp tmp = bld.tmp(v1);
+            emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_f32_f16, tmp);
+            tmp = bld.vop1(aco_opcode::v_cvt_i32_f32, bld.def(v1), tmp);
+            tmp = convert_int(ctx, bld, tmp, 32, 16, false, (dst.type() == RegType::sgpr) ? Temp() : dst);
+            if (dst.type() == RegType::sgpr) {
+               bld.pseudo(aco_opcode::p_as_uniform, Definition(dst), tmp);
+            }
+         }
+      } else if (instr->src[0].src.ssa->bit_size == 32) {
         emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_i32_f32, dst);
-      else
+      } else {
         emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_i32_f64, dst);
+      }
      break;
   }
   case nir_op_f2u8:
   case nir_op_f2u16: {
-      if (instr->src[0].src.ssa->bit_size == 16)
-         emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_u16_f16, dst);
-      else if (instr->src[0].src.ssa->bit_size == 32)
+      if (instr->src[0].src.ssa->bit_size == 16) {
+         if (ctx->program->chip_class >= GFX8) {
+            emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_u16_f16, dst);
+         } else {
+            /* GFX7 and earlier do not support direct f16⟷u16 conversions */
+            Temp tmp = bld.tmp(v1);
+            emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_f32_f16, tmp);
+            tmp = bld.vop1(aco_opcode::v_cvt_u32_f32, bld.def(v1), tmp);
+            tmp = convert_int(ctx, bld, tmp, 32, 16, false, (dst.type() == RegType::sgpr) ? Temp() : dst);
+            if (dst.type() == RegType::sgpr) {
+               bld.pseudo(aco_opcode::p_as_uniform, Definition(dst), tmp);
+            }
+         }
+      } else if (instr->src[0].src.ssa->bit_size == 32) {
         emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_u32_f32, dst);
-      else
+      } else {
         emit_vop1_instruction(ctx, instr, aco_opcode::v_cvt_u32_f64, dst);
+      }
      break;
   }
   case nir_op_f2i32: {
@@ -6456,6 +6506,37 @@ void visit_image_size(isel_context *ctx, nir_intrinsic_instr *instr)
   emit_split_vector(ctx, dst, instr->dest.ssa.num_components);
 }

+void get_image_samples(isel_context *ctx, Definition dst, Temp resource)
+{
+   Builder bld(ctx->program, ctx->block);
+
+   Temp dword3 = emit_extract_vector(ctx, resource, 3, s1);
+   Temp samples_log2 = bld.sop2(aco_opcode::s_bfe_u32, bld.def(s1), bld.def(s1, scc), dword3, Operand(16u | 4u<<16));
+   Temp samples = bld.sop2(aco_opcode::s_lshl_b32, bld.def(s1), bld.def(s1, scc), Operand(1u), samples_log2);
+   Temp type = bld.sop2(aco_opcode::s_bfe_u32, bld.def(s1), bld.def(s1, scc), dword3, Operand(28u | 4u<<16 /* offset=28, width=4 */));
+
+   Operand default_sample = Operand(1u);
+   if (ctx->options->robust_buffer_access) {
+      /* Extract the second dword of the descriptor, if it's
+       * all zero, then it's a null descriptor.
+       */
+      Temp dword1 = emit_extract_vector(ctx, resource, 1, s1);
+      Temp is_non_null_descriptor = bld.sopc(aco_opcode::s_cmp_gt_u32, bld.def(s1, scc), dword1, Operand(0u));
+      default_sample = Operand(is_non_null_descriptor);
+   }
+
+   Temp is_msaa = bld.sopc(aco_opcode::s_cmp_ge_u32, bld.def(s1, scc), type, Operand(14u));
+   bld.sop2(aco_opcode::s_cselect_b32, dst, samples, default_sample, bld.scc(is_msaa));
+}
+
+void visit_image_samples(isel_context *ctx, nir_intrinsic_instr *instr)
+{
+   Builder bld(ctx->program, ctx->block);
+   Temp dst = get_ssa_temp(ctx, &instr->dest.ssa);
+   Temp resource = get_sampler_desc(ctx, nir_instr_as_deref(instr->src[0].ssa->parent_instr), ACO_DESC_IMAGE, NULL, true, false);
+   get_image_samples(ctx, Definition(dst), resource);
+}
+
 void visit_load_ssbo(isel_context *ctx, nir_intrinsic_instr *instr)
 {
   Builder bld(ctx->program, ctx->block);
@@ -8060,6 +8141,9 @@ void visit_intrinsic(isel_context *ctx, nir_intrinsic_instr *instr)
   case nir_intrinsic_image_deref_size:
      visit_image_size(ctx, instr);
      break;
+   case nir_intrinsic_image_deref_samples:
+      visit_image_samples(ctx, instr);
+      break;
   case nir_intrinsic_load_ssbo:
      visit_load_ssbo(ctx, instr);
      break;
@@ -9006,25 +9090,7 @@ void visit_tex(isel_context *ctx, nir_tex_instr *instr)
      return get_buffer_size(ctx, resource, get_ssa_temp(ctx, &instr->dest.ssa), true);

   if (instr->op == nir_texop_texture_samples) {
-      Temp dword3 = emit_extract_vector(ctx, resource, 3, s1);
-
-      Temp samples_log2 = bld.sop2(aco_opcode::s_bfe_u32, bld.def(s1), bld.def(s1, scc), dword3, Operand(16u | 4u<<16));
-      Temp samples = bld.sop2(aco_opcode::s_lshl_b32, bld.def(s1), bld.def(s1, scc), Operand(1u), samples_log2);
-      Temp type = bld.sop2(aco_opcode::s_bfe_u32, bld.def(s1), bld.def(s1, scc), dword3, Operand(28u | 4u<<16 /* offset=28, width=4 */));
-
-      Operand default_sample = Operand(1u);
-      if (ctx->options->robust_buffer_access) {
-         /* Extract the second dword of the descriptor, if it's
-	  * all zero, then it's a null descriptor.
-	  */
-         Temp dword1 = emit_extract_vector(ctx, resource, 1, s1);
-         Temp is_non_null_descriptor = bld.sopc(aco_opcode::s_cmp_gt_u32, bld.def(s1, scc), dword1, Operand(0u));
-         default_sample = Operand(is_non_null_descriptor);
-      }
-
-      Temp is_msaa = bld.sopc(aco_opcode::s_cmp_ge_u32, bld.def(s1, scc), type, Operand(14u));
-      bld.sop2(aco_opcode::s_cselect_b32, Definition(get_ssa_temp(ctx, &instr->dest.ssa)),
-               samples, default_sample, bld.scc(is_msaa));
+      get_image_samples(ctx, Definition(get_ssa_temp(ctx, &instr->dest.ssa)), resource);
      return;
   }

--- a/src/amd/compiler/aco_instruction_selection_setup.cpp
+++ b/src/amd/compiler/aco_instruction_selection_setup.cpp
@@ -799,6 +799,7 @@ void init_context(isel_context *ctx, nir_shader *shader)
                  case nir_intrinsic_read_invocation:
                  case nir_intrinsic_first_invocation:
                  case nir_intrinsic_ballot:
+                  case nir_intrinsic_image_deref_samples:
                     type = RegType::sgpr;
                     break;
                  case nir_intrinsic_load_sample_id:
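
The `get_image_samples` helper added to aco_instruction_selection.cpp reads two 4-bit fields from descriptor dword 3 and selects between the decoded sample count and a default value. A minimal scalar C sketch of that selection follows. `bfe_u32` and `image_samples` are hypothetical names; the field positions (offset 16 and offset 28, width 4), the `>= 14` MSAA test and the null-descriptor check come from the diff, while the operand packing of `s_bfe_u32` (offset in the low bits, width from bit 16) is inferred from the `offset=28, width=4` comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract `width` bits starting at bit `offset` (0 < width < 32 assumed). */
static uint32_t bfe_u32(uint32_t value, unsigned offset, unsigned width)
{
   return (value >> offset) & ((1u << width) - 1u);
}

/* Scalar model of the selection done by get_image_samples().
 * dword1/dword3 stand for descriptor dwords 1 and 3; `robust` models
 * ctx->options->robust_buffer_access. */
static uint32_t image_samples(uint32_t dword1, uint32_t dword3, bool robust)
{
   uint32_t samples_log2 = bfe_u32(dword3, 16, 4);
   uint32_t samples = 1u << samples_log2;
   uint32_t type = bfe_u32(dword3, 28, 4);

   /* Non-MSAA images report 1 sample; with robust access, a null
    * descriptor (dword1 == 0) reports 0 instead. */
   uint32_t default_sample = 1u;
   if (robust)
      default_sample = dword1 > 0u ? 1u : 0u;

   return type >= 14u ? samples : default_sample;
}
```

Because the result depends only on the descriptor, it is the same for every invocation, which fits the intrinsic being assigned `RegType::sgpr` in the hunk above.
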
--- a/src/amd/llvm/ac_llvm_util.c
+++ b/src/amd/llvm/ac_llvm_util.c
@@ -194,13 +194,11 @@ static LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family,
   const char *triple = (tm_options & AC_TM_SUPPORTS_SPILL) ? "amdgcn-mesa-mesa3d" : "amdgcn--";
   LLVMTargetRef target = ac_get_llvm_target(triple);

-   snprintf(features, sizeof(features), "+DumpCode%s%s%s%s%s",
+   snprintf(features, sizeof(features), "+DumpCode%s%s%s",
            LLVM_VERSION_MAJOR >= 11 ? "" : ",-fp32-denormals,+fp64-denormals",
            family >= CHIP_NAVI10 && !(tm_options & AC_TM_WAVE32)
               ? ",+wavefrontsize64,-wavefrontsize32"
               : "",
-            family <= CHIP_NAVI14 && tm_options & AC_TM_FORCE_ENABLE_XNACK ? ",+xnack" : "",
-            family <= CHIP_NAVI14 && tm_options & AC_TM_FORCE_DISABLE_XNACK ? ",-xnack" : "",
            tm_options & AC_TM_PROMOTE_ALLOCA_TO_SCRATCH ? ",-promote-alloca" : "");

   LLVMTargetMachineRef tm =
--- a/src/amd/llvm/ac_llvm_util.h
+++ b/src/amd/llvm/ac_llvm_util.h
@@ -62,8 +62,6 @@ enum ac_func_attr
 enum ac_target_machine_options
 {
   AC_TM_SUPPORTS_SPILL = (1 << 0),
-   AC_TM_FORCE_ENABLE_XNACK = (1 << 1),
-   AC_TM_FORCE_DISABLE_XNACK = (1 << 2),
   AC_TM_PROMOTE_ALLOCA_TO_SCRATCH = (1 << 3),
   AC_TM_CHECK_IR = (1 << 4),
   AC_TM_ENABLE_GLOBAL_ISEL = (1 << 5),
--- a/src/amd/vulkan/radv_meta_fmask_expand.c
+++ b/src/amd/vulkan/radv_meta_fmask_expand.c
@@ -114,7 +114,9 @@ radv_expand_fmask_image_inplace(struct radv_cmd_buffer *cmd_buffer,
 	radv_CmdBindPipeline(radv_cmd_buffer_to_handle(cmd_buffer),
 			     VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);

-	cmd_buffer->state.flush_bits |= radv_dst_access_flush(cmd_buffer, VK_ACCESS_SHADER_WRITE_BIT, image);
+	cmd_buffer->state.flush_bits |=
+		radv_dst_access_flush(cmd_buffer, VK_ACCESS_SHADER_READ_BIT |
+						  VK_ACCESS_SHADER_WRITE_BIT, image);

 	for (unsigned l = 0; l < radv_get_layerCount(image, subresourceRange); l++) {
 		struct radv_image_view iview;
--- a/src/compiler/glsl/gl_nir_lower_samplers_as_deref.c
+++ b/src/compiler/glsl/gl_nir_lower_samplers_as_deref.c
@@ -120,9 +120,10 @@ remove_struct_derefs_prep(nir_deref_instr **p, char **name,

 static void
 record_images_used(struct shader_info *info,
-                   nir_deref_instr *deref)
+                   nir_intrinsic_instr *instr)
 {
-   nir_variable *var = nir_deref_instr_get_variable(deref);
+   nir_variable *var =
+      nir_deref_instr_get_variable(nir_src_as_deref(instr->src[0]));

   /* Structs have been lowered already, so get_aoa_size is sufficient. */
   const unsigned size =
@@ -302,7 +303,7 @@ lower_intrinsic(nir_intrinsic_instr *instr,
      nir_deref_instr *deref =
         lower_deref(b, state, nir_src_as_deref(instr->src[0]));

-      record_images_used(&state->shader->info, deref);
+      record_images_used(&state->shader->info, instr);

      /* don't lower bindless: */
      if (!deref)
--- a/src/compiler/glsl/glcpp/meson.build
+++ b/src/compiler/glsl/glcpp/meson.build
@@ -86,13 +86,6 @@ if with_any_opengl and with_tests and host_machine.system() != 'windows'
    modes += ['valgrind']
  endif

-  # For some unfathomable reason, three out of these four tests often time out
-  # when running within CI. On the assumption that there is some
-  # parallelisation badness happening rather than the non-UNIX tests entering
-  # infinite loops, try just marking them as serial-only.
-  #
-  # This should have a negligible impact on runtime since they are quick to
-  # execute.
  foreach m : modes
    test(
      'glcpp test (@0@)'.format(m),
@@ -104,7 +97,6 @@ if with_any_opengl and with_tests and host_machine.system() != 'windows'
      ],
      suite : ['compiler', 'glcpp'],
      timeout: 60,
-      is_parallel: false,
    )
  endforeach
 endif
--- a/src/compiler/nir/nir_lower_tex.c
+++ b/src/compiler/nir/nir_lower_tex.c
@@ -287,16 +287,17 @@ static void
 convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex,
                   nir_ssa_def *y, nir_ssa_def *u, nir_ssa_def *v,
                   nir_ssa_def *a,
-                   const nir_lower_tex_options *options)
+                   const nir_lower_tex_options *options,
+                   unsigned texture_index)
 {

   const float *offset_vals;
   const nir_const_value_3_4 *m;
   assert((options->bt709_external & options->bt2020_external) == 0);
-   if (options->bt709_external & (1 << tex->texture_index)) {
+   if (options->bt709_external & (1u << texture_index)) {
      m = &bt709_csc_coeffs;
      offset_vals = bt709_csc_offsets;
-   } else if (options->bt2020_external & (1 << tex->texture_index)) {
+   } else if (options->bt2020_external & (1u << texture_index)) {
      m = &bt2020_csc_coeffs;
      offset_vals = bt2020_csc_offsets;
   } else {
@@ -327,7 +328,8 @@ convert_yuv_to_rgb(nir_builder *b, nir_tex_instr *tex,

 static void
 lower_y_uv_external(nir_builder *b, nir_tex_instr *tex,
-                    const nir_lower_tex_options *options)
+                    const nir_lower_tex_options *options,
+                    unsigned texture_index)
 {
   b->cursor = nir_after_instr(&tex->instr);

@@ -339,12 +341,14 @@ lower_y_uv_external(nir_builder *b, nir_tex_instr *tex,
                      nir_channel(b, uv, 0),
                      nir_channel(b, uv, 1),
                      nir_imm_float(b, 1.0f),
-                      options);
+                      options,
+                      texture_index);
 }

 static void
 lower_y_u_v_external(nir_builder *b, nir_tex_instr *tex,
-                     const nir_lower_tex_options *options)
+                     const nir_lower_tex_options *options,
+                     unsigned texture_index)
 {
   b->cursor = nir_after_instr(&tex->instr);

@@ -357,12 +361,14 @@ lower_y_u_v_external(nir_builder *b, nir_tex_instr *tex,
                      nir_channel(b, u, 0),
                      nir_channel(b, v, 0),
                      nir_imm_float(b, 1.0f),
-                      options);
+                      options,
+                      texture_index);
 }

 static void
 lower_yx_xuxv_external(nir_builder *b, nir_tex_instr *tex,
-                       const nir_lower_tex_options *options)
+                       const nir_lower_tex_options *options,
+                       unsigned texture_index)
 {
   b->cursor = nir_after_instr(&tex->instr);

@@ -374,12 +380,14 @@ lower_yx_xuxv_external(nir_builder *b, nir_tex_instr *tex,
                      nir_channel(b, xuxv, 1),
                      nir_channel(b, xuxv, 3),
                      nir_imm_float(b, 1.0f),
-                      options);
+                      options,
+                      texture_index);
 }

 static void
 lower_xy_uxvx_external(nir_builder *b, nir_tex_instr *tex,
-                       const nir_lower_tex_options *options)
+                       const nir_lower_tex_options *options,
+                       unsigned texture_index)
 {
  b->cursor = nir_after_instr(&tex->instr);

@@ -391,12 +399,14 @@ lower_xy_uxvx_external(nir_builder *b, nir_tex_instr *tex,
                     nir_channel(b, uxvx, 0),
                     nir_channel(b, uxvx, 2),
                     nir_imm_float(b, 1.0f),
-                     options);
+                     options,
+                     texture_index);
 }

 static void
 lower_ayuv_external(nir_builder *b, nir_tex_instr *tex,
-                    const nir_lower_tex_options *options)
+                    const nir_lower_tex_options *options,
+                    unsigned texture_index)
 {
  b->cursor = nir_after_instr(&tex->instr);

@@ -407,12 +417,14 @@ lower_ayuv_external(nir_builder *b, nir_tex_instr *tex,
                     nir_channel(b, ayuv, 1),
                     nir_channel(b, ayuv, 0),
                     nir_channel(b, ayuv, 3),
-                     options);
+                     options,
+                     texture_index);
 }

 static void
 lower_xyuv_external(nir_builder *b, nir_tex_instr *tex,
-                    const nir_lower_tex_options *options)
+                    const nir_lower_tex_options *options,
+                    unsigned texture_index)
 {
  b->cursor = nir_after_instr(&tex->instr);

@@ -423,12 +435,14 @@ lower_xyuv_external(nir_builder *b, nir_tex_instr *tex,
                     nir_channel(b, xyuv, 1),
                     nir_channel(b, xyuv, 0),
                     nir_imm_float(b, 1.0f),
-                     options);
+                     options,
+                     texture_index);
 }

 static void
 lower_yuv_external(nir_builder *b, nir_tex_instr *tex,
-                   const nir_lower_tex_options *options)
+                   const nir_lower_tex_options *options,
+                   unsigned texture_index)
 {
  b->cursor = nir_after_instr(&tex->instr);

@@ -439,7 +453,8 @@ lower_yuv_external(nir_builder *b, nir_tex_instr *tex,
                     nir_channel(b, yuv, 1),
                     nir_channel(b, yuv, 2),
                     nir_imm_float(b, 1.0f),
-                     options);
+                     options,
+                     texture_index);
 }

 /*
@@ -1052,38 +1067,45 @@ nir_lower_tex_block(nir_block *block, nir_builder *b,
         progress = true;
      }

-      if ((1 << tex->texture_index) & options->lower_y_uv_external) {
-         lower_y_uv_external(b, tex, options);
+      unsigned texture_index = tex->texture_index;
+      int tex_index = nir_tex_instr_src_index(tex, nir_tex_src_texture_deref);
+      if (tex_index >= 0) {
+         nir_deref_instr *deref = nir_src_as_deref(tex->src[tex_index].src);
+         texture_index = nir_deref_instr_get_variable(deref)->data.binding;
+      }
+
+      if ((1u << texture_index) & options->lower_y_uv_external) {
+         lower_y_uv_external(b, tex, options, texture_index);
         progress = true;
      }

-      if ((1 << tex->texture_index) & options->lower_y_u_v_external) {
-         lower_y_u_v_external(b, tex, options);
+      if ((1u << texture_index) & options->lower_y_u_v_external) {
+         lower_y_u_v_external(b, tex, options, texture_index);
         progress = true;
      }

-      if ((1 << tex->texture_index) & options->lower_yx_xuxv_external) {
-         lower_yx_xuxv_external(b, tex, options);
+      if ((1u << texture_index) & options->lower_yx_xuxv_external) {
+         lower_yx_xuxv_external(b, tex, options, texture_index);
         progress = true;
      }

-      if ((1 << tex->texture_index) & options->lower_xy_uxvx_external) {
-         lower_xy_uxvx_external(b, tex, options);
+      if ((1u << texture_index) & options->lower_xy_uxvx_external) {
+         lower_xy_uxvx_external(b, tex, options, texture_index);
         progress = true;
      }

-      if ((1 << tex->texture_index) & options->lower_ayuv_external) {
-         lower_ayuv_external(b, tex, options);
+      if ((1u << texture_index) & options->lower_ayuv_external) {
+         lower_ayuv_external(b, tex, options, texture_index);
         progress = true;
      }

-      if ((1 << tex->texture_index) & options->lower_xyuv_external) {
-         lower_xyuv_external(b, tex, options);
+      if ((1u << texture_index) & options->lower_xyuv_external) {
+         lower_xyuv_external(b, tex, options, texture_index);
         progress = true;
      }

      if ((1 << tex->texture_index) & options->lower_yuv_external) {
-         lower_yuv_external(b, tex, options);
+         lower_yuv_external(b, tex, options, texture_index);
         progress = true;
      }

@@ -1097,7 +1119,7 @@ nir_lower_tex_block(nir_block *block, nir_builder *b,
         progress = true;
      }

-      if (((1 << tex->texture_index) & options->swizzle_result) &&
+      if (((1u << texture_index) & options->swizzle_result) &&
          !nir_tex_instr_is_query(tex) &&
          !(tex->is_shadow && tex->is_new_style_shadow)) {
         swizzle_result(b, tex, options->swizzles[tex->texture_index]);
@@ -1105,7 +1127,7 @@ nir_lower_tex_block(nir_block *block, nir_builder *b,
      }

      /* should be after swizzle so we know which channels are rgb: */
-      if (((1 << tex->texture_index) & options->lower_srgb) &&
+      if (((1u << texture_index) & options->lower_srgb) &&
          !nir_tex_instr_is_query(tex) && !tex->is_shadow) {
         linearize_srgb_result(b, tex);
         progress = true;
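
The switch from `1 << tex->texture_index` to `1u << texture_index` in this file matters once an index reaches 31, because shifting a signed `int` 1 into the sign bit is undefined behaviour in C. A minimal sketch of the unsigned mask test, assuming 32-bit option masks; `texture_bit_set` is a hypothetical helper, not part of NIR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Test whether bit `texture_index` is set in a 32-bit option mask.
 * Using 1u keeps the shift well defined for indices 0..31; larger
 * indices are rejected instead of shifting out of range. */
static bool texture_bit_set(uint32_t mask, unsigned texture_index)
{
   return texture_index < 32u && (mask & (1u << texture_index)) != 0u;
}
```
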
--- a/src/egl/drivers/dri2/platform_drm.c
+++ b/src/egl/drivers/dri2/platform_drm.c
@@ -718,7 +718,7 @@ dri2_initialize_drm(_EGLDisplay *disp)
      goto cleanup;
   }

-   dev = _eglAddDevice(dri2_dpy->fd, disp->Options.ForceSoftware);
+   dev = _eglAddDevice(dri2_dpy->fd, dri2_dpy->gbm_dri->software);
   if (!dev) {
      err = "DRI2: failed to find EGLDevice";
      goto cleanup;
--- a/src/egl/main/egldevice.c
+++ b/src/egl/main/egldevice.c
@@ -109,9 +109,9 @@ static int
 _eglAddDRMDevice(drmDevicePtr device, _EGLDevice **out_dev)
 {
   _EGLDevice *dev;
-   const int wanted_nodes = 1 << DRM_NODE_RENDER | 1 << DRM_NODE_PRIMARY;

-   if ((device->available_nodes & wanted_nodes) != wanted_nodes)
+   if ((device->available_nodes & (1 << DRM_NODE_PRIMARY |
+                                   1 << DRM_NODE_RENDER)) == 0)
      return -1;

   dev = _eglGlobal.DeviceList;
@@ -274,6 +274,9 @@ _eglRefreshDeviceList(void)

   num_devs = drmGetDevices2(0, devices, ARRAY_SIZE(devices));
   for (int i = 0; i < num_devs; i++) {
+      if (!(devices[i]->available_nodes & (1 << DRM_NODE_RENDER)))
+         continue;
+
      ret = _eglAddDRMDevice(devices[i], NULL);

      /* Device is not added - error or already present */
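
Two node checks change in egldevice.c: `_eglAddDRMDevice` now accepts a device that exposes a primary or a render node instead of requiring both, while `_eglRefreshDeviceList` skips devices without a render node. A minimal sketch contrasting the predicates follows. The `NODE_*` constants are local stand-ins assumed to mirror libdrm's `DRM_NODE_PRIMARY` and `DRM_NODE_RENDER`, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Local stand-ins for libdrm's node indices (assumed values). */
enum { NODE_PRIMARY = 0, NODE_RENDER = 2 };

/* Old _eglAddDRMDevice test: both nodes had to be present. */
static bool old_add_accepts(int available_nodes)
{
   const int wanted = 1 << NODE_RENDER | 1 << NODE_PRIMARY;
   return (available_nodes & wanted) == wanted;
}

/* New _eglAddDRMDevice test: either node is enough. */
static bool new_add_accepts(int available_nodes)
{
   return (available_nodes & (1 << NODE_PRIMARY | 1 << NODE_RENDER)) != 0;
}

/* New filter in _eglRefreshDeviceList: only render-capable devices pass. */
static bool refresh_lists(int available_nodes)
{
   return (available_nodes & (1 << NODE_RENDER)) != 0;
}
```

Under these predicates, a primary-only device passes the add check but is skipped during enumeration.
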
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
@@ -169,7 +169,7 @@ struct lp_static_texture_state
   unsigned swizzle_a:3;

   /* pipe_texture's state */
-   enum pipe_texture_target target:4;        /**< PIPE_TEXTURE_* */
+   enum pipe_texture_target target:5;        /**< PIPE_TEXTURE_* */
   unsigned pot_width:1;     /**< is the width a power of two? */
   unsigned pot_height:1;
   unsigned pot_depth:1;
--- a/src/gallium/auxiliary/target-helpers/drm_helper.h
+++ b/src/gallium/auxiliary/target-helpers/drm_helper.h
@@ -60,6 +60,15 @@ const struct drm_driver_descriptor descriptor_name = {         \

 #endif

+#ifdef GALLIUM_KMSRO_ONLY
+#undef GALLIUM_V3D
+#undef GALLIUM_VC4
+#undef GALLIUM_FREEDRENO
+#undef GALLIUM_ETNAVIV
+#undef GALLIUM_PANFROST
+#undef GALLIUM_LIMA
+#endif
+
 #ifdef GALLIUM_I915
 #include "i915/drm/i915_drm_public.h"
 #include "i915/i915_public.h"
--- a/src/gallium/auxiliary/target-helpers/inline_sw_helper.h
+++ b/src/gallium/auxiliary/target-helpers/inline_sw_helper.h
@@ -81,9 +81,6 @@ sw_screen_create(struct sw_winsys *winsys)
   UNUSED bool only_sw = env_var_as_boolean("LIBGL_ALWAYS_SOFTWARE", false);
   const char *drivers[] = {
      debug_get_option("GALLIUM_DRIVER", ""),
-#if defined(GALLIUM_ZINK)
-      only_sw ? "" : "zink",
-#endif
 #if defined(GALLIUM_D3D12)
      only_sw ? "" : "d3d12",
 #endif
@@ -95,6 +92,9 @@ sw_screen_create(struct sw_winsys *winsys)
 #endif
 #if defined(GALLIUM_SWR)
      "swr",
+#endif
+#if defined(GALLIUM_ZINK)
+      only_sw ? "" : "zink",
 #endif
   };

--- a/src/gallium/auxiliary/target-helpers/sw_helper.h
+++ b/src/gallium/auxiliary/target-helpers/sw_helper.h
@@ -86,9 +86,6 @@ sw_screen_create(struct sw_winsys *winsys)
   UNUSED bool only_sw = env_var_as_boolean("LIBGL_ALWAYS_SOFTWARE", false);
   const char *drivers[] = {
      debug_get_option("GALLIUM_DRIVER", ""),
-#if defined(GALLIUM_ZINK)
-      only_sw ? "" : "zink",
-#endif
 #if defined(GALLIUM_D3D12)
      only_sw ? "" : "d3d12",
 #endif
@@ -100,6 +97,9 @@ sw_screen_create(struct sw_winsys *winsys)
 #endif
 #if defined(GALLIUM_SWR)
      "swr",
+#endif
+#if defined(GALLIUM_ZINK)
+      only_sw ? "" : "zink",
 #endif
   };

--- a/src/gallium/drivers/iris/iris_state.c
+++ b/src/gallium/drivers/iris/iris_state.c
@@ -1761,7 +1761,7 @@ iris_create_rasterizer_state(struct pipe_context *ctx,
      sf.SmoothPointEnable = (state->point_smooth || state->multisample) &&
                             !state->point_quad_rasterization;
      sf.PointWidthSource = state->point_size_per_vertex ? Vertex : State;
-      sf.PointWidth = state->point_size;
+      sf.PointWidth = CLAMP(state->point_size, 0.125f, 255.875f);

      if (state->flatshade_first) {
         sf.TriangleFanProvokingVertexSelect = 1;
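
The new `CLAMP` keeps `PointWidth` within [0.125, 255.875]. Those bounds are what an unsigned fixed-point field with 8 integer and 3 fractional bits would encode as its smallest non-zero and largest values; that field format is an inference from the constants, not something the diff states. A minimal sketch under that assumption, with hypothetical helper names:

```c
#include <stdint.h>

/* Clamp a point size to the range used in the diff. */
static float clamp_point_width(float size)
{
   if (size < 0.125f)
      return 0.125f;
   if (size > 255.875f)
      return 255.875f;
   return size;
}

/* Encode as unsigned fixed point with 3 fractional bits (assumed format). */
static uint32_t point_width_to_u8_3(float size)
{
   return (uint32_t)(clamp_point_width(size) * 8.0f);
}
```

With the clamp in place, the encoded value stays between 1 and 2047, so it fits in 11 bits under the assumed format.
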
--- a/src/gallium/drivers/panfrost/pan_cmdstream.c
+++ b/src/gallium/drivers/panfrost/pan_cmdstream.c
@@ -436,7 +436,8 @@ panfrost_prepare_midgard_fs_state(struct panfrost_context *ctx,
        } else {
                /* Reasons to disable early-Z from a shader perspective */
                bool late_z = fs->can_discard || fs->writes_global ||
-                              fs->writes_depth || fs->writes_stencil;
+                              fs->writes_depth || fs->writes_stencil ||
+                              (zsa->alpha_func != MALI_FUNC_ALWAYS);

                /* If either depth or stencil is enabled, discard matters */
                bool zs_enabled =
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -2210,7 +2210,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info
 		}
 		index_bias = info->index_bias;
 	} else {
-		index_bias = draws[0].start;
+		index_bias = indirect ? 0 : draws[0].start;
 	}

 	/* Set the index offset and primitive restart. */
--- a/src/gallium/drivers/radeonsi/si_get.c
+++ b/src/gallium/drivers/radeonsi/si_get.c
@@ -229,7 +229,9 @@ static int si_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
      return LLVM_VERSION_MAJOR < 9 && !sscreen->info.has_unaligned_shader_loads;

   case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE:
-      return sscreen->info.has_sparse_vm_mappings ? RADEON_SPARSE_PAGE_SIZE : 0;
+      /* Gfx8 (Polaris11) hangs, so don't enable this on Gfx8 and older chips. */
+      return sscreen->info.chip_class >= GFX9 &&
+             sscreen->info.has_sparse_vm_mappings ? RADEON_SPARSE_PAGE_SIZE : 0;

   case PIPE_CAP_UMA:
   case PIPE_CAP_PREFER_IMM_ARRAYS_AS_CONSTBUF:
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -140,8 +140,6 @@ void si_init_compiler(struct si_screen *sscreen, struct ac_llvm_compiler *compil

   enum ac_target_machine_options tm_options =
      (sscreen->debug_flags & DBG(GISEL) ? AC_TM_ENABLE_GLOBAL_ISEL : 0) |
-      (sscreen->info.chip_class <= GFX8 ? AC_TM_FORCE_DISABLE_XNACK :
-       sscreen->info.chip_class <= GFX10 ? AC_TM_FORCE_ENABLE_XNACK : 0) |
      (!sscreen->llvm_has_working_vgpr_indexing ? AC_TM_PROMOTE_ALLOCA_TO_SCRATCH : 0) |
      (sscreen->debug_flags & DBG(CHECK_IR) ? AC_TM_CHECK_IR : 0) |
      (create_low_opt_compiler ? AC_TM_CREATE_LOW_OPT : 0);
--- a/src/gallium/frontends/lavapipe/lvp_descriptor_set.c
+++ b/src/gallium/frontends/lavapipe/lvp_descriptor_set.c
@@ -567,11 +567,12 @@ void lvp_UpdateDescriptorSetWithTemplate(VkDevice _device,
      struct lvp_descriptor *desc =
         &set->descriptors[bind_layout->descriptor_index];
      for (j = 0; j < entry->descriptorCount; ++j) {
+         unsigned idx = j + entry->dstArrayElement;
         switch (entry->descriptorType) {
         case VK_DESCRIPTOR_TYPE_SAMPLER: {
            LVP_FROM_HANDLE(lvp_sampler, sampler,
                            *(VkSampler *)pSrc);
-            desc[j] = (struct lvp_descriptor) {
+            desc[idx] = (struct lvp_descriptor) {
               .type = VK_DESCRIPTOR_TYPE_SAMPLER,
               .info.sampler = sampler,
            };
@@ -579,7 +580,7 @@ void lvp_UpdateDescriptorSetWithTemplate(VkDevice _device,
         }
         case VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER: {
            VkDescriptorImageInfo *info = (VkDescriptorImageInfo *)pSrc;
-            desc[j] = (struct lvp_descriptor) {
+            desc[idx] = (struct lvp_descriptor) {
               .type = entry->descriptorType,
               .info.iview = lvp_image_view_from_handle(info->imageView),
               .info.sampler = lvp_sampler_from_handle(info->sampler),
@@ -591,7 +592,7 @@ void lvp_UpdateDescriptorSetWithTemplate(VkDevice _device,
         case VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT: {
            LVP_FROM_HANDLE(lvp_image_view, iview,
                            ((VkDescriptorImageInfo *)pSrc)->imageView);
-            desc[j] = (struct lvp_descriptor) {
+            desc[idx] = (struct lvp_descriptor) {
               .type = entry->descriptorType,
               .info.iview = iview,
            };
@@ -601,7 +602,7 @@ void lvp_UpdateDescriptorSetWithTemplate(VkDevice _device,
         case VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER: {
            LVP_FROM_HANDLE(lvp_buffer_view, bview,
                            *(VkBufferView *)pSrc);
-            desc[j] = (struct lvp_descriptor) {
+            desc[idx] = (struct lvp_descriptor) {
               .type = entry->descriptorType,
               .info.buffer_view = bview,
            };
@@ -613,7 +614,7 @@ void lvp_UpdateDescriptorSetWithTemplate(VkDevice _device,
         case VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC:
         case VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC: {
            VkDescriptorBufferInfo *info = (VkDescriptorBufferInfo *)pSrc;
-            desc[j] = (struct lvp_descriptor) {
+            desc[idx] = (struct lvp_descriptor) {
               .type = entry->descriptorType,
               .info.offset = info->offset,
               .info.buffer = lvp_buffer_from_handle(info->buffer),
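
The hunks above make the template update start writing at `dstArrayElement` instead of always at slot 0. A minimal sketch of that indexing follows. The `toy_descriptor` and `toy_entry` types and `toy_update_binding` are hypothetical stand-ins for illustration, not lavapipe structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields needed to show the indexing. */
struct toy_descriptor {
   int type;
   int payload;
};

struct toy_entry {
   int descriptor_type;
   unsigned dst_array_element;   /* first array slot to update */
   unsigned descriptor_count;    /* number of consecutive slots */
};

/* Write descriptor_count descriptors starting at dst_array_element,
 * mirroring `idx = j + entry->dstArrayElement` in the hunk. */
void
toy_update_binding(struct toy_descriptor *desc, const struct toy_entry *entry,
                   const int *src)
{
   assert(desc != NULL && entry != NULL && src != NULL);
   for (unsigned j = 0; j < entry->descriptor_count; ++j) {
      unsigned idx = j + entry->dst_array_element;
      desc[idx].type = entry->descriptor_type;
      desc[idx].payload = src[j];
   }
}
```

With `dst_array_element = 2` and `descriptor_count = 2`, only slots 2 and 3 are written. Indexing with `j` alone would have overwritten slots 0 and 1.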
--- a/src/gallium/frontends/lavapipe/lvp_execute.c
+++ b/src/gallium/frontends/lavapipe/lvp_execute.c
@@ -1722,16 +1722,24 @@ static void handle_copy_image(struct lvp_cmd_buffer_entry *cmd,
      struct pipe_box src_box;
      src_box.x = copycmd->regions[i].srcOffset.x;
      src_box.y = copycmd->regions[i].srcOffset.y;
-      src_box.z = copycmd->regions[i].srcOffset.z + copycmd->regions[i].srcSubresource.baseArrayLayer;
      src_box.width = copycmd->regions[i].extent.width;
      src_box.height = copycmd->regions[i].extent.height;
-      src_box.depth = copycmd->regions[i].extent.depth;
+      if (copycmd->src->bo->target == PIPE_TEXTURE_3D) {
+         src_box.depth = copycmd->regions[i].extent.depth;
+         src_box.z = copycmd->regions[i].srcOffset.z;
+      } else {
+         src_box.depth = copycmd->regions[i].srcSubresource.layerCount;
+         src_box.z = copycmd->regions[i].srcSubresource.baseArrayLayer;
+      }

+      unsigned dstz = copycmd->dst->bo->target == PIPE_TEXTURE_3D ?
+                      copycmd->regions[i].dstOffset.z :
+                      copycmd->regions[i].dstSubresource.baseArrayLayer;
      state->pctx->resource_copy_region(state->pctx, copycmd->dst->bo,
                                        copycmd->regions[i].dstSubresource.mipLevel,
                                        copycmd->regions[i].dstOffset.x,
                                        copycmd->regions[i].dstOffset.y,
-                                        copycmd->regions[i].dstOffset.z + copycmd->regions[i].dstSubresource.baseArrayLayer,
+                                        dstz,
                                        copycmd->src->bo,
                                        copycmd->regions[i].srcSubresource.mipLevel,
                                        &src_box);
@@ -2096,7 +2104,7 @@ static void handle_copy_query_pool_results(struct lvp_cmd_buffer_entry *cmd,
   struct lvp_query_pool *pool = copycmd->pool;

   for (unsigned i = copycmd->first_query; i < copycmd->first_query + copycmd->query_count; i++) {
-      unsigned offset = copycmd->dst->offset + (copycmd->stride * (i - copycmd->first_query));
+      unsigned offset = copycmd->dst_offset + copycmd->dst->offset + (copycmd->stride * (i - copycmd->first_query));
      if (pool->queries[i]) {
         if (copycmd->flags & VK_QUERY_RESULT_WITH_AVAILABILITY_BIT)
            state->pctx->get_query_result_resource(state->pctx,
@@ -2106,21 +2114,35 @@ static void handle_copy_query_pool_results(struct lvp_cmd_buffer_entry *cmd,
                                                   -1,
                                                   copycmd->dst->bo,
                                                   offset + (copycmd->flags & VK_QUERY_RESULT_64_BIT ? 8 : 4));
-         state->pctx->get_query_result_resource(state->pctx,
-                                                pool->queries[i],
-                                                copycmd->flags & VK_QUERY_RESULT_WAIT_BIT,
-                                                copycmd->flags & VK_QUERY_RESULT_64_BIT ? PIPE_QUERY_TYPE_U64 : PIPE_QUERY_TYPE_U32,
-                                                0,
-                                                copycmd->dst->bo,
-                                                offset);
+         if (pool->type == VK_QUERY_TYPE_PIPELINE_STATISTICS) {
+            unsigned num_results = 0;
+            unsigned result_size = copycmd->flags & VK_QUERY_RESULT_64_BIT ? 8 : 4;
+            u_foreach_bit(bit, pool->pipeline_stats)
+               state->pctx->get_query_result_resource(state->pctx,
+                                                      pool->queries[i],
+                                                      copycmd->flags & VK_QUERY_RESULT_WAIT_BIT,
+                                                      copycmd->flags & VK_QUERY_RESULT_64_BIT ? PIPE_QUERY_TYPE_U64 : PIPE_QUERY_TYPE_U32,
+                                                      bit,
+                                                      copycmd->dst->bo,
+                                                      offset + num_results++ * result_size);
+         } else {
+            state->pctx->get_query_result_resource(state->pctx,
+                                                   pool->queries[i],
+                                                   copycmd->flags & VK_QUERY_RESULT_WAIT_BIT,
+                                                   copycmd->flags & VK_QUERY_RESULT_64_BIT ? PIPE_QUERY_TYPE_U64 : PIPE_QUERY_TYPE_U32,
+                                                   0,
+                                                   copycmd->dst->bo,
+                                                   offset);
+         }
      } else {
         /* if no queries emitted yet, just reset the buffer to 0 so avail is reported correctly */
         if (copycmd->flags & VK_QUERY_RESULT_WITH_AVAILABILITY_BIT) {
            struct pipe_transfer *src_t;
            uint32_t *map;

-            struct pipe_box box = {};
-            box.width = copycmd->stride * copycmd->query_count;
+            struct pipe_box box = {0};
+            box.x = offset;
+            box.width = copycmd->stride;
            box.height = 1;
            box.depth = 1;
            map = state->pctx->transfer_map(state->pctx,
--- a/src/gallium/frontends/va/image.c
+++ b/src/gallium/frontends/va/image.c
@@ -696,6 +696,7 @@ vlVaPutImage(VADriverContextP ctx, VASurfaceID surface, VAImageID image,
         }
      }
   }
+   drv->pipe->flush(drv->pipe, NULL, 0);
   mtx_unlock(&drv->mutex);

   return VA_STATUS_SUCCESS;
--- a/src/gallium/targets/pipe-loader/pipe_kmsro.c
+++ b/src/gallium/targets/pipe-loader/pipe_kmsro.c
@@ -2,3 +2,5 @@
 #include "target-helpers/inline_debug_helper.h"
 #include "frontend/drm_driver.h"
 #include "kmsro/drm/kmsro_drm_public.h"
+#define GALLIUM_KMSRO_ONLY
+#include "target-helpers/drm_helper.h"
--- a/src/gbm/backends/dri/gbm_dri.c
+++ b/src/gbm/backends/dri/gbm_dri.c
@@ -486,10 +486,13 @@ dri_screen_create_sw(struct gbm_dri_device *dri)
      return -errno;

   ret = dri_screen_create_dri2(dri, driver_name);
-   if (ret == 0)
+   if (ret != 0)
+      ret = dri_screen_create_swrast(dri);
+   if (ret != 0)
      return ret;

-   return dri_screen_create_swrast(dri);
+   dri->software = true;
+   return 0;
 }

 static const struct gbm_dri_visual gbm_dri_visuals_table[] = {
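
The control flow in the `dri_screen_create_sw` hunk above can be sketched in isolation: try the first loader, fall back to the second only on failure, and record the software flag on either success path. The names below (`toy_device`, `toy_create_fn`, `toy_screen_create_sw`, `toy_ok`, `toy_fail`) are hypothetical and only model the shape of the change.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_device {
   bool software;   /* set once a driver was loaded through this path */
};

typedef int (*toy_create_fn)(struct toy_device *dev);

/* Try `primary`; fall back to `fallback` only if it fails. The flag is set
 * whichever of the two succeeded, and left untouched if both fail. */
int
toy_screen_create_sw(struct toy_device *dev, toy_create_fn primary,
                     toy_create_fn fallback)
{
   assert(dev && primary && fallback);

   int ret = primary(dev);
   if (ret != 0)
      ret = fallback(dev);
   if (ret != 0)
      return ret;

   dev->software = true;
   return 0;
}

/* Trivial loaders used to exercise both paths. */
int toy_ok(struct toy_device *dev)   { (void)dev; return 0; }
int toy_fail(struct toy_device *dev) { (void)dev; return -1; }
```

Written this way there is a single exit for success, so the flag cannot be skipped by an early return.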
--- a/src/gbm/backends/dri/gbm_driint.h
+++ b/src/gbm/backends/dri/gbm_driint.h
@@ -63,6 +63,7 @@ struct gbm_dri_device {

   void *driver;
   char *driver_name; /* Name of the DRI module, without the _dri suffix */
+   bool software; /* A software driver was loaded */

   __DRIscreen *screen;
   __DRIcontext *context;
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -367,7 +367,8 @@ is_logic_op(enum opcode opcode)
 }

 static bool
-can_take_stride(fs_inst *inst, unsigned arg, unsigned stride,
+can_take_stride(fs_inst *inst, brw_reg_type dst_type,
+                unsigned arg, unsigned stride,
                const gen_device_info *devinfo)
 {
   if (stride > 4)
@@ -377,9 +378,9 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned stride,
    * of the corresponding channel of the destination, and the provided stride
    * would break this restriction.
    */
-   if (has_dst_aligned_region_restriction(devinfo, inst) &&
+   if (has_dst_aligned_region_restriction(devinfo, inst, dst_type) &&
       !(type_sz(inst->src[arg].type) * stride ==
-           type_sz(inst->dst.type) * inst->dst.stride ||
+           type_sz(dst_type) * inst->dst.stride ||
         stride == 0))
      return false;

@@ -528,10 +529,15 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
   if (instruction_requires_packed_data(inst) && entry_stride != 1)
      return false;

+   const brw_reg_type dst_type = (has_source_modifiers &&
+                                  entry->dst.type != inst->src[arg].type) ?
+      entry->dst.type : inst->dst.type;
+
   /* Bail if the result of composing both strides would exceed the
    * hardware limit.
    */
-   if (!can_take_stride(inst, arg, entry_stride * inst->src[arg].stride,
+   if (!can_take_stride(inst, dst_type, arg,
+                        entry_stride * inst->src[arg].stride,
                        devinfo))
      return false;

--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -549,7 +549,8 @@ is_unordered(const fs_inst *inst)
 */
 static inline bool
 has_dst_aligned_region_restriction(const gen_device_info *devinfo,
-                                   const fs_inst *inst)
+                                   const fs_inst *inst,
+                                   brw_reg_type dst_type)
 {
   const brw_reg_type exec_type = get_exec_type(inst);
   /* Even though the hardware spec claims that "integer DWord multiply"
@@ -563,13 +564,20 @@ has_dst_aligned_region_restriction(const gen_device_info *devinfo,
       (inst->opcode == BRW_OPCODE_MAD &&
        MIN2(type_sz(inst->src[1].type), type_sz(inst->src[2].type)) >= 4));

-   if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
+   if (type_sz(dst_type) > 4 || type_sz(exec_type) > 4 ||
       (type_sz(exec_type) == 4 && is_dword_multiply))
      return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
   else
      return false;
 }

+static inline bool
+has_dst_aligned_region_restriction(const gen_device_info *devinfo,
+                                   const fs_inst *inst)
+{
+   return has_dst_aligned_region_restriction(devinfo, inst, inst->dst.type);
+}
+
 /**
 * Return whether the LOAD_PAYLOAD instruction is a plain copy of bits from
 * the specified register file into a VGRF.
--- a/src/intel/tools/tests/run-test.py
+++ b/src/intel/tools/tests/run-test.py
@@ -7,7 +7,6 @@ import os
 import pathlib
 import subprocess
 import sys
-import tempfile

 # The meson version handles windows paths better, but if it's not available
 # fall back to shlex
@@ -37,18 +36,17 @@ success = True
 for asm_file in args.gen_folder.glob('*.asm'):
    expected_file = asm_file.stem + '.expected'
    expected_path = args.gen_folder / expected_file
-    out_path = tempfile.NamedTemporaryFile()

    try:
        command = i965_asm + [
            '--type', 'hex',
            '--gen', args.gen_name,
-            '--output', out_path.name,
            asm_file
        ]
-        subprocess.run(command,
-                       stdout=subprocess.DEVNULL,
-                       stderr=subprocess.STDOUT)
+        with subprocess.Popen(command,
+                              stdout=subprocess.PIPE,
+                              stderr=subprocess.DEVNULL) as cmd:
+            lines_after = [line.decode('ascii') for line in cmd.stdout.readlines()]
    except OSError as e:
        if e.errno == errno.ENOEXEC:
            print('Skipping due to inability to run host binaries.',
@@ -58,7 +56,6 @@ for asm_file in args.gen_folder.glob('*.asm'):

    with expected_path.open() as f:
        lines_before = f.readlines()
-    lines_after = [line.decode('ascii') for line in out_path]

    diff = ''.join(difflib.unified_diff(lines_before, lines_after,
                                        expected_file, asm_file.stem + '.out'))
--- a/src/mesa/state_tracker/st_nir_lower_tex_src_plane.c
+++ b/src/mesa/state_tracker/st_nir_lower_tex_src_plane.c
@@ -139,7 +139,7 @@ lower_tex_src_plane_block(nir_builder *b, lower_tex_src_state *state, nir_block
         if (tex_index >= 0 && samp_index >= 0) {
            b->cursor = nir_before_instr(&tex->instr);

-            nir_variable* samp = find_sampler(state, plane[0].i32);
+            nir_variable* samp = find_sampler(state, tex->sampler_index);
            assert(samp);

            nir_deref_instr *tex_deref_instr = nir_build_deref_var(b, samp);
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -1321,7 +1321,7 @@ st_create_fp_variant(struct st_context *st,
                   key->external.lower_yuv)) {
         NIR_PASS_V(state.ir.nir, st_nir_lower_tex_src_plane,
                    ~stfp->Base.SamplersUsed,
-                    key->external.lower_nv12 || key->external.lower_xy_uxvx ||
+                    key->external.lower_nv12 | key->external.lower_xy_uxvx |
                       key->external.lower_yx_xuxv,
                    key->external.lower_iyuv);
         finalize = true;
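
The one-character change above matters because the callee takes a bitmask. A small sketch of the difference, assuming each operand is a per-sampler mask (bit N set means sampler N needs the lowering); the function names and the example values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Logical OR collapses the operands to 0 or 1, losing the per-sampler bits. */
uint32_t
combine_logical(uint32_t a, uint32_t b, uint32_t c)
{
   return a || b || c;
}

/* Bitwise OR yields the union of the masks, keeping the per-sampler bits. */
uint32_t
combine_bitwise(uint32_t a, uint32_t b, uint32_t c)
{
   return a | b | c;
}

/* Example values: samplers 2 and 1 need lowering, via different formats. */
void
demo_mask_vs_bool(void)
{
   uint32_t nv12 = 0x4, xy_uxvx = 0x0, yx_xuxv = 0x2;

   assert(combine_logical(nv12, xy_uxvx, yx_xuxv) == 0x1);
   assert(combine_bitwise(nv12, xy_uxvx, yx_xuxv) == 0x6);
}
```

With `||`, the result `0x1` would name sampler 0, which is not one of the samplers that were flagged.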
--- a/src/util/bitscan.h
+++ b/src/util/bitscan.h
@@ -104,6 +104,11 @@ u_bit_scan(unsigned *mask)
   return i;
 }

+#define u_foreach_bit(b, dword)                          \
+   for (uint32_t __dword = (dword), b;                     \
+        ((b) = ffs(__dword) - 1, __dword);      \
+        __dword &= ~(1 << (b)))
+
 static inline int
 u_bit_scan64(uint64_t *mask)
 {
@@ -112,6 +117,11 @@ u_bit_scan64(uint64_t *mask)
   return i;
 }

+#define u_foreach_bit64(b, dword)                          \
+   for (uint64_t __dword = (dword), b;                     \
+        ((b) = ffsll(__dword) - 1, __dword);      \
+        __dword &= ~(1ull << (b)))
+
 /* Determine if an unsigned value is a power of two.
 *
 * \note
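
A usage sketch for the iteration macros added above. The macro is re-declared here under a hypothetical name (`toy_foreach_bit`) so the block compiles on its own; it keeps the same shape but uses `1u` for the clearing shift. `toy_collect_bits` is likewise an illustrative helper, not part of `bitscan.h`.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>   /* ffs() (POSIX) */

/* Visit the index of every set bit, lowest first. When the working copy
 * reaches zero, ffs() returns 0 and the loop condition ends the iteration. */
#define toy_foreach_bit(b, dword)                       \
   for (uint32_t dword_ = (dword), b;                   \
        ((b) = ffs((int)dword_) - 1, dword_);           \
        dword_ &= ~(1u << (b)))

/* Store the index of each set bit of `mask` in `out` and return the count.
 * `out` must have room for 32 entries. */
unsigned
toy_collect_bits(uint32_t mask, unsigned *out)
{
   assert(out);

   unsigned n = 0;
   toy_foreach_bit(bit, mask)
      out[n++] = bit;
   return n;
}
```

This is the pattern the pipeline-statistics path in `lvp_execute.c` relies on: each set bit gets the next consecutive result slot, so the slot index is the running count rather than the bit position.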
--- a/src/util/set.c
+++ b/src/util/set.c
@@ -165,7 +165,7 @@ key_u32_equals(const void *a, const void *b)
 struct set *
 _mesa_set_create_u32_keys(void *mem_ctx)
 {
-   return _mesa_set_create(NULL, key_u32_hash, key_u32_equals);
+   return _mesa_set_create(mem_ctx, key_u32_hash, key_u32_equals);
 }

 struct set *
--- a/src/util/u_cpu_detect.c
+++ b/src/util/u_cpu_detect.c
@@ -444,20 +444,14 @@ get_cpu_topology(void)
       util_cpu_caps.family < CPU_AMD_LAST) {
      uint32_t regs[4];

-      /* Query the L3 cache count. */
-      cpuid_count(0x8000001D, 3, regs);
-      unsigned cache_level = (regs[0] >> 5) & 0x7;
-      unsigned cores_per_L3 = ((regs[0] >> 14) & 0xfff) + 1;
-
-      if (cache_level != 3 || cores_per_L3 == util_cpu_caps.nr_cpus)
-         return;
-
      uint32_t saved_mask[UTIL_MAX_CPUS / 32] = {0};
      uint32_t mask[UTIL_MAX_CPUS / 32] = {0};
-      uint32_t allowed_mask[UTIL_MAX_CPUS / 32] = {0};
-      uint32_t apic_id[UTIL_MAX_CPUS];
      bool saved = false;

+      uint32_t L3_found[UTIL_MAX_CPUS] = {0};
+      uint32_t num_L3_caches = 0;
+      util_affinity_mask *L3_affinity_masks = NULL;
+
      /* Query APIC IDs from each CPU core.
       *
       * An APIC ID is a logical ID of the CPU with respect to the cache
@@ -484,39 +478,58 @@ get_cpu_topology(void)
                                              !saved ? saved_mask : NULL,
                                              util_cpu_caps.num_cpu_mask_bits)) {
            saved = true;
-            allowed_mask[i / 32] |= cpu_bit;

            /* Query the APIC ID of the current core. */
            cpuid(0x00000001, regs);
-            apic_id[i] = regs[1] >> 24;
+            unsigned apic_id = regs[1] >> 24;
+
+            /* Query the total core count for the CPU */
+            uint32_t core_count = 1;
+            if (regs[3] & (1 << 28))
+               core_count = (regs[1] >> 16) & 0xff;
+
+            core_count = util_next_power_of_two(core_count);
+
+            /* Query the L3 cache count. */
+            cpuid_count(0x8000001D, 3, regs);
+            unsigned cache_level = (regs[0] >> 5) & 0x7;
+            unsigned cores_per_L3 = ((regs[0] >> 14) & 0xfff) + 1;
+
+            if (cache_level != 3)
+               continue;
+
+            unsigned local_core_id = apic_id & (core_count - 1);
+            unsigned phys_id = (apic_id & ~(core_count - 1)) >> util_logbase2(core_count);
+            unsigned local_l3_cache_index = local_core_id / util_next_power_of_two(cores_per_L3);
+#define L3_ID(p, i) (p << 16 | i << 1 | 1);
+
+            unsigned l3_id = L3_ID(phys_id, local_l3_cache_index);
+            int idx = -1;
+            for (unsigned c = 0; c < num_L3_caches; c++) {
+               if (L3_found[c] == l3_id) {
+                  idx = c;
+                  break;
+               }
+            }
+            if (idx == -1) {
+               idx = num_L3_caches;
+               L3_found[num_L3_caches++] = l3_id;
+               L3_affinity_masks = realloc(L3_affinity_masks, sizeof(util_affinity_mask) * num_L3_caches);
+               if (!L3_affinity_masks)
+                  return;
+               memset(&L3_affinity_masks[num_L3_caches - 1], 0, sizeof(util_affinity_mask));
+            }
+            util_cpu_caps.cpu_to_L3[i] = idx;
+            L3_affinity_masks[idx][i / 32] |= cpu_bit;
+
         }
         mask[i / 32] = 0;
      }

+      util_cpu_caps.num_L3_caches = num_L3_caches;
+      util_cpu_caps.L3_affinity_mask = L3_affinity_masks;
+
      if (saved) {
-
-         /* We succeeded in using at least one CPU. */
-         util_cpu_caps.num_L3_caches = util_cpu_caps.nr_cpus / cores_per_L3;
-         util_cpu_caps.cores_per_L3 = cores_per_L3;
-         util_cpu_caps.L3_affinity_mask = calloc(sizeof(util_affinity_mask),
-                                                 util_cpu_caps.num_L3_caches);
-
-         for (unsigned i = 0; i < util_cpu_caps.nr_cpus && i < UTIL_MAX_CPUS;
-              i++) {
-            uint32_t cpu_bit = 1u << (i % 32);
-
-            if (allowed_mask[i / 32] & cpu_bit) {
-               /* Each APIC ID bit represents a topology level, so we need
-                * to round up to the next power of two.
-                */
-               unsigned L3_index = apic_id[i] /
-                                   util_next_power_of_two(cores_per_L3);
-
-               util_cpu_caps.L3_affinity_mask[L3_index][i / 32] |= cpu_bit;
-               util_cpu_caps.cpu_to_L3[i] = L3_index;
-            }
-         }
-
         if (debug_get_option_dump_cpu()) {
            fprintf(stderr, "CPU <-> L3 cache mapping:\n");
            for (unsigned i = 0; i < util_cpu_caps.num_L3_caches; i++) {
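
The APIC ID arithmetic in the `get_cpu_topology` hunk above can be checked on its own. The sketch below is written as plain functions with hypothetical names (`toy_next_pow2`, `toy_log2`, `toy_l3_id`); the two helpers stand in for `util_next_power_of_two` and `util_logbase2` and are assumed to behave the same way for the small inputs used here.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= x, for 1 <= x <= 2^31. */
uint32_t
toy_next_pow2(uint32_t x)
{
   uint32_t p = 1;
   while (p < x)
      p <<= 1;
   return p;
}

/* floor(log2(x)) for x >= 1. */
unsigned
toy_log2(uint32_t x)
{
   unsigned n = 0;
   while (x >>= 1)
      n++;
   return n;
}

/* Split an APIC ID into a package ID (high bits) and a core ID within the
 * package (low bits), derive the L3 index within the package, and pack both
 * into one key using the layout from the hunk: p << 16 | i << 1 | 1. */
uint32_t
toy_l3_id(uint32_t apic_id, uint32_t core_count, uint32_t cores_per_l3)
{
   assert(core_count >= 1 && cores_per_l3 >= 1);

   uint32_t cores = toy_next_pow2(core_count);
   uint32_t local_core_id = apic_id & (cores - 1);
   uint32_t phys_id = (apic_id & ~(cores - 1)) >> toy_log2(cores);
   uint32_t local_l3 = local_core_id / toy_next_pow2(cores_per_l3);

   return (phys_id << 16) | (local_l3 << 1) | 1;
}
```

For example, with 16 cores per package and 8 cores per L3, APIC ID 12 maps to package 0, L3 index 1, while APIC ID 21 maps to package 1, L3 index 0. The low bit is always set, so a valid key is never zero.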
Author	SHA1	Message	Date
Dylan Baker	7419e553db	VERSION: bump for 21.0.2 release	2021-04-07 09:35:30 -07:00
Dylan Baker	ebe8cfc3ec	docs: add release notes for 21.0.2	2021-04-07 09:35:07 -07:00
Boyuan Zhang	759ce9f053	frontend/va/image: add pipe flush for vlVaPutImage Fix a synchronization issue between the multimedia queue and the gfx queue. Adding a flush call makes the multimedia queue wait for the contents of the gfx command buffer to be executed when there is a dependency between the two queues. Fixes: `2f50dea218` ("radeonsi: always use a staging texture for linear 1D textures in VRAM") Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit `27209e63ea`) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9995>	2021-04-06 18:55:42 +00:00
Dave Airlie	a3a2783237	drisw: move zink down the list below the sw drivers. We don't ever want drisw path picking zink as the driver, we can revisit this when the penny wrapper work gets further along. This selection causes systems with nvidia/intel dual-gpus to try and pick the intel gpu for rendering in the nvidia context if there is no nvidia GL driver or accel doesn't work. This is a partial revert of the original commit. Fixes: `4a3b42a717` ("drisw: Prefer hardware-layered sw-winsys drivers over pure sw") Acked-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9816> (cherry picked from commit `3e1698fe1b`)	2021-04-06 09:41:56 -07:00
Bas Nieuwenhuizen	b96b1db389	radv: Flush caches for shader read operations. As part of the fmask expand we very much read from the images as well ... Fixes: `8f8d72af55` ("radv: Use access helpers for flushing with meta operations.") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10042> (cherry picked from commit `57511d1458`)	2021-04-06 09:41:56 -07:00
Pierre-Eric Pelloux-Prayer	882d47fae4	mesa/st: fix st_nir_lower_tex_src_plane arguments st_nir_lower_tex_src_plane expects a mask, not a boolean. CC: mesa-stable Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9931> (cherry picked from commit `72c54713aa`)	2021-04-06 09:41:56 -07:00
Pierre-Eric Pelloux-Prayer	1665f478ac	nir/lower_tex: ignore texture_index if tex_instr has deref src texture_index is meaningless when a tex_instr has deref src. Use var->data.binding instead. This fixes the incorrect lowering on radeonsi where the same lowering steps were applied to all tex_instr based on the needs of the first one (since texture_index is always 0). CC: mesa-stable Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9931> (cherry picked from commit `bc438c91d9`)	2021-04-06 09:41:55 -07:00
Adrian Ratiu	6a0f0a34fe	docs: docker: minor stale documentation fix Commits like the following changed the script names and distro tag but didn't update the documentation. We do not explicitly mention script names because they will likely change in the future but the distro tag is less likely to change because it is shared with the upstream ci-templates repo. Fixes: `af7dca3560` ("ci: Update the ci-templates commit.") Fixes: `506e9d5fc7` ("gitlab-ci: Rename container install scripts to ...") Fixes: `c6c7652753` ("gitlab-ci: Organize images using new REPO_SUFFIX ...") Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9781> (cherry picked from commit `8371b75241`)	2021-04-06 09:41:55 -07:00
Marek Olšák	11585bb003	radeonsi: disable sparse buffers on gfx7-8 Cc: 20.3 21.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9795> (cherry picked from commit `8ea685dfc0`)	2021-04-06 09:41:55 -07:00
Marek Olšák	816fd2cf5f	ac/llvm: don't set unsupported xnack options to fix LLVM crashes on gfx6-8 LLVM prints an error if xnack is unsupported and it uses a global stream object that is not thread-safe. Since Mesa uses multiple threads to compile shaders, there is a small chance that it will crash. Just don't set any xnack options to use LLVM defaults. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4439 Cc: 20.3 21.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9795> (cherry picked from commit `ac78b12e23`)	2021-04-06 09:41:55 -07:00
Dylan Baker	a1328ea781	.pick_status.json: Update to `1e0a69afa7`	2021-04-06 09:41:55 -07:00
Tapani Pälli	c5c7d6a05a	iris: clamp PointWidth in 3DSTATE_SF like i965 does Values match how MinimumPointWidth, MaximumPointWidth is setup. This fixes assert hit in debug build when packing the struct with too large value for genxml. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9942> (cherry picked from commit `b2af419391`)	2021-04-06 09:41:55 -07:00
Charmaine Lee	99a47874de	gallivm: increase size of texture target enum bitfield Need to bump up the size of texture target bitfield for MSVC. Fixes: `0ce7c4a7c9` ("gallivm: Use the proper enum for the texture target bitfield.") Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9928> (cherry picked from commit `a442e3ff55`)	2021-04-06 09:41:55 -07:00
Dylan Baker	ed60dec381	.pick_status.json: Update to `fb5615af40`	2021-04-06 09:41:55 -07:00
Erik Faye-Lund	d30cea2b9b	compiler/glsl: avoid null-pointer deref When we encounter a bindless image here, lower_deref returns a NULL-pointer, and calling record_images_used will try to dereference that NULL-pointer. So let's dig out the var from the source instruction instead of the result of the lowering. Fixes: `5910c938a2` ("nir/glsl: gather bitmask of images used by program") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9895> (cherry picked from commit `89a04a54c4`)	2021-03-30 11:06:52 -07:00
Icecream95	5bcbe14854	pipe-loader,gallium/drm: Fix the kmsro pipe_loader target Include drm_helper.h to define the driver descriptor again, but with a new define GALLIUM_KMSRO_ONLY to disable defining descriptors for the drivers that kmsro uses. Fixes clinfo on Panfrost. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4002 Fixes: `9ec28b8d22` ("gallium/drm: Deduplicate screen creation for the dynamic (clover) pipe loader.") Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9380> (cherry picked from commit `06a883cfe5`)	2021-03-30 09:43:06 -07:00
Lionel Landwerlin	da38b604e3	intel/fs/copy_prop: check stride constraints with actual final type In some cases we will change the type of the destination register of an instruction. This is the type we should use to verify that we're allowed to do the replacement. Otherwise we can hit restrictions on CHV and upcoming Xe-Hp for instance where the copy propagation transforms this: send(16) (mlen: 2) vgrf10:UD, 0u, 0u, vgrf35:D, null:UD mov(16) vgrf11:UW, vgrf10<2>:UW mov(16) vgrf12:UW, vgrf10+0.2<2>:UW mov(16) vgrf15:HF, \|vgrf11\|:HF mov(16) vgrf16:HF, \|vgrf12\|:HF mov(8) vgrf41<2>:UW, vgrf15+0.0:UW group0 mov(8) vgrf42<2>:UW, vgrf15+0.16:UW group8 mov(8) vgrf45<2>:UW, vgrf16+0.0:UW group0 mov(8) vgrf46<2>:UW, vgrf16+0.16:UW group8 into this: send(16) (mlen: 2) vgrf10:UD, 0u, 0u, vgrf35:D, null:UD mov(8) vgrf41<2>:HF, \|vgrf10+0.0\|<2>:HF group0 mov(8) vgrf42<2>:HF, \|vgrf10+1.0\|<2>:HF group8 mov(8) vgrf45<2>:HF, \|vgrf10+0.2\|<2>:HF group0 mov(8) vgrf46<2>:HF, \|vgrf10+1.2\|<2>:HF group8 Because of the floating point use, stride and offsets should be the same. v2: Fix final destination type selection (Curro) v3: constify (Curro) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9832> (cherry picked from commit `aa53665fda`)	2021-03-30 09:43:05 -07:00
Gert Wollny	540172fa43	r600: don't set an index_bias for indirect draw calls The indirect draw call already encodes the index bias so that no additional encoding in the hardware is needed in this case. This fixes a regression with a number of tests from dEQP-GLES31.functional.draw_indirect.random.* Fixes: `c6c532faa8` "gallium/u_vbuf: use updated pipe_draw_start_count while using draw_vbo" Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9877> (cherry picked from commit `acdf1a1234`)	2021-03-30 09:43:04 -07:00
Dylan Baker	ca86b94e55	.pick_status.json: Update to `3c64c090e0`	2021-03-30 09:42:48 -07:00
Dave Airlie	fe9e25b29a	util: rework AMD cpu L3 cache affinity code. This changes how the L3 cache affinity code works out the affinity masks. It works better with multi-CPU systems and should also be capable of handling big/little type situations if they appear in the future. It now iterates over all CPU cores, gets the core count for each CPU, and works out the L3_ID from the physical CPU ID and the current core's L3 cache. It then tracks how many L3 caches it has seen and reallocates the affinity masks for each one. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4496 Fixes: `d8ea509965` ("util: completely rewrite and do AMD Zen L3 cache pinning correctly") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9782> (cherry picked from commit `11d2db17c5`)	2021-03-29 10:08:35 -07:00
Mike Blumenkrantz	b6123cd4d5	lavapipe: fix array texture region copies. These need to use different struct members for copying array textures; the buffer2image variants are already doing the right thing. Fixes: `b38879f8c5` ("vallium: initial import of the vulkan frontend") Reviewed-by: Dave Airlie <airlied@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9761> (cherry picked from commit `dfe9bfef9b`)	2021-03-29 10:08:30 -07:00
Dylan Baker	f5444d504a	.pick_status.json: Update to `ee14bec09a`	2021-03-29 10:08:20 -07:00
Tony Wasserka	5de93ffed8	aco/isel: Don't emit unsupported i16<->f16 conversion opcodes on GFX6/7 Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Fixes: `b86305bb57` ("nir/algebraic: collapse conversion opcodes (many patterns)") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4357 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597> (cherry picked from commit `436922c84a`)	2021-03-26 09:52:47 -07:00
Simon Ser	9a439ebcac	Revert "egl: Don't add hardware device if there is no render node v2." This reverts commit `5743a36b2b`. Now that _eglAddDevice is always called with the correct software hint, no need to bail out if the device doesn't have a render node. On split render/display SoCs, the DRM device won't have a render node, yet rendering is hardware-accelerated (via kmsro). Signed-off-by: Simon Ser <contact@emersion.fr> Fixes: `5743a36b2b` ("egl: Don't add hardware device if there is no render node v2.") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4178 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9697> (cherry picked from commit `1d349a6484`)	2021-03-26 09:52:45 -07:00
Simon Ser	f1ec9335a8	egl: only take render nodes into account when listing DRM devices We don't want to expose an EGL device for display-only DRM devices (like VKMS). For these DRM devices we have a separate software-rendering device (the first in the list, always present). There is a similar check in _eglAddDRMDevice, however it will be removed in a future commit to allow split render/display devices to be properly added. We can't figure out whether we're on a split render/display system before loading the driver. Signed-off-by: Simon Ser <contact@emersion.fr> Fixes: `5743a36b2b` ("egl: Don't add hardware device if there is no render node v2.") Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9697> (cherry picked from commit `e39d72aec2`)	2021-03-26 09:52:44 -07:00
Simon Ser	4e4962b464	egl: fix software flag in _eglAddDevice call on DRM On the EGL DRM platform, call _eglAddDevice with the software flag set if GBM has loaded a software driver. This allows _eglAddDevice to make the difference between llvmpipe and kmsro. This is important on split render/display SoCs: we don't want to advertise EGL_MESA_device_software on these systems. Completely drop disp->Options.ForceSoftware, because GBM is responsible for choosing software rendering and doesn't take this hint into account. Signed-off-by: Simon Ser <contact@emersion.fr> Fixes: `5743a36b2b` ("egl: Don't add hardware device if there is no render node v2.") References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4178 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9697> (cherry picked from commit `08a51770bd`)	2021-03-26 09:52:43 -07:00
Pierre-Eric Pelloux-Prayer	bca2aa6e48	mesa/st: fix lower_tex_src_plane in multiple samplers scenario "plane[0].i32" is the plane being lowered, it's not the sampler we're looking for. It worked when there's a single sampler because, eg for NV12, plane[0].i32 for the UV plane would be 1 and the added ":uv" sampler would also land at binding point 1. Fixes: `079e5f73d7` ("mesa/st: rewrite src var when lowering tex_src_plane") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9812> (cherry picked from commit `6298347ec7`)	2021-03-26 09:52:43 -07:00
Dylan Baker	d4e0e7c0f0	.pick_status.json: Update to `a7c0cf500b`	2021-03-26 09:52:35 -07:00
Icecream95	ffd661d50b	panfrost: Disable early-z when alpha test is used Fixes rendering artefacts in Minetest on Midgard. Fixes: `275277a2b4` ("panfrost: Implement alpha testing natively") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9676> (cherry picked from commit `ae62fb3737`) Conflicts: src/gallium/drivers/panfrost/pan_cmdstream.c	2021-03-25 11:00:53 -07:00
Mike Blumenkrantz	a6a79fb31e	lavapipe: fix CmdCopyQueryPoolResults for partial pipeline statistics queries. If this isn't a query for all pipeline statistics, the bits that are set need to be individually copied in increasing order. Fixes: `b38879f8c5` ("vallium: initial import of the vulkan frontend") Reviewed-by: Dave Airlie <airlied@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9813> (cherry picked from commit `4ad5bfd1bd`)	2021-03-25 10:38:01 -07:00
Mike Blumenkrantz	090239c244	util/bitscan: add u_foreach_bit macros. This is a standardized (and very slightly improved for usability) version of the macro that has been copied into every vulkan driver. Includes fixup from Rob Clark <robclark@freedesktop.org> Reviewed-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9191> (cherry picked from commit `e7c7150d63`)	2021-03-25 10:37:59 -07:00
Rhys Perry	2ac46f95bd	aco: implement image_deref_samples It used to be that this intrinsic was never created and texture instructions were always used. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Fixes: `50881d59e6` ("compiler/spirv: fix image sample queries") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9686> (cherry picked from commit `27e2f82f17`)	2021-03-25 10:36:42 -07:00
Mike Blumenkrantz	f0b620307e	lavapipe: use the passed offset for CmdCopyQueryPoolResults. This avoids overwriting buffer[0] on every copy. Fixes: `b38879f8c5` ("vallium: initial import of the vulkan frontend") Reviewed-by: Dave Airlie <airlied@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9813> (cherry picked from commit `e20aebb83c`)	2021-03-25 10:36:42 -07:00
Dylan Baker	8d32c55d93	.pick_status.json: Mark `75951a44ee` as backported	2021-03-25 10:36:42 -07:00
Mike Blumenkrantz	2733a9c712	util/set: stop leaking u32 key sets which pass a mem ctx Fixes: `10a7682413` ("util: add _mesa_set_create_u32_keys where keys are not pointers") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9810> (cherry picked from commit `5ecad3cb44`)	2021-03-25 09:43:56 -07:00
Dylan Baker	3260a85b5c	.pick_status.json: Update to `8e43abcd2c`	2021-03-25 09:43:56 -07:00
Michel Dänzer	aa8bff051e	Revert "glsl/test: Don't run whitespace tests in parallel" This reverts commit `c60cea0daa`. Didn't have the intended effect, and slowed down the meson test run. Reviewed-by: Dylan Baker <dylan.c.baker@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9528> (cherry picked from commit `5057f14cba`)	2021-03-24 15:51:13 -07:00
Michel Dänzer	8d9ec9cd11	intel/tools: Use subprocess.Popen to read output directly from a pipe Instead of using tempfiles to communicate between child & parent process. The latter sometimes resulted in hitting the meson timeout if there was high filesystem pressure. Fixes: `ccaa5b034f` "intel/tools: rewrite run-test.sh in python" Reviewed-by: Dylan Baker <dylan.c.baker@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9528> (cherry picked from commit `05bf12ccb6`)	2021-03-24 15:51:13 -07:00
Dave Airlie	e37442f1b8	lavapipe: fix templated descriptor updates The template path was buggy but CTS only tested it with Vulkan 1.1 enabled. It was just missing the dstArrayElement offset. Fixes: `41f7fa273d` ("lavapipe: add support for VK_KHR_descriptor_update_template") Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9675> (cherry picked from commit `833847603b`)	2021-03-24 15:51:06 -07:00
Dylan Baker	770b0185ab	.pick_status.json: Update to `9be24c89c8`	2021-03-24 15:51:01 -07:00
Dylan Baker	63267e018d	docs: Add 21.0.1 hashes	2021-03-24 15:50:15 -07:00
@@ -1 +1 @@
-21.0.1
+21.0.2