Bump version for 20.2.2 release

docs: add release notes for 20.2.2
blorp: allow blits with floating point source layers
2020-11-06 15:40:35 -08:00 · 2020-11-06 15:40:06 -08:00 · 2020-11-02 07:48:29 -08:00 · 2020-11-02 07:48:26 -08:00 · 2020-11-02 07:48:26 -08:00 · 2020-11-02 07:48:26 -08:00
39 changed files with 6267 additions and 99 deletions
--- a/.pick_status.json
+++ b/.pick_status.json
--- a/2
+++ b/2
@@ -1 +1 @@
-20.2.1
+20.2.2
--- a/docs/relnotes/20.2.1.rst
+++ b/docs/relnotes/20.2.1.rst
@@ -19,7 +19,7 @@ SHA256 checksum

 ::

-    TBD.
+    d1a46d9a3f291bc0e0374600bdcb59844fa3eafaa50398e472a36fc65fd0244a  mesa-20.2.1.tar.xz


 New features
--- a/docs/relnotes/20.2.2.rst
+++ b/docs/relnotes/20.2.2.rst
@@ -0,0 +1,147 @@
+Mesa 20.2.2 Release Notes / 2020-11-06
+======================================
+
+Mesa 20.2.2 is a bug fix release which fixes bugs found since the 20.2.1 release.
+
+Mesa 20.2.2 implements the OpenGL 4.6 API, but the version reported by
+glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
+glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
+Some drivers don't support all the features required in OpenGL 4.6. OpenGL
+4.6 is **only** available if requested at context creation.
+Compatibility contexts may report a lower version depending on each driver.
+
+Mesa 20.2.2 implements the Vulkan 1.2 API, but the version reported by
+the apiVersion property of the VkPhysicalDeviceProperties struct
+depends on the particular driver being used.
+
+SHA256 checksum
+---------------
+
+::
+
+    TBD.
+
+
+New features
+------------
+
+- None
+
+
+Bug fixes
+---------
+
+- anv: dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.3d* failures
+- anv: dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.3d* failures
+- radv/aco: Vertex explosion on RPCS3
+- Gnome 3.38 with Xwayland has screen corruption for X11 apps.
+- RADV: Death Stranding glitchy sky rendering
+- Crash in glDrawArrays on Intel iris
+- deinterlace_vaapi=rate=field does not double output's actual frame rate on AMD
+- Steam game Haydee leans on implementation-dependent behavior
+- vc4 in 20.2-rc has regression causing app to crash
+- [RADV/ACO] Star Citizen Lighting/Shadow Issue
+
+
+Changes
+-------
+
+Bas Nieuwenhuizen (3):
+
+- radv: Fix 1D compressed mipmaps on GFX9.
+- radv: Do not access set layout during vkCmdBindDescriptorSets.
+- radv: Fix variable name collision.
+
+Dave Airlie (1):
+
+- gallivm: zero init the temporary register storage.
+
+Dylan Baker (9):
+
+- docs: add SHA256 sums for 20.2.1
+- .pick_status.json: Update to f29c81f863c9879a6a87724cbdae1e1818f3f6b4
+- .pick_status.json: Update to aea74eac3d7706ed8d870504b163356e3f104a4c
+- .pick_status.json: Update to 7c5129985bcac75053823a31674e8a1e2629230c
+- .pick_status.json: Update to 3c87ac1f60875b5bbd4facca22fc426ee747997a
+- .pick_status.json: Update to d0f8fe5909107aa342f62813ced9ce535ed6da32
+- .pick_status.json: Update to 025050bae73d0598d788e3c307328670a3bf51c1
+- .pick_status.json: Update to b92eadb29cc8ef09096d9196434d49e35a3eccaf
+- .pick_status.json: Update to 8077f3f4c4a3d8007caa30eed93fed1c6bbf3c5a
+
+Jose Maria Casanova Crespo (2):
+
+- vc4: Add missing load_ubo set_align in yuv_blit fs.
+- vc4: Enable nir_lower_io for uniforms
+
+Lionel Landwerlin (3):
+
+- intel/dev: Bump Max EU per subslice/dualsubslice
+- anv: fix source/destination layers for 3D blits
+- blorp: allow blits with floating point source layers
+
+Lucas Stach (2):
+
+- etnaviv: drm: fix BO refcount race
+- etnaviv: blt: properly program surface TS offset for clears
+
+Marcin Ślusarz (2):
+
+- vulkan/wsi: fix possible random stalls in wsi_display_wait_for_event
+- intel/tools: fix invalid type in argument to printf
+
+Marek Olšák (2):
+
+- Revert "radeonsi/gfx10: disable vertex grouping"
+- winsys/amdgpu: remove incorrect assertion check against max_check_space_size
+
+Michael Tretter (1):
+
+- etnaviv: free tgsi tokens when shader state is deleted
+
+Michel Dänzer (3):
+
+- loader/dri3: Only allocate additional buffers if needed
+- loader/dri3: Keep current number of back buffers if frame was skipped
+- loader/dri3: Allocate up to 4 back buffers for page flips
+
+Nanley Chery (3):
+
+- st/mesa: Add missing sentinels in format_map[]
+- intel/isl: Drop redundant unpack of unorm channels
+- isl: Fix the aux-map encoding for D24_UNORM_X8
+
+Rhys Perry (4):
+
+- nir/opt_load_store_vectorize: don't vectorize stores across demote
+- aco: add missing SCC clobber in get_buffer_size
+- aco: update phi_map in add_subdword_operand()
+- aco: ignore the ACO-inserted continue in create_continue_phis()
+
+Rob Clark (1):
+
+- freedreno: Disallow tiled if SHARED and not QCOM_COMPRESSED
+
+Ryan Neph (1):
+
+- virgl: Fixes portal2 binary name in tweak config
+
+Samuel Pitoiset (1):
+
+- aco: fix determining if LOD is zero for nir_texop_txf/nir_texop_txs
+
+Tapani Pälli (2):
+
+- gallivm/nir: handle nir_op_flt in lp_build_nir_llvm
+- iris: fix the order of src and dst for fence memcpy
+
+Thong Thai (1):
+
+- frontends/va/postproc: Un-break field flag
+
+Timothy Arceri (1):
+
+- glsl: relax rule on varying matching for shaders older than 4.00
+
+Tony Wasserka (1):
+
+- aco/isel: Always export position data from VS/NGG
--- a/src/amd/compiler/aco_instruction_selection.cpp
+++ b/src/amd/compiler/aco_instruction_selection.cpp
@@ -6194,7 +6194,7 @@ void get_buffer_size(isel_context *ctx, Temp desc, Temp dst, bool in_elements)
      Temp size = emit_extract_vector(ctx, desc, 2, s1);

      Temp size_div3 = bld.vop3(aco_opcode::v_mul_hi_u32, bld.def(v1), bld.copy(bld.def(v1), Operand(0xaaaaaaabu)), size);
-      size_div3 = bld.sop2(aco_opcode::s_lshr_b32, bld.def(s1), bld.as_uniform(size_div3), Operand(1u));
+      size_div3 = bld.sop2(aco_opcode::s_lshr_b32, bld.def(s1), bld.def(s1, scc), bld.as_uniform(size_div3), Operand(1u));

      Temp stride = emit_extract_vector(ctx, desc, 1, s1);
      stride = bld.sop2(aco_opcode::s_bfe_u32, bld.def(s1), bld.def(s1, scc), stride, Operand((5u << 16) | 16u));
@@ -8514,9 +8514,7 @@ void visit_tex(isel_context *ctx, nir_tex_instr *instr)
         has_bias = true;
         break;
      case nir_tex_src_lod: {
-         nir_const_value *val = nir_src_as_const_value(instr->src[i].src);
-
-         if (val && val->f32 <= 0.0) {
+         if (nir_src_is_const(instr->src[i].src) && nir_src_as_uint(instr->src[i].src) == 0) {
            level_zero = true;
         } else {
            lod = get_ssa_temp(ctx, instr->src[i].src.ssa);
@@ -9433,7 +9431,7 @@ static Operand create_continue_phis(isel_context *ctx, unsigned first, unsigned
         continue;
      }

-      if (block.kind & block_kind_continue) {
+      if ((block.kind & block_kind_continue) && block.index != last) {
         vals[idx - first] = header_phi->operands[next_pred];
         next_pred++;
         continue;
@@ -10083,6 +10081,11 @@ static void create_vs_exports(isel_context *ctx)
      ctx->outputs.temps[VARYING_SLOT_LAYER * 4u] = as_vgpr(ctx, get_arg(ctx, ctx->args->ac.view_index));
   }

+   /* Hardware requires position data to always be exported, even if the
+    * application did not write gl_Position.
+    */
+   ctx->outputs.mask[VARYING_SLOT_POS] = 0xf;
+
   /* the order these position exports are created is important */
   int next_pos = 0;
   bool exported_pos = export_vs_varying(ctx, VARYING_SLOT_POS, true, &next_pos);
--- a/src/amd/compiler/aco_register_allocation.cpp
+++ b/src/amd/compiler/aco_register_allocation.cpp
@@ -38,8 +38,10 @@
 namespace aco {
 namespace {

+struct ra_ctx;
+
 unsigned get_subdword_operand_stride(chip_class chip, const aco_ptr<Instruction>& instr, unsigned idx, RegClass rc);
-void add_subdword_operand(chip_class chip, aco_ptr<Instruction>& instr, unsigned idx, unsigned byte, RegClass rc);
+void add_subdword_operand(ra_ctx& ctx, aco_ptr<Instruction>& instr, unsigned idx, unsigned byte, RegClass rc);
 std::pair<unsigned, unsigned> get_subdword_definition_info(Program *program, const aco_ptr<Instruction>& instr, RegClass rc);
 void add_subdword_definition(Program *program, aco_ptr<Instruction>& instr, unsigned idx, PhysReg reg, bool is_partial);

@@ -352,8 +354,22 @@ unsigned get_subdword_operand_stride(chip_class chip, const aco_ptr<Instruction>
   return 4;
 }

-void add_subdword_operand(chip_class chip, aco_ptr<Instruction>& instr, unsigned idx, unsigned byte, RegClass rc)
+void update_phi_map(ra_ctx& ctx, Instruction *old, Instruction *instr)
 {
+   for (Operand& op : instr->operands) {
+      if (!op.isTemp())
+         continue;
+      std::unordered_map<unsigned, phi_info>::iterator phi = ctx.phi_map.find(op.tempId());
+      if (phi != ctx.phi_map.end()) {
+         phi->second.uses.erase(old);
+         phi->second.uses.emplace(instr);
+      }
+   }
+}
+
+void add_subdword_operand(ra_ctx& ctx, aco_ptr<Instruction>& instr, unsigned idx, unsigned byte, RegClass rc)
+{
+   chip_class chip = ctx.program->chip_class;
   if (instr->format == Format::PSEUDO || byte == 0)
      return;

@@ -376,7 +392,9 @@ void add_subdword_operand(chip_class chip, aco_ptr<Instruction>& instr, unsigned
      }
      return;
   } else if (can_use_SDWA(chip, instr)) {
-      convert_to_SDWA(chip, instr);
+      aco_ptr<Instruction> tmp = convert_to_SDWA(chip, instr);
+      if (tmp)
+         update_phi_map(ctx, tmp.get(), instr.get());
      return;
   } else if (rc.bytes() == 2 && can_use_opsel(chip, instr->opcode, idx, byte / 2)) {
      VOP3A_instruction *vop3 = static_cast<VOP3A_instruction *>(instr.get());
@@ -2233,7 +2251,7 @@ void register_allocation(Program *program, std::vector<TempSet>& live_out_per_bl
            if (op.isTemp() && op.isFirstKill() && op.isLateKill())
               register_file.clear(op);
            if (op.isTemp() && op.physReg().byte() != 0)
-               add_subdword_operand(program->chip_class, instr, i, op.physReg().byte(), op.regClass());
+               add_subdword_operand(ctx, instr, i, op.physReg().byte(), op.regClass());
         }

         /* emit parallelcopy */
@@ -2366,19 +2384,9 @@ void register_allocation(Program *program, std::vector<TempSet>& live_out_per_bl
            aco_ptr<Instruction> tmp = std::move(instr);
            Format format = asVOP3(tmp->format);
            instr.reset(create_instruction<VOP3A_instruction>(tmp->opcode, format, tmp->operands.size(), tmp->definitions.size()));
-            for (unsigned i = 0; i < instr->operands.size(); i++) {
-               Operand& operand = tmp->operands[i];
-               instr->operands[i] = operand;
-               /* keep phi_map up to date */
-               if (operand.isTemp()) {
-                  std::unordered_map<unsigned, phi_info>::iterator phi = ctx.phi_map.find(operand.tempId());
-                  if (phi != ctx.phi_map.end()) {
-                     phi->second.uses.erase(tmp.get());
-                     phi->second.uses.emplace(instr.get());
-                  }
-               }
-            }
+            std::copy(tmp->operands.begin(), tmp->operands.end(), instr->operands.begin());
            std::copy(tmp->definitions.begin(), tmp->definitions.end(), instr->definitions.begin());
+            update_phi_map(ctx, tmp.get(), instr.get());
         }

         instructions.emplace_back(std::move(*it));
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -3842,7 +3842,6 @@ radv_bind_descriptor_set(struct radv_cmd_buffer *cmd_buffer,
 	radv_set_descriptor_set(cmd_buffer, bind_point, set, idx);

 	assert(set);
-	assert(!(set->layout->flags & VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR));

 	if (!cmd_buffer->device->use_global_bo_list) {
 		for (unsigned j = 0; j < set->buffer_count; ++j)
@@ -3873,17 +3872,17 @@ void radv_CmdBindDescriptorSets(
 		radv_get_descriptors_state(cmd_buffer, pipelineBindPoint);

 	for (unsigned i = 0; i < descriptorSetCount; ++i) {
-		unsigned idx = i + firstSet;
+		unsigned set_idx = i + firstSet;
 		RADV_FROM_HANDLE(radv_descriptor_set, set, pDescriptorSets[i]);

 		/* If the set is already bound we only need to update the
 		 * (potentially changed) dynamic offsets. */
-		if (descriptors_state->sets[idx] != set ||
-		    !(descriptors_state->valid & (1u << idx))) {
-			radv_bind_descriptor_set(cmd_buffer, pipelineBindPoint, set, idx);
+		if (descriptors_state->sets[set_idx] != set ||
+		    !(descriptors_state->valid & (1u << set_idx))) {
+			radv_bind_descriptor_set(cmd_buffer, pipelineBindPoint, set, set_idx);
 		}

-		for(unsigned j = 0; j < set->layout->dynamic_offset_count; ++j, ++dyn_idx) {
+		for(unsigned j = 0; j < layout->set[set_idx].dynamic_offset_count; ++j, ++dyn_idx) {
 			unsigned idx = j + layout->set[i + firstSet].dynamic_offset_start;
 			uint32_t *dst = descriptors_state->dynamic_buffers + idx * 4;
 			assert(dyn_idx < dynamicOffsetCount);
@@ -3912,8 +3911,7 @@ void radv_CmdBindDescriptorSets(
 				}
 			}

-			cmd_buffer->push_constant_stages |=
-			                     set->layout->dynamic_shader_stages;
+			cmd_buffer->push_constant_stages |= layout->set[set_idx].dynamic_offset_stages;
 		}
 	}
 }
--- a/src/amd/vulkan/radv_descriptor_set.c
+++ b/src/amd/vulkan/radv_descriptor_set.c
@@ -432,10 +432,16 @@ VkResult radv_CreatePipelineLayout(
 		layout->set[set].layout = set_layout;

 		layout->set[set].dynamic_offset_start = dynamic_offset_count;
+		layout->set[set].dynamic_offset_count = 0;
+		layout->set[set].dynamic_offset_stages = 0;
+
 		for (uint32_t b = 0; b < set_layout->binding_count; b++) {
-			dynamic_offset_count += set_layout->binding[b].array_size * set_layout->binding[b].dynamic_offset_count;
-			dynamic_shader_stages |= set_layout->dynamic_shader_stages;
+			layout->set[set].dynamic_offset_count +=
+				set_layout->binding[b].array_size * set_layout->binding[b].dynamic_offset_count;
+			layout->set[set].dynamic_offset_stages |= set_layout->dynamic_shader_stages;
 		}
+		dynamic_offset_count += layout->set[set].dynamic_offset_count;
+		dynamic_shader_stages |= layout->set[set].dynamic_offset_stages;
 		_mesa_sha1_update(&ctx, set_layout, set_layout->layout_size);
 	}

--- a/src/amd/vulkan/radv_descriptor_set.h
+++ b/src/amd/vulkan/radv_descriptor_set.h
@@ -89,7 +89,9 @@ struct radv_pipeline_layout {
   struct {
      struct radv_descriptor_set_layout *layout;
      uint32_t size;
-      uint32_t dynamic_offset_start;
+      uint16_t dynamic_offset_start;
+      uint16_t dynamic_offset_count;
+      VkShaderStageFlags dynamic_offset_stages;
   } set[MAX_SETS];

   uint32_t num_sets;
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -1597,6 +1597,11 @@ radv_image_view_init(struct radv_image_view *iview,
 	iview->aspect_mask = pCreateInfo->subresourceRange.aspectMask;
 	iview->multiple_planes = vk_format_get_plane_count(image->vk_format) > 1 && iview->aspect_mask == VK_IMAGE_ASPECT_COLOR_BIT;

+	iview->base_layer = range->baseArrayLayer;
+	iview->layer_count = radv_get_layerCount(image, range);
+	iview->base_mip = range->baseMipLevel;
+	iview->level_count = radv_get_levelCount(image, range);
+
 	iview->vk_format = pCreateInfo->format;

 	/* If the image has an Android external format, pCreateInfo->format will be
@@ -1652,21 +1657,43 @@ radv_image_view_init(struct radv_image_view *iview,
 		 *
 		 * This means that mip2 will be missing texels.
 		 *
-		 * Fix it by taking the actual extent addrlib assigned to the base mip level.
+		 * Fix this by calculating the base mip's width and height, then convert
+		 * that, and round it back up to get the level 0 size. Clamp the
+		 * converted size between the original values, and the physical extent
+		 * of the base mipmap.
+		 *
+		 * On GFX10 we have to take care to not go over the physical extent
+		 * of the base mipmap as otherwise the GPU computes a different layout.
+		 * Note that the GPU does use the same base-mip dimensions for both a
+		 * block compatible format and the compressed format, so even if we take
+		 * the plain converted dimensions the physical layout is correct.
 		 */
 		if (device->physical_device->rad_info.chip_class >= GFX9 &&
-		     vk_format_is_compressed(image->vk_format) &&
-		     !vk_format_is_compressed(iview->vk_format) &&
-		     iview->image->info.levels > 1) {
-			iview->extent.width = iview->image->planes[0].surface.u.gfx9.base_mip_width;
-			iview->extent.height = iview->image->planes[0].surface.u.gfx9.base_mip_height;
-		}
-	}
+		    vk_format_is_compressed(image->vk_format) &&
+		    !vk_format_is_compressed(iview->vk_format)) {
+			/* If we have multiple levels in the view we should ideally take the last level,
+			 * but the mip calculation has a max(..., 1) so walking back to the base mip in a
+			 * useful way is hard. */
+			if (iview->level_count > 1) {
+				iview->extent.width = iview->image->planes[0].surface.u.gfx9.base_mip_width;
+				iview->extent.height = iview->image->planes[0].surface.u.gfx9.base_mip_height;
+			} else {
+				unsigned lvl_width  = radv_minify(image->info.width , range->baseMipLevel);
+				unsigned lvl_height = radv_minify(image->info.height, range->baseMipLevel);

-	iview->base_layer = range->baseArrayLayer;
-	iview->layer_count = radv_get_layerCount(image, range);
-	iview->base_mip = range->baseMipLevel;
-	iview->level_count = radv_get_levelCount(image, range);
+				lvl_width = round_up_u32(lvl_width * view_bw, img_bw);
+				lvl_height = round_up_u32(lvl_height * view_bh, img_bh);
+
+				lvl_width <<= range->baseMipLevel;
+				lvl_height <<= range->baseMipLevel;
+
+				iview->extent.width = CLAMP(lvl_width, iview->extent.width,
+							    iview->image->planes[0].surface.u.gfx9.base_mip_width);
+				iview->extent.height = CLAMP(lvl_height, iview->extent.height,
+							     iview->image->planes[0].surface.u.gfx9.base_mip_height);
+			}
+		 }
+	}

 	bool disable_compression = extra_create_info ? extra_create_info->disable_compression: false;
 	for (unsigned i = 0; i < (iview->multiple_planes ? vk_format_get_plane_count(image->vk_format) : 1); ++i) {
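Review note on the radv_image.c hunk above: the single-level branch can be read as "minify, convert to view blocks, scale back to level 0, clamp". The sketch below is a standalone illustration, not Mesa code. All `demo_*` names are hypothetical. It assumes `radv_minify()` halves per level with a floor of 1 and that `round_up_u32()` is a ceiling division; neither definition appears in this diff.

```c
#include <stdint.h>

/* Assumption: radv_minify() behaves like max(size >> level, 1). */
static uint32_t demo_minify(uint32_t size, uint32_t level)
{
   uint32_t v = size >> level;
   return v > 0 ? v : 1;
}

/* Assumption: round_up_u32() is a ceiling division. */
static uint32_t demo_div_round_up(uint32_t v, uint32_t a)
{
   return (v + a - 1) / a;
}

static uint32_t demo_clamp(uint32_t x, uint32_t lo, uint32_t hi)
{
   return x < lo ? lo : (x > hi ? hi : x);
}

/* Hypothetical helper: derive the level-0 size of an uncompressed view
 * of a block-compressed image from the size of the viewed mip level.
 * min_size stands in for the previously computed view extent and
 * max_size for the physical base-mip extent. */
static uint32_t
demo_view_level0_size(uint32_t image_size, uint32_t base_level,
                      uint32_t view_block, uint32_t image_block,
                      uint32_t min_size, uint32_t max_size)
{
   uint32_t lvl = demo_minify(image_size, base_level);      /* size of the viewed level */
   lvl = demo_div_round_up(lvl * view_block, image_block);  /* convert to view texels */
   lvl <<= base_level;                                      /* scale back up to level 0 */
   return demo_clamp(lvl, min_size, max_size);
}
```

With a 10-texel-wide image, 4-texel blocks and a view of level 1, `demo_view_level0_size(10, 1, 1, 4, 3, 8)` yields 4: the level is 5 texels wide, which is 2 blocks, doubled back to level 0.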
--- a/src/compiler/glsl/link_varyings.cpp
+++ b/src/compiler/glsl/link_varyings.cpp
@@ -875,10 +875,40 @@ cross_validate_outputs_to_inputs(struct gl_context *ctx,
            /* Check for input vars with unmatched output vars in prev stage
             * taking into account that interface blocks could have a matching
             * output but with different name, so we ignore them.
+             *
+             * Section 4.3.4 (Inputs) of the GLSL 4.10 specifications say:
+             *
+             *   "Only the input variables that are actually read need to be
+             *    written by the previous stage; it is allowed to have
+             *    superfluous declarations of input variables."
+             *
+             * However it's not defined anywhere as to how we should handle
+             * inputs that are not written in the previous stage and it's not
+             * clear what "actually read" means.
+             *
+             * The GLSL 4.20 spec however is much clearer:
+             *
+             *    "Only the input variables that are statically read need to
+             *     be written by the previous stage; it is allowed to have
+             *     superfluous declarations of input variables."
+             *
+             * It also has a table that states it is an error to statically
+             * read an input that is not defined in the previous stage, while
+             * it is not an error to not statically write to the output (it
+             * just needs to be defined to not be an error).
+             *
+             * The text in the GLSL 4.20 spec was an attempt to clarify the
+             * previous spec iterations. However given the difference in spec
+             * and that some applications seem to depend on not erroring when
+             * the input is not actually read in control flow we only apply
+             * this rule to GLSL 4.00 and higher. GLSL 4.00 was chosen as
+             * a 3.30 shader is the highest version of GLSL we have seen in
+             * the wild dependent on the less strict interpretation.
             */
            assert(!input->data.assigned);
            if (input->data.used && !input->get_interface_type() &&
-                !input->data.explicit_location)
+                !input->data.explicit_location &&
+                (prog->data->Version >= (prog->IsES ? 0 : 400)))
               linker_error(prog,
                            "%s shader input `%s' "
                            "has no matching output in the previous stage\n",
--- a/src/compiler/nir/nir_opt_load_store_vectorize.c
+++ b/src/compiler/nir/nir_opt_load_store_vectorize.c
@@ -1219,6 +1219,11 @@ handle_barrier(struct vectorize_ctx *ctx, bool *progress, nir_function_impl *imp
      case nir_intrinsic_discard:
         modes = nir_var_all;
         break;
+      case nir_intrinsic_demote_if:
+      case nir_intrinsic_demote:
+         acquire = false;
+         modes = nir_var_all;
+         break;
      case nir_intrinsic_memory_barrier_buffer:
         modes = nir_var_mem_ssbo | nir_var_mem_global;
         break;
--- a/src/compiler/nir/nir_schedule.c
+++ b/src/compiler/nir/nir_schedule.c
@@ -355,6 +355,8 @@ nir_schedule_intrinsic_deps(nir_deps_state *state,

   case nir_intrinsic_discard:
   case nir_intrinsic_discard_if:
+   case nir_intrinsic_demote:
+   case nir_intrinsic_demote_if:
      /* We are adding two dependencies:
       *
       * * A individual one that we could use to add a read_dep while handling
--- a/src/etnaviv/drm/etnaviv_bo.c
+++ b/src/etnaviv/drm/etnaviv_bo.c
@@ -257,11 +257,15 @@ void etna_bo_del(struct etna_bo *bo)

 	struct etna_device *dev = bo->dev;

-	if (!p_atomic_dec_zero(&bo->refcnt))
-		return;
-
 	pthread_mutex_lock(&etna_drm_table_lock);

+	/* Must test under table lock to avoid racing with the from_dmabuf/name
+	 * paths, which rely on the BO refcount to be stable over the lookup, so
+	 * they can grab a reference when the BO is found in the hash.
+	 */
+	if (!p_atomic_dec_zero(&bo->refcnt))
+		goto out;
+
 	if (bo->reuse && (etna_bo_cache_free(&dev->bo_cache, bo) == 0))
 		goto out;

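Review note on the etnaviv_bo.c hunk above: the fix moves the decrement-and-test inside the table lock, so a concurrent lookup can never take a reference on an object whose count has already reached zero. The sketch below shows that pattern in isolation with C11 atomics and a pthread mutex. It is not the etnaviv code; every `demo_*` name is hypothetical, and the table is reduced to a single `in_table` flag.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_bo {
   atomic_int refcnt;
   bool in_table; /* stands in for membership in a lookup hash table */
};

static pthread_mutex_t demo_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lookup path: takes a reference while holding the table lock. */
static struct demo_bo *demo_lookup(struct demo_bo *candidate)
{
   struct demo_bo *found = NULL;

   pthread_mutex_lock(&demo_table_lock);
   if (candidate->in_table) {
      atomic_fetch_add(&candidate->refcnt, 1);
      found = candidate;
   }
   pthread_mutex_unlock(&demo_table_lock);
   return found;
}

/* Release path: the decrement and the zero test happen under the same
 * lock, so the count cannot hit zero while a lookup is in progress.
 * Returns true when the object was removed from the table. */
static bool demo_unref(struct demo_bo *bo)
{
   bool removed = false;

   pthread_mutex_lock(&demo_table_lock);
   /* atomic_fetch_sub returns the previous value; 1 means we reached 0. */
   if (atomic_fetch_sub(&bo->refcnt, 1) == 1) {
      bo->in_table = false;
      removed = true;
   }
   pthread_mutex_unlock(&demo_table_lock);
   return removed;
}
```

Taking the lock on every release costs more than an unlocked decrement; the sketch favours the simpler invariant, as the hunk above does.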
--- a/src/gallium/auxiliary/gallivm/lp_bld_nir.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_nir.c
@@ -555,6 +555,7 @@ static LLVMValueRef do_alu_action(struct lp_build_nir_context *bld_base,
   case nir_op_flog2:
      result = lp_build_log2_safe(&bld_base->base, src[0]);
      break;
+   case nir_op_flt:
   case nir_op_flt32:
      result = fcmp32(bld_base, PIPE_FUNC_LESS, src_bit_size[0], src);
      break;
@@ -1975,8 +1976,8 @@ bool lp_build_nir_llvm(

   nir_foreach_register(reg, &func->impl->registers) {
      LLVMTypeRef type = get_register_type(bld_base, reg);
-      LLVMValueRef reg_alloc = lp_build_alloca_undef(bld_base->base.gallivm,
-                                                     type, "reg");
+      LLVMValueRef reg_alloc = lp_build_alloca(bld_base->base.gallivm,
+                                               type, "reg");
      _mesa_hash_table_insert(bld_base->regs, reg, reg_alloc);
   }
   nir_index_ssa_defs(func->impl);
--- a/src/gallium/drivers/etnaviv/etnaviv_blt.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_blt.c
@@ -229,7 +229,7 @@ etna_blit_clear_color_blt(struct pipe_context *pctx, struct pipe_surface *dst,
   if (surf->surf.ts_size) {
      clr.dest.use_ts = 1;
      clr.dest.ts_addr.bo = res->ts_bo;
-      clr.dest.ts_addr.offset = 0;
+      clr.dest.ts_addr.offset = surf->level->ts_offset;
      clr.dest.ts_addr.flags = ETNA_RELOC_WRITE;
      clr.dest.ts_clear_value[0] = new_clear_value;
      clr.dest.ts_clear_value[1] = new_clear_value >> 32;
@@ -308,7 +308,7 @@ etna_blit_clear_zs_blt(struct pipe_context *pctx, struct pipe_surface *dst,
   if (surf->surf.ts_size) {
      clr.dest.use_ts = 1;
      clr.dest.ts_addr.bo = res->ts_bo;
-      clr.dest.ts_addr.offset = 0;
+      clr.dest.ts_addr.offset = surf->level->ts_offset;
      clr.dest.ts_addr.flags = ETNA_RELOC_WRITE;
      clr.dest.ts_clear_value[0] = surf->level->clear_value;
      clr.dest.ts_clear_value[1] = surf->level->clear_value;
--- a/src/gallium/drivers/etnaviv/etnaviv_shader.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_shader.c
@@ -445,6 +445,7 @@ etna_delete_shader_state(struct pipe_context *pctx, void *ss)
         etna_destroy_shader(t);
   }

+   tgsi_free_tokens(shader->tokens);
   ralloc_free(shader->nir);
   FREE(shader);
 }
--- a/src/gallium/drivers/freedreno/freedreno_resource.c
+++ b/src/gallium/drivers/freedreno/freedreno_resource.c
@@ -933,8 +933,12 @@ fd_resource_create_with_modifiers(struct pipe_screen *pscreen,
 	 * should.)
 	 */
 	bool allow_ubwc = drm_find_modifier(DRM_FORMAT_MOD_INVALID, modifiers, count);
-	if (tmpl->bind & PIPE_BIND_SHARED)
+	if (tmpl->bind & PIPE_BIND_SHARED) {
 		allow_ubwc = drm_find_modifier(DRM_FORMAT_MOD_QCOM_COMPRESSED, modifiers, count);
+		if (!allow_ubwc) {
+			linear = true;
+		}
+	}

 	allow_ubwc &= !(fd_mesa_debug & FD_DBG_NOUBWC);

--- a/src/gallium/drivers/iris/iris_fence.c
+++ b/src/gallium/drivers/iris/iris_fence.c
@@ -154,7 +154,7 @@ clear_stale_syncobjs(struct iris_batch *batch)

      if (syncobj != nth_syncobj) {
         *syncobj = *nth_syncobj;
-         memcpy(nth_fence, fence, sizeof(*fence));
+         memcpy(fence, nth_fence, sizeof(*fence));
      }
   }
 }
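Review note on the iris_fence.c hunk above: `memcpy` takes the destination first, so the stale slot has to be the first argument and the entry being moved the second. The sketch below is a generic "move the last entry into the stale slot" compaction that uses the corrected argument order. It is only assumed to resemble what `clear_stale_syncobjs` does; `demo_fence` and `demo_compact` are hypothetical names.

```c
#include <string.h>

struct demo_fence {
   int id;
   int signaled; /* non-zero means the entry is stale and can be dropped */
};

/* Removes stale entries by overwriting each one with the current last
 * entry. Order is not preserved. Returns the new element count. */
static unsigned demo_compact(struct demo_fence *fences, unsigned count)
{
   unsigned i = 0;

   while (i < count) {
      if (!fences[i].signaled) {
         i++;
         continue;
      }

      struct demo_fence *stale = &fences[i];
      struct demo_fence *last = &fences[count - 1];

      if (stale != last)
         memcpy(stale, last, sizeof(*stale)); /* memcpy(dst, src, size) */
      count--;
      /* i is not advanced: the entry just moved in still needs a check. */
   }
   return count;
}
```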
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -668,23 +668,25 @@ static void gfx10_emit_ge_cntl(struct si_context *sctx, unsigned num_patches)
   if (sctx->ngg) {
      if (sctx->tes_shader.cso) {
         ge_cntl = S_03096C_PRIM_GRP_SIZE(num_patches) |
-                   S_03096C_VERT_GRP_SIZE(256) | /* 256 = disable vertex grouping */
+                   S_03096C_VERT_GRP_SIZE(0) |
                   S_03096C_BREAK_WAVE_AT_EOI(key.u.tess_uses_prim_id);
      } else {
         ge_cntl = si_get_vs_state(sctx)->ge_cntl;
      }
   } else {
      unsigned primgroup_size;
-      unsigned vertgroup_size = 256; /* 256 = disable vertex grouping */
-      ;
+      unsigned vertgroup_size;

      if (sctx->tes_shader.cso) {
         primgroup_size = num_patches; /* must be a multiple of NUM_PATCHES */
+         vertgroup_size = 0;
      } else if (sctx->gs_shader.cso) {
         unsigned vgt_gs_onchip_cntl = sctx->gs_shader.current->ctx_reg.gs.vgt_gs_onchip_cntl;
         primgroup_size = G_028A44_GS_PRIMS_PER_SUBGRP(vgt_gs_onchip_cntl);
+         vertgroup_size = G_028A44_ES_VERTS_PER_SUBGRP(vgt_gs_onchip_cntl);
      } else {
         primgroup_size = 128; /* recommended without a GS and tess */
+         vertgroup_size = 0;
      }

      ge_cntl = S_03096C_PRIM_GRP_SIZE(primgroup_size) | S_03096C_VERT_GRP_SIZE(vertgroup_size) |
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1243,7 +1243,7 @@ static void gfx10_shader_ngg(struct si_screen *sscreen, struct si_shader *shader
                        S_03096C_VERT_GRP_SIZE(shader->ngg.max_gsprims + 2);
   } else {
      shader->ge_cntl = S_03096C_PRIM_GRP_SIZE(shader->ngg.max_gsprims) |
-                        S_03096C_VERT_GRP_SIZE(256) | /* 256 = disable vertex grouping */
+                        S_03096C_VERT_GRP_SIZE(shader->ngg.hw_max_esverts) |
                        S_03096C_BREAK_WAVE_AT_EOI(break_wave_at_eoi);

      /* Bug workaround for a possible hang with non-tessellation cases.
--- a/src/gallium/drivers/vc4/vc4_blit.c
+++ b/src/gallium/drivers/vc4/vc4_blit.c
@@ -299,6 +299,7 @@ static void *vc4_get_yuv_fs(struct pipe_context *pctx, int cpp)
   nir_ssa_dest_init(&load->instr, &load->dest, load->num_components, 32, NULL);
   load->src[0] = nir_src_for_ssa(one);
   load->src[1] = nir_src_for_ssa(nir_iadd(&b, x_offset, y_offset));
+   nir_intrinsic_set_align(load,  4, 0);
   nir_builder_instr_insert(&b, &load->instr);

   nir_store_var(&b, color_out,
--- a/src/gallium/drivers/vc4/vc4_program.c
+++ b/src/gallium/drivers/vc4/vc4_program.c
@@ -2472,7 +2472,8 @@ vc4_shader_state_create(struct pipe_context *pctx,
        if (s->info.stage == MESA_SHADER_VERTEX)
                NIR_PASS_V(s, nir_lower_point_size, 1.0f, 0.0f);

-        NIR_PASS_V(s, nir_lower_io, nir_var_shader_in | nir_var_shader_out,
+        NIR_PASS_V(s, nir_lower_io,
+                   nir_var_shader_in | nir_var_shader_out | nir_var_uniform,
                   type_size, (nir_lower_io_options)0);

        NIR_PASS_V(s, nir_lower_regs_to_ssa);
--- a/src/gallium/frontends/va/postproc.c
+++ b/src/gallium/frontends/va/postproc.c
@@ -321,7 +321,7 @@ vlVaHandleVAProcPipelineParameterBufferType(vlVaDriver *drv, vlVaContext *contex
         VAProcFilterParameterBufferDeinterlacing *deint = buf->data;
         switch (deint->algorithm) {
         case VAProcDeinterlacingBob:
-            if (deint->flags & VA_DEINTERLACING_BOTTOM_FIELD_FIRST)
+            if (deint->flags & VA_DEINTERLACING_BOTTOM_FIELD)
               deinterlace = VL_COMPOSITOR_BOB_BOTTOM;
            else
               deinterlace = VL_COMPOSITOR_BOB_TOP;
@@ -333,7 +333,7 @@ vlVaHandleVAProcPipelineParameterBufferType(vlVaDriver *drv, vlVaContext *contex

         case VAProcDeinterlacingMotionAdaptive:
            src = vlVaApplyDeint(drv, context, param, src,
-				 !!(deint->flags & VA_DEINTERLACING_BOTTOM_FIELD_FIRST));
+				 !!(deint->flags & VA_DEINTERLACING_BOTTOM_FIELD));
            break;

         default:
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -810,7 +810,6 @@ static bool amdgpu_get_new_ib(struct amdgpu_winsys *ws, struct amdgpu_cs *cs,

   ib_size = ib->big_ib_buffer->size - ib->used_ib_space;
   ib->base.current.max_dw = ib_size / 4 - amdgpu_cs_epilog_dws(cs);
-   assert(ib->base.current.max_dw >= ib->max_check_space_size / 4);
   ib->base.gpu_address = info->va_start;
   return true;
 }
@@ -1178,7 +1177,6 @@ static bool amdgpu_cs_check_space(struct radeon_cmdbuf *rcs, unsigned dw,

   ib->base.current.buf = (uint32_t*)(ib->ib_mapped + ib->used_ib_space);
   ib->base.current.max_dw = ib->big_ib_buffer->size / 4 - cs_epilog_dw;
-   assert(ib->base.current.max_dw >= ib->max_check_space_size / 4);
   ib->base.gpu_address = va;

   amdgpu_cs_add_buffer(&cs->main.base, ib->big_ib_buffer,
--- a/src/intel/blorp/blorp.c
+++ b/src/intel/blorp/blorp.c
@@ -63,7 +63,7 @@ void
 brw_blorp_surface_info_init(struct blorp_context *blorp,
                            struct brw_blorp_surface_info *info,
                            const struct blorp_surf *surf,
-                            unsigned int level, unsigned int layer,
+                            unsigned int level, float layer,
                            enum isl_format format, bool is_render_target)
 {
   memset(info, 0, sizeof(*info));
--- a/src/intel/blorp/blorp.h
+++ b/src/intel/blorp/blorp.h
@@ -133,7 +133,7 @@ enum blorp_filter {
 void
 blorp_blit(struct blorp_batch *batch,
           const struct blorp_surf *src_surf,
-           unsigned src_level, unsigned src_layer,
+           unsigned src_level, float src_layer,
           enum isl_format src_format, struct isl_swizzle src_swizzle,
           const struct blorp_surf *dst_surf,
           unsigned dst_level, unsigned dst_layer,
--- a/src/intel/blorp/blorp_blit.c
+++ b/src/intel/blorp/blorp_blit.c
@@ -56,7 +56,7 @@ brw_blorp_blit_vars_init(nir_builder *b, struct brw_blorp_blit_vars *v,
   LOAD_INPUT(discard_rect, glsl_vec4_type())
   LOAD_INPUT(rect_grid, glsl_vec4_type())
   LOAD_INPUT(coord_transform, glsl_vec4_type())
-   LOAD_INPUT(src_z, glsl_uint_type())
+   LOAD_INPUT(src_z, glsl_float_type())
   LOAD_INPUT(src_offset, glsl_vector_type(GLSL_TYPE_UINT, 2))
   LOAD_INPUT(dst_offset, glsl_vector_type(GLSL_TYPE_UINT, 2))
   LOAD_INPUT(src_inv_size, glsl_vector_type(GLSL_TYPE_FLOAT, 2))
@@ -154,8 +154,13 @@ blorp_create_nir_tex_instr(nir_builder *b, struct brw_blorp_blit_vars *v,
    * more explicit in the future.
    */
   assert(pos->num_components >= 2);
-   pos = nir_vec3(b, nir_channel(b, pos, 0), nir_channel(b, pos, 1),
-                     nir_load_var(b, v->v_src_z));
+   if (op == nir_texop_txf || op == nir_texop_txf_ms || op == nir_texop_txf_ms_mcs) {
+      pos = nir_vec3(b, nir_channel(b, pos, 0), nir_channel(b, pos, 1),
+                        nir_f2i32(b, nir_load_var(b, v->v_src_z)));
+   } else {
+      pos = nir_vec3(b, nir_channel(b, pos, 0), nir_channel(b, pos, 1),
+                        nir_load_var(b, v->v_src_z));
+   }

   tex->src[0].src_type = nir_tex_src_coord;
   tex->src[0].src = nir_src_for_ssa(pos);
@@ -2319,7 +2324,7 @@ do_blorp_blit(struct blorp_batch *batch,
 void
 blorp_blit(struct blorp_batch *batch,
           const struct blorp_surf *src_surf,
-           unsigned src_level, unsigned src_layer,
+           unsigned src_level, float src_layer,
           enum isl_format src_format, struct isl_swizzle src_swizzle,
           const struct blorp_surf *dst_surf,
           unsigned dst_level, unsigned dst_layer,
--- a/src/intel/blorp/blorp_priv.h
+++ b/src/intel/blorp/blorp_priv.h
@@ -61,7 +61,7 @@ struct brw_blorp_surface_info
   struct isl_view view;

   /* Z offset into a 3-D texture or slice of a 2-D array texture. */
-   uint32_t z_offset;
+   float z_offset;

   uint32_t tile_x_sa, tile_y_sa;
 };
@@ -70,7 +70,7 @@ void
 brw_blorp_surface_info_init(struct blorp_context *blorp,
                            struct brw_blorp_surface_info *info,
                            const struct blorp_surf *surf,
-                            unsigned int level, unsigned int layer,
+                            unsigned int level, float layer,
                            enum isl_format format, bool is_render_target);
 void
 blorp_surf_convert_to_single_slice(const struct isl_device *isl_dev,
@@ -148,7 +148,7 @@ struct brw_blorp_wm_inputs
   /* Minimum layer setting works for all the textures types but texture_3d
    * for which the setting has no effect. Use the z-coordinate instead.
    */
-   uint32_t src_z;
+   float src_z;

   /* Pad out to an integral number of registers */
   uint32_t pad[1];
--- a/src/intel/dev/gen_device_info.h
+++ b/src/intel/dev/gen_device_info.h
@@ -38,7 +38,7 @@ struct drm_i915_query_topology_info;

 #define GEN_DEVICE_MAX_SLICES           (6)  /* Maximum on gen10 */
 #define GEN_DEVICE_MAX_SUBSLICES        (8)  /* Maximum on gen11 */
-#define GEN_DEVICE_MAX_EUS_PER_SUBSLICE (10) /* Maximum on Haswell */
+#define GEN_DEVICE_MAX_EUS_PER_SUBSLICE (16) /* Maximum on gen12 */
 #define GEN_DEVICE_MAX_PIXEL_PIPES      (2)  /* Maximum on gen11 */

 /**
--- a/src/intel/isl/isl.c
+++ b/src/intel/isl/isl.c
@@ -2969,7 +2969,7 @@ isl_format_get_aux_map_encoding(enum isl_format format)
   case ISL_FORMAT_R32_SINT: return 0x12;
   case ISL_FORMAT_R32_UINT: return 0x13;
   case ISL_FORMAT_R32_FLOAT: return 0x11;
-   case ISL_FORMAT_R24_UNORM_X8_TYPELESS: return 0x11;
+   case ISL_FORMAT_R24_UNORM_X8_TYPELESS: return 0x13;
   case ISL_FORMAT_B5G6R5_UNORM: return 0xA;
   case ISL_FORMAT_B5G6R5_UNORM_SRGB: return 0xA;
   case ISL_FORMAT_B5G5R5A1_UNORM: return 0xA;
--- a/src/intel/isl/isl_format.c
+++ b/src/intel/isl/isl_format.c
@@ -1272,7 +1272,6 @@ unpack_channel(union isl_color_value *value,

   switch (layout->type) {
   case ISL_UNORM:
-      unpacked.f32 = _mesa_unorm_to_float(packed, layout->bits);
      if (colorspace == ISL_COLORSPACE_SRGB) {
         if (layout->bits == 8) {
            unpacked.f32 = util_format_srgb_8unorm_to_linear_float(packed);
--- a/src/intel/tools/i965_gram.y
+++ b/src/intel/tools/i965_gram.y
@@ -2185,7 +2185,7 @@ execsize:
 	| LPAREN exp2 RPAREN
 	{
 		if ($2 > 32 || !isPowerofTwo($2))
-			error(&@2, "Invalid execution size %d\n", $2);
+			error(&@2, "Invalid execution size %llu\n", $2);

 		$$ = cvt($2) - 1;
 	}
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -709,12 +709,19 @@ void anv_CmdBlitImage(
         }

         bool flip_z = flip_coords(&src_start, &src_end, &dst_start, &dst_end);
-         float src_z_step = (float)(src_end + 1 - src_start) /
-            (float)(dst_end + 1 - dst_start);
+         const unsigned num_layers = dst_end - dst_start;
+         float src_z_step = (float)(src_end - src_start) / (float)num_layers;
+
+         /* There is no interpolation to the pixel center during rendering, so
+          * add the 0.5 offset ourselves here. */
+         float depth_center_offset = 0;
+         if (src_image->type == VK_IMAGE_TYPE_3D)
+            depth_center_offset = 0.5 / num_layers * (src_end - src_start);

         if (flip_z) {
            src_start = src_end;
            src_z_step *= -1;
+            depth_center_offset *= -1;
         }

         unsigned src_x0 = pRegions[r].srcOffsets[0].x;
@@ -729,7 +736,6 @@ void anv_CmdBlitImage(
         unsigned dst_y1 = pRegions[r].dstOffsets[1].y;
         bool flip_y = flip_coords(&src_y0, &src_y1, &dst_y0, &dst_y1);

-         const unsigned num_layers = dst_end - dst_start;
         anv_cmd_buffer_mark_image_written(cmd_buffer, dst_image,
                                           1U << aspect_bit,
                                           dst.aux_usage,
@@ -738,7 +744,7 @@ void anv_CmdBlitImage(

         for (unsigned i = 0; i < num_layers; i++) {
            unsigned dst_z = dst_start + i;
-            unsigned src_z = src_start + i * src_z_step;
+            float src_z = src_start + i * src_z_step + depth_center_offset;

            blorp_blit(&batch, &src, src_res->mipLevel, src_z,
                       src_format.isl_format, src_format.swizzle,
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -272,12 +272,45 @@ dri3_fence_await(xcb_connection_t *c, struct loader_dri3_drawable *draw,
 }

 static void
-dri3_update_num_back(struct loader_dri3_drawable *draw)
+dri3_update_max_num_back(struct loader_dri3_drawable *draw)
 {
-   if (draw->last_present_mode == XCB_PRESENT_COMPLETE_MODE_FLIP)
-      draw->num_back = 3;
-   else
-      draw->num_back = 2;
+   switch (draw->last_present_mode) {
+   case XCB_PRESENT_COMPLETE_MODE_FLIP: {
+      int new_max;
+
+      if (draw->swap_interval == 0)
+         new_max = 4;
+      else
+         new_max = 3;
+
+      assert(new_max <= LOADER_DRI3_MAX_BACK);
+
+      if (new_max != draw->max_num_back) {
+         /* On transition from swap interval == 0 to != 0, start with two
+          * buffers again. Otherwise keep the current number of buffers. Either
+          * way, more will be allocated if needed.
+          */
+         if (new_max < draw->max_num_back)
+            draw->cur_num_back = 2;
+
+         draw->max_num_back = new_max;
+      }
+
+      break;
+   }
+
+   case XCB_PRESENT_COMPLETE_MODE_SKIP:
+      break;
+
+   default:
+      /* On transition from flips to copies, start with a single buffer again,
+       * a second one will be allocated if needed
+       */
+      if (draw->max_num_back != 2)
+         draw->cur_num_back = 1;
+
+      draw->max_num_back = 2;
+   }
 }

 void
@@ -395,7 +428,7 @@ loader_dri3_drawable_init(xcb_connection_t *conn,
   }
   draw->swap_interval = swap_interval;

-   dri3_update_num_back(draw);
+   dri3_update_max_num_back(draw);

   /* Create a new drawable */
   draw->dri_drawable =
@@ -643,6 +676,7 @@ dri3_find_back(struct loader_dri3_drawable *draw)
 {
   int b;
   int num_to_consider;
+   int max_num;

   mtx_lock(&draw->mtx);
   /* Increase the likelyhood of reusing current buffer */
@@ -651,15 +685,18 @@ dri3_find_back(struct loader_dri3_drawable *draw)
   /* Check whether we need to reuse the current back buffer as new back.
    * In that case, wait until it's not busy anymore.
    */
-   num_to_consider = draw->num_back;
   if (!loader_dri3_have_image_blit(draw) && draw->cur_blit_source != -1) {
      num_to_consider = 1;
+      max_num = 1;
      draw->cur_blit_source = -1;
+   } else {
+      num_to_consider = draw->cur_num_back;
+      max_num = draw->max_num_back;
   }

   for (;;) {
      for (b = 0; b < num_to_consider; b++) {
-         int id = LOADER_DRI3_BACK_ID((b + draw->cur_back) % draw->num_back);
+         int id = LOADER_DRI3_BACK_ID((b + draw->cur_back) % draw->cur_num_back);
         struct loader_dri3_buffer *buffer = draw->buffers[id];

         if (!buffer || !buffer->busy) {
@@ -668,7 +705,10 @@ dri3_find_back(struct loader_dri3_drawable *draw)
            return id;
         }
      }
-      if (!dri3_wait_for_event_locked(draw, NULL)) {
+
+      if (num_to_consider < max_num) {
+         num_to_consider = ++draw->cur_num_back;
+      } else if (!dri3_wait_for_event_locked(draw, NULL)) {
         mtx_unlock(&draw->mtx);
         return -1;
      }
@@ -2006,10 +2046,10 @@ loader_dri3_get_buffers(__DRIdrawable *driDrawable,
   if (!dri3_update_drawable(draw))
      return false;

-   dri3_update_num_back(draw);
+   dri3_update_max_num_back(draw);

   /* Free no longer needed back buffers */
-   for (buf_id = draw->num_back; buf_id < LOADER_DRI3_MAX_BACK; buf_id++) {
+   for (buf_id = draw->cur_num_back; buf_id < LOADER_DRI3_MAX_BACK; buf_id++) {
      if (draw->cur_blit_source != buf_id && draw->buffers[buf_id]) {
         dri3_free_render_buffer(draw, draw->buffers[buf_id]);
         draw->buffers[buf_id] = NULL;
--- a/src/loader/loader_dri3_helper.h
+++ b/src/loader/loader_dri3_helper.h
@@ -146,7 +146,8 @@ struct loader_dri3_drawable {

   struct loader_dri3_buffer *buffers[LOADER_DRI3_NUM_BUFFERS];
   int cur_back;
-   int num_back;
+   int cur_num_back;
+   int max_num_back;
   int cur_blit_source;

   uint32_t *stamp;
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -234,19 +234,19 @@ static const struct format_mapping format_map[] = {
        DEFAULT_RGB_FORMATS }
   },
   {
-      { GL_RGB4 },
+      { GL_RGB4, 0 },
      { PIPE_FORMAT_B4G4R4X4_UNORM, PIPE_FORMAT_B4G4R4A4_UNORM,
        PIPE_FORMAT_A4B4G4R4_UNORM,
        DEFAULT_RGB_FORMATS }
   },
   {
-      { GL_RGB5 },
+      { GL_RGB5, 0 },
      { PIPE_FORMAT_B5G5R5X1_UNORM, PIPE_FORMAT_X1B5G5R5_UNORM,
        PIPE_FORMAT_B5G5R5A1_UNORM, PIPE_FORMAT_A1B5G5R5_UNORM,
        DEFAULT_RGB_FORMATS }
   },
   {
-      { GL_RGB565 },
+      { GL_RGB565, 0 },
      { PIPE_FORMAT_B5G6R5_UNORM, DEFAULT_RGB_FORMATS }
   },

--- a/src/util/00-mesa-defaults.conf
+++ b/src/util/00-mesa-defaults.conf
@@ -689,7 +689,7 @@ TODO: document the other workarounds.
            <option name="gles_emulate_bgra" value="true" />
            <option name="gles_apply_bgra_dest_swizzle" value="true"/>
        </application>
-        <application name="Portal 2" executable="hl2_linux">
+        <application name="Portal 2" executable="portal2_linux">
            <option name="gles_emulate_bgra" value="true" />
            <option name="gles_apply_bgra_dest_swizzle" value="true"/>
        </application>
--- a/src/vulkan/wsi/wsi_common_display.c
+++ b/src/vulkan/wsi/wsi_common_display.c
@@ -1209,8 +1209,8 @@ wsi_display_wait_thread(void *data)
      if (ret > 0) {
         pthread_mutex_lock(&wsi->wait_mutex);
         (void) drmHandleEvent(wsi->fd, &event_context);
-         pthread_mutex_unlock(&wsi->wait_mutex);
         pthread_cond_broadcast(&wsi->wait_cond);
+         pthread_mutex_unlock(&wsi->wait_mutex);
      }
   }
   return NULL;