docs: Update 11.2.0 release notes

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Update version to 11.2.0(final)
2016-04-04 12:41:40 +01:00 · 2016-04-04 12:41:40 +01:00 · 2016-04-04 12:41:40 +01:00 · 2016-04-04 12:41:40 +01:00 · 2016-04-04 12:41:39 +01:00 · 2016-04-04 12:41:39 +01:00
71 changed files with 732 additions and 222 deletions
--- a/2
+++ b/2
@@ -1 +1 @@
-11.2.0-rc2
+11.2.0
--- a/docs/relnotes/11.2.0.html
+++ b/docs/relnotes/11.2.0.html
@@ -14,7 +14,7 @@
 <iframe src="../contents.html"></iframe>
 <div class="content">

-<h1>Mesa 11.2.0 Release Notes / TBD</h1>
+<h1>Mesa 11.2.0 Release Notes / 4 April 2016</h1>

 <p>
 Mesa 11.2.0 is a new development release.
@@ -70,7 +70,217 @@ Note: some of the new features are only available with certain drivers.

 <h2>Bug fixes</h2>

-TBD.
+<ul>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=27512">Bug 27512</a> - Illegal instruction _mesa_x86_64_transform_points4_general</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=75165">Bug 75165</a> - compute.c:464:49: error: function definition is not allowed here</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=79783">Bug 79783</a> - Distorted output in obs-studio where other vendors &quot;work&quot;</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89330">Bug 89330</a> - piglit glsl-1.50 invariant-qualifier-in-out-block-01 regression</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89969">Bug 89969</a> - nouveau: add support for chunk decoding in order to support vaapi (st/va)</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90348">Bug 90348</a> - Spilling failure of b96 merged value</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91806">Bug 91806</a> - configure does not test whether assembler supports sse4.1</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91927">Bug 91927</a> - [SKL] [regression] piglit compressed textures tests fail  with kernel upgrade</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92193">Bug 92193</a> - [SKL] ES2-CTS.gtf.GL2ExtensionTests.compressed_astc_texture.compressed_astc_texture fails</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92229">Bug 92229</a> - [APITRACE] SOMA have serious graphical errors</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92233">Bug 92233</a> - Unigine Heaven 4.0 silhuette run</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92438">Bug 92438</a> - Segfault in pushbuf_kref when running the android emulator (qemu) on nv50</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92589">Bug 92589</a> - [BDW BSW SKL CTS] ES31-CTS.texture_gather.* GPU_HANG</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92595">Bug 92595</a> - [HSW,BDW,SKL][GLES 3.1 CTS] Big difference in the results for the ES31-CTS.shader_bitfield_operation.* tests</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92609">Bug 92609</a> - [BDW, BSW] piglit sampling-2d-array-as-2d-layer fails</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92687">Bug 92687</a> - Add support for ARB_internalformat_query2</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92706">Bug 92706</a> - glBlitFramebuffer refuses to blit RGBA to RGB with MSAA</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92709">Bug 92709</a> - &quot;LLVM triggered Diagnostic Handler: unsupported call to function ldexpf in main&quot; when starting race in stuntrally</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92743">Bug 92743</a> - Centroid shouldn't have to match between the FS and the VS</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92759">Bug 92759</a> - [Regression, bisected] Visuals without alpha bits are not sRGB-capable</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92849">Bug 92849</a> - [IVB HSW BDW] piglit image load/store load-from-cleared-image.shader_test fails</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92909">Bug 92909</a> - Offset/alignment issue with layout std140 and vec3</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93004">Bug 93004</a> - Guild Wars 2 crash on nouveau DX11 cards</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93048">Bug 93048</a> - [CTS regression] mesa af2723 breaks GL Conformance for debug extension</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93063">Bug 93063</a> - drm_helper.h:227:1: error: static declaration of ‘pipe_virgl_create_screen’ follows non-static declaration</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93091">Bug 93091</a> - [opencl] segfault when running any opencl programs (like clinfo)</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93092">Bug 93092</a> - lp_test_format regression</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93126">Bug 93126</a> - wrongly claim supporting GL_EXT_texture_rg</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93180">Bug 93180</a> - [regression] arb_separate_shader_objects.active sampler conflict fails</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93189">Bug 93189</a> - &quot;./util/u_inlines.h&quot;, line 83: operands have incompatible types: void &quot;:&quot; int</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93215">Bug 93215</a> - [Regression bisected] Ogles1conform Automatic mipmap generation test is fail</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93235">Bug 93235</a> - [regression] dispatch sanity broken by GetPointerv</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93257">Bug 93257</a> - [SKL, bisected] ASTC dEQP tests segfault</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93264">Bug 93264</a> - Tonga VM Faults since llvm ScheduleDAGInstrs: Rework schedule graph builder.</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93266">Bug 93266</a> - gl_arb_shading_language_420pack does not allow binding of image variables</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93300">Bug 93300</a> - Two Worlds 2 renders water incorrectly</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93312">Bug 93312</a> - [SKL][GLES 3.1 CTS] ES31-CTS.layout_binding* GPU_HANG</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93320">Bug 93320</a> - [HSW,BDW,SKL][GLES 3.1 CTS] ES31-CTS.vertex_attrib_binding.advanced-bindingUpdate fail</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93322">Bug 93322</a> - [HSW,BDW,SKL][GLES 3.1 CTS] ES31-CTS.compute_shader.resource-ubo fail</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93323">Bug 93323</a> - [HSW,BDW,SKL][GLES 3.1 CTS]ES31-CTS.shader_image_load_store.basic-allTargets-store-fs fail</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93325">Bug 93325</a> - [HSW,BDW,SKL]ES31-CTS.explicit_uniform_location.uniform-loc-* 2 tests fail</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93339">Bug 93339</a> - glLinkProgram() should fail when a varying is never written to in a previous stage</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93348">Bug 93348</a> - [HSW,BDW,SKL][GLES 3.1 CTS] ES31-CTS.compute_shader.* segfault</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93358">Bug 93358</a> - [HSW] Unreal Elemental demo - assertion error in copy_image_with_blitter</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93387">Bug 93387</a> - inverse() shouldn’t be exposed in GLSL 1.20 and 1.30</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93388">Bug 93388</a> - [i965, regression, bisection] MESA_FORMAT_B8G8R8X8_SRGB changes break kwin</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93407">Bug 93407</a> - [SKL][GLES 3.1 CTS]ES31-CTS.compute_shader.resources-texture fail</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93410">Bug 93410</a> - [BDW,SKL][GLES 3.1 CTS]ES31-CTS.shader_image_load_store.negative-linkErrors fail</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93418">Bug 93418</a> - Geometry Shaders output wrong vertices on Sandy Bridge</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93426">Bug 93426</a> - [SKL,BDW,BSW,BXT] CTS regression: es2-cts.gtf.gl2fixedtests.buffer_objects.buffer_object,s</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93524">Bug 93524</a> - Clover doesn't build</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93526">Bug 93526</a> - GfxBench 4 tessellation demos misrender</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93532">Bug 93532</a> - [HSW,BDW,SKL][GLES 3.1 CTS] ES31-CTS.compute_shader.*. Regression, bisected.</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93540">Bug 93540</a> - [BISECTED, HSW] Rendering issue in Heaven (and other benchmarks)</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93560">Bug 93560</a> - opt_combine_constants failing fabsf(reg-&gt;f) == table.imm[i].val assertion</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93599">Bug 93599</a> - Strange green flashes with &quot;Metro: Last Light Redux&quot; + &quot;Metro 2033 Redux&quot; with Intel Mesa driver</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93648">Bug 93648</a> - Random lines being rendered when playing Dolphin (geometry shaders related, w/ apitrace)</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93650">Bug 93650</a> - GL_ARB_separate_shader_objects is buggy (PCSX2)</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93667">Bug 93667</a> - Crash in eglCreateImageKHR with huge texture size</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93696">Bug 93696</a> - [HSW,BDW;SKL][GLES 3.1 CTS]ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max-* fail</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93700">Bug 93700</a> - [SKL, regression] deqp-gles2.functional.texture.completeness</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93717">Bug 93717</a> - Meta mipmap generation can corrupt texture state</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93722">Bug 93722</a> - Segfault when compiling shader with a subroutine that takes a parameter</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93725">Bug 93725</a> - [HSW, regression, bisected] ES31-CTS.texture_gather.*depth*</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93731">Bug 93731</a> - glUniformSubroutinesuiv segfaults when subroutine uniform is bound to a specific location</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93761">Bug 93761</a> - A conditional discard in a fragment shader causes no depth writing at all</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93790">Bug 93790</a> - [HSW] Use after free with compute programs</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93792">Bug 93792</a> - [HSW] intel_mipmap_tree.c:1325: intel_miptree_copy_slice: Assertion `src_mt-&gt;format == dst_mt-&gt;format</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93813">Bug 93813</a> - Incorrect viewport range when GL_CLIP_ORIGIN is GL_UPPER_LEFT</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93840">Bug 93840</a> - [i965] Alien: Isolation fails with GL_ARB_compute_shader enabled</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93862">Bug 93862</a> - [Bisected] &quot;drm/amdgpu: fix amdgpu_bo_pin_restricted VRAM placing v2&quot; is bad</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93878">Bug 93878</a> - [llvmpipe][softpipe] piglit arb_gpu_shader_fp64-double-gettransformfeedbackvarying regression</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93957">Bug 93957</a> - [HSW] Mishandling of sample count when using an attachment-less framebuffer (assertion error)</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93961">Bug 93961</a> - virgl build failure after 2016-02-01 changes - no previous prototype for 'virgl_drm_winsys_create'</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93962">Bug 93962</a> - [HSW, regression, bisected, CTS] ES2-CTS.gtf.GL2FixedTests.scissor.scissor - segfault/asserts</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93989">Bug 93989</a> - build: flex-2.5.39 seems to be failing for glsl_lexer.ll</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94016">Bug 94016</a> - make check MesaExtensionsTest.AlphabeticallySorted regression</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94019">Bug 94019</a> - [bisected] 3D acceleration broken with gallium/radeon: just get num_tile_pipes from the winsys</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94050">Bug 94050</a> - test_vec4_register_coalesce regression</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94073">Bug 94073</a> - Miscompilation of abs_vec3_vert_xvary_ref.vert in WebGL conformance</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94081">Bug 94081</a> - [HSW] compute shader shared var + atomic op = fail</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94088">Bug 94088</a> - [llvmpipe] SIGFPE pthread_barrier_destroy.c:40</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94091">Bug 94091</a> - Tonga unreal elemental segfault since radeonsi: put image, fmask, and sampler descriptors into one array</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94100">Bug 94100</a> - [HSW] compute indirect dispatch with 0 work groups causes gpu hang</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94134">Bug 94134</a> - [regression] piglit.spec.arb_texture_view.sampling-2d-array-as-2d-layer assertion</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94139">Bug 94139</a> - [regression, HSW, IVB] piglit.spec.arb_compute_shader.minmax</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94150">Bug 94150</a> - UE4 Suntemple rendering errors</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94186">Bug 94186</a> - Crash when launching glxinfo and World of Warcraft with RV790</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94188">Bug 94188</a> - define (or undef) defined behaves stupidly</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94193">Bug 94193</a> - [llvmpipe] Line antialiasing looks different when GL_LINE_STIPPLE is enabled with pattern 0xffff</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94199">Bug 94199</a> - Shader abort/crash</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94253">Bug 94253</a> - [llvmpipe] piglit gl-1.0-swapbuffers-behavior regression</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94254">Bug 94254</a> - [llvmpipe] [softpipe] piglit read-front regression</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94257">Bug 94257</a> - [softpipe] piglit glx-copy-sub-buffer regression</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94274">Bug 94274</a> - [swrast] piglit arb_occlusion_query2-render regression</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94284">Bug 94284</a> - [radeonsi] outlast segfault on start</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94388">Bug 94388</a> - r600_blit.c:281: r600_decompress_depth_textures: Assertion `tex-&gt;is_depth &amp;&amp; !tex-&gt;is_flushing_texture' failed.</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94412">Bug 94412</a> - Trine 3 misrender</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94481">Bug 94481</a> - softpipe - access violation in img_filter_2d_nearest</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94524">Bug 94524</a> - Wrong gl_TessLevelOuter interpretation for isolines</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94595">Bug 94595</a> - [Mesa AMD&amp;swrast] Texture views attached as framebuffers return their viewed tecture's color encoding and render incorrectly</li>
+
+</ul>

 <h2>Changes</h2>

--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -2625,6 +2625,13 @@ assign_attribute_or_color_locations(gl_shader_program *prog,
 	 continue;
      }

+      if (num_attr >= ARRAY_SIZE(to_assign)) {
+         linker_error(prog, "too many %s (max %u)",
+                      target_index == MESA_SHADER_VERTEX ?
+                      "vertex shader inputs" : "fragment shader outputs",
+                      (unsigned)ARRAY_SIZE(to_assign));
+         return false;
+      }
      to_assign[num_attr].slots = slots;
      to_assign[num_attr].var = var;
      num_attr++;
@@ -3180,7 +3187,6 @@ check_explicit_uniform_locations(struct gl_context *ctx,
      }
   }

-   exec_list_make_empty(&prog->EmptyUniformLocations);
   struct empty_uniform_block *current_block = NULL;

   for (unsigned i = 0; i < prog->NumUniformRemapTable; i++) {
--- a/src/egl/drivers/dri2/platform_android.c
+++ b/src/egl/drivers/dri2/platform_android.c
@@ -537,6 +537,8 @@ droid_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay *dpy)
   EGLint config_attrs[] = {
     EGL_NATIVE_VISUAL_ID,   0,
     EGL_NATIVE_VISUAL_TYPE, 0,
+     EGL_FRAMEBUFFER_TARGET_ANDROID, EGL_TRUE,
+     EGL_RECORDABLE_ANDROID, EGL_TRUE,
     EGL_NONE
   };
   int count, i, j;
@@ -714,7 +716,9 @@ dri2_initialize_android(_EGLDriver *drv, _EGLDisplay *dpy)
      goto cleanup_screen;
   }

+   dpy->Extensions.ANDROID_framebuffer_target = EGL_TRUE;
   dpy->Extensions.ANDROID_image_native_buffer = EGL_TRUE;
+   dpy->Extensions.ANDROID_recordable = EGL_TRUE;
   dpy->Extensions.KHR_image_base = EGL_TRUE;

   /* Fill vtbl last to prevent accidentally calling virtual function during
--- a/src/egl/drivers/dri2/platform_x11.c
+++ b/src/egl/drivers/dri2/platform_x11.c
@@ -1006,6 +1006,9 @@ dri2_create_image_khr_pixmap(_EGLDisplay *disp, _EGLContext *ctx,
   geometry_cookie = xcb_get_geometry (dri2_dpy->conn, drawable);
   buffers_reply = xcb_dri2_get_buffers_reply (dri2_dpy->conn,
 					       buffers_cookie, NULL);
+   if (buffers_reply == NULL)
+     return NULL;
+
   buffers = xcb_dri2_get_buffers_buffers (buffers_reply);
   if (buffers == NULL) {
      return NULL;
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -381,7 +381,9 @@ _eglCreateExtensionsString(_EGLDisplay *dpy)
   char *exts = dpy->ExtensionsString;

   /* Please keep these sorted alphabetically. */
+   _EGL_CHECK_EXTENSION(ANDROID_framebuffer_target);
   _EGL_CHECK_EXTENSION(ANDROID_image_native_buffer);
+   _EGL_CHECK_EXTENSION(ANDROID_recordable);

   _EGL_CHECK_EXTENSION(CHROMIUM_sync_control);

--- a/src/egl/main/eglconfig.c
+++ b/src/egl/main/eglconfig.c
@@ -44,7 +44,6 @@
 #include "egllog.h"


-#define MIN2(A, B)  (((A) < (B)) ? (A) : (B))


 /**
@@ -246,7 +245,13 @@ static const struct {
   /* extensions */
   { EGL_Y_INVERTED_NOK,            ATTRIB_TYPE_BOOLEAN,
                                    ATTRIB_CRITERION_EXACT,
-                                    EGL_DONT_CARE }
+                                    EGL_DONT_CARE },
+   { EGL_FRAMEBUFFER_TARGET_ANDROID, ATTRIB_TYPE_BOOLEAN,
+                                    ATTRIB_CRITERION_EXACT,
+                                    EGL_DONT_CARE },
+   { EGL_RECORDABLE_ANDROID,        ATTRIB_TYPE_BOOLEAN,
+                                    ATTRIB_CRITERION_EXACT,
+                                    EGL_DONT_CARE },
 };


@@ -489,6 +494,10 @@ _eglIsConfigAttribValid(_EGLConfig *conf, EGLint attr)
   switch (attr) {
   case EGL_Y_INVERTED_NOK:
      return conf->Display->Extensions.NOK_texture_from_pixmap;
+   case EGL_FRAMEBUFFER_TARGET_ANDROID:
+      return conf->Display->Extensions.ANDROID_framebuffer_target;
+   case EGL_RECORDABLE_ANDROID:
+      return conf->Display->Extensions.ANDROID_recordable;
   default:
      break;
   }
--- a/src/egl/main/eglconfig.h
+++ b/src/egl/main/eglconfig.h
@@ -86,6 +86,8 @@ struct _egl_config

   /* extensions */
   EGLint YInvertedNOK;
+   EGLint FramebufferTargetAndroid;
+   EGLint RecordableAndroid;
 };


@@ -133,6 +135,8 @@ _eglOffsetOfConfig(EGLint attr)
   ATTRIB_MAP(EGL_CONFORMANT,                Conformant);
   /* extensions */
   ATTRIB_MAP(EGL_Y_INVERTED_NOK,            YInvertedNOK);
+   ATTRIB_MAP(EGL_FRAMEBUFFER_TARGET_ANDROID, FramebufferTargetAndroid);
+   ATTRIB_MAP(EGL_RECORDABLE_ANDROID,        RecordableAndroid);
 #undef ATTRIB_MAP
   default:
      return -1;
--- a/src/egl/main/egldefines.h
+++ b/src/egl/main/egldefines.h
@@ -40,9 +40,16 @@ extern "C" {

 #define _EGL_MAX_EXTENSIONS_LEN 1000

+/* Hardcoded, conservative default for EGL_LARGEST_PBUFFER,
+ * this is used to implement EGL_LARGEST_PBUFFER.
+ */
+#define _EGL_MAX_PBUFFER_WIDTH 4096
+#define _EGL_MAX_PBUFFER_HEIGHT 4096
+
 #define _EGL_VENDOR_STRING "Mesa Project"

 #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
+#define MIN2(A, B)  (((A) < (B)) ? (A) : (B))

 #ifdef __cplusplus
 }
--- a/src/egl/main/egldisplay.h
+++ b/src/egl/main/egldisplay.h
@@ -90,7 +90,9 @@ struct _egl_resource
 struct _egl_extensions
 {
   /* Please keep these sorted alphabetically. */
+   EGLBoolean ANDROID_framebuffer_target;
   EGLBoolean ANDROID_image_native_buffer;
+   EGLBoolean ANDROID_recordable;

   EGLBoolean CHROMIUM_sync_control;

--- a/src/egl/main/eglsurface.c
+++ b/src/egl/main/eglsurface.c
@@ -307,6 +307,12 @@ _eglInitSurface(_EGLSurface *surf, _EGLDisplay *dpy, EGLint type,
   if (err != EGL_SUCCESS)
      return _eglError(err, func);

+   /* if EGL_LARGEST_PBUFFER in use, clamp width and height */
+   if (surf->LargestPbuffer) {
+      surf->Width = MIN2(surf->Width, _EGL_MAX_PBUFFER_WIDTH);
+      surf->Height = MIN2(surf->Height, _EGL_MAX_PBUFFER_HEIGHT);
+   }
+
   return EGL_TRUE;
 }

--- a/src/gallium/auxiliary/draw/draw_pipe_stipple.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_stipple.c
@@ -108,11 +108,11 @@ emit_segment(struct draw_stage *stage, struct prim_header *header,
 }


-static inline unsigned
+static inline bool
 stipple_test(int counter, ushort pattern, int factor)
 {
   int b = (counter / factor) & 0xf;
-   return (1 << b) & pattern;
+   return !!((1 << b) & pattern);
 }


@@ -126,7 +126,7 @@ stipple_line(struct draw_stage *stage, struct prim_header *header)
   const float *pos0 = v0->data[pos];
   const float *pos1 = v1->data[pos];
   float start = 0;
-   int state = 0;
+   bool state = 0;

   float x0 = pos0[0];
   float x1 = pos1[0];
@@ -143,29 +143,29 @@ stipple_line(struct draw_stage *stage, struct prim_header *header)
      stipple->counter = 0;


-   /* XXX ToDo: intead of iterating pixel-by-pixel, use a look-up table.
+   /* XXX ToDo: instead of iterating pixel-by-pixel, use a look-up table.
    */
   for (i = 0; i < length; i++) {
-      int result = stipple_test( (int) stipple->counter+i,
-                                 (ushort) stipple->pattern, stipple->factor );
+      bool result = stipple_test((int)stipple->counter + i,
+                                 (ushort)stipple->pattern, stipple->factor);
      if (result != state) {
         /* changing from "off" to "on" or vice versa */
-	 if (state) {
-	    if (start != i) {
+         if (state) {
+            if (start != i) {
               /* finishing an "on" segment */
-	       emit_segment( stage, header, start / length, i / length );
+               emit_segment(stage, header, start / length, i / length);
            }
-	 }
-	 else {
+         }
+         else {
            /* starting an "on" segment */
-	    start = (float) i;
-	 }
-	 state = result;	   
+            start = (float)i;
+         }
+         state = result;
      }
   }

   if (state && start < length)
-      emit_segment( stage, header, start / length, 1.0 );
+      emit_segment(stage, header, start / length, 1.0);

   stipple->counter += length;
 }
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -1388,7 +1388,9 @@ static boolean parse_declaration( struct translate_ctx *ctx )
         if (str_match_nocase_whole(&cur, "ATOMIC")) {
            decl.Declaration.Atomic = 1;
            ctx->cur = cur;
-         } else if (str_match_nocase_whole(&cur, "SHARED")) {
+         }
+      } else if (file == TGSI_FILE_MEMORY) {
+         if (str_match_nocase_whole(&cur, "SHARED")) {
            decl.Declaration.Shared = 1;
            ctx->cur = cur;
         }
--- a/src/gallium/drivers/llvmpipe/lp_rast.h
+++ b/src/gallium/drivers/llvmpipe/lp_rast.h
@@ -116,6 +116,12 @@ struct lp_rast_plane {

   /* one-pixel sized trivial reject offsets for each plane */
   uint32_t eo;
+   /*
+    * We rely on this struct being 64bit aligned (ideally it would be 128bit
+    * but that's quite the waste) and therefore on 32bit we need padding
+    * since otherwise (even with the 64bit number in there) it wouldn't be.
+    */
+   uint32_t pad;
 };

 /**
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -94,6 +94,8 @@ lp_setup_alloc_triangle(struct lp_scene *scene,
   unsigned plane_sz = nr_planes * sizeof(struct lp_rast_plane);
   struct lp_rast_triangle *tri;

+   STATIC_ASSERT(sizeof(struct lp_rast_plane) % 8 == 0);
+
   *tri_size = (sizeof(struct lp_rast_triangle) +
                3 * input_array_sz +
                plane_sz);
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -1634,7 +1634,9 @@ CodeEmitterNV50::emitTEX(const TexInstruction *i)
   code[1] |= (i->tex.mask & 0xc) << 12;

   if (i->tex.liveOnly)
-      code[1] |= 4;
+      code[1] |= 1 << 2;
+   if (i->tex.derivAll)
+      code[1] |= 1 << 3;

   defId(i->def(0), 2);

--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
@@ -372,7 +372,8 @@ NV50LegalizeSSA::propagateWriteToOutput(Instruction *st)
      return;

   for (int s = 0; di->srcExists(s); ++s)
-      if (di->src(s).getFile() == FILE_IMMEDIATE)
+      if (di->src(s).getFile() == FILE_IMMEDIATE ||
+          di->src(s).getFile() == FILE_MEMORY_LOCAL)
         return;

   if (prog->getType() == Program::TYPE_GEOMETRY) {
@@ -934,6 +935,7 @@ NV50LoweringPreSSA::handleTXD(TexInstruction *i)

   handleTEX(i);
   i->op = OP_TEX; // no need to clone dPdx/dPdy later
+   i->tex.derivAll = true;

   for (c = 0; c < dim; ++c)
      crd[c] = bld.getScratch();
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -728,9 +728,13 @@ NVC0LoweringPass::handleTEX(TexInstruction *i)
      }

      Value *arrayIndex = i->tex.target.isArray() ? i->getSrc(lyr) : NULL;
-      for (int s = dim; s >= 1; --s)
-         i->setSrc(s, i->getSrc(s - 1));
-      i->setSrc(0, arrayIndex);
+      if (arrayIndex) {
+         for (int s = dim; s >= 1; --s)
+            i->setSrc(s, i->getSrc(s - 1));
+         i->setSrc(0, arrayIndex);
+      } else {
+         i->moveSources(0, 1);
+      }

      if (arrayIndex) {
         int sat = (i->op == OP_TXF) ? 1 : 0;
@@ -852,7 +856,17 @@ NVC0LoweringPass::handleManualTXD(TexInstruction *i)
   Value *zero = bld.loadImm(bld.getSSA(), 0);
   int l, c;
   const int dim = i->tex.target.getDim() + i->tex.target.isCube();
-   const int array = i->tex.target.isArray();
+
+   // This function is invoked after handleTEX lowering, so we have to expect
+   // the arguments in the order that the hw wants them. For Fermi, array and
+   // indirect are both in the leading arg, while for Kepler, array and
+   // indirect are separate (and both precede the coordinates). Maxwell is
+   // handled in a separate function.
+   unsigned array;
+   if (targ->getChipset() < NVISA_GK104_CHIPSET)
+      array = i->tex.target.isArray() || i->tex.rIndirectSrc >= 0;
+   else
+      array = i->tex.target.isArray() + (i->tex.rIndirectSrc >= 0);

   i->op = OP_TEX; // no need to clone dPdx/dPdy later

--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -633,8 +633,6 @@ nv50_stream_output_validate(struct nv50_context *nv50)
   BEGIN_NV04(push, NV50_3D(STRMOUT_BUFFERS_CTRL), 1);
   PUSH_DATA (push, ctrl);

-   nouveau_bufctx_reset(nv50->bufctx_3d, NV50_BIND_SO);
-
   for (i = 0; i < nv50->num_so_targets; ++i) {
      struct nv50_so_target *targ = nv50_so_target(nv50->so_target[i]);
      struct nv04_resource *buf = nv04_resource(targ->pipe.buffer);
--- a/src/gallium/drivers/nouveau/nv50/nv50_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c
@@ -1180,8 +1180,10 @@ nv50_set_stream_output_targets(struct pipe_context *pipe,
   }
   nv50->num_so_targets = num_targets;

-   if (nv50->so_targets_dirty)
+   if (nv50->so_targets_dirty) {
+      nouveau_bufctx_reset(nv50->bufctx_3d, NV50_BIND_SO);
      nv50->dirty |= NV50_NEW_STRMOUT;
+   }
 }

 static void
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c
@@ -294,7 +294,6 @@ nvc0_tfb_validate(struct nvc0_context *nvc0)

   if (!(nvc0->dirty & NVC0_NEW_TFB_TARGETS))
      return;
-   nouveau_bufctx_reset(nvc0->bufctx_3d, NVC0_BIND_TFB);

   for (b = 0; b < nvc0->num_tfbbufs; ++b) {
      struct nvc0_so_target *targ = nvc0_so_target(nvc0->tfbbuf[b]);
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
@@ -413,7 +413,7 @@ nvc0_sampler_state_delete(struct pipe_context *pipe, void *hwcso)
 {
   unsigned s, i;

-   for (s = 0; s < 5; ++s)
+   for (s = 0; s < 6; ++s)
      for (i = 0; i < nvc0_context(pipe)->num_samplers[s]; ++i)
         if (nvc0_context(pipe)->samplers[s][i] == hwcso)
            nvc0_context(pipe)->samplers[s][i] = NULL;
@@ -1184,8 +1184,10 @@ nvc0_set_transform_feedback_targets(struct pipe_context *pipe,
   }
   nvc0->num_tfbbufs = num_targets;

-   if (nvc0->tfbbuf_dirty)
+   if (nvc0->tfbbuf_dirty) {
+      nouveau_bufctx_reset(nvc0->bufctx_3d, NVC0_BIND_TFB);
      nvc0->dirty |= NVC0_NEW_TFB_TARGETS;
+   }
 }

 static void
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
@@ -1203,8 +1203,8 @@ nvc0_blit_3d(struct nvc0_context *nvc0, const struct pipe_blit_info *info)
   x0 = (float)info->src.box.x - x_range * (float)info->dst.box.x;
   y0 = (float)info->src.box.y - y_range * (float)info->dst.box.y;

-   x1 = x0 + 16384.0f * x_range;
-   y1 = y0 + 16384.0f * y_range;
+   x1 = x0 + 32768.0f * x_range;
+   y1 = y0 + 32768.0f * y_range;

   x0 *= (float)(1 << nv50_miptree(src)->ms_x);
   x1 *= (float)(1 << nv50_miptree(src)->ms_x);
@@ -1302,6 +1302,17 @@ nvc0_blit_3d(struct nvc0_context *nvc0, const struct pipe_blit_info *info)
   }
   nvc0->state.num_vtxelts = 2;

+   if (nvc0->state.prim_restart) {
+      IMMED_NVC0(push, NVC0_3D(PRIM_RESTART_ENABLE), 0);
+      nvc0->state.prim_restart = 0;
+   }
+
+   if (nvc0->state.index_bias) {
+      IMMED_NVC0(push, NVC0_3D(VB_ELEMENT_BASE), 0);
+      IMMED_NVC0(push, NVC0_3D(VERTEX_ID_BASE), 0);
+      nvc0->state.index_bias = 0;
+   }
+
   for (i = 0; i < info->dst.box.depth; ++i, z += dz) {
      if (info->dst.box.z + i) {
         BEGIN_NVC0(push, NVC0_3D(LAYER), 1);
@@ -1314,14 +1325,14 @@ nvc0_blit_3d(struct nvc0_context *nvc0, const struct pipe_blit_info *info)
      *(vbuf++) = fui(y0);
      *(vbuf++) = fui(z);

-      *(vbuf++) = fui(16384 << nv50_miptree(dst)->ms_x);
+      *(vbuf++) = fui(32768 << nv50_miptree(dst)->ms_x);
      *(vbuf++) = fui(0.0f);
      *(vbuf++) = fui(x1);
      *(vbuf++) = fui(y0);
      *(vbuf++) = fui(z);

      *(vbuf++) = fui(0.0f);
-      *(vbuf++) = fui(16384 << nv50_miptree(dst)->ms_y);
+      *(vbuf++) = fui(32768 << nv50_miptree(dst)->ms_y);
      *(vbuf++) = fui(x0);
      *(vbuf++) = fui(y1);
      *(vbuf++) = fui(z);
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -989,13 +989,6 @@ void evergreen_init_color_surface_rat(struct r600_context *rctx,
 		MAX2(64, rctx->screen->b.info.pipe_interleave_bytes / block_size);
 	unsigned pitch = align(pipe_buffer->width0, pitch_alignment);

-	/* XXX: This is copied from evergreen_init_color_surface().  I don't
-	 * know why this is necessary.
-	 */
-	if (pipe_buffer->usage == PIPE_USAGE_STAGING) {
-		endian = ENDIAN_NONE;
-	}
-
 	surf->cb_color_base = r600_resource(pipe_buffer)->gpu_address >> 8;

 	surf->cb_color_pitch = (pitch / 8) - 1;
@@ -1146,11 +1139,7 @@ void evergreen_init_color_surface(struct r600_context *rctx,
 	swap = r600_translate_colorswap(surf->base.format);
 	assert(swap != ~0);

-	if (rtex->resource.b.b.usage == PIPE_USAGE_STAGING) {
-		endian = ENDIAN_NONE;
-	} else {
-		endian = r600_colorformat_endian_swap(format);
-	}
+	endian = r600_colorformat_endian_swap(format);

 	/* blend clamp should be set for all NORM/SRGB types */
 	if (ntype == V_028C70_NUMBER_UNORM || ntype == V_028C70_NUMBER_SNORM ||
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -930,11 +930,7 @@ static void r600_init_color_surface(struct r600_context *rctx,
 	swap = r600_translate_colorswap(surf->base.format);
 	assert(swap != ~0);

-	if (rtex->resource.b.b.usage == PIPE_USAGE_STAGING) {
-		endian = ENDIAN_NONE;
-	} else {
-		endian = r600_colorformat_endian_swap(format);
-	}
+	endian = r600_colorformat_endian_swap(format);

 	/* set blend bypass according to docs if SINT/UINT or
 	   8/24 COLOR variants */
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -645,21 +645,21 @@ static void r600_set_sampler_views(struct pipe_context *pipe, unsigned shader,
 		if (rviews[i]) {
 			struct r600_texture *rtex =
 				(struct r600_texture*)rviews[i]->base.texture;
+			bool is_buffer = rviews[i]->base.texture->target == PIPE_BUFFER;

-			if (rviews[i]->base.texture->target != PIPE_BUFFER) {
-				if (rtex->is_depth && !rtex->is_flushing_texture) {
-					dst->views.compressed_depthtex_mask |= 1 << i;
-				} else {
-					dst->views.compressed_depthtex_mask &= ~(1 << i);
-				}
-
-				/* Track compressed colorbuffers. */
-				if (rtex->cmask.size) {
-					dst->views.compressed_colortex_mask |= 1 << i;
-				} else {
-					dst->views.compressed_colortex_mask &= ~(1 << i);
-				}
+			if (!is_buffer && rtex->is_depth && !rtex->is_flushing_texture) {
+				dst->views.compressed_depthtex_mask |= 1 << i;
+			} else {
+				dst->views.compressed_depthtex_mask &= ~(1 << i);
 			}
+
+			/* Track compressed colorbuffers. */
+			if (!is_buffer && rtex->cmask.size) {
+				dst->views.compressed_colortex_mask |= 1 << i;
+			} else {
+				dst->views.compressed_colortex_mask &= ~(1 << i);
+			}
+
 			/* Changing from array to non-arrays textures and vice versa requires
 			 * updating TEX_ARRAY_OVERRIDE in sampler states on R6xx-R7xx. */
 			if (rctx->b.chip_class <= R700 &&
--- a/src/gallium/drivers/r600/sb/sb_expr.cpp
+++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
@@ -598,9 +598,13 @@ bool expr_handler::fold_assoc(alu_node *n) {

 	unsigned op = n->bc.op;
 	bool allow_neg = false, cur_neg = false;
+	bool distribute_neg = false;

 	switch(op) {
 	case ALU_OP2_ADD:
+		distribute_neg = true;
+		allow_neg = true;
+		break;
 	case ALU_OP2_MUL:
 	case ALU_OP2_MUL_IEEE:
 		allow_neg = true;
@@ -632,7 +636,7 @@ bool expr_handler::fold_assoc(alu_node *n) {
 		if (v1->is_const()) {
 			literal arg = v1->get_const_value();
 			apply_alu_src_mod(a->bc, 1, arg);
-			if (cur_neg)
+			if (cur_neg && distribute_neg)
 				arg.f = -arg.f;

 			if (a == n)
@@ -660,7 +664,7 @@ bool expr_handler::fold_assoc(alu_node *n) {
 		if (v0->is_const()) {
 			literal arg = v0->get_const_value();
 			apply_alu_src_mod(a->bc, 0, arg);
-			if (cur_neg)
+			if (cur_neg && distribute_neg)
 				arg.f = -arg.f;

 			if (last_arg == 0) {
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -314,7 +314,8 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx,
 		}
 	}
 	else if ((usage & PIPE_TRANSFER_DISCARD_RANGE) &&
-		 !(usage & PIPE_TRANSFER_UNSYNCHRONIZED) &&
+		 !(usage & (PIPE_TRANSFER_UNSYNCHRONIZED |
+			    PIPE_TRANSFER_PERSISTENT)) &&
 		 !(rscreen->debug_flags & DBG_NO_DISCARD_RANGE) &&
 		 r600_can_dma_copy_buffer(rctx, box->x, 0, box->width)) {
 		assert(usage & PIPE_TRANSFER_WRITE);
@@ -341,7 +342,8 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx,
 	}
 	/* Using a staging buffer in GTT for larger reads is much faster. */
 	else if ((usage & PIPE_TRANSFER_READ) &&
-		 !(usage & PIPE_TRANSFER_WRITE) &&
+		 !(usage & (PIPE_TRANSFER_WRITE |
+			    PIPE_TRANSFER_PERSISTENT)) &&
 		 rbuffer->domains == RADEON_DOMAIN_VRAM &&
 		 r600_can_dma_copy_buffer(rctx, 0, box->x, box->width)) {
 		struct r600_resource *staging;
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -533,8 +533,14 @@ static unsigned r600_texture_get_htile_size(struct r600_common_screen *rscreen,
 	    rscreen->info.drm_major == 2 && rscreen->info.drm_minor < 38)
 		return 0;

-	/* Overalign HTILE on Stoney to fix piglit/depthstencil-render-miplevels 585. */
-	if (rscreen->family == CHIP_STONEY)
+	/* Overalign HTILE on P2 configs to work around GPU hangs in
+	 * piglit/depthstencil-render-miplevels 585.
+	 *
+	 * This has been confirmed to help Kabini & Stoney, where the hangs
+	 * are always reproducible. I think I have seen the test hang
+	 * on Carrizo too, though it was very rare there.
+	 */
+	if (rscreen->chip_class >= CIK && num_pipes < 4)
 		num_pipes = 4;

 	switch (num_pipes) {
--- a/src/gallium/drivers/radeon/radeon_video.c
+++ b/src/gallium/drivers/radeon/radeon_video.c
@@ -237,6 +237,7 @@ int rvid_get_video_param(struct pipe_screen *screen,
 	case PIPE_VIDEO_CAP_SUPPORTED:
 		switch (codec) {
 		case PIPE_VIDEO_FORMAT_MPEG12:
+			return profile != PIPE_VIDEO_PROFILE_MPEG1;
 		case PIPE_VIDEO_FORMAT_MPEG4:
 		case PIPE_VIDEO_FORMAT_MPEG4_AVC:
 			if (rscreen->family < CHIP_PALM)
@@ -257,7 +258,7 @@ int rvid_get_video_param(struct pipe_screen *screen,
 	case PIPE_VIDEO_CAP_MAX_WIDTH:
 		return (rscreen->family < CHIP_TONGA) ? 2048 : 4096;
 	case PIPE_VIDEO_CAP_MAX_HEIGHT:
-		return (rscreen->family < CHIP_TONGA) ? 1152 : 2304;
+		return (rscreen->family < CHIP_TONGA) ? 1152 : 4096;
 	case PIPE_VIDEO_CAP_PREFERED_FORMAT:
 		return PIPE_FORMAT_NV12;
 	case PIPE_VIDEO_CAP_PREFERS_INTERLACED:
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -303,6 +303,7 @@ static void si_bind_sampler_states(struct pipe_context *ctx, unsigned shader,
 		 */
 		if (samplers->views.views[i] &&
 		    samplers->views.views[i]->texture &&
+		    samplers->views.views[i]->texture->target != PIPE_BUFFER &&
 		    ((struct r600_texture*)samplers->views.views[i]->texture)->fmask.size)
 			continue;

--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -39,6 +39,7 @@
 #include "radeon/radeon_llvm_emit.h"
 #include "util/u_memory.h"
 #include "util/u_pstipple.h"
+#include "util/u_string.h"
 #include "tgsi/tgsi_parse.h"
 #include "tgsi/tgsi_util.h"
 #include "tgsi/tgsi_dump.h"
@@ -4426,7 +4427,7 @@ static void si_shader_dump_disassembly(const struct radeon_shader_binary *binary

 			line = binary->disasm_string;
 			while (*line) {
-				p = strchrnul(line, '\n');
+				p = util_strchrnul(line, '\n');
 				count = p - line;

 				if (count) {
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -2250,11 +2250,7 @@ static void si_initialize_color_surface(struct si_context *sctx,
 	}
 	assert(format != V_028C70_COLOR_INVALID);
 	swap = r600_translate_colorswap(surf->base.format);
-	if (rtex->resource.b.b.usage == PIPE_USAGE_STAGING) {
-		endian = V_028C70_ENDIAN_NONE;
-	} else {
-		endian = si_colorformat_endian_swap(format);
-	}
+	endian = si_colorformat_endian_swap(format);

 	/* blend clamp should be set for all NORM/SRGB types */
 	if (ntype == V_028C70_NUMBER_UNORM ||
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -2209,6 +2209,7 @@ img_filter_2d_ewa(const struct sp_sampler_view *sp_sview,
                  const float t[TGSI_QUAD_SIZE],
                  const float p[TGSI_QUAD_SIZE],
                  const uint faces[TGSI_QUAD_SIZE],
+                  const int8_t *offset,
                  unsigned level,
                  const float dudx, const float dvdx,
                  const float dudy, const float dvdy,
@@ -2268,6 +2269,8 @@ img_filter_2d_ewa(const struct sp_sampler_view *sp_sview,
   /* F *= formScale; */ /* no need to scale F as we don't use it below here */

   args.level = level;
+   args.offset = offset;
+
   for (j = 0; j < TGSI_QUAD_SIZE; j++) {
      /* Heckbert MS thesis, p. 59; scan over the bounding box of the ellipse
       * and incrementally update the value of Ax^2+Bxy*Cy^2; when this
@@ -2431,6 +2434,8 @@ mip_filter_linear_aniso(const struct sp_sampler_view *sp_sview,
   const float dvdy = (t[QUAD_TOP_LEFT]     - t[QUAD_BOTTOM_LEFT]) * t_to_v;
   struct img_filter_args args;

+   args.offset = filt_args->offset;
+
   if (filt_args->control == TGSI_SAMPLER_LOD_BIAS ||
       filt_args->control == TGSI_SAMPLER_LOD_NONE ||
       /* XXX FIXME */
@@ -2503,8 +2508,8 @@ mip_filter_linear_aniso(const struct sp_sampler_view *sp_sview,
       * seem to be worth the extra running time.
       */
      img_filter_2d_ewa(sp_sview, sp_samp, min_filter, mag_filter,
-                        s, t, p, filt_args->faces, level0,
-                        dudx, dvdx, dudy, dvdy, rgba);
+                        s, t, p, filt_args->faces, filt_args->offset,
+                        level0, dudx, dvdx, dudy, dvdy, rgba);
   }

   if (DEBUG_TEX) {
--- a/src/gallium/drivers/virgl/virgl_encode.c
+++ b/src/gallium/drivers/virgl/virgl_encode.c
@@ -741,7 +741,9 @@ int virgl_encode_blit(struct virgl_context *ctx,
   virgl_encoder_write_cmd_dword(ctx, VIRGL_CMD0(VIRGL_CCMD_BLIT, 0, VIRGL_CMD_BLIT_SIZE));
   tmp = VIRGL_CMD_BLIT_S0_MASK(blit->mask) |
      VIRGL_CMD_BLIT_S0_FILTER(blit->filter) |
-      VIRGL_CMD_BLIT_S0_SCISSOR_ENABLE(blit->scissor_enable);
+      VIRGL_CMD_BLIT_S0_SCISSOR_ENABLE(blit->scissor_enable) |
+      VIRGL_CMD_BLIT_S0_RENDER_CONDITION_ENABLE(blit->render_condition_enable) |
+      VIRGL_CMD_BLIT_S0_ALPHA_BLEND(blit->alpha_blend);
   virgl_encoder_write_dword(ctx->cbuf, tmp);
   virgl_encoder_write_dword(ctx->cbuf, (blit->scissor.minx | blit->scissor.miny << 16));
   virgl_encoder_write_dword(ctx->cbuf, (blit->scissor.maxx | blit->scissor.maxy << 16));
--- a/src/gallium/drivers/virgl/virgl_protocol.h
+++ b/src/gallium/drivers/virgl/virgl_protocol.h
@@ -388,6 +388,8 @@ enum virgl_context_cmd {
 #define VIRGL_CMD_BLIT_S0_MASK(x) (((x) & 0xff) << 0)
 #define VIRGL_CMD_BLIT_S0_FILTER(x) (((x) & 0x3) << 8)
 #define VIRGL_CMD_BLIT_S0_SCISSOR_ENABLE(x) (((x) & 0x1) << 10)
+#define VIRGL_CMD_BLIT_S0_RENDER_CONDITION_ENABLE(x) (((x) & 0x1) << 11)
+#define VIRGL_CMD_BLIT_S0_ALPHA_BLEND(x) (((x) & 0x1) << 12)
 #define VIRGL_CMD_BLIT_SCISSOR_MINX_MINY 2
 #define VIRGL_CMD_BLIT_SCISSOR_MAXX_MAXY 3
 #define VIRGL_CMD_BLIT_DST_RES_HANDLE 4
--- a/src/gallium/state_trackers/clover/core/kernel.cpp
+++ b/src/gallium/state_trackers/clover/core/kernel.cpp
@@ -55,7 +55,7 @@ kernel::launch(command_queue &q,
   const auto reduced_grid_size =
      map(divides(), grid_size, block_size);
   void *st = exec.bind(&q, grid_offset);
-   struct pipe_grid_info info;
+   struct pipe_grid_info info = {};

   // The handles are created during exec_context::bind(), so we need make
   // sure to call exec_context::bind() before retrieving them.
--- a/src/gallium/state_trackers/nine/buffer9.c
+++ b/src/gallium/state_trackers/nine/buffer9.c
@@ -174,13 +174,18 @@ NineBuffer9_Lock( struct NineBuffer9 *This,
    u_box_1d(OffsetToLock, SizeToLock, &box);

    if (This->base.pool == D3DPOOL_MANAGED) {
-        if (!This->managed.dirty) {
-            assert(LIST_IS_EMPTY(&This->managed.list));
-            list_add(&This->managed.list, &This->base.base.device->update_buffers);
-            This->managed.dirty = TRUE;
-            This->managed.dirty_box = box;
-        } else {
-            u_box_union_2d(&This->managed.dirty_box, &This->managed.dirty_box, &box);
+        /* READONLY doesn't dirty the buffer */
+        if (!(Flags & D3DLOCK_READONLY)) {
+            if (!This->managed.dirty) {
+                assert(LIST_IS_EMPTY(&This->managed.list));
+                This->managed.dirty = TRUE;
+                This->managed.dirty_box = box;
+            } else {
+                u_box_union_2d(&This->managed.dirty_box, &This->managed.dirty_box, &box);
+                /* Do not upload while we are locking, we'll add it back later */
+                if (!LIST_IS_EMPTY(&This->managed.list))
+                    list_delinit(&This->managed.list);
+            }
        }
        *ppbData = (char *)This->managed.data + OffsetToLock;
        DBG("returning pointer %p\n", *ppbData);
@@ -229,8 +234,13 @@ NineBuffer9_Unlock( struct NineBuffer9 *This )
    user_assert(This->nmaps > 0, D3DERR_INVALIDCALL);
    if (This->base.pool != D3DPOOL_MANAGED)
        This->pipe->transfer_unmap(This->pipe, This->maps[--(This->nmaps)]);
-    else
+    else {
        This->nmaps--;
+        /* TODO: Fix this to upload at the first draw call needing the data,
+         * instead of at the next draw call */
+        if (!This->nmaps && This->managed.dirty && LIST_IS_EMPTY(&This->managed.list))
+            list_add(&This->managed.list, &This->base.base.device->update_buffers);
+    }
    return D3D_OK;
 }

--- a/src/gallium/state_trackers/nine/nine_shader.c
+++ b/src/gallium/state_trackers/nine/nine_shader.c
@@ -830,6 +830,18 @@ nine_ureg_dst_register(unsigned file, int index)
    return ureg_dst(ureg_src_register(file, index));
 }

+static inline struct ureg_src
+nine_get_position_input(struct shader_translator *tx)
+{
+    struct ureg_program *ureg = tx->ureg;
+
+    if (tx->wpos_is_sysval)
+        return ureg_DECL_system_value(ureg, TGSI_SEMANTIC_POSITION, 0);
+    else
+        return ureg_DECL_fs_input(ureg, TGSI_SEMANTIC_POSITION,
+                                  0, TGSI_INTERPOLATE_LINEAR);
+}
+
 static struct ureg_src
 tx_src_param(struct shader_translator *tx, const struct sm1_src_param *param)
 {
@@ -955,16 +967,8 @@ tx_src_param(struct shader_translator *tx, const struct sm1_src_param *param)
    case D3DSPR_MISCTYPE:
        switch (param->idx) {
        case D3DSMO_POSITION:
-           if (ureg_src_is_undef(tx->regs.vPos)) {
-              if (tx->wpos_is_sysval) {
-                  tx->regs.vPos =
-                      ureg_DECL_system_value(ureg, TGSI_SEMANTIC_POSITION, 0);
-              } else {
-                  tx->regs.vPos =
-                      ureg_DECL_fs_input(ureg, TGSI_SEMANTIC_POSITION, 0,
-                                         TGSI_INTERPOLATE_LINEAR);
-              }
-           }
+           if (ureg_src_is_undef(tx->regs.vPos))
+              tx->regs.vPos = nine_get_position_input(tx);
           if (tx->shift_wpos) {
               /* TODO: do this only once */
               struct ureg_dst wpos = tx_scratch(tx);
@@ -2048,9 +2052,16 @@ DECL_SPECIAL(DCL)
            unsigned interp_location = 0;
            /* SM3 only, SM2 input semantic determined by file */
            assert(sem.reg.idx < Elements(tx->regs.v));
+
+            if (tgsi.Name == TGSI_SEMANTIC_POSITION) {
+                tx->regs.v[sem.reg.idx] = nine_get_position_input(tx);
+                return D3D_OK;
+            }
+
            if (sem.reg.mod & NINED3DSPDM_CENTROID ||
                (tgsi.Name == TGSI_SEMANTIC_COLOR && tx->info->force_color_in_centroid))
                interp_location = TGSI_INTERPOLATE_LOC_CENTROID;
+
            tx->regs.v[sem.reg.idx] = ureg_DECL_fs_input_cyl_centroid(
                ureg, tgsi.Name, tgsi.Index,
                nine_tgsi_to_interp_mode(&tgsi),
@@ -3269,12 +3280,7 @@ shader_add_ps_fog_stage(struct shader_translator *tx, struct ureg_src src_col)
    }

    if (tx->info->fog_mode != D3DFOG_NONE) {
-        if (tx->wpos_is_sysval) {
-            depth = ureg_DECL_system_value(ureg, TGSI_SEMANTIC_POSITION, 0);
-        } else {
-            depth = ureg_DECL_fs_input(ureg, TGSI_SEMANTIC_POSITION, 0,
-                                       TGSI_INTERPOLATE_LINEAR);
-        }
+        depth = nine_get_position_input(tx);
        depth = ureg_scalar(depth, TGSI_SWIZZLE_Z);
    }

--- a/src/gallium/state_trackers/omx/vid_dec.c
+++ b/src/gallium/state_trackers/omx/vid_dec.c
@@ -140,7 +140,7 @@ static OMX_ERRORTYPE vid_dec_Constructor(OMX_COMPONENTTYPE *comp, OMX_STRING nam

   r = omx_base_filter_Constructor(comp, name);
   if (r)
-	return r;
+      return r;

   priv->profile = PIPE_VIDEO_PROFILE_UNKNOWN;

@@ -268,7 +268,7 @@ static OMX_ERRORTYPE vid_dec_SetParameter(OMX_HANDLETYPE handle, OMX_INDEXTYPE i
      r = checkHeader(param, sizeof(OMX_PARAM_COMPONENTROLETYPE));
      if (r)
         return r;
- 
+
      if (!strcmp((char *)role->cRole, OMX_VID_DEC_MPEG2_ROLE)) {
         priv->profile = PIPE_VIDEO_PROFILE_MPEG2_MAIN;
      } else if (!strcmp((char *)role->cRole, OMX_VID_DEC_AVC_ROLE)) {
@@ -321,7 +321,7 @@ static OMX_ERRORTYPE vid_dec_GetParameter(OMX_HANDLETYPE handle, OMX_INDEXTYPE i
         strcpy((char *)role->cRole, OMX_VID_DEC_MPEG2_ROLE);
      else if (priv->profile == PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH)
         strcpy((char *)role->cRole, OMX_VID_DEC_AVC_ROLE);
- 
+
      break;
   }

@@ -419,6 +419,7 @@ static OMX_ERRORTYPE vid_dec_DecodeBuffer(omx_base_PortType *port, OMX_BUFFERHEA
   priv->in_buffers[i] = buf;
   priv->sizes[i] = buf->nFilledLen;
   priv->inputs[i] = buf->pBuffer;
+   priv->timestamps[i] = buf->nTimeStamp;

   while (priv->num_in_buffers > (!!(buf->nFlags & OMX_BUFFERFLAG_EOS) ? 0 : 1)) {
      bool eos = !!(priv->in_buffers[0]->nFlags & OMX_BUFFERFLAG_EOS);
@@ -469,12 +470,13 @@ static OMX_ERRORTYPE vid_dec_DecodeBuffer(omx_base_PortType *port, OMX_BUFFERHEA
         priv->in_buffers[0] = priv->in_buffers[1];
         priv->sizes[0] = priv->sizes[1] - delta;
         priv->inputs[0] = priv->inputs[1] + delta;
+         priv->timestamps[0] = priv->timestamps[1];
      }

      if (r)
         return r;
   }
- 
+
   return OMX_ErrorNone;
 }

@@ -513,7 +515,7 @@ static void vid_dec_FillOutput(vid_dec_PrivateType *priv, struct pipe_video_buff

   box.width = def->nFrameWidth / 2;
   box.height = def->nFrameHeight / 2;
- 
+
   src = priv->pipe->transfer_map(priv->pipe, views[1]->texture, 0,
                                  PIPE_TRANSFER_READ, &box, &transfer);
   util_copy_rect(dst, views[1]->texture->format, def->nStride, 0, 0,
@@ -526,9 +528,13 @@ static void vid_dec_FrameDecoded(OMX_COMPONENTTYPE *comp, OMX_BUFFERHEADERTYPE*
 {
   vid_dec_PrivateType *priv = comp->pComponentPrivate;
   bool eos = !!(input->nFlags & OMX_BUFFERFLAG_EOS);
+   OMX_TICKS timestamp;

-   if (!input->pInputPortPrivate)
-      input->pInputPortPrivate = priv->Flush(priv);
+   if (!input->pInputPortPrivate) {
+      input->pInputPortPrivate = priv->Flush(priv, &timestamp);
+      if (timestamp != OMX_VID_DEC_TIMESTAMP_INVALID)
+         input->nTimeStamp = timestamp;
+   }

   if (input->pInputPortPrivate) {
      if (output->pInputPortPrivate) {
@@ -539,6 +545,7 @@ static void vid_dec_FrameDecoded(OMX_COMPONENTTYPE *comp, OMX_BUFFERHEADERTYPE*
         vid_dec_FillOutput(priv, input->pInputPortPrivate, output);
      }
      output->nFilledLen = output->nAllocLen;
+      output->nTimeStamp = input->nTimeStamp;
   }

   if (eos && input->pInputPortPrivate)
--- a/src/gallium/state_trackers/omx/vid_dec.h
+++ b/src/gallium/state_trackers/omx/vid_dec.h
@@ -59,6 +59,8 @@
 #define OMX_VID_DEC_AVC_NAME "OMX.mesa.video_decoder.avc"
 #define OMX_VID_DEC_AVC_ROLE "video_decoder.avc"

+#define OMX_VID_DEC_TIMESTAMP_INVALID ((OMX_TICKS) -1)
+
 struct vl_vlc;

 DERIVEDCLASS(vid_dec_PrivateType, omx_base_filter_PrivateType)
@@ -69,7 +71,7 @@ DERIVEDCLASS(vid_dec_PrivateType, omx_base_filter_PrivateType)
   struct pipe_video_codec *codec; \
   void (*Decode)(vid_dec_PrivateType *priv, struct vl_vlc *vlc, unsigned min_bits_left); \
   void (*EndFrame)(vid_dec_PrivateType *priv); \
-   struct pipe_video_buffer *(*Flush)(vid_dec_PrivateType *priv); \
+   struct pipe_video_buffer *(*Flush)(vid_dec_PrivateType *priv, OMX_TICKS *timestamp); \
   struct pipe_video_buffer *target, *shadow; \
   union { \
      struct { \
@@ -100,6 +102,9 @@ DERIVEDCLASS(vid_dec_PrivateType, omx_base_filter_PrivateType)
   OMX_BUFFERHEADERTYPE *in_buffers[2]; \
   const void *inputs[2]; \
   unsigned sizes[2]; \
+   OMX_TICKS timestamps[2]; \
+   OMX_TICKS timestamp; \
+   bool first_buf_in_frame; \
   bool frame_finished; \
   bool frame_started; \
   unsigned bytes_left; \
--- a/src/gallium/state_trackers/omx/vid_dec_h264.c
+++ b/src/gallium/state_trackers/omx/vid_dec_h264.c
@@ -45,6 +45,7 @@
 struct dpb_list {
   struct list_head list;
   struct pipe_video_buffer *buffer;
+   OMX_TICKS timestamp;
   unsigned poc;
 };

@@ -82,7 +83,7 @@ static const uint8_t Default_8x8_Inter[64] = {

 static void vid_dec_h264_Decode(vid_dec_PrivateType *priv, struct vl_vlc *vlc, unsigned min_bits_left);
 static void vid_dec_h264_EndFrame(vid_dec_PrivateType *priv);
-static struct pipe_video_buffer *vid_dec_h264_Flush(vid_dec_PrivateType *priv);
+static struct pipe_video_buffer *vid_dec_h264_Flush(vid_dec_PrivateType *priv, OMX_TICKS *timestamp);

 void vid_dec_h264_Init(vid_dec_PrivateType *priv)
 {
@@ -91,9 +92,10 @@ void vid_dec_h264_Init(vid_dec_PrivateType *priv)
   priv->Decode = vid_dec_h264_Decode;
   priv->EndFrame = vid_dec_h264_EndFrame;
   priv->Flush = vid_dec_h264_Flush;
-   
+
   LIST_INITHEAD(&priv->codec_data.h264.dpb_list);
   priv->picture.h264.field_order_cnt[0] = priv->picture.h264.field_order_cnt[1] = INT_MAX;
+   priv->first_buf_in_frame = true;
 }

 static void vid_dec_h264_BeginFrame(vid_dec_PrivateType *priv)
@@ -104,6 +106,9 @@ static void vid_dec_h264_BeginFrame(vid_dec_PrivateType *priv)
      return;

   vid_dec_NeedTarget(priv);
+   if (priv->first_buf_in_frame)
+      priv->timestamp = priv->timestamps[0];
+   priv->first_buf_in_frame = false;

   priv->picture.h264.num_ref_frames = priv->picture.h264.pps->sps->max_num_ref_frames;

@@ -127,7 +132,8 @@ static void vid_dec_h264_BeginFrame(vid_dec_PrivateType *priv)
   priv->frame_started = true;
 }

-static struct pipe_video_buffer *vid_dec_h264_Flush(vid_dec_PrivateType *priv)
+static struct pipe_video_buffer *vid_dec_h264_Flush(vid_dec_PrivateType *priv,
+                                                    OMX_TICKS *timestamp)
 {
   struct dpb_list *entry, *result = NULL;
   struct pipe_video_buffer *buf;
@@ -146,6 +152,8 @@ static struct pipe_video_buffer *vid_dec_h264_Flush(vid_dec_PrivateType *priv)
      return NULL;

   buf = result->buffer;
+   if (timestamp)
+      *timestamp = result->timestamp;

   --priv->codec_data.h264.dpb_num;
   LIST_DEL(&result->list);
@@ -159,6 +167,7 @@ static void vid_dec_h264_EndFrame(vid_dec_PrivateType *priv)
   struct dpb_list *entry;
   struct pipe_video_buffer *tmp;
   bool top_field_first;
+   OMX_TICKS timestamp;

   if (!priv->frame_started)
      return;
@@ -181,7 +190,9 @@ static void vid_dec_h264_EndFrame(vid_dec_PrivateType *priv)
   if (!entry)
      return;

+   priv->first_buf_in_frame = true;
   entry->buffer = priv->target;
+   entry->timestamp = priv->timestamp;
   entry->poc = MIN2(priv->picture.h264.field_order_cnt[0], priv->picture.h264.field_order_cnt[1]);
   LIST_ADDTAIL(&entry->list, &priv->codec_data.h264.dpb_list);
   ++priv->codec_data.h264.dpb_num;
@@ -192,7 +203,8 @@ static void vid_dec_h264_EndFrame(vid_dec_PrivateType *priv)
      return;

   tmp = priv->in_buffers[0]->pInputPortPrivate;
-   priv->in_buffers[0]->pInputPortPrivate = vid_dec_h264_Flush(priv);
+   priv->in_buffers[0]->pInputPortPrivate = vid_dec_h264_Flush(priv, &timestamp);
+   priv->in_buffers[0]->nTimeStamp = timestamp;
   priv->target = tmp;
   priv->frame_finished = priv->in_buffers[0]->pInputPortPrivate != NULL;
 }
@@ -829,7 +841,7 @@ static void slice_header(vid_dec_PrivateType *priv, struct vl_rbsp *rbsp,
         priv->picture.h264.field_order_cnt[0] = expectedPicOrderCnt + priv->codec_data.h264.delta_pic_order_cnt[0];
         priv->picture.h264.field_order_cnt[1] = priv->picture.h264.field_order_cnt[0] +
            sps->offset_for_top_to_bottom_field + priv->codec_data.h264.delta_pic_order_cnt[1];
-         
+
      } else if (!priv->picture.h264.bottom_field_flag)
         priv->picture.h264.field_order_cnt[0] = expectedPicOrderCnt + priv->codec_data.h264.delta_pic_order_cnt[0];
      else
@@ -859,7 +871,7 @@ static void slice_header(vid_dec_PrivateType *priv, struct vl_rbsp *rbsp,
      if (!priv->picture.h264.field_pic_flag) {
         priv->picture.h264.field_order_cnt[0] = tempPicOrderCnt;
         priv->picture.h264.field_order_cnt[1] = tempPicOrderCnt;
-         
+
      } else if (!priv->picture.h264.bottom_field_flag)
         priv->picture.h264.field_order_cnt[0] = tempPicOrderCnt;
      else
@@ -876,7 +888,7 @@ static void slice_header(vid_dec_PrivateType *priv, struct vl_rbsp *rbsp,

   priv->picture.h264.num_ref_idx_l0_active_minus1 = pps->num_ref_idx_l0_default_active_minus1;
   priv->picture.h264.num_ref_idx_l1_active_minus1 = pps->num_ref_idx_l1_default_active_minus1;
- 
+
   if (slice_type == PIPE_H264_SLICE_TYPE_P ||
       slice_type == PIPE_H264_SLICE_TYPE_SP ||
       slice_type == PIPE_H264_SLICE_TYPE_B) {
--- a/src/gallium/state_trackers/omx/vid_dec_mpeg12.c
+++ b/src/gallium/state_trackers/omx/vid_dec_mpeg12.c
@@ -61,7 +61,7 @@ static uint8_t default_non_intra_matrix[64] = {

 static void vid_dec_mpeg12_Decode(vid_dec_PrivateType *priv, struct vl_vlc *vlc, unsigned min_bits_left);
 static void vid_dec_mpeg12_EndFrame(vid_dec_PrivateType *priv);
-static struct pipe_video_buffer *vid_dec_mpeg12_Flush(vid_dec_PrivateType *priv);
+static struct pipe_video_buffer *vid_dec_mpeg12_Flush(vid_dec_PrivateType *priv, OMX_TICKS *timestamp);

 void vid_dec_mpeg12_Init(vid_dec_PrivateType *priv)
 {
@@ -131,10 +131,12 @@ static void vid_dec_mpeg12_EndFrame(vid_dec_PrivateType *priv)
   priv->in_buffers[0]->pInputPortPrivate = done;
 }

-static struct pipe_video_buffer *vid_dec_mpeg12_Flush(vid_dec_PrivateType *priv)
+static struct pipe_video_buffer *vid_dec_mpeg12_Flush(vid_dec_PrivateType *priv, OMX_TICKS *timestamp)
 {
   struct pipe_video_buffer *result = priv->picture.mpeg12.ref[1];
   priv->picture.mpeg12.ref[1] = NULL;
+   if (timestamp)
+      *timestamp = OMX_VID_DEC_TIMESTAMP_INVALID;
   return result;
 }

--- a/src/gallium/state_trackers/omx/vid_enc.c
+++ b/src/gallium/state_trackers/omx/vid_enc.c
@@ -179,7 +179,7 @@ static OMX_ERRORTYPE vid_enc_Constructor(OMX_COMPONENTTYPE *comp, OMX_STRING nam
   if (!screen->get_video_param(screen, PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH,
                                PIPE_VIDEO_ENTRYPOINT_ENCODE, PIPE_VIDEO_CAP_SUPPORTED))
      return OMX_ErrorBadParameter;
- 
+
   priv->stacked_frames_num = screen->get_video_param(screen,
                                PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH,
                                PIPE_VIDEO_ENTRYPOINT_ENCODE,
@@ -242,7 +242,7 @@ static OMX_ERRORTYPE vid_enc_Constructor(OMX_COMPONENTTYPE *comp, OMX_STRING nam

   port->Port_AllocateBuffer = vid_enc_AllocateOutBuffer;
   port->Port_FreeBuffer = vid_enc_FreeOutBuffer;
- 
+
   priv->bitrate.eControlRate = OMX_Video_ControlRateDisable;
   priv->bitrate.nTargetBitrate = 0;

@@ -253,7 +253,7 @@ static OMX_ERRORTYPE vid_enc_Constructor(OMX_COMPONENTTYPE *comp, OMX_STRING nam
   priv->profile_level.eProfile = OMX_VIDEO_AVCProfileBaseline;
   priv->profile_level.eLevel = OMX_VIDEO_AVCLevel42;

-   priv->force_pic_type.IntraRefreshVOP = OMX_FALSE; 
+   priv->force_pic_type.IntraRefreshVOP = OMX_FALSE;
   priv->frame_num = 0;
   priv->pic_order_cnt = 0;
   priv->restricted_b_frames = debug_get_bool_option("OMX_USE_RESTRICTED_B_FRAMES", FALSE);
@@ -380,7 +380,7 @@ static OMX_ERRORTYPE vid_enc_SetParameter(OMX_HANDLETYPE handle, OMX_INDEXTYPE i

         port = (omx_base_video_PortType *)priv->ports[OMX_BASE_FILTER_OUTPUTPORT_INDEX];
         port->sPortParam.nBufferSize = framesize * 512 / (16*16);
-      
+
         priv->frame_rate = def->format.video.xFramerate;

         priv->callbacks->EventHandler(comp, priv->callbackData, OMX_EventPortSettingsChanged,
@@ -532,10 +532,10 @@ static OMX_ERRORTYPE vid_enc_SetConfig(OMX_HANDLETYPE handle, OMX_INDEXTYPE idx,
   vid_enc_PrivateType *priv = comp->pComponentPrivate;
   OMX_ERRORTYPE r;
   int i;
- 
+
   if (!config)
      return OMX_ErrorBadParameter;
-                         
+
   switch(idx) {
   case OMX_IndexConfigVideoIntraVOPRefresh: {
      OMX_CONFIG_INTRAREFRESHVOPTYPE *type = config;
@@ -543,9 +543,9 @@ static OMX_ERRORTYPE vid_enc_SetConfig(OMX_HANDLETYPE handle, OMX_INDEXTYPE idx,
      r = checkHeader(config, sizeof(OMX_CONFIG_INTRAREFRESHVOPTYPE));
      if (r)
         return r;
-      
+
      priv->force_pic_type = *type;
-      
+
      break;
   }
   case OMX_IndexConfigCommonScale: {
@@ -568,11 +568,11 @@ static OMX_ERRORTYPE vid_enc_SetConfig(OMX_HANDLETYPE handle, OMX_INDEXTYPE idx,
      priv->scale = *scale;
      if (priv->scale.xWidth != 0xffffffff && priv->scale.xHeight != 0xffffffff) {
         struct pipe_video_buffer templat = {};
- 
+
         templat.buffer_format = PIPE_FORMAT_NV12;
         templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420;
-         templat.width = priv->scale.xWidth; 
-         templat.height = priv->scale.xHeight; 
+         templat.width = priv->scale.xWidth;
+         templat.height = priv->scale.xHeight;
         templat.interlaced = false;
         for (i = 0; i < OMX_VID_ENC_NUM_SCALING_BUFFERS; ++i) {
            priv->scale_buffer[i] = priv->s_pipe->create_video_buffer(priv->s_pipe, &templat);
@@ -615,7 +615,7 @@ static OMX_ERRORTYPE vid_enc_GetConfig(OMX_HANDLETYPE handle, OMX_INDEXTYPE idx,
   default:
      return omx_base_component_GetConfig(handle, idx, config);
   }
-   
+
   return OMX_ErrorNone;
 }

@@ -1010,10 +1010,10 @@ static void enc_ControlPicture(omx_base_PortType *port, struct pipe_h264_enc_pic
   switch (priv->bitrate.eControlRate) {
   case OMX_Video_ControlRateVariable:
      rate_ctrl->rate_ctrl_method = PIPE_H264_ENC_RATE_CONTROL_METHOD_VARIABLE;
-      break; 
+      break;
   case OMX_Video_ControlRateConstant:
      rate_ctrl->rate_ctrl_method = PIPE_H264_ENC_RATE_CONTROL_METHOD_CONSTANT;
-      break; 
+      break;
   case OMX_Video_ControlRateVariableSkipFrames:
      rate_ctrl->rate_ctrl_method = PIPE_H264_ENC_RATE_CONTROL_METHOD_VARIABLE_SKIP;
      break;
@@ -1023,8 +1023,8 @@ static void enc_ControlPicture(omx_base_PortType *port, struct pipe_h264_enc_pic
   default:
      rate_ctrl->rate_ctrl_method = PIPE_H264_ENC_RATE_CONTROL_METHOD_DISABLE;
      break;
-   } 
-      
+   }
+
   rate_ctrl->frame_rate_den = OMX_VID_ENC_CONTROL_FRAME_RATE_DEN_DEFAULT;
   rate_ctrl->frame_rate_num = ((priv->frame_rate) >> 16) * rate_ctrl->frame_rate_den;

@@ -1035,7 +1035,7 @@ static void enc_ControlPicture(omx_base_PortType *port, struct pipe_h264_enc_pic
         rate_ctrl->target_bitrate = priv->bitrate.nTargetBitrate;
      else
         rate_ctrl->target_bitrate = OMX_VID_ENC_BITRATE_MAX;
-      rate_ctrl->peak_bitrate = rate_ctrl->target_bitrate;    
+      rate_ctrl->peak_bitrate = rate_ctrl->target_bitrate;
      if (rate_ctrl->target_bitrate < OMX_VID_ENC_BITRATE_MEDIAN)
         rate_ctrl->vbv_buffer_size = MIN2((rate_ctrl->target_bitrate * 2.75), OMX_VID_ENC_BITRATE_MEDIAN);
      else
@@ -1051,7 +1051,7 @@ static void enc_ControlPicture(omx_base_PortType *port, struct pipe_h264_enc_pic
      rate_ctrl->peak_bits_picture_integer = rate_ctrl->target_bits_picture;
      rate_ctrl->peak_bits_picture_fraction = 0;
   }
-   
+
   picture->quant_i_frames = priv->quant.nQpI;
   picture->quant_p_frames = priv->quant.nQpP;
   picture->quant_b_frames = priv->quant.nQpB;
@@ -1069,7 +1069,7 @@ static void enc_HandleTask(omx_base_PortType *port, struct encode_task *task,
   unsigned size = priv->ports[OMX_BASE_FILTER_OUTPUTPORT_INDEX]->sPortParam.nBufferSize;
   struct pipe_video_buffer *vbuf = task->buf;
   struct pipe_h264_enc_picture_desc picture = {};
- 
+
   /* -------------- scale input image --------- */
   enc_ScaleInput(port, &vbuf, &size);
   priv->s_pipe->flush(priv->s_pipe, NULL, 0);
@@ -1160,7 +1160,7 @@ static OMX_ERRORTYPE vid_enc_EncodeFrame(omx_base_PortType *port, OMX_BUFFERHEAD
       priv->force_pic_type.IntraRefreshVOP) {
      enc_ClearBframes(port, inp);
      picture_type = PIPE_H264_ENC_PICTURE_TYPE_IDR;
-      priv->force_pic_type.IntraRefreshVOP = OMX_FALSE; 
+      priv->force_pic_type.IntraRefreshVOP = OMX_FALSE;
      priv->frame_num = 0;
   } else if (priv->codec->profile == PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE ||
              !(priv->pic_order_cnt % OMX_VID_ENC_P_PERIOD_DEFAULT) ||
@@ -1169,7 +1169,7 @@ static OMX_ERRORTYPE vid_enc_EncodeFrame(omx_base_PortType *port, OMX_BUFFERHEAD
   } else {
      picture_type = PIPE_H264_ENC_PICTURE_TYPE_B;
   }
-   
+
   task->pic_order_cnt = priv->pic_order_cnt++;

   if (picture_type == PIPE_H264_ENC_PICTURE_TYPE_B) {
@@ -1245,7 +1245,7 @@ static void vid_enc_BufferEncoded(OMX_COMPONENTTYPE *comp, OMX_BUFFERHEADERTYPE*
   output->pBuffer = priv->t_pipe->transfer_map(priv->t_pipe, outp->bitstream, 0,
                                                PIPE_TRANSFER_READ_WRITE,
                                                &box, &outp->transfer);
- 
+
   /* ------------- get size of result ----------------- */

   priv->codec->get_feedback(priv->codec, task->feedback, &size);
--- a/src/gallium/winsys/svga/drm/vmw_screen_ioctl.c
+++ b/src/gallium/winsys/svga/drm/vmw_screen_ioctl.c
@@ -52,6 +52,7 @@
 #include <unistd.h>

 #define VMW_MAX_DEFAULT_TEXTURE_SIZE   (128 * 1024 * 1024)
+#define VMW_FENCE_TIMEOUT_SECONDS 60

 struct vmw_region
 {
@@ -721,7 +722,7 @@ vmw_ioctl_fence_finish(struct vmw_winsys_screen *vws,
   memset(&arg, 0, sizeof(arg));

   arg.handle = handle;
-   arg.timeout_us = 10*1000000;
+   arg.timeout_us = VMW_FENCE_TIMEOUT_SECONDS*1000000;
   arg.lazy = 0;
   arg.flags = vflags;

--- a/src/gallium/winsys/svga/drm/vmw_surface.c
+++ b/src/gallium/winsys/svga/drm/vmw_surface.c
@@ -170,6 +170,8 @@ vmw_svga_winsys_surface_unmap(struct svga_winsys_context *swc,
      *rebind = vsrf->rebind;
      vsrf->rebind = FALSE;
      vmw_svga_winsys_buffer_unmap(&vsrf->screen->base, vsrf->buf);
+   } else {
+      *rebind = FALSE;
   }
   pipe_mutex_unlock(vsrf->mutex);
 }
--- a/src/mesa/Android.libmesa_dricore.mk
+++ b/src/mesa/Android.libmesa_dricore.mk
@@ -48,9 +48,8 @@ endif # x86
 endif # MESA_ENABLE_ASM

 ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
-LOCAL_SRC_FILES += \
-	main/streaming-load-memcpy.c \
-	main/sse_minmax.c
+LOCAL_WHOLE_STATIC_LIBRARIES := \
+	libmesa_sse41
 LOCAL_CFLAGS := \
 	-msse4.1 \
       -DUSE_SSE41
@@ -63,7 +62,7 @@ LOCAL_C_INCLUDES := \
 	$(MESA_TOP)/src/gallium/include \
 	$(MESA_TOP)/src/gallium/auxiliary

-LOCAL_WHOLE_STATIC_LIBRARIES := \
+LOCAL_WHOLE_STATIC_LIBRARIES += \
 	libmesa_program

 include $(LOCAL_PATH)/Android.gen.mk
--- a/src/mesa/Android.libmesa_sse41.mk
+++ b/src/mesa/Android.libmesa_sse41.mk
@@ -0,0 +1,44 @@
+# Copyright 2012 Intel Corporation
+# Copyright (C) 2010-2011 Chia-I Wu <olvaffe@gmail.com>
+# Copyright (C) 2010-2011 LunarG Inc.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included
+# in all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+
+ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
+
+LOCAL_PATH := $(call my-dir)
+
+include $(LOCAL_PATH)/Makefile.sources
+
+include $(CLEAR_VARS)
+
+LOCAL_MODULE := libmesa_sse41
+
+LOCAL_SRC_FILES += \
+	$(X86_SSE41_FILES)
+
+LOCAL_C_INCLUDES := \
+	$(MESA_TOP)/src/mapi \
+	$(MESA_TOP)/src/gallium/include \
+	$(MESA_TOP)/src/gallium/auxiliary
+
+include $(MESA_COMMON_MK)
+include $(BUILD_STATIC_LIBRARY)
+
+endif
--- a/src/mesa/Android.libmesa_st_mesa.mk
+++ b/src/mesa/Android.libmesa_st_mesa.mk
@@ -47,6 +47,8 @@ endif # x86
 endif # MESA_ENABLE_ASM

 ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
+LOCAL_WHOLE_STATIC_LIBRARIES := \
+	libmesa_sse41
 LOCAL_CFLAGS := \
       -DUSE_SSE41
 endif
@@ -58,7 +60,7 @@ LOCAL_C_INCLUDES := \
 	$(MESA_TOP)/src/gallium/auxiliary \
 	$(MESA_TOP)/src/gallium/include

-LOCAL_WHOLE_STATIC_LIBRARIES := \
+LOCAL_WHOLE_STATIC_LIBRARIES += \
 	libmesa_program

 include $(LOCAL_PATH)/Android.gen.mk
--- a/src/mesa/Android.mk
+++ b/src/mesa/Android.mk
@@ -24,5 +24,6 @@ include $(LOCAL_PATH)/Android.mesa_gen_matypes.mk
 include $(LOCAL_PATH)/Android.libmesa_glsl_utils.mk
 include $(LOCAL_PATH)/Android.libmesa_dricore.mk
 include $(LOCAL_PATH)/Android.libmesa_st_mesa.mk
+include $(LOCAL_PATH)/Android.libmesa_sse41.mk

 include $(LOCAL_PATH)/program/Android.mk
--- a/src/mesa/Makefile.sources
+++ b/src/mesa/Makefile.sources
@@ -586,6 +586,10 @@ X86_64_FILES =		\
 	x86-64/x86-64.h	\
 	x86-64/xform4.S

+X86_SSE41_FILES = \
+	main/streaming-load-memcpy.c \
+	main/sse_minmax.c
+
 SPARC_FILES =			\
 	sparc/sparc.h		\
 	sparc/sparc_clip.S	\
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2441,8 +2441,10 @@ fs_visitor::opt_sampler_eot()
    * we have enough space, but it will make sure the dead code eliminator kills
    * the instruction that this will replace.
    */
-   if (tex_inst->header_size != 0)
+   if (tex_inst->header_size != 0) {
+      invalidate_live_intervals();
      return true;
+   }

   fs_reg send_header = ibld.vgrf(BRW_REGISTER_TYPE_F,
                                  load_payload->sources + 1);
@@ -2473,6 +2475,7 @@ fs_visitor::opt_sampler_eot()
   tex_inst->insert_before(cfg->blocks[cfg->num_blocks - 1], new_load_payload);
   tex_inst->src[0] = send_header;

+   invalidate_live_intervals();
   return true;
 }

@@ -5187,12 +5190,18 @@ fs_visitor::optimize()
 void
 fs_visitor::fixup_3src_null_dest()
 {
+   bool progress = false;
+
   foreach_block_and_inst_safe (block, fs_inst, inst, cfg) {
      if (inst->is_3src() && inst->dst.is_null()) {
         inst->dst = fs_reg(VGRF, alloc.allocate(dispatch_width / 8),
                            inst->dst.type);
+         progress = true;
      }
   }
+
+   if (progress)
+      invalidate_live_intervals();
 }

 void
@@ -5228,7 +5237,7 @@ fs_visitor::allocate_registers()
       * SIMD8.  There's probably actually some intermediate point where
       * SIMD16 with a couple of spills is still better.
       */
-      if (dispatch_width == 16) {
+      if (dispatch_width == 16 && min_dispatch_width <= 8) {
         fail("Failure to register allocate.  Reduce number of "
              "live scalar values to avoid this.");
      } else {
@@ -5470,6 +5479,13 @@ fs_visitor::run_cs()
   if (shader_time_index >= 0)
      emit_shader_time_begin();

+   if (devinfo->is_haswell && prog_data->total_shared > 0) {
+      /* Move SLM index from g0.0[27:24] to sr0.1[11:8] */
+      const fs_builder abld = bld.exec_all().group(1, 0);
+      abld.MOV(retype(suboffset(brw_sr0_reg(), 1), BRW_REGISTER_TYPE_UW),
+               suboffset(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UW), 1));
+   }
+
   emit_nir_code();

   if (failed)
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -407,6 +407,7 @@ public:
   bool spilled_any_registers;

   const unsigned dispatch_width; /**< 8 or 16 */
+   unsigned min_dispatch_width;

   int shader_time_index;

--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1021,6 +1021,18 @@ fs_visitor::init()
      unreachable("unhandled shader stage");
   }

+   if (stage == MESA_SHADER_COMPUTE) {
+      const brw_cs_prog_data *cs_prog_data =
+         (const brw_cs_prog_data *) prog_data;
+      unsigned size = cs_prog_data->local_size[0] *
+                      cs_prog_data->local_size[1] *
+                      cs_prog_data->local_size[2];
+      size = DIV_ROUND_UP(size, devinfo->max_cs_threads);
+      min_dispatch_width = size > 16 ? 32 : (size > 8 ? 16 : 8);
+   } else {
+      min_dispatch_width = 8;
+   }
+
   this->prog_data = this->stage_prog_data;

   this->failed = false;
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -338,8 +338,6 @@ brw_emit_mi_flush(struct brw_context *brw)
      }
      brw_emit_pipe_control_flush(brw, flags);
   }
-
-   brw_render_cache_set_clear(brw);
 }

 int
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -736,6 +736,22 @@ brw_notification_reg(void)
                  WRITEMASK_X);
 }

+static inline struct brw_reg
+brw_sr0_reg(void)
+{
+   return brw_reg(BRW_ARCHITECTURE_REGISTER_FILE,
+                  BRW_ARF_STATE,
+                  0,
+                  0,
+                  0,
+                  BRW_REGISTER_TYPE_UD,
+                  BRW_VERTICAL_STRIDE_8,
+                  BRW_WIDTH_8,
+                  BRW_HORIZONTAL_STRIDE_1,
+                  BRW_SWIZZLE_XYZW,
+                  WRITEMASK_XYZW);
+}
+
 static inline struct brw_reg
 brw_acc_reg(unsigned width)
 {
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1033,6 +1033,7 @@ vec4_visitor::opt_register_coalesce()

         if (is_nop_mov) {
            inst->remove(block);
+            progress = true;
            continue;
         }
      }
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -685,9 +685,7 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
   case nir_intrinsic_load_instance_id:
   case nir_intrinsic_load_base_instance:
   case nir_intrinsic_load_draw_id:
-   case nir_intrinsic_load_invocation_id:
-   case nir_intrinsic_load_tess_level_inner:
-   case nir_intrinsic_load_tess_level_outer: {
+   case nir_intrinsic_load_invocation_id: {
      gl_system_value sv = nir_system_value_from_intrinsic(instr->intrinsic);
      src_reg val = src_reg(nir_system_values[sv]);
      assert(val.file != BAD_FILE);
--- a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
@@ -402,6 +402,7 @@ vec4_tcs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
         }
      } else if (imm_offset == 1 && indirect_offset.file == BAD_FILE) {
         dst.type = BRW_REGISTER_TYPE_F;
+         unsigned swiz = BRW_SWIZZLE_WZYX;

         /* This is a read of gl_TessLevelOuter[], which lives in the
          * high 4 DWords of the Patch URB header, in reverse order.
@@ -414,6 +415,8 @@ vec4_tcs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
            dst.writemask = WRITEMASK_XYZ;
            break;
         case GL_ISOLINES:
+            /* Isolines are not reversed; swizzle .zw -> .xy */
+            swiz = BRW_SWIZZLE_ZWZW;
            dst.writemask = WRITEMASK_XY;
            return;
         default:
@@ -422,7 +425,7 @@ vec4_tcs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)

         dst_reg tmp(this, glsl_type::vec4_type);
         emit_output_urb_read(tmp, 1, src_reg());
-         emit(MOV(dst, swizzle(src_reg(tmp), BRW_SWIZZLE_WZYX)));
+         emit(MOV(dst, swizzle(src_reg(tmp), swiz)));
      } else {
         emit_output_urb_read(dst, imm_offset, indirect_offset);
      }
@@ -475,8 +478,15 @@ vec4_tcs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
          * Patch URB Header at DWords 4-7.  However, it's reversed, so
          * instead of .xyzw we have .wzyx.
          */
-         swiz = BRW_SWIZZLE_WZYX;
-         mask = writemask_for_backwards_vector(mask);
+         if (key->tes_primitive_mode == GL_ISOLINES) {
+            /* Isolines .xy should be stored in .zw, in order. */
+            swiz = BRW_SWIZZLE4(0, 0, 0, 1);
+            mask <<= 2;
+         } else {
+            /* Other domains are reversed; store .wzyx instead of .xyzw. */
+            swiz = BRW_SWIZZLE_WZYX;
+            mask = writemask_for_backwards_vector(mask);
+         }
      }

      emit_urb_write(swizzle(value, swiz), mask,
--- a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
@@ -28,6 +28,7 @@
 */

 #include "brw_vec4_tes.h"
+#include "brw_cfg.h"

 namespace brw {

@@ -53,39 +54,10 @@ vec4_tes_visitor::make_reg_for_system_value(int location, const glsl_type *type)
 void
 vec4_tes_visitor::nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr)
 {
-   const struct brw_tes_prog_data *tes_prog_data =
-      (const struct brw_tes_prog_data *) prog_data;
-
   switch (instr->intrinsic) {
-   case nir_intrinsic_load_tess_level_outer: {
-      dst_reg dst(this, glsl_type::vec4_type);
-      nir_system_values[SYSTEM_VALUE_TESS_LEVEL_OUTER] = dst;
-
-      dst_reg temp(this, glsl_type::vec4_type);
-      vec4_instruction *read =
-         emit(VEC4_OPCODE_URB_READ, temp, input_read_header);
-      read->offset = 1;
-      read->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET;
-      emit(MOV(dst, swizzle(src_reg(temp), BRW_SWIZZLE_WZYX)));
+   case nir_intrinsic_load_tess_level_outer:
+   case nir_intrinsic_load_tess_level_inner:
      break;
-   }
-   case nir_intrinsic_load_tess_level_inner: {
-      dst_reg dst(this, glsl_type::vec2_type);
-      nir_system_values[SYSTEM_VALUE_TESS_LEVEL_INNER] = dst;
-
-      /* Set up the message header to reference the proper parts of the URB */
-      dst_reg temp(this, glsl_type::vec4_type);
-      vec4_instruction *read =
-         emit(VEC4_OPCODE_URB_READ, temp, input_read_header);
-      read->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET;
-      if (tes_prog_data->domain == BRW_TESS_DOMAIN_QUAD) {
-         emit(MOV(dst, swizzle(src_reg(temp), BRW_SWIZZLE_WZYX)));
-      } else {
-         read->offset = 1;
-         emit(MOV(dst, src_reg(temp)));
-      }
-      break;
-   }
   default:
      vec4_visitor::nir_setup_system_value_intrinsic(instr);
   }
@@ -105,6 +77,25 @@ vec4_tes_visitor::setup_payload()

   reg = setup_uniforms(reg);

+   foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
+      for (int i = 0; i < 3; i++) {
+         if (inst->src[i].file != ATTR)
+            continue;
+
+         struct brw_reg grf =
+            brw_vec4_grf(reg + inst->src[i].nr / 2, 4 * (inst->src[i].nr % 2));
+         grf = stride(grf, 0, 4, 1);
+         grf.swizzle = inst->src[i].swizzle;
+         grf.type = inst->src[i].type;
+         grf.abs = inst->src[i].abs;
+         grf.negate = inst->src[i].negate;
+
+         inst->src[i] = grf;
+      }
+   }
+
+   reg += 8 * prog_data->urb_read_length;
+
   this->first_non_payload_grf = reg;
 }

@@ -148,12 +139,36 @@ vec4_tes_visitor::emit_urb_write_opcode(bool complete)
 void
 vec4_tes_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
 {
+   const struct brw_tes_prog_data *tes_prog_data =
+      (const struct brw_tes_prog_data *) prog_data;
+
   switch (instr->intrinsic) {
   case nir_intrinsic_load_tess_coord:
      /* gl_TessCoord is part of the payload in g1 channels 0-2 and 4-6. */
      emit(MOV(get_nir_dest(instr->dest, BRW_REGISTER_TYPE_F),
               src_reg(brw_vec8_grf(1, 0))));
      break;
+   case nir_intrinsic_load_tess_level_outer:
+      if (tes_prog_data->domain == BRW_TESS_DOMAIN_ISOLINE) {
+         emit(MOV(get_nir_dest(instr->dest, BRW_REGISTER_TYPE_F),
+                  swizzle(src_reg(ATTR, 1, glsl_type::vec4_type),
+                          BRW_SWIZZLE_ZWZW)));
+      } else {
+         emit(MOV(get_nir_dest(instr->dest, BRW_REGISTER_TYPE_F),
+                  swizzle(src_reg(ATTR, 1, glsl_type::vec4_type),
+                          BRW_SWIZZLE_WZYX)));
+      }
+      break;
+   case nir_intrinsic_load_tess_level_inner:
+      if (tes_prog_data->domain == BRW_TESS_DOMAIN_QUAD) {
+         emit(MOV(get_nir_dest(instr->dest, BRW_REGISTER_TYPE_F),
+                  swizzle(src_reg(ATTR, 0, glsl_type::vec4_type),
+                          BRW_SWIZZLE_WZYX)));
+      } else {
+         emit(MOV(get_nir_dest(instr->dest, BRW_REGISTER_TYPE_F),
+                  src_reg(ATTR, 1, glsl_type::float_type)));
+      }
+      break;
   case nir_intrinsic_load_primitive_id:
      emit(TES_OPCODE_GET_PRIMITIVE_ID,
           get_nir_dest(instr->dest, BRW_REGISTER_TYPE_UD));
@@ -169,6 +184,19 @@ vec4_tes_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
         header = src_reg(this, glsl_type::uvec4_type);
         emit(TES_OPCODE_ADD_INDIRECT_URB_OFFSET, dst_reg(header),
              input_read_header, indirect_offset);
+      } else {
+         /* Arbitrarily only push up to 24 vec4 slots worth of data,
+          * which is 12 registers (since each holds 2 vec4 slots).
+          */
+         const unsigned max_push_slots = 24;
+         if (imm_offset < max_push_slots) {
+            emit(MOV(get_nir_dest(instr->dest, BRW_REGISTER_TYPE_D),
+                     src_reg(ATTR, imm_offset, glsl_type::ivec4_type)));
+            prog_data->urb_read_length =
+               MAX2(prog_data->urb_read_length,
+                    DIV_ROUND_UP(imm_offset + 1, 2));
+            break;
+         }
      }

      dst_reg temp(this, glsl_type::ivec4_type);
--- a/src/mesa/drivers/dri/i965/intel_copy_image.c
+++ b/src/mesa/drivers/dri/i965/intel_copy_image.c
@@ -140,9 +140,9 @@ copy_image_with_memcpy(struct brw_context *brw,
   _mesa_get_format_block_size(src_mt->format, &src_bw, &src_bh);

   assert(src_width % src_bw == 0);
-   assert(src_height % src_bw == 0);
+   assert(src_height % src_bh == 0);
   assert(src_x % src_bw == 0);
-   assert(src_y % src_bw == 0);
+   assert(src_y % src_bh == 0);

   /* If we are on the same miptree, same level, and same slice, then
    * intel_miptree_map won't let us map it twice.  We have to do things a
@@ -153,7 +153,7 @@ copy_image_with_memcpy(struct brw_context *brw,

   if (same_slice) {
      assert(dst_x % src_bw == 0);
-      assert(dst_y % src_bw == 0);
+      assert(dst_y % src_bh == 0);

      map_x1 = MIN2(src_x, dst_x);
      map_y1 = MIN2(src_y, dst_y);
--- a/src/mesa/drivers/dri/i965/intel_fbo.c
+++ b/src/mesa/drivers/dri/i965/intel_fbo.c
@@ -1065,7 +1065,28 @@ brw_render_cache_set_check_flush(struct brw_context *brw, drm_intel_bo *bo)
   if (!_mesa_set_search(brw->render_cache, bo))
      return;

-   brw_emit_mi_flush(brw);
+   if (brw->gen >= 6) {
+      if (brw->gen == 6) {
+         /* [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
+          * Flush Enable = 1, a PIPE_CONTROL with any non-zero
+          * post-sync-op is required.
+          */
+         brw_emit_post_sync_nonzero_flush(brw);
+      }
+
+      brw_emit_pipe_control_flush(brw,
+                                  PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+                                  PIPE_CONTROL_RENDER_TARGET_FLUSH |
+                                  PIPE_CONTROL_CS_STALL);
+
+      brw_emit_pipe_control_flush(brw,
+                                  PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
+                                  PIPE_CONTROL_CONST_CACHE_INVALIDATE);
+   } else {
+      brw_emit_mi_flush(brw);
+   }
+
+   brw_render_cache_set_clear(brw);
 }

 /**
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -50,7 +50,7 @@ intel_miptree_create_for_teximage(struct brw_context *brw,
      width <<= 1;
      if (height != 1)
         height <<= 1;
-      if (depth != 1)
+      if (intelObj->base.Target == GL_TEXTURE_3D)
         depth <<= 1;
   }

--- a/src/mesa/main/debug_output.c
+++ b/src/mesa/main/debug_output.c
@@ -761,15 +761,11 @@ _mesa_set_debug_state_int(struct gl_context *ctx, GLenum pname, GLint val)
 GLint
 _mesa_get_debug_state_int(struct gl_context *ctx, GLenum pname)
 {
-   struct gl_debug_state *debug;
   GLint val;

-   mtx_lock(&ctx->DebugMutex);
-   debug = ctx->Debug;
-   if (!debug) {
-      mtx_unlock(&ctx->DebugMutex);
+   struct gl_debug_state *debug = _mesa_lock_debug_state(ctx);
+   if (!debug)
      return 0;
-   }

   switch (pname) {
   case GL_DEBUG_OUTPUT:
@@ -794,7 +790,7 @@ _mesa_get_debug_state_int(struct gl_context *ctx, GLenum pname)
      break;
   }

-   mtx_unlock(&ctx->DebugMutex);
+   _mesa_unlock_debug_state(ctx);

   return val;
 }
@@ -806,15 +802,11 @@ _mesa_get_debug_state_int(struct gl_context *ctx, GLenum pname)
 void *
 _mesa_get_debug_state_ptr(struct gl_context *ctx, GLenum pname)
 {
-   struct gl_debug_state *debug;
   void *val;
+   struct gl_debug_state *debug = _mesa_lock_debug_state(ctx);

-   mtx_lock(&ctx->DebugMutex);
-   debug = ctx->Debug;
-   if (!debug) {
-      mtx_unlock(&ctx->DebugMutex);
+   if (!debug)
      return NULL;
-   }

   switch (pname) {
   case GL_DEBUG_CALLBACK_FUNCTION_ARB:
@@ -829,7 +821,7 @@ _mesa_get_debug_state_ptr(struct gl_context *ctx, GLenum pname)
      break;
   }

-   mtx_unlock(&ctx->DebugMutex);
+   _mesa_unlock_debug_state(ctx);

   return val;
 }
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -2815,6 +2815,7 @@ reuse_framebuffer_texture_attachment(struct gl_framebuffer *fb,
   dst_att->Complete = src_att->Complete;
   dst_att->TextureLevel = src_att->TextureLevel;
   dst_att->Zoffset = src_att->Zoffset;
+   dst_att->Layered = src_att->Layered;
 }


--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -1055,6 +1055,8 @@ find_custom_value(struct gl_context *ctx, const struct value_desc *d, union valu
      }
      break;
   /* GL_KHR_DEBUG */
+   case GL_DEBUG_OUTPUT:
+   case GL_DEBUG_OUTPUT_SYNCHRONOUS:
   case GL_DEBUG_LOGGED_MESSAGES:
   case GL_DEBUG_NEXT_LOGGED_MESSAGE_LENGTH:
   case GL_DEBUG_GROUP_STACK_DEPTH:
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -126,6 +126,8 @@ descriptor=[
  [ "MAX_TEXTURE_MAX_ANISOTROPY_EXT", "CONTEXT_FLOAT(Const.MaxTextureMaxAnisotropy), extra_EXT_texture_filter_anisotropic" ],

 # GL_KHR_debug (GL 4.3)/ GL_ARB_debug_output
+  [ "DEBUG_OUTPUT", "LOC_CUSTOM, TYPE_BOOLEAN, 0, NO_EXTRA" ],
+  [ "DEBUG_OUTPUT_SYNCHRONOUS", "LOC_CUSTOM, TYPE_BOOLEAN, 0, NO_EXTRA" ],
  [ "DEBUG_LOGGED_MESSAGES", "LOC_CUSTOM, TYPE_INT, 0, NO_EXTRA" ],
  [ "DEBUG_NEXT_LOGGED_MESSAGE_LENGTH", "LOC_CUSTOM, TYPE_INT, 0, NO_EXTRA" ],
  [ "MAX_DEBUG_LOGGED_MESSAGES", "CONST(MAX_DEBUG_LOGGED_MESSAGES), NO_EXTRA" ],
@@ -773,6 +775,7 @@ descriptor=[
  [ "DEPTH_CLAMP", "CONTEXT_BOOL(Transform.DepthClamp), extra_ARB_depth_clamp" ],

 # GL_ATI_fragment_shader
+  [ "FRAGMENT_SHADER_ATI", "CONTEXT_BOOL(ATIFragmentShader.Enabled), extra_ATI_fragment_shader" ],
  [ "NUM_FRAGMENT_REGISTERS_ATI", "CONST(6), extra_ATI_fragment_shader" ],
  [ "NUM_FRAGMENT_CONSTANTS_ATI", "CONST(8), extra_ATI_fragment_shader" ],
  [ "NUM_PASSES_ATI", "CONST(2), extra_ATI_fragment_shader" ],
--- a/src/mesa/main/shaderobj.c
+++ b/src/mesa/main/shaderobj.c
@@ -240,6 +240,8 @@ init_shader_program(struct gl_shader_program *prog)

   prog->TransformFeedback.BufferMode = GL_INTERLEAVED_ATTRIBS;

+   exec_list_make_empty(&prog->EmptyUniformLocations);
+
   prog->InfoLog = ralloc_strdup(prog, "");
 }

--- a/src/mesa/state_tracker/st_cb_fbo.c
+++ b/src/mesa/state_tracker/st_cb_fbo.c
@@ -387,6 +387,7 @@ st_update_renderbuffer_surface(struct st_context *st,
 {
   struct pipe_context *pipe = st->pipe;
   struct pipe_resource *resource = strb->texture;
+   struct st_texture_object *stTexObj = NULL;
   unsigned rtt_width = strb->Base.Width;
   unsigned rtt_height = strb->Base.Height;
   unsigned rtt_depth = strb->Base.Depth;
@@ -398,9 +399,18 @@ st_update_renderbuffer_surface(struct st_context *st,
    */
   boolean enable_srgb = (st->ctx->Color.sRGBEnabled &&
         _mesa_get_format_color_encoding(strb->Base.Format) == GL_SRGB);
-   enum pipe_format format = (enable_srgb) ?
-      util_format_srgb(resource->format) :
-      util_format_linear(resource->format);
+   enum pipe_format format = resource->format;
+
+   if (strb->is_rtt) {
+      stTexObj = st_texture_object(strb->Base.TexImage->TexObject);
+      if (stTexObj->surface_based)
+         format = stTexObj->surface_format;
+   }
+
+   format = (enable_srgb) ?
+      util_format_srgb(format) :
+      util_format_linear(format);
+
   unsigned first_layer, last_layer, level;

   if (resource->target == PIPE_TEXTURE_1D_ARRAY) {
@@ -431,8 +441,8 @@ st_update_renderbuffer_surface(struct st_context *st,

   /* Adjust for texture views */
   if (strb->is_rtt && resource->array_size > 1 &&
-       strb->Base.TexImage->TexObject->Immutable) {
-      struct gl_texture_object *tex = strb->Base.TexImage->TexObject;
+       stTexObj->base.Immutable) {
+      struct gl_texture_object *tex = &stTexObj->base;
      first_layer += tex->MinLayer;
      if (!strb->rtt_layered)
         last_layer += tex->MinLayer;
@@ -492,8 +502,6 @@ st_render_texture(struct gl_context *ctx,

   st_update_renderbuffer_surface(st, strb);

-   strb->Base.Format = st_pipe_format_to_mesa_format(pt->format);
-
   /* Invalidate buffer state so that the pipe's framebuffer state
    * gets updated.
    * That's where the new renderbuffer (which we just created) gets
--- a/src/mesa/state_tracker/st_cb_texture.c
+++ b/src/mesa/state_tracker/st_cb_texture.c
@@ -2886,10 +2886,13 @@ st_finalize_texture(struct gl_context *ctx,
         /* Need to import images in main memory or held in other textures.
          */
         if (stImage && stObj->pt != stImage->pt) {
+            GLuint depth = stObj->depth0;
+            if (stObj->base.Target == GL_TEXTURE_3D)
+               depth = u_minify(depth, level);
            if (level == 0 ||
                (stImage->base.Width == u_minify(stObj->width0, level) &&
                 stImage->base.Height == u_minify(stObj->height0, level) &&
-                 stImage->base.Depth == u_minify(stObj->depth0, level))) {
+                 stImage->base.Depth == depth)) {
               /* src image fits expected dest mipmap level size */
               copy_image_data_to_texture(st, stObj, level, stImage);
            }
--- a/src/mesa/swrast/s_context.c
+++ b/src/mesa/swrast/s_context.c
@@ -900,11 +900,16 @@ void
 _swrast_render_finish( struct gl_context *ctx )
 {
   SWcontext *swrast = SWRAST_CONTEXT(ctx);
+   struct gl_query_object *query = ctx->Query.CurrentOcclusionObject;

   _swrast_flush(ctx);

   if (swrast->Driver.SpanRenderFinish)
      swrast->Driver.SpanRenderFinish( ctx );
+
+   if (query && (query->Target == GL_ANY_SAMPLES_PASSED ||
+                 query->Target == GL_ANY_SAMPLES_PASSED_CONSERVATIVE))
+      query->Result = !!query->Result;
 }