Add release notes for the 10.4.2 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Update version to 10.4.2
2015-01-12 10:30:28 +00:00 · 2015-01-12 10:24:59 +00:00 · 2015-01-07 17:39:52 +00:00 · 2015-01-07 17:35:39 +00:00 · 2015-01-07 17:31:12 +00:00 · 2015-01-07 17:29:01 +00:00
53 changed files with 515 additions and 130 deletions
--- a/2
+++ b/2
@@ -1 +1 @@
-10.4.0
+10.4.2
--- a/docs/index.html
+++ b/docs/index.html
@@ -16,6 +16,13 @@

 <h1>News</h1>

+<h2>December 14, 2014</h2>
+<p>
+<a href="relnotes/10.4.html">Mesa 10.4</a> is released.  This is a new
+development release.  See the release notes for more information about
+the release.
+</p>
+
 <h2>November 8, 2014</h2>
 <p>
 <a href="relnotes/10.3.3.html">Mesa 10.3.3</a> is released.
--- a/docs/relnotes.html
+++ b/docs/relnotes.html
@@ -21,6 +21,7 @@ The release notes summarize what's new or changed in each Mesa release.
 </p>

 <ul>
+<li><a href="relnotes/10.4.html">10.4 release notes</a>
 <li><a href="relnotes/10.3.3.html">10.3.3 release notes</a>
 <li><a href="relnotes/10.3.2.html">10.3.2 release notes</a>
 <li><a href="relnotes/10.3.1.html">10.3.1 release notes</a>
--- a/docs/relnotes/10.4.1.html
+++ b/docs/relnotes/10.4.1.html
@@ -0,0 +1,97 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html lang="en">
+<head>
+  <meta http-equiv="content-type" content="text/html; charset=utf-8">
+  <title>Mesa Release Notes</title>
+  <link rel="stylesheet" type="text/css" href="../mesa.css">
+</head>
+<body>
+
+<div class="header">
+  <h1>The Mesa 3D Graphics Library</h1>
+</div>
+
+<iframe src="../contents.html"></iframe>
+<div class="content">
+
+<h1>Mesa 10.4.1 Release Notes / December 29, 2014</h1>
+
+<p>
+Mesa 10.4.1 is a bug fix release which fixes bugs found since the 10.4.0 release.
+</p>
+<p>
+Mesa 10.4.1 implements the OpenGL 3.3 API, but the version reported by
+glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
+glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
+Some drivers don't support all the features required in OpenGL 3.3.  OpenGL
+3.3 is <strong>only</strong> available if requested at context creation
+because compatibility contexts are not supported.
+</p>
+
+<h2>SHA256 checksums</h2>
+<pre>
+5311285e791a6bfaa468ad002bd1e1164acb3eaa040b5a1bf958bdb7c27e0a9d  MesaLib-10.4.1.tar.gz
+91e8b71c8aff4cb92022a09a872b1c5d1ae5bfec8c6c84dbc4221333da5bf1ca  MesaLib-10.4.1.tar.bz2
+e09c8135f5a86ecb21182c6f8959aafd39ae2f98858fdf7c0e25df65b5abcdb8  MesaLib-10.4.1.zip
+</pre>
+
+<h2>New features</h2>
+<p>None</p>
+
+<h2>Bug fixes</h2>
+
+<p>This list is likely incomplete.</p>
+
+<ul>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=82585">Bug 82585</a> - geometry shader with optional out variable segfaults</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=82991">Bug 82991</a> - Inverted bumpmap in webgl applications</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=83908">Bug 83908</a> - [i965] Incorrect icon colors in Steam Big Picture</li>
+
+</ul>
+
+
+<h2>Changes</h2>
+
+<p>Andres Gomez (1):</p>
+<ul>
+  <li>i965/brw_reg: struct constructor now needs explicit negate and abs values.</li>
+</ul>
+
+<p>Cody Northrop (1):</p>
+<ul>
+  <li>i965: Require pixel alignment for GPU copy blit</li>
+</ul>
+
+<p>Emil Velikov (3):</p>
+<ul>
+  <li>docs: Add 10.4 sha256 sums, news item and link release notes</li>
+  <li>Revert "glx/dri3: Request non-vsynced Present for swapinterval zero. (v3)"</li>
+  <li>Update version to 10.4.1</li>
+</ul>
+
+<p>Ian Romanick (2):</p>
+<ul>
+  <li>linker: Wrap access of producer_var with a NULL check</li>
+  <li>linker: Assign varying locations geometry shader inputs for SSO</li>
+</ul>
+
+<p>Mario Kleiner (4):</p>
+<ul>
+  <li>glx/dri3: Fix glXWaitForSbcOML() to handle targetSBC==0 correctly. (v2)</li>
+  <li>glx/dri3: Track separate (ust, msc) for PresentPixmap vs. PresentNotifyMsc (v2)</li>
+  <li>glx/dri3: Request non-vsynced Present for swapinterval zero. (v3)</li>
+  <li>glx/dri3: Don't fail on glXSwapBuffersMscOML(dpy, window, 0, 0, 0) (v2)</li>
+</ul>
+
+<p>Maxence Le Doré (1):</p>
+<ul>
+  <li>glsl: Add gl_MaxViewports to available builtin constants</li>
+</ul>
+
+
+</div>
+</body>
+</html>
--- a/docs/relnotes/10.4.2.html
+++ b/docs/relnotes/10.4.2.html
@@ -0,0 +1,125 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html lang="en">
+<head>
+  <meta http-equiv="content-type" content="text/html; charset=utf-8">
+  <title>Mesa Release Notes</title>
+  <link rel="stylesheet" type="text/css" href="../mesa.css">
+</head>
+<body>
+
+<div class="header">
+  <h1>The Mesa 3D Graphics Library</h1>
+</div>
+
+<iframe src="../contents.html"></iframe>
+<div class="content">
+
+<h1>Mesa 10.4.2 Release Notes / January 12, 2015</h1>
+
+<p>
+Mesa 10.4.2 is a bug fix release which fixes bugs found since the 10.4.1 release.
+</p>
+<p>
+Mesa 10.4.2 implements the OpenGL 3.3 API, but the version reported by
+glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
+glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
+Some drivers don't support all the features required in OpenGL 3.3.  OpenGL
+3.3 is <strong>only</strong> available if requested at context creation
+because compatibility contexts are not supported.
+</p>
+
+<h2>SHA256 checksums</h2>
+<pre>
+TBD
+</pre>
+
+<h2>New features</h2>
+<p>None</p>
+
+<h2>Bug fixes</h2>
+
+<p>This list is likely incomplete.</p>
+
+<ul>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=85529">Bug 85529</a> - Surfaces not drawn in Unvanquished</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=87619">Bug 87619</a> - Changes to state such as render targets change fragment shader without marking it dirty.</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=87658">Bug 87658</a> - [llvmpipe] SEGV in sse2_has_daz on ancient Pentium4-M</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=87913">Bug 87913</a> - CPU cacheline size of 0 can be returned by CPUID leaf 0x80000006 in some virtual machines</li>
+
+</ul>
+
+
+<h2>Changes</h2>
+
+<p>Chad Versace (2):</p>
+<ul>
+  <li>i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()</li>
+  <li>i965: Use safer pointer arithmetic in gather_oa_results()</li>
+</ul>
+
+<p>Dave Airlie (3):</p>
+<ul>
+  <li>Revert "r600g/sb: fix issues cause by GLSL switching to loops for switch"</li>
+  <li>r600g: fix regression since UCMP change</li>
+  <li>r600g/sb: implement r600 gpr index workaround. (v3.1)</li>
+</ul>
+
+<p>Emil Velikov (2):</p>
+<ul>
+  <li>docs: Add sha256 sums for the 10.4.1 release</li>
+  <li>Update version to 10.4.2</li>
+</ul>
+
+<p>Ilia Mirkin (2):</p>
+<ul>
+  <li>nv50,nvc0: set vertex id base to index_bias</li>
+  <li>nv50/ir: fix texture offsets in release builds</li>
+</ul>
+
+<p>Kenneth Graunke (2):</p>
+<ul>
+  <li>i965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms.</li>
+  <li>i965: Fix start/base_vertex_location for &gt;1 prims but !BRW_NEW_VERTICES.</li>
+</ul>
+
+<p>Leonid Shatz (1):</p>
+<ul>
+  <li>gallium/util: make sure cache line size is not zero</li>
+</ul>
+
+<p>Marek Olšák (4):</p>
+<ul>
+  <li>glsl_to_tgsi: fix a bug in copy propagation</li>
+  <li>vbo: ignore primitive restart if FixedIndex is enabled in DrawArrays</li>
+  <li>st/mesa: fix GL_PRIMITIVE_RESTART_FIXED_INDEX</li>
+  <li>radeonsi: fix VertexID for OpenGL</li>
+</ul>
+
+<p>Michel Dänzer (1):</p>
+<ul>
+  <li>radeonsi: Don't modify PA_SC_RASTER_CONFIG register value if rb_mask == 0</li>
+</ul>
+
+<p>Roland Scheidegger (1):</p>
+<ul>
+  <li>gallium/util: fix crash with daz detection on x86</li>
+</ul>
+
+<p>Tiziano Bacocco (1):</p>
+<ul>
+  <li>nv50,nvc0: implement half_pixel_center</li>
+</ul>
+
+<p>Vadim Girlin (1):</p>
+<ul>
+  <li>r600g/sb: fix issues with loops created for switch</li>
+</ul>
+
+
+</div>
+</body>
+</html>
--- a/docs/relnotes/10.4.html
+++ b/docs/relnotes/10.4.html
@@ -31,9 +31,11 @@ because compatibility contexts are not supported.
 </p>


-<h2>MD5 checksums</h2>
+<h2>SHA256 checksums</h2>
 <pre>
-TBD.
+abfbfd2d91ce81491c5bb6923ae649212ad5f82d0bee277de8704cc948dc221e  MesaLib-10.4.0.tar.gz
+98a7dff3a1a6708c79789de8b9a05d8042e867067f70e8f30387c15026233219  MesaLib-10.4.0.tar.bz2
+443a6d46d0691b5ac811d8d30091b1716c365689b16d49c57cf273c2b76086fe  MesaLib-10.4.0.zip
 </pre>


--- a/src/gallium/auxiliary/util/u_cpu_detect.c
+++ b/src/gallium/auxiliary/util/u_cpu_detect.c
@@ -272,7 +272,7 @@ static INLINE uint64_t xgetbv(void)


 #if defined(PIPE_ARCH_X86)
-static INLINE boolean sse2_has_daz(void)
+PIPE_ALIGN_STACK static INLINE boolean sse2_has_daz(void)
 {
   struct {
      uint32_t pad1[7];
@@ -409,8 +409,12 @@ util_cpu_detect(void)
      }

      if (regs[0] >= 0x80000006) {
+         /* should we really do this if the clflush size above worked? */
+         unsigned int cacheline;
         cpuid(0x80000006, regs2);
-         util_cpu_caps.cacheline = regs2[2] & 0xFF;
+         cacheline = regs2[2] & 0xFF;
+         if (cacheline > 0)
+            util_cpu_caps.cacheline = cacheline;
      }

      if (!util_cpu_caps.has_sse) {
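
For illustration, the cacheline guard in the hunk above can be reduced to a standalone C sketch. The helper name `update_cacheline` and its parameters are hypothetical and do not exist in Mesa. It only models the behaviour the hunk introduces: the low byte of the third CPUID register word (`regs2[2]` above) replaces the stored value only when it is non-zero, which is the case reported in bug 87913.

```c
#include <stdint.h>

/* Hypothetical helper, not part of Mesa: returns the cache line size to
 * keep, given the currently stored value and the raw register word read
 * from CPUID leaf 0x80000006 (regs2[2] in the hunk). */
static unsigned
update_cacheline(unsigned current, uint32_t reg)
{
   unsigned cacheline = reg & 0xFF;   /* low byte carries the line size */

   /* Some virtual machines report 0 here; keep the earlier value then. */
   if (cacheline > 0)
      return cacheline;
   return current;
}
```

A call such as `update_cacheline(32, 0x01006000)` leaves the earlier value of 32 in place, whereas a non-zero low byte overrides it.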
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
@@ -772,7 +772,8 @@ NV50LoweringPreSSA::handleTEX(TexInstruction *i)
   if (i->tex.useOffsets) {
      for (int c = 0; c < 3; ++c) {
         ImmediateValue val;
-         assert(i->offset[0][c].getImmediate(val));
+         if (!i->offset[0][c].getImmediate(val))
+            assert(!"non-immediate offset");
         i->tex.offset[c] = val.reg.data.u32;
         i->offset[0][c].set(NULL);
      }
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -754,7 +754,8 @@ NVC0LoweringPass::handleTEX(TexInstruction *i)
         assert(i->tex.useOffsets == 1);
         for (c = 0; c < 3; ++c) {
            ImmediateValue val;
-            assert(i->offset[0][c].getImmediate(val));
+            if (!i->offset[0][c].getImmediate(val))
+               assert(!"non-immediate offset passed to non-TXG");
            imm |= (val.reg.data.u32 & 0xf) << (c * 4);
         }
         if (i->op == OP_TXD && chipset >= NVISA_GK104_CHIPSET) {
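
The two `handleTEX` hunks above move a call with a side effect out of `assert()`. A minimal C sketch of why that matters follows. `get_immediate`, `offset_unsafe` and `offset_safe` are hypothetical stand-ins, not nouveau functions. The underlying rule is standard C: when `NDEBUG` is defined, `assert(expr)` expands to nothing and `expr` is never evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for getImmediate(): stores the value through
 * `out` and reports whether it succeeded. */
static bool
get_immediate(uint32_t src, uint32_t *out)
{
   *out = src;
   return true;
}

/* Old pattern: in a release build (NDEBUG defined) the whole assert()
 * disappears, get_immediate() is never called and `val` stays 0. */
static uint32_t
offset_unsafe(uint32_t src)
{
   uint32_t val = 0;
   assert(get_immediate(src, &val));
   return val;
}

/* Pattern used by the hunks: the call always runs; only the diagnostic
 * is compiled out in release builds. */
static uint32_t
offset_safe(uint32_t src)
{
   uint32_t val = 0;
   if (!get_immediate(src, &val))
      assert(!"non-immediate offset");
   return val;
}
```

With assertions enabled both helpers return the same value. Only `offset_safe` keeps doing so when the file is compiled with `-DNDEBUG`.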
--- a/src/gallium/drivers/nouveau/nv50/nv50_3d.xml.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_3d.xml.h
@@ -1708,7 +1708,7 @@ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 #define NV50_3D_CULL_FACE_BACK					0x00000405
 #define NV50_3D_CULL_FACE_FRONT_AND_BACK			0x00000408

-#define NV50_3D_LINE_LAST_PIXEL					0x00001924
+#define NV50_3D_PIXEL_CENTER_INTEGER					0x00001924

 #define NVA3_3D_FP_MULTISAMPLE					0x00001928
 #define NVA3_3D_FP_MULTISAMPLE_EXPORT_SAMPLE_MASK		0x00000001
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -461,8 +461,6 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
   PUSH_DATA (push, 0);
   BEGIN_NV04(push, NV50_3D(PRIM_RESTART_WITH_DRAW_ARRAYS), 1);
   PUSH_DATA (push, 1);
-   BEGIN_NV04(push, NV50_3D(LINE_LAST_PIXEL), 1);
-   PUSH_DATA (push, 0);
   BEGIN_NV04(push, NV50_3D(BLEND_SEPARATE_ALPHA), 1);
   PUSH_DATA (push, 1);

@@ -609,6 +607,13 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
   BEGIN_NV04(push, NV50_3D(EDGEFLAG), 1);
   PUSH_DATA (push, 1);

+   BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
+   PUSH_DATA (push, 0);
+   if (screen->base.class_3d >= NV84_3D_CLASS) {
+      BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
+      PUSH_DATA (push, 0);
+   }
+
   PUSH_KICK (push);
 }

--- a/src/gallium/drivers/nouveau/nv50/nv50_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c
@@ -57,10 +57,6 @@
 *  ! pipe_rasterizer_state.flatshade_first also applies to QUADS
 *    (There's a GL query for that, forcing an exception is just ridiculous.)
 *
- *  ! pipe_rasterizer_state.half_pixel_center is ignored - pixel centers
- *     are always at half integer coordinates and the top-left rule applies
- *    (There does not seem to be a hardware switch for this.)
- *
 *  ! pipe_rasterizer_state.sprite_coord_enable is masked with 0xff on NVC0
 *    (The hardware only has 8 slots meant for TexCoord and we have to assign
 *     in advance to maintain elegant separate shader objects.)
@@ -221,7 +217,7 @@ nv50_blend_state_delete(struct pipe_context *pipe, void *hwcso)
   FREE(hwcso);
 }

-/* NOTE: ignoring line_last_pixel, using FALSE (set on screen init) */
+/* NOTE: ignoring line_last_pixel */
 static void *
 nv50_rasterizer_state_create(struct pipe_context *pipe,
                             const struct pipe_rasterizer_state *cso)
@@ -336,6 +332,9 @@ nv50_rasterizer_state_create(struct pipe_context *pipe,
   SB_BEGIN_3D(so, DEPTH_CLIP_NEGATIVE_Z, 1);
   SB_DATA    (so, cso->clip_halfz);

+   SB_BEGIN_3D(so, PIXEL_CENTER_INTEGER, 1);
+   SB_DATA    (so, !cso->half_pixel_center);
+
   assert(so->size <= (sizeof(so->state) / sizeof(so->state[0])));
   return (void *)so;
 }
--- a/src/gallium/drivers/nouveau/nv50/nv50_stateobj.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_stateobj.h
@@ -25,7 +25,7 @@ struct nv50_blend_stateobj {
 struct nv50_rasterizer_stateobj {
   struct pipe_rasterizer_state pipe;
   int size;
-   uint32_t state[48];
+   uint32_t state[49];
 };

 struct nv50_zsa_stateobj {
--- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
@@ -472,6 +472,10 @@ nv50_draw_arrays(struct nv50_context *nv50,
   if (nv50->state.index_bias) {
      BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
      PUSH_DATA (push, 0);
+      if (nv50->screen->base.class_3d >= NV84_3D_CLASS) {
+         BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
+         PUSH_DATA (push, 0);
+      }
      nv50->state.index_bias = 0;
   }

@@ -594,6 +598,10 @@ nv50_draw_elements(struct nv50_context *nv50, boolean shorten,
   if (index_bias != nv50->state.index_bias) {
      BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
      PUSH_DATA (push, index_bias);
+      if (nv50->screen->base.class_3d >= NV84_3D_CLASS) {
+         BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
+         PUSH_DATA (push, index_bias);
+      }
      nv50->state.index_bias = index_bias;
   }

--- a/src/gallium/drivers/nouveau/nvc0/mme/com9097.mme
+++ b/src/gallium/drivers/nouveau/nvc0/mme/com9097.mme
@@ -227,6 +227,7 @@ locn_0f_ts:
 /* NVC0_3D_MACRO_DRAW_ELEMENTS_INDIRECT
 *
 * NOTE: Saves and restores VB_ELEMENT,INSTANCE_BASE.
+ * Forcefully sets VERTEX_ID_BASE to the value of VB_ELEMENT_BASE.
 *
 * arg     = mode
 * parm[0] = count
@@ -247,6 +248,8 @@ locn_0f_ts:
   maddr 0x150d /* VB_ELEMENT,INSTANCE_BASE */
   send $r4
   send $r5
+   maddr 0x446
+   send $r4
   mov $r4 0x1
 dei_again:
   maddr 0x586 /* VERTEX_BEGIN_GL */
@@ -258,8 +261,10 @@ dei_again:
   branz $r2 #dei_again
   mov $r1 (extrinsrt $r1 $r4 0 1 26) /* set INSTANCE_NEXT */
   maddr 0x150d /* VB_ELEMENT,INSTANCE_BASE */
-   exit send $r6
+   send $r6
   send $r7
+   exit maddr 0x446
+   send $r6
 dei_end:
   exit
   nop
--- a/src/gallium/drivers/nouveau/nvc0/mme/com9097.mme.h
+++ b/src/gallium/drivers/nouveau/nvc0/mme/com9097.mme.h
@@ -128,16 +128,18 @@ uint32_t mme9097_draw_elts_indirect[] = {
 	0x00000301,
 	0x00000201,
 	0x017dc451,
-/* 0x000c: dei_again */
+/* 0x000e: dei_again */
 	0x00002431,
-	0x0004d007,
-/* 0x0017: dei_end */
+	0x0005d007,
 	0x00000501,
+/* 0x001b: dei_end */
 	0x01434615,
 	0x01438715,
 	0x05434021,
 	0x00002041,
 	0x00002841,
+	0x01118021,
+	0x00002041,
 	0x00004411,
 	0x01618021,
 	0x00000841,
@@ -148,8 +150,10 @@ uint32_t mme9097_draw_elts_indirect[] = {
 	0xfffe9017,
 	0xd0410912,
 	0x05434021,
-	0x000030c1,
+	0x00003041,
 	0x00003841,
+	0x011180a1,
+	0x00003041,
 	0x00000091,
 	0x00000011,
 };
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_3d.xml.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_3d.xml.h
@@ -1041,7 +1041,7 @@ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 #define NVC0_3D_CULL_FACE_BACK					0x00000405
 #define NVC0_3D_CULL_FACE_FRONT_AND_BACK			0x00000408

-#define NVC0_3D_LINE_LAST_PIXEL					0x00001924
+#define NVC0_3D_PIXEL_CENTER_INTEGER					0x00001924

 #define NVC0_3D_VIEWPORT_TRANSFORM_EN				0x0000192c

--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -786,8 +786,6 @@ nvc0_screen_create(struct nouveau_device *dev)
   PUSH_DATA (push, 0);
   BEGIN_NVC0(push, NVC0_3D(LINE_WIDTH_SEPARATE), 1);
   PUSH_DATA (push, 1);
-   BEGIN_NVC0(push, NVC0_3D(LINE_LAST_PIXEL), 1);
-   PUSH_DATA (push, 0);
   BEGIN_NVC0(push, NVC0_3D(PRIM_RESTART_WITH_DRAW_ARRAYS), 1);
   PUSH_DATA (push, 1);
   BEGIN_NVC0(push, NVC0_3D(BLEND_SEPARATE_ALPHA), 1);
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
@@ -204,7 +204,7 @@ nvc0_blend_state_delete(struct pipe_context *pipe, void *hwcso)
    FREE(hwcso);
 }

-/* NOTE: ignoring line_last_pixel, using FALSE (set on screen init) */
+/* NOTE: ignoring line_last_pixel */
 static void *
 nvc0_rasterizer_state_create(struct pipe_context *pipe,
                             const struct pipe_rasterizer_state *cso)
@@ -315,6 +315,8 @@ nvc0_rasterizer_state_create(struct pipe_context *pipe,

    SB_IMMED_3D(so, DEPTH_CLIP_NEGATIVE_Z, cso->clip_halfz);

+    SB_IMMED_3D(so, PIXEL_CENTER_INTEGER, !cso->half_pixel_center);
+
    assert(so->size <= (sizeof(so->state) / sizeof(so->state[0])));
    return (void *)so;
 }
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_stateobj.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_stateobj.h
@@ -23,7 +23,7 @@ struct nvc0_blend_stateobj {
 struct nvc0_rasterizer_stateobj {
   struct pipe_rasterizer_state pipe;
   int size;
-   uint32_t state[43];
+   uint32_t state[44];
 };

 struct nvc0_zsa_stateobj {
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
@@ -575,8 +575,9 @@ nvc0_draw_arrays(struct nvc0_context *nvc0,
   if (nvc0->state.index_bias) {
      /* index_bias is implied 0 if !info->indexed (really ?) */
      /* TODO: can we deactivate it for the VERTEX_BUFFER_FIRST command ? */
-      PUSH_SPACE(push, 1);
+      PUSH_SPACE(push, 2);
      IMMED_NVC0(push, NVC0_3D(VB_ELEMENT_BASE), 0);
+      IMMED_NVC0(push, NVC0_3D(VERTEX_ID), 0);
      nvc0->state.index_bias = 0;
   }

@@ -705,9 +706,11 @@ nvc0_draw_elements(struct nvc0_context *nvc0, boolean shorten,
   prim = nvc0_prim_gl(mode);

   if (index_bias != nvc0->state.index_bias) {
-      PUSH_SPACE(push, 2);
+      PUSH_SPACE(push, 4);
      BEGIN_NVC0(push, NVC0_3D(VB_ELEMENT_BASE), 1);
      PUSH_DATA (push, index_bias);
+      BEGIN_NVC0(push, NVC0_3D(VERTEX_ID), 1);
+      PUSH_DATA (push, index_bias);
      nvc0->state.index_bias = index_bias;
   }

@@ -818,6 +821,7 @@ nvc0_draw_indirect(struct nvc0_context *nvc0, const struct pipe_draw_info *info)
      if (nvc0->state.index_bias) {
         /* index_bias is implied 0 if !info->indexed (really ?) */
         IMMED_NVC0(push, NVC0_3D(VB_ELEMENT_BASE), 0);
+         IMMED_NVC0(push, NVC0_3D(VERTEX_ID), 0);
         nvc0->state.index_bias = 0;
      }
      size = 4 * 4;
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -6071,7 +6071,7 @@ static int tgsi_ucmp(struct r600_shader_ctx *ctx)
 			continue;

 		memset(&alu, 0, sizeof(struct r600_bytecode_alu));
-		alu.op = ALU_OP3_CNDGE_INT;
+		alu.op = ALU_OP3_CNDE_INT;
 		r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
 		r600_bytecode_src(&alu.src[1], &ctx->src[2], i);
 		r600_bytecode_src(&alu.src[2], &ctx->src[1], i);
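
As a rough model of the one-line opcode change above: TGSI `UCMP` selects `src1` when `src0` is non-zero and `src2` otherwise. The sketch below assumes that `CNDE_INT` selects its second operand when the first equals zero and that `CNDGE_INT` selects it when the first is greater than or equal to zero. These semantics are an assumption stated for illustration and are not taken from this diff. All function names are hypothetical.

```c
#include <stdint.h>

/* Reference behaviour of TGSI UCMP: dst = src0 ? src1 : src2. */
static int32_t
ucmp_ref(int32_t s0, int32_t s1, int32_t s2)
{
   return s0 != 0 ? s1 : s2;
}

/* Assumed ALU semantics (illustrative models, not driver code). */
static int32_t
cnde_int(int32_t a, int32_t b, int32_t c)
{
   return a == 0 ? b : c;
}

static int32_t
cndge_int(int32_t a, int32_t b, int32_t c)
{
   return a >= 0 ? b : c;
}

/* Operand order from the hunk: ALU src[1] gets TGSI src2 and ALU src[2]
 * gets TGSI src1. */
static int32_t
ucmp_via_cnde(int32_t s0, int32_t s1, int32_t s2)
{
   return cnde_int(s0, s2, s1);
}

static int32_t
ucmp_via_cndge(int32_t s0, int32_t s1, int32_t s2)
{
   return cndge_int(s0, s2, s1);
}
```

Under these assumptions `ucmp_via_cnde` agrees with `ucmp_ref` for every input, while `ucmp_via_cndge` picks the wrong operand for any positive `s0`.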
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -616,6 +616,8 @@ public:
 	unsigned num_slots;
 	bool uses_mova_gpr;

+	bool r6xx_gpr_index_workaround;
+
 	bool stack_workaround_8xx;
 	bool stack_workaround_9xx;

--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -38,6 +38,18 @@

 namespace r600_sb {

+void bc_finalizer::insert_rv6xx_load_ar_workaround(alu_group_node *b4) {
+
+	alu_group_node *g = sh.create_alu_group();
+	alu_node *a = sh.create_alu();
+
+	a->bc.set_op(ALU_OP0_NOP);
+	a->bc.last = 1;
+
+	g->push_back(a);
+	b4->insert_before(g);
+}
+
 int bc_finalizer::run() {

 	run_on(sh.root);
@@ -46,22 +58,15 @@ int bc_finalizer::run() {
 	for (regions_vec::reverse_iterator I = rv.rbegin(), E = rv.rend(); I != E;
 			++I) {
 		region_node *r = *I;
-		bool is_if = false;
+
 		assert(r);

-		assert(r->first);
-		if (r->first->is_container()) {
-			container_node *repdep1 = static_cast<container_node*>(r->first);
-			assert(repdep1->is_depart() || repdep1->is_repeat());
-			if_node *n_if = static_cast<if_node*>(repdep1->first);
-			if (n_if && n_if->is_if())
-				is_if = true;
-		}
+		bool loop = r->is_loop();

-		if (is_if)
-			finalize_if(r);
-		else
+		if (loop)
 			finalize_loop(r);
+		else
+			finalize_if(r);

 		r->expand();
 	}
@@ -117,35 +122,20 @@ int bc_finalizer::run() {

 void bc_finalizer::finalize_loop(region_node* r) {

+	update_nstack(r);
+
 	cf_node *loop_start = sh.create_cf(CF_OP_LOOP_START_DX10);
 	cf_node *loop_end = sh.create_cf(CF_OP_LOOP_END);
-	bool has_instr = false;

-	if (!r->is_loop()) {
-		for (depart_vec::iterator I = r->departs.begin(), E = r->departs.end();
-		     I != E; ++I) {
-			depart_node *dep = *I;
-			if (!dep->empty()) {
-				has_instr = true;
-				break;
-			}
-		}
-	} else
-		has_instr = true;
-
-	if (has_instr) {
-		loop_start->jump_after(loop_end);
-		loop_end->jump_after(loop_start);
-	}
+	loop_start->jump_after(loop_end);
+	loop_end->jump_after(loop_start);

 	for (depart_vec::iterator I = r->departs.begin(), E = r->departs.end();
 			I != E; ++I) {
 		depart_node *dep = *I;
-		if (has_instr) {
-			cf_node *loop_break = sh.create_cf(CF_OP_LOOP_BREAK);
-			loop_break->jump(loop_end);
-			dep->push_back(loop_break);
-		}
+		cf_node *loop_break = sh.create_cf(CF_OP_LOOP_BREAK);
+		loop_break->jump(loop_end);
+		dep->push_back(loop_break);
 		dep->expand();
 	}

@@ -161,10 +151,8 @@ void bc_finalizer::finalize_loop(region_node* r) {
 		rep->expand();
 	}

-	if (has_instr) {
-		r->push_front(loop_start);
-		r->push_back(loop_end);
-	}
+	r->push_front(loop_start);
+	r->push_back(loop_end);
 }

 void bc_finalizer::finalize_if(region_node* r) {
@@ -235,12 +223,12 @@ void bc_finalizer::finalize_if(region_node* r) {
 }

 void bc_finalizer::run_on(container_node* c) {
-
+	node *prev_node = NULL;
 	for (node_iterator I = c->begin(), E = c->end(); I != E; ++I) {
 		node *n = *I;

 		if (n->is_alu_group()) {
-			finalize_alu_group(static_cast<alu_group_node*>(n));
+			finalize_alu_group(static_cast<alu_group_node*>(n), prev_node);
 		} else {
 			if (n->is_alu_clause()) {
 				cf_node *c = static_cast<cf_node*>(n);
@@ -275,17 +263,22 @@ void bc_finalizer::run_on(container_node* c) {
 			if (n->is_container())
 				run_on(static_cast<container_node*>(n));
 		}
+		prev_node = n;
 	}
 }

-void bc_finalizer::finalize_alu_group(alu_group_node* g) {
+void bc_finalizer::finalize_alu_group(alu_group_node* g, node *prev_node) {

 	alu_node *last = NULL;
+	alu_group_node *prev_g = NULL;
+	bool add_nop = false;
+	if (prev_node && prev_node->is_alu_group()) {
+		prev_g = static_cast<alu_group_node*>(prev_node);
+	}

 	for (node_iterator I = g->begin(), E = g->end(); I != E; ++I) {
 		alu_node *n = static_cast<alu_node*>(*I);
 		unsigned slot = n->bc.slot;
-
 		value *d = n->dst.empty() ? NULL : n->dst[0];

 		if (d && d->is_special_reg()) {
@@ -323,17 +316,22 @@ void bc_finalizer::finalize_alu_group(alu_group_node* g) {

 		update_ngpr(n->bc.dst_gpr);

-		finalize_alu_src(g, n);
+		add_nop |= finalize_alu_src(g, n, prev_g);

 		last = n;
 	}

+	if (add_nop) {
+		if (sh.get_ctx().r6xx_gpr_index_workaround) {
+			insert_rv6xx_load_ar_workaround(g);
+		}
+	}
 	last->bc.last = 1;
 }

-void bc_finalizer::finalize_alu_src(alu_group_node* g, alu_node* a) {
+bool bc_finalizer::finalize_alu_src(alu_group_node* g, alu_node* a, alu_group_node *prev) {
 	vvec &sv = a->src;
-
+	bool add_nop = false;
 	FBC_DUMP(
 		sblog << "finalize_alu_src: ";
 		dump::dump_op(a);
@@ -360,6 +358,15 @@ void bc_finalizer::finalize_alu_src(alu_group_node* g, alu_node* a) {
 			if (!v->rel->is_const()) {
 				src.rel = 1;
 				update_ngpr(v->array->gpr.sel() + v->array->array_size -1);
+				if (prev && !add_nop) {
+					for (node_iterator pI = prev->begin(), pE = prev->end(); pI != pE; ++pI) {
+						alu_node *pn = static_cast<alu_node*>(*pI);
+						if (pn->bc.dst_gpr == src.sel) {
+							add_nop = true;
+							break;
+						}
+					}
+				}
 			} else
 				src.rel = 0;

@@ -417,11 +424,23 @@ void bc_finalizer::finalize_alu_src(alu_group_node* g, alu_node* a) {
 			assert(!"unknown value kind");
 			break;
 		}
+		if (prev && !add_nop) {
+			for (node_iterator pI = prev->begin(), pE = prev->end(); pI != pE; ++pI) {
+				alu_node *pn = static_cast<alu_node*>(*pI);
+				if (pn->bc.dst_rel) {
+					if (pn->bc.dst_gpr == src.sel) {
+						add_nop = true;
+						break;
+					}
+				}
+			}
+		}
 	}

 	while (si < 3) {
 		a->bc.src[si++].sel = 0;
 	}
+	return add_nop;
 }

 void bc_finalizer::copy_fetch_src(fetch_node &dst, fetch_node &src, unsigned arg_start)
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -758,6 +758,8 @@ int bc_parser::prepare_loop(cf_node* c) {
 	c->insert_before(reg);
 	rep->move(c, end->next);

+	reg->src_loop = true;
+
 	loop_stack.push(reg);
 	return 0;
 }
--- a/src/gallium/drivers/r600/sb/sb_context.cpp
+++ b/src/gallium/drivers/r600/sb/sb_context.cpp
@@ -61,6 +61,8 @@ int sb_context::init(r600_isa *isa, sb_hw_chip chip, sb_hw_class cclass) {

 	uses_mova_gpr = is_r600() && chip != HW_CHIP_RV670;

+	r6xx_gpr_index_workaround = is_r600() && chip != HW_CHIP_RV670 && chip != HW_CHIP_RS780 && chip != HW_CHIP_RS880;
+
 	switch (chip) {
 	case HW_CHIP_RV610:
 	case HW_CHIP_RS780:
--- a/src/gallium/drivers/r600/sb/sb_if_conversion.cpp
+++ b/src/gallium/drivers/r600/sb/sb_if_conversion.cpp
@@ -115,13 +115,13 @@ void if_conversion::convert_kill_instructions(region_node *r,
 bool if_conversion::check_and_convert(region_node *r) {

 	depart_node *nd1 = static_cast<depart_node*>(r->first);
-	if (!nd1->is_depart())
+	if (!nd1->is_depart() || nd1->target != r)
 		return false;
 	if_node *nif = static_cast<if_node*>(nd1->first);
 	if (!nif->is_if())
 		return false;
 	depart_node *nd2 = static_cast<depart_node*>(nif->first);
-	if (!nd2->is_depart())
+	if (!nd2->is_depart() || nd2->target != r)
 		return false;

 	value* &em = nif->cond;
--- a/src/gallium/drivers/r600/sb/sb_ir.h
+++ b/src/gallium/drivers/r600/sb/sb_ir.h
@@ -1089,7 +1089,8 @@ typedef std::vector<repeat_node*> repeat_vec;
 class region_node : public container_node {
 protected:
 	region_node(unsigned id) : container_node(NT_REGION, NST_LIST), region_id(id),
-			loop_phi(), phi(), vars_defined(), departs(), repeats() {}
+			loop_phi(), phi(), vars_defined(), departs(), repeats(), src_loop()
+			{}
 public:
 	unsigned region_id;

@@ -1101,12 +1102,16 @@ public:
 	depart_vec departs;
 	repeat_vec repeats;

+	// true if region was created for loop in the parser, sometimes repeat_node
+	// may be optimized away so we need to remember this information
+	bool src_loop;
+
 	virtual bool accept(vpass &p, bool enter);

 	unsigned dep_count() { return departs.size(); }
 	unsigned rep_count() { return repeats.size() + 1; }

-	bool is_loop() { return !repeats.empty(); }
+	bool is_loop() { return src_loop || !repeats.empty(); }

 	container_node* get_entry_code_location() {
 		node *p = first;
--- a/src/gallium/drivers/r600/sb/sb_pass.h
+++ b/src/gallium/drivers/r600/sb/sb_pass.h
@@ -695,8 +695,9 @@ public:

 	void run_on(container_node *c);

-	void finalize_alu_group(alu_group_node *g);
-	void finalize_alu_src(alu_group_node *g, alu_node *a);
+	void insert_rv6xx_load_ar_workaround(alu_group_node *b4);
+	void finalize_alu_group(alu_group_node *g, node *prev_node);
+	bool finalize_alu_src(alu_group_node *g, alu_node *a, alu_group_node *prev_node);

 	void emit_set_grad(fetch_node* f);
 	void finalize_fetch(fetch_node *f);
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -1527,6 +1527,9 @@ bool post_scheduler::check_copy(node *n) {

 	if (!s->is_prealloc()) {
 		recolor_local(s);
+
+		if (!s->chunk || s->chunk != d->chunk)
+			return false;
 	}

 	if (s->gpr == d->gpr) {
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -590,8 +590,11 @@ static void declare_system_value(
 		break;

 	case TGSI_SEMANTIC_VERTEXID:
-		value = LLVMGetParam(radeon_bld->main_fn,
-				     si_shader_ctx->param_vertex_id);
+		value = LLVMBuildAdd(gallivm->builder,
+				     LLVMGetParam(radeon_bld->main_fn,
+						  si_shader_ctx->param_vertex_id),
+				     LLVMGetParam(radeon_bld->main_fn,
+						  SI_PARAM_BASE_VERTEX), "");
 		break;

 	case TGSI_SEMANTIC_SAMPLEID:
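
The `TGSI_SEMANTIC_VERTEXID` hunk above adds the base vertex to the vertex id parameter, and the nouveau hunks earlier in this diff program a vertex id base equal to `index_bias` for the same reason. In OpenGL, `gl_VertexID` for a base-vertex draw is the fetched index plus the base vertex. The sketch below is a hypothetical model of that arithmetic only. It assumes, as the hunk implies, that the raw parameter does not already include the base vertex.

```c
#include <stdint.h>

/* Hypothetical model, not driver code: raw_id is the vertex id as the
 * shader input provides it, base_vertex is the draw's base vertex. */
static uint32_t
vertex_id(uint32_t raw_id, int32_t base_vertex)
{
   /* Unsigned addition wraps, so a negative base vertex also works. */
   return raw_id + (uint32_t)base_vertex;
}
```

For a draw with a base vertex of 100, the element with raw id 3 is seen by the shader as vertex 103.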
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -3281,8 +3281,10 @@ void si_init_config(struct si_context *sctx)
 			break;
 		}

-		/* Always use the default config when all backends are enabled. */
-		if (rb_mask && util_bitcount(rb_mask) >= num_rb) {
+		/* Always use the default config when all backends are enabled
+		 * (or when we failed to determine the enabled backends).
+		 */
+		if (!rb_mask || util_bitcount(rb_mask) >= num_rb) {
 			si_pm4_set_reg(pm4, R_028350_PA_SC_RASTER_CONFIG,
 				       raster_config);
 		} else {
--- a/src/glsl/builtin_variables.cpp
+++ b/src/glsl/builtin_variables.cpp
@@ -724,6 +724,10 @@ builtin_variable_generator::generate_constants()
      add_const("gl_MaxCombinedImageUniforms",
                state->Const.MaxCombinedImageUniforms);
   }
+
+   if (state->is_version(410, 0) ||
+       state->ARB_viewport_array_enable)
+      add_const("gl_MaxViewports", state->Const.MaxViewports);
 }


--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -134,6 +134,9 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx,
   this->Const.MaxFragmentImageUniforms = ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxImageUniforms;
   this->Const.MaxCombinedImageUniforms = ctx->Const.MaxCombinedImageUniforms;

+   /* ARB_viewport_array */
+   this->Const.MaxViewports = ctx->Const.MaxViewports;
+
   this->current_function = NULL;
   this->toplevel_ir = NULL;
   this->found_return = false;
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -343,6 +343,9 @@ struct _mesa_glsl_parse_state {
      unsigned MaxGeometryImageUniforms;
      unsigned MaxFragmentImageUniforms;
      unsigned MaxCombinedImageUniforms;
+
+      /* ARB_viewport_array */
+      unsigned MaxViewports;
   } Const;

   /**
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -835,9 +835,11 @@ varying_matches::record(ir_variable *producer_var, ir_variable *consumer_var)
       * regardless of where they appear.  We can trivially satisfy that
       * requirement by changing the interpolation type to flat here.
       */
-      producer_var->data.centroid = false;
-      producer_var->data.sample = false;
-      producer_var->data.interpolation = INTERP_QUALIFIER_FLAT;
+      if (producer_var) {
+         producer_var->data.centroid = false;
+         producer_var->data.sample = false;
+         producer_var->data.interpolation = INTERP_QUALIFIER_FLAT;
+      }

      if (consumer_var) {
         consumer_var->data.centroid = false;
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -2746,6 +2746,21 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
   if (last >= 0 && last < MESA_SHADER_FRAGMENT) {
      gl_shader *const sh = prog->_LinkedShaders[last];

+      if (first == MESA_SHADER_GEOMETRY) {
+         /* There was no vertex shader, but we still have to assign varying
+          * locations for use by geometry shader inputs in SSO.
+          *
+          * If the shader is not separable (i.e., prog->SeparateShader is
+          * false), linking will have already failed when first is
+          * MESA_SHADER_GEOMETRY.
+          */
+         if (!assign_varying_locations(ctx, mem_ctx, prog,
+                                       NULL, sh,
+                                       num_tfeedback_decls, tfeedback_decls,
+                                       prog->Geom.VerticesIn))
+            goto done;
+      }
+
      if (num_tfeedback_decls != 0 || prog->SeparateShader) {
         /* There was no fragment shader, but we still have to assign varying
          * locations for use by transform feedback.
--- a/src/glx/dri3_glx.c
+++ b/src/glx/dri3_glx.c
@@ -420,11 +420,14 @@ dri3_handle_present_event(struct dri3_drawable *priv, xcb_present_generic_event_

         if (psc->show_fps_interval)
            show_fps(priv, ce->ust);
+
+         priv->ust = ce->ust;
+         priv->msc = ce->msc;
      } else {
         priv->recv_msc_serial = ce->serial;
+         priv->notify_ust = ce->ust;
+         priv->notify_msc = ce->msc;
      }
-      priv->ust = ce->ust;
-      priv->msc = ce->msc;
      break;
   }
   case XCB_PRESENT_EVENT_IDLE_NOTIFY: {
@@ -498,8 +501,8 @@ dri3_wait_for_msc(__GLXDRIdrawable *pdraw, int64_t target_msc, int64_t divisor,
      }
   }

-   *ust = priv->ust;
-   *msc = priv->msc;
+   *ust = priv->notify_ust;
+   *msc = priv->notify_msc;
   *sbc = priv->recv_sbc;

   return 1;
@@ -529,6 +532,15 @@ dri3_wait_for_sbc(__GLXDRIdrawable *pdraw, int64_t target_sbc, int64_t *ust,
 {
   struct dri3_drawable *priv = (struct dri3_drawable *) pdraw;

+   /* From the GLX_OML_sync_control spec:
+    *
+    *     "If <target_sbc> = 0, the function will block until all previous
+    *      swaps requested with glXSwapBuffersMscOML for that window have
+    *      completed."
+    */
+   if (!target_sbc)
+      target_sbc = priv->send_sbc;
+
   while (priv->recv_sbc < target_sbc) {
      if (!dri3_wait_for_event(pdraw))
         return 0;
@@ -1547,11 +1559,24 @@ dri3_swap_buffers(__GLXDRIdrawable *pdraw, int64_t target_msc, int64_t divisor,
      dri3_fence_reset(c, back);

      /* Compute when we want the frame shown by taking the last known successful
-       * MSC and adding in a swap interval for each outstanding swap request
+       * MSC and adding in a swap interval for each outstanding swap request.
+       * target_msc=divisor=remainder=0 means "Use glXSwapBuffers() semantics"
       */
      ++priv->send_sbc;
-      if (target_msc == 0)
+      if (target_msc == 0 && divisor == 0 && remainder == 0)
         target_msc = priv->msc + priv->swap_interval * (priv->send_sbc - priv->recv_sbc);
+      else if (divisor == 0 && remainder > 0) {
+         /* From the GLX_OML_sync_control spec:
+          *
+          *     "If <divisor> = 0, the swap will occur when MSC becomes
+          *      greater than or equal to <target_msc>."
+          *
+          * Note that there's no mention of the remainder.  The Present extension
+          * throws BadValue for remainder != 0 with divisor == 0, so just drop
+          * the passed in value.
+          */
+         remainder = 0;
+      }

      back->busy = 1;
      back->last_swap = priv->send_sbc;
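
Reviewer sketch (not part of the patch): the dri3_glx.c hunks above change how the `(target_msc, divisor, remainder)` triple and a zero `target_sbc` are resolved. The standalone C below restates that logic so the cases can be checked in isolation. `struct swap_state`, `resolve_swap_target` and `resolve_target_sbc` are hypothetical names introduced only for illustration; the real code works on `struct dri3_drawable` inside `dri3_swap_buffers()` and `dri3_wait_for_sbc()`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the dri3_drawable fields used by the hunks. */
struct swap_state {
   uint64_t msc;        /* MSC of the last completed pixmap present */
   uint64_t send_sbc;   /* number of swaps requested so far */
   uint64_t recv_sbc;   /* number of swaps completed so far */
   int swap_interval;
};

/* Mirrors the branch structure of the patched dri3_swap_buffers(). */
static void
resolve_swap_target(struct swap_state *s, int64_t *target_msc,
                    int64_t divisor, int64_t *remainder)
{
   ++s->send_sbc;
   if (*target_msc == 0 && divisor == 0 && *remainder == 0) {
      /* Plain glXSwapBuffers(): one swap interval per outstanding swap. */
      *target_msc = s->msc + s->swap_interval * (s->send_sbc - s->recv_sbc);
   } else if (divisor == 0 && *remainder > 0) {
      /* The remainder is meaningless without a divisor, so drop it. */
      *remainder = 0;
   }
}

/* Mirrors the patched dri3_wait_for_sbc(): 0 means "all swaps so far". */
static uint64_t
resolve_target_sbc(const struct swap_state *s, int64_t target_sbc)
{
   return target_sbc ? (uint64_t) target_sbc : s->send_sbc;
}
```

With `msc = 100`, `send_sbc = 5`, `recv_sbc = 3` and a swap interval of 1, an all-zero triple resolves to a target MSC of 103, while an explicit `target_msc` with `divisor == 0` is left alone and only its remainder is cleared.
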
--- a/src/glx/dri3_priv.h
+++ b/src/glx/dri3_priv.h
@@ -182,9 +182,12 @@ struct dri3_drawable {
   uint64_t send_sbc;
   uint64_t recv_sbc;

-   /* Last received UST/MSC values */
+   /* Last received UST/MSC values for pixmap present complete */
   uint64_t ust, msc;

+   /* Last received UST/MSC values from present notify msc event */
+   uint64_t notify_ust, notify_msc;
+
   /* Serial numbers for tracking wait_for_msc events */
   uint32_t send_msc_serial;
   uint32_t recv_msc_serial;
--- a/src/mesa/drivers/dri/i915/intel_blit.c
+++ b/src/mesa/drivers/dri/i915/intel_blit.c
@@ -271,9 +271,10 @@ intelEmitCopyBlit(struct intel_context *intel,
       dst_buffer, dst_pitch, dst_offset, dst_x, dst_y, w, h);

   /* Blit pitch must be dword-aligned.  Otherwise, the hardware appears to drop
-    * the low bits.
+    * the low bits.  Offsets must be naturally aligned.
    */
-   if (src_pitch % 4 != 0 || dst_pitch % 4 != 0)
+   if (src_pitch % 4 != 0 || src_offset % cpp != 0 ||
+       dst_pitch % 4 != 0 || dst_offset % cpp != 0)
      return false;

   /* For big formats (such as floating point), do the copy using 16 or 32bpp
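
Reviewer sketch (not part of the patch): the hunk above extends the blitter precondition from "pitches are dword-aligned" to "pitches are dword-aligned and offsets are multiples of the pixel size". A minimal standalone restatement follows, assuming `cpp` is a nonzero bytes-per-pixel value. `blit_params_aligned` is a hypothetical helper name, and the parameter types are simplified compared with the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the blit may proceed, i.e. the inverse of the
 * early "return false" condition in the patched intelEmitCopyBlit().
 * cpp (bytes per pixel) is assumed to be nonzero. */
static bool
blit_params_aligned(uint32_t cpp,
                    int src_pitch, uint32_t src_offset,
                    int dst_pitch, uint32_t dst_offset)
{
   return src_pitch % 4 == 0 && src_offset % cpp == 0 &&
          dst_pitch % 4 == 0 && dst_offset % cpp == 0;
}
```

For a 4-byte format, an offset of 2 is rejected even when both pitches are dword-aligned, which is the case the old check let through.
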
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1096,11 +1096,8 @@ struct brw_context
   uint32_t pma_stall_bits;

   struct {
-      /** Does the current draw use the index buffer? */
-      bool indexed;
-
-      int start_vertex_location;
-      int base_vertex_location;
+      /** The value of gl_BaseVertex for the current _mesa_prim. */
+      int gl_basevertex;

      /**
       * Buffer and offset used for GL_ARB_shader_draw_parameters
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -181,14 +181,20 @@ static void brw_emit_prim(struct brw_context *brw,
   DBG("PRIM: %s %d %d\n", _mesa_lookup_enum_by_nr(prim->mode),
       prim->start, prim->count);

+   int start_vertex_location = prim->start;
+   int base_vertex_location = prim->basevertex;
+
   if (prim->indexed) {
      vertex_access_type = brw->gen >= 7 ?
         GEN7_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM :
         GEN4_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM;
+      start_vertex_location += brw->ib.start_vertex_offset;
+      base_vertex_location += brw->vb.start_vertex_bias;
   } else {
      vertex_access_type = brw->gen >= 7 ?
         GEN7_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL :
         GEN4_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL;
+      start_vertex_location += brw->vb.start_vertex_bias;
   }

   /* We only need to trim the primitive count on pre-Gen6. */
@@ -263,10 +269,10 @@ static void brw_emit_prim(struct brw_context *brw,
                vertex_access_type);
   }
   OUT_BATCH(verts_per_instance);
-   OUT_BATCH(brw->draw.start_vertex_location);
+   OUT_BATCH(start_vertex_location);
   OUT_BATCH(prim->num_instances);
   OUT_BATCH(prim->base_instance);
-   OUT_BATCH(brw->draw.base_vertex_location);
+   OUT_BATCH(base_vertex_location);
   ADVANCE_BATCH();

   /* Only used on Sandybridge; harmless to set elsewhere. */
@@ -430,9 +436,8 @@ static bool brw_try_draw_prims( struct gl_context *ctx,
         }
      }

-      brw->draw.indexed = prims[i].indexed;
-      brw->draw.start_vertex_location = prims[i].start;
-      brw->draw.base_vertex_location = prims[i].basevertex;
+      brw->draw.gl_basevertex =
+         prims[i].indexed ? prims[i].basevertex : prims[i].start;

      drm_intel_bo_unreference(brw->draw.draw_params_bo);

--- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
@@ -604,19 +604,9 @@ brw_prepare_vertices(struct brw_context *brw)
 void
 brw_prepare_shader_draw_parameters(struct brw_context *brw)
 {
-   int *gl_basevertex_value;
-   if (brw->draw.indexed) {
-      brw->draw.start_vertex_location += brw->ib.start_vertex_offset;
-      brw->draw.base_vertex_location += brw->vb.start_vertex_bias;
-      gl_basevertex_value = &brw->draw.base_vertex_location;
-   } else {
-      brw->draw.start_vertex_location += brw->vb.start_vertex_bias;
-      gl_basevertex_value = &brw->draw.start_vertex_location;
-   }
-
   /* For non-indirect draws, upload gl_BaseVertex. */
   if (brw->vs.prog_data->uses_vertexid && brw->draw.draw_params_bo == NULL) {
-      intel_upload_data(brw, gl_basevertex_value, 4, 4,
+      intel_upload_data(brw, &brw->draw.gl_basevertex, 4, 4,
 			&brw->draw.draw_params_bo,
                        &brw->draw.draw_params_offset);
   }
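
Reviewer sketch (not part of the patch): the brw_draw.c and brw_draw_upload.c hunks above move the start/base vertex arithmetic into locals in `brw_emit_prim()` and store a separate `gl_basevertex` taken straight from the primitive. The standalone C below restates both computations. `struct prim_sketch`, `struct vertex_locations` and the two function names are hypothetical; the offsets stand in for `brw->ib.start_vertex_offset` and `brw->vb.start_vertex_bias`.

```c
#include <stdbool.h>

/* Hypothetical subset of struct _mesa_prim. */
struct prim_sketch {
   bool indexed;
   int start;
   int basevertex;
};

struct vertex_locations {
   int start_vertex;
   int base_vertex;
};

/* Values emitted in the 3DPRIMITIVE packet, as computed by the
 * patched brw_emit_prim(). */
static struct vertex_locations
compute_vertex_locations(const struct prim_sketch *prim,
                         int ib_start_vertex_offset,
                         int vb_start_vertex_bias)
{
   struct vertex_locations loc = { prim->start, prim->basevertex };

   if (prim->indexed) {
      loc.start_vertex += ib_start_vertex_offset;
      loc.base_vertex += vb_start_vertex_bias;
   } else {
      loc.start_vertex += vb_start_vertex_bias;
   }
   return loc;
}

/* Value uploaded for gl_BaseVertex: taken from the primitive with no
 * driver-side offsets applied. */
static int
gl_basevertex_for(const struct prim_sketch *prim)
{
   return prim->indexed ? prim->basevertex : prim->start;
}
```

Because the offsets are now added to locals, calling either function twice for the same primitive gives the same answer; the removed code instead accumulated them into `brw->draw` fields.
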
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -718,12 +718,14 @@ fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src
   }

   struct brw_reg src0 = brw_reg(src.file, src.nr, 1,
+                                 src.negate, src.abs,
 				 BRW_REGISTER_TYPE_F,
 				 vstride,
 				 width,
 				 BRW_HORIZONTAL_STRIDE_0,
 				 BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
   struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
+                                 src.negate, src.abs,
 				 BRW_REGISTER_TYPE_F,
 				 vstride,
 				 width,
@@ -776,12 +778,14 @@ fs_generator::generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg src

      /* produce accurate derivatives */
      struct brw_reg src0 = brw_reg(src.file, src.nr, 0,
+                                    src.negate, src.abs,
                                    BRW_REGISTER_TYPE_F,
                                    BRW_VERTICAL_STRIDE_4,
                                    BRW_WIDTH_4,
                                    BRW_HORIZONTAL_STRIDE_1,
                                    BRW_SWIZZLE_XYXY, WRITEMASK_XYZW);
      struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
+                                    src.negate, src.abs,
                                    BRW_REGISTER_TYPE_F,
                                    BRW_VERTICAL_STRIDE_4,
                                    BRW_WIDTH_4,
@@ -810,12 +814,14 @@ fs_generator::generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg src
   } else {
      /* replicate the derivative at the top-left pixel to other pixels */
      struct brw_reg src0 = brw_reg(src.file, src.nr, 0,
+                                    src.negate, src.abs,
                                    BRW_REGISTER_TYPE_F,
                                    BRW_VERTICAL_STRIDE_4,
                                    BRW_WIDTH_4,
                                    BRW_HORIZONTAL_STRIDE_0,
                                    BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
      struct brw_reg src1 = brw_reg(src.file, src.nr, 2,
+                                    src.negate, src.abs,
                                    BRW_REGISTER_TYPE_F,
                                    BRW_VERTICAL_STRIDE_4,
                                    BRW_WIDTH_4,
--- a/src/mesa/drivers/dri/i965/brw_performance_monitor.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
@@ -907,7 +907,7 @@ gather_oa_results(struct brw_context *brw,
      return;
   }

-   const int snapshot_size = brw->perfmon.entries_per_oa_snapshot;
+   const ptrdiff_t snapshot_size = brw->perfmon.entries_per_oa_snapshot;

   /* First, add the contributions from the "head" interval:
    * (snapshot taken at BeginPerfMonitor time,
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -218,6 +218,8 @@ type_is_signed(unsigned type)
 * \param file      one of the BRW_x_REGISTER_FILE values
 * \param nr        register number/index
 * \param subnr     register sub number
+ * \param negate    register negate modifier
+ * \param abs       register abs modifier
 * \param type      one of BRW_REGISTER_TYPE_x
 * \param vstride   one of BRW_VERTICAL_STRIDE_x
 * \param width     one of BRW_WIDTH_x
@@ -229,6 +231,8 @@ static inline struct brw_reg
 brw_reg(unsigned file,
        unsigned nr,
        unsigned subnr,
+        unsigned negate,
+        unsigned abs,
        enum brw_reg_type type,
        unsigned vstride,
        unsigned width,
@@ -248,8 +252,8 @@ brw_reg(unsigned file,
   reg.file = file;
   reg.nr = nr;
   reg.subnr = subnr * type_sz(type);
-   reg.negate = 0;
-   reg.abs = 0;
+   reg.negate = negate;
+   reg.abs = abs;
   reg.vstride = vstride;
   reg.width = width;
   reg.hstride = hstride;
@@ -276,6 +280,8 @@ brw_vec16_reg(unsigned file, unsigned nr, unsigned subnr)
   return brw_reg(file,
                  nr,
                  subnr,
+                  0,
+                  0,
                  BRW_REGISTER_TYPE_F,
                  BRW_VERTICAL_STRIDE_16,
                  BRW_WIDTH_16,
@@ -291,6 +297,8 @@ brw_vec8_reg(unsigned file, unsigned nr, unsigned subnr)
   return brw_reg(file,
                  nr,
                  subnr,
+                  0,
+                  0,
                  BRW_REGISTER_TYPE_F,
                  BRW_VERTICAL_STRIDE_8,
                  BRW_WIDTH_8,
@@ -306,6 +314,8 @@ brw_vec4_reg(unsigned file, unsigned nr, unsigned subnr)
   return brw_reg(file,
                  nr,
                  subnr,
+                  0,
+                  0,
                  BRW_REGISTER_TYPE_F,
                  BRW_VERTICAL_STRIDE_4,
                  BRW_WIDTH_4,
@@ -321,6 +331,8 @@ brw_vec2_reg(unsigned file, unsigned nr, unsigned subnr)
   return brw_reg(file,
                  nr,
                  subnr,
+                  0,
+                  0,
                  BRW_REGISTER_TYPE_F,
                  BRW_VERTICAL_STRIDE_2,
                  BRW_WIDTH_2,
@@ -336,6 +348,8 @@ brw_vec1_reg(unsigned file, unsigned nr, unsigned subnr)
   return brw_reg(file,
                  nr,
                  subnr,
+                  0,
+                  0,
                  BRW_REGISTER_TYPE_F,
                  BRW_VERTICAL_STRIDE_0,
                  BRW_WIDTH_1,
@@ -435,6 +449,8 @@ static inline struct brw_reg
 brw_imm_reg(enum brw_reg_type type)
 {
   return brw_reg(BRW_IMMEDIATE_VALUE,
+                  0,
+                  0,
                  0,
                  0,
                  type,
@@ -630,6 +646,8 @@ brw_ip_reg(void)
   return brw_reg(BRW_ARCHITECTURE_REGISTER_FILE,
                  BRW_ARF_IP,
                  0,
+                  0,
+                  0,
                  BRW_REGISTER_TYPE_UD,
                  BRW_VERTICAL_STRIDE_4, /* ? */
                  BRW_WIDTH_1,
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1630,6 +1630,8 @@ vec4_visitor::get_timestamp()
   src_reg ts = src_reg(brw_reg(BRW_ARCHITECTURE_REGISTER_FILE,
                                BRW_ARF_TIMESTAMP,
                                0,
+                                0,
+                                0,
                                BRW_REGISTER_TYPE_UD,
                                BRW_VERTICAL_STRIDE_0,
                                BRW_WIDTH_4,
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -535,6 +535,7 @@ brw_update_null_renderbuffer_surface(struct brw_context *brw, unsigned int unit)
   drm_intel_bo *bo = NULL;
   unsigned pitch_minus_1 = 0;
   uint32_t multisampling_state = 0;
+   /* CACHE_NEW_WM_PROG */
   uint32_t surf_index =
      brw->wm.prog_data->binding_table.render_target_start + unit;

@@ -620,6 +621,7 @@ brw_update_renderbuffer_surface(struct brw_context *brw,
   uint32_t format = 0;
   /* _NEW_BUFFERS */
   mesa_format rb_format = _mesa_get_render_format(ctx, intel_rb_format(irb));
+   /* CACHE_NEW_WM_PROG */
   uint32_t surf_index =
      brw->wm.prog_data->binding_table.render_target_start + unit;

@@ -737,7 +739,7 @@ const struct brw_tracked_state brw_renderbuffer_surfaces = {
      .mesa = (_NEW_COLOR |
               _NEW_BUFFERS),
      .brw = BRW_NEW_BATCH,
-      .cache = 0
+      .cache = CACHE_NEW_WM_PROG,
   },
   .emit = brw_update_renderbuffer_surfaces,
 };
@@ -764,6 +766,8 @@ update_stage_texture_surfaces(struct brw_context *brw,
   struct gl_context *ctx = &brw->ctx;

   uint32_t *surf_offset = stage_state->surf_offset;
+
+   /* CACHE_NEW_*_PROG */
   if (for_gather)
      surf_offset += stage_state->prog_data->binding_table.gather_texture_start;
   else
@@ -828,7 +832,7 @@ const struct brw_tracked_state brw_texture_surfaces = {
             BRW_NEW_VERTEX_PROGRAM |
             BRW_NEW_GEOMETRY_PROGRAM |
             BRW_NEW_FRAGMENT_PROGRAM,
-      .cache = 0
+      .cache = CACHE_NEW_VS_PROG | CACHE_NEW_GS_PROG | CACHE_NEW_WM_PROG,
   },
   .emit = brw_update_texture_surfaces,
 };
--- a/src/mesa/drivers/dri/i965/intel_blit.c
+++ b/src/mesa/drivers/dri/i965/intel_blit.c
@@ -342,9 +342,10 @@ intelEmitCopyBlit(struct brw_context *brw,
       dst_buffer, dst_pitch, dst_offset, dst_x, dst_y, w, h);

   /* Blit pitch must be dword-aligned.  Otherwise, the hardware appears to drop
-    * the low bits.
+    * the low bits.  Offsets must be naturally aligned.
    */
-   if (src_pitch % 4 != 0 || dst_pitch % 4 != 0)
+   if (src_pitch % 4 != 0 || src_offset % cpp != 0 ||
+       dst_pitch % 4 != 0 || dst_offset % cpp != 0)
      return false;

   /* For big formats (such as floating point), do the copy using 16 or 32bpp
--- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
@@ -488,8 +488,8 @@ linear_to_tiled(uint32_t xt1, uint32_t xt2,
         /* Translate by (xt,yt) for single-tile copier. */
         tile_copy(x0-xt, x1-xt, x2-xt, x3-xt,
                   y0-yt, y1-yt,
-                   dst + xt * th + yt * dst_pitch,
-                   src + xt      + yt * src_pitch,
+                   dst + (ptrdiff_t) xt * th + (ptrdiff_t) yt * dst_pitch,
+                   src + (ptrdiff_t) xt      + (ptrdiff_t) yt * src_pitch,
                   src_pitch,
                   swizzle_bit,
                   mem_copy);
@@ -654,7 +654,8 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
   linear_to_tiled(
      xoffset * cpp, (xoffset + width) * cpp,
      yoffset, yoffset + height,
-      bo->virtual, pixels - yoffset * src_pitch - xoffset * cpp,
+      bo->virtual,
+      pixels - (ptrdiff_t) yoffset * src_pitch - (ptrdiff_t) xoffset * cpp,
      image->mt->pitch, src_pitch,
      brw->has_swizzling,
      image->mt->tiling,
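
Reviewer sketch (not part of the patch): the `(ptrdiff_t)` casts above matter because a product of two 32-bit operands is evaluated in 32 bits and can wrap before it is ever added to the pointer. The standalone C below shows the difference, assuming a platform where `ptrdiff_t` is 64 bits wide. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Product evaluated in 32-bit unsigned arithmetic: wraps modulo 2^32. */
static uint32_t
offset_wrapping(uint32_t y, uint32_t pitch)
{
   return y * pitch;
}

/* One operand widened first, as in the patched code: the product is
 * evaluated in ptrdiff_t and does not wrap on 64-bit platforms. */
static ptrdiff_t
offset_widened(uint32_t y, uint32_t pitch)
{
   return (ptrdiff_t) y * pitch;
}
```

For `y = 70000` and `pitch = 70000` the true product is 4900000000, which exceeds 2^32; the unwidened version yields 605032704 instead.
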
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -40,6 +40,7 @@
 #include "main/image.h"
 #include "main/bufferobj.h"
 #include "main/macros.h"
+#include "main/varray.h"

 #include "vbo/vbo.h"

@@ -234,7 +235,7 @@ st_draw_vbo(struct gl_context *ctx,
       * so we only set these fields for indexed drawing:
       */
      info.primitive_restart = ctx->Array._PrimitiveRestart;
-      info.restart_index = ctx->Array.RestartIndex;
+      info.restart_index = _mesa_primitive_restart_index(ctx, ib->type);
   }
   else {
      /* Transform feedback drawing is always non-indexed. */
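
Reviewer sketch (not part of the patch): the hunk above stops passing `ctx->Array.RestartIndex` directly and asks `_mesa_primitive_restart_index()` instead. That helper's body is not shown in this diff. The standalone C below is only a guess at the behaviour it needs, based on the OpenGL rule that the fixed restart index is 2^N - 1 for an N-bit index type. `restart_index_for` and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* index_size_bytes is 1, 2 or 4 (ubyte, ushort, uint indices). */
static uint32_t
restart_index_for(bool fixed_index_enabled, uint32_t user_restart_index,
                  unsigned index_size_bytes)
{
   if (!fixed_index_enabled)
      return user_restart_index;

   switch (index_size_bytes) {
   case 1:
      return 0xffu;
   case 2:
      return 0xffffu;
   default:
      return 0xffffffffu;
   }
}
```

Under this assumption, a user-set restart index of 7 is ignored for 16-bit indices once the fixed index is enabled, and 0xffff is used instead.
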
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -3527,7 +3527,8 @@ glsl_to_tgsi_visitor::copy_propagate(void)
               first = copy_chan;
            } else {
               if (first->src[0].file != copy_chan->src[0].file ||
-        	   first->src[0].index != copy_chan->src[0].index) {
+                   first->src[0].index != copy_chan->src[0].index ||
+                   first->src[0].index2D != copy_chan->src[0].index2D) {
        	  good = false;
        	  break;
               }
--- a/src/mesa/vbo/vbo_exec_array.c
+++ b/src/mesa/vbo/vbo_exec_array.c
@@ -596,7 +596,8 @@ vbo_draw_arrays(struct gl_context *ctx, GLenum mode, GLint start,
   prim[0].is_indirect = 0;

   /* Implement the primitive restart index */
-   if (ctx->Array.PrimitiveRestart && ctx->Array.RestartIndex < count) {
+   if (ctx->Array.PrimitiveRestart && !ctx->Array.PrimitiveRestartFixedIndex &&
+       ctx->Array.RestartIndex < count) {
      GLuint primCount = 0;

      if (ctx->Array.RestartIndex == start) {