mesa: set version string to 7.10

docs: Update 7.10 release notes
docs: Import 7.9.1 release notes from 7.9 branch
2011-01-07 14:09:03 -08:00 · 2011-01-07 14:07:51 -08:00 · 2011-01-07 13:41:15 -08:00 · 2011-01-07 03:12:28 -05:00 · 2011-01-07 07:10:49 +01:00 · 2011-01-06 19:51:14 -05:00
127 changed files with 5785 additions and 1810 deletions
--- a/2
+++ b/2
@@ -180,7 +180,7 @@ ultrix-gcc:

 # Rules for making release tarballs

-VERSION=7.10-devel
+VERSION=7.10
 DIRECTORY = Mesa-$(VERSION)
 LIB_NAME = MesaLib-$(VERSION)
 GLUT_NAME = MesaGLUT-$(VERSION)
--- a/configure.ac
+++ b/configure.ac
@@ -1352,7 +1352,7 @@ if test "x$enable_gallium_egl" = xauto; then
        enable_gallium_egl=$enable_egl
        ;;
    *)
-        enable_gallium_egl=no
+        enable_gallium_egl=$enable_openvg
        ;;
    esac
 fi
@@ -1467,10 +1467,6 @@ AC_SUBST([EGL_CLIENT_APIS])

 if test "x$HAVE_ST_EGL" = xyes; then
 	GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS egl"
-	# define GLX_DIRECT_RENDERING even when the driver is not dri
-	if test "x$mesa_driver" != xdri -a "x$driglx_direct" = xyes; then
-            DEFINES="$DEFINES -DGLX_DIRECT_RENDERING"
-	fi
 fi

 if test "x$HAVE_ST_XORG" = xyes; then
--- a/docs/egl.html
+++ b/docs/egl.html
@@ -19,10 +19,7 @@ API entry points and helper functions for use by the drivers.  Drivers are
 dynamically loaded by the main library and most of the EGL API calls are
 directly dispatched to the drivers.</p>

-<p>The driver in use decides the window system to support.  For drivers that
-support hardware rendering, there are usually multiple drivers supporting the
-same window system.  Each one of of them supports a certain range of graphics
-cards.</p>
+<p>The driver in use decides the window system to support.</p>

 <h2>Build EGL</h2>

@@ -86,16 +83,19 @@ select the right platforms automatically.</p>

 <li><code>--enable-gles1</code> and <code>--enable-gles2</code>

-<p>These options enable OpenGL ES support in OpenGL.  The result is
-one big library that supports multiple APIs.</p>
+<p>These options enable OpenGL ES support in OpenGL.  The result is one big
+internal library that supports multiple APIs.</p>

 </li>

 <li><code>--enable-gles-overlay</code>

-<p>This option enables OpenGL ES as separate libraries.  This is an alternative
-approach to enable OpenGL ES.  It is only supported by
-<code>egl_gallium</code>.</p>
+<p>This option enables OpenGL ES as separate internal libraries.  This is an
+alternative approach to enable OpenGL ES.</p>
+
+<p>This is only supported by <code>egl_gallium</code>.  For systems using DRI
+drivers, <code>--enable-gles1</code> and <code>--enable-gles2</code> are
+suggested instead as all drivers will benefit.</p>

 </li>

@@ -134,6 +134,16 @@ colon-separated directories where the main library will look for drivers, in
 addition to the default directory.  This variable is ignored for setuid/setgid
 binaries.</p>

+<p>This variable is usually set to test an uninstalled build.  For example, one
+may set</p>
+
+<pre>
+  $ export LD_LIBRARY_PATH=$mesa/lib
+  $ export EGL_DRIVERS_PATH=$mesa/lib/egl
+</pre>
+
+<p>to test a build without installation.</p>
+
 </li>

 <li><code>EGL_DRIVER</code>
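
Note on the hunk above: `EGL_DRIVERS_PATH` is described as a colon-separated list of directories searched in addition to the default one. As a rough illustration of that format only, here is a minimal C sketch that picks out the n-th non-empty entry of such a list. The function name `search_dir_at` is hypothetical and this is not the loader code in Mesa's main library; it also ignores the setuid/setgid check the documentation mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the index-th non-empty entry in a
 * colon-separated directory list such as the value of EGL_DRIVERS_PATH.
 * On success, *start points at the entry (not NUL-terminated) and the
 * entry length is returned. Returns 0 and sets *start to NULL if there
 * is no such entry. */
static size_t search_dir_at(const char *path_list, size_t index,
                            const char **start)
{
    const char *cur = path_list;

    assert(start != NULL);
    *start = NULL;

    while (cur && *cur) {
        size_t len = strcspn(cur, ":");   /* length up to ':' or end */

        if (len > 0) {                    /* skip empty entries ("::") */
            if (index == 0) {
                *start = cur;
                return len;
            }
            index--;
        }
        cur += len;
        if (*cur == ':')
            cur++;
    }
    return 0;
}
```

With a list like `/opt/mesa/lib/egl::/usr/lib/egl` (made-up paths), entry 0 is the first directory, the empty entry is skipped, and entry 1 is the second directory.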
@@ -180,8 +190,10 @@ variable to true forces the use of software rendering.</p>
 <li><code>egl_dri2</code>

 <p>This driver supports both <code>x11</code> and <code>drm</code> platforms.
-It functions as a DRI2 driver loader.  For <code>x11</code> support, it talks
-to the X server directly using (XCB-)DRI2 protocol.</p>
+It functions as a DRI driver loader.  For <code>x11</code> support, it talks to
+the X server directly using (XCB-)DRI2 protocol.</p>
+
+<p>This driver can share DRI drivers with <code>libGL</code>.</p>

 </li>

@@ -191,6 +203,10 @@ to the X server directly using (XCB-)DRI2 protocol.</p>
 hardwares supported by Gallium3D.  It is the only driver that supports OpenVG.
 The supported platforms are X11, DRM, FBDEV, and GDI.</p>

+<p>This driver comes with its own hardware drivers
+(<code>pipe_&lt;hw&gt;</code>) and client API modules
+(<code>st_&lt;api&gt;</code>).</p>
+
 </li>

 <li><code>egl_glx</code>
@@ -202,6 +218,21 @@ is not available in GLX or GLX extensions.</p>
 </li>
 </ul>

+<h2>Packaging</h2>
+
+<p>The ABI between the main library and its drivers is not stable.  Nor is
+there a plan to stabilize it at the moment.  Of the EGL drivers,
+<code>egl_gallium</code> has its own hardware drivers and client API modules.
+They are considered internal to <code>egl_gallium</code> and there is also no
+stable ABI between them.  These should be kept in mind when packaging for
+distribution.</p>
+
+<p>Generally, <code>egl_dri2</code> is preferred over <code>egl_gallium</code>
+when the system already has DRI drivers.  As <code>egl_gallium</code> is loaded
+before <code>egl_dri2</code> when both are available, <code>egl_gallium</code>
+may either be disabled with <code>--disable-gallium-egl</code> or packaged
+separately.</p>
+
 <h2>Developers</h2>

 <p>The sources of the main library and the classic drivers can be found at
--- a/docs/relnotes-7.10.html
+++ b/docs/relnotes-7.10.html
--- a/docs/relnotes-7.9.1.html
+++ b/docs/relnotes-7.9.1.html
@@ -0,0 +1,404 @@
+<HTML>
+
+<TITLE>Mesa Release Notes</TITLE>
+
+<head><link rel="stylesheet" type="text/css" href="mesa.css"></head>
+
+<BODY>
+
+<body bgcolor="#eeeeee">
+
+<H1>Mesa 7.9.1 Release Notes / January 7, 2011</H1>
+
+<p>
+Mesa 7.9.1 is a bug fix release which fixes bugs found since the 7.9 release.
+</p>
+<p>
+Mesa 7.9.1 implements the OpenGL 2.1 API, but the version reported by
+glGetString(GL_VERSION) depends on the particular driver being used.
+Some drivers don't support all the features required in OpenGL 2.1.
+</p>
+<p>
+See the <a href="install.html">Compiling/Installing page</a> for prerequisites
+for DRI hardware acceleration.
+</p>
+
+
+<h2>MD5 checksums</h2>
+<pre>
+78422843ea875ad4eac35b9b8584032b  MesaLib-7.9.1.tar.gz
+07dc6cfb5928840b8b9df5bd1b3ae434  MesaLib-7.9.1.tar.bz2
+c8eaea5b3c3d6dee784bd8c2db91c80f  MesaLib-7.9.1.zip
+ee9ecae4ca56fbb2d14dc15e3a0a7640  MesaGLUT-7.9.1.tar.gz
+41fc477d524e7dc5c84da8ef22422bea  MesaGLUT-7.9.1.tar.bz2
+90b287229afdf19317aa989d19462e7a  MesaGLUT-7.9.1.zip
+</pre>
+
+
+<h2>New features</h2>
+<p>None.</p>
+
+<h2>Bug fixes</h2>
+<p>This list is likely incomplete.</p>
+<ul>
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=28800">Bug 28800</a> - [r300c, r300g] Texture corruption with World of Warcraft</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=29420">Bug 29420</a> - Amnesia / HPL2 RendererFeatTest - not rendering correctly</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=29946">Bug 29946</a> - [swrast] piglit valgrind glsl-array-bounds-04 fails</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=30261">Bug 30261</a> - [GLSL 1.20] allowing inconsistent invariant declaration between two vertex shaders</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=30632">Bug 30632</a> - [softpipe] state_tracker/st_manager.c:489: st_context_notify_invalid_framebuffer: Assertion `stfb &amp;&amp; stfb-&gt;iface == stfbi' failed.</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=30694">Bug 30694</a> - wincopy will crash on Gallium drivers when going to front buffer</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=30787">Bug 30787</a> - Invalid asm shader does not generate draw-time error when used with GLSL shader</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=30993">Bug 30993</a> - getFramebufferAttachmentParameteriv wrongly generates error</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31101">Bug 31101</a> -  [glsl2] abort() in ir_validate::visit_enter(ir_assignment *ir)</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31193">Bug 31193</a> -  [regression] aa43176e break water reflections</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31194">Bug 31194</a> - The mesa meta save/restore code doesn't ref the current GLSL program</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31371">Bug 31371</a> - glslparsertest: ir.cpp:358: ir_constant::ir_constant(const glsl_type*, const ir_constant_data*): Assertion `(type->base_type &gt;= 0) &amp;&amp; (type->base_type &lt;= 3)' failed.</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31439">Bug 31439</a> - Crash in glBufferSubData() with size == 0</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31495">Bug 31495</a> - [i965 gles2c bisected] OpenGL ES 2.0 conformance GL2Tests_GetBIFD_input.run regressed</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31514">Bug 31514</a> - isBuffer returns true for unbound buffers</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31560">Bug 31560</a> - [tdfx] tdfx_tex.c:702: error: ‘const struct gl_color_table’ has no member named ‘Format’</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31617">Bug 31617</a> - Radeon/Compiz: 'failed to attach dri2 front buffer', error case not handled</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31648">Bug 31648</a> -  [GLSL] array-struct-array gets assertion: `(size &gt;= 1) &amp;&amp; (size &lt;= 4)' failed.</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31650">Bug 31650</a> - [GLSL] varying gl_TexCoord fails to be re-declared to different size in the second shader</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31673">Bug 31673</a> - GL_FRAGMENT_PRECISION_HIGH preprocessor macro undefined in GLSL ES</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31690">Bug 31690</a> -  i915 shader compiler fails to flatten if in Aquarium webgl demo.</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31832">Bug 31832</a> - [i915] Bad renderbuffer format: 21</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31841">Bug 31841</a> - [drm:radeon_cs_ioctl] *ERROR* Invalid command stream !</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31894">Bug 31894</a> - Writing to gl_PointSize with GLES2 corrupts other varyings</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31909">Bug 31909</a> - [i965] brw_fs.cpp:1461: void fs_visitor::emit_bool_to_cond_code(ir_rvalue*): Assertion `expr-&gt;operands[i]-&gt;type-&gt;is_scalar()' failed.</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31934">Bug 31934</a> - [gallium] Mapping empty buffer object causes SIGSEGV</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31983">Bug 31983</a> -  [i915 gles2] "if (expression with builtin/varying variables) discard" breaks linkage</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31985">Bug 31985</a> - [GLSL 1.20] initialized uniform array considered as "unsized"</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=31987">Bug 31987</a> - [gles2] if input a wrong pname(GL_NONE) to glGetBoolean, it will not case GL_INVALID_ENUM</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=32035">Bug 32035</a> - [GLSL bisected] comparing unsized array gets segfault</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=32070">Bug 32070</a> - llvmpipe renders stencil demo incorrectly</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=32273">Bug 32273</a> - assertion fails when starting vdrift 2010 release with shaders enabled</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=32287">Bug 32287</a> - [bisected GLSL] float-int failure</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=32311">Bug 32311</a> - [965 bisected] Array look-ups broken on GM45</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=32520">Bug 32520</a> -  [gles2] glBlendFunc(GL_ZERO, GL_DST_COLOR) will result in GL_INVALID_ENUM</li>
+
+<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=32825">Bug 32825</a> - egl_glx driver completely broken in 7.9 branch [fix in master]</li>
+</ul>
+
+
+<h2>Changes</h2>
+<p>The full set of changes can be viewed by using the following GIT command:</p>
+
+<pre>
+  git log mesa-7.9..mesa-7.9.1
+</pre>
+
+<p>Alex Deucher (5):
+<ul>
+  <li>r100: revalidate after radeon_update_renderbuffers</li>
+  <li>r600c: add missing radeon_prepare_render() call on evergreen</li>
+  <li>r600c: properly align mipmaps to group size</li>
+  <li>gallium/egl: fix r300 vs r600 loading</li>
+  <li>r600c: fix some opcodes on evergreen</li>
+</ul></p>
+
+<p>Aras Pranckevicius (2):
+<ul>
+  <li>glsl: fix crash in loop analysis when some controls can't be determined</li>
+  <li>glsl: fix matrix type check in ir_algebraic</li>
+</ul></p>
+
+<p>Brian Paul (27):
+<ul>
+  <li>swrast: fix choose_depth_texture_level() to respect mipmap filtering state</li>
+  <li>st/mesa: replace assertion w/ conditional in framebuffer invalidation</li>
+  <li>egl/i965: include inline_wrapper_sw_helper.h</li>
+  <li>mesa: Add missing else in do_row_3D</li>
+  <li>mesa: add missing formats in _mesa_format_to_type_and_comps()</li>
+  <li>mesa: handle more pixel types in mipmap generation code</li>
+  <li>mesa: make glIsBuffer() return false for never bound buffers</li>
+  <li>mesa: fix glDeleteBuffers() regression</li>
+  <li>swrast: init alpha value to 1.0 in opt_sample_rgb_2d()</li>
+  <li>meta: Mask Stencil.Clear against stencilMax in _mesa_meta_Clear</li>
+  <li>st/mesa: fix mapping of zero-sized buffer objects</li>
+  <li>mesa: check for posix_memalign() errors</li>
+  <li>llvmpipe: fix broken stencil writemask</li>
+  <li>mesa: fix GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME query</li>
+  <li>mesa: return GL_FRAMEBUFFER_DEFAULT as FBO attachment type</li>
+  <li>mesa: make glGet*(GL_NONE) generate GL_INVALID_ENUM</li>
+  <li>mesa: test for cube map completeness in glGenerateMipmap()</li>
+  <li>tnl: Initialize gl_program_machine memory in run_vp.</li>
+  <li>tnl: a better way to initialize the gl_program_machine memory</li>
+  <li>mesa, st/mesa: disable GL_ARB_geometry_shader4</li>
+  <li>glsl: fix off by one in register index assertion</li>
+  <li>st/mesa: fix mipmap generation bug</li>
+  <li>glsl: new glsl_strtod() wrapper to fix decimal point interpretation</li>
+  <li>mesa: no-op glBufferSubData() on size==0</li>
+  <li>tdfx: s/Format/_BaseFormat/</li>
+  <li>st/mesa: fix renderbuffer pointer check in st_Clear()</li>
+  <li>mesa: Bump the number of bits in the register index.</li>
+</ul></p>
+
+<p>Chad Versace (5):
+<ul>
+  <li>glsl: Fix lexer rule for ^=</li>
+  <li>glsl: Fix ast-to-hir for ARB_fragment_coord_conventions</li>
+  <li>glsl: Fix ir_expression::constant_expression_value()</li>
+  <li>glsl: Fix erroneous cast in ast_jump_statement::hir()</li>
+  <li>glsl: Fix linker bug in cross_validate_globals()</li>
+</ul></p>
+
+<p>Chia-I Wu (10):
+<ul>
+  <li>targets/egl: Fix linking with libdrm.</li>
+  <li>st/vega: Fix version check in context creation.</li>
+  <li>st/egl: Do not finish a fence that is NULL.</li>
+  <li>egl: Fix a false negative check in _eglCheckMakeCurrent.</li>
+  <li>st/mesa: Unreference the sampler view in st_bind_surface.</li>
+  <li>egl_dri2: Fix __DRI_DRI2 version 1 support.</li>
+  <li>st/vega: Do not wait NULL fences.</li>
+  <li>mesa: Do not advertise GL_OES_texture_3D.</li>
+  <li>egl_glx: Fix borken driver.</li>
+  <li>egl: Check extensions.</li>
+</ul></p>
+
+<p>Daniel Lichtenberger (1):
+<ul>
+  <li>radeon: fix potential segfault in renderbuffer update</li>
+</ul></p>
+
+<p>Daniel Vetter (1):
+<ul>
+  <li>r200: revalidate after radeon_update_renderbuffers</li>
+</ul></p>
+
+<p>Dave Airlie (1):
+<ul>
+  <li>r300g: fixup rs690 tiling stride alignment calculations.</li>
+</ul></p>
+
+<p>Eric Anholt (13):
+<ul>
+  <li>intel: Allow CopyTexSubImage to InternalFormat 3/4 textures, like RGB/RGBA.</li>
+  <li>glsl: Free the loop state context when we free the loop state.</li>
+  <li>i965: Allow OPCODE_SWZ to put immediates in the first arg.</li>
+  <li>i965: Add support for rendering to SARGB8 FBOs.</li>
+  <li>glsl: Add a helper constructor for expressions that works out result type.</li>
+  <li>glsl: Fix structure and array comparisions.</li>
+  <li>glsl: Quiet unreachable no-return-from-function warning.</li>
+  <li>glsl: Mark the array access for whole-array comparisons.</li>
+  <li>glsl: Fix flipped return of has_value() for array constants.</li>
+  <li>mesa: Add getters for the rest of the supported draw buffers.</li>
+  <li>mesa: Add getters for ARB_copy_buffer's attachment points.</li>
+  <li>i965: Correct the dp_read message descriptor setup on g4x.</li>
+  <li>glsl: Correct the marking of InputsRead/OutputsWritten on in/out matrices.</li>
+</ul></p>
+
+<p>Fabian Bieler (1):
+<ul>
+  <li>glsl: fix lowering conditional returns in subroutines</li>
+</ul></p>
+
+<p>Francisco Jerez (3):
+<ul>
+  <li>meta: Don't leak alpha function/reference value changes.</li>
+  <li>meta: Fix incorrect rendering of the bitmap alpha component.</li>
+  <li>meta: Don't try to disable cube maps if the driver doesn't expose the extension.</li>
+</ul></p>
+
+<p>Henri Verbeet (2):
+<ul>
+  <li>r600: Evergreen has two extra frac_bits for the sampler LOD state.</li>
+  <li>st/mesa: Handle wrapped depth buffers in st_copy_texsubimage().</li>
+</ul></p>
+
+<p>Ian Romanick (33):
+<ul>
+  <li>Add 7.9 md5sums</li>
+  <li>docs: Import 7.8.x release notes from 7.8 branch.</li>
+  <li>docs: download.html does not need to be updated for each release</li>
+  <li>docs: Update mailing lines from sf.net to freedesktop.org</li>
+  <li>docs: added news item for 7.9 release</li>
+  <li>mesa: Validate assembly shaders when GLSL shaders are used</li>
+  <li>linker: Reject shaders that have unresolved function calls</li>
+  <li>mesa: Refactor validation of shader targets</li>
+  <li>glsl: Slightly change the semantic of _LinkedShaders</li>
+  <li>linker: Improve handling of unread/unwritten shader inputs/outputs</li>
+  <li>glsl: Commit lexer files changed by previous cherry picking</li>
+  <li>mesa: Make metaops use program refcounts instead of names.</li>
+  <li>glsl: Fix incorrect gl_type of sampler2DArray and sampler1DArrayShadow</li>
+  <li>mesa: Allow query of MAX_SAMPLES with EXT_framebuffer_multisample</li>
+  <li>glsl: better handling of linker failures</li>
+  <li>mesa: Fix glGet of ES2's GL_MAX_*_VECTORS properties.</li>
+  <li>i915: Disallow alpha, red, RG, and sRGB as render targets</li>
+  <li>glsl/linker: Free any IR discarded by optimization passes.</li>
+  <li>glsl: Add an optimization pass to simplify discards.</li>
+  <li>glsl: Add a lowering pass to move discards out of if-statements.</li>
+  <li>i915: Correctly generate unconditional KIL instructions</li>
+  <li>glsl: Add unary ir_expression constructor</li>
+  <li>glsl: Ensure that equality comparisons don't return a NULL IR tree</li>
+  <li>glcpp: Commit changes in generated files cause by previous commit</li>
+  <li>glsl: Inherrit type of declared variable from initializer</li>
+  <li>glsl: Inherrit type of declared variable from initializer after processing assignment</li>
+  <li>linker: Ensure that unsized arrays have a size after linking</li>
+  <li>linker: Fix regressions caused by previous commit</li>
+  <li>linker: Allow built-in arrays to have different sizes between shader stages</li>
+  <li>ir_to_mesa: Don't generate swizzles for record derefs of non-scalar/vectors</li>
+  <li>Refresh autogenerated file builtin_function.cpp.</li>
+  <li>docs: Initial set of release notes for 7.9.1</li>
+  <li>mesa: set version string to 7.9.1</li>
+</ul></p>
+
+<p>Julien Cristau (1):
+<ul>
+  <li>Makefile: don't include the same files twice in the tarball</li>
+</ul></p>
+
+<p>Kenneth Graunke (19):
+<ul>
+  <li>glcpp: Return NEWLINE token for newlines inside multi-line comments.</li>
+  <li>generate_builtins.py: Output large strings as arrays of characters.</li>
+  <li>glsl: Fix constant component count in vector constructor emitting.</li>
+  <li>ir_dead_functions: Actually free dead functions and signatures.</li>
+  <li>glcpp: Define GL_FRAGMENT_PRECISION_HIGH if GLSL version &gt;= 1.30.</li>
+  <li>glsl: Unconditionally define GL_FRAGMENT_PRECISION_HIGH in ES2 shaders.</li>
+  <li>glsl: Fix constant expression handling for &lt;, &gt;, &lt;=, &gt;= on vectors.</li>
+  <li>glsl: Use do_common_optimization in the standalone compiler.</li>
+  <li>glsl: Don't inline function prototypes.</li>
+  <li>glsl: Add a virtual as_discard() method.</li>
+  <li>glsl: Remove "discard" support from lower_jumps.</li>
+  <li>glsl: Refactor get_num_operands.</li>
+  <li>glcpp: Don't emit SPACE tokens in conditional_tokens production.</li>
+  <li>glsl: Clean up code by adding a new is_break() function.</li>
+  <li>glsl: Consider the "else" branch when looking for loop breaks.</li>
+  <li>Remove OES_compressed_paletted_texture from the ES2 extension list.</li>
+  <li>glsl/builtins: Compute the correct value for smoothstep(vec, vec, vec).</li>
+  <li>Fix build on systems where "python" is python 3.</li>
+  <li>i965: Internally enable GL_NV_blend_square on ES2.</li>
+</ul></p>
+
+<p>Kristian Høgsberg (1):
+<ul>
+  <li>i965: Don't write mrf assignment for pointsize output</li>
+</ul></p>
+
+<p>Luca Barbieri (1):
+<ul>
+  <li>glsl: Unroll loops with conditional breaks anywhere (not just the end)</li>
+</ul></p>
+
+<p>Marek Olšák (17):
+<ul>
+  <li>r300g: fix microtiling for 16-bits-per-channel formats</li>
+  <li>r300g: fix texture border for 16-bits-per-channel formats</li>
+  <li>r300g: add a default channel ordering of texture border for unhandled formats</li>
+  <li>r300g: fix texture border color for all texture formats</li>
+  <li>r300g: fix rendering with no vertex elements</li>
+  <li>r300/compiler: fix rc_rewrite_depth_out for it to work with any instruction</li>
+  <li>r300g: fix texture border color once again</li>
+  <li>r300g: fix texture swizzling with compressed textures on r400-r500</li>
+  <li>r300g: disable ARB_texture_swizzle if S3TC is enabled on r3xx-only</li>
+  <li>mesa, st/mesa: fix gl_FragCoord with FBOs in Gallium</li>
+  <li>st/mesa: initialize key in st_vp_varient</li>
+  <li>r300/compiler: fix swizzle lowering with a presubtract source operand</li>
+  <li>r300g: fix rendering with a vertex attrib having a zero stride</li>
+  <li>ir_to_mesa: Add support for conditional discards.</li>
+  <li>r300g: finally fix the texture corruption on r3xx-r4xx</li>
+  <li>mesa: fix texel store functions for some float formats</li>
+  <li>r300/compiler: disable the rename_regs pass for loops</li>
+</ul></p>
+
+<p>Mario Kleiner (1):
+<ul>
+  <li>mesa/r300classic: Fix dri2Invalidate/radeon_prepare_render for page flipping.</li>
+</ul></p>
+
+<p>Peter Clifton (1):
+<ul>
+  <li>intel: Fix emit_linear_blit to use DWORD aligned width blits</li>
+</ul></p>
+
+<p>Robert Hooker (2):
+<ul>
+  <li>intel: Add a new B43 pci id.</li>
+  <li>egl_dri2: Add missing intel chip ids.</li>
+</ul></p>
+
+<p>Roland Scheidegger (1):
+<ul>
+  <li>r200: fix r200 large points</li>
+</ul></p>
+
+<p>Thomas Hellstrom (17):
+<ul>
+  <li>st/xorg: Don't try to use option values before processing options</li>
+  <li>xorg/vmwgfx: Make vmwarectrl work also on 64-bit servers</li>
+  <li>st/xorg: Add a customizer option to get rid of annoying cursor update flicker</li>
+  <li>xorg/vmwgfx: Don't hide HW cursors when updating them</li>
+  <li>st/xorg: Don't try to remove invalid fbs</li>
+  <li>st/xorg: Fix typo</li>
+  <li>st/xorg, xorg/vmwgfx: Be a bit more frendly towards cross-compiling environments</li>
+  <li>st/xorg: Fix compilation errors for Xservers compiled without Composite</li>
+  <li>st/xorg: Don't use deprecated x*alloc / xfree functions</li>
+  <li>xorg/vmwgfx: Don't use deprecated x*alloc / xfree functions</li>
+  <li>st/xorg: Fix compilation for Xservers &gt;= 1.10</li>
+  <li>mesa: Make sure we have the talloc cflags when using the talloc headers</li>
+  <li>egl: Add an include for size_t</li>
+  <li>mesa: Add talloc includes for gles</li>
+  <li>st/egl: Fix build for include files in nonstandard places</li>
+  <li>svga/drm: Optionally resolve calls to powf during link-time</li>
+  <li>gallium/targets: Trivial crosscompiling fix</li>
+</ul></p>
+
+<p>Tom Stellard (7):
+<ul>
+  <li>r300/compiler: Make sure presubtract sources use supported swizzles</li>
+  <li>r300/compiler: Fix register allocator's handling of loops</li>
+  <li>r300/compiler: Fix instruction scheduling within IF blocks</li>
+  <li>r300/compiler: Use zero as the register index for unused sources</li>
+  <li>r300/compiler: Ignore alpha dest register when replicating the result</li>
+  <li>r300/compiler: Use correct swizzles for all presubtract sources</li>
+  <li>r300/compiler: Don't allow presubtract sources to be remapped twice</li>
+</ul></p>
+
+<p>Vinson Lee (1):
+<ul>
+  <li>glsl: Fix 'control reaches end of non-void function' warning.</li>
+</ul></p>
+
+<p>richard (1):
+<ul>
+  <li>r600c : inline vertex format is not updated in an app, switch to use vfetch constants. For the 7.9 and 7.10 branches as well.</li>
+</ul></p>
+
+</body>
+</html>
--- a/src/gallium/auxiliary/draw/draw_vs_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_vs_llvm.c
@@ -65,19 +65,7 @@ static void
 vs_llvm_delete( struct draw_vertex_shader *dvs )
 {
   struct llvm_vertex_shader *shader = llvm_vertex_shader(dvs);
-   struct pipe_fence_handle *fence = NULL;
   struct draw_llvm_variant_list_item *li;
-   struct pipe_context *pipe = dvs->draw->pipe;
-
-   /*
-    * XXX: This might be not neccessary at all.
-    */
-   pipe->flush(pipe, 0, &fence);
-   if (fence) {
-      pipe->screen->fence_finish(pipe->screen, fence, 0);
-      pipe->screen->fence_reference(pipe->screen, &fence, NULL);
-   }
-

   li = first_elem(&shader->variants);
   while(!at_end(&shader->variants, li)) {
--- a/src/gallium/drivers/llvmpipe/lp_scene.c
+++ b/src/gallium/drivers/llvmpipe/lp_scene.c
@@ -74,6 +74,7 @@ lp_scene_create( struct pipe_context *pipe )
 void
 lp_scene_destroy(struct lp_scene *scene)
 {
+   lp_fence_reference(&scene->fence, NULL);
   pipe_mutex_destroy(scene->mutex);
   assert(scene->data.head->next == NULL);
   FREE(scene->data.head);
--- a/src/gallium/drivers/llvmpipe/lp_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup.c
@@ -996,6 +996,8 @@ lp_setup_destroy( struct lp_setup_context *setup )
      lp_scene_destroy(scene);
   }

+   lp_fence_reference(&setup->last_fence, NULL);
+
   FREE( setup );
 }
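
Note on the two llvmpipe hunks above: both release a held fence by passing `NULL` as the new value to `lp_fence_reference()`. The sketch below illustrates the general reference-counting idiom this call appears to follow, where a "reference" helper retargets a pointer and destroys the old object once its count reaches zero. All names here (`fence_sketch`, `fence_sketch_reference`, `fences_destroyed`) are hypothetical; this is not llvmpipe's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted fence object. */
struct fence_sketch {
    int refcount;
};

/* Counts destroyed fences so the behaviour can be observed. */
static int fences_destroyed;

/* Allocate a fence holding one reference; returns NULL on failure. */
static struct fence_sketch *fence_sketch_create(void)
{
    struct fence_sketch *f = calloc(1, sizeof(*f));

    if (f)
        f->refcount = 1;
    return f;
}

/* Make *ptr refer to f. The new target gains a reference, the old
 * target loses one and is freed when its count drops to zero.
 * Passing f == NULL simply drops the reference held in *ptr. */
static void fence_sketch_reference(struct fence_sketch **ptr,
                                   struct fence_sketch *f)
{
    struct fence_sketch *old;

    assert(ptr != NULL);
    old = *ptr;

    if (old != f) {
        if (f)
            f->refcount++;
        if (old && --old->refcount == 0) {
            free(old);
            fences_destroyed++;
        }
    }
    *ptr = f;
}
```

Under this idiom, a destroy function that frees its container without first calling the reference helper with `NULL` leaves the count one too high, so the fence is never freed. The added `lp_fence_reference(..., NULL)` calls look like they close that kind of leak.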

--- a/src/gallium/drivers/r300/r300_context.c
+++ b/src/gallium/drivers/r300/r300_context.c
@@ -35,7 +35,9 @@
 #include "r300_screen_buffer.h"
 #include "r300_winsys.h"

-#include <inttypes.h>
+#ifdef HAVE_LLVM
+#include "gallivm/lp_bld_init.h"
+#endif

 static void r300_update_num_contexts(struct r300_screen *r300screen,
                                     int diff)
@@ -103,9 +105,14 @@ static void r300_destroy_context(struct pipe_context* context)

    if (r300->blitter)
        util_blitter_destroy(r300->blitter);
-    if (r300->draw)
+    if (r300->draw) {
        draw_destroy(r300->draw);

+#ifdef HAVE_LLVM
+        gallivm_destroy(r300->gallivm);
+#endif
+    }
+
    if (r300->upload_vb)
        u_upload_destroy(r300->upload_vb);
    if (r300->upload_ib)
@@ -424,7 +431,12 @@ struct pipe_context* r300_create_context(struct pipe_screen* screen,

    if (!r300screen->caps.has_tcl) {
        /* Create a Draw. This is used for SW TCL. */
+#ifdef HAVE_LLVM
+        r300->gallivm = gallivm_create();
+        r300->draw = draw_create_gallivm(&r300->context, r300->gallivm);
+#else
        r300->draw = draw_create(&r300->context);
+#endif
        if (r300->draw == NULL)
            goto fail;
        /* Enable our renderer. */
--- a/src/gallium/drivers/r300/r300_context.h
+++ b/src/gallium/drivers/r300/r300_context.h
@@ -459,6 +459,7 @@ struct r300_context {
    struct r300_screen *screen;

    /* Draw module. Used mostly for SW TCL. */
+    struct gallivm_state *gallivm;
    struct draw_context* draw;
    /* Vertex buffer for SW TCL. */
    struct pipe_resource* vbo;
--- a/src/gallium/drivers/r300/r300_screen.c
+++ b/src/gallium/drivers/r300/r300_screen.c
@@ -34,6 +34,10 @@

 #include "draw/draw_context.h"

+#ifdef HAVE_LLVM
+#include "gallivm/lp_bld_init.h"
+#endif
+
 /* Return the identifier behind whom the brave coders responsible for this
 * amalgamation of code, sweat, and duct tape, routinely obscure their names.
 *
@@ -484,5 +488,9 @@ struct pipe_screen* r300_screen_create(struct r300_winsys_screen *rws)

    util_format_s3tc_init();

+#ifdef HAVE_LLVM
+    lp_build_init();
+#endif
+
    return &r300screen->screen;
 }
--- a/src/gallium/drivers/r300/r300_screen_buffer.c
+++ b/src/gallium/drivers/r300/r300_screen_buffer.c
@@ -119,6 +119,7 @@ int r300_upload_user_buffers(struct r300_context *r300)
            vb->buffer = upload_buffer;
            vb->buffer_offset = upload_offset;
            r300->validate_buffers = TRUE;
+            r300->aos_dirty = TRUE;
        }
    }
    return ret;
--- a/src/gallium/drivers/r300/r300_state.c
+++ b/src/gallium/drivers/r300/r300_state.c
@@ -1298,29 +1298,27 @@ static void r300_set_fragment_sampler_views(struct pipe_context* pipe,
    }

    for (i = 0; i < count; i++) {
-        if (&state->sampler_views[i]->base != views[i]) {
-            pipe_sampler_view_reference(
-                    (struct pipe_sampler_view**)&state->sampler_views[i],
-                    views[i]);
+        pipe_sampler_view_reference(
+                (struct pipe_sampler_view**)&state->sampler_views[i],
+                views[i]);

-            if (!views[i]) {
-                continue;
-            }
-
-            /* A new sampler view (= texture)... */
-            dirty_tex = TRUE;
-
-            /* Set the texrect factor in the fragment shader.
-             * Needed for RECT and NPOT fallback. */
-            texture = r300_texture(views[i]->texture);
-            if (texture->desc.is_npot) {
-                r300_mark_atom_dirty(r300, &r300->fs_rc_constant_state);
-            }
-
-            state->sampler_views[i]->texcache_region =
-                r300_assign_texture_cache_region(view_index, real_num_views);
-            view_index++;
+        if (!views[i]) {
+            continue;
        }
+
+        /* A new sampler view (= texture)... */
+        dirty_tex = TRUE;
+
+        /* Set the texrect factor in the fragment shader.
+             * Needed for RECT and NPOT fallback. */
+        texture = r300_texture(views[i]->texture);
+        if (texture->desc.is_npot) {
+            r300_mark_atom_dirty(r300, &r300->fs_rc_constant_state);
+        }
+
+        state->sampler_views[i]->texcache_region =
+                r300_assign_texture_cache_region(view_index, real_num_views);
+        view_index++;
    }

    for (i = count; i < tex_units; i++) {
@@ -1496,14 +1494,14 @@ static void r300_set_vertex_buffers(struct pipe_context* pipe,
                any_user_buffer = TRUE;
            }

+            /* The stride of zero means we will be fetching only the first
+             * vertex, so don't care about max_index. */
+            if (!vbo->stride)
+                continue;
+
            if (vbo->max_index == ~0) {
-                /* if no VBO stride then only one vertex value so max index is 1 */
-                /* should think about converting to VS constants like svga does */
-                if (!vbo->stride)
-                    vbo->max_index = 1;
-                else
-                    vbo->max_index =
-                             (vbo->buffer->width0 - vbo->buffer_offset) / vbo->stride;
+                vbo->max_index =
+                        (vbo->buffer->width0 - vbo->buffer_offset) / vbo->stride;
            }

            max_index = MIN2(vbo->max_index, max_index);
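
Note on the `r300_set_vertex_buffers` hunk above: before this change a zero-stride buffer had its `max_index` forced to 1, and that value then took part in the `MIN2` clamp. After the change, zero-stride buffers are skipped entirely. A minimal sketch of the new control flow follows, using a simplified hypothetical struct (`vb_sketch`) and function name (`clamp_max_index`) rather than the real `pipe_vertex_buffer` type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the fields the hunk reads. */
struct vb_sketch {
    unsigned stride;         /* bytes between consecutive vertices */
    unsigned buffer_offset;  /* byte offset of the first vertex */
    unsigned buffer_width;   /* plays the role of buffer->width0 */
    unsigned max_index;      /* ~0u means "not specified" */
};

/* Returns the smallest max_index over all buffers with a non-zero
 * stride, filling in unspecified max_index values from the buffer
 * size. Zero-stride buffers never constrain the result. */
static unsigned clamp_max_index(struct vb_sketch *vbs, size_t count)
{
    unsigned max_index = ~0u;
    size_t i;

    assert(vbs != NULL || count == 0);

    for (i = 0; i < count; i++) {
        struct vb_sketch *vbo = &vbs[i];

        /* Only the first vertex is ever fetched, so skip the buffer. */
        if (!vbo->stride)
            continue;

        if (vbo->max_index == ~0u) {
            vbo->max_index =
                (vbo->buffer_width - vbo->buffer_offset) / vbo->stride;
        }

        if (vbo->max_index < max_index)
            max_index = vbo->max_index;
    }
    return max_index;
}
```

For example, a zero-stride buffer next to a 160-byte buffer with offset 32 and stride 16 gives a clamp of (160 - 32) / 16 = 8. With the old code the zero-stride buffer would have pulled the result down to 1.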
--- a/src/gallium/drivers/r300/r300_texture.c
+++ b/src/gallium/drivers/r300/r300_texture.c
@@ -899,7 +899,7 @@ struct pipe_surface* r300_create_surface(struct pipe_context * ctx,
                                               tex->desc.b.b.nr_samples,
                                               tex->desc.microtile,
                                               tex->desc.macrotile[level],
-                                               DIM_HEIGHT);
+                                               DIM_HEIGHT, 0);

        surface->cbzb_height = align((surface->base.height + 1) / 2,
                                     tile_height);
--- a/src/gallium/drivers/r300/r300_texture_desc.c
+++ b/src/gallium/drivers/r300/r300_texture_desc.c
@@ -34,7 +34,7 @@ unsigned r300_get_pixel_alignment(enum pipe_format format,
                                  unsigned num_samples,
                                  enum r300_buffer_tiling microtile,
                                  enum r300_buffer_tiling macrotile,
-                                  enum r300_dim dim)
+                                  enum r300_dim dim, boolean is_rs690)
 {
    static const unsigned table[2][5][3][2] =
    {
@@ -57,6 +57,7 @@ unsigned r300_get_pixel_alignment(enum pipe_format format,
            {{ 16, 8}, { 0,  0}, { 0,  0}}  /* 128 bits per pixel */
        }
    };
+
    static const unsigned aa_block[2] = {4, 8};
    unsigned tile = 0;
    unsigned pixsize = util_format_get_blocksize(format);
@@ -74,6 +75,14 @@ unsigned r300_get_pixel_alignment(enum pipe_format format,
    } else {
        /* Standard alignment. */
        tile = table[macrotile][util_logbase2(pixsize)][microtile][dim];
+        if (macrotile == 0 && is_rs690 && dim == DIM_WIDTH) {
+            int align;
+            int h_tile;
+            h_tile = table[macrotile][util_logbase2(pixsize)][microtile][DIM_HEIGHT];
+            align = 64 / (pixsize * h_tile);
+            if (tile < align)
+                tile = align;
+        }
    }

    assert(tile);
@@ -89,7 +98,7 @@ static boolean r300_texture_macro_switch(struct r300_texture_desc *desc,
    unsigned tile, texdim;

    tile = r300_get_pixel_alignment(desc->b.b.format, desc->b.b.nr_samples,
-                                    desc->microtile, R300_BUFFER_TILED, dim);
+                                    desc->microtile, R300_BUFFER_TILED, dim, 0);
    if (dim == DIM_WIDTH) {
        texdim = u_minify(desc->width0, level);
    } else {
@@ -113,6 +122,9 @@ static unsigned r300_texture_get_stride(struct r300_screen *screen,
                                        unsigned level)
 {
    unsigned tile_width, width, stride;
+    boolean is_rs690 = (screen->caps.family == CHIP_FAMILY_RS600 ||
+                        screen->caps.family == CHIP_FAMILY_RS690 ||
+                        screen->caps.family == CHIP_FAMILY_RS740);

    if (desc->stride_in_bytes_override)
        return desc->stride_in_bytes_override;
@@ -131,38 +143,14 @@ static unsigned r300_texture_get_stride(struct r300_screen *screen,
                                              desc->b.b.nr_samples,
                                              desc->microtile,
                                              desc->macrotile[level],
-                                              DIM_WIDTH);
+                                              DIM_WIDTH, is_rs690);
        width = align(width, tile_width);

        stride = util_format_get_stride(desc->b.b.format, width);
-
-        /* Some IGPs need a minimum stride of 64 bytes, hmm... */
-        if (!desc->macrotile[level] &&
-            (screen->caps.family == CHIP_FAMILY_RS600 ||
-             screen->caps.family == CHIP_FAMILY_RS690 ||
-             screen->caps.family == CHIP_FAMILY_RS740)) {
-            unsigned min_stride;
-
-            if (desc->microtile) {
-                unsigned tile_height =
-                        r300_get_pixel_alignment(desc->b.b.format,
-                                                 desc->b.b.nr_samples,
-                                                 desc->microtile,
-                                                 desc->macrotile[level],
-                                                 DIM_HEIGHT);
-
-                min_stride = 64 / tile_height;
-            } else {
-                min_stride = 64;
-            }
-
-            return stride < min_stride ? min_stride : stride;
-        }
-
        /* The alignment to 32 bytes is sort of implied by the layout... */
        return stride;
    } else {
-        return align(util_format_get_stride(desc->b.b.format, width), 32);
+        return align(util_format_get_stride(desc->b.b.format, width), is_rs690 ? 64 : 32);
    }
 }

@@ -179,7 +167,7 @@ static unsigned r300_texture_get_nblocksy(struct r300_texture_desc *desc,
                                               desc->b.b.nr_samples,
                                               desc->microtile,
                                               desc->macrotile[level],
-                                               DIM_HEIGHT);
+                                               DIM_HEIGHT, 0);
        height = align(height, tile_height);

        /* This is needed for the kernel checker, unfortunately. */
--- a/src/gallium/drivers/r300/r300_texture_desc.h
+++ b/src/gallium/drivers/r300/r300_texture_desc.h
@@ -41,7 +41,7 @@ unsigned r300_get_pixel_alignment(enum pipe_format format,
                                  unsigned num_samples,
                                  enum r300_buffer_tiling microtile,
                                  enum r300_buffer_tiling macrotile,
-                                  enum r300_dim dim);
+                                  enum r300_dim dim, boolean is_rs690);

 boolean r300_texture_desc_init(struct r300_screen *rscreen,
                               struct r300_texture_desc *desc,
--- a/src/gallium/drivers/r600/Makefile
+++ b/src/gallium/drivers/r600/Makefile
@@ -21,6 +21,7 @@ C_SOURCES = \
 	evergreen_state.c \
 	eg_asm.c \
 	r600_translate.c \
-	r600_state_common.c
+	r600_state_common.c \
+	r600_upload.c

 include ../../Makefile.template
--- a/src/gallium/drivers/r600/SConscript
+++ b/src/gallium/drivers/r600/SConscript
@@ -28,6 +28,7 @@ r600 = env.ConvenienceLibrary(
        'r600_state_common.c',
        'r600_texture.c',
        'r600_translate.c',
+        'r600_upload.c',
        'r700_asm.c',
        'evergreen_state.c',
        'eg_asm.c',
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1069,12 +1069,76 @@ void evergreen_init_config(struct r600_pipe_context *rctx)
 		num_hs_stack_entries = 42;
 		num_ls_stack_entries = 42;
 		break;
+	case CHIP_BARTS:
+		num_ps_gprs = 93;
+		num_vs_gprs = 46;
+		num_temp_gprs = 4;
+		num_gs_gprs = 31;
+		num_es_gprs = 31;
+		num_hs_gprs = 23;
+		num_ls_gprs = 23;
+		num_ps_threads = 128;
+		num_vs_threads = 20;
+		num_gs_threads = 20;
+		num_es_threads = 20;
+		num_hs_threads = 20;
+		num_ls_threads = 20;
+		num_ps_stack_entries = 85;
+		num_vs_stack_entries = 85;
+		num_gs_stack_entries = 85;
+		num_es_stack_entries = 85;
+		num_hs_stack_entries = 85;
+		num_ls_stack_entries = 85;
+		break;
+	case CHIP_TURKS:
+		num_ps_gprs = 93;
+		num_vs_gprs = 46;
+		num_temp_gprs = 4;
+		num_gs_gprs = 31;
+		num_es_gprs = 31;
+		num_hs_gprs = 23;
+		num_ls_gprs = 23;
+		num_ps_threads = 128;
+		num_vs_threads = 20;
+		num_gs_threads = 20;
+		num_es_threads = 20;
+		num_hs_threads = 20;
+		num_ls_threads = 20;
+		num_ps_stack_entries = 42;
+		num_vs_stack_entries = 42;
+		num_gs_stack_entries = 42;
+		num_es_stack_entries = 42;
+		num_hs_stack_entries = 42;
+		num_ls_stack_entries = 42;
+		break;
+	case CHIP_CAICOS:
+		num_ps_gprs = 93;
+		num_vs_gprs = 46;
+		num_temp_gprs = 4;
+		num_gs_gprs = 31;
+		num_es_gprs = 31;
+		num_hs_gprs = 23;
+		num_ls_gprs = 23;
+		num_ps_threads = 128;
+		num_vs_threads = 10;
+		num_gs_threads = 10;
+		num_es_threads = 10;
+		num_hs_threads = 10;
+		num_ls_threads = 10;
+		num_ps_stack_entries = 42;
+		num_vs_stack_entries = 42;
+		num_gs_stack_entries = 42;
+		num_es_stack_entries = 42;
+		num_hs_stack_entries = 42;
+		num_ls_stack_entries = 42;
+		break;
 	}

 	tmp = 0x00000000;
 	switch (family) {
 	case CHIP_CEDAR:
 	case CHIP_PALM:
+	case CHIP_CAICOS:
 		break;
 	default:
 		tmp |= S_008C00_VC_ENABLE(1);
@@ -1295,11 +1359,6 @@ void evergreen_vertex_buffer_update(struct r600_pipe_context *rctx)
 	if (rctx->vertex_elements == NULL || !rctx->nvertex_buffer)
 		return;

-	/* delete previous translated vertex elements */
-	if (rctx->tran.new_velems) {
-		r600_end_vertex_translate(rctx);
-	}
-
 	if (rctx->vertex_elements->incompatible_layout) {
 		/* translate rebind new vertex elements so
 		 * return once translated
@@ -1332,21 +1391,21 @@ void evergreen_vertex_buffer_update(struct r600_pipe_context *rctx)
 			vbuffer_index = rctx->vertex_elements->elements[i].vertex_buffer_index;
 			vertex_buffer = &rctx->vertex_buffer[vbuffer_index];
 			rbuffer = (struct r600_resource*)vertex_buffer->buffer;
-			offset = rctx->vertex_elements->vbuffer_offset[i] +
-				vertex_buffer->buffer_offset +
-				r600_bo_offset(rbuffer->bo);
+			offset = rctx->vertex_elements->vbuffer_offset[i];
 		} else {
 			/* bind vertex buffer once */
 			vertex_buffer = &rctx->vertex_buffer[i];
 			rbuffer = (struct r600_resource*)vertex_buffer->buffer;
-			offset = vertex_buffer->buffer_offset +
-				r600_bo_offset(rbuffer->bo);
+			offset = 0;
 		}
+		if (vertex_buffer == NULL || rbuffer == NULL)
+			continue;
+		offset += vertex_buffer->buffer_offset + r600_bo_offset(rbuffer->bo);

 		r600_pipe_state_add_reg(rstate, R_030000_RESOURCE0_WORD0,
 					offset, 0xFFFFFFFF, rbuffer->bo);
 		r600_pipe_state_add_reg(rstate, R_030004_RESOURCE0_WORD1,
-					rbuffer->size - offset - 1, 0xFFFFFFFF, NULL);
+					rbuffer->bo_size - offset - 1, 0xFFFFFFFF, NULL);
 		r600_pipe_state_add_reg(rstate, R_030008_RESOURCE0_WORD2,
 					S_030008_STRIDE(vertex_buffer->stride),
 					0xFFFFFFFF, NULL);
--- a/src/gallium/drivers/r600/r600.h
+++ b/src/gallium/drivers/r600/r600.h
@@ -92,6 +92,9 @@ enum radeon_family {
 	CHIP_CYPRESS,
 	CHIP_HEMLOCK,
 	CHIP_PALM,
+	CHIP_BARTS,
+	CHIP_TURKS,
+	CHIP_CAICOS,
 	CHIP_LAST,
 };

--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -155,6 +155,9 @@ int r600_bc_init(struct r600_bc *bc, enum radeon_family family)
 	case CHIP_CYPRESS:
 	case CHIP_HEMLOCK:
 	case CHIP_PALM:
+	case CHIP_BARTS:
+	case CHIP_TURKS:
+	case CHIP_CAICOS:
 		bc->chiprev = CHIPREV_EVERGREEN;
 		break;
 	default:
@@ -470,7 +473,22 @@ int r600_bc_add_alu_type(struct r600_bc *bc, const struct r600_bc_alu *alu, int
 	bc->cf_last->ndw += 2;
 	bc->ndw += 2;

-	bc->cf_last->kcache0_mode = 2;
+	/* The following configuration provides 64 128-bit constants.
+	 * Each cacheline holds 16 128-bit constants, each kcache can
+	 * lock 2 cachelines, and there are 2 kcaches per ALU clause,
+	 * for a maximum of 16 * 2 * 2 = 64 constants.
+	 * Supporting more than 64 constants requires splitting the
+	 * code into multiple ALU clauses.
+	 */
+	/* select the constant buffer (0-15) for each kcache */
+	bc->cf_last->kcache0_bank = 0;
+	bc->cf_last->kcache1_bank = 0;
+	/* lock 2 cachelines per kcache; 4 total */
+	bc->cf_last->kcache0_mode = V_SQ_CF_KCACHE_LOCK_2;
+	bc->cf_last->kcache1_mode = V_SQ_CF_KCACHE_LOCK_2;
+	/* set the cacheline offsets for each kcache */
+	bc->cf_last->kcache0_addr = 0;
+	bc->cf_last->kcache1_addr = 2;

 	/* process cur ALU instructions for bank swizzle */
 	if (alu->last) {
--- a/src/gallium/drivers/r600/r600_buffer.c
+++ b/src/gallium/drivers/r600/r600_buffer.c
@@ -29,7 +29,6 @@
 #include <util/u_math.h>
 #include <util/u_inlines.h>
 #include <util/u_memory.h>
-#include <util/u_upload_mgr.h>
 #include "state_tracker/drm_driver.h"
 #include <xf86drm.h>
 #include "radeon_drm.h"
@@ -53,12 +52,13 @@ struct pipe_resource *r600_buffer_create(struct pipe_screen *screen,

 	rbuffer->magic = R600_BUFFER_MAGIC;
 	rbuffer->user_buffer = NULL;
-	rbuffer->num_ranges = 0;
 	rbuffer->r.base.b = *templ;
 	pipe_reference_init(&rbuffer->r.base.b.reference, 1);
 	rbuffer->r.base.b.screen = screen;
 	rbuffer->r.base.vtbl = &r600_buffer_vtbl;
 	rbuffer->r.size = rbuffer->r.base.b.width0;
+	rbuffer->r.bo_size = rbuffer->r.size;
+	rbuffer->uploaded = FALSE;
 	bo = r600_bo((struct radeon*)screen->winsys, rbuffer->r.base.b.width0, alignment, rbuffer->r.base.b.bind, rbuffer->r.base.b.usage);
 	if (bo == NULL) {
 		FREE(rbuffer);
@@ -91,9 +91,10 @@ struct pipe_resource *r600_user_buffer_create(struct pipe_screen *screen,
 	rbuffer->r.base.b.depth0 = 1;
 	rbuffer->r.base.b.array_size = 1;
 	rbuffer->r.base.b.flags = 0;
-	rbuffer->num_ranges = 0;
 	rbuffer->r.bo = NULL;
+	rbuffer->r.bo_size = 0;
 	rbuffer->user_buffer = ptr;
+	rbuffer->uploaded = FALSE;
 	return &rbuffer->r.base.b;
 }

@@ -105,6 +106,7 @@ static void r600_buffer_destroy(struct pipe_screen *screen,
 	if (rbuffer->r.bo) {
 		r600_bo_reference((struct radeon*)screen->winsys, &rbuffer->r.bo, NULL);
 	}
+	rbuffer->r.bo = NULL;
 	FREE(rbuffer);
 }

@@ -114,29 +116,10 @@ static void *r600_buffer_transfer_map(struct pipe_context *pipe,
 	struct r600_resource_buffer *rbuffer = r600_buffer(transfer->resource);
 	int write = 0;
 	uint8_t *data;
-	int i;
-	boolean flush = FALSE;

 	if (rbuffer->user_buffer)
 		return (uint8_t*)rbuffer->user_buffer + transfer->box.x;

-	if (transfer->usage & PIPE_TRANSFER_DISCARD) {
-		for (i = 0; i < rbuffer->num_ranges; i++) {
-			if ((transfer->box.x >= rbuffer->ranges[i].start) &&
-			    (transfer->box.x < rbuffer->ranges[i].end))
-				flush = TRUE;
-
-			if (flush) {
-				r600_bo_reference((struct radeon*)pipe->winsys, &rbuffer->r.bo, NULL);
-				rbuffer->num_ranges = 0;
-				rbuffer->r.bo = r600_bo((struct radeon*)pipe->winsys,
-							rbuffer->r.base.b.width0, 0,
-							rbuffer->r.base.b.bind,
-							rbuffer->r.base.b.usage);
-				break;
-			}
-		}
-	}
 	if (transfer->usage & PIPE_TRANSFER_DONTBLOCK) {
 		/* FIXME */
 	}
@@ -155,36 +138,17 @@ static void r600_buffer_transfer_unmap(struct pipe_context *pipe,
 {
 	struct r600_resource_buffer *rbuffer = r600_buffer(transfer->resource);

+	if (rbuffer->user_buffer)
+		return;
+
 	if (rbuffer->r.bo)
 		r600_bo_unmap((struct radeon*)pipe->winsys, rbuffer->r.bo);
 }

 static void r600_buffer_transfer_flush_region(struct pipe_context *pipe,
-					      struct pipe_transfer *transfer,
-					      const struct pipe_box *box)
+						struct pipe_transfer *transfer,
+						const struct pipe_box *box)
 {
-	struct r600_resource_buffer *rbuffer = r600_buffer(transfer->resource);
-	unsigned i;
-	unsigned offset = transfer->box.x + box->x;
-	unsigned length = box->width;
-
-	assert(box->x + box->width <= transfer->box.width);
-
-	if (rbuffer->user_buffer)
-		return;
-
-	/* mark the range as used */
-	for(i = 0; i < rbuffer->num_ranges; ++i) {
-		if(offset <= rbuffer->ranges[i].end && rbuffer->ranges[i].start <= (offset+box->width)) {
-			rbuffer->ranges[i].start = MIN2(rbuffer->ranges[i].start, offset);
-			rbuffer->ranges[i].end   = MAX2(rbuffer->ranges[i].end, (offset+length));
-			return;
-		}
-	}
-
-	rbuffer->ranges[rbuffer->num_ranges].start = offset;
-	rbuffer->ranges[rbuffer->num_ranges].end = offset+length;
-	rbuffer->num_ranges++;
 }

 unsigned r600_buffer_is_referenced_by_cs(struct pipe_context *context,
@@ -236,29 +200,25 @@ struct u_resource_vtbl r600_buffer_vtbl =

 int r600_upload_index_buffer(struct r600_pipe_context *rctx, struct r600_drawl *draw)
 {
-	struct pipe_resource *upload_buffer = NULL;
-	unsigned index_offset = draw->index_buffer_offset;
-	int ret = 0;
-
 	if (r600_buffer_is_user_buffer(draw->index_buffer)) {
-		ret = u_upload_buffer(rctx->upload_ib,
-				      index_offset,
-				      draw->count * draw->index_size,
-				      draw->index_buffer,
-				      &index_offset,
-				      &upload_buffer);
-		if (ret) {
-			goto done;
-		}
-		draw->index_buffer_offset = index_offset;
+		struct r600_resource_buffer *rbuffer = r600_buffer(draw->index_buffer);
+		unsigned upload_offset;
+		int ret = 0;

-		/* Transfer ownership. */
-		pipe_resource_reference(&draw->index_buffer, upload_buffer);
-		pipe_resource_reference(&upload_buffer, NULL);
+		ret = r600_upload_buffer(rctx->rupload_vb,
+					draw->index_buffer_offset,
+					draw->count * draw->index_size,
+					rbuffer,
+					&upload_offset,
+					&rbuffer->r.bo_size,
+					&rbuffer->r.bo);
+		if (ret)
+			return ret;
+		rbuffer->uploaded = TRUE;
+		draw->index_buffer_offset = upload_offset;
 	}

-done:
-	return ret;
+	return 0;
 }

 int r600_upload_user_buffers(struct r600_pipe_context *rctx)
@@ -270,23 +230,21 @@ int r600_upload_user_buffers(struct r600_pipe_context *rctx)
 	nr = rctx->nvertex_buffer;

 	for (i = 0; i < nr; i++) {
-//		struct pipe_vertex_buffer *vb = &rctx->vertex_buffer[rctx->vertex_elements->elements[i].vertex_buffer_index];
 		struct pipe_vertex_buffer *vb = &rctx->vertex_buffer[i];

 		if (r600_buffer_is_user_buffer(vb->buffer)) {
-			struct pipe_resource *upload_buffer = NULL;
-			unsigned offset = 0; /*vb->buffer_offset * 4;*/
-			unsigned size = vb->buffer->width0;
+			struct r600_resource_buffer *rbuffer = r600_buffer(vb->buffer);
 			unsigned upload_offset;
-			ret = u_upload_buffer(rctx->upload_vb,
-					      offset, size,
-					      vb->buffer,
-					      &upload_offset, &upload_buffer);
+
+			ret = r600_upload_buffer(rctx->rupload_vb,
+						0, vb->buffer->width0,
+						rbuffer,
+						&upload_offset,
+						&rbuffer->r.bo_size,
+						&rbuffer->r.bo);
 			if (ret)
 				return ret;
-
-			pipe_resource_reference(&vb->buffer, NULL);
-			vb->buffer = upload_buffer;
+			rbuffer->uploaded = TRUE;
 			vb->buffer_offset = upload_offset;
 		}
 	}
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -35,7 +35,6 @@
 #include <util/u_pack_color.h>
 #include <util/u_memory.h>
 #include <util/u_inlines.h>
-#include <util/u_upload_mgr.h>
 #include <pipebuffer/pb_buffer.h>
 #include "r600.h"
 #include "r600d.h"
@@ -59,9 +58,6 @@ static void r600_flush(struct pipe_context *ctx, unsigned flags,
 	if (!rctx->ctx.pm4_cdwords)
 		return;

-	u_upload_flush(rctx->upload_vb);
-	u_upload_flush(rctx->upload_ib);
-
 #if 0
 	sprintf(dname, "gallium-%08d.bof", dc);
 	if (dc < 20) {
@@ -71,6 +67,8 @@ static void r600_flush(struct pipe_context *ctx, unsigned flags,
 	dc++;
 #endif
 	r600_context_flush(&rctx->ctx);
+
+	r600_upload_flush(rctx->rupload_vb);
 }

 static void r600_destroy_context(struct pipe_context *context)
@@ -89,8 +87,7 @@ static void r600_destroy_context(struct pipe_context *context)
 		free(rctx->states[i]);
 	}

-	u_upload_destroy(rctx->upload_vb);
-	u_upload_destroy(rctx->upload_ib);
+	r600_upload_destroy(rctx->rupload_vb);

 	if (rctx->tran.translate_cache)
 		translate_cache_destroy(rctx->tran.translate_cache);
@@ -151,6 +148,9 @@ static struct pipe_context *r600_create_context(struct pipe_screen *screen, void
 	case CHIP_CYPRESS:
 	case CHIP_HEMLOCK:
 	case CHIP_PALM:
+	case CHIP_BARTS:
+	case CHIP_TURKS:
+	case CHIP_CAICOS:
 		rctx->context.draw_vbo = evergreen_draw;
 		evergreen_init_state_functions(rctx);
 		if (evergreen_context_init(&rctx->ctx, rctx->radeon)) {
@@ -165,16 +165,8 @@ static struct pipe_context *r600_create_context(struct pipe_screen *screen, void
 		return NULL;
 	}

-	rctx->upload_ib = u_upload_create(&rctx->context, 32 * 1024, 16,
-					  PIPE_BIND_INDEX_BUFFER);
-	if (rctx->upload_ib == NULL) {
-		r600_destroy_context(&rctx->context);
-		return NULL;
-	}
-
-	rctx->upload_vb = u_upload_create(&rctx->context, 128 * 1024, 16,
-					  PIPE_BIND_VERTEX_BUFFER);
-	if (rctx->upload_vb == NULL) {
+	rctx->rupload_vb = r600_upload_create(rctx, 128 * 1024, 16);
+	if (rctx->rupload_vb == NULL) {
 		r600_destroy_context(&rctx->context);
 		return NULL;
 	}
@@ -243,6 +235,9 @@ static const char *r600_get_family_name(enum radeon_family family)
 	case CHIP_CYPRESS: return "AMD CYPRESS";
 	case CHIP_HEMLOCK: return "AMD HEMLOCK";
 	case CHIP_PALM: return "AMD PALM";
+	case CHIP_BARTS: return "AMD BARTS";
+	case CHIP_TURKS: return "AMD TURKS";
+	case CHIP_CAICOS: return "AMD CAICOS";
 	default: return "AMD unknown";
 	}
 }
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -111,7 +111,7 @@ struct r600_pipe_shader {
 #define NUM_TEX_UNITS 16

 struct r600_textures_info {
-	struct r600_pipe_sampler_view   *views[NUM_TEX_UNITS];
+	struct r600_pipe_sampler_view	*views[NUM_TEX_UNITS];
 	unsigned			n_views;
 	void				*samplers[NUM_TEX_UNITS];
 	unsigned			n_samplers;
@@ -131,6 +131,8 @@ struct r600_translate_context {
 #define R600_CONSTANT_ARRAY_SIZE 256
 #define R600_RESOURCE_ARRAY_SIZE 160

+struct r600_upload;
+
 struct r600_pipe_context {
 	struct pipe_context		context;
 	struct blitter_context		*blitter;
@@ -163,8 +165,7 @@ struct r600_pipe_context {
 	/* shader information */
 	unsigned			sprite_coord_enable;
 	bool				flatshade;
-	struct u_upload_mgr		*upload_vb;
-	struct u_upload_mgr		*upload_ib;
+	struct r600_upload		*rupload_vb;
 	unsigned			any_user_vbs;
 	struct r600_textures_info	ps_samplers;
 	unsigned			vb_max_index;
@@ -270,13 +271,13 @@ void r600_sampler_view_destroy(struct pipe_context *ctx,
 void r600_bind_state(struct pipe_context *ctx, void *state);
 void r600_delete_state(struct pipe_context *ctx, void *state);
 void r600_bind_vertex_elements(struct pipe_context *ctx, void *state);
-
 void *r600_create_shader_state(struct pipe_context *ctx,
 			       const struct pipe_shader_state *state);
 void r600_bind_ps_shader(struct pipe_context *ctx, void *state);
 void r600_bind_vs_shader(struct pipe_context *ctx, void *state);
 void r600_delete_ps_shader(struct pipe_context *ctx, void *state);
 void r600_delete_vs_shader(struct pipe_context *ctx, void *state);
+
 /*
 * common helpers
 */
--- a/src/gallium/drivers/r600/r600_resource.h
+++ b/src/gallium/drivers/r600/r600_resource.h
@@ -46,6 +46,7 @@ struct r600_resource {
 	struct u_resource		base;
 	struct r600_bo			*bo;
 	u32				size;
+	unsigned			bo_size;
 };

 struct r600_resource_texture {
@@ -61,7 +62,21 @@ struct r600_resource_texture {
 	unsigned			tile_type;
 	unsigned			depth;
 	unsigned			dirty;
-	struct r600_resource_texture    *flushed_depth_texture;
+	struct r600_resource_texture	*flushed_depth_texture;
+};
+
+#define R600_BUFFER_MAGIC 0xabcd1600
+
+struct r600_resource_buffer {
+	struct r600_resource		r;
+	uint32_t			magic;
+	void				*user_buffer;
+	bool				uploaded;
+};
+
+struct r600_surface {
+	struct pipe_surface		base;
+	unsigned			aligned_height;
 };

 void r600_init_screen_resource_functions(struct pipe_screen *screen);
@@ -73,41 +88,25 @@ struct pipe_resource *r600_texture_from_handle(struct pipe_screen *screen,
 						const struct pipe_resource *base,
 						struct winsys_handle *whandle);

-#define R600_BUFFER_MAGIC 0xabcd1600
-#define R600_BUFFER_MAX_RANGES 32
-
-struct r600_buffer_range {
-	uint32_t start;
-	uint32_t end;
-};
-
-struct r600_resource_buffer {
-	struct r600_resource r;
-	uint32_t magic;
-	void *user_buffer;
-	struct r600_buffer_range ranges[R600_BUFFER_MAX_RANGES];
-	unsigned num_ranges;
-};
-
 /* r600_buffer */
 static INLINE struct r600_resource_buffer *r600_buffer(struct pipe_resource *buffer)
 {
 	if (buffer) {
 		assert(((struct r600_resource_buffer *)buffer)->magic == R600_BUFFER_MAGIC);
 		return (struct r600_resource_buffer *)buffer;
-    }
-    return NULL;
+	}
+	return NULL;
 }

 static INLINE boolean r600_buffer_is_user_buffer(struct pipe_resource *buffer)
 {
-    return r600_buffer(buffer)->user_buffer ? TRUE : FALSE;
+	if (r600_buffer(buffer)->uploaded)
+		return FALSE;
+	return r600_buffer(buffer)->user_buffer ? TRUE : FALSE;
 }

-int r600_texture_depth_flush(struct pipe_context *ctx,
-			     struct pipe_resource *texture);
-
-extern int (*r600_blit_uncompress_depth_ptr)(struct pipe_context *ctx, struct r600_resource_texture *texture);
+int r600_texture_depth_flush(struct pipe_context *ctx, struct pipe_resource *texture);
+int (*r600_blit_uncompress_depth_ptr)(struct pipe_context *ctx, struct r600_resource_texture *texture);

 /* r600_texture.c texture transfer functions. */
 struct pipe_transfer* r600_texture_get_transfer(struct pipe_context *ctx,
@@ -122,9 +121,15 @@ void* r600_texture_transfer_map(struct pipe_context *ctx,
 void r600_texture_transfer_unmap(struct pipe_context *ctx,
 				 struct pipe_transfer* transfer);

-struct r600_surface {
-	struct pipe_surface base;
-	unsigned aligned_height;
-};
+struct r600_pipe_context;
+struct r600_upload *r600_upload_create(struct r600_pipe_context *rctx,
+					unsigned default_size,
+					unsigned alignment);
+void r600_upload_flush(struct r600_upload *upload);
+void r600_upload_destroy(struct r600_upload *upload);
+int r600_upload_buffer(struct r600_upload *upload, unsigned offset,
+			unsigned size, struct r600_resource_buffer *in_buffer,
+			unsigned *out_offset, unsigned *out_size,
+			struct r600_bo **out_buffer);

 #endif
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -589,6 +589,8 @@ int r600_shader_from_tgsi(const struct tgsi_token *tokens, struct r600_shader *s
 			if (r)
 				goto out_err;
 			break;
+		case TGSI_TOKEN_TYPE_PROPERTY:
+			break;
 		default:
 			R600_ERR("unsupported token type %d\n", ctx.parse.FullToken.Token.Type);
 			r = -EINVAL;
@@ -1451,7 +1453,7 @@ static int tgsi_pow(struct r600_shader_ctx *ctx)
 		return r;
 	/* b * LOG2(a) */
 	memset(&alu, 0, sizeof(struct r600_bc_alu));
-	alu.inst = CTX_INST(V_SQ_ALU_WORD1_OP2_SQ_OP2_INST_MUL_IEEE);
+	alu.inst = CTX_INST(V_SQ_ALU_WORD1_OP2_SQ_OP2_INST_MUL);
 	r = tgsi_src(ctx, &inst->Src[1], &alu.src[0]);
 	if (r)
 		return r;
--- a/src/gallium/drivers/r600/r600_sq.h
+++ b/src/gallium/drivers/r600/r600_sq.h
@@ -74,6 +74,10 @@
 #define   S_SQ_CF_ALU_WORD0_KCACHE_MODE0(x)                          (((x) & 0x3) << 30)
 #define   G_SQ_CF_ALU_WORD0_KCACHE_MODE0(x)                          (((x) >> 30) & 0x3)
 #define   C_SQ_CF_ALU_WORD0_KCACHE_MODE0                             0x3FFFFFFF
+#define     V_SQ_CF_KCACHE_NOP                                       0x00000000
+#define     V_SQ_CF_KCACHE_LOCK_1                                    0x00000001
+#define     V_SQ_CF_KCACHE_LOCK_2                                    0x00000002
+#define     V_SQ_CF_KCACHE_LOCK_LOOP_INDEX                           0x00000003
 #define P_SQ_CF_ALU_WORD1
 #define   S_SQ_CF_ALU_WORD1_KCACHE_MODE1(x)                          (((x) & 0x3) << 0)
 #define   G_SQ_CF_ALU_WORD1_KCACHE_MODE1(x)                          (((x) >> 0) & 0x3)
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -36,7 +36,6 @@
 #include <util/u_pack_color.h>
 #include <util/u_memory.h>
 #include <util/u_inlines.h>
-#include <util/u_upload_mgr.h>
 #include <util/u_framebuffer.h>
 #include <pipebuffer/pb_buffer.h>
 #include "r600.h"
@@ -136,11 +135,6 @@ void r600_vertex_buffer_update(struct r600_pipe_context *rctx)
 	if (rctx->vertex_elements == NULL || !rctx->nvertex_buffer)
 		return;

-	/* delete previous translated vertex elements */
-	if (rctx->tran.new_velems) {
-		r600_end_vertex_translate(rctx);
-	}
-
 	if (rctx->vertex_elements->incompatible_layout) {
 		/* translate rebind new vertex elements so
 		 * return once translated
@@ -173,21 +167,21 @@ void r600_vertex_buffer_update(struct r600_pipe_context *rctx)
 			vbuffer_index = rctx->vertex_elements->elements[i].vertex_buffer_index;
 			vertex_buffer = &rctx->vertex_buffer[vbuffer_index];
 			rbuffer = (struct r600_resource*)vertex_buffer->buffer;
-			offset = rctx->vertex_elements->vbuffer_offset[i] +
-				vertex_buffer->buffer_offset +
-				r600_bo_offset(rbuffer->bo);
+			offset = rctx->vertex_elements->vbuffer_offset[i];
 		} else {
 			/* bind vertex buffer once */
 			vertex_buffer = &rctx->vertex_buffer[i];
 			rbuffer = (struct r600_resource*)vertex_buffer->buffer;
-			offset = vertex_buffer->buffer_offset +
-				r600_bo_offset(rbuffer->bo);
+			offset = 0;
 		}
+		if (vertex_buffer == NULL || rbuffer == NULL)
+			continue;
+		offset += vertex_buffer->buffer_offset + r600_bo_offset(rbuffer->bo);

 		r600_pipe_state_add_reg(rstate, R_038000_RESOURCE0_WORD0,
 					offset, 0xFFFFFFFF, rbuffer->bo);
 		r600_pipe_state_add_reg(rstate, R_038004_RESOURCE0_WORD1,
-					rbuffer->size - offset - 1, 0xFFFFFFFF, NULL);
+					rbuffer->bo_size - offset - 1, 0xFFFFFFFF, NULL);
 		r600_pipe_state_add_reg(rstate, R_038008_RESOURCE0_WORD2,
 					S_038008_STRIDE(vertex_buffer->stride),
 					0xFFFFFFFF, NULL);
@@ -281,7 +275,6 @@ void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info *info)
 {
 	struct r600_pipe_context *rctx = (struct r600_pipe_context *)ctx;
 	struct r600_drawl draw;
-	boolean translate = FALSE;

 	memset(&draw, 0, sizeof(struct r600_drawl));
 	draw.ctx = ctx;
@@ -313,9 +306,6 @@ void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info *info)
 	}
 	r600_draw_common(&draw);

-	if (translate)
-		r600_end_vertex_translate(rctx);
-
 	pipe_resource_reference(&draw.index_buffer, NULL);
 }

--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -119,6 +119,11 @@ void r600_bind_vertex_elements(struct pipe_context *ctx, void *state)
 	struct r600_pipe_context *rctx = (struct r600_pipe_context *)ctx;
 	struct r600_vertex_element *v = (struct r600_vertex_element*)state;

+	/* delete previous translated vertex elements */
+	if (rctx->tran.new_velems) {
+		r600_end_vertex_translate(rctx);
+	}
+
 	rctx->vertex_elements = v;
 	if (v) {
 		rctx->states[v->rstate.id] = &v->rstate;
@@ -174,8 +179,16 @@ void r600_set_vertex_buffers(struct pipe_context *ctx, unsigned count,
 	struct pipe_vertex_buffer *vbo;
 	unsigned max_index = (unsigned)-1;

-	for (int i = 0; i < rctx->nvertex_buffer; i++) {
-		pipe_resource_reference(&rctx->vertex_buffer[i].buffer, NULL);
+	if (rctx->family >= CHIP_CEDAR) {
+		for (int i = 0; i < rctx->nvertex_buffer; i++) {
+			pipe_resource_reference(&rctx->vertex_buffer[i].buffer, NULL);
+			evergreen_context_pipe_state_set_fs_resource(&rctx->ctx, NULL, i);
+		}
+	} else {
+		for (int i = 0; i < rctx->nvertex_buffer; i++) {
+			pipe_resource_reference(&rctx->vertex_buffer[i].buffer, NULL);
+			r600_context_pipe_state_set_fs_resource(&rctx->ctx, NULL, i);
+		}
 	}
 	memcpy(rctx->vertex_buffer, buffers, sizeof(struct pipe_vertex_buffer) * count);

@@ -183,15 +196,19 @@ void r600_set_vertex_buffers(struct pipe_context *ctx, unsigned count,
 		vbo = (struct pipe_vertex_buffer*)&buffers[i];

 		rctx->vertex_buffer[i].buffer = NULL;
+		if (buffers[i].buffer == NULL)
+			continue;
 		if (r600_buffer_is_user_buffer(buffers[i].buffer))
 			rctx->any_user_vbs = TRUE;
 		pipe_resource_reference(&rctx->vertex_buffer[i].buffer, buffers[i].buffer);

+		/* A stride of zero means only the first vertex is fetched,
+		 * so max_index does not matter. */
+		if (!vbo->stride)
+			continue;
+
 		if (vbo->max_index == ~0) {
-			if (!vbo->stride)
-				vbo->max_index = 1;
-			else
-				vbo->max_index = (vbo->buffer->width0 - vbo->buffer_offset) / vbo->stride;
+			vbo->max_index = (vbo->buffer->width0 - vbo->buffer_offset) / vbo->stride;
 		}
 		max_index = MIN2(vbo->max_index, max_index);
 	}
--- a/src/gallium/drivers/r600/r600_translate.c
+++ b/src/gallium/drivers/r600/r600_translate.c
@@ -42,6 +42,7 @@ void r600_begin_vertex_translate(struct r600_pipe_context *rctx)
 	struct pipe_resource *out_buffer;
 	unsigned i, num_verts;
 	struct pipe_vertex_element new_velems[PIPE_MAX_ATTRIBS];
+	void *tmp;

 	/* Initialize the translate key, i.e. the recipe how vertices should be
 	 * translated. */
@@ -159,8 +160,9 @@ void r600_begin_vertex_translate(struct r600_pipe_context *rctx)
 		}
 	}

-	rctx->tran.new_velems = pipe->create_vertex_elements_state(pipe, ve->count, new_velems);
-	pipe->bind_vertex_elements_state(pipe, rctx->tran.new_velems);
+	tmp = pipe->create_vertex_elements_state(pipe, ve->count, new_velems);
+	pipe->bind_vertex_elements_state(pipe, tmp);
+	rctx->tran.new_velems = tmp;

 	pipe_resource_reference(&out_buffer, NULL);
 }
@@ -173,15 +175,11 @@ void r600_end_vertex_translate(struct r600_pipe_context *rctx)
 		return;
 	}
 	/* Restore vertex elements. */
-	if (rctx->vertex_elements == rctx->tran.new_velems) {
-		pipe->bind_vertex_elements_state(pipe, NULL);
-	}
 	pipe->delete_vertex_elements_state(pipe, rctx->tran.new_velems);
 	rctx->tran.new_velems = NULL;

 	/* Delete the now-unused VBO. */
-	pipe_resource_reference(&rctx->vertex_buffer[rctx->tran.vb_slot].buffer,
-				NULL);
+	pipe_resource_reference(&rctx->vertex_buffer[rctx->tran.vb_slot].buffer, NULL);
 }

 void r600_translate_index_buffer(struct r600_pipe_context *r600,
--- a/src/gallium/drivers/r600/r600_upload.c
+++ b/src/gallium/drivers/r600/r600_upload.c
@@ -0,0 +1,114 @@
+/*
+ * Copyright 2010 Jerome Glisse <glisse@freedesktop.org>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * on the rights to use, copy, modify, merge, publish, distribute, sub
+ * license, and/or sell copies of the Software, and to permit persons to whom
+ * the Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *      Jerome Glisse <jglisse@redhat.com>
+ */
+#include <errno.h>
+#include "util/u_inlines.h"
+#include "util/u_memory.h"
+#include "r600.h"
+#include "r600_pipe.h"
+#include "r600_resource.h"
+
+struct r600_upload {
+	struct r600_pipe_context	*rctx;
+	struct r600_bo			*buffer;
+	char				*ptr;
+	unsigned			size;
+	unsigned			default_size;
+	unsigned			total_alloc_size;
+	unsigned			offset;
+	unsigned			alignment;
+};
+
+struct r600_upload *r600_upload_create(struct r600_pipe_context *rctx,
+					unsigned default_size,
+					unsigned alignment)
+{
+	struct r600_upload *upload = CALLOC_STRUCT(r600_upload);
+
+	if (upload == NULL)
+		return NULL;
+
+	upload->rctx = rctx;
+	upload->size = 0;
+	upload->default_size = default_size;
+	upload->alignment = alignment;
+	upload->ptr = NULL;
+	upload->buffer = NULL;
+	upload->total_alloc_size = 0;
+
+	return upload;
+}
+
+void r600_upload_flush(struct r600_upload *upload)
+{
+	if (upload->buffer) {
+		r600_bo_reference(upload->rctx->radeon, &upload->buffer, NULL);
+	}
+	upload->default_size = MAX2(upload->total_alloc_size, upload->default_size);
+	upload->total_alloc_size = 0;
+	upload->size = 0;
+	upload->offset = 0;
+	upload->ptr = NULL;
+	upload->buffer = NULL;
+}
+
+void r600_upload_destroy(struct r600_upload *upload)
+{
+	r600_upload_flush(upload);
+	FREE(upload);
+}
+
+int r600_upload_buffer(struct r600_upload *upload, unsigned offset,
+			unsigned size, struct r600_resource_buffer *in_buffer,
+			unsigned *out_offset, unsigned *out_size,
+			struct r600_bo **out_buffer)
+{
+	unsigned alloc_size = align(size, upload->alignment);
+	const void *in_ptr = NULL;
+
+	if (upload->offset + alloc_size > upload->size) {
+		if (upload->size) {
+			r600_bo_reference(upload->rctx->radeon, &upload->buffer, NULL);
+		}
+		upload->size = align(MAX2(upload->default_size, alloc_size), 4096);
+		upload->total_alloc_size += upload->size;
+		upload->offset = 0;
+		upload->buffer = r600_bo(upload->rctx->radeon, upload->size, 4096, PIPE_BIND_VERTEX_BUFFER, 0);
+		if (upload->buffer == NULL) {
+			return -ENOMEM;
+		}
+		upload->ptr = r600_bo_map(upload->rctx->radeon, upload->buffer, 0, NULL);
+	}
+
+	in_ptr = in_buffer->user_buffer;
+	memcpy(upload->ptr + upload->offset, (uint8_t *) in_ptr + offset, size);
+	*out_offset = upload->offset;
+	*out_size = upload->size;
+	*out_buffer = NULL;
+	r600_bo_reference(upload->rctx->radeon, out_buffer, upload->buffer);
+	upload->offset += alloc_size;
+
+	return 0;
+}
--- a/src/gallium/state_trackers/egl/common/egl_g3d_api.c
+++ b/src/gallium/state_trackers/egl/common/egl_g3d_api.c
@@ -158,17 +158,17 @@ egl_g3d_choose_config(_EGLDriver *drv, _EGLDisplay *dpy, const EGLint *attribs,
         (_EGLArrayForEach) egl_g3d_match_config, (void *) &criteria);

   /* perform sorting of configs */
-   if (tmp_configs && tmp_size) {
+   if (configs && tmp_size) {
      _eglSortConfigs((const _EGLConfig **) tmp_configs, tmp_size,
            egl_g3d_compare_config, (void *) &criteria);
-      size = MIN2(tmp_size, size);
-      for (i = 0; i < size; i++)
+      tmp_size = MIN2(tmp_size, size);
+      for (i = 0; i < tmp_size; i++)
         configs[i] = _eglGetConfigHandle(tmp_configs[i]);
   }

   FREE(tmp_configs);

-   *num_configs = size;
+   *num_configs = tmp_size;

   return EGL_TRUE;
 }
@@ -402,7 +402,6 @@ egl_g3d_create_pbuffer_surface(_EGLDriver *drv, _EGLDisplay *dpy,
                               _EGLConfig *conf, const EGLint *attribs)
 {
   struct egl_g3d_surface *gsurf;
-   struct pipe_resource *ptex = NULL;

   gsurf = create_pbuffer_surface(dpy, conf, attribs,
         "eglCreatePbufferSurface");
@@ -411,13 +410,6 @@ egl_g3d_create_pbuffer_surface(_EGLDriver *drv, _EGLDisplay *dpy,

   gsurf->client_buffer_type = EGL_NONE;

-   if (!gsurf->stfbi->validate(gsurf->stfbi,
-            &gsurf->stvis.render_buffer, 1, &ptex)) {
-      egl_g3d_destroy_st_framebuffer(gsurf->stfbi);
-      FREE(gsurf);
-      return NULL;
-   }
-
   return &gsurf->base;
 }

@@ -477,12 +469,14 @@ egl_g3d_create_pbuffer_from_client_buffer(_EGLDriver *drv, _EGLDisplay *dpy,
   gsurf->client_buffer_type = buftype;
   gsurf->client_buffer = buffer;

+   /* validate now so that it fails if the client buffer is invalid */
   if (!gsurf->stfbi->validate(gsurf->stfbi,
            &gsurf->stvis.render_buffer, 1, &ptex)) {
      egl_g3d_destroy_st_framebuffer(gsurf->stfbi);
      FREE(gsurf);
      return NULL;
   }
+   pipe_resource_reference(&ptex, NULL);

   return &gsurf->base;
 }
@@ -676,14 +670,13 @@ egl_g3d_copy_buffers(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surf,

   ptex = get_pipe_resource(gdpy->native, nsurf, NATIVE_ATTACHMENT_FRONT_LEFT);
   if (ptex) {
-      struct pipe_resource *psrc = gsurf->render_texture;
      struct pipe_box src_box;
+
      u_box_origin_2d(ptex->width0, ptex->height0, &src_box);
-      if (psrc) {
-         gdpy->pipe->resource_copy_region(gdpy->pipe, ptex, 0, 0, 0, 0,
-               gsurf->render_texture, 0, &src_box);
-         nsurf->present(nsurf, NATIVE_ATTACHMENT_FRONT_LEFT, FALSE, 0);
-      }
+      gdpy->pipe->resource_copy_region(gdpy->pipe, ptex, 0, 0, 0, 0,
+            gsurf->render_texture, 0, &src_box);
+      gdpy->pipe->flush(gdpy->pipe, PIPE_FLUSH_RENDER_CACHE, NULL);
+      nsurf->present(nsurf, NATIVE_ATTACHMENT_FRONT_LEFT, FALSE, 0);

      pipe_resource_reference(&ptex, NULL);
   }
--- a/src/gallium/state_trackers/egl/x11/native_dri2.c
+++ b/src/gallium/state_trackers/egl/x11/native_dri2.c
@@ -548,6 +548,10 @@ dri2_display_convert_config(struct native_display *ndpy,
   if (!mode->xRenderable || !mode->drawableType)
      return FALSE;

+   /* skip slow configs; they are probably not relevant */
+   if (mode->visualRating == GLX_SLOW_CONFIG)
+      return FALSE;
+
   nconf->buffer_mask = 1 << NATIVE_ATTACHMENT_FRONT_LEFT;
   if (mode->doubleBufferMode)
      nconf->buffer_mask |= 1 << NATIVE_ATTACHMENT_BACK_LEFT;
@@ -568,13 +572,32 @@ dri2_display_convert_config(struct native_display *ndpy,
   if (nconf->color_format == PIPE_FORMAT_NONE)
      return FALSE;

-   if (mode->drawableType & GLX_WINDOW_BIT)
+   if ((mode->drawableType & GLX_WINDOW_BIT) && mode->visualID)
      nconf->window_bit = TRUE;
   if (mode->drawableType & GLX_PIXMAP_BIT)
      nconf->pixmap_bit = TRUE;

   nconf->native_visual_id = mode->visualID;
-   nconf->native_visual_type = mode->visualType;
+   switch (mode->visualType) {
+   case GLX_TRUE_COLOR:
+      nconf->native_visual_type = TrueColor;
+      break;
+   case GLX_DIRECT_COLOR:
+      nconf->native_visual_type = DirectColor;
+      break;
+   case GLX_PSEUDO_COLOR:
+      nconf->native_visual_type = PseudoColor;
+      break;
+   case GLX_STATIC_COLOR:
+      nconf->native_visual_type = StaticColor;
+      break;
+   case GLX_GRAY_SCALE:
+      nconf->native_visual_type = GrayScale;
+      break;
+   case GLX_STATIC_GRAY:
+      nconf->native_visual_type = StaticGray;
+      break;
+   }
   nconf->level = mode->level;
   nconf->samples = mode->samples;

@@ -614,8 +637,17 @@ dri2_display_get_configs(struct native_display *ndpy, int *num_configs)
      count = 0;
      for (i = 0; i < num_modes; i++) {
         struct native_config *nconf = &dri2dpy->configs[count].base;
-         if (dri2_display_convert_config(&dri2dpy->base, modes, nconf))
-            count++;
+
+         if (dri2_display_convert_config(&dri2dpy->base, modes, nconf)) {
+            int j;
+            /* look for duplicates */
+            for (j = 0; j < count; j++) {
+               if (memcmp(&dri2dpy->configs[j], nconf, sizeof(*nconf)) == 0)
+                  break;
+            }
+            if (j == count)
+               count++;
+         }
         modes = modes->next;
      }

--- a/src/gallium/winsys/r600/drm/Makefile
+++ b/src/gallium/winsys/r600/drm/Makefile
@@ -8,12 +8,12 @@ C_SOURCES = \
 	bof.c \
 	evergreen_hw_context.c \
 	radeon_bo.c \
-	radeon_bo_pb.c \
 	radeon_pciid.c \
 	r600.c \
 	r600_bo.c \
 	r600_drm.c \
-	r600_hw_context.c
+	r600_hw_context.c \
+	r600_bomgr.c

 LIBRARY_INCLUDES = -I$(TOP)/src/gallium/drivers/r600 \
 		   $(shell pkg-config libdrm --cflags-only-I)
--- a/src/gallium/winsys/r600/drm/SConscript
+++ b/src/gallium/winsys/r600/drm/SConscript
@@ -6,12 +6,12 @@ r600_sources = [
    'bof.c',
    'evergreen_hw_context.c',
    'radeon_bo.c',
-    'radeon_bo_pb.c',
    'radeon_pciid.c',
    'r600.c',
    'r600_bo.c',
    'r600_drm.c',
    'r600_hw_context.c',
+    'r600_bomgr.c',
 ]

 env.ParseConfig('pkg-config --cflags libdrm_radeon')
--- a/src/gallium/winsys/r600/drm/evergreen_hw_context.c
+++ b/src/gallium/winsys/r600/drm/evergreen_hw_context.c
@@ -36,7 +36,6 @@
 #include "pipe/p_compiler.h"
 #include "util/u_inlines.h"
 #include "util/u_memory.h"
-#include <pipebuffer/pb_bufmgr.h>
 #include "r600_priv.h"

 #define GROUP_FORCE_NEW_BLOCK	0
--- a/src/gallium/winsys/r600/drm/r600.c
+++ b/src/gallium/winsys/r600/drm/r600.c
@@ -27,7 +27,6 @@
 #include "radeon_drm.h"
 #include "pipe/p_compiler.h"
 #include "util/u_inlines.h"
-#include <pipebuffer/pb_bufmgr.h>
 #include "r600_priv.h"

 enum radeon_family r600_get_family(struct radeon *r600)
@@ -80,58 +79,6 @@ struct radeon *r600_new(int fd, unsigned device)
 		r600_delete(r600);
 		return NULL;
 	}
-	switch (r600->family) {
-	case CHIP_R600:
-	case CHIP_RV610:
-	case CHIP_RV630:
-	case CHIP_RV670:
-	case CHIP_RV620:
-	case CHIP_RV635:
-	case CHIP_RS780:
-	case CHIP_RS880:
-	case CHIP_RV770:
-	case CHIP_RV730:
-	case CHIP_RV710:
-	case CHIP_RV740:
-	case CHIP_CEDAR:
-	case CHIP_REDWOOD:
-	case CHIP_JUNIPER:
-	case CHIP_CYPRESS:
-	case CHIP_HEMLOCK:
-	case CHIP_PALM:
-		break;
-	case CHIP_R100:
-	case CHIP_RV100:
-	case CHIP_RS100:
-	case CHIP_RV200:
-	case CHIP_RS200:
-	case CHIP_R200:
-	case CHIP_RV250:
-	case CHIP_RS300:
-	case CHIP_RV280:
-	case CHIP_R300:
-	case CHIP_R350:
-	case CHIP_RV350:
-	case CHIP_RV380:
-	case CHIP_R420:
-	case CHIP_R423:
-	case CHIP_RV410:
-	case CHIP_RS400:
-	case CHIP_RS480:
-	case CHIP_RS600:
-	case CHIP_RS690:
-	case CHIP_RS740:
-	case CHIP_RV515:
-	case CHIP_R520:
-	case CHIP_RV530:
-	case CHIP_RV560:
-	case CHIP_RV570:
-	case CHIP_R580:
-	default:
-		R600_ERR("unknown or unsupported chipset 0x%04X\n", r600->device);
-		break;
-	}
-
 	/* setup class */
 	switch (r600->family) {
 	case CHIP_R600:
@@ -156,6 +103,9 @@ struct radeon *r600_new(int fd, unsigned device)
 	case CHIP_CYPRESS:
 	case CHIP_HEMLOCK:
 	case CHIP_PALM:
+	case CHIP_BARTS:
+	case CHIP_TURKS:
+	case CHIP_CAICOS:
 		r600->chip_class = EVERGREEN;
 		break;
 	default:
--- a/src/gallium/winsys/r600/drm/r600_bo.c
+++ b/src/gallium/winsys/r600/drm/r600_bo.c
@@ -36,142 +36,153 @@ struct r600_bo *r600_bo(struct radeon *radeon,
 			unsigned size, unsigned alignment,
 			unsigned binding, unsigned usage)
 {
-	struct r600_bo *ws_bo = calloc(1, sizeof(struct r600_bo));
-	struct pb_desc desc;
-	struct pb_manager *man;
+	struct r600_bo *bo;
+	struct radeon_bo *rbo;

-	desc.alignment = alignment;
-	desc.usage = (PB_USAGE_CPU_READ_WRITE | PB_USAGE_GPU_READ_WRITE);
-	ws_bo->size = size;
+	if (binding & (PIPE_BIND_CONSTANT_BUFFER | PIPE_BIND_VERTEX_BUFFER | PIPE_BIND_INDEX_BUFFER)) {
+		bo = r600_bomgr_bo_create(radeon->bomgr, size, alignment, *radeon->cfence);
+		if (bo) {
+			return bo;
+		}
+	}

-	if (binding & (PIPE_BIND_CONSTANT_BUFFER | PIPE_BIND_VERTEX_BUFFER | PIPE_BIND_INDEX_BUFFER))
-		man = radeon->cman;
-	else
-		man = radeon->kman;
+	rbo = radeon_bo(radeon, 0, size, alignment);
+	if (rbo == NULL) {
+		return NULL;
+	}
+
+	bo = calloc(1, sizeof(struct r600_bo));
+	bo->size = size;
+	bo->alignment = alignment;
+	bo->bo = rbo;
+	if (binding & (PIPE_BIND_CONSTANT_BUFFER | PIPE_BIND_VERTEX_BUFFER | PIPE_BIND_INDEX_BUFFER)) {
+		r600_bomgr_bo_init(radeon->bomgr, bo);
+	}

 	/* Staging resources particpate in transfers and blits only
 	 * and are used for uploads and downloads from regular
 	 * resources.  We generate them internally for some transfers.
 	 */
 	if (usage == PIPE_USAGE_STAGING)
-                ws_bo->domains = RADEON_GEM_DOMAIN_CPU | RADEON_GEM_DOMAIN_GTT;
-        else
-                ws_bo->domains = (RADEON_GEM_DOMAIN_CPU |
-                                  RADEON_GEM_DOMAIN_GTT |
-                                  RADEON_GEM_DOMAIN_VRAM);
+		bo->domains = RADEON_GEM_DOMAIN_CPU | RADEON_GEM_DOMAIN_GTT;
+	else
+		bo->domains = (RADEON_GEM_DOMAIN_CPU |
+				RADEON_GEM_DOMAIN_GTT |
+				RADEON_GEM_DOMAIN_VRAM);

-
-	ws_bo->pb = man->create_buffer(man, size, &desc);
-	if (ws_bo->pb == NULL) {
-		free(ws_bo);
-		return NULL;
-	}
-
-	pipe_reference_init(&ws_bo->reference, 1);
-	return ws_bo;
+	pipe_reference_init(&bo->reference, 1);
+	return bo;
 }

 struct r600_bo *r600_bo_handle(struct radeon *radeon,
 			       unsigned handle, unsigned *array_mode)
 {
-	struct r600_bo *ws_bo = calloc(1, sizeof(struct r600_bo));
-	struct radeon_bo *bo;
+	struct r600_bo *bo = calloc(1, sizeof(struct r600_bo));
+	struct radeon_bo *rbo;

-	ws_bo->pb = radeon_bo_pb_create_buffer_from_handle(radeon->kman, handle);
-	if (!ws_bo->pb) {
-		free(ws_bo);
+	rbo = bo->bo = radeon_bo(radeon, handle, 0, 0);
+	if (rbo == NULL) {
+		free(bo);
 		return NULL;
 	}
-	bo = radeon_bo_pb_get_bo(ws_bo->pb);
-	ws_bo->size = bo->size;
-	ws_bo->domains = (RADEON_GEM_DOMAIN_CPU |
-			  RADEON_GEM_DOMAIN_GTT |
-			  RADEON_GEM_DOMAIN_VRAM);
+	bo->size = rbo->size;
+	bo->domains = (RADEON_GEM_DOMAIN_CPU |
+			RADEON_GEM_DOMAIN_GTT |
+			RADEON_GEM_DOMAIN_VRAM);

-	pipe_reference_init(&ws_bo->reference, 1);
+	pipe_reference_init(&bo->reference, 1);

-	radeon_bo_get_tiling_flags(radeon, bo, &ws_bo->tiling_flags,
-				   &ws_bo->kernel_pitch);
+	radeon_bo_get_tiling_flags(radeon, rbo, &bo->tiling_flags, &bo->kernel_pitch);
 	if (array_mode) {
-		if (ws_bo->tiling_flags) {
-			if (ws_bo->tiling_flags & RADEON_TILING_MICRO)
+		if (bo->tiling_flags) {
+			if (bo->tiling_flags & RADEON_TILING_MICRO)
 				*array_mode = V_0280A0_ARRAY_1D_TILED_THIN1;
-			if ((ws_bo->tiling_flags & (RADEON_TILING_MICRO | RADEON_TILING_MACRO)) ==
+			if ((bo->tiling_flags & (RADEON_TILING_MICRO | RADEON_TILING_MACRO)) ==
 			    (RADEON_TILING_MICRO | RADEON_TILING_MACRO))
 				*array_mode = V_0280A0_ARRAY_2D_TILED_THIN1;
 		} else {
 			*array_mode = 0;
 		}
 	}
-	return ws_bo;
+	return bo;
 }

 void *r600_bo_map(struct radeon *radeon, struct r600_bo *bo, unsigned usage, void *ctx)
 {
-	return pb_map(bo->pb, usage, ctx);
+	struct pipe_context *pctx = ctx;
+
+	if (usage & PB_USAGE_UNSYNCHRONIZED) {
+		radeon_bo_map(radeon, bo->bo);
+		return (uint8_t *) bo->bo->data + bo->offset;
+	}
+
+	if (p_atomic_read(&bo->bo->reference.count) > 1) {
+		if (usage & PB_USAGE_DONTBLOCK) {
+			return NULL;
+		}
+		if (ctx) {
+			pctx->flush(pctx, 0, NULL);
+		}
+	}
+
+	if (usage & PB_USAGE_DONTBLOCK) {
+		uint32_t domain;
+
+		if (radeon_bo_busy(radeon, bo->bo, &domain))
+			return NULL;
+		if (radeon_bo_map(radeon, bo->bo)) {
+			return NULL;
+		}
+		goto out;
+	}
+
+	radeon_bo_map(radeon, bo->bo);
+	if (radeon_bo_wait(radeon, bo->bo)) {
+		radeon_bo_unmap(radeon, bo->bo);
+		return NULL;
+	}
+
+out:
+	return (uint8_t *) bo->bo->data + bo->offset;
 }

 void r600_bo_unmap(struct radeon *radeon, struct r600_bo *bo)
 {
-	pb_unmap(bo->pb);
+	radeon_bo_unmap(radeon, bo->bo);
 }

-static void r600_bo_destroy(struct radeon *radeon, struct r600_bo *bo)
+void r600_bo_destroy(struct radeon *radeon, struct r600_bo *bo)
 {
-	if (bo->pb)
-		pb_reference(&bo->pb, NULL);
+	if (bo->manager_id) {
+		if (!r600_bomgr_bo_destroy(radeon->bomgr, bo)) {
+			/* destroy is delayed by buffer manager */
+			return;
+		}
+	}
+	radeon_bo_reference(radeon, &bo->bo, NULL);
 	free(bo);
 }

-void r600_bo_reference(struct radeon *radeon, struct r600_bo **dst,
-			    struct r600_bo *src)
+void r600_bo_reference(struct radeon *radeon, struct r600_bo **dst, struct r600_bo *src)
 {
 	struct r600_bo *old = *dst;
- 		
+
 	if (pipe_reference(&(*dst)->reference, &src->reference)) {
 		r600_bo_destroy(radeon, old);
 	}
 	*dst = src;
 }

-unsigned r600_bo_get_handle(struct r600_bo *pb_bo)
-{
-	struct radeon_bo *bo;
-
-	bo = radeon_bo_pb_get_bo(pb_bo->pb);
-	if (!bo)
-		return 0;
-
-	return bo->handle;
-}
-
-unsigned r600_bo_get_size(struct r600_bo *pb_bo)
-{
-	struct radeon_bo *bo;
-
-	bo = radeon_bo_pb_get_bo(pb_bo->pb);
-	if (!bo)
-		return 0;
-
-	return bo->size;
-}
-
-boolean r600_bo_get_winsys_handle(struct radeon *radeon, struct r600_bo *pb_bo,
+boolean r600_bo_get_winsys_handle(struct radeon *radeon, struct r600_bo *bo,
 				unsigned stride, struct winsys_handle *whandle)
 {
-	struct radeon_bo *bo;
-
-	bo = radeon_bo_pb_get_bo(pb_bo->pb);
-	if (!bo)
-		return FALSE;
-
 	whandle->stride = stride;
 	switch(whandle->type) {
 	case DRM_API_HANDLE_TYPE_KMS:
-		whandle->handle = r600_bo_get_handle(pb_bo);
+		whandle->handle = r600_bo_get_handle(bo);
 		break;
 	case DRM_API_HANDLE_TYPE_SHARED:
-		if (radeon_bo_get_name(radeon, bo, &whandle->handle))
+		if (radeon_bo_get_name(radeon, bo->bo, &whandle->handle))
 			return FALSE;
 		break;
 	default:
--- a/src/gallium/winsys/r600/drm/r600_bomgr.c
+++ b/src/gallium/winsys/r600/drm/r600_bomgr.c
@@ -0,0 +1,161 @@
+/*
+ * Copyright 2010 VMWare.
+ * Copyright 2010 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * on the rights to use, copy, modify, merge, publish, distribute, sub
+ * license, and/or sell copies of the Software, and to permit persons to whom
+ * the Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *      Jose Fonseca <jrfonseca-at-vmware-dot-com>
+ *      Thomas Hellström <thomas-at-vmware-dot-com>
+ *      Jerome Glisse <jglisse@redhat.com>
+ */
+#include <util/u_memory.h>
+#include <util/u_double_list.h>
+#include <util/u_time.h>
+#include <pipebuffer/pb_bufmgr.h>
+#include "r600_priv.h"
+
+static void r600_bomgr_timeout_flush(struct r600_bomgr *mgr)
+{
+	struct r600_bo *bo, *tmp;
+	int64_t now;
+
+	now = os_time_get();
+	LIST_FOR_EACH_ENTRY_SAFE(bo, tmp, &mgr->delayed, list) {
+		if(!os_time_timeout(bo->start, bo->end, now))
+			break;
+
+		mgr->num_delayed--;
+		bo->manager_id = 0;
+		LIST_DEL(&bo->list);
+		r600_bo_destroy(mgr->radeon, bo);
+	}
+}
+
+static INLINE int r600_bo_is_compat(struct r600_bomgr *mgr,
+					struct r600_bo *bo,
+					unsigned size,
+					unsigned alignment,
+					unsigned cfence)
+{
+	if(bo->size < size) {
+		return 0;
+	}
+
+	/* be lenient with size, but reject buffers at least twice as large */
+	if(bo->size >= 2*size) {
+		return 0;
+	}
+
+	if(!pb_check_alignment(alignment, bo->alignment)) {
+		return 0;
+	}
+
+	if (!fence_is_after(cfence, bo->fence)) {
+		return 0;
+	}
+
+	return 1;
+}
+
+struct r600_bo *r600_bomgr_bo_create(struct r600_bomgr *mgr,
+					unsigned size,
+					unsigned alignment,
+					unsigned cfence)
+{
+	struct r600_bo *bo, *tmp;
+	int64_t now;
+
+
+	pipe_mutex_lock(mgr->mutex);
+
+	now = os_time_get();
+	LIST_FOR_EACH_ENTRY_SAFE(bo, tmp, &mgr->delayed, list) {
+		if(r600_bo_is_compat(mgr, bo, size, alignment, cfence)) {
+			LIST_DEL(&bo->list);
+			--mgr->num_delayed;
+			r600_bomgr_timeout_flush(mgr);
+			pipe_mutex_unlock(mgr->mutex);
+			LIST_INITHEAD(&bo->list);
+			pipe_reference_init(&bo->reference, 1);
+			return bo;
+		}
+
+		if(os_time_timeout(bo->start, bo->end, now)) {
+			mgr->num_delayed--;
+			bo->manager_id = 0;
+			LIST_DEL(&bo->list);
+			r600_bo_destroy(mgr->radeon, bo);
+		}
+	}
+
+	pipe_mutex_unlock(mgr->mutex);
+	return NULL;
+}
+
+void r600_bomgr_bo_init(struct r600_bomgr *mgr, struct r600_bo *bo)
+{
+	LIST_INITHEAD(&bo->list);
+	bo->manager_id = 1;
+}
+
+bool r600_bomgr_bo_destroy(struct r600_bomgr *mgr, struct r600_bo *bo)
+{
+	bo->start = os_time_get();
+	bo->end = bo->start + mgr->usecs;
+	pipe_mutex_lock(mgr->mutex);
+	LIST_ADDTAIL(&bo->list, &mgr->delayed);
+	++mgr->num_delayed;
+	pipe_mutex_unlock(mgr->mutex);
+	return FALSE;
+}
+
+void r600_bomgr_destroy(struct r600_bomgr *mgr)
+{
+	struct r600_bo *bo, *tmp;
+
+	pipe_mutex_lock(mgr->mutex);
+	LIST_FOR_EACH_ENTRY_SAFE(bo, tmp, &mgr->delayed, list) {
+		mgr->num_delayed--;
+		bo->manager_id = 0;
+		LIST_DEL(&bo->list);
+		r600_bo_destroy(mgr->radeon, bo);
+	}
+	pipe_mutex_unlock(mgr->mutex);
+
+	FREE(mgr);
+}
+
+struct r600_bomgr *r600_bomgr_create(struct radeon *radeon, unsigned usecs)
+{
+	struct r600_bomgr *mgr;
+
+	mgr = CALLOC_STRUCT(r600_bomgr);
+	if (mgr == NULL)
+		return NULL;
+
+	mgr->radeon = radeon;
+	mgr->usecs = usecs;
+	LIST_INITHEAD(&mgr->delayed);
+	mgr->num_delayed = 0;
+	pipe_mutex_init(mgr->mutex);
+
+	return mgr;
+}
--- a/src/gallium/winsys/r600/drm/r600_drm.c
+++ b/src/gallium/winsys/r600/drm/r600_drm.c
@@ -30,7 +30,6 @@
 #include <sys/ioctl.h>
 #include "util/u_inlines.h"
 #include "util/u_debug.h"
-#include <pipebuffer/pb_bufmgr.h>
 #include "r600.h"
 #include "r600_priv.h"
 #include "r600_drm_public.h"
@@ -135,59 +134,6 @@ static struct radeon *radeon_new(int fd, unsigned device)
 		fprintf(stderr, "Unknown chipset 0x%04X\n", radeon->device);
 		return radeon_decref(radeon);
 	}
-	switch (radeon->family) {
-	case CHIP_R600:
-	case CHIP_RV610:
-	case CHIP_RV630:
-	case CHIP_RV670:
-	case CHIP_RV620:
-	case CHIP_RV635:
-	case CHIP_RS780:
-	case CHIP_RS880:
-	case CHIP_RV770:
-	case CHIP_RV730:
-	case CHIP_RV710:
-	case CHIP_RV740:
-	case CHIP_CEDAR:
-	case CHIP_REDWOOD:
-	case CHIP_JUNIPER:
-	case CHIP_CYPRESS:
-	case CHIP_HEMLOCK:
-	case CHIP_PALM:
-		break;
-	case CHIP_R100:
-	case CHIP_RV100:
-	case CHIP_RS100:
-	case CHIP_RV200:
-	case CHIP_RS200:
-	case CHIP_R200:
-	case CHIP_RV250:
-	case CHIP_RS300:
-	case CHIP_RV280:
-	case CHIP_R300:
-	case CHIP_R350:
-	case CHIP_RV350:
-	case CHIP_RV380:
-	case CHIP_R420:
-	case CHIP_R423:
-	case CHIP_RV410:
-	case CHIP_RS400:
-	case CHIP_RS480:
-	case CHIP_RS600:
-	case CHIP_RS690:
-	case CHIP_RS740:
-	case CHIP_RV515:
-	case CHIP_R520:
-	case CHIP_RV530:
-	case CHIP_RV560:
-	case CHIP_RV570:
-	case CHIP_R580:
-	default:
-		fprintf(stderr, "%s unknown or unsupported chipset 0x%04X\n",
-			__func__, radeon->device);
-		break;
-	}
-
 	/* setup class */
 	switch (radeon->family) {
 	case CHIP_R600:
@@ -216,6 +162,9 @@ static struct radeon *radeon_new(int fd, unsigned device)
 	case CHIP_CYPRESS:
 	case CHIP_HEMLOCK:
 	case CHIP_PALM:
+	case CHIP_BARTS:
+	case CHIP_TURKS:
+	case CHIP_CAICOS:
 		radeon->chip_class = EVERGREEN;
 		/* set default group bytes, overridden by tiling info ioctl */
 		radeon->tiling_info.group_bytes = 512;
@@ -230,12 +179,10 @@ static struct radeon *radeon_new(int fd, unsigned device)
 		if (radeon_drm_get_tiling(radeon))
 			return NULL;
 	}
-	radeon->kman = radeon_bo_pbmgr_create(radeon);
-	if (!radeon->kman)
-		return NULL;
-	radeon->cman = pb_cache_manager_create(radeon->kman, 1000000);
-	if (!radeon->cman)
+	radeon->bomgr = r600_bomgr_create(radeon, 1000000);
+	if (radeon->bomgr == NULL) {
 		return NULL;
+	}
 	return radeon;
 }

@@ -252,11 +199,8 @@ struct radeon *radeon_decref(struct radeon *radeon)
 		return NULL;
 	}

-	if (radeon->cman)
-		radeon->cman->destroy(radeon->cman);
-
-	if (radeon->kman)
-		radeon->kman->destroy(radeon->kman);
+	if (radeon->bomgr)
+		r600_bomgr_destroy(radeon->bomgr);

 	if (radeon->fd >= 0)
 		drmClose(radeon->fd);
--- a/src/gallium/winsys/r600/drm/r600_hw_context.c
+++ b/src/gallium/winsys/r600/drm/r600_hw_context.c
@@ -28,16 +28,15 @@
 #include <string.h>
 #include <stdlib.h>
 #include <assert.h>
-#include "xf86drm.h"
-#include "r600.h"
-#include "r600d.h"
-#include "radeon_drm.h"
-#include "bof.h"
-#include "pipe/p_compiler.h"
-#include "util/u_inlines.h"
-#include "util/u_memory.h"
+#include <pipe/p_compiler.h>
+#include <util/u_inlines.h>
+#include <util/u_memory.h>
 #include <pipebuffer/pb_bufmgr.h>
+#include "xf86drm.h"
+#include "radeon_drm.h"
 #include "r600_priv.h"
+#include "bof.h"
+#include "r600d.h"

 #define GROUP_FORCE_NEW_BLOCK	0

@@ -50,6 +49,7 @@ int r600_context_init_fence(struct r600_context *ctx)
 	}
 	ctx->cfence = r600_bo_map(ctx->radeon, ctx->fence_bo, PB_USAGE_UNSYNCHRONIZED, NULL);
 	*ctx->cfence = 0;
+	ctx->radeon->cfence = ctx->cfence;
 	LIST_INITHEAD(&ctx->fenced_bo);
 	return 0;
 }
@@ -814,6 +814,7 @@ void r600_context_bo_reloc(struct r600_context *ctx, u32 *pm4, struct r600_bo *r
 	ctx->reloc[ctx->creloc].write_domain = rbo->domains & (RADEON_GEM_DOMAIN_GTT | RADEON_GEM_DOMAIN_VRAM);
 	ctx->reloc[ctx->creloc].flags = 0;
 	radeon_bo_reference(ctx->radeon, &ctx->bo[ctx->creloc], bo);
+	rbo->fence = ctx->fence;
 	ctx->creloc++;
 	/* set PKT3 to point to proper reloc */
 	*pm4 = bo->reloc_id;
@@ -836,6 +837,7 @@ void r600_context_pipe_state_set(struct r600_context *ctx, struct r600_pipe_stat
 			/* find relocation */
 			id = block->pm4_bo_index[id];
 			r600_bo_reference(ctx->radeon, &block->reloc[id].bo, state->regs[i].bo);
+			state->regs[i].bo->fence = ctx->fence;
 		}
 		if (!(block->status & R600_BLOCK_STATUS_DIRTY)) {
 			block->status |= R600_BLOCK_STATUS_ENABLED;
@@ -875,10 +877,13 @@ static inline void r600_context_pipe_state_set_resource(struct r600_context *ctx
 		 */
 		r600_bo_reference(ctx->radeon, &block->reloc[1].bo, state->regs[0].bo);
 		r600_bo_reference(ctx->radeon, &block->reloc[2].bo, state->regs[0].bo);
+		state->regs[0].bo->fence = ctx->fence;
 	} else {
 		/* TEXTURE RESOURCE */
 		r600_bo_reference(ctx->radeon, &block->reloc[1].bo, state->regs[2].bo);
 		r600_bo_reference(ctx->radeon, &block->reloc[2].bo, state->regs[3].bo);
+		state->regs[2].bo->fence = ctx->fence;
+		state->regs[3].bo->fence = ctx->fence;
 	}
 	if (!(block->status & R600_BLOCK_STATUS_DIRTY)) {
 		block->status |= R600_BLOCK_STATUS_ENABLED;
--- a/src/gallium/winsys/r600/drm/r600_priv.h
+++ b/src/gallium/winsys/r600/drm/r600_priv.h
@@ -30,24 +30,24 @@
 #include <stdint.h>
 #include <stdlib.h>
 #include <assert.h>
-#include <pipebuffer/pb_bufmgr.h>
-#include "util/u_double_list.h"
+#include <util/u_double_list.h>
+#include <util/u_inlines.h>
+#include <os/os_thread.h>
 #include "r600.h"

+struct r600_bomgr;
+
 struct radeon {
 	int				fd;
 	int				refcount;
 	unsigned			device;
 	unsigned			family;
 	enum chip_class			chip_class;
-	struct pb_manager *kman; /* kernel bo manager */
-	struct pb_manager *cman; /* cached bo manager */
-	struct r600_tiling_info tiling_info;
+	struct r600_tiling_info		tiling_info;
+	struct r600_bomgr		*bomgr;
+	unsigned			*cfence;
 };

-struct radeon *r600_new(int fd, unsigned device);
-void r600_delete(struct radeon *r600);
-
 struct r600_reg {
 	unsigned			opcode;
 	unsigned			offset_base;
@@ -75,25 +75,49 @@ struct radeon_bo {

 struct r600_bo {
 	struct pipe_reference		reference;
-	struct pb_buffer		*pb;
 	unsigned			size;
 	unsigned			tiling_flags;
 	unsigned			kernel_pitch;
 	unsigned			domains;
+	struct radeon_bo		*bo;
+	unsigned			fence;
+	/* manager data */
+	struct list_head		list;
+	unsigned			manager_id;
+	unsigned			alignment;
+	unsigned			offset;
+	int64_t				start;
+	int64_t				end;
 };

+struct r600_bomgr {
+	struct radeon			*radeon;
+	unsigned			usecs;
+	pipe_mutex			mutex;
+	struct list_head		delayed;
+	unsigned			num_delayed;
+};

-/* radeon_pciid.c */
+/*
+ * r600.c
+ */
+struct radeon *r600_new(int fd, unsigned device);
+void r600_delete(struct radeon *r600);
+
+/*
+ * radeon_pciid.c
+ */
 unsigned radeon_family_from_device(unsigned device);

-/* radeon_bo.c */
+/*
+ * radeon_bo.c
+ */
 struct radeon_bo *radeon_bo(struct radeon *radeon, unsigned handle,
 			    unsigned size, unsigned alignment);
 void radeon_bo_reference(struct radeon *radeon, struct radeon_bo **dst,
 			 struct radeon_bo *src);
 int radeon_bo_wait(struct radeon *radeon, struct radeon_bo *bo);
 int radeon_bo_busy(struct radeon *radeon, struct radeon_bo *bo, uint32_t *domain);
-void radeon_bo_pbmgr_flush_maps(struct pb_manager *_mgr);
 int radeon_bo_fencelist(struct radeon *radeon, struct radeon_bo **bolist, uint32_t num_bo);
 int radeon_bo_get_tiling_flags(struct radeon *radeon,
 			       struct radeon_bo *bo,
@@ -103,13 +127,9 @@ int radeon_bo_get_name(struct radeon *radeon,
 		       struct radeon_bo *bo,
 		       uint32_t *name);

-/* radeon_bo_pb.c */
-struct radeon_bo *radeon_bo_pb_get_bo(struct pb_buffer *_buf);
-struct pb_manager *radeon_bo_pbmgr_create(struct radeon *radeon);
-struct pb_buffer *radeon_bo_pb_create_buffer_from_handle(struct pb_manager *_mgr,
-							 uint32_t handle);
-
-/* r600_hw_context.c */
+/*
+ * r600_hw_context.c
+ */
 int r600_context_init_fence(struct r600_context *ctx);
 void r600_context_bo_reloc(struct r600_context *ctx, u32 *pm4, struct r600_bo *rbo);
 void r600_context_bo_flush(struct r600_context *ctx, unsigned flush_flags,
@@ -117,14 +137,27 @@ void r600_context_bo_flush(struct r600_context *ctx, unsigned flush_flags,
 struct r600_bo *r600_context_reg_bo(struct r600_context *ctx, unsigned offset);
 int r600_context_add_block(struct r600_context *ctx, const struct r600_reg *reg, unsigned nreg);

-/* r600_bo.c */
-unsigned r600_bo_get_handle(struct r600_bo *bo);
-unsigned r600_bo_get_size(struct r600_bo *bo);
-static INLINE struct radeon_bo *r600_bo_get_bo(struct r600_bo *bo)
-{
-	return radeon_bo_pb_get_bo(bo->pb);
-}
+/*
+ * r600_bo.c
+ */
+void r600_bo_destroy(struct radeon *radeon, struct r600_bo *bo);

+/*
+ * r600_bomgr.c
+ */
+struct r600_bomgr *r600_bomgr_create(struct radeon *radeon, unsigned usecs);
+void r600_bomgr_destroy(struct r600_bomgr *mgr);
+bool r600_bomgr_bo_destroy(struct r600_bomgr *mgr, struct r600_bo *bo);
+void r600_bomgr_bo_init(struct r600_bomgr *mgr, struct r600_bo *bo);
+struct r600_bo *r600_bomgr_bo_create(struct r600_bomgr *mgr,
+					unsigned size,
+					unsigned alignment,
+					unsigned cfence);
+
+
+/*
+ * helpers
+ */
 #define CTX_RANGE_ID(ctx, offset) (((offset) >> (ctx)->hash_shift) & 255)
 #define CTX_BLOCK_ID(ctx, offset) ((offset) & ((1 << (ctx)->hash_shift) - 1))

@@ -172,6 +205,9 @@ static inline void r600_context_block_emit_dirty(struct r600_context *ctx, struc
 	LIST_DELINIT(&block->list);
 }

+/*
+ * radeon_bo.c
+ */
 static inline int radeon_bo_map(struct radeon *radeon, struct radeon_bo *bo)
 {
 	bo->map_count++;
@@ -184,4 +220,35 @@ static inline void radeon_bo_unmap(struct radeon *radeon, struct radeon_bo *bo)
 	assert(bo->map_count >= 0);
 }

+/*
+ * r600_bo
+ */
+static inline struct radeon_bo *r600_bo_get_bo(struct r600_bo *bo)
+{
+	return bo->bo;
+}
+
+static inline unsigned r600_bo_get_handle(struct r600_bo *bo)
+{
+	return bo->bo->handle;
+}
+
+static inline unsigned r600_bo_get_size(struct r600_bo *bo)
+{
+	return bo->size;
+}
+
+/*
+ * fence
+ */
+static inline bool fence_is_after(unsigned fence, unsigned ofence)
+{
+	/* handle wrap around */
+	if (fence < 0x80000000 && ofence > 0x80000000)
+		return TRUE;
+	if (fence > ofence)
+		return TRUE;
+	return FALSE;
+}
+
 #endif
--- a/src/gallium/winsys/r600/drm/radeon_bo_pb.c
+++ b/src/gallium/winsys/r600/drm/radeon_bo_pb.c
@@ -1,260 +0,0 @@
-/*
- * Copyright 2010 Dave Airlie
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * on the rights to use, copy, modify, merge, publish, distribute, sub
- * license, and/or sell copies of the Software, and to permit persons to whom
- * the Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
- * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
- * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
- * USE OR OTHER DEALINGS IN THE SOFTWARE.
- *
- * Authors:
- *      Dave Airlie
- */
-#include <util/u_inlines.h>
-#include <util/u_memory.h>
-#include <util/u_double_list.h>
-#include <pipebuffer/pb_buffer.h>
-#include <pipebuffer/pb_bufmgr.h>
-#include "r600_priv.h"
-
-struct radeon_bo_pb {
-	struct pb_buffer b;
-	struct radeon_bo *bo;
-
-	struct radeon_bo_pbmgr *mgr;
-};
-
-extern const struct pb_vtbl radeon_bo_pb_vtbl;
-
-static INLINE struct radeon_bo_pb *radeon_bo_pb(struct pb_buffer *buf)
-{
-	assert(buf);
-	assert(buf->vtbl == &radeon_bo_pb_vtbl);
-	return (struct radeon_bo_pb *)buf;
-}
-
-struct radeon_bo_pbmgr {
-	struct pb_manager b;
-	struct radeon *radeon;
-};
-
-static INLINE struct radeon_bo_pbmgr *radeon_bo_pbmgr(struct pb_manager *mgr)
-{
-	assert(mgr);
-	return (struct radeon_bo_pbmgr *)mgr;
-}
-
-static void radeon_bo_pb_destroy(struct pb_buffer *_buf)
-{
-	struct radeon_bo_pb *buf = radeon_bo_pb(_buf);
-
-	/* If this buffer is on the list of buffers to unmap,
-	 * do the unmapping now.
-	 */
-	radeon_bo_unmap(buf->mgr->radeon, buf->bo);
-	radeon_bo_reference(buf->mgr->radeon, &buf->bo, NULL);
-	FREE(buf);
-}
-
-static void *
-radeon_bo_pb_map_internal(struct pb_buffer *_buf,
-			  unsigned flags, void *ctx)
-{
-	struct radeon_bo_pb *buf = radeon_bo_pb(_buf);
-	struct pipe_context *pctx = ctx;
-
-	if (flags & PB_USAGE_UNSYNCHRONIZED) {
-		if (radeon_bo_map(buf->mgr->radeon, buf->bo)) {
-			return NULL;
-		}
-		return buf->bo->data;
-	}
-
-	if (p_atomic_read(&buf->bo->reference.count) > 1) {
-		if (flags & PB_USAGE_DONTBLOCK) {
-			return NULL;
-		}
-		if (ctx) {
-			pctx->flush(pctx, 0, NULL);
-		}
-	}
-
-	if (flags & PB_USAGE_DONTBLOCK) {
-		uint32_t domain;
-		if (radeon_bo_busy(buf->mgr->radeon, buf->bo, &domain))
-			return NULL;
-		if (radeon_bo_map(buf->mgr->radeon, buf->bo)) {
-			return NULL;
-		}
-		goto out;
-	}
-
-	if (radeon_bo_map(buf->mgr->radeon, buf->bo)) {
-		return NULL;
-	}
-	if (radeon_bo_wait(buf->mgr->radeon, buf->bo)) {
-		radeon_bo_unmap(buf->mgr->radeon, buf->bo);
-		return NULL;
-	}
-out:
-	return buf->bo->data;
-}
-
-static void radeon_bo_pb_unmap_internal(struct pb_buffer *_buf)
-{
-}
-
-static void
-radeon_bo_pb_get_base_buffer(struct pb_buffer *buf,
-			     struct pb_buffer **base_buf,
-			     unsigned *offset)
-{
-	*base_buf = buf;
-	*offset = 0;
-}
-
-static enum pipe_error
-radeon_bo_pb_validate(struct pb_buffer *_buf, 
-		      struct pb_validate *vl,
-		      unsigned flags)
-{
-	/* Always pinned */
-	return PIPE_OK;
-}
-
-static void
-radeon_bo_pb_fence(struct pb_buffer *buf,
-		   struct pipe_fence_handle *fence)
-{
-}
-
-const struct pb_vtbl radeon_bo_pb_vtbl = {
-    radeon_bo_pb_destroy,
-    radeon_bo_pb_map_internal,
-    radeon_bo_pb_unmap_internal,
-    radeon_bo_pb_validate,
-    radeon_bo_pb_fence,
-    radeon_bo_pb_get_base_buffer,
-};
-
-struct pb_buffer *
-radeon_bo_pb_create_buffer_from_handle(struct pb_manager *_mgr,
-				       uint32_t handle)
-{
-	struct radeon_bo_pbmgr *mgr = radeon_bo_pbmgr(_mgr);
-	struct radeon *radeon = mgr->radeon;
-	struct radeon_bo_pb *bo;
-	struct radeon_bo *hw_bo;
-
-	hw_bo = radeon_bo(radeon, handle, 0, 0);
-	if (hw_bo == NULL)
-		return NULL;
-
-	bo = CALLOC_STRUCT(radeon_bo_pb);
-	if (!bo) {
-		radeon_bo_reference(radeon, &hw_bo, NULL);
-		return NULL;
-	}
-
-	pipe_reference_init(&bo->b.base.reference, 1);
-	bo->b.base.alignment = 0;
-	bo->b.base.usage = PB_USAGE_GPU_WRITE | PB_USAGE_GPU_READ;
-	bo->b.base.size = hw_bo->size;
-	bo->b.vtbl = &radeon_bo_pb_vtbl;
-	bo->mgr = mgr;
-
-	bo->bo = hw_bo;
-
-	return &bo->b;
-}
-
-static struct pb_buffer *
-radeon_bo_pb_create_buffer(struct pb_manager *_mgr,
-			   pb_size size,
-			   const struct pb_desc *desc)
-{
-	struct radeon_bo_pbmgr *mgr = radeon_bo_pbmgr(_mgr);
-	struct radeon *radeon = mgr->radeon;
-	struct radeon_bo_pb *bo;
-
-	bo = CALLOC_STRUCT(radeon_bo_pb);
-	if (!bo)
-		goto error1;
-
-	pipe_reference_init(&bo->b.base.reference, 1);
-	bo->b.base.alignment = desc->alignment;
-	bo->b.base.usage = desc->usage;
-	bo->b.base.size = size;
-	bo->b.vtbl = &radeon_bo_pb_vtbl;
-	bo->mgr = mgr;
-
-	bo->bo = radeon_bo(radeon, 0, size, desc->alignment);
-	if (bo->bo == NULL)
-		goto error2;
-	return &bo->b;
-
-error2:
-	FREE(bo);
-error1:
-	return NULL;
-}
-
-static void
-radeon_bo_pbmgr_flush(struct pb_manager *mgr)
-{
-    /* NOP */
-}
-
-static void
-radeon_bo_pbmgr_destroy(struct pb_manager *_mgr)
-{
-	struct radeon_bo_pbmgr *mgr = radeon_bo_pbmgr(_mgr);
-	FREE(mgr);
-}
-
-struct pb_manager *radeon_bo_pbmgr_create(struct radeon *radeon)
-{
-	struct radeon_bo_pbmgr *mgr;
-
-	mgr = CALLOC_STRUCT(radeon_bo_pbmgr);
-	if (!mgr)
-		return NULL;
-
-	mgr->b.destroy = radeon_bo_pbmgr_destroy;
-	mgr->b.create_buffer = radeon_bo_pb_create_buffer;
-	mgr->b.flush = radeon_bo_pbmgr_flush;
-
-	mgr->radeon = radeon;
-	return &mgr->b;
-}
-
-struct radeon_bo *radeon_bo_pb_get_bo(struct pb_buffer *_buf)
-{
-	struct radeon_bo_pb *buf;
-	if (_buf->vtbl == &radeon_bo_pb_vtbl) {
-		buf = radeon_bo_pb(_buf);
-		return buf->bo;
-	} else {
-		struct pb_buffer *base_buf;
-		pb_size offset;
-		pb_get_base_buffer(_buf, &base_buf, &offset);
-		if (base_buf->vtbl == &radeon_bo_pb_vtbl) {
-			buf = radeon_bo_pb(base_buf);
-			return buf->bo;
-		}
-	}
-	return NULL;
-}
--- a/src/gallium/winsys/r600/drm/radeon_pciid.c
+++ b/src/gallium/winsys/r600/drm/radeon_pciid.c
@@ -445,6 +445,42 @@ struct pci_id radeon_pci_id[] = {
 	{0x1002, 0x9803, CHIP_PALM},
 	{0x1002, 0x9804, CHIP_PALM},
 	{0x1002, 0x9805, CHIP_PALM},
+	{0x1002, 0x6720, CHIP_BARTS},
+	{0x1002, 0x6721, CHIP_BARTS},
+	{0x1002, 0x6722, CHIP_BARTS},
+	{0x1002, 0x6723, CHIP_BARTS},
+	{0x1002, 0x6724, CHIP_BARTS},
+	{0x1002, 0x6725, CHIP_BARTS},
+	{0x1002, 0x6726, CHIP_BARTS},
+	{0x1002, 0x6727, CHIP_BARTS},
+	{0x1002, 0x6728, CHIP_BARTS},
+	{0x1002, 0x6729, CHIP_BARTS},
+	{0x1002, 0x6738, CHIP_BARTS},
+	{0x1002, 0x6739, CHIP_BARTS},
+	{0x1002, 0x6740, CHIP_TURKS},
+	{0x1002, 0x6741, CHIP_TURKS},
+	{0x1002, 0x6742, CHIP_TURKS},
+	{0x1002, 0x6743, CHIP_TURKS},
+	{0x1002, 0x6744, CHIP_TURKS},
+	{0x1002, 0x6745, CHIP_TURKS},
+	{0x1002, 0x6746, CHIP_TURKS},
+	{0x1002, 0x6747, CHIP_TURKS},
+	{0x1002, 0x6748, CHIP_TURKS},
+	{0x1002, 0x6749, CHIP_TURKS},
+	{0x1002, 0x6750, CHIP_TURKS},
+	{0x1002, 0x6758, CHIP_TURKS},
+	{0x1002, 0x6759, CHIP_TURKS},
+	{0x1002, 0x6760, CHIP_CAICOS},
+	{0x1002, 0x6761, CHIP_CAICOS},
+	{0x1002, 0x6762, CHIP_CAICOS},
+	{0x1002, 0x6763, CHIP_CAICOS},
+	{0x1002, 0x6764, CHIP_CAICOS},
+	{0x1002, 0x6765, CHIP_CAICOS},
+	{0x1002, 0x6766, CHIP_CAICOS},
+	{0x1002, 0x6767, CHIP_CAICOS},
+	{0x1002, 0x6768, CHIP_CAICOS},
+	{0x1002, 0x6770, CHIP_CAICOS},
+	{0x1002, 0x6779, CHIP_CAICOS},
 	{0, 0},
 };

--- a/src/glsl/Makefile
+++ b/src/glsl/Makefile
@@ -16,6 +16,7 @@ GLCPP_SOURCES = \
 	glcpp/glcpp.c

 C_SOURCES = \
+	strtod.c \
 	$(LIBGLCPP_SOURCES)

 CXX_SOURCES = \
--- a/src/glsl/SConscript
+++ b/src/glsl/SConscript
@@ -76,6 +76,7 @@ sources = [
    'opt_swizzle_swizzle.cpp',
    'opt_tree_grafting.cpp',
    's_expression.cpp',
+    'strtod.c',
 ]

 glsl = env.ConvenienceLibrary(
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -1567,18 +1567,38 @@ ast_expression::hir(exec_list *instructions,
 	 }
      }

-      /* From section 4.1.7 of the GLSL 1.30 spec:
+      /* From page 23 (29 of the PDF) of the GLSL 1.30 spec:
+       *
       *    "Samplers aggregated into arrays within a shader (using square
       *    brackets [ ]) can only be indexed with integral constant
       *    expressions [...]."
+       *
+       * This restriction was added in GLSL 1.30.  Shaders using earlier versions
+       * of the language should not be rejected by the compiler front-end for
+       * using this construct.  This allows useful things such as using a loop
+       * counter as the index to an array of samplers.  If the loop is unrolled,
+       * the code should compile correctly.  Instead, emit a warning.
       */
      if (array->type->is_array() &&
          array->type->element_type()->is_sampler() &&
          const_index == NULL) {

-         _mesa_glsl_error(&loc, state, "sampler arrays can only be indexed "
-                          "with constant expressions");
-         error_emitted = true;
+	 if (state->language_version == 100) {
+	    _mesa_glsl_warning(&loc, state,
+			       "indexing sampler arrays with non-constant "
+			       "expressions is optional in GLSL ES 1.00");
+	 } else if (state->language_version < 130) {
+	    _mesa_glsl_warning(&loc, state,
+			       "indexing sampler arrays with non-constant "
+			       "expressions is forbidden in GLSL 1.30 and "
+			       "later");
+	 } else {
+	    _mesa_glsl_error(&loc, state,
+			     "indexing sampler arrays with non-constant "
+			     "expressions is forbidden in GLSL 1.30 and "
+			     "later");
+	    error_emitted = true;
+	 }
      }

      if (error_emitted)
@@ -2242,6 +2262,17 @@ ast_declarator_list::hir(exec_list *instructions,
 	    if (this->type->qualifier.flags.q.constant)
 	       var->read_only = false;

+	    /* Never emit code to initialize a uniform.
+	     */
+	    const glsl_type *initializer_type;
+	    if (!this->type->qualifier.flags.q.uniform) {
+	       result = do_assignment(&initializer_instructions, state,
+				      lhs, rhs,
+				      this->get_location());
+	       initializer_type = result->type;
+	    } else
+	       initializer_type = rhs->type;
+
 	    /* If the declared variable is an unsized array, it must inherrit
 	     * its full type from the initializer.  A declaration such as
 	     *
@@ -2256,16 +2287,14 @@ ast_declarator_list::hir(exec_list *instructions,
 	     *
 	     * If the declared variable is not an array, the types must
 	     * already match exactly.  As a result, the type assignment
-	     * here can be done unconditionally.
+	     * here can be done unconditionally.  For non-uniforms the call
+	     * to do_assignment can change the type of the initializer (via
+	     * the implicit conversion rules).  For uniforms the initializer
+	     * must be a constant expression, and the type of that expression
+	     * was validated above.
 	     */
-	    var->type = rhs->type;
+	    var->type = initializer_type;

-	    /* Never emit code to initialize a uniform.
-	     */
-	    if (!this->type->qualifier.flags.q.uniform)
-	       result = do_assignment(&initializer_instructions, state,
-				      lhs, rhs,
-				      this->get_location());
 	    var->read_only = temp;
 	 }
      }
--- a/src/glsl/builtin_function.cpp
+++ b/src/glsl/builtin_function.cpp
@@ -3010,40 +3010,26 @@ static const char builtin_smoothstep[] =
   "       (declare (in) float edge1)\n"
   "       (declare (in) float x))\n"
   "     ((declare () float t)\n"
-   "\n"
   "      (assign (constant bool (1)) (x) (var_ref t)\n"
   "              (expression float max\n"
   "	                  (expression float min\n"
   "	                              (expression float / (expression float - (var_ref x) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
   "	                              (constant float (1.0)))\n"
   "	                  (constant float (0.0))))\n"
-   "      (return (expression float * (var_ref t) (expression float * (var_ref t) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (var_ref t))))))))\n"
-   "\n"
+   "      (return (expression float * (var_ref t) (expression float * (var_ref t) (expression float - (constant float (3.0)) (expression float * (constant float (2.0)) (var_ref t))))))))\n"
   "   (signature vec2\n"
   "     (parameters\n"
   "       (declare (in) float edge0)\n"
   "       (declare (in) float edge1)\n"
   "       (declare (in) vec2 x))\n"
   "     ((declare () vec2 t)\n"
-   "      (declare () vec2 retval)\n"
-   "\n"
-   "      (assign (constant bool (1)) (x) (var_ref t)\n"
-   "              (expression float max\n"
-   "	                  (expression float min\n"
-   "	                              (expression float / (expression float - (swiz x (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
+   "      (assign (constant bool (1)) (xy) (var_ref t)\n"
+   "              (expression vec2 max\n"
+   "	                  (expression vec2 min\n"
+   "	                              (expression vec2 / (expression vec2 - (var_ref x) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
   "	                              (constant float (1.0)))\n"
   "	                  (constant float (0.0))))\n"
-   "      (assign (constant bool (1)) (x) (var_ref retval) (expression float * (swiz x (var_ref t)) (expression float * (swiz x (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz x (var_ref t)))))))\n"
-   "\n"
-   "      (assign (constant bool (1)) (y) (var_ref t)\n"
-   "              (expression float max\n"
-   "	                  (expression float min\n"
-   "	                              (expression float / (expression float - (swiz y (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
-   "	                              (constant float (1.0)))\n"
-   "	                  (constant float (0.0))))\n"
-   "      (assign (constant bool (1)) (y) (var_ref retval) (expression float * (swiz y (var_ref t)) (expression float * (swiz y (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz y (var_ref t)))))))\n"
-   "      (return (var_ref retval))\n"
-   "      ))\n"
+   "      (return (expression vec2 * (var_ref t) (expression vec2 * (var_ref t) (expression vec2 - (constant float (3.0)) (expression vec2 * (constant float (2.0)) (var_ref t))))))))\n"
   "\n"
   "   (signature vec3\n"
   "     (parameters\n"
@@ -3051,33 +3037,13 @@ static const char builtin_smoothstep[] =
   "       (declare (in) float edge1)\n"
   "       (declare (in) vec3 x))\n"
   "     ((declare () vec3 t)\n"
-   "      (declare () vec3 retval)\n"
-   "\n"
-   "      (assign (constant bool (1)) (x) (var_ref t)\n"
-   "              (expression float max\n"
-   "	                  (expression float min\n"
-   "	                              (expression float / (expression float - (swiz x (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
+   "      (assign (constant bool (1)) (xyz) (var_ref t)\n"
+   "              (expression vec3 max\n"
+   "	                  (expression vec3 min\n"
+   "	                              (expression vec3 / (expression vec3 - (var_ref x) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
   "	                              (constant float (1.0)))\n"
   "	                  (constant float (0.0))))\n"
-   "      (assign (constant bool (1)) (x) (var_ref retval) (expression float * (swiz x (var_ref t)) (expression float * (swiz x (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz x (var_ref t)))))))\n"
-   "\n"
-   "      (assign (constant bool (1)) (y) (var_ref t)\n"
-   "              (expression float max\n"
-   "	                  (expression float min\n"
-   "	                              (expression float / (expression float - (swiz y (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
-   "	                              (constant float (1.0)))\n"
-   "	                  (constant float (0.0))))\n"
-   "      (assign (constant bool (1)) (y) (var_ref retval) (expression float * (swiz y (var_ref t)) (expression float * (swiz y (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz y (var_ref t)))))))\n"
-   "\n"
-   "      (assign (constant bool (1)) (z) (var_ref t)\n"
-   "              (expression float max\n"
-   "	                  (expression float min\n"
-   "	                              (expression float / (expression float - (swiz z (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
-   "	                              (constant float (1.0)))\n"
-   "	                  (constant float (0.0))))\n"
-   "      (assign (constant bool (1)) (z) (var_ref retval) (expression float * (swiz z (var_ref t)) (expression float * (swiz z (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz z (var_ref t)))))))\n"
-   "      (return (var_ref retval))\n"
-   "      ))\n"
+   "      (return (expression vec3 * (var_ref t) (expression vec3 * (var_ref t) (expression vec3 - (constant float (3.0)) (expression vec3 * (constant float (2.0)) (var_ref t))))))))\n"
   "\n"
   "\n"
   "   (signature vec4\n"
@@ -3086,74 +3052,55 @@ static const char builtin_smoothstep[] =
   "       (declare (in) float edge1)\n"
   "       (declare (in) vec4 x))\n"
   "     ((declare () vec4 t)\n"
-   "      (declare () vec4 retval)\n"
-   "\n"
-   "      (assign (constant bool (1)) (x) (var_ref t)\n"
-   "              (expression float max\n"
-   "	                  (expression float min\n"
-   "	                              (expression float / (expression float - (swiz x (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
+   "      (assign (constant bool (1)) (xyzw) (var_ref t)\n"
+   "              (expression vec4 max\n"
+   "	                  (expression vec4 min\n"
+   "	                              (expression vec4 / (expression vec4 - (var_ref x) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
   "	                              (constant float (1.0)))\n"
   "	                  (constant float (0.0))))\n"
-   "      (assign (constant bool (1)) (x) (var_ref retval) (expression float * (swiz x (var_ref t)) (expression float * (swiz x (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz x (var_ref t)))))))\n"
-   "\n"
-   "      (assign (constant bool (1)) (y) (var_ref t)\n"
-   "              (expression float max\n"
-   "	                  (expression float min\n"
-   "	                              (expression float / (expression float - (swiz y (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
-   "	                              (constant float (1.0)))\n"
-   "	                  (constant float (0.0))))\n"
-   "      (assign (constant bool (1)) (y) (var_ref retval) (expression float * (swiz y (var_ref t)) (expression float * (swiz y (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz y (var_ref t)))))))\n"
-   "\n"
-   "      (assign (constant bool (1)) (z) (var_ref t)\n"
-   "              (expression float max\n"
-   "	                  (expression float min\n"
-   "	                              (expression float / (expression float - (swiz z (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
-   "	                              (constant float (1.0)))\n"
-   "	                  (constant float (0.0))))\n"
-   "      (assign (constant bool (1)) (z) (var_ref retval) (expression float * (swiz z (var_ref t)) (expression float * (swiz z (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz z (var_ref t)))))))\n"
-   "\n"
-   "      (assign (constant bool (1)) (w) (var_ref t)\n"
-   "              (expression float max\n"
-   "	                  (expression float min\n"
-   "	                              (expression float / (expression float - (swiz w (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))\n"
-   "	                              (constant float (1.0)))\n"
-   "	                  (constant float (0.0))))\n"
-   "      (assign (constant bool (1)) (w) (var_ref retval) (expression float * (swiz w (var_ref t)) (expression float * (swiz w (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz w (var_ref t)))))))\n"
-   "      (return (var_ref retval))\n"
-   "      ))\n"
+   "      (return (expression vec4 * (var_ref t) (expression vec4 * (var_ref t) (expression vec4 - (constant float (3.0)) (expression vec4 * (constant float (2.0)) (var_ref t))))))))\n"
   "\n"
   "   (signature vec2\n"
   "     (parameters\n"
   "       (declare (in) vec2 edge0)\n"
   "       (declare (in) vec2 edge1)\n"
   "       (declare (in) vec2 x))\n"
-   "     ((return (expression vec2 max\n"
+   "     ((declare () vec2 t)\n"
+   "      (assign (constant bool (1)) (xy) (var_ref t)\n"
+   "              (expression vec2 max\n"
   "                          (expression vec2 min\n"
   "                                      (expression vec2 / (expression vec2 - (var_ref x) (var_ref edge0)) (expression vec2 - (var_ref edge1) (var_ref edge0)))\n"
-   "                                      (constant vec2 (1.0 1.0)))\n"
-   "                          (constant vec2 (0.0 0.0))))))\n"
+   "                                      (constant float (1.0)))\n"
+   "                          (constant float (0.0))))\n"
+   "      (return (expression vec2 * (var_ref t) (expression vec2 * (var_ref t) (expression vec2 - (constant float (3.0)) (expression vec2 * (constant float (2.0)) (var_ref t))))))))\n"
   "\n"
   "   (signature vec3\n"
   "     (parameters\n"
   "       (declare (in) vec3 edge0)\n"
   "       (declare (in) vec3 edge1)\n"
   "       (declare (in) vec3 x))\n"
-   "     ((return (expression vec3 max\n"
+   "     ((declare () vec3 t)\n"
+   "      (assign (constant bool (1)) (xyz) (var_ref t)\n"
+   "              (expression vec3 max\n"
   "                          (expression vec3 min\n"
   "                                      (expression vec3 / (expression vec3 - (var_ref x) (var_ref edge0)) (expression vec3 - (var_ref edge1) (var_ref edge0)))\n"
-   "                                      (constant vec3 (1.0 1.0 1.0)))\n"
-   "                          (constant vec3 (0.0 0.0 0.0))))))\n"
+   "                                      (constant float (1.0)))\n"
+   "                          (constant float (0.0))))\n"
+   "      (return (expression vec3 * (var_ref t) (expression vec3 * (var_ref t) (expression vec3 - (constant float (3.0)) (expression vec3 * (constant float (2.0)) (var_ref t))))))))\n"
   "\n"
   "   (signature vec4\n"
   "     (parameters\n"
   "       (declare (in) vec4 edge0)\n"
   "       (declare (in) vec4 edge1)\n"
   "       (declare (in) vec4 x))\n"
-   "     ((return (expression vec4 max\n"
+   "     ((declare () vec4 t)\n"
+   "      (assign (constant bool (1)) (xyzw) (var_ref t)\n"
+   "              (expression vec4 max\n"
   "                          (expression vec4 min\n"
   "                                      (expression vec4 / (expression vec4 - (var_ref x) (var_ref edge0)) (expression vec4 - (var_ref edge1) (var_ref edge0)))\n"
-   "                                      (constant vec4 (1.0 1.0 1.0 1.0)))\n"
-   "                          (constant vec4 (0.0 0.0 0.0 0.0))))))\n"
+   "                                      (constant float (1.0)))\n"
+   "                          (constant float (0.0))))\n"
+   "      (return (expression vec4 * (var_ref t) (expression vec4 * (var_ref t) (expression vec4 - (constant float (3.0)) (expression vec4 * (constant float (2.0)) (var_ref t))))))))\n"
   "))\n"
   "\n"
   ""
--- a/src/glsl/builtins/ir/smoothstep
+++ b/src/glsl/builtins/ir/smoothstep
@@ -5,40 +5,26 @@
       (declare (in) float edge1)
       (declare (in) float x))
     ((declare () float t)
-
      (assign (constant bool (1)) (x) (var_ref t)
              (expression float max
 	                  (expression float min
 	                              (expression float / (expression float - (var_ref x) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
 	                              (constant float (1.0)))
 	                  (constant float (0.0))))
-      (return (expression float * (var_ref t) (expression float * (var_ref t) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (var_ref t))))))))
-
+      (return (expression float * (var_ref t) (expression float * (var_ref t) (expression float - (constant float (3.0)) (expression float * (constant float (2.0)) (var_ref t))))))))
   (signature vec2
     (parameters
       (declare (in) float edge0)
       (declare (in) float edge1)
       (declare (in) vec2 x))
     ((declare () vec2 t)
-      (declare () vec2 retval)
-
-      (assign (constant bool (1)) (x) (var_ref t)
-              (expression float max
-	                  (expression float min
-	                              (expression float / (expression float - (swiz x (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
+      (assign (constant bool (1)) (xy) (var_ref t)
+              (expression vec2 max
+	                  (expression vec2 min
+	                              (expression vec2 / (expression vec2 - (var_ref x) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
 	                              (constant float (1.0)))
 	                  (constant float (0.0))))
-      (assign (constant bool (1)) (x) (var_ref retval) (expression float * (swiz x (var_ref t)) (expression float * (swiz x (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz x (var_ref t)))))))
-
-      (assign (constant bool (1)) (y) (var_ref t)
-              (expression float max
-	                  (expression float min
-	                              (expression float / (expression float - (swiz y (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
-	                              (constant float (1.0)))
-	                  (constant float (0.0))))
-      (assign (constant bool (1)) (y) (var_ref retval) (expression float * (swiz y (var_ref t)) (expression float * (swiz y (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz y (var_ref t)))))))
-      (return (var_ref retval))
-      ))
+      (return (expression vec2 * (var_ref t) (expression vec2 * (var_ref t) (expression vec2 - (constant float (3.0)) (expression vec2 * (constant float (2.0)) (var_ref t))))))))

   (signature vec3
     (parameters
@@ -46,33 +32,13 @@
       (declare (in) float edge1)
       (declare (in) vec3 x))
     ((declare () vec3 t)
-      (declare () vec3 retval)
-
-      (assign (constant bool (1)) (x) (var_ref t)
-              (expression float max
-	                  (expression float min
-	                              (expression float / (expression float - (swiz x (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
+      (assign (constant bool (1)) (xyz) (var_ref t)
+              (expression vec3 max
+	                  (expression vec3 min
+	                              (expression vec3 / (expression vec3 - (var_ref x) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
 	                              (constant float (1.0)))
 	                  (constant float (0.0))))
-      (assign (constant bool (1)) (x) (var_ref retval) (expression float * (swiz x (var_ref t)) (expression float * (swiz x (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz x (var_ref t)))))))
-
-      (assign (constant bool (1)) (y) (var_ref t)
-              (expression float max
-	                  (expression float min
-	                              (expression float / (expression float - (swiz y (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
-	                              (constant float (1.0)))
-	                  (constant float (0.0))))
-      (assign (constant bool (1)) (y) (var_ref retval) (expression float * (swiz y (var_ref t)) (expression float * (swiz y (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz y (var_ref t)))))))
-
-      (assign (constant bool (1)) (z) (var_ref t)
-              (expression float max
-	                  (expression float min
-	                              (expression float / (expression float - (swiz z (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
-	                              (constant float (1.0)))
-	                  (constant float (0.0))))
-      (assign (constant bool (1)) (z) (var_ref retval) (expression float * (swiz z (var_ref t)) (expression float * (swiz z (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz z (var_ref t)))))))
-      (return (var_ref retval))
-      ))
+      (return (expression vec3 * (var_ref t) (expression vec3 * (var_ref t) (expression vec3 - (constant float (3.0)) (expression vec3 * (constant float (2.0)) (var_ref t))))))))


   (signature vec4
@@ -81,73 +47,54 @@
       (declare (in) float edge1)
       (declare (in) vec4 x))
     ((declare () vec4 t)
-      (declare () vec4 retval)
-
-      (assign (constant bool (1)) (x) (var_ref t)
-              (expression float max
-	                  (expression float min
-	                              (expression float / (expression float - (swiz x (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
+      (assign (constant bool (1)) (xyzw) (var_ref t)
+              (expression vec4 max
+	                  (expression vec4 min
+	                              (expression vec4 / (expression vec4 - (var_ref x) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
 	                              (constant float (1.0)))
 	                  (constant float (0.0))))
-      (assign (constant bool (1)) (x) (var_ref retval) (expression float * (swiz x (var_ref t)) (expression float * (swiz x (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz x (var_ref t)))))))
-
-      (assign (constant bool (1)) (y) (var_ref t)
-              (expression float max
-	                  (expression float min
-	                              (expression float / (expression float - (swiz y (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
-	                              (constant float (1.0)))
-	                  (constant float (0.0))))
-      (assign (constant bool (1)) (y) (var_ref retval) (expression float * (swiz y (var_ref t)) (expression float * (swiz y (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz y (var_ref t)))))))
-
-      (assign (constant bool (1)) (z) (var_ref t)
-              (expression float max
-	                  (expression float min
-	                              (expression float / (expression float - (swiz z (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
-	                              (constant float (1.0)))
-	                  (constant float (0.0))))
-      (assign (constant bool (1)) (z) (var_ref retval) (expression float * (swiz z (var_ref t)) (expression float * (swiz z (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz z (var_ref t)))))))
-
-      (assign (constant bool (1)) (w) (var_ref t)
-              (expression float max
-	                  (expression float min
-	                              (expression float / (expression float - (swiz w (var_ref x)) (var_ref edge0)) (expression float - (var_ref edge1) (var_ref edge0)))
-	                              (constant float (1.0)))
-	                  (constant float (0.0))))
-      (assign (constant bool (1)) (w) (var_ref retval) (expression float * (swiz w (var_ref t)) (expression float * (swiz w (var_ref t)) (expression float - (constant float (3.000000)) (expression float * (constant float (2.000000)) (swiz w (var_ref t)))))))
-      (return (var_ref retval))
-      ))
+      (return (expression vec4 * (var_ref t) (expression vec4 * (var_ref t) (expression vec4 - (constant float (3.0)) (expression vec4 * (constant float (2.0)) (var_ref t))))))))

   (signature vec2
     (parameters
       (declare (in) vec2 edge0)
       (declare (in) vec2 edge1)
       (declare (in) vec2 x))
-     ((return (expression vec2 max
+     ((declare () vec2 t)
+      (assign (constant bool (1)) (xy) (var_ref t)
+              (expression vec2 max
                          (expression vec2 min
                                      (expression vec2 / (expression vec2 - (var_ref x) (var_ref edge0)) (expression vec2 - (var_ref edge1) (var_ref edge0)))
-                                      (constant vec2 (1.0 1.0)))
-                          (constant vec2 (0.0 0.0))))))
+                                      (constant float (1.0)))
+                          (constant float (0.0))))
+      (return (expression vec2 * (var_ref t) (expression vec2 * (var_ref t) (expression vec2 - (constant float (3.0)) (expression vec2 * (constant float (2.0)) (var_ref t))))))))

   (signature vec3
     (parameters
       (declare (in) vec3 edge0)
       (declare (in) vec3 edge1)
       (declare (in) vec3 x))
-     ((return (expression vec3 max
+     ((declare () vec3 t)
+      (assign (constant bool (1)) (xyz) (var_ref t)
+              (expression vec3 max
                          (expression vec3 min
                                      (expression vec3 / (expression vec3 - (var_ref x) (var_ref edge0)) (expression vec3 - (var_ref edge1) (var_ref edge0)))
-                                      (constant vec3 (1.0 1.0 1.0)))
-                          (constant vec3 (0.0 0.0 0.0))))))
+                                      (constant float (1.0)))
+                          (constant float (0.0))))
+      (return (expression vec3 * (var_ref t) (expression vec3 * (var_ref t) (expression vec3 - (constant float (3.0)) (expression vec3 * (constant float (2.0)) (var_ref t))))))))

   (signature vec4
     (parameters
       (declare (in) vec4 edge0)
       (declare (in) vec4 edge1)
       (declare (in) vec4 x))
-     ((return (expression vec4 max
+     ((declare () vec4 t)
+      (assign (constant bool (1)) (xyzw) (var_ref t)
+              (expression vec4 max
                          (expression vec4 min
                                      (expression vec4 / (expression vec4 - (var_ref x) (var_ref edge0)) (expression vec4 - (var_ref edge1) (var_ref edge0)))
-                                      (constant vec4 (1.0 1.0 1.0 1.0)))
-                          (constant vec4 (0.0 0.0 0.0 0.0))))))
+                                      (constant float (1.0)))
+                          (constant float (0.0))))
+      (return (expression vec4 * (var_ref t) (expression vec4 * (var_ref t) (expression vec4 - (constant float (3.0)) (expression vec4 * (constant float (2.0)) (var_ref t))))))))
 ))

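Note on the smoothstep hunks above: the rewritten builtin computes a clamped ratio t once for the whole vector, then returns t * t * (3 - 2 * t). A scalar reference for that formula might look like the sketch below. The names clampf and smoothstep_ref are hypothetical and are not part of the Mesa sources.

```c
#include <assert.h>

/* Clamp v to the range [lo, hi]. */
static float clampf(float v, float lo, float hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Scalar reference (hypothetical helper, not Mesa code):
 *    t = clamp((x - edge0) / (edge1 - edge0), 0, 1)
 *    result = t * t * (3 - 2 * t)
 */
static float smoothstep_ref(float edge0, float edge1, float x)
{
   assert(edge0 != edge1);   /* the ratio is undefined otherwise */
   float t = clampf((x - edge0) / (edge1 - edge0), 0.0f, 1.0f);
   return t * t * (3.0f - 2.0f * t);
}
```

The IR expresses the clamp as max(min(ratio, 1.0), 0.0), which gives the same result as clampf here.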
--- a/src/glsl/glsl_lexer.cpp
+++ b/src/glsl/glsl_lexer.cpp
--- a/src/glsl/glsl_lexer.lpp
+++ b/src/glsl/glsl_lexer.lpp
@@ -22,6 +22,7 @@
 * DEALINGS IN THE SOFTWARE.
 */
 #include <ctype.h>
+#include "strtod.h"
 #include "ast.h"
 #include "glsl_parser_extras.h"
 #include "glsl_parser.h"
@@ -293,23 +294,23 @@ layout		{
 			}

 [0-9]+\.[0-9]+([eE][+-]?[0-9]+)?[fF]?	{
-			    yylval->real = strtod(yytext, NULL);
+			    yylval->real = glsl_strtod(yytext, NULL);
 			    return FLOATCONSTANT;
 			}
 \.[0-9]+([eE][+-]?[0-9]+)?[fF]?		{
-			    yylval->real = strtod(yytext, NULL);
+			    yylval->real = glsl_strtod(yytext, NULL);
 			    return FLOATCONSTANT;
 			}
 [0-9]+\.([eE][+-]?[0-9]+)?[fF]?		{
-			    yylval->real = strtod(yytext, NULL);
+			    yylval->real = glsl_strtod(yytext, NULL);
 			    return FLOATCONSTANT;
 			}
 [0-9]+[eE][+-]?[0-9]+[fF]?		{
-			    yylval->real = strtod(yytext, NULL);
+			    yylval->real = glsl_strtod(yytext, NULL);
 			    return FLOATCONSTANT;
 			}
 [0-9]+[fF]		{
-			    yylval->real = strtod(yytext, NULL);
+			    yylval->real = glsl_strtod(yytext, NULL);
 			    return FLOATCONSTANT;
 			}

--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -53,7 +53,7 @@ bool do_lower_jumps(exec_list *instructions, bool pull_out_jumps = true, bool lo
 bool do_lower_texture_projection(exec_list *instructions);
 bool do_if_simplification(exec_list *instructions);
 bool do_discard_simplification(exec_list *instructions);
-bool do_if_to_cond_assign(exec_list *instructions);
+bool lower_if_to_cond_assign(exec_list *instructions, unsigned max_depth = 0);
 bool do_mat_op_to_vec(exec_list *instructions);
 bool do_mod_to_fract(exec_list *instructions);
 bool do_noop_swizzle(exec_list *instructions);
--- a/src/glsl/ir_set_program_inouts.cpp
+++ b/src/glsl/ir_set_program_inouts.cpp
@@ -66,7 +66,7 @@ public:
 };

 static void
-mark(struct gl_program *prog, ir_variable *var, int index)
+mark(struct gl_program *prog, ir_variable *var, int offset, int len)
 {
   /* As of GLSL 1.20, varyings can only be floats, floating-point
    * vectors or matrices, or arrays of them.  For Mesa programs using
@@ -75,25 +75,12 @@ mark(struct gl_program *prog, ir_variable *var, int index)
    * something doing a more clever packing would use something other
    * than InputsRead/OutputsWritten.
    */
-   const glsl_type *element_type;
-   int element_size;

-   if (var->type->is_array())
-      element_type = var->type->fields.array;
-   else
-      element_type = var->type;
-
-   if (element_type->is_matrix())
-      element_size = element_type->matrix_columns;
-   else
-      element_size = 1;
-
-   index *= element_size;
-   for (int i = 0; i < element_size; i++) {
+   for (int i = 0; i < len; i++) {
      if (var->mode == ir_var_in)
-	 prog->InputsRead |= BITFIELD64_BIT(var->location + index + i);
+	 prog->InputsRead |= BITFIELD64_BIT(var->location + offset + i);
      else
-	 prog->OutputsWritten |= BITFIELD64_BIT(var->location + index + i);
+	 prog->OutputsWritten |= BITFIELD64_BIT(var->location + offset + i);
   }
 }

@@ -106,10 +93,11 @@ ir_set_program_inouts_visitor::visit(ir_dereference_variable *ir)

   if (ir->type->is_array()) {
      for (unsigned int i = 0; i < ir->type->length; i++) {
-	 mark(this->prog, ir->var, i);
+	 mark(this->prog, ir->var, i,
+	      ir->type->length * ir->type->fields.array->matrix_columns);
      }
   } else {
-      mark(this->prog, ir->var, 0);
+      mark(this->prog, ir->var, 0, ir->type->matrix_columns);
   }

   return visit_continue;
@@ -128,7 +116,14 @@ ir_set_program_inouts_visitor::visit_enter(ir_dereference_array *ir)
      var = (ir_variable *)hash_table_find(this->ht, deref_var->var);

   if (index && var) {
-      mark(this->prog, var, index->value.i[0]);
+      int width = 1;
+
+      if (deref_var->type->is_array() &&
+	  deref_var->type->fields.array->is_matrix()) {
+	 width = deref_var->type->fields.array->matrix_columns;
+      }
+
+      mark(this->prog, var, index->value.i[0] * width, width);
      return visit_continue_with_parent;
   }

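Note on the mark() change above: the function now takes an explicit slot offset and length, so the caller decides how many consecutive bits of InputsRead or OutputsWritten to set. The sketch below models that loop on a plain 64-bit mask. mark_slots is a hypothetical name, and the location and width values in the usage note are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Set `len` consecutive bits starting at bit (location + offset).
 * Hypothetical stand-in for the loop in mark(), not Mesa code.
 */
static uint64_t mark_slots(uint64_t mask, int location, int offset, int len)
{
   for (int i = 0; i < len; i++) {
      int bit = location + offset + i;
      assert(bit >= 0 && bit < 64);   /* must fit in a 64-bit bitfield */
      mask |= UINT64_C(1) << bit;
   }
   return mask;
}
```

For an array of mat3 at location 4, element 2 would be marked with offset 2 * 3 and length 3, which sets bits 10 through 12.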
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -487,14 +487,35 @@ cross_validate_outputs_to_inputs(struct gl_shader_program *prog,
 	 /* Check that the types match between stages.
 	  */
 	 if (input->type != output->type) {
-	    linker_error_printf(prog,
-				"%s shader output `%s' declared as "
-				"type `%s', but %s shader input declared "
-				"as type `%s'\n",
-				producer_stage, output->name,
-				output->type->name,
-				consumer_stage, input->type->name);
-	    return false;
+	    /* There is a bit of a special case for gl_TexCoord.  This
+	     * built-in is unsized by default.  Applications that index it
+	     * with a variable must redeclare it with a size.  There is some
+	     * language in the GLSL spec that implies the fragment shader
+	     * and vertex shader do not have to agree on this size.  Other
+	     * drivers behave this way, and one or two applications seem to
+	     * rely on it.
+	     *
+	     * Neither declaration needs to be modified here because the array
+	     * sizes are fixed later when update_array_sizes is called.
+	     *
+	     * From page 48 (page 54 of the PDF) of the GLSL 1.10 spec:
+	     *
+	     *     "Unlike user-defined varying variables, the built-in
+	     *     varying variables don't have a strict one-to-one
+	     *     correspondence between the vertex language and the
+	     *     fragment language."
+	     */
+	    if (!output->type->is_array()
+		|| (strncmp("gl_", output->name, 3) != 0)) {
+	       linker_error_printf(prog,
+				   "%s shader output `%s' declared as "
+				   "type `%s', but %s shader input declared "
+				   "as type `%s'\n",
+				   producer_stage, output->name,
+				   output->type->name,
+				   consumer_stage, input->type->name);
+	       return false;
+	    }
 	 }

 	 /* Check that all of the qualifiers match between stages.
--- a/src/glsl/loop_unroll.cpp
+++ b/src/glsl/loop_unroll.cpp
@@ -43,6 +43,14 @@ public:
 };


+static bool
+is_break(ir_instruction *ir)
+{
+   return ir != NULL && ir->ir_type == ir_type_loop_jump
+		     && ((ir_loop_jump *) ir)->is_break();
+}
+
+
 ir_visitor_status
 loop_unroll_visitor::visit_leave(ir_loop *ir)
 {
@@ -73,44 +81,74 @@ loop_unroll_visitor::visit_leave(ir_loop *ir)
   if (ls->num_loop_jumps > 1)
      return visit_continue;
   else if (ls->num_loop_jumps) {
-      /* recognize loops in the form produced by ir_lower_jumps */
-      ir_instruction *last_ir =
-	 ((ir_instruction*)ir->body_instructions.get_tail());
-
+      ir_instruction *last_ir = (ir_instruction *) ir->body_instructions.get_tail();
      assert(last_ir != NULL);

-      ir_if *last_if = last_ir->as_if();
-      if (last_if) {
-	 bool continue_from_then_branch;
+      if (is_break(last_ir)) {
+         /* If the only loop-jump is a break at the end of the loop, the loop
+          * will execute exactly once.  Remove the break, set the iteration
+          * count, and fall through to the normal unroller.
+          */
+         last_ir->remove();
+         iterations = 1;

-	 /* Determine which if-statement branch, if any, ends with a break.
-	  * The branch that did *not* have the break will get a temporary
-	  * continue inserted in each iteration of the loop unroll.
-	  *
-	  * Note that since ls->num_loop_jumps is <= 1, it is impossible for
-	  * both branches to end with a break.
-	  */
-	 ir_instruction *last =
-	    (ir_instruction *) last_if->then_instructions.get_tail();
+         this->progress = true;
+      } else {
+         ir_if *ir_if = NULL;
+         ir_instruction *break_ir = NULL;
+         bool continue_from_then_branch = false;

-	 if (last && last->ir_type == ir_type_loop_jump
-	     && ((ir_loop_jump*) last)->is_break()) {
-	    continue_from_then_branch = false;
-	 } else {
-	    last = (ir_instruction *) last_if->then_instructions.get_tail();
+         foreach_list(node, &ir->body_instructions) {
+            /* recognize loops in the form produced by ir_lower_jumps */
+            ir_instruction *cur_ir = (ir_instruction *) node;

-	    if (last && last->ir_type == ir_type_loop_jump
-		&& ((ir_loop_jump*) last)->is_break())
-	       continue_from_then_branch = true;
-	    else
-	       /* Bail out if neither if-statement branch ends with a break.
+            ir_if = cur_ir->as_if();
+            if (ir_if != NULL) {
+	       /* Determine which if-statement branch, if any, ends with a
+		* break.  The branch that did *not* have the break will get a
+		* temporary continue inserted in each iteration of the loop
+		* unroll.
+		*
+		* Note that since ls->num_loop_jumps is <= 1, it is impossible
+		* for both branches to end with a break.
 		*/
-	       return visit_continue;
-	 }
+               ir_instruction *ir_if_last =
+                  (ir_instruction *) ir_if->then_instructions.get_tail();

-	 /* Remove the break from the if-statement.
-	  */
-	 last->remove();
+               if (is_break(ir_if_last)) {
+                  continue_from_then_branch = false;
+                  break_ir = ir_if_last;
+                  break;
+               } else {
+                  ir_if_last =
+		     (ir_instruction *) ir_if->else_instructions.get_tail();
+
+                  if (is_break(ir_if_last)) {
+                     break_ir = ir_if_last;
+                     continue_from_then_branch = true;
+                     break;
+                  }
+               }
+            }
+         }
+
+         if (break_ir == NULL)
+            return visit_continue;
+
+         /* Move the instructions after the if into the continue branch. */
+         while (!ir_if->get_next()->is_tail_sentinel()) {
+            ir_instruction *move_ir = (ir_instruction *) ir_if->get_next();
+
+            move_ir->remove();
+            if (continue_from_then_branch)
+               ir_if->then_instructions.push_tail(move_ir);
+            else
+               ir_if->else_instructions.push_tail(move_ir);
+         }
+
+         /* Remove the break from the if-statement.
+          */
+         break_ir->remove();

         void *const mem_ctx = talloc_parent(ir);
         ir_instruction *ir_to_replace = ir;
@@ -121,8 +159,8 @@ loop_unroll_visitor::visit_leave(ir_loop *ir)
            copy_list.make_empty();
            clone_ir_list(mem_ctx, &copy_list, &ir->body_instructions);

-            last_if = ((ir_instruction*)copy_list.get_tail())->as_if();
-            assert(last_if);
+            ir_if = ((ir_instruction *) copy_list.get_tail())->as_if();
+            assert(ir_if != NULL);

            ir_to_replace->insert_before(&copy_list);
            ir_to_replace->remove();
@@ -132,7 +170,7 @@ loop_unroll_visitor::visit_leave(ir_loop *ir)
 	       new(mem_ctx) ir_loop_jump(ir_loop_jump::jump_continue);

            exec_list *const list = (continue_from_then_branch)
-	       ? &last_if->then_instructions : &last_if->else_instructions;
+               ? &ir_if->then_instructions : &ir_if->else_instructions;

            list->push_tail(ir_to_replace);
         }
@@ -141,18 +179,7 @@ loop_unroll_visitor::visit_leave(ir_loop *ir)

         this->progress = true;
         return visit_continue;
-      } else if (last_ir->ir_type == ir_type_loop_jump
-		 && ((ir_loop_jump *)last_ir)->is_break()) {
-	 /* If the only loop-jump is a break at the end of the loop, the loop
-	  * will execute exactly once.  Remove the break, set the iteration
-	  * count, and fall through to the normal unroller.
-	  */
-         last_ir->remove();
-	 iterations = 1;
-
-	 this->progress = true;
-      } else
-         return visit_continue;
+      }
   }

   void *const mem_ctx = talloc_parent(ir);
--- a/src/glsl/lower_if_to_cond_assign.cpp
+++ b/src/glsl/lower_if_to_cond_assign.cpp
@@ -24,12 +24,25 @@
 /**
 * \file lower_if_to_cond_assign.cpp
 *
- * This attempts to flatten all if statements to conditional
- * assignments for GPUs that don't do control flow.
+ * This attempts to flatten if-statements to conditional assignments for
+ * GPUs with limited or no flow control support.
 *
 * It can't handle other control flow being inside of its block, such
 * as calls or loops.  Hopefully loop unrolling and inlining will take
 * care of those.
+ *
+ * Drivers for GPUs with no control flow support should simply call
+ *
+ *    lower_if_to_cond_assign(instructions)
+ *
+ * to attempt to flatten all if-statements.
+ *
+ * Some GPUs (such as i965 prior to gen6) do support control flow, but have a
+ * maximum nesting depth N.  Drivers for such hardware can call
+ *
+ *    lower_if_to_cond_assign(instructions, N)
+ *
+ * to attempt to flatten any if-statements appearing at depth > N.
 */

 #include "glsl_types.h"
@@ -37,20 +50,25 @@

 class ir_if_to_cond_assign_visitor : public ir_hierarchical_visitor {
 public:
-   ir_if_to_cond_assign_visitor()
+   ir_if_to_cond_assign_visitor(unsigned max_depth)
   {
      this->progress = false;
+      this->max_depth = max_depth;
+      this->depth = 0;
   }

+   ir_visitor_status visit_enter(ir_if *);
   ir_visitor_status visit_leave(ir_if *);

   bool progress;
+   unsigned max_depth;
+   unsigned depth;
 };

 bool
-do_if_to_cond_assign(exec_list *instructions)
+lower_if_to_cond_assign(exec_list *instructions, unsigned max_depth)
 {
-   ir_if_to_cond_assign_visitor v;
+   ir_if_to_cond_assign_visitor v(max_depth);

   visit_list_elements(&v, instructions);

@@ -119,9 +137,23 @@ move_block_to_cond_assign(void *mem_ctx,
   }
 }

+ir_visitor_status
+ir_if_to_cond_assign_visitor::visit_enter(ir_if *ir)
+{
+   (void) ir;
+   this->depth++;
+   return visit_continue;
+}
+
 ir_visitor_status
 ir_if_to_cond_assign_visitor::visit_leave(ir_if *ir)
 {
+   /* Only flatten when beyond the GPU's maximum supported nesting depth.
+    * Decrement on every exit so depth stays balanced with visit_enter.
+    */
+   if (this->depth-- <= this->max_depth)
+      return visit_continue;
+
   bool found_control_flow = false;
   ir_variable *cond_var;
   ir_assignment *assign;
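Note on the max_depth parameter above: the visitor increments a depth counter when it enters an if-statement and flattens only those ifs whose depth exceeds max_depth. The sketch below models that bookkeeping for a single chain of nested ifs, assuming every leave balances its enter. count_flattened is a hypothetical helper, not part of the pass.

```c
#include <assert.h>

/* Model a chain of `nesting` if-statements, each nested in the previous
 * one, and count how many would be flattened for a given max_depth.
 * Hypothetical helper for illustration only.
 */
static unsigned count_flattened(unsigned nesting, unsigned max_depth)
{
   unsigned depth = 0;
   unsigned flattened = 0;

   /* visit_enter: one increment per if on the way down. */
   for (unsigned i = 0; i < nesting; i++)
      depth++;

   /* visit_leave: innermost if first; flatten only when depth > max_depth. */
   for (unsigned i = 0; i < nesting; i++) {
      if (depth-- > max_depth)
         flattened++;
   }

   assert(depth == 0);   /* enters and leaves must balance */
   return flattened;
}
```

With max_depth = 0 every if in the chain is flattened, which matches the default argument in ir_optimization.h.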
--- a/src/glsl/lower_jumps.cpp
+++ b/src/glsl/lower_jumps.cpp
@@ -512,7 +512,11 @@ lower_continue:
      if(this->loop.may_set_return_flag) {
         assert(this->function.return_flag);
         ir_if* return_if = new(ir) ir_if(new(ir) ir_dereference_variable(this->function.return_flag));
-         return_if->then_instructions.push_tail(new(ir) ir_loop_jump(saved_loop.loop ? ir_loop_jump::jump_break : ir_loop_jump::jump_continue));
+         saved_loop.may_set_return_flag = true;
+         if(saved_loop.loop)
+            return_if->then_instructions.push_tail(new(ir) ir_loop_jump(ir_loop_jump::jump_break));
+         else
+            move_outer_block_inside(ir, &return_if->else_instructions);
         ir->insert_after(return_if);
      }

--- a/src/glsl/s_expression.cpp
+++ b/src/glsl/s_expression.cpp
@@ -62,7 +62,7 @@ read_atom(void *ctx, const char *& src)

   // Check if the atom is a number.
   char *float_end = NULL;
-   double f = strtod(src, &float_end);
+   double f = glsl_strtod(src, &float_end);
   if (float_end != src) {
      char *int_end = NULL;
      int i = strtol(src, &int_end, 10);
--- a/src/glsl/s_expression.h
+++ b/src/glsl/s_expression.h
@@ -26,6 +26,7 @@
 #ifndef S_EXPRESSION_H
 #define S_EXPRESSION_H

+#include "strtod.h"
 #include "list.h"

 #define SX_AS_(t,x) ((x) && ((s_expression*) x)->is_##t()) ? ((s_##t*) (x)) \
--- a/src/glsl/strtod.c
+++ b/src/glsl/strtod.c
@@ -0,0 +1,56 @@
+/*
+ * Copyright 2010 VMware, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#include <stdlib.h>
+
+#ifdef _GNU_SOURCE
+#include <locale.h>
+#ifdef __APPLE__
+#include <xlocale.h>
+#endif
+#endif
+
+#include "strtod.h"
+
+
+
+/**
+ * Wrapper around strtod which uses the "C" locale so the decimal
+ * point is always '.'
+ */
+double
+glsl_strtod(const char *s, char **end)
+{
+#if defined(_GNU_SOURCE) && !defined(__CYGWIN__) && !defined(__FreeBSD__)
+   static locale_t loc = NULL;
+   if (!loc) {
+      loc = newlocale(LC_CTYPE_MASK, "C", NULL);
+   }
+   return strtod_l(s, end, loc);
+#else
+   return strtod(s, end);
+#endif
+}
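Note on glsl_strtod above: the wrapper exists so that float literals parse with '.' as the decimal point regardless of the process locale. A minimal standalone sketch of the same idea follows, assuming a glibc-style platform where newlocale and strtod_l are available under _GNU_SOURCE. parse_c_locale is a hypothetical name, and LC_ALL_MASK is this sketch's choice rather than the mask used in the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: glibc exposes newlocale/strtod_l this way */
#endif
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Parse a double using the "C" locale, independent of setlocale().
 * Hypothetical helper; the locale object is created once and reused.
 */
static double parse_c_locale(const char *s, char **end)
{
   static locale_t c_loc = (locale_t) 0;

   if (!c_loc)
      c_loc = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
   assert(c_loc != (locale_t) 0);   /* creating the "C" locale should not fail */

   return strtod_l(s, end, c_loc);
}
```

On platforms without strtod_l, the patch falls back to plain strtod, which this sketch does not attempt to reproduce.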
--- a/src/glsl/strtod.h
+++ b/src/glsl/strtod.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright 2010 VMware, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#ifndef STRTOD_H
+#define STRTOD_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+extern double
+glsl_strtod(const char *s, char **end);
+
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -266,13 +266,16 @@ struct gen_mipmap_state
   GLuint FBO;
 };

-
+#define MAX_META_OPS_DEPTH      2
 /**
 * All per-context meta state.
 */
 struct gl_meta_state
 {
-   struct save_state Save;    /**< state saved during meta-ops */
+   /** Stack of state saved during meta-ops */
+   struct save_state Save[MAX_META_OPS_DEPTH];
+   /** Save stack depth */
+   GLuint SaveStackDepth;

   struct temp_texture TempTex;

@@ -324,8 +327,13 @@ _mesa_meta_free(struct gl_context *ctx)
 static void
 _mesa_meta_begin(struct gl_context *ctx, GLbitfield state)
 {
-   struct save_state *save = &ctx->Meta->Save;
+   struct save_state *save;

+   /* hope MAX_META_OPS_DEPTH is large enough */
+   assert(ctx->Meta->SaveStackDepth < MAX_META_OPS_DEPTH);
+
+   save = &ctx->Meta->Save[ctx->Meta->SaveStackDepth++];
+   memset(save, 0, sizeof(*save));
   save->SavedState = state;

   if (state & META_ALPHA_TEST) {
@@ -575,7 +583,7 @@ _mesa_meta_begin(struct gl_context *ctx, GLbitfield state)
 static void
 _mesa_meta_end(struct gl_context *ctx)
 {
-   struct save_state *save = &ctx->Meta->Save;
+   struct save_state *save = &ctx->Meta->Save[--ctx->Meta->SaveStackDepth];
   const GLbitfield state = save->SavedState;

   if (state & META_ALPHA_TEST) {
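Note on the Save[] change above: turning the single save_state into a small stack lets one meta operation run inside another, with each begin pushing a slot and each end popping it. The sketch below shows that push/pop discipline on a reduced structure. All names here (MAX_DEPTH, struct saved, stack_begin, stack_end) are hypothetical stand-ins for the Mesa types.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 2   /* plays the role of MAX_META_OPS_DEPTH */

struct saved {
   unsigned flags;    /* stands in for save_state::SavedState */
};

struct save_stack {
   struct saved slots[MAX_DEPTH];
   unsigned depth;    /* number of slots currently in use */
};

/* Claim the next slot, clear it, and record which state was saved. */
static struct saved *stack_begin(struct save_stack *st, unsigned flags)
{
   assert(st->depth < MAX_DEPTH);
   struct saved *s = &st->slots[st->depth++];
   memset(s, 0, sizeof(*s));
   s->flags = flags;
   return s;
}

/* Release the most recent slot and return the flags stored in it. */
static unsigned stack_end(struct save_stack *st)
{
   assert(st->depth > 0);
   return st->slots[--st->depth].flags;
}
```

Nested calls unwind in reverse order, so the inner operation restores its state before the outer one does.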
--- a/src/mesa/drivers/dri/common/spantmp2.h
+++ b/src/mesa/drivers/dri/common/spantmp2.h
@@ -48,6 +48,15 @@
 #define HW_WRITE_CLIPLOOP()	HW_CLIPLOOP()
 #endif

+#ifdef SPANTMP_MESA_FMT
+#define SPANTMP_PIXEL_FMT GL_NONE
+#define SPANTMP_PIXEL_TYPE GL_NONE
+#endif
+
+#ifndef SPANTMP_MESA_FMT
+#define SPANTMP_MESA_FMT MESA_FORMAT_COUNT
+#endif
+
 #if (SPANTMP_PIXEL_FMT == GL_RGB)  && (SPANTMP_PIXEL_TYPE == GL_UNSIGNED_SHORT_5_6_5)

 /**
@@ -445,6 +454,118 @@
 	rgba[3] = p;							\
     } while (0)

+#elif (SPANTMP_MESA_FMT == MESA_FORMAT_R8)
+
+#ifndef GET_VALUE
+#ifndef GET_PTR
+#define GET_PTR(_x, _y) (     buf + (_x) + (_y) * pitch)
+#endif
+
+#define GET_VALUE(_x, _y) *(volatile GLubyte *)(GET_PTR(_x, _y))
+#define PUT_VALUE(_x, _y, _v) *(volatile GLubyte *)(GET_PTR(_x, _y)) = (_v)
+#endif /* GET_VALUE */
+
+# define INIT_MONO_PIXEL(p, color)                       \
+     p = color[0]
+
+# define WRITE_RGBA(_x, _y, r, g, b, a)                                 \
+   PUT_VALUE(_x, _y, r)
+
+#define WRITE_PIXEL(_x, _y, p) PUT_VALUE(_x, _y, p)
+
+#define READ_RGBA( rgba, _x, _y )				        \
+     do {								\
+        GLubyte p = GET_VALUE(_x, _y);					\
+	rgba[0] = p;							\
+	rgba[1] = 0;							\
+	rgba[2] = 0;							\
+	rgba[3] = 0;							\
+     } while (0)
+
+#elif (SPANTMP_MESA_FMT == MESA_FORMAT_RG88)
+
+#ifndef GET_VALUE
+#ifndef GET_PTR
+#define GET_PTR(_x, _y) (     buf + (_x) * 2 + (_y) * pitch)
+#endif
+
+#define GET_VALUE(_x, _y) *(volatile GLushort *)(GET_PTR(_x, _y))
+#define PUT_VALUE(_x, _y, _v) *(volatile GLushort *)(GET_PTR(_x, _y)) = (_v)
+#endif /* GET_VALUE */
+
+# define INIT_MONO_PIXEL(p, color)                       \
+   PACK_COLOR_8888(color[0], color[1], 0, 0)
+
+# define WRITE_RGBA(_x, _y, r, g, b, a)                                 \
+   PUT_VALUE(_x, _y, r)
+
+#define WRITE_PIXEL(_x, _y, p) PUT_VALUE(_x, _y, p)
+
+#define READ_RGBA( rgba, _x, _y )				        \
+     do {								\
+        GLushort p = GET_VALUE(_x, _y);					\
+	rgba[0] = p & 0xff;						\
+	rgba[1] = (p >> 8) & 0xff;					\
+	rgba[2] = 0;							\
+	rgba[3] = 0;							\
+     } while (0)
+
+#elif (SPANTMP_MESA_FMT == MESA_FORMAT_R16)
+
+#ifndef GET_VALUE
+#ifndef GET_PTR
+#define GET_PTR(_x, _y) (     buf + (_x) * 2 + (_y) * pitch)
+#endif
+
+#define GET_VALUE(_x, _y) *(volatile GLushort *)(GET_PTR(_x, _y))
+#define PUT_VALUE(_x, _y, _v) *(volatile GLushort *)(GET_PTR(_x, _y)) = (_v)
+#endif /* GET_VALUE */
+
+# define INIT_MONO_PIXEL(p, color)                       \
+     p = color[0]
+
+# define WRITE_RGBA(_x, _y, r, g, b, a)                                 \
+   PUT_VALUE(_x, _y, r)
+
+#define WRITE_PIXEL(_x, _y, p) PUT_VALUE(_x, _y, p)
+
+#define READ_RGBA( rgba, _x, _y )				        \
+     do {								\
+        GLushort p = GET_VALUE(_x, _y);					\
+	rgba[0] = p;							\
+	rgba[1] = 0;							\
+	rgba[2] = 0;							\
+	rgba[3] = 0;							\
+     } while (0)
+
+#elif (SPANTMP_MESA_FMT == MESA_FORMAT_RG1616)
+
+#ifndef GET_VALUE
+#ifndef GET_PTR
+#define GET_PTR(_x, _y) (     buf + (_x) * 4 + (_y) * pitch)
+#endif
+
+#define GET_VALUE(_x, _y) *(volatile GLuint *)(GET_PTR(_x, _y))
+#define PUT_VALUE(_x, _y, _v) *(volatile GLuint *)(GET_PTR(_x, _y)) = (_v)
+#endif /* GET_VALUE */
+
+# define INIT_MONO_PIXEL(p, color)                       \
+   ((color[1] << 16) | (color[0]))
+
+# define WRITE_RGBA(_x, _y, r, g, b, a)                                 \
+   PUT_VALUE(_x, _y, r)
+
+#define WRITE_PIXEL(_x, _y, p) PUT_VALUE(_x, _y, p)
+
+#define READ_RGBA( rgba, _x, _y )				        \
+     do {								\
+        GLuint p = GET_VALUE(_x, _y);					\
+	rgba[0] = p & 0xffff;						\
+	rgba[1] = (p >> 16) & 0xffff;					\
+	rgba[2] = 0;							\
+	rgba[3] = 0;							\
+     } while (0)
+
 #else
 #error SPANTMP_PIXEL_FMT must be set to a valid value!
 #endif
@@ -914,3 +1035,4 @@ static void TAG(InitPointers)(struct gl_renderbuffer *rb)
 #undef GET_PTR
 #undef SPANTMP_PIXEL_FMT
 #undef SPANTMP_PIXEL_TYPE
+#undef SPANTMP_MESA_FMT
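Note on the MESA_FORMAT_RG88 read path above: READ_RGBA takes red from the low byte and green from the high byte of a 16-bit pixel, and reads blue and alpha back as 0. The sketch below shows that unpacking together with the matching pack step. pack_rg88 and unpack_rg88 are hypothetical helpers, and the pack layout is inferred from the read macro rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack red into the low byte and green into the high byte
 * (layout inferred from READ_RGBA; hypothetical helper).
 */
static uint16_t pack_rg88(uint8_t r, uint8_t g)
{
   return (uint16_t) (r | ((unsigned) g << 8));
}

/* Inverse of pack_rg88(); blue and alpha read back as 0. */
static void unpack_rg88(uint16_t p, uint8_t rgba[4])
{
   assert(rgba != NULL);
   rgba[0] = p & 0xff;
   rgba[1] = (p >> 8) & 0xff;
   rgba[2] = 0;
   rgba[3] = 0;
}
```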
--- a/src/mesa/drivers/dri/i915/i830_vtbl.c
+++ b/src/mesa/drivers/dri/i915/i830_vtbl.c
@@ -364,7 +364,7 @@ i830_emit_invarient_state(struct intel_context *intel)


 #define emit( intel, state, size )			\
-   intel_batchbuffer_data(intel->batch, state, size )
+   intel_batchbuffer_data(intel->batch, state, size, false)

 static GLuint
 get_dirty(struct i830_hw_state *state)
@@ -429,7 +429,8 @@ i830_emit_state(struct intel_context *intel)
    * batchbuffer fills up.
    */
   intel_batchbuffer_require_space(intel->batch,
-				   get_state_size(state) + INTEL_PRIM_EMIT_SIZE);
+				   get_state_size(state) + INTEL_PRIM_EMIT_SIZE,
+				   false);
   count = 0;
 again:
   aper_count = 0;
--- a/src/mesa/drivers/dri/i915/i915_vtbl.c
+++ b/src/mesa/drivers/dri/i915/i915_vtbl.c
@@ -217,7 +217,7 @@ i915_emit_invarient_state(struct intel_context *intel)


 #define emit(intel, state, size )		     \
-   intel_batchbuffer_data(intel->batch, state, size)
+   intel_batchbuffer_data(intel->batch, state, size, false)

 static GLuint
 get_dirty(struct i915_hw_state *state)
@@ -300,7 +300,8 @@ i915_emit_state(struct intel_context *intel)
    * batchbuffer fills up.
    */
   intel_batchbuffer_require_space(intel->batch,
-				   get_state_size(state) + INTEL_PRIM_EMIT_SIZE);
+				   get_state_size(state) + INTEL_PRIM_EMIT_SIZE,
+				   false);
   count = 0;
 again:
   aper_count = 0;
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -151,6 +151,13 @@ GLboolean brwCreateContext( int api,
      MIN2(ctx->Const.FragmentProgram.MaxNativeParameters,
 	   ctx->Const.FragmentProgram.MaxEnvParams);

+   /* Gen6 converts quads to polygons at the beginning of the 3D pipeline,
+      but we're not sure how it orders the vertices, which affects the
+      provoking vertex decision.  Always use the last-vertex convention
+      for quad primitives, which works as expected for now. */
+   if (intel->gen == 6)
+       ctx->Const.QuadsFollowProvokingVertexConvention = GL_FALSE;
+
   if (intel->is_g4x || intel->gen >= 5) {
      brw->CMD_VF_STATISTICS = CMD_VF_STATISTICS_GM45;
      brw->CMD_PIPELINE_SELECT = CMD_PIPELINE_SELECT_GM45;
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -906,6 +906,8 @@
 # define GEN6_VS_VECTOR_MASK_ENABLE			(1 << 30)
 # define GEN6_VS_SAMPLER_COUNT_SHIFT			27
 # define GEN6_VS_BINDING_TABLE_ENTRY_COUNT_SHIFT	18
+# define GEN6_VS_FLOATING_POINT_MODE_IEEE_754		(0 << 16)
+# define GEN6_VS_FLOATING_POINT_MODE_ALT		(1 << 16)
 /* DW4 */
 # define GEN6_VS_DISPATCH_START_GRF_SHIFT		20
 # define GEN6_VS_URB_READ_LENGTH_SHIFT			11
@@ -1029,6 +1031,13 @@
 # define ATTRIBUTE_0_CONST_SOURCE_SHIFT			9
 # define ATTRIBUTE_0_SWIZZLE_SHIFT			6
 # define ATTRIBUTE_0_SOURCE_SHIFT			0
+
+# define ATTRIBUTE_SWIZZLE_INPUTATTR                    0
+# define ATTRIBUTE_SWIZZLE_INPUTATTR_FACING             1
+# define ATTRIBUTE_SWIZZLE_INPUTATTR_W                  2
+# define ATTRIBUTE_SWIZZLE_INPUTATTR_FACING_W           3
+# define ATTRIBUTE_SWIZZLE_SHIFT                        6
+
 /* DW16: Point sprite texture coordinate enables */
 /* DW17: Constant interpolation enables */
 /* DW18: attr 0-7 wrap shortest enables */
@@ -1041,6 +1050,8 @@
 # define GEN6_WM_VECTOR_MASK_ENABLE			(1 << 30)
 # define GEN6_WM_SAMPLER_COUNT_SHIFT			27
 # define GEN6_WM_BINDING_TABLE_ENTRY_COUNT_SHIFT	18
+# define GEN6_WM_FLOATING_POINT_MODE_IEEE_754		(0 << 16)
+# define GEN6_WM_FLOATING_POINT_MODE_ALT		(1 << 16)
 /* DW3: scratch space */
 /* DW4 */
 # define GEN6_WM_STATISTICS_ENABLE			(1 << 31)
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -973,7 +973,7 @@ int brw_disasm (FILE *file, struct brw_instruction *inst, int gen)
 			inst->bits3.dp_render_cache.send_commit_msg,
 			inst->bits3.dp_render_cache.msg_length,
 			inst->bits3.dp_render_cache.response_length);
-	    } else if (gen >= 5) {
+	    } else if (gen >= 5 /* FINISHME: || is_g4x */) {
 		format (file, " (%d, %d, %d)",
 			inst->bits3.dp_read_gen5.binding_table_index,
 			inst->bits3.dp_read_gen5.msg_control,
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -159,7 +159,7 @@ static void brw_emit_prim(struct brw_context *brw,
   }
   if (prim_packet.verts_per_instance) {
      intel_batchbuffer_data( brw->intel.batch, &prim_packet,
-			      sizeof(prim_packet));
+			      sizeof(prim_packet), false);
   }
   if (intel->always_flush_cache) {
      intel_batchbuffer_emit_mi_flush(intel->batch);
@@ -351,7 +351,8 @@ static GLboolean brw_try_draw_prims( struct gl_context *ctx,
       * an upper bound of how much we might emit in a single
       * brw_try_draw_prims().
       */
-      intel_batchbuffer_require_space(intel->batch, intel->batch->size / 4);
+      intel_batchbuffer_require_space(intel->batch, intel->batch->size / 4,
+				      false);

      hw_prim = brw_set_prim(brw, &prim[i]);

--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -861,7 +861,8 @@ void brw_fb_WRITE(struct brw_compile *p,
 		   GLuint binding_table_index,
 		   GLuint msg_length,
 		   GLuint response_length,
-		   GLboolean eot);
+		   GLboolean eot,
+		   GLboolean header_present);

 void brw_SAMPLE(struct brw_compile *p,
 		struct brw_reg dest,
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -536,6 +536,16 @@ brw_set_dp_read_message(struct brw_context *brw,
       insn->bits3.dp_read_gen5.end_of_thread = 0;
       insn->bits2.send_gen5.sfid = BRW_MESSAGE_TARGET_DATAPORT_READ;
       insn->bits2.send_gen5.end_of_thread = 0;
+   } else if (intel->is_g4x) {
+       insn->bits3.dp_read_g4x.binding_table_index = binding_table_index; /*0:7*/
+       insn->bits3.dp_read_g4x.msg_control = msg_control;  /*8:10*/
+       insn->bits3.dp_read_g4x.msg_type = msg_type;  /*11:13*/
+       insn->bits3.dp_read_g4x.target_cache = target_cache;  /*14:15*/
+       insn->bits3.dp_read_g4x.response_length = response_length;  /*16:19*/
+       insn->bits3.dp_read_g4x.msg_length = msg_length;  /*20:23*/
+       insn->bits3.dp_read_g4x.msg_target = BRW_MESSAGE_TARGET_DATAPORT_READ; /*24:27*/
+       insn->bits3.dp_read_g4x.pad1 = 0;
+       insn->bits3.dp_read_g4x.end_of_thread = 0;
   } else {
       insn->bits3.dp_read.binding_table_index = binding_table_index; /*0:7*/
       insn->bits3.dp_read.msg_control = msg_control;  /*8:11*/
@@ -1708,29 +1718,22 @@ void brw_dp_READ_4_vs(struct brw_compile *p,
                      GLuint location,
                      GLuint bind_table_index)
 {
+   struct intel_context *intel = &p->brw->intel;
   struct brw_instruction *insn;
   GLuint msg_reg_nr = 1;
-   struct brw_reg b;

-   /*
-   printf("vs const read msg, location %u, msg_reg_nr %d\n",
-          location, msg_reg_nr);
-   */
+   if (intel->gen >= 6)
+      location /= 16;

   /* Setup MRF[1] with location/offset into const buffer */
   brw_push_insn_state(p);
+   brw_set_access_mode(p, BRW_ALIGN_1);
   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
   brw_set_mask_control(p, BRW_MASK_DISABLE);
   brw_set_predicate_control(p, BRW_PREDICATE_NONE);
-
-   /* XXX I think we're setting all the dwords of MRF[1] to 'location'.
-    * when the docs say only dword[2] should be set.  Hmmm.  But it works.
-    */
-   b = brw_message_reg(msg_reg_nr);
-   b = retype(b, BRW_REGISTER_TYPE_UD);
-   /*b = get_element_ud(b, 2);*/
-   brw_MOV(p, b, brw_imm_ud(location));
-
+   brw_MOV(p, retype(brw_vec1_reg(BRW_MESSAGE_REGISTER_FILE, msg_reg_nr, 2),
+		     BRW_REGISTER_TYPE_UD),
+	   brw_imm_ud(location));
   brw_pop_insn_state(p);

   insn = next_insn(p, BRW_OPCODE_SEND);
@@ -1741,7 +1744,11 @@ void brw_dp_READ_4_vs(struct brw_compile *p,
   insn->header.mask_control = BRW_MASK_DISABLE;

   brw_set_dest(p, insn, dest);
-   brw_set_src0(insn, brw_null_reg());
+   if (intel->gen >= 6) {
+      brw_set_src0(insn, brw_message_reg(msg_reg_nr));
+   } else {
+      brw_set_src0(insn, brw_null_reg());
+   }

   brw_set_dp_read_message(p->brw,
 			   insn,
@@ -1768,6 +1775,7 @@ void brw_dp_READ_4_vs_relative(struct brw_compile *p,

   /* Setup MRF[1] with offset into const buffer */
   brw_push_insn_state(p);
+   brw_set_access_mode(p, BRW_ALIGN_1);
   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
   brw_set_mask_control(p, BRW_MASK_DISABLE);
   brw_set_predicate_control(p, BRW_PREDICATE_NONE);
@@ -1775,7 +1783,7 @@ void brw_dp_READ_4_vs_relative(struct brw_compile *p,
   /* M1.0 is block offset 0, M1.4 is block offset 1, all other
    * fields ignored.
    */
-   brw_ADD(p, retype(brw_message_reg(1), BRW_REGISTER_TYPE_UD),
+   brw_ADD(p, retype(brw_message_reg(1), BRW_REGISTER_TYPE_D),
 	   addr_reg, brw_imm_d(offset));
   brw_pop_insn_state(p);

@@ -1816,12 +1824,12 @@ void brw_fb_WRITE(struct brw_compile *p,
                  GLuint binding_table_index,
                  GLuint msg_length,
                  GLuint response_length,
-                  GLboolean eot)
+                  GLboolean eot,
+                  GLboolean header_present)
 {
   struct intel_context *intel = &p->brw->intel;
   struct brw_instruction *insn;
   GLuint msg_control, msg_type;
-   GLboolean header_present = GL_TRUE;

   if (intel->gen >= 6 && binding_table_index == 0) {
      insn = next_insn(p, BRW_OPCODE_SENDC);
@@ -1833,9 +1841,6 @@ void brw_fb_WRITE(struct brw_compile *p,
   insn->header.compression_control = BRW_COMPRESSION_NONE;

   if (intel->gen >= 6) {
-      if (msg_length == 4)
-	 header_present = GL_FALSE;
-
       /* headerless version, just submit color payload */
       src0 = brw_message_reg(msg_reg_nr);

@@ -1940,7 +1945,8 @@ void brw_SAMPLE(struct brw_compile *p,
 	 brw_set_compression_control(p, BRW_COMPRESSION_NONE);
 	 brw_set_mask_control(p, BRW_MASK_DISABLE);

-	 brw_MOV(p, m1, brw_vec8_grf(0,0));	 
+	 brw_MOV(p, retype(m1, BRW_REGISTER_TYPE_UD),
+		 retype(brw_vec8_grf(0,0), BRW_REGISTER_TYPE_UD));
  	 brw_MOV(p, get_element_ud(m1, 2), brw_imm_ud(newmask << 12)); 

 	 brw_pop_insn_state(p);
@@ -2001,7 +2007,8 @@ void brw_SAMPLE(struct brw_compile *p,
       */
      brw_push_insn_state(p);
      brw_set_compression_control(p, BRW_COMPRESSION_NONE);
-      brw_MOV(p, reg, reg);	      
+      brw_MOV(p, retype(reg, BRW_REGISTER_TYPE_UD),
+	      retype(reg, BRW_REGISTER_TYPE_UD));
      brw_pop_insn_state(p);
   }

@@ -2033,7 +2040,8 @@ void brw_urb_WRITE(struct brw_compile *p,
   if (intel->gen >= 6) {
      brw_push_insn_state(p);
      brw_set_mask_control( p, BRW_MASK_DISABLE );
-      brw_MOV(p, brw_message_reg(msg_reg_nr), src0);
+      brw_MOV(p, retype(brw_message_reg(msg_reg_nr), BRW_REGISTER_TYPE_UD),
+	      retype(src0, BRW_REGISTER_TYPE_UD));
      brw_pop_insn_state(p);
      src0 = brw_message_reg(msg_reg_nr);
   }
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -89,6 +89,9 @@ brw_compile_shader(struct gl_context *ctx, struct gl_shader *shader)
 GLboolean
 brw_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
 {
+   struct brw_context *brw = brw_context(ctx);
+   struct intel_context *intel = &brw->intel;
+
   struct brw_shader *shader =
      (struct brw_shader *)prog->_LinkedShaders[MESA_SHADER_FRAGMENT];
   if (shader != NULL) {
@@ -107,7 +110,15 @@ brw_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
 			 SUB_TO_ADD_NEG |
 			 EXP_TO_EXP2 |
 			 LOG_TO_LOG2);
+
+      /* Pre-gen6 HW can only nest if-statements 16 deep.  Beyond this,
+       * if-statements need to be flattened.
+       */
+      if (intel->gen < 6)
+	 lower_if_to_cond_assign(shader->ir, 16);
+
      do_lower_texture_projection(shader->ir);
+      do_vec_index_to_cond_assign(shader->ir);
      brw_do_cubemap_normalize(shader->ir);

      do {
@@ -474,8 +485,13 @@ fs_visitor::emit_fragcoord_interpolation(ir_variable *ir)
   wpos.reg_offset++;

   /* gl_FragCoord.z */
-   emit(fs_inst(FS_OPCODE_LINTERP, wpos, this->delta_x, this->delta_y,
-		interp_reg(FRAG_ATTRIB_WPOS, 2)));
+   if (intel->gen >= 6) {
+      emit(fs_inst(BRW_OPCODE_MOV, wpos,
+		   fs_reg(brw_vec8_grf(c->source_depth_reg, 0))));
+   } else {
+      emit(fs_inst(FS_OPCODE_LINTERP, wpos, this->delta_x, this->delta_y,
+		   interp_reg(FRAG_ATTRIB_WPOS, 2)));
+   }
   wpos.reg_offset++;

   /* gl_FragCoord.w: Already set up in emit_interpolation */
@@ -770,6 +786,30 @@ fs_visitor::try_emit_saturate(ir_expression *ir)
   return true;
 }

+static uint32_t
+brw_conditional_for_comparison(unsigned int op)
+{
+   switch (op) {
+   case ir_binop_less:
+      return BRW_CONDITIONAL_L;
+   case ir_binop_greater:
+      return BRW_CONDITIONAL_G;
+   case ir_binop_lequal:
+      return BRW_CONDITIONAL_LE;
+   case ir_binop_gequal:
+      return BRW_CONDITIONAL_GE;
+   case ir_binop_equal:
+   case ir_binop_all_equal: /* same as equal for scalars */
+      return BRW_CONDITIONAL_Z;
+   case ir_binop_nequal:
+   case ir_binop_any_nequal: /* same as nequal for scalars */
+      return BRW_CONDITIONAL_NZ;
+   default:
+      assert(!"not reached: bad operation for comparison");
+      return BRW_CONDITIONAL_NZ;
+   }
+}
+
 void
 fs_visitor::visit(ir_expression *ir)
 {
@@ -885,35 +925,20 @@ fs_visitor::visit(ir_expression *ir)
      break;

   case ir_binop_less:
-      inst = emit(fs_inst(BRW_OPCODE_CMP, this->result, op[0], op[1]));
-      inst->conditional_mod = BRW_CONDITIONAL_L;
-      emit(fs_inst(BRW_OPCODE_AND, this->result, this->result, fs_reg(0x1)));
-      break;
   case ir_binop_greater:
-      inst = emit(fs_inst(BRW_OPCODE_CMP, this->result, op[0], op[1]));
-      inst->conditional_mod = BRW_CONDITIONAL_G;
-      emit(fs_inst(BRW_OPCODE_AND, this->result, this->result, fs_reg(0x1)));
-      break;
   case ir_binop_lequal:
-      inst = emit(fs_inst(BRW_OPCODE_CMP, this->result, op[0], op[1]));
-      inst->conditional_mod = BRW_CONDITIONAL_LE;
-      emit(fs_inst(BRW_OPCODE_AND, this->result, this->result, fs_reg(0x1)));
-      break;
   case ir_binop_gequal:
-      inst = emit(fs_inst(BRW_OPCODE_CMP, this->result, op[0], op[1]));
-      inst->conditional_mod = BRW_CONDITIONAL_GE;
-      emit(fs_inst(BRW_OPCODE_AND, this->result, this->result, fs_reg(0x1)));
-      break;
   case ir_binop_equal:
-   case ir_binop_all_equal: /* same as nequal for scalars */
-      inst = emit(fs_inst(BRW_OPCODE_CMP, this->result, op[0], op[1]));
-      inst->conditional_mod = BRW_CONDITIONAL_Z;
-      emit(fs_inst(BRW_OPCODE_AND, this->result, this->result, fs_reg(0x1)));
-      break;
+   case ir_binop_all_equal:
   case ir_binop_nequal:
-   case ir_binop_any_nequal: /* same as nequal for scalars */
-      inst = emit(fs_inst(BRW_OPCODE_CMP, this->result, op[0], op[1]));
-      inst->conditional_mod = BRW_CONDITIONAL_NZ;
+   case ir_binop_any_nequal:
+      temp = this->result;
+      /* original gen4 does implicit conversion before comparison. */
+      if (intel->gen < 5)
+	 temp.type = op[0].type;
+
+      inst = emit(fs_inst(BRW_OPCODE_CMP, temp, op[0], op[1]));
+      inst->conditional_mod = brw_conditional_for_comparison(ir->operation);
      emit(fs_inst(BRW_OPCODE_AND, this->result, this->result, fs_reg(0x1)));
      break;

@@ -958,7 +983,12 @@ fs_visitor::visit(ir_expression *ir)
      break;
   case ir_unop_f2b:
   case ir_unop_i2b:
-      inst = emit(fs_inst(BRW_OPCODE_CMP, this->result, op[0], fs_reg(0.0f)));
+      temp = this->result;
+      /* original gen4 does implicit conversion before comparison. */
+      if (intel->gen < 5)
+	 temp.type = op[0].type;
+
+      inst = emit(fs_inst(BRW_OPCODE_CMP, temp, op[0], fs_reg(0.0f)));
      inst->conditional_mod = BRW_CONDITIONAL_NZ;
      inst = emit(fs_inst(BRW_OPCODE_AND, this->result,
 			  this->result, fs_reg(1)));
@@ -1541,7 +1571,7 @@ fs_visitor::emit_bool_to_cond_code(ir_rvalue *ir)
 	    inst = emit(fs_inst(BRW_OPCODE_CMP, reg_null_d,
 				op[0], fs_reg(0.0f)));
 	 } else {
-	    inst = emit(fs_inst(BRW_OPCODE_MOV, reg_null_d, op[0]));
+	    inst = emit(fs_inst(BRW_OPCODE_MOV, reg_null_f, op[0]));
 	 }
 	 inst->conditional_mod = BRW_CONDITIONAL_NZ;
 	 break;
@@ -1556,31 +1586,18 @@ fs_visitor::emit_bool_to_cond_code(ir_rvalue *ir)
 	 break;

      case ir_binop_greater:
-	 inst = emit(fs_inst(BRW_OPCODE_CMP, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_G;
-	 break;
      case ir_binop_gequal:
-	 inst = emit(fs_inst(BRW_OPCODE_CMP, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_GE;
-	 break;
      case ir_binop_less:
-	 inst = emit(fs_inst(BRW_OPCODE_CMP, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_L;
-	 break;
      case ir_binop_lequal:
-	 inst = emit(fs_inst(BRW_OPCODE_CMP, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_LE;
-	 break;
      case ir_binop_equal:
      case ir_binop_all_equal:
-	 inst = emit(fs_inst(BRW_OPCODE_CMP, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_Z;
-	 break;
      case ir_binop_nequal:
      case ir_binop_any_nequal:
-	 inst = emit(fs_inst(BRW_OPCODE_CMP, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_NZ;
+	 inst = emit(fs_inst(BRW_OPCODE_CMP, reg_null_cmp, op[0], op[1]));
+	 inst->conditional_mod =
+	    brw_conditional_for_comparison(expr->operation);
 	 break;
+
      default:
 	 assert(!"not reached");
 	 this->fail = true;
@@ -1659,30 +1676,16 @@ fs_visitor::emit_if_gen6(ir_if *ir)
 	 return;

      case ir_binop_greater:
-	 inst = emit(fs_inst(BRW_OPCODE_IF, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_G;
-	 return;
      case ir_binop_gequal:
-	 inst = emit(fs_inst(BRW_OPCODE_IF, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_GE;
-	 return;
      case ir_binop_less:
-	 inst = emit(fs_inst(BRW_OPCODE_IF, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_L;
-	 return;
      case ir_binop_lequal:
-	 inst = emit(fs_inst(BRW_OPCODE_IF, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_LE;
-	 return;
      case ir_binop_equal:
      case ir_binop_all_equal:
-	 inst = emit(fs_inst(BRW_OPCODE_IF, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_Z;
-	 return;
      case ir_binop_nequal:
      case ir_binop_any_nequal:
 	 inst = emit(fs_inst(BRW_OPCODE_IF, reg_null_d, op[0], op[1]));
-	 inst->conditional_mod = BRW_CONDITIONAL_NZ;
+	 inst->conditional_mod =
+	    brw_conditional_for_comparison(expr->operation);
 	 return;
      default:
 	 assert(!"not reached");
@@ -1764,32 +1767,9 @@ fs_visitor::visit(ir_loop *ir)
      this->base_ir = ir->to;
      ir->to->accept(this);

-      fs_inst *inst = emit(fs_inst(BRW_OPCODE_CMP, reg_null_d,
+      fs_inst *inst = emit(fs_inst(BRW_OPCODE_CMP, reg_null_cmp,
 				   counter, this->result));
-      switch (ir->cmp) {
-      case ir_binop_equal:
-	 inst->conditional_mod = BRW_CONDITIONAL_Z;
-	 break;
-      case ir_binop_nequal:
-	 inst->conditional_mod = BRW_CONDITIONAL_NZ;
-	 break;
-      case ir_binop_gequal:
-	 inst->conditional_mod = BRW_CONDITIONAL_GE;
-	 break;
-      case ir_binop_lequal:
-	 inst->conditional_mod = BRW_CONDITIONAL_LE;
-	 break;
-      case ir_binop_greater:
-	 inst->conditional_mod = BRW_CONDITIONAL_G;
-	 break;
-      case ir_binop_less:
-	 inst->conditional_mod = BRW_CONDITIONAL_L;
-	 break;
-      default:
-	 assert(!"not reached: unknown loop condition");
-	 this->fail = true;
-	 break;
-      }
+      inst->conditional_mod = brw_conditional_for_comparison(ir->cmp);

      inst = emit(fs_inst(BRW_OPCODE_BREAK));
      inst->predicated = true;
@@ -2158,7 +2138,8 @@ fs_visitor::generate_fb_write(fs_inst *inst)
 		inst->target,
 		inst->mlen,
 		0,
-		eot);
+		eot,
+		inst->header_present);
 }

 void
@@ -3280,6 +3261,7 @@ static struct brw_reg brw_reg_from_fs_reg(fs_reg *reg)
 	 break;
      default:
 	 assert(!"not reached");
+	 brw_reg = brw_null_reg();
 	 break;
      }
      break;
@@ -3294,6 +3276,10 @@ static struct brw_reg brw_reg_from_fs_reg(fs_reg *reg)
      assert(!"not reached");
      brw_reg = brw_null_reg();
      break;
+   default:
+      assert(!"not reached");
+      brw_reg = brw_null_reg();
+      break;
   }
   if (reg->abs)
      brw_reg = brw_abs(brw_reg);
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -348,6 +348,23 @@ public:
 					  hash_table_pointer_hash,
 					  hash_table_pointer_compare);

+      /* There's a question that appears to be left open in the spec:
+       * How do implicit dst conversions interact with the CMP
+       * instruction or conditional mods?  On gen6, the instruction:
+       *
+       * CMP null<d> src0<f> src1<f>
+       *
+       * will do src1 - src0 and compare that result as if it was an
+       * integer.  On gen4, it will do src1 - src0 as float, convert
+       * the result to int, and compare as int.  In between, it
+       * appears that it does src1 - src0 and does the compare in the
+       * execution type so dst type doesn't matter.
+       */
+      if (this->intel->gen > 4)
+	 this->reg_null_cmp = reg_null_d;
+      else
+	 this->reg_null_cmp = reg_null_f;
+
      this->frag_color = NULL;
      this->frag_data = NULL;
      this->frag_depth = NULL;
@@ -485,6 +502,7 @@ public:
   fs_reg pixel_w;
   fs_reg delta_x;
   fs_reg delta_y;
+   fs_reg reg_null_cmp;

   int grf_used;
 };
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -96,6 +96,9 @@ static void compile_gs_prog( struct brw_context *brw,
      brw_gs_quad_strip( &c, key );
      break;
   case GL_LINE_LOOP:
+      /* Gen6: LINELOOP is converted to LINESTRIP at the beginning of the 3D pipeline */
+      if (intel->gen == 6)
+          return;
      brw_gs_lines( &c );
      break;
   case GL_LINES:
@@ -189,7 +192,7 @@ static void populate_key( struct brw_context *brw,
   }

   if (intel->gen == 6)
-       prim_gs_always = brw->primitive == GL_LINE_LOOP;
+       prim_gs_always = 0;
   else
       prim_gs_always = brw->primitive == GL_QUADS ||
 			brw->primitive == GL_QUAD_STRIP ||
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -232,6 +232,12 @@ brw_prepare_query_begin(struct brw_context *brw)
      brw->query.bo = NULL;

      brw->query.bo = drm_intel_bo_alloc(intel->bufmgr, "query", 4096, 1);
+
+      /* clear target buffer */
+      drm_intel_bo_map(brw->query.bo, GL_TRUE);
+      memset((char *)brw->query.bo->virtual, 0, 4096);
+      drm_intel_bo_unmap(brw->query.bo);
+
      brw->query.index = 0;
   }

--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -164,7 +164,8 @@ void brw_destroy_caches( struct brw_context *brw );
 /***********************************************************************
 * brw_state_batch.c
 */
-#define BRW_BATCH_STRUCT(brw, s) intel_batchbuffer_data( brw->intel.batch, (s), sizeof(*(s)))
+#define BRW_BATCH_STRUCT(brw, s) intel_batchbuffer_data(brw->intel.batch, (s), \
+							sizeof(*(s)), false)
 #define BRW_CACHED_BATCH_STRUCT(brw, s) brw_cached_batch_struct( brw, (s), sizeof(*(s)) )

 GLboolean brw_cached_batch_struct( struct brw_context *brw,
--- a/src/mesa/drivers/dri/i965/brw_state_batch.c
+++ b/src/mesa/drivers/dri/i965/brw_state_batch.c
@@ -48,7 +48,7 @@ GLboolean brw_cached_batch_struct( struct brw_context *brw,
   struct header *newheader = (struct header *)data;

   if (brw->emit_state_always) {
-      intel_batchbuffer_data(brw->intel.batch, data, sz);
+      intel_batchbuffer_data(brw->intel.batch, data, sz, false);
      return GL_TRUE;
   }

@@ -75,7 +75,7 @@ GLboolean brw_cached_batch_struct( struct brw_context *brw,

 emit:
   memcpy(item->header, newheader, sz);
-   intel_batchbuffer_data(brw->intel.batch, data, sz);
+   intel_batchbuffer_data(brw->intel.batch, data, sz, false);
   return GL_TRUE;
 }

--- a/src/mesa/drivers/dri/i965/brw_structs.h
+++ b/src/mesa/drivers/dri/i965/brw_structs.h
@@ -1064,6 +1064,15 @@ struct brw_sampler_default_color {
   GLfloat color[4];
 };

+struct gen5_sampler_default_color {
+   uint8_t ub[4];
+   float f[4];
+   uint16_t hf[4];
+   uint16_t us[4];
+   int16_t s[4];
+   uint8_t b[4];
+};
+
 struct brw_sampler_state
 {
   
@@ -1169,7 +1178,12 @@ struct brw_surface_state
      GLuint cube_neg_y:1; 
      GLuint cube_pos_x:1; 
      GLuint cube_neg_x:1; 
-      GLuint pad:4;
+      GLuint pad:2;
+      /* Required on gen6 for surfaces accessed through render cache messages.
+       */
+      GLuint render_cache_read_write:1;
+      /* Ironlake and newer: average corner texels instead of replicating one */
+      GLuint cube_corner_average:1;
      GLuint mipmap_layout_mode:1; 
      GLuint vert_line_stride_ofs:1; 
      GLuint vert_line_stride:1; 
@@ -1649,6 +1663,18 @@ struct brw_instruction
 	 GLuint end_of_thread:1;
      } dp_read;

+      struct {
+	 GLuint binding_table_index:8;
+	 GLuint msg_control:3;
+	 GLuint msg_type:3;
+	 GLuint target_cache:2;
+	 GLuint response_length:4;
+	 GLuint msg_length:4;
+	 GLuint msg_target:4;
+	 GLuint pad1:3;
+	 GLuint end_of_thread:1;
+      } dp_read_g4x;
+
      struct {
 	 GLuint binding_table_index:8;
 	 GLuint msg_control:3;  
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -130,6 +130,7 @@ static void brw_upload_vs_prog(struct brw_context *brw)
   key.nr_userclip = brw_count_bits(ctx->Transform.ClipPlanesEnabled);
   key.copy_edgeflag = (ctx->Polygon.FrontMode != GL_FILL ||
 			ctx->Polygon.BackMode != GL_FILL);
+   key.two_side_color = (ctx->Light.Enabled && ctx->Light.Model.TwoSide);

   /* _NEW_POINT */
   if (ctx->Point.PointSprite) {
@@ -157,7 +158,7 @@ static void brw_upload_vs_prog(struct brw_context *brw)
 */
 const struct brw_tracked_state brw_vs_prog = {
   .dirty = {
-      .mesa  = _NEW_TRANSFORM | _NEW_POLYGON | _NEW_POINT,
+      .mesa  = _NEW_TRANSFORM | _NEW_POLYGON | _NEW_POINT | _NEW_LIGHT,
      .brw   = BRW_NEW_VERTEX_PROGRAM,
      .cache = 0
   },
--- a/src/mesa/drivers/dri/i965/brw_vs.h
+++ b/src/mesa/drivers/dri/i965/brw_vs.h
@@ -44,6 +44,7 @@ struct brw_vs_prog_key {
   GLuint nr_userclip:4;
   GLuint copy_edgeflag:1;
   GLuint point_coord_replace:8;
+   GLuint two_side_color: 1;
 };


--- a/src/mesa/drivers/dri/i965/brw_vs_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_vs_emit.c
@@ -140,11 +140,13 @@ clear_current_const(struct brw_vs_compile *c)
 static void brw_vs_alloc_regs( struct brw_vs_compile *c )
 {
   struct intel_context *intel = &c->func.brw->intel;
-   GLuint i, reg = 0, mrf;
+   GLuint i, reg = 0, mrf, j;
   int attributes_in_vue;
   int first_reladdr_output;
   int max_constant;
   int constant = 0;
+   int vert_result_reoder[VERT_RESULT_MAX];
+   int bfc = 0;

   /* Determine whether to use a real constant buffer or use a block
    * of GRF registers for constants.  The later is faster but only
@@ -254,7 +256,7 @@ static void brw_vs_alloc_regs( struct brw_vs_compile *c )
   }
   reg += (constant + 1) / 2;
   c->prog_data.curb_read_length = reg - 1;
-   c->prog_data.nr_params = constant;
+   c->prog_data.nr_params = constant * 4;
   /* XXX 0 causes a bug elsewhere... */
   if (intel->gen < 6 && c->prog_data.nr_params == 0)
      c->prog_data.nr_params = 4;
@@ -291,7 +293,36 @@ static void brw_vs_alloc_regs( struct brw_vs_compile *c )
      mrf = 4;

   first_reladdr_output = get_first_reladdr_output(&c->vp->program);
-   for (i = 0; i < VERT_RESULT_MAX; i++) {
+
+   for (i = 0; i < VERT_RESULT_MAX; i++)
+       vert_result_reoder[i] = i;
+
+   /* adjust attribute order in VUE for BFC0/BFC1 on Gen6+ */
+   if (intel->gen >= 6 && c->key.two_side_color) {
+       if ((c->prog_data.outputs_written & BITFIELD64_BIT(VERT_RESULT_COL1)) &&
+           (c->prog_data.outputs_written & BITFIELD64_BIT(VERT_RESULT_BFC1))) {
+           assert(c->prog_data.outputs_written & BITFIELD64_BIT(VERT_RESULT_COL0));
+           assert(c->prog_data.outputs_written & BITFIELD64_BIT(VERT_RESULT_BFC0));
+           bfc = 2;
+       } else if ((c->prog_data.outputs_written & BITFIELD64_BIT(VERT_RESULT_COL0)) &&
+           (c->prog_data.outputs_written & BITFIELD64_BIT(VERT_RESULT_BFC0)))
+           bfc = 1;
+
+       if (bfc) {
+           for (i = 0; i < bfc; i++) {
+               vert_result_reoder[VERT_RESULT_COL0 + i * 2 + 0] = VERT_RESULT_COL0 + i;
+               vert_result_reoder[VERT_RESULT_COL0 + i * 2 + 1] = VERT_RESULT_BFC0 + i;
+           }
+
+           for (i = VERT_RESULT_COL0 + bfc * 2; i < VERT_RESULT_BFC0 + bfc; i++) {
+               vert_result_reoder[i] = i - bfc;
+           }
+       }
+   }
+
+   for (j = 0; j < VERT_RESULT_MAX; j++) {
+      i = vert_result_reoder[j];
+
      if (c->prog_data.outputs_written & BITFIELD64_BIT(i)) {
 	 c->nr_outputs++;
         assert(i < Elements(c->regs[PROGRAM_OUTPUT]));
@@ -627,6 +658,22 @@ static void emit_min( struct brw_compile *p,
   }
 }

+static void emit_arl(struct brw_compile *p,
+		     struct brw_reg dst,
+		     struct brw_reg src)
+{
+   struct intel_context *intel = &p->brw->intel;
+
+   if (intel->gen >= 6) {
+      struct brw_reg dst_f = retype(dst, BRW_REGISTER_TYPE_F);
+
+      brw_RNDD(p, dst_f, src);
+      brw_MOV(p, dst, dst_f);
+   } else {
+      brw_RNDD(p, dst, src);
+   }
+}
+
 static void emit_math1_gen4(struct brw_vs_compile *c,
 			    GLuint function,
 			    struct brw_reg dst,
@@ -1072,8 +1119,6 @@ get_constant(struct brw_vs_compile *c,

   assert(argIndex < 3);

-   assert(c->func.brw->intel.gen < 6); /* FINISHME */
-
   if (c->current_const[argIndex].index != src->Index) {
      /* Keep track of the last constant loaded in this slot, for reuse. */
      c->current_const[argIndex].index = src->Index;
@@ -1091,7 +1136,7 @@ get_constant(struct brw_vs_compile *c,
   }

   /* replicate lower four floats into upper half (to get XYZWXYZW) */
-   const_reg = stride(const_reg, 0, 4, 0);
+   const_reg = stride(const_reg, 0, 4, 1);
   const_reg.subnr = 0;

   return const_reg;
@@ -1104,14 +1149,14 @@ get_reladdr_constant(struct brw_vs_compile *c,
 {
   const struct prog_src_register *src = &inst->SrcReg[argIndex];
   struct brw_compile *p = &c->func;
+   struct brw_context *brw = p->brw;
+   struct intel_context *intel = &brw->intel;
   struct brw_reg const_reg = c->current_const[argIndex].reg;
-   struct brw_reg addrReg = c->regs[PROGRAM_ADDRESS][0];
-   struct brw_reg byte_addr_reg = retype(get_tmp(c), BRW_REGISTER_TYPE_D);
+   struct brw_reg addr_reg = c->regs[PROGRAM_ADDRESS][0];
+   uint32_t offset;

   assert(argIndex < 3);

-   assert(c->func.brw->intel.gen < 6); /* FINISHME */
-
   /* Can't reuse a reladdr constant load. */
   c->current_const[argIndex].index = -1;

@@ -1120,15 +1165,21 @@ get_reladdr_constant(struct brw_vs_compile *c,
 	  src->Index, argIndex, c->current_const[argIndex].reg.nr);
 #endif

-   brw_MUL(p, byte_addr_reg, addrReg, brw_imm_ud(16));
+   if (intel->gen >= 6) {
+      offset = src->Index;
+   } else {
+      struct brw_reg byte_addr_reg = retype(get_tmp(c), BRW_REGISTER_TYPE_D);
+      brw_MUL(p, byte_addr_reg, addr_reg, brw_imm_d(16));
+      addr_reg = byte_addr_reg;
+      offset = 16 * src->Index;
+   }

   /* fetch the first vec4 */
   brw_dp_READ_4_vs_relative(p,
-			     const_reg,                     /* writeback dest */
-			     byte_addr_reg,                 /* address register */
-			     16 * src->Index,               /* byte offset */
-			     SURF_INDEX_VERT_CONST_BUFFER   /* binding table index */
-			     );
+			     const_reg,
+			     addr_reg,
+			     offset,
+			     SURF_INDEX_VERT_CONST_BUFFER);

   return const_reg;
 }
@@ -1928,7 +1979,7 @@ void brw_vs_emit(struct brw_vs_compile *c )
 	 emit_math1(c, BRW_MATH_FUNCTION_EXP, dst, args[0], BRW_MATH_PRECISION_FULL);
 	 break;
      case OPCODE_ARL:
-	 brw_RNDD(p, dst, args[0]);
+	 emit_arl(p, dst, args[0]);
 	 break;
      case OPCODE_FLR:
 	 brw_RNDD(p, dst, args[0]);
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -378,6 +378,10 @@ static void brw_wm_populate_key( struct brw_context *brw,
 	       swizzles[2] = SWIZZLE_ZERO;
 	    } else if (t->DepthMode == GL_LUMINANCE) {
 	       swizzles[3] = SWIZZLE_ONE;
+	    } else if (t->DepthMode == GL_RED) {
+	       swizzles[1] = SWIZZLE_ZERO;
+	       swizzles[2] = SWIZZLE_ZERO;
+	       swizzles[3] = SWIZZLE_ZERO;
 	    }
 	 }

--- a/src/mesa/drivers/dri/i965/brw_wm_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_emit.c
@@ -219,43 +219,45 @@ void emit_wpos_xy(struct brw_wm_compile *c,
 		  const struct brw_reg *arg0)
 {
   struct brw_compile *p = &c->func;
+   struct intel_context *intel = &p->brw->intel;
+   struct brw_reg delta_x = retype(arg0[0], BRW_REGISTER_TYPE_W);
+   struct brw_reg delta_y = retype(arg0[1], BRW_REGISTER_TYPE_W);

   if (mask & WRITEMASK_X) {
+      if (intel->gen >= 6) {
+	 struct brw_reg delta_x_f = retype(delta_x, BRW_REGISTER_TYPE_F);
+	 brw_MOV(p, delta_x_f, delta_x);
+	 delta_x = delta_x_f;
+      }
+
      if (c->fp->program.PixelCenterInteger) {
 	 /* X' = X */
-	 brw_MOV(p,
-		 dst[0],
-		 retype(arg0[0], BRW_REGISTER_TYPE_W));
+	 brw_MOV(p, dst[0], delta_x);
      } else {
 	 /* X' = X + 0.5 */
-	 brw_ADD(p,
-		 dst[0],
-		 retype(arg0[0], BRW_REGISTER_TYPE_W),
-		 brw_imm_f(0.5));
+	 brw_ADD(p, dst[0], delta_x, brw_imm_f(0.5));
      }
   }

   if (mask & WRITEMASK_Y) {
+      if (intel->gen >= 6) {
+	 struct brw_reg delta_y_f = retype(delta_y, BRW_REGISTER_TYPE_F);
+	 brw_MOV(p, delta_y_f, delta_y);
+	 delta_y = delta_y_f;
+      }
+
      if (c->fp->program.OriginUpperLeft) {
 	 if (c->fp->program.PixelCenterInteger) {
 	    /* Y' = Y */
-	    brw_MOV(p,
-		    dst[1],
-		    retype(arg0[1], BRW_REGISTER_TYPE_W));
+	    brw_MOV(p, dst[1], delta_y);
 	 } else {
-	    /* Y' = Y + 0.5 */
-	    brw_ADD(p,
-		    dst[1],
-		    retype(arg0[1], BRW_REGISTER_TYPE_W),
-		    brw_imm_f(0.5));
+	    brw_ADD(p, dst[1], delta_y, brw_imm_f(0.5));
 	 }
      } else {
 	 float center_offset = c->fp->program.PixelCenterInteger ? 0.0 : 0.5;

 	 /* Y' = (height - 1) - Y + center */
-	 brw_ADD(p,
-		 dst[1],
-		 negate(retype(arg0[1], BRW_REGISTER_TYPE_W)),
+	 brw_ADD(p, dst[1], negate(delta_y),
 		 brw_imm_f(c->key.drawable_height - 1 + center_offset));
      }
   }
@@ -971,34 +973,23 @@ void emit_math2(struct brw_wm_compile *c,
      struct brw_reg temp_dst = dst[dst_chan];

      if (arg0[0].hstride == BRW_HORIZONTAL_STRIDE_0) {
-	 if (arg1[0].hstride == BRW_HORIZONTAL_STRIDE_0) {
-	    /* Both scalar arguments.  Do scalar calc. */
-	    src0.hstride = BRW_HORIZONTAL_STRIDE_1;
-	    src1.hstride = BRW_HORIZONTAL_STRIDE_1;
-	    temp_dst.hstride = BRW_HORIZONTAL_STRIDE_1;
-	    temp_dst.width = BRW_WIDTH_1;
+	 brw_MOV(p, temp_dst, src0);
+	 src0 = temp_dst;
+      }

-	    if (arg0[0].subnr != 0) {
-	       brw_MOV(p, temp_dst, src0);
-	       src0 = temp_dst;
-
-	       /* Ouch.  We've used the temp as a dst, and we still
-		* need a temp to store arg1 in, because src and dst
-		* offsets have to be equal.  Leaving this up to
-		* glsl2-965 to handle correctly.
-		*/
-	       assert(arg1[0].subnr == 0);
-	    } else if (arg1[0].subnr != 0) {
-	       brw_MOV(p, temp_dst, src1);
-	       src1 = temp_dst;
-	    }
-	 } else {
-	    brw_MOV(p, temp_dst, src0);
-	    src0 = temp_dst;
-	 }
-      } else if (arg1[0].hstride == BRW_HORIZONTAL_STRIDE_0) {
-	 brw_MOV(p, temp_dst, src1);
-	 src1 = temp_dst;
+      if (arg1[0].hstride == BRW_HORIZONTAL_STRIDE_0) {
+	 /* This is a heinous hack to get a temporary register for use
+	  * in case both arg0 and arg1 are constants.  Why you're
+	  * doing exponentiation on constant values in the shader, we
+	  * don't know.
+	  *
+	  * max_wm_grf is almost surely less than the maximum GRF, and
+	  * gen6 doesn't care about the number of GRFs used in a
+	  * shader like pre-gen6 did.
+	  */
+	 struct brw_reg temp = brw_vec8_grf(c->max_wm_grf, 0);
+	 brw_MOV(p, temp, src1);
+	 src1 = temp;
      }

      brw_set_saturate(p, (mask & SATURATE) ? 1 : 0);
@@ -1016,14 +1007,6 @@ void emit_math2(struct brw_wm_compile *c,
 		   sechalf(src0),
 		   sechalf(src1));
      }
-
-      /* Splat a scalar result into all the channels. */
-      if (arg0[0].hstride == BRW_HORIZONTAL_STRIDE_0 &&
-	  arg1[0].hstride == BRW_HORIZONTAL_STRIDE_0) {
-	 temp_dst.hstride = BRW_HORIZONTAL_STRIDE_0;
-	 temp_dst.vstride = BRW_VERTICAL_STRIDE_0;
-	 brw_MOV(p, dst[dst_chan], temp_dst);
-      }
   } else {
      GLuint saturate = ((mask & SATURATE) ?
 			 BRW_MATH_SATURATE_SATURATE :
@@ -1373,7 +1356,8 @@ static void fire_fb_write( struct brw_wm_compile *c,
 		target,		
 		nr,
 		0, 
-		eot);
+		eot,
+		GL_TRUE);
 }


@@ -1518,7 +1502,8 @@ void emit_fb_write(struct brw_wm_compile *c,
       */
      brw_push_insn_state(p);
      brw_set_mask_control(p, BRW_MASK_DISABLE);
-      brw_MOV(p, brw_message_reg(0), brw_vec8_grf(0, 0));
+      brw_MOV(p, retype(brw_message_reg(0), BRW_REGISTER_TYPE_UD),
+	      retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD));
      brw_pop_insn_state(p);

      if (target != 0) {
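Note on the emit_wpos_xy hunk in brw_wm_emit.c above: the comments there give the window-position math as X' = X (+ 0.5) and, for a lower-left origin, Y' = (height - 1) - Y + center. The scalar sketch below restates that arithmetic on the CPU. It is only an illustration; `toy_wpos_x` and `toy_wpos_y` are hypothetical names, not driver functions, and the real code emits EU instructions rather than computing floats.

```c
#include <assert.h>
#include <stdbool.h>

/* X' = X when pixel centers are integers, otherwise X' = X + 0.5. */
static float toy_wpos_x(float x, bool pixel_center_integer)
{
   return pixel_center_integer ? x : x + 0.5f;
}

/* Upper-left origin: Y' = Y (+ 0.5).
 * Lower-left origin: Y' = (height - 1) - Y + center,
 * where center is 0.0 for integer pixel centers and 0.5 otherwise.
 */
static float toy_wpos_y(float y, int drawable_height,
                        bool origin_upper_left, bool pixel_center_integer)
{
   float center_offset = pixel_center_integer ? 0.0f : 0.5f;

   assert(drawable_height > 0);

   if (origin_upper_left)
      return y + center_offset;

   return (float)(drawable_height - 1) - y + center_offset;
}
```

With a 100-pixel-high drawable and half-integer pixel centers, row 0 of a lower-left-origin program maps to 99.5, and row 99 maps to 0.5.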
--- a/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_sampler_state.c
@@ -69,12 +69,43 @@ static GLuint translate_wrap_mode( GLenum wrap )
 static drm_intel_bo *upload_default_color( struct brw_context *brw,
 				     const GLfloat *color )
 {
-   struct brw_sampler_default_color sdc;
+   struct intel_context *intel = &brw->intel;

-   COPY_4V(sdc.color, color); 
-   
-   return brw_cache_data(&brw->cache, BRW_SAMPLER_DEFAULT_COLOR,
-			 &sdc, sizeof(sdc));
+   if (intel->gen >= 5) {
+      struct gen5_sampler_default_color sdc;
+
+      memset(&sdc, 0, sizeof(sdc));
+
+      UNCLAMPED_FLOAT_TO_UBYTE(sdc.ub[0], color[0]);
+      UNCLAMPED_FLOAT_TO_UBYTE(sdc.ub[1], color[1]);
+      UNCLAMPED_FLOAT_TO_UBYTE(sdc.ub[2], color[2]);
+      UNCLAMPED_FLOAT_TO_UBYTE(sdc.ub[3], color[3]);
+
+      UNCLAMPED_FLOAT_TO_USHORT(sdc.us[0], color[0]);
+      UNCLAMPED_FLOAT_TO_USHORT(sdc.us[1], color[1]);
+      UNCLAMPED_FLOAT_TO_USHORT(sdc.us[2], color[2]);
+      UNCLAMPED_FLOAT_TO_USHORT(sdc.us[3], color[3]);
+
+      UNCLAMPED_FLOAT_TO_SHORT(sdc.s[0], color[0]);
+      UNCLAMPED_FLOAT_TO_SHORT(sdc.s[1], color[1]);
+      UNCLAMPED_FLOAT_TO_SHORT(sdc.s[2], color[2]);
+      UNCLAMPED_FLOAT_TO_SHORT(sdc.s[3], color[3]);
+
+      /* XXX: Fill in half floats */
+      /* XXX: Fill in signed bytes */
+
+      COPY_4V(sdc.f, color);
+
+      return brw_cache_data(&brw->cache, BRW_SAMPLER_DEFAULT_COLOR,
+			    &sdc, sizeof(sdc));
+   } else {
+      struct brw_sampler_default_color sdc;
+
+      COPY_4V(sdc.color, color);
+
+      return brw_cache_data(&brw->cache, BRW_SAMPLER_DEFAULT_COLOR,
+			    &sdc, sizeof(sdc));
+   }
 }


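Note on upload_default_color above: on gen5+ the border color is stored once per representation (unsigned bytes, unsigned shorts, signed shorts, floats). The sketch below shows one plausible reading of those conversions, assuming the UNCLAMPED_FLOAT_TO_* macros clamp to the normalized range and scale to the full range of the target type; the exact rounding in Mesa's macros may differ. `toy_default_color` is a hypothetical struct that only borrows the field names `ub`, `us`, `s` and `f` from the hunk; it does not reproduce the hardware layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-format default color record. */
struct toy_default_color {
   uint8_t ub[4];
   uint16_t us[4];
   int16_t s[4];
   float f[4];
};

static float toy_clamp(float v, float lo, float hi)
{
   if (v < lo)
      return lo;
   if (v > hi)
      return hi;
   return v;
}

/* Fill every representation from one RGBA float color. */
static void toy_fill_default_color(struct toy_default_color *sdc,
                                   const float color[4])
{
   int i;

   assert(sdc != 0);

   for (i = 0; i < 4; i++) {
      float unorm = toy_clamp(color[i], 0.0f, 1.0f);
      float snorm = toy_clamp(color[i], -1.0f, 1.0f) * 32767.0f;

      /* Round to nearest by adding 0.5 before truncating. */
      sdc->ub[i] = (uint8_t)(unorm * 255.0f + 0.5f);
      sdc->us[i] = (uint16_t)(unorm * 65535.0f + 0.5f);
      sdc->s[i] = (int16_t)(snorm >= 0.0f ? snorm + 0.5f : snorm - 0.5f);

      /* The float copy is stored unclamped, as COPY_4V does. */
      sdc->f[i] = color[i];
   }
}
```

Under these assumptions a component of 2.0 saturates to 255, 65535 and 32767 in the integer fields while the float field keeps 2.0.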
--- a/src/mesa/drivers/dri/i965/brw_wm_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_state.c
@@ -87,7 +87,6 @@ wm_unit_populate_key(struct brw_context *brw, struct brw_wm_unit_key *key)
 {
   struct gl_context *ctx = &brw->intel.ctx;
   const struct gl_fragment_program *fp = brw->fragment_program;
-   const struct brw_fragment_program *bfp = (struct brw_fragment_program *) fp;
   struct intel_context *intel = &brw->intel;

   memset(key, 0, sizeof(*key));
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -139,6 +139,8 @@ static GLuint translate_tex_format( gl_format mesa_format,
 	  return BRW_SURFACEFORMAT_I16_UNORM;
      else if (depth_mode == GL_ALPHA)
 	  return BRW_SURFACEFORMAT_A16_UNORM;
+      else if (depth_mode == GL_RED)
+	  return BRW_SURFACEFORMAT_R16_UNORM;
      else
 	  return BRW_SURFACEFORMAT_L16_UNORM;

@@ -174,6 +176,8 @@ static GLuint translate_tex_format( gl_format mesa_format,
         return BRW_SURFACEFORMAT_I24X8_UNORM;
      else if (depth_mode == GL_ALPHA)
         return BRW_SURFACEFORMAT_A24X8_UNORM;
+      else if (depth_mode == GL_RED)
+         return BRW_SURFACEFORMAT_R24_UNORM_X8_TYPELESS;
      else
         return BRW_SURFACEFORMAT_L24X8_UNORM;

@@ -274,6 +278,7 @@ brw_create_constant_surface(struct brw_context *brw,
 			    drm_intel_bo **out_bo,
 			    uint32_t *out_offset)
 {
+   struct intel_context *intel = &brw->intel;
   const GLint w = width - 1;
   struct brw_surface_state surf;
   void *map;
@@ -284,6 +289,9 @@ brw_create_constant_surface(struct brw_context *brw,
   surf.ss0.surface_type = BRW_SURFACE_BUFFER;
   surf.ss0.surface_format = BRW_SURFACEFORMAT_R32G32B32A32_FLOAT;

+   if (intel->gen >= 6)
+      surf.ss0.render_cache_read_write = 1;
+
   assert(bo);
   surf.ss1.base_addr = bo->offset; /* reloc */

--- a/src/mesa/drivers/dri/i965/gen6_clip_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_clip_state.c
@@ -43,7 +43,10 @@ upload_clip_state(struct brw_context *brw)
      depth_clamp = GEN6_CLIP_Z_TEST;

   if (ctx->Light.ProvokingVertex == GL_FIRST_VERTEX_CONVENTION) {
-      provoking = 0;
+      provoking =
+	 (0 << GEN6_CLIP_TRI_PROVOKE_SHIFT) |
+	 (1 << GEN6_CLIP_TRIFAN_PROVOKE_SHIFT) |
+	 (0 << GEN6_CLIP_LINE_PROVOKE_SHIFT);
   } else {
      provoking =
 	 (2 << GEN6_CLIP_TRI_PROVOKE_SHIFT) |
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -33,9 +33,10 @@
 #include "intel_batchbuffer.h"

 static uint32_t
-get_attr_override(struct brw_context *brw, int fs_attr)
+get_attr_override(struct brw_context *brw, int fs_attr, int two_side_color)
 {
   int attr_index = 0, i, vs_attr;
+   int bfc = 0;

   if (fs_attr <= FRAG_ATTRIB_TEX7)
      vs_attr = fs_attr;
@@ -57,6 +58,30 @@ get_attr_override(struct brw_context *brw, int fs_attr)
 	 attr_index++;
   }

+   assert(attr_index < 32);
+
+   if (two_side_color) {
+       if ((brw->vs.prog_data->outputs_written & BITFIELD64_BIT(VERT_RESULT_COL1)) &&
+           (brw->vs.prog_data->outputs_written & BITFIELD64_BIT(VERT_RESULT_BFC1))) {
+           assert(brw->vs.prog_data->outputs_written & BITFIELD64_BIT(VERT_RESULT_COL0));
+           assert(brw->vs.prog_data->outputs_written & BITFIELD64_BIT(VERT_RESULT_BFC0));
+           bfc = 2;
+       } else if ((brw->vs.prog_data->outputs_written & BITFIELD64_BIT(VERT_RESULT_COL0)) &&
+                (brw->vs.prog_data->outputs_written & BITFIELD64_BIT(VERT_RESULT_BFC0)))
+           bfc = 1;
+   }
+
+   if (bfc && (fs_attr <= FRAG_ATTRIB_TEX7 && fs_attr > FRAG_ATTRIB_WPOS)) {
+       if (fs_attr == FRAG_ATTRIB_COL0)
+           attr_index |= (ATTRIBUTE_SWIZZLE_INPUTATTR_FACING << ATTRIBUTE_SWIZZLE_SHIFT);
+       else if (fs_attr == FRAG_ATTRIB_COL1 && bfc == 2) {
+           attr_index++;
+           attr_index |= (ATTRIBUTE_SWIZZLE_INPUTATTR_FACING << ATTRIBUTE_SWIZZLE_SHIFT);
+       } else {
+           attr_index += bfc;
+       }
+   }
+
   return attr_index;
 }

@@ -75,6 +100,7 @@ upload_sf_state(struct brw_context *brw)
   GLboolean render_to_fbo = brw->intel.ctx.DrawBuffer->Name != 0;
   int attr = 0;
   int urb_start;
+   int two_side_color = (ctx->Light.Enabled && ctx->Light.Model.TwoSide);

   /* _NEW_TRANSFORM */
   if (ctx->Transform.ClipPlanesEnabled)
@@ -224,7 +250,7 @@ upload_sf_state(struct brw_context *brw)

      for (; attr < 64; attr++) {
 	 if (brw->fragment_program->Base.InputsRead & BITFIELD64_BIT(attr)) {
-	    attr_overrides |= get_attr_override(brw, attr);
+	    attr_overrides |= get_attr_override(brw, attr, two_side_color);
 	    attr++;
 	    break;
 	 }
@@ -232,7 +258,7 @@ upload_sf_state(struct brw_context *brw)

      for (; attr < 64; attr++) {
 	 if (brw->fragment_program->Base.InputsRead & BITFIELD64_BIT(attr)) {
-	    attr_overrides |= get_attr_override(brw, attr) << 16;
+	    attr_overrides |= get_attr_override(brw, attr, two_side_color) << 16;
 	    attr++;
 	    break;
 	 }
--- a/src/mesa/drivers/dri/i965/gen6_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c
@@ -130,6 +130,7 @@ upload_vs_state(struct brw_context *brw)
   OUT_BATCH(CMD_3D_VS_STATE << 16 | (6 - 2));
   OUT_RELOC(brw->vs.prog_bo, I915_GEM_DOMAIN_INSTRUCTION, 0, 0);
   OUT_BATCH((0 << GEN6_VS_SAMPLER_COUNT_SHIFT) |
+	     GEN6_VS_FLOATING_POINT_MODE_ALT |
 	     (brw->vs.nr_surfaces << GEN6_VS_BINDING_TABLE_ENTRY_COUNT_SHIFT));
   OUT_BATCH(0); /* scratch space base offset */
   OUT_BATCH((1 << GEN6_VS_DISPATCH_START_GRF_SHIFT) |
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
@@ -133,6 +133,9 @@ upload_wm_state(struct brw_context *brw)
   dw5 |= GEN6_WM_LINE_AA_WIDTH_1_0;
   dw5 |= GEN6_WM_LINE_END_CAP_AA_WIDTH_0_5;

+   /* OpenGL non-IEEE floating point mode */
+   dw2 |= GEN6_WM_FLOATING_POINT_MODE_ALT;
+
   /* BRW_NEW_NR_WM_SURFACES */
   dw2 |= brw->wm.nr_surfaces << GEN6_WM_BINDING_TABLE_ENTRY_COUNT_SHIFT;

--- a/src/mesa/drivers/dri/intel/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/intel/intel_batchbuffer.c
@@ -93,8 +93,16 @@ do_flush_locked(struct intel_batchbuffer *batch, GLuint used)
   batch->ptr = NULL;

   if (!intel->intelScreen->no_hw) {
-      drm_intel_bo_exec(batch->buf, used, NULL, 0,
-			(x_off & 0xffff) | (y_off << 16));
+      int ring;
+
+      if (intel->gen < 6 || !intel->batch->is_blit) {
+	 ring = I915_EXEC_RENDER;
+      } else {
+	 ring = I915_EXEC_BLT;
+      }
+
+      drm_intel_bo_mrb_exec(batch->buf, used, NULL, 0,
+			    (x_off & 0xffff) | (y_off << 16), ring);
   }

   if (unlikely(INTEL_DEBUG & DEBUG_BATCH)) {
@@ -242,10 +250,10 @@ intel_batchbuffer_emit_reloc_fenced(struct intel_batchbuffer *batch,

 void
 intel_batchbuffer_data(struct intel_batchbuffer *batch,
-                       const void *data, GLuint bytes)
+                       const void *data, GLuint bytes, bool is_blit)
 {
   assert((bytes & 3) == 0);
-   intel_batchbuffer_require_space(batch, bytes);
+   intel_batchbuffer_require_space(batch, bytes, is_blit);
   __memcpy(batch->ptr, data, bytes);
   batch->ptr += bytes;
 }
@@ -262,22 +270,32 @@ intel_batchbuffer_emit_mi_flush(struct intel_batchbuffer *batch)
   struct intel_context *intel = batch->intel;

   if (intel->gen >= 6) {
-      BEGIN_BATCH(8);
+      if (intel->batch->is_blit) {
+	 BEGIN_BATCH_BLT(4);
+	 OUT_BATCH(MI_FLUSH_DW);
+	 OUT_BATCH(0);
+	 OUT_BATCH(0);
+	 OUT_BATCH(0);
+	 ADVANCE_BATCH();
+      } else {
+	 BEGIN_BATCH(8);
+	 /* XXX workaround: issue any post sync != 0 before write
+	  * cache flush = 1
+	  */
+	 OUT_BATCH(_3DSTATE_PIPE_CONTROL);
+	 OUT_BATCH(PIPE_CONTROL_WRITE_IMMEDIATE);
+	 OUT_BATCH(0); /* write address */
+	 OUT_BATCH(0); /* write data */

-      /* XXX workaround: issue any post sync != 0 before write cache flush = 1 */
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL);
-      OUT_BATCH(PIPE_CONTROL_WRITE_IMMEDIATE);
-      OUT_BATCH(0); /* write address */
-      OUT_BATCH(0); /* write data */
-
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL);
-      OUT_BATCH(PIPE_CONTROL_INSTRUCTION_FLUSH |
-		PIPE_CONTROL_WRITE_FLUSH |
-		PIPE_CONTROL_DEPTH_CACHE_FLUSH |
-		PIPE_CONTROL_NO_WRITE);
-      OUT_BATCH(0); /* write address */
-      OUT_BATCH(0); /* write data */
-      ADVANCE_BATCH();
+	 OUT_BATCH(_3DSTATE_PIPE_CONTROL);
+	 OUT_BATCH(PIPE_CONTROL_INSTRUCTION_FLUSH |
+		   PIPE_CONTROL_WRITE_FLUSH |
+		   PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+		   PIPE_CONTROL_NO_WRITE);
+	 OUT_BATCH(0); /* write address */
+	 OUT_BATCH(0); /* write data */
+	 ADVANCE_BATCH();
+      }
   } else if (intel->gen >= 4) {
      BEGIN_BATCH(4);
      OUT_BATCH(_3DSTATE_PIPE_CONTROL |
--- a/src/mesa/drivers/dri/intel/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/intel/intel_batchbuffer.h
@@ -31,6 +31,7 @@ struct intel_batchbuffer
   } emit;
 #endif

+   bool is_blit;
   GLuint dirty_state;
   GLuint reserved_space;
 };
@@ -55,7 +56,7 @@ void intel_batchbuffer_reset(struct intel_batchbuffer *batch);
 * intel_buffer_dword() calls.
 */
 void intel_batchbuffer_data(struct intel_batchbuffer *batch,
-                            const void *data, GLuint bytes);
+                            const void *data, GLuint bytes, bool is_blit);

 void intel_batchbuffer_release_space(struct intel_batchbuffer *batch,
                                     GLuint bytes);
@@ -114,8 +115,16 @@ intel_batchbuffer_emit_float(struct intel_batchbuffer *batch, float f)

 static INLINE void
 intel_batchbuffer_require_space(struct intel_batchbuffer *batch,
-                                GLuint sz)
+                                GLuint sz, bool is_blit)
 {
+
+   if (batch->intel->gen >= 6 && batch->is_blit != is_blit &&
+       batch->ptr != batch->map) {
+      intel_batchbuffer_flush(batch);
+   }
+
+   batch->is_blit = is_blit;
+
 #ifdef DEBUG
   assert(sz < batch->size - 8);
 #endif
@@ -124,9 +133,10 @@ intel_batchbuffer_require_space(struct intel_batchbuffer *batch,
 }

 static INLINE void
-intel_batchbuffer_begin(struct intel_batchbuffer *batch, int n)
+intel_batchbuffer_begin(struct intel_batchbuffer *batch, int n, bool is_blit)
 {
-   intel_batchbuffer_require_space(batch, n * 4);
+   intel_batchbuffer_require_space(batch, n * 4, is_blit);
+
 #ifdef DEBUG
   assert(batch->map);
   assert(batch->emit.start_ptr == NULL);
@@ -154,7 +164,8 @@ intel_batchbuffer_advance(struct intel_batchbuffer *batch)
 */
 #define BATCH_LOCALS

-#define BEGIN_BATCH(n) intel_batchbuffer_begin(intel->batch, n)
+#define BEGIN_BATCH(n) intel_batchbuffer_begin(intel->batch, n, false)
+#define BEGIN_BATCH_BLT(n) intel_batchbuffer_begin(intel->batch, n, true)
 #define OUT_BATCH(d) intel_batchbuffer_emit_dword(intel->batch, d)
 #define OUT_BATCH_F(f) intel_batchbuffer_emit_float(intel->batch,f)
 #define OUT_RELOC(buf, read_domains, write_domain, delta) do {		\
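Note on the batchbuffer changes above: each reservation tags the batch as blit or render, a non-empty batch is flushed on gen6+ when the tag changes, and the tag selects the ring at submit time. The standalone sketch below models only that bookkeeping. All names (`toy_batch`, `toy_emit`, `toy_pick_ring`) are hypothetical, and `toy_emit` folds space reservation and emission into one step for brevity, which the driver does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_ring { TOY_RING_RENDER, TOY_RING_BLT };

struct toy_batch {
   int gen;        /* hardware generation */
   bool is_blit;   /* tag of the commands currently queued */
   size_t used;    /* bytes queued since the last flush */
   int flushes;    /* flush counter, for illustration only */
};

static void toy_flush(struct toy_batch *b)
{
   b->used = 0;
   b->flushes++;
}

/* Queue `bytes` of commands for the blit or render ring. On gen6+ a
 * pending batch of the other kind is flushed first, because one batch
 * can only be submitted to one ring.
 */
static void toy_emit(struct toy_batch *b, size_t bytes, bool is_blit)
{
   assert((bytes & 3) == 0);

   if (b->gen >= 6 && b->is_blit != is_blit && b->used != 0)
      toy_flush(b);

   b->is_blit = is_blit;
   b->used += bytes;
}

/* Before gen6 everything goes to the render ring. */
static enum toy_ring toy_pick_ring(const struct toy_batch *b)
{
   if (b->gen < 6 || !b->is_blit)
      return TOY_RING_RENDER;
   return TOY_RING_BLT;
}
```

In this model, a gen6 batch that queues render commands and then a blit flushes once in between and ends up targeting the BLT ring, while a gen5 batch never flushes for that reason and always targets the render ring.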
--- a/src/mesa/drivers/dri/intel/intel_blit.c
+++ b/src/mesa/drivers/dri/intel/intel_blit.c
@@ -38,6 +38,8 @@
 #include "intel_reg.h"
 #include "intel_regions.h"
 #include "intel_batchbuffer.h"
+#include "intel_tex.h"
+#include "intel_mipmap_tree.h"

 #define FILE_DEBUG_FLAG DEBUG_BLIT

@@ -107,10 +109,6 @@ intelEmitCopyBlit(struct intel_context *intel,
   drm_intel_bo *aper_array[3];
   BATCH_LOCALS;

-   /* Blits are in a different ringbuffer so we don't use them. */
-   if (intel->gen >= 6)
-      return GL_FALSE;
-
   if (dst_tiling != I915_TILING_NONE) {
      if (dst_offset & 4095)
 	 return GL_FALSE;
@@ -140,7 +138,7 @@ intelEmitCopyBlit(struct intel_context *intel,
   if (pass >= 2)
      return GL_FALSE;

-   intel_batchbuffer_require_space(intel->batch, 8 * 4);
+   intel_batchbuffer_require_space(intel->batch, 8 * 4, true);
   DBG("%s src:buf(%p)/%d+%d %d,%d dst:buf(%p)/%d+%d %d,%d sz:%dx%d\n",
       __FUNCTION__,
       src_buffer, src_pitch, src_offset, src_x, src_y,
@@ -181,7 +179,7 @@ intelEmitCopyBlit(struct intel_context *intel,
   assert(dst_x < dst_x2);
   assert(dst_y < dst_y2);

-   BEGIN_BATCH(8);
+   BEGIN_BATCH_BLT(8);
   OUT_BATCH(CMD);
   OUT_BATCH(BR13 | (uint16_t)dst_pitch);
   OUT_BATCH((dst_y << 16) | dst_x);
@@ -209,7 +207,7 @@ intelEmitCopyBlit(struct intel_context *intel,
 * which we're clearing with triangles.
 * \param mask  bitmask of BUFFER_BIT_* values indicating buffers to clear
 */
-void
+GLbitfield
 intelClearWithBlit(struct gl_context *ctx, GLbitfield mask)
 {
   struct intel_context *intel = intel_context(ctx);
@@ -217,11 +215,9 @@ intelClearWithBlit(struct gl_context *ctx, GLbitfield mask)
   GLuint clear_depth;
   GLboolean all;
   GLint cx, cy, cw, ch;
+   GLbitfield fail_mask = 0;
   BATCH_LOCALS;

-   /* Blits are in a different ringbuffer so we don't use them. */
-   assert(intel->gen < 6);
-
   /*
    * Compute values for clearing the buffers.
    */
@@ -242,7 +238,7 @@ intelClearWithBlit(struct gl_context *ctx, GLbitfield mask)
   ch = fb->_Ymax - fb->_Ymin;

   if (cw == 0 || ch == 0)
-      return;
+      return 0;

   GLuint buf;
   all = (cw == fb->Width && ch == fb->Height);
@@ -338,9 +334,9 @@ intelClearWithBlit(struct gl_context *ctx, GLbitfield mask)
 					clear[3], clear[3]);
 	    break;
 	 default:
-	    _mesa_problem(ctx, "Unexpected renderbuffer format: %d\n",
-			  irb->Base.Format);
-	    clear_val = 0;
+	    fail_mask |= bufBit;
+	    mask &= ~bufBit;
+	    continue;
 	 }
      }

@@ -356,7 +352,7 @@ intelClearWithBlit(struct gl_context *ctx, GLbitfield mask)
 	 intel_batchbuffer_flush(intel->batch);
      }

-      BEGIN_BATCH(6);
+      BEGIN_BATCH_BLT(6);
      OUT_BATCH(CMD);
      OUT_BATCH(BR13);
      OUT_BATCH((y1 << 16) | x1);
@@ -375,6 +371,8 @@ intelClearWithBlit(struct gl_context *ctx, GLbitfield mask)
      else
 	 mask &= ~bufBit;    /* turn off bit, for faster loop exit */
   }
+
+   return fail_mask;
 }

 GLboolean
@@ -393,10 +391,6 @@ intelEmitImmediateColorExpandBlit(struct intel_context *intel,
   int dwords = ALIGN(src_size, 8) / 4;
   uint32_t opcode, br13, blit_cmd;

-   /* Blits are in a different ringbuffer so we don't use them. */
-   if (intel->gen >= 6)
-      return GL_FALSE;
-
   if (dst_tiling != I915_TILING_NONE) {
      if (dst_offset & 4095)
 	 return GL_FALSE;
@@ -420,7 +414,7 @@ intelEmitImmediateColorExpandBlit(struct intel_context *intel,
   intel_batchbuffer_require_space( intel->batch,
 				    (8 * 4) +
 				    (3 * 4) +
-				    dwords * 4 );
+				    dwords * 4, true);

   opcode = XY_SETUP_BLT_CMD;
   if (cpp == 4)
@@ -439,7 +433,7 @@ intelEmitImmediateColorExpandBlit(struct intel_context *intel,
   if (dst_tiling != I915_TILING_NONE)
      blit_cmd |= XY_DST_TILED;

-   BEGIN_BATCH(8 + 3);
+   BEGIN_BATCH_BLT(8 + 3);
   OUT_BATCH(opcode);
   OUT_BATCH(br13);
   OUT_BATCH((0 << 16) | 0); /* clip x1, y1 */
@@ -456,9 +450,9 @@ intelEmitImmediateColorExpandBlit(struct intel_context *intel,
   OUT_BATCH(((y + h) << 16) | (x + w));
   ADVANCE_BATCH();

-   intel_batchbuffer_data( intel->batch,
-			   src_bits,
-			   dwords * 4 );
+   intel_batchbuffer_data(intel->batch,
+			  src_bits,
+			  dwords * 4, true);

   intel_batchbuffer_emit_mi_flush(intel->batch);

@@ -480,9 +474,6 @@ intel_emit_linear_blit(struct intel_context *intel,
   GLuint pitch, height;
   GLboolean ok;

-   /* Blits are in a different ringbuffer so we don't use them. */
-   assert(intel->gen < 6);
-
   /* The pitch given to the GPU must be DWORD aligned, and
    * we want width to match pitch. Max width is (1 << 15 - 1),
    * rounding that down to the nearest DWORD is 1 << 15 - 4
@@ -514,3 +505,81 @@ intel_emit_linear_blit(struct intel_context *intel,
      assert(ok);
   }
 }
+
+/**
+ * Used to initialize the alpha value of an ARGB8888 teximage after
+ * loading it from an XRGB8888 source.
+ *
+ * This is very common with glCopyTexImage2D().
+ */
+void
+intel_set_teximage_alpha_to_one(struct gl_context *ctx,
+				struct intel_texture_image *intel_image)
+{
+   struct intel_context *intel = intel_context(ctx);
+   unsigned int image_x, image_y;
+   uint32_t x1, y1, x2, y2;
+   uint32_t BR13, CMD;
+   int pitch, cpp;
+   drm_intel_bo *aper_array[2];
+   struct intel_region *region = intel_image->mt->region;
+   BATCH_LOCALS;
+
+   assert(intel_image->base.TexFormat == MESA_FORMAT_ARGB8888);
+
+   /* get dest x/y in destination texture */
+   intel_miptree_get_image_offset(intel_image->mt,
+				  intel_image->level,
+				  intel_image->face,
+				  0,
+				  &image_x, &image_y);
+
+   x1 = image_x;
+   y1 = image_y;
+   x2 = image_x + intel_image->base.Width;
+   y2 = image_y + intel_image->base.Height;
+
+   pitch = region->pitch;
+   cpp = region->cpp;
+
+   DBG("%s dst:buf(%p)/%d %d,%d sz:%dx%d\n",
+       __FUNCTION__,
+       intel_image->mt->region->buffer, (pitch * region->cpp),
+       x1, y1, x2 - x1, y2 - y1);
+
+   BR13 = br13_for_cpp(region->cpp) | 0xf0 << 16;
+   CMD = XY_COLOR_BLT_CMD;
+   CMD |= XY_BLT_WRITE_ALPHA;
+
+   assert(region->tiling != I915_TILING_Y);
+
+#ifndef I915
+   if (region->tiling != I915_TILING_NONE) {
+      CMD |= XY_DST_TILED;
+      pitch /= 4;
+   }
+#endif
+   BR13 |= (pitch * region->cpp);
+
+   /* do space check before going any further */
+   aper_array[0] = intel->batch->buf;
+   aper_array[1] = region->buffer;
+
+   if (drm_intel_bufmgr_check_aperture_space(aper_array,
+					     ARRAY_SIZE(aper_array)) != 0) {
+      intel_batchbuffer_flush(intel->batch);
+   }
+
+   BEGIN_BATCH_BLT(6);
+   OUT_BATCH(CMD);
+   OUT_BATCH(BR13);
+   OUT_BATCH((y1 << 16) | x1);
+   OUT_BATCH((y2 << 16) | x2);
+   OUT_RELOC_FENCED(region->buffer,
+		    I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER,
+		    0);
+   OUT_BATCH(0xffffffff); /* white, but only alpha gets written */
+   ADVANCE_BATCH();
+
+   intel_batchbuffer_emit_mi_flush(intel->batch);
+}
--- a/src/mesa/drivers/dri/intel/intel_blit.h
+++ b/src/mesa/drivers/dri/intel/intel_blit.h
@@ -33,7 +33,7 @@
 extern void intelCopyBuffer(const __DRIdrawable * dpriv,
                            const drm_clip_rect_t * rect);

-extern void intelClearWithBlit(struct gl_context * ctx, GLbitfield mask);
+extern GLbitfield intelClearWithBlit(struct gl_context * ctx, GLbitfield mask);

 GLboolean
 intelEmitCopyBlit(struct intel_context *intel,
@@ -69,5 +69,7 @@ void intel_emit_linear_blit(struct intel_context *intel,
 			    drm_intel_bo *src_bo,
 			    unsigned int src_offset,
 			    unsigned int size);
+void intel_set_teximage_alpha_to_one(struct gl_context *ctx,
+				     struct intel_texture_image *intel_image);

 #endif
--- a/src/mesa/drivers/dri/intel/intel_clear.c
+++ b/src/mesa/drivers/dri/intel/intel_clear.c
@@ -85,6 +85,8 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
   GLbitfield blit_mask = 0;
   GLbitfield swrast_mask = 0;
   struct gl_framebuffer *fb = ctx->DrawBuffer;
+   struct intel_renderbuffer *irb;
+   int i;

   if (mask & (BUFFER_BIT_FRONT_LEFT | BUFFER_BIT_FRONT_RIGHT)) {
      intel->front_buffer_dirty = GL_TRUE;
@@ -93,6 +95,22 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
   if (0)
      fprintf(stderr, "%s\n", __FUNCTION__);

+   /* Get SW clears out of the way: Anything without an intel_renderbuffer */
+   for (i = 0; i < BUFFER_COUNT; i++) {
+      if (!(mask & (1 << i)))
+	 continue;
+
+      irb = intel_get_renderbuffer(fb, i);
+      if (unlikely(!irb)) {
+	 swrast_mask |= (1 << i);
+	 mask &= ~(1 << i);
+      }
+   }
+   if (unlikely(swrast_mask)) {
+      debug_mask("swrast", swrast_mask);
+      _swrast_Clear(ctx, swrast_mask);
+   }
+
   /* HW color buffers (front, back, aux, generic FBO, etc) */
   if (colorMask == ~0) {
      /* clear all R,G,B,A */
@@ -151,44 +169,18 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
      }
   }

-   if (intel->gen >= 6) {
-      /* Blits are in a different ringbuffer so we don't use them. */
-      tri_mask |= blit_mask;
-      blit_mask = 0;
-   }
-
-   /* SW fallback clearing */
-   swrast_mask = mask & ~tri_mask & ~blit_mask;
-
-   {
-      /* look for non-Intel renderbuffers (clear them with swrast) */
-      GLbitfield blit_or_tri = blit_mask | tri_mask;
-      while (blit_or_tri) {
-         GLuint i = _mesa_ffs(blit_or_tri) - 1;
-         GLbitfield bufBit = 1 << i;
-         if (!fb->Attachment[i].Renderbuffer->ClassID) {
-            blit_mask &= ~bufBit;
-            tri_mask &= ~bufBit;
-            swrast_mask |= bufBit;
-         }
-         blit_or_tri ^= bufBit;
-      }
-   }
+   /* Anything left, just use tris */
+   tri_mask |= mask & ~blit_mask;

   if (blit_mask) {
      debug_mask("blit", blit_mask);
-      intelClearWithBlit(ctx, blit_mask);
+      tri_mask |= intelClearWithBlit(ctx, blit_mask);
   }

   if (tri_mask) {
      debug_mask("tri", tri_mask);
      _mesa_meta_Clear(&intel->ctx, tri_mask);
   }
-
-   if (swrast_mask) {
-      debug_mask("swrast", swrast_mask);
-      _swrast_Clear(ctx, swrast_mask);
-   }
 }


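Note on the intelClear rework above: buffers without an intel_renderbuffer are cleared by swrast first, the remaining buffers are split between the blitter and triangles, and any buffer the blit path rejects (the fail_mask returned by intelClearWithBlit) falls back to triangles. The sketch below models that partitioning with plain bitmasks. `toy_plan_clear` and its parameters are hypothetical; in the driver the blit candidates come from the color-mask and buffer checks that this diff leaves unchanged.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BUFFER_COUNT 8

struct toy_clear_plan {
   uint32_t swrast_mask;   /* cleared in software */
   uint32_t blit_mask;     /* cleared with the blitter */
   uint32_t tri_mask;      /* cleared by drawing triangles */
};

/* mask:        buffers requested for clearing
 * has_irb:     buffers backed by a hardware renderbuffer
 * blit_wanted: buffers the caller would prefer to clear by blit
 * blit_ok:     buffers whose format the blit path can handle
 */
static struct toy_clear_plan
toy_plan_clear(uint32_t mask, uint32_t has_irb,
               uint32_t blit_wanted, uint32_t blit_ok)
{
   struct toy_clear_plan plan = { 0, 0, 0 };
   uint32_t fail_mask;
   int i;

   /* Get software clears out of the way first. */
   for (i = 0; i < TOY_BUFFER_COUNT; i++) {
      uint32_t bit = 1u << i;

      if ((mask & bit) && !(has_irb & bit)) {
         plan.swrast_mask |= bit;
         mask &= ~bit;
      }
   }

   /* Anything not headed for the blitter uses triangles. */
   plan.blit_mask = mask & blit_wanted;
   plan.tri_mask = mask & ~plan.blit_mask;

   /* Buffers the blit path rejects fall back to triangles. */
   fail_mask = plan.blit_mask & ~blit_ok;
   plan.blit_mask &= ~fail_mask;
   plan.tri_mask |= fail_mask;

   return plan;
}
```

For buffers below TOY_BUFFER_COUNT the three resulting masks are disjoint and together cover every requested buffer, so each buffer is cleared exactly once.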
--- a/src/mesa/drivers/dri/intel/intel_context.c
+++ b/src/mesa/drivers/dri/intel/intel_context.c
@@ -565,7 +565,8 @@ intel_glFlush(struct gl_context *ctx)

   intel_flush(ctx);
   intel_flush_front(ctx);
-   intel->need_throttle = GL_TRUE;
+   if (intel->is_front_buffer_rendering)
+      intel->need_throttle = GL_TRUE;
 }

 void
--- a/src/mesa/drivers/dri/intel/intel_context.h
+++ b/src/mesa/drivers/dri/intel/intel_context.h
@@ -29,7 +29,7 @@
 #define INTELCONTEXT_INC


-
+#include <stdbool.h>
 #include "main/mtypes.h"
 #include "main/mm.h"
 #include "dri_metaops.h"
--- a/src/mesa/drivers/dri/intel/intel_extensions_es2.c
+++ b/src/mesa/drivers/dri/intel/intel_extensions_es2.c
@@ -62,6 +62,7 @@ static const char *es2_extensions[] = {
   "GL_EXT_blend_minmax",
   "GL_EXT_blend_subtract",
   "GL_EXT_stencil_wrap",
+   "GL_NV_blend_square",

   /* Optional GLES2 */
   "GL_ARB_framebuffer_object",
--- a/src/mesa/drivers/dri/intel/intel_fbo.c
+++ b/src/mesa/drivers/dri/intel/intel_fbo.c
@@ -42,6 +42,8 @@
 #include "intel_fbo.h"
 #include "intel_mipmap_tree.h"
 #include "intel_regions.h"
+#include "intel_tex.h"
+#include "intel_span.h"

 #define FILE_DEBUG_FLAG DEBUG_FBO

@@ -110,26 +112,21 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
   case GL_RED:
   case GL_R8:
      rb->Format = MESA_FORMAT_R8;
-      rb->DataType = GL_UNSIGNED_BYTE;
      break;
   case GL_R16:
      rb->Format = MESA_FORMAT_R16;
-      rb->DataType = GL_UNSIGNED_SHORT;
      break;
   case GL_RG:
   case GL_RG8:
      rb->Format = MESA_FORMAT_RG88;
-      rb->DataType = GL_UNSIGNED_BYTE;
      break;
   case GL_RG16:
      rb->Format = MESA_FORMAT_RG1616;
-      rb->DataType = GL_UNSIGNED_SHORT;
      break;
   case GL_R3_G3_B2:
   case GL_RGB4:
   case GL_RGB5:
      rb->Format = MESA_FORMAT_RGB565;
-      rb->DataType = GL_UNSIGNED_BYTE;
      break;
   case GL_RGB:
   case GL_RGB8:
@@ -137,7 +134,6 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
   case GL_RGB12:
   case GL_RGB16:
      rb->Format = MESA_FORMAT_XRGB8888;
-      rb->DataType = GL_UNSIGNED_BYTE;
      break;
   case GL_RGBA:
   case GL_RGBA2:
@@ -148,16 +144,13 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
   case GL_RGBA12:
   case GL_RGBA16:
      rb->Format = MESA_FORMAT_ARGB8888;
-      rb->DataType = GL_UNSIGNED_BYTE;
      break;
   case GL_ALPHA:
   case GL_ALPHA8:
      rb->Format = MESA_FORMAT_A8;
-      rb->DataType = GL_UNSIGNED_BYTE;
      break;
   case GL_DEPTH_COMPONENT16:
      rb->Format = MESA_FORMAT_Z16;
-      rb->DataType = GL_UNSIGNED_SHORT;
      break;
   case GL_STENCIL_INDEX:
   case GL_STENCIL_INDEX1_EXT:
@@ -171,7 +164,6 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
   case GL_DEPTH24_STENCIL8_EXT:
      /* alloc a depth+stencil buffer */
      rb->Format = MESA_FORMAT_S8_Z24;
-      rb->DataType = GL_UNSIGNED_INT_24_8_EXT;
      break;
   default:
      _mesa_problem(ctx,
@@ -180,6 +172,7 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
   }

   rb->_BaseFormat = _mesa_base_fbo_format(ctx, internalFormat);
+   rb->DataType = intel_mesa_format_to_rb_datatype(rb->Format);
   cpp = _mesa_get_format_bytes(rb->Format);

   intel_flush(ctx);
@@ -338,39 +331,30 @@ intel_create_renderbuffer(gl_format format)
   switch (format) {
   case MESA_FORMAT_RGB565:
      irb->Base._BaseFormat = GL_RGB;
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
      break;
   case MESA_FORMAT_XRGB8888:
      irb->Base._BaseFormat = GL_RGB;
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
      break;
   case MESA_FORMAT_ARGB8888:
      irb->Base._BaseFormat = GL_RGBA;
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
      break;
   case MESA_FORMAT_Z16:
      irb->Base._BaseFormat = GL_DEPTH_COMPONENT;
-      irb->Base.DataType = GL_UNSIGNED_SHORT;
      break;
   case MESA_FORMAT_X8_Z24:
      irb->Base._BaseFormat = GL_DEPTH_COMPONENT;
-      irb->Base.DataType = GL_UNSIGNED_INT;
      break;
   case MESA_FORMAT_S8_Z24:
      irb->Base._BaseFormat = GL_DEPTH_STENCIL;
-      irb->Base.DataType = GL_UNSIGNED_INT_24_8_EXT;
      break;
   case MESA_FORMAT_A8:
      irb->Base._BaseFormat = GL_ALPHA;
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
      break;
   case MESA_FORMAT_R8:
      irb->Base._BaseFormat = GL_RED;
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
      break;
   case MESA_FORMAT_RG88:
      irb->Base._BaseFormat = GL_RG;
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
      break;
   default:
      _mesa_problem(NULL,
@@ -381,6 +365,7 @@ intel_create_renderbuffer(gl_format format)

   irb->Base.Format = format;
   irb->Base.InternalFormat = irb->Base._BaseFormat;
+   irb->Base.DataType = intel_mesa_format_to_rb_datatype(format);

   /* intel-specific methods */
   irb->Base.Delete = intel_delete_renderbuffer;
@@ -457,70 +442,16 @@ static GLboolean
 intel_update_wrapper(struct gl_context *ctx, struct intel_renderbuffer *irb, 
 		     struct gl_texture_image *texImage)
 {
-   if (texImage->TexFormat == MESA_FORMAT_ARGB8888) {
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
-      DBG("Render to RGBA8 texture OK\n");
-   }
-   else if (texImage->TexFormat == MESA_FORMAT_XRGB8888) {
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
-      DBG("Render to XGBA8 texture OK\n");
-   }
-#ifndef I915
-   else if (texImage->TexFormat == MESA_FORMAT_SARGB8) {
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
-      DBG("Render to SARGB8 texture OK\n");
-   }
-#endif
-   else if (texImage->TexFormat == MESA_FORMAT_RGB565) {
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
-      DBG("Render to RGB5 texture OK\n");
-   }
-   else if (texImage->TexFormat == MESA_FORMAT_ARGB1555) {
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
-      DBG("Render to ARGB1555 texture OK\n");
-   }
-   else if (texImage->TexFormat == MESA_FORMAT_ARGB4444) {
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
-      DBG("Render to ARGB4444 texture OK\n");
-   }
-#ifndef I915
-   else if (texImage->TexFormat == MESA_FORMAT_A8) {
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
-      DBG("Render to A8 texture OK\n");
-   }
-   else if (texImage->TexFormat == MESA_FORMAT_R8) {
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
-      DBG("Render to R8 texture OK\n");
-   }
-   else if (texImage->TexFormat == MESA_FORMAT_RG88) {
-      irb->Base.DataType = GL_UNSIGNED_BYTE;
-      DBG("Render to RG88 texture OK\n");
-   }
-   else if (texImage->TexFormat == MESA_FORMAT_R16) {
-      irb->Base.DataType = GL_UNSIGNED_SHORT;
-      DBG("Render to R8 texture OK\n");
-   }
-   else if (texImage->TexFormat == MESA_FORMAT_RG1616) {
-      irb->Base.DataType = GL_UNSIGNED_SHORT;
-      DBG("Render to RG88 texture OK\n");
-   }
-#endif
-   else if (texImage->TexFormat == MESA_FORMAT_Z16) {
-      irb->Base.DataType = GL_UNSIGNED_SHORT;
-      DBG("Render to DEPTH16 texture OK\n");
-   }
-   else if (texImage->TexFormat == MESA_FORMAT_S8_Z24) {
-      irb->Base.DataType = GL_UNSIGNED_INT_24_8_EXT;
-      DBG("Render to DEPTH_STENCIL texture OK\n");
-   }
-   else {
+   if (!intel_span_supports_format(texImage->TexFormat)) {
      DBG("Render to texture BAD FORMAT %s\n",
 	  _mesa_get_format_name(texImage->TexFormat));
      return GL_FALSE;
+   } else {
+      DBG("Render to texture %s\n", _mesa_get_format_name(texImage->TexFormat));
   }

   irb->Base.Format = texImage->TexFormat;
-
+   irb->Base.DataType = intel_mesa_format_to_rb_datatype(texImage->TexFormat);
   irb->Base.InternalFormat = texImage->InternalFormat;
   irb->Base._BaseFormat = _mesa_base_fbo_format(ctx, irb->Base.InternalFormat);
   irb->Base.Width = texImage->Width;
@@ -659,7 +590,8 @@ intel_finish_render_texture(struct gl_context * ctx,
       _glthread_GetID(), att->Texture->Name);

   /* Flag that this image may now be validated into the object's miptree. */
-   intel_image->used_as_render_target = GL_FALSE;
+   if (intel_image)
+      intel_image->used_as_render_target = GL_FALSE;

   /* Since we've (probably) rendered to the texture and will (likely) use
    * it in the texture domain later on in this batchbuffer, flush the
@@ -682,10 +614,10 @@ intel_validate_framebuffer(struct gl_context *ctx, struct gl_framebuffer *fb)
   int i;

   if (depthRb && stencilRb && stencilRb != depthRb) {
-      if (ctx->DrawBuffer->Attachment[BUFFER_DEPTH].Type == GL_TEXTURE &&
-	  ctx->DrawBuffer->Attachment[BUFFER_STENCIL].Type == GL_TEXTURE &&
-	  (ctx->DrawBuffer->Attachment[BUFFER_DEPTH].Texture->Name ==
-	   ctx->DrawBuffer->Attachment[BUFFER_STENCIL].Texture->Name)) {
+      if (fb->Attachment[BUFFER_DEPTH].Type == GL_TEXTURE &&
+	  fb->Attachment[BUFFER_STENCIL].Type == GL_TEXTURE &&
+	  (fb->Attachment[BUFFER_DEPTH].Texture->Name ==
+	   fb->Attachment[BUFFER_STENCIL].Texture->Name)) {
 	 /* OK */
      } else {
 	 /* we only support combined depth/stencil buffers, not separate
@@ -698,35 +630,34 @@ intel_validate_framebuffer(struct gl_context *ctx, struct gl_framebuffer *fb)
      }
   }

-   for (i = 0; i < ctx->Const.MaxDrawBuffers; i++) {
-      struct gl_renderbuffer *rb = ctx->DrawBuffer->_ColorDrawBuffers[i];
-      struct intel_renderbuffer *irb = intel_renderbuffer(rb);
+   for (i = 0; i < Elements(fb->Attachment); i++) {
+      struct gl_renderbuffer *rb;
+      struct intel_renderbuffer *irb;

-      if (rb == NULL)
+      if (fb->Attachment[i].Type == GL_NONE)
 	 continue;

+      /* A supported attachment will have a Renderbuffer set either
+       * from being a Renderbuffer or being a texture that got the
+       * intel_wrap_texture() treatment.
+       */
+      rb = fb->Attachment[i].Renderbuffer;
+      if (rb == NULL) {
+	 DBG("attachment without renderbuffer\n");
+	 fb->_Status = GL_FRAMEBUFFER_UNSUPPORTED_EXT;
+	 continue;
+      }
+
+      irb = intel_renderbuffer(rb);
      if (irb == NULL) {
 	 DBG("software rendering renderbuffer\n");
 	 fb->_Status = GL_FRAMEBUFFER_UNSUPPORTED_EXT;
 	 continue;
      }

-      switch (irb->Base.Format) {
-      case MESA_FORMAT_ARGB8888:
-      case MESA_FORMAT_XRGB8888:
-      case MESA_FORMAT_RGB565:
-      case MESA_FORMAT_ARGB1555:
-      case MESA_FORMAT_ARGB4444:
-#ifndef I915
-      case MESA_FORMAT_SARGB8:
-      case MESA_FORMAT_A8:
-      case MESA_FORMAT_R8:
-      case MESA_FORMAT_R16:
-      case MESA_FORMAT_RG88:
-      case MESA_FORMAT_RG1616:
-#endif
-	 break;
-      default:
+      if (!intel_span_supports_format(irb->Base.Format)) {
+	 DBG("Unsupported texture/renderbuffer format attached: %s\n",
+	     _mesa_get_format_name(irb->Base.Format));
 	 fb->_Status = GL_FRAMEBUFFER_UNSUPPORTED_EXT;
      }
   }
--- a/src/mesa/drivers/dri/intel/intel_reg.h
+++ b/src/mesa/drivers/dri/intel/intel_reg.h
@@ -37,6 +37,8 @@
 #define FLUSH_MAP_CACHE				(1 << 0)
 #define INHIBIT_FLUSH_RENDER_CACHE		(1 << 2)

+#define MI_FLUSH_DW			(CMD_MI | (0x26 << 23) | 2)
+
 /* Stalls command execution waiting for the given events to have occurred. */
 #define MI_WAIT_FOR_EVENT               (CMD_MI | (0x3 << 23))
 #define MI_WAIT_FOR_PLANE_B_FLIP        (1<<6)