Monday, June 8, 2015

Intel driver crash of the day

In bug 1170143 we ran into an Intel driver crash when trying to share DXGI_FORMAT_A8_UNORM surfaces. Older Intel drivers crash while opening the texture using OpenSharedResource. The driver successfully opens BGRA surfaces, but not the alpha surfaces that we want to use for video playback. Who knows why... Here's a test case.

Monday, June 1, 2015

Direct2D on top of WARP

In Firefox 38 we introduced the use of WARP for software rasterization on Windows 7. Early in the release we ran into an issue where using WARP on top of the builtin VGA driver was ridiculously slow. We fixed this by disabling WARP when the VGA driver was being used, but I was curious how Internet Explorer avoids this issue. One big difference between Firefox and Internet Explorer is that we're not currently using Direct2D on top of WARP, whereas they are. It turns out the WARP driver has a private API that Direct2D uses to avoid going through the regular D3D11 API.

Profiling Internet Explorer shows the following private APIs used by Direct2D:
d3d10warp.dll!UMDevice::DrawGlyphRun
d3d10warp.dll!UMDevice::AlphaBlt2
d3d10warp.dll!UMDevice::InternalGetDC
d3d10warp.dll!UMDevice::CreateGeometry
d3d10warp.dll!UMDevice::DrawGeometryInternal

It looks like these turn into fairly traditional 2D graphics operations, as shown in the following call stack snippets:

RasterizationStage::Rasterize_TEXT
DrawGlyphRun6x1_B8G8R8A8_SSE<1>
DrawGlyphRun4x4_B8G8R8A8_SSE

RasterizationStage::Rasterize_GEOMETRY
PixelJITRasterizeGeometry
PixelJITGeometryRasterizer::Rasterize
WarpGeometry::Rasterize
CAntialiasedFiller::RasterizeEdges
CAntialiasedFiller::FillEdges
CAntialiasedFiller::GenerateOutput
PixelJITGeometryRasterizer::RasterizeComplexScan
PixelJITGeometryRasterizer::BeginSpan
InitializeEdges
InitializeInactiveArray
QuickSortEdges

This suggests that using Direct2D on top of WARP is more efficient than expected and might actually make more sense than our current strategy of only using WARP for composition.

Monday, March 16, 2015

Performance and feature improvements in Firefox 37 WebGL with D3D11 ANGLE

Firefox 37 adds support for WebGL rendering using D3D11 on Windows. Until now we were using D3D9, which has very limited support for cross-device synchronization. Without proper synchronization we were forced to wait on the main thread for WebGL content to finish rendering before we could continue script execution. As a result, the total frame rendering time was the sum of the script time and the remaining GPU time. D3D11 allows us to use a GPU-side read barrier between the main thread and compositing thread. This lets Firefox avoid waiting on the main thread, giving improved responsiveness and more time for script execution.

Here's a test program that lets you adjust the GPU and CPU execution times to see how the browser responds. D3D11 WebGL lets you adjust the CPU time up to nearly 15ms without dropping below 60fps.
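
As a rough back-of-the-envelope model of the two behaviours described above (not Firefox's actual scheduling code), suppose a frame needs script_ms of main-thread script time and gpu_ms of GPU time. When the main thread has to wait for the GPU, the two costs add up. For the D3D11 case this sketch assumes, as a simplification, that script and GPU work overlap completely, so the slower of the two sets the frame time. The function names and the sample numbers are made up for illustration.

```python
# Simplified frame-time model; all names and numbers here are illustrative.
FRAME_BUDGET_MS = 1000.0 / 60.0  # about 16.67 ms per frame at 60fps


def serialized_frame_ms(script_ms, gpu_ms):
    """Main thread blocks until the GPU is done, so the costs add up."""
    return script_ms + gpu_ms


def overlapped_frame_ms(script_ms, gpu_ms):
    """Assumption: script and GPU work overlap fully; the slower one wins."""
    return max(script_ms, gpu_ms)


def hits_60fps(frame_ms):
    """True if the frame fits inside the 60fps budget."""
    return frame_ms <= FRAME_BUDGET_MS


if __name__ == "__main__":
    script_ms, gpu_ms = 15.0, 10.0  # hypothetical workload
    for label, model in (("serialized", serialized_frame_ms),
                         ("overlapped", overlapped_frame_ms)):
        frame_ms = model(script_ms, gpu_ms)
        print(f"{label}: {frame_ms:.1f} ms, 60fps: {hits_60fps(frame_ms)}")
```

In this model a 15ms script load only fits in the 60fps budget when the main thread does not have to wait for the GPU.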

D3D11 support also lets us expose the WEBGL_draw_buffers extension which allows drawing to multiple output buffers at the same time, functionality that's very helpful for implementing deferred renderers.

Give D3D11 WebGL support a try in Firefox Beta today and let us know how it works.


Thursday, August 23, 2012

The system works

Over the last few months a number of us at Mozilla have been working on a profiler built into Firefox. One of the goals of this profiler is to make it as easy as possible to profile anywhere. Yesterday we had a satisfying realization of this goal.

It all started with Taras' Snappy #36 update. In a comment, a user going by kamulos reported a recent problem with laggy tab switches. kamulos posted the profile and Benoit Girard filed bug 784756. The profile showed us spending a bunch of time in TimeStamp::Now() during image decode. I wasn't particularly surprised by this because our TimeStamp::Now() implementation on Windows is not particularly fast. Ehsan and I went away and put some effort into improving the performance and have some good candidates in bug 784859. In the meantime, Robert Lickenbrock discovered that the problem was recently introduced by bug 685516, which unintentionally caused a fixed time delay where we called TimeStamp::Now() in a loop. He has since posted a patch that fixes the problem.

Here we have two community members helping uncover a problem within a week of it landing, a problem that could have otherwise gone undetected for a long time. This is a great example of an open source community working beautifully.

Friday, July 13, 2012

What happens when you switch to a Gmail tab on OS X

What follows is a brief walkthrough of what happens when you switch to a Gmail tab. You can follow along in the profile.

The process starts with [GeckoNSApplication sendEvent:] for the mouse event. This travels on down to nsXBLEventHandler::HandleEvent(). From there, we call into the JS, specifically onxblmousedown() in tabbox.xml. This eventually calls into set__selected() and set_selectedPanel(). set_selectedIndex() calls onselect() in browser.xul, which ends up taking about 14 ms. During onselect() we spend 4 ms decoding an image, 3 ms in callProgressListeners() and 3 ms in GetBoundingClientRect(). The whole process of handling the click event takes about 15 ms.

After that we spend 6 ms handling some events. Among these are a RefreshDriver tick and a toolkit paint. Afterwards we wait for 12 ms.

33 ms after the original click we start the painting process. First we do a [NSView viewWillDraw], which calls into PresShell::WillPaint() and takes 3 ms. Finally we start the actual 85 ms paint in PresShell::Paint().

Here's the breakdown of what we're doing during paint. Of the 85 ms, 81 ms is in LayerManagerOGL::Render(). 5 ms of that is clearing the surface in BasicBufferOGL::BeginPaint(), and 11 ms is texture upload, which does a useless format conversion (bug 613046). In between these two is 58 ms of DrawThebesLayer(), of which 39 ms is BasicLayerManager::EndTransactionInternal doing composition. A lot of this seems to be VM badness caused by cairo/CoreGraphics and its weird copy-on-write semantics. The rest of the time is 6 ms in nsDisplayText::Paint, 5 ms in nsDisplayBackground::Paint, 4 ms in nsDisplayBorderBackground::Paint, and 3 ms in nsDisplayBorder::Paint(). Unfortunately, of the 85 ms only 18 ms is spent painting display items, and less than half of that 18 ms is actual painting operations inside of CoreGraphics.
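
The numbers above can be tallied with a few lines of Python. This is just bookkeeping on the figures quoted in this post, not output from the profiler; the variable names are mine.

```python
# Tally of the paint timings quoted above (all values in milliseconds).
PAINT_MS = 85        # PresShell::Paint()
RENDER_MS = 81       # LayerManagerOGL::Render()
COMPOSITION_MS = 39  # BasicLayerManager::EndTransactionInternal

render_parts_ms = {
    "BasicBufferOGL::BeginPaint (clear)": 5,
    "DrawThebesLayer": 58,
    "texture upload": 11,
}

display_items_ms = {
    "nsDisplayText::Paint": 6,
    "nsDisplayBackground::Paint": 5,
    "nsDisplayBorderBackground::Paint": 4,
    "nsDisplayBorder::Paint": 3,
}

# Time spent painting display items, and its share of the whole paint.
display_total_ms = sum(display_items_ms.values())
display_share = display_total_ms / PAINT_MS

# How much of Render() and DrawThebesLayer() the quoted figures account for.
render_accounted_ms = sum(render_parts_ms.values())
thebes_accounted_ms = COMPOSITION_MS + display_total_ms

if __name__ == "__main__":
    print(f"display items: {display_total_ms} ms ({display_share:.0%} of paint)")
    print(f"Render() accounted for: {render_accounted_ms} of {RENDER_MS} ms")
    print(f"DrawThebesLayer() accounted for: {thebes_accounted_ms} of 58 ms")
```

The four display items add up to the 18 ms mentioned above, which is roughly a fifth of the 85 ms paint.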

Shortly after PresShell::Paint() the new content is displayed on the screen and we run a few more events and a garbage collection. And that's what happens during the 130 ms switch to a Gmail tab.
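
To see how the phases line up, here is a small sketch that lays the quoted durations end to end. It assumes the phases run back to back with no gaps, which is a simplification of the real profile; the helper name and the phase labels are mine.

```python
# Timeline of the tab switch, using the durations quoted in this post (ms).
# Assumption: phases run back to back with no gaps between them.
TOTAL_MS = 130

phases = [
    ("handle click event", 15),
    ("other events (RefreshDriver tick, toolkit paint)", 6),
    ("waiting", 12),
    ("[NSView viewWillDraw]", 3),
    ("PresShell::Paint()", 85),
]


def start_times(phase_list):
    """Return ({phase name: start offset in ms}, total elapsed ms)."""
    starts = {}
    elapsed = 0
    for name, duration_ms in phase_list:
        starts[name] = elapsed
        elapsed += duration_ms
    return starts, elapsed


starts, elapsed_ms = start_times(phases)
remaining_ms = TOTAL_MS - elapsed_ms  # left for the final events and the GC

if __name__ == "__main__":
    for name, duration_ms in phases:
        print(f"{starts[name]:4d} ms  {name} ({duration_ms} ms)")
    print(f"{elapsed_ms:4d} ms  remaining events and GC (~{remaining_ms} ms)")
```

Under that assumption, painting starts 33 ms after the click and the listed phases cover 121 ms, leaving roughly 9 ms of the 130 ms for the trailing events and garbage collection.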

Friday, June 22, 2012

Resizable windows in Ubuntu

By default, Ubuntu ships with window resizers that are very small and difficult to hit exactly with the mouse. This is made worse by the fact that the resize cursor jumps to a different location. You can fix this by switching to the High Contrast theme, which adds a visible resizer to some windows. It does make the rest of the UI look terrible, but that's a price I'm willing to pay to be able to resize my terminals.

Tuesday, April 24, 2012

Azure canvas on OS X

Firefox 12 is the first release in which we use the new CoreGraphics backend for canvas. This brings a host of performance improvements that largely come from removing overhead and semantic mismatches between HTML canvas and CoreGraphics.

Here are some examples:
GUIMark2 Vector goes from 6.29 fps to 6.63 fps
GUIMark2 Bitmap goes from 17.62 fps to 22.9 fps
Fish IE goes from a high quality but embarrassing 7fps with 10 fish to 48fps with 250 fish.
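
For a sense of scale, the relative changes can be worked out from the figures above. The "fish drawn per second" number is my own crude way of comparing the two Fish IE runs, since both the fish count and the frame rate changed; it is not a metric the benchmark reports.

```python
# Relative improvements computed from the frame rates listed above.
def speedup(before_fps, after_fps):
    """Ratio of the new frame rate to the old one."""
    return after_fps / before_fps


vector_speedup = speedup(6.29, 6.63)
bitmap_speedup = speedup(17.62, 22.9)

# Crude throughput measure (an assumption of this sketch): fish * fps.
fish_per_sec_before = 10 * 7
fish_per_sec_after = 250 * 48
fish_speedup = fish_per_sec_after / fish_per_sec_before

if __name__ == "__main__":
    print(f"GUIMark2 Vector: {vector_speedup - 1:.1%} faster")
    print(f"GUIMark2 Bitmap: {bitmap_speedup - 1:.1%} faster")
    print(f"Fish IE: about {fish_speedup:.0f}x more fish drawn per second")
```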