Wednesday, October 19, 2011

Moving patches between git and hg

Moving patches between git and hg is currently not very easy. I found a script that converts in one direction and added a script that goes in the other direction. The scripts are available here. Hopefully, this will make it a bit easier.

Thursday, June 16, 2011

WebGL considered harmful?

Today Microsoft posted an article titled "WebGL considered harmful". It seems like a lot of their arguments against WebGL also apply to Silverlight 5's XNA 3D graphics support. Like WebGL, it allows authors to write their own shaders (in HLSL rather than WebGL's GLSL). I wonder, if you reframe their article by replacing WebGL with Silverlight 5, is anything untrue? If so, how does Microsoft solve these problems?

Silverlight XNA 3D considered harmful

Microsoft's Silverlight 5 XNA 3D technology is a low-level 3D graphics API for the web.

One of the functions of MSRC Engineering is to analyze various technologies in order to understand how they can potentially affect Microsoft products and customers. As part of this charter, we recently took a look at XNA 3D. Our analysis has led us to conclude that Microsoft products supporting XNA 3D would have difficulty passing Microsoft’s Security Development Lifecycle requirements. Some key concerns include:
  • Browser support for Silverlight 5 directly exposes hardware functionality to the web in a way that we consider to be overly permissive
    The security of Silverlight 5 as a whole depends on lower levels of the system, including OEM drivers, upholding security guarantees they never really needed to worry about before. Attacks that may have previously resulted only in local elevation of privilege may now result in remote compromise. While it may be possible to mitigate these risks to some extent, the large attack surface exposed by Silverlight 5 remains a concern. We expect to see bugs that exist only on certain platforms or with certain video cards, potentially facilitating targeted attacks.

  • Browser support for Silverlight 5 security servicing responsibility relies too heavily on third parties to secure the web experience
    As Silverlight 5 vulnerabilities are uncovered, they will not always manifest in the Silverlight 5 API itself. The problems may exist in the various OEM and system components delivered by IHVs. While it has been suggested that Silverlight 5 implementations may block the use of affected hardware configurations, this strategy does not seem to have been successfully put into use to address existing vulnerabilities.
    It is our belief that as configurations are blocked, increasing levels of customer disruption may occur. Without an efficient security servicing model for video card drivers (e.g. Windows Update), users may either choose to override the protection in order to use Silverlight 5 on their hardware, or remain insecure if a vulnerable configuration is not properly disabled. Users are not accustomed to ensuring they are up-to-date on the latest graphics card drivers, as would be required for them to have a secure web experience. In some cases where OEM graphics products are included with PCs, retail drivers are blocked from installing. OEMs often only update their drivers once per year, a reality that is just not compatible with the needs of a security update process.

  • Problematic system DoS scenarios
    Modern operating systems and graphics infrastructure were never designed to fully defend against attacker-supplied shaders and geometry. Although mitigations such as Direct3D 10 may help, they have not proven themselves capable of comprehensively addressing the DoS threat. While traditionally client-side DoS is not a high severity threat, if this problem is not addressed holistically it will be possible for any web site to freeze or reboot systems at will. This is an issue for some important usage scenarios such as in critical infrastructure.

We believe that Silverlight 5 will likely become an ongoing source of hard-to-fix vulnerabilities. In its current form, XNA 3D in Silverlight 5 is not a technology Microsoft can endorse from a security perspective.

We recognize the need to provide solutions in this space; however, it is our goal that all such solutions are secure by design, secure by default, and secure in deployment.

The problems Microsoft is worried about are real, and they don't have any easy solutions. At the same time, I don't think we need to wait for perfect answers before trying. With Silverlight 5's 3D support, it looks like Microsoft feels the same way.

Wednesday, April 20, 2011


Overall, the reception to WebP that I've seen so far has been pretty negative. Jason Garrett-Glaser wrote a popular review, and there have been similar responses from others like Charles Bloom. Since these reviews, the WebP encoder has improved on the example used by Jason (old vs. new), but it's still not a lot better than a decent JPEG encoding. I also have a couple of thoughts on the format that I'd like to share.

Google claims it's better than JPEG, but this study has some problems and, as a result, isn't very convincing (Update: Google has a new study that's better). First, they recompress existing JPEGs. This is unconventional. Perhaps recompressing JPEGs is their target market, but I find that a little weird and it should at least be explained in the study. Second, they use PSNR as a comparison metric. This is even more confusing. PSNR has, for a while now, been accepted as a poor measure of visual quality and I can't understand why Google continues to use it. I think it would help the format's credibility if Google did a study that used uncompressed source images, used SSIM as a metric, and provided enough information about the methodology so that others could reproduce their results.

WebP also comes across as half-baked. Currently, it only supports a subset of the features that JPEG has. It lacks support for any color representation other than 4:2:0 YCbCr. JPEG supports 4:4:4 as well as other color representations like CMYK. WebP also seems to lack support for EXIF data and ICC color profiles, both of which have become quite important for photography. Further, it has yet to include any features missing from JPEG, like alpha channel support. These features can still be added, but the longer they remain unspecified, the more difficult it will be to adopt.

JPEG XR provides a good example of what features you'd want from a replacement for JPEG. It has support for an alpha channel and HDR among others. Microsoft has also put in the effort to have it formally standardized. However, it too is not without problems. The compression improvements it claims haven't matched evaluations other parties have done. I don't know enough about JPEG XR to say whether this is because the encoders are bad or because the format is not really that great.

Every image format that becomes “part of the Web platform” exacts a cost for all time: all clients have to support that format forever, and there's also a cost for authors having to choose which format is best for them. This cost is no less for WebP than any other format because progressive decoding requires using a separate library instead of reusing the existing WebM decoder. This gives additional security risk but also eliminates much of the benefit of having bitstream compatibility with WebM. It makes me wonder, why not just change the bitstream so that it's more suitable for a still image codec? Given every format has a cost, if we're going to have a new image format for the Web we really need to make it the best we can make it with today's (royalty-free) technology.

Where does that leave us? WebP gives a subset of JPEG's functionality with more modern compression techniques and no additional IP risk to those already shipping WebM. I'm really not sure it's worth adding a new image format for that. Even if WebP were a clear winner in compression, large image hosts don't seem to care that much about image size. Flickr compresses their images at a libjpeg quality of 96 and Facebook at 85: both quite a bit higher than the recommended 75 for “very good quality”. Neither of them optimizes the Huffman tables, which gives a lossless 4–7% improvement in size. Further, switching to progressive JPEG gives an even larger improvement of 8–20%.

History has shown that adoption of image formats on the internet is slow. JPEG 2000 has mostly failed on the internet. PNG took a very long time, despite having large advantages. I expect that adoption may even be slower now than it was in the past, because there is no driving force. I would also be surprised if Microsoft adopted WebP because of their stance on WebM and their involvement in JPEG XR. Can WebP succeed without being adopted by all of the major web browsers? It's hard to say, but it wouldn't be easy. Personally, I'd rather the effort being spent on WebP be spent on an improved JPEG encoder or even an improved JPEG XR encoder.

Is JPEG still great? No. Is there a great replacement for it? It doesn't feel like we're there yet.

Wednesday, March 2, 2011

Drawing Sprites: Minimizing draw calls

One reason OpenGL is so fast is that it allows applications to provide large chunks of work to be done in parallel. When drawing sprites with WebGL, it's important to make an effort to take advantage of this by minimizing the number of draw calls. This is true with OpenGL, but even more so with WebGL because each draw call requires extra validation.

Unfortunately, minimizing draw calls isn't always easy. It's often impractical or impossible to draw all your geometry at once, because everything drawn in a single call must share the same texture(s). FishIE used a single sprite from the beginning, which made it easy to draw everything at once. If possible, pack as many sprites as you can into the same texture, and sort or group sprites that use the same texture into a single draw call. It may also be possible to use multi-texturing, but depending on the GPU architecture, this can cause all textures to be read for each sprite, which will have a dramatic impact on performance because of limited texture bandwidth.
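As a rough illustration of the grouping step, here is a minimal sketch that buckets sprites by the texture they use, so each bucket can be submitted as one draw call. The sprite record shape (`texture`, `x`, `y`) and the name `groupByTexture` are assumptions made up for this example; they are not taken from the FishIE source.

```javascript
// Hypothetical sprite records: { texture: <key>, x, y }.
// Returns a Map from texture key to the sprites that use it.
function groupByTexture(sprites) {
  const groups = new Map();
  for (const sprite of sprites) {
    let group = groups.get(sprite.texture);
    if (!group) {
      group = [];
      groups.set(sprite.texture, group);
    }
    group.push(sprite);
  }
  return groups;
}

const exampleSprites = [
  { texture: "fish", x: 0, y: 0 },
  { texture: "bubble", x: 5, y: 5 },
  { texture: "fish", x: 9, y: 3 },
];

// Three sprites, but only two distinct textures, so two draw calls suffice.
const exampleGroups = groupByTexture(exampleSprites);
console.log(exampleGroups.size); // 2
```

Packing the fish and the bubble into one texture atlas would bring this down to a single group.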

The performance difference between drawing sprites individually versus all at once can be pretty big. I made another version of the FishIE demo that draws each sprite individually. This version draws 2000 fish at 10fps on my test system, while the original WebGL FishIE can do 4000 fish at 60fps on the same system. Since the same texture is used for all sprites I did not have to rebind the texture for each sprite; doing so would likely decrease performance further.
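The "all at once" approach can be sketched roughly as follows: write every sprite's quad into one vertex array, upload it, and issue a single `drawArrays` call. This is not the code from the demo. The vertex layout (x, y, u, v), the sprite fields (`x`, `y`, `w`, `h`) and the function names are assumptions for illustration, and the sketch assumes the shader program, texture and vertex attribute pointers are already set up.

```javascript
const FLOATS_PER_VERTEX = 4;   // x, y, u, v (assumed layout)
const VERTICES_PER_SPRITE = 6; // two triangles per quad

// Pack every sprite's quad into one interleaved Float32Array.
function buildSpriteBatch(sprites) {
  const data = new Float32Array(
    sprites.length * VERTICES_PER_SPRITE * FLOATS_PER_VERTEX);
  let offset = 0;
  for (const s of sprites) {
    const x0 = s.x, y0 = s.y, x1 = s.x + s.w, y1 = s.y + s.h;
    const corners = [
      [x0, y0, 0, 0], [x1, y0, 1, 0], [x0, y1, 0, 1], // first triangle
      [x0, y1, 0, 1], [x1, y0, 1, 0], [x1, y1, 1, 1], // second triangle
    ];
    for (const corner of corners) {
      data.set(corner, offset);
      offset += FLOATS_PER_VERTEX;
    }
  }
  return data;
}

// One buffer upload and one draw call, however many sprites there are.
function drawBatch(gl, buffer, sprites) {
  const data = buildSpriteBatch(sprites);
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, data, gl.DYNAMIC_DRAW);
  gl.drawArrays(gl.TRIANGLES, 0, sprites.length * VERTICES_PER_SPRITE);
}
```

The per-sprite version would instead call `drawArrays` once per fish, paying the WebGL validation cost each time.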

Designing an application around these limitations can be tricky, but often the application is in a better position to make compromises or take shortcuts than a more general Canvas 2D implementation would be.

Monday, February 28, 2011

Drawing Sprites: Canvas 2D vs. WebGL

Lately I've seen a lot of graphics benchmarks that basically just test image blitting/sprite performance. These include Flying Images, FishIE, Speed Reading and JSGameBench. (Update: I just saw the blog post for the WebGL JSGameBench. This further confirms my claim that WebGL is a better way to do sprites.) They all try to draw a bunch of images in a short amount of time. They mostly use two techniques: positioned images or canvas' drawImage. Neither of these methods is particularly well suited to this task. Positioned images have typically been used for document layout, and the Canvas 2D API was designed as a JavaScript binding to CoreGraphics, which owes most of its design to PostScript. Neither was designed for high performance interactive graphics. However, OpenGL and its web counterpart WebGL were designed for exactly this.

To show off some of the potential performance difference available, I ported the FishIE benchmark to WebGL. Along the way I discovered some different problems and ways to solve them.

The problem, once the overhead of Canvas 2D is removed, is that FishIE very quickly becomes texture read bound. I noticed that the FishIE sprites have a lot of horizontal padding. This padding was included in the drawImage calls which causes us to do a bunch of texture reads for transparent pixels. Trimming this down a little gave a noticeable framerate boost.

An even bigger cause of texture bandwidth waste is that the demo uses a large sprite to draw a small fish. Fortunately, OpenGL has a great solution to this problem: mipmaps. Mipmaps let the GPU use smaller textures when drawing smaller fish, which can dramatically reduce the texture bandwidth required. They also improve the quality of small fish by eliminating the aliasing that occurs when downscaling by large amounts.
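Turning on mipmaps in WebGL takes only a couple of calls. The sketch below is a minimal example rather than the demo's actual code: `enableMipmaps` and `approximateMipLevel` are names made up here, and it assumes the image has already been uploaded with `texImage2D` and, for WebGL 1, has power-of-two dimensions. The second function is only a rough model of which level ends up being sampled; the GPU picks levels itself from screen-space derivatives.

```javascript
// Generate the mipmap chain and tell the sampler to use it when minifying.
function enableMipmaps(gl, texture) {
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.generateMipmap(gl.TEXTURE_2D);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER,
                   gl.LINEAR_MIPMAP_LINEAR);
}

// Rough model: each mip level halves the width, so count how many
// halvings fit before the texture gets smaller than the drawn sprite.
function approximateMipLevel(textureSize, drawnSize) {
  let level = 0;
  while (textureSize / 2 >= drawnSize) {
    textureSize /= 2;
    level++;
  }
  return level;
}

// A 512-pixel-wide sprite drawn 64 pixels wide lands around level 3,
// which holds 4^3 = 64 times fewer texels than the full-size image.
console.log(approximateMipLevel(512, 64)); // 3
```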

Mipmapping is a good example of the flexibility that WebGL allows. Canvas 2D aims to be an easy to use API for drawing pictures, but this ease of use comes at some cost. First, the Canvas 2D implementation has to guess the intent of the author. For example, drawImage on OS X does a high quality Lanczos downscaling of the image, while Direct2D just does a quick bilinear downscale. This makes it difficult for authors to know how fast drawImage will be. Further, because the design of Canvas 2D is inspired by an API for describing print jobs, it's not well suited to reusing data between paints.

Try out the difference with these two modified versions of FishIE:
  1. The original FishIE modified only to allow more fish.
  2. FishIE ported to WebGL.
The method I used to port FishIE to WebGL is pretty straightforward, so I expect that any of the other benchmarks listed above could also be easily ported to WebGL.

Pushing the limits

Once the number of fish becomes high enough we run into JavaScript performance problems. FishIE has some JavaScript problems that make things worse than they need to be. First, it loops over the fish with "for (var fishie in fish) {". This can end up using 10% of the total CPU time. The problem with this code is that it converts all of the array indices to strings and then uses those strings to index into the array. It also has the problem that any additional properties added to the array will also show up as index values, which is likely not the intent of the author.
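Both problems with the for-in loop are easy to see in a small standalone example. The array contents and the `tank` property are made up for illustration; only the loop shape comes from FishIE.

```javascript
const fishArray = ["nemo", "dory", "bruce"];
fishArray.tank = "an extra property on the array";

// for-in visits property names: the indices arrive as strings,
// and the extra property shows up alongside them.
const forInKeys = [];
for (const key in fishArray) {
  forInKeys.push(key);
}
console.log(forInKeys);           // [ '0', '1', '2', 'tank' ]
console.log(typeof forInKeys[0]); // string

// A plain counting loop uses numeric indices and only visits elements.
const indexedKeys = [];
for (let i = 0; i < fishArray.length; i++) {
  indexedKeys.push(i);
}
console.log(indexedKeys);         // [ 0, 1, 2 ]
```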

Second, each fish object includes a swim() method. Unfortunately, in the FishIE source swim() is a closure inside the Fish() object. This means that the swim() method is a different function object for each Fish, which makes things worse for JavaScript engines.
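The difference between the two styles can be shown with a stripped-down example. `ClosureFish` and `PrototypeFish` are hypothetical stand-ins, not the real FishIE Fish() constructor, and the body of swim() is a placeholder.

```javascript
// Closure style: every instance gets its own freshly created swim function.
function ClosureFish() {
  this.x = 0;
  this.swim = function () {
    this.x += 1;
  };
}

// Prototype style: all instances share one swim function.
function PrototypeFish() {
  this.x = 0;
}
PrototypeFish.prototype.swim = function () {
  this.x += 1;
};

const closureA = new ClosureFish();
const closureB = new ClosureFish();
console.log(closureA.swim === closureB.swim); // false

const protoA = new PrototypeFish();
const protoB = new PrototypeFish();
console.log(protoA.swim === protoB.swim);     // true
```

Both versions behave the same when called; the prototype version just avoids creating one function object per fish.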

Fixing both of these problems and making the fish really small lets us get an idea of how many sprites we can actually push around. Here's a final version. If I disable the method JIT (bug 637878) and run at an even window size (bug 637894), I can do 60000 fish at 30fps, which I think is pretty impressive compared to the 1000 that the original Microsoft demo does.

Friday, February 18, 2011

Updated mozilla-cvs-history git repo

I recently ran git gc --aggressive on the CVS history git repository mentioned here. It's now 543M, down from 986M. I've also uploaded a copy to GitHub.

Thursday, February 10, 2011

Clone timings

Chris Atlee was wondering how clone times differ between git and mercurial, so I ran a quick test on a fast Linux machine.

$ time git clone git://
real 1m33.478s

$ time git clone mozilla-central moz2
real 0m2.559s

$ time hg clone
real 3m22.510s

$ time hg clone mozilla-central moz2
real 0m20.660s

Wednesday, January 12, 2011

historical mozilla-central git repository

A number of people use git to work with the mozilla hg tree. In the past I've wanted the entire history as a git repo so I converted the old CVS repository to git and put it up on

You can set it up as follows:

git clone
git clone git://

cd mozilla-central/.git/objects/pack
# set up symbolic links to cvs-history pack files
ln -s ../../../../mozilla-cvs-history/.git/objects/pack/pack-5b5d604ab48cf7bc2a6b4495292fa8700a987c5f.pack .
ln -s ../../../../mozilla-cvs-history/.git/objects/pack/pack-5b5d604ab48cf7bc2a6b4495292fa8700a987c5f.idx .
cd ../../

# add a graft that joins the two histories; each line of info/grafts
# has the form "<commit> <parent>"
echo 2514a423aca5d1273a842918589e44038d046a51 3229d5d8b7f8376cfb7936e7be810635a14a486b > info/grafts

Now you have a git repository containing all of the history. You can update the mozilla-central repository as you normally would. The conversion isn't perfect, but it's been good enough for blame to work back into the CVS era.

Tuesday, January 11, 2011

Firefox acceleration prefs changing

I just landed a changeset that changes the names of the layer acceleration prefs in Firefox.

The old prefs were:

The new prefs are:

layers.accelerate-all previously defaulted to 'true' on Windows and OS X, which meant that there was no easy way to force layer acceleration on if your card had been blacklisted for some reason. The new prefs allow the blacklist to be overridden. The old prefs are not being migrated over to the new names. If you have a problem with the defaults, please file bugs.

Saturday, January 8, 2011

Trying out AVX

Intel's new Sandy Bridge CPUs came out this week and they support a new set of instructions called AVX. The AVX instructions are a much bigger change than the usual SSE revisions in the past few micro-architectures. First of all, they double the 128 bit SSE registers to 256 bits. Second, they introduce an entirely new instruction encoding. The new encoding switches from 2 operand instructions to 3 operand instructions allowing the destination register to be different than the source registers. For example:
  addps r0, r1       # (r0 = r0 + r1)
  vaddps r0, r1, r2  # (r0 = r1 + r2)
This new encoding is not only used for the new 256 bit instructions, but also for the 128 bit AVX versions of all the old SSE instructions. This means that existing SSE code can be improved without requiring a switch to 256 bit registers. Finally, AVX introduces some new data movement instructions, which should help improve code efficiency.

I decided to see what kind of performance difference using AVX could make in qcms with minimal effort. If you use SSE compiler intrinsics, like qcms does, switching to AVX is very easy; simply recompile with -mavx. In addition to using -mavx, I also took advantage of some of the new data movement instructions by replacing the following:
  vec_r = _mm_load_ss(r);
  vec_r = _mm_shuffle_ps(vec_r, vec_r, 0);

with the new vbroadcastss instruction:
  vec_r = _mm_broadcast_ss(r);
Overall, this change reduces the inner loop by 3 instructions.

The performance results were positive, but not what I expected. Here's what the timings were:
SSE2:                 75798 usecs
AVX (-mavx):          69687 usecs
AVX w/ vbroadcastss:  72917 usecs
Switching to the AVX encoding improves performance by more than I expected: about 8%. But adding the new vbroadcastss instruction, in addition to the AVX encoding, not only doesn't help, but actually makes things worse. I tried analyzing the code with the Intel Architecture Code Analyzer, but the analyzer also thought that using vbroadcastss should be faster. If anyone has any ideas why vbroadcastss would be slower, I'd love to hear them.

Despite this weird performance problem, AVX seems like a good step forward and should provide good opportunities for improving performance beyond what's possible with SSE. For more information, check out this presentation, which gives a good overview of how to take advantage of AVX.