Wednesday, October 7, 2009

Geometry Homework

Drawing arcs is a common task in computer graphics. Arcs are typically drawn using an approximation by set of cubic Bezier splines. These splines are passed through the regular rendering pipeline where they are usually subdivided into a set of line segments and then rasterized. A circle, for example, can be represented by four cubic splines: one for each 90 degree arc.

An arc of up to 90 degrees can be approximated by a single cubic Bezier spline reasonably well. To find an appropriate approximating spline we set the start and end points of the spline to match the arc and then can find the middle two points using the following formula, where A is the start angle and B is the end angle:

h = 4/3 * tan ((B - A)/4)

pt1 = {center.x + r*cos(A), center.y + r*sin(A)}
pt2 = {center.x + r*cos(A) - h*r*sin(A), center.y + r*sin(A) + h*r*cos(A)}
pt3 = {center.x + r*cos(A) + h*r*sin(B), center.y + r*sin(A) - h*r*cos(B)}
pt4 = {center.x + r*cos(B), center.y + r*sin(B)}

This approach works quite well and is currently used by cairo. However, the problem with this approach is that it requires converting to polar coordinates and then back to Cartesian ones. This conversion is slow because it requires the use of trigonometric functions.

Fortunately, this paper gives a solution that avoids the conversion to polar coordinates:

ax = pt1.x - center.x
ay = pt1.y - center.y
bx = pt4.x - center.x
by = pt4.y - center.y

q1 = ax*ax + ay*ay
q2 = q1 + ax*bx + ay*by

k2 = 4/3 (sqrt(2*q1*q2) - q2) / (ax*by - ay*bx)

pt2.x = pt1.x - k2*ay
pt2.y = pt1.y + k2*ax
pt3.x = pt4.x + k2*bx
pt3.y = pt4.y - k2*by

Unfortunately, there is no explanation, derivation or proof for this formula. Even more problematic, the formula for k2 becomes less stable as pt1 approaches pt4, becoming NaN when pt1 equals pt4.

Therefore, the homework problem has two parts:

1. Give an explanation or derivation for the formula for k2 provided above.
2. Provide a similar formulation for pt2 and pt3 that doesn't degenerate as pt1 and pt4 become close or prove that one doesn't exist.

The best answer will be credited in the new cairo stroking code.

Friday, October 2, 2009

qcms — now faster

Thanks to some optimization work by Steve Snyder, qcms is even faster than before.

What follows is a chart with the new performance numbers:The benchmark is the same as last time but run on a slightly slower computer using OS X v10.6 instead of 10.5. As the chart shows, the new qcms code is more than twice as fast as the previous code. In addition to the performance improvement, the new code includes a version that only uses SSE1 instructions. This will be especially helpful for those with older computers where the time spent doing color correction isn't as negligible as it is on faster computers.

When running this benchmark again, I noticed that the performance of lcms had drastically improved since the last time I had run the benchmark. Why was lcms so much faster on 10.6? What had changed? The default architecture target: in OS X 10.6, the compiler builds 64 bit binaries by default. Still, it was surprising that compiling for 64 bit could nearly double the performance.

The large difference, it turns out, can largely be attributed to the MAT3evalW1 function. This function multiplies a 1×3 matrix with a 3×3 one using 9 32×32→64 multiplications. GCC can usually optimize these multiplications by using the 32×32→64 multiply instructions, however that wasn't happening in 32 bit mode. Instead of the expected 9 multiplies, we get 18 multiplies and a bunch of housekeeping work, likely caused by the 64 bit additions and additional register pressure. In 64 bit mode, however, we get the code that you'd expect. This only takes 38 instructions versus the 169 instructions the 32 bit build uses. With a difference like that in the inner loop, it's easy to see why the 64 bit build is so much faster.

1. MAT3evalW has a handwritten assembly version that should be faster than the one that GCC generates, unfortunately it is MSVC only.

Tuesday, June 2, 2009

qcms - color management for the web

qcms is a brand-new color management system replacing Mozilla's previous solution, lcms. The browser's color management system allows us to use data in photos on the web and data about your monitor to make sure that colors look the same everywhere. This helps photographers and site designers control the appearance of their images more precisely, and makes the web a prettier place.

Normally, rewriting a large chunk of code from scratch is a bad idea. However, we were in a situation that made rewriting pretty attractive. First, we were only using a small portion of lcms's functionality, but we were paying for the large code base in both maintainability and threat surface. Second, we were already maintaining substantial modifications which was an additional maintenance burden. Finally, our requirements for a color management library are different then the goals of lcms: we need something fast and secure, while lcms seems more focused on functionality, completeness and correctness.

What's new?

qcms is made up of two main parts: the ICC profile parser and the transformation engine. The transformation engine reuses a lot of code from lcms. The ICC profile parser is completely new and written with security and robustness in mind.

The new parser has some key design differences compared to lcms. One of these is the I/O model. lcms has an I/O abstraction layer that abstracts reading from memory and reading from a file. It uses this layer to read what it needs from the profile as it's needed. This means that the full file or memory copy of the profile needs to be kept around for the lifetime of the profile object.1

qcms uses a simpler model. The entire profile is read into memory and then parsed to construct a profile object. We only keep the information that's needed to construct a transformation between color spaces. After parsing we can close the file or free the memory used to store the profile. Furthermore, since we only parse in-memory, we also don't have to deal with any possible read errors or other file I/O problems during parsing.

Error handling during parsing can be tricky and ridden with security holes. To help deal with this problem, qcms adopts an error handling strategy similar to cairo. Instead of trying to deal with the error immediately and returning an error result up the call stack, we often set a flag to note the brokenness, putting us into an error state, and continue on. To continue successfully, all of the following operations must be completable even while in an error state. However, ensuring this is usually easy, especially if the results will be discarded. When we do get to a place where it's convenient to return, we do so. The big advantage to this approach is that it keeps the error state control flow as similar to the normal control flow as possible. This makes the code easier to read, easier to reason about, and easier to test.

Speed

Last summer, Bobby Holley did a bunch of work to make lcms faster. I was able to reuse that work in qcms. The result is that qcms is one of the fastest color management systems around. Here's a simple test that transforms all the possible RGB components from one RGB colorspace to another. It compares lcms, qcms and ColorSync, the system color management system on OS X.

The tests are here and here.

Current Limitations

qcms currently only supports transformations to and from RGB colorspaces. This covers the vast majority of uses on web; however, it means there's no support for CMYK and many additional profile types. If you are interested in making this code better, let me know!

Conclusion

In the end, we have a library about a tenth of the size and 5 times as fast than what we had previously. I've put the code up at git://git.freedesktop.org/~jrmuizel/qcms. It's portable C and licensed under the same license as lcms so it's pretty easy to change from one api to another. Hopefully other projects can find it useful. Finally, a big thanks to Marti Maria for lcms, without which qcms wouldn't been possible.