A blog article recently caught my eye over at http://www.simppa.fi on z-sorting that i simply couldn’t resist to pick up on. Simo is someone that has great claims to geek-worship – being one of the awesome team that make up evoflash and having a hand in their latest Severity Of Grey demo, while also progressing with some amazing 3D and particle demos in recent blog posts.

The latest post is about sorting in AS3, and specifically, z-sorting objects that are being drawn in a 3D scene. Having done a fair amount of research on the topic myself while putting together Away3D Lite, I was very interested in seeing an adapted Insertion Sort method that had essentially been laid down gauntlet-style as ‘The fastest way to z-sort and handle objects in AS3′. How could i resist?

A sorting algorithm in its most basic form is a function for sorting a list of objects by ordering them in the ascending or descending order of a specified property. When dealing with 3D in Flash, sorting can become one of the most processor intensive operations because it isn’t done by default. Flash happens to have its own sorting functions for Arrays and Vectors, but by and large these aren’t that fast, so coding something faster in AS3 is a realistic proposition. The question is, which algorithm do you use?

If you feel like you’ve missed the sorting-algo boat then fear not! Wikipedia will give you a good grounding in various methods for sorting lists. The Insertion Sort method mentioned above is one with a simple premise – start with your unsorted list, create a new empty list, iterate through your unsorted list copying each item to its correct sorted position in your new list using an iterative compare on items already placed in the new list.

In standard use insertion sort is a fairly unimpressive algorithm. But combined with a little coding magic, Simo has transformed it into something that appears to beat all others. The success of his approach is based on two simple assertions:

1) Linked lists are faster than vectors when inserting values

2) In most cases in a 3D engine, sorting is performed on a list that is close to already being sorted

I’m a little skeptical about the second point, because the modified Insertion Sort method is heavily influenced by how unsorted a list is to start with. I would agree that if the order of 3D objects in a view are saved on each frame then it could be considered that from one frame to the next, the order of objects doesn’t change dramatically. However, this should be considered the most trivial case when rendering. Times when this is not the case include when objects are moving in the scene, when objects are being added or deleted from the scene, or when objects are moving in and out of view. For an unsorted list, standard Insertion Sort has one of the slowest performance records of any sorting algorithm, with a worst case of *n*^{2} iterations.

With the stage now set, I offer a potential challenger to the modified Insertion Sort method! It is the method that Away3D Lite uses as its default sorting algorithm, and is know as Radix Sorting. The biggest advantage with this algorithm is its consistent performance and minimal iterative approach. In all cases, whether sorting a completely sorted or completely unsorted list, the number of iterations required is constant at* 2n*.

Radix Sorting works by taking advantage of the binary nature of numbers stored in a computer program. These numbers can be easily quantised (split into smaller numbers representing the bytes that makeup the value of the original number), and this quantisation allows numbers to be sorted incredibly efficiently. Rather that compare absolute values of numbers, objects in a list are stored in ‘buckets’ that represent a single value of the quantised state of a number. Once all numbers have been placed in their respective bucket, the result can be read out in the correctly sorted order for that quantisation by simply emptying the buckets in sequence.

In general use, Radix Sorting quantises numbers into 8-bit bytes. This means that each iteration sorts the objects into their 8-bit bucket positions for that byte (starting with the byte representing 0x00 – 0xFF, then the byte representing 0x00FF – 0xFFFF, then 0x00FFFF – 0xFFFFFF etc.). Because we are sorting 8-bit numbers, 256 buckets are required for each iteration, to provide a bucket for each possible value. With each byte, only one iteration is required to completely sort the objects into their buckets, and read them out in the correct order. So the quoted number above for * 2n *iterations will allow you to completely sort 16-bit numbers. For 3D in Flash, this appears sufficiently accurate as long as the inverse of z is used for the sorting value.

One of the disadvantages of Radix Sorting is that it only sorts using integers. Because we currently have no byte access to floating point numbers in AS3 (boo Adobe!), z values have to be type-converted to integers on every frame, which is a little annoying as it sucks some of the power out of the algorithm. But despite this, the results are still very impressive, as this example shows (using the same source structure as Simo’s original demonstration for comparison).

Here are the results i get when directly comparing the two sorting mechanisms – in each case the list in question is 100000 items long.

- Simo Insertion Sort (on sorted list) – 4 ms
- Radix Sort (on sorted list) – 16 ms
- Simo Insertion Sort (on extremely unsorted list) – ????
- Radix Sort (on extremely unsorted list) -16 ms

The default for the ZSortTest demo is to create an already sorted list, which gives an Insertion Sort result of 4ms. Simo doesn’t bother to test an unsorted list in his post, and i think i may know why…. Flash Player times out before you get a result! His suggestion here is that you use a different sort method when you know a set of objects will be extremely unsorted, but this also implies that in the cases where you are using the algorithm on almost sorted lists, you will still not hit the 4ms benchmark seen here.

Radix Sort on the other hand doesn’t give two hoots what the starting sort nature of the list is, so it is the more consistent sort mechanism, and possibly the most consistent of any.

Picking up on an earlier point on linked lists, I had a crack at improving the Radix Sort score by using linked lists rather than vectors, but bizarrely came out with a slower benchmark score of around 20 ms. This may have something to do with the difference in access speed between bucket vectors containing integers and bucket vectors containing objects. You can try it for yourself with the source code – both types are present and it is a simple matter to comment out one type and uncomment the other.

Regardless, all of these methods come out far faster than the closest native sorting method in Flash – the Vector.sortOn() method – which managed a woeful 54 ms in the same test.