Our add-on performance initiative is getting a lot of attention for, let's say, various reasons. Objections have been raised about our transparency and our testing methods, so I decided to contribute something valuable to the discussion and document my own testing process.
I revisited my old add-on performance article and noticed that the contents of the Measuring Startup wiki page have changed substantially since I originally linked to it. It now recommends installing an add-on to measure startup performance. I haven't tried it, and there are a few reasons why I suspected it wasn't the best approach. (Update: I've been informed that the add-on is only a display for data that is gathered and stored locally. You can make your test runs and then install the add-on to look at the data. That dispels my doubts about this approach.) Regardless, I'm documenting the old testing method here, because it's the one I have been using for a while, and it's also very similar to the one implemented in our automated Talos testing framework.
I have been doing lots of add-on startup testing recently, mainly to double-check that the results of the Talos tests are sound. We also correlate them with real-world usage data that we have been collecting since the early versions of Firefox 4. This data, along with manual testing and source code review, is what gives us a good confidence level in the results we display on our infamous performance page (it has been linked enough).
Here’s what I do.
- Create a new profile dedicated for testing add-on performance (I called it startuptest).
- Download this HTML page and save it somewhere convenient. The page is blank if you open it directly. All it does is run some JavaScript that extracts a timestamp from after the # character in the URL, compares it against the timestamp at which the script runs, and shows the difference on the page.
- Set up a shell command that opens Firefox with your testing profile and loads the downloaded file, with the current timestamp embedded after the # character. On my system (Mac OS), the command is the following:
/Applications/Firefox.app/Contents/MacOS/firefox-bin -P startuptest -no-remote file:///Users/jorge/startup.html#`python -c 'import time; print int(time.time() * 1000);'`
The old version of the Measuring Startup page explains how to set this up on Windows.
- Locate the testing profile folder and delete all files in it, if there are any.
- Open Firefox on this profile. You can use the console command or any other shortcut if you prefer.
- Copy the add-on listing page URL, paste it into the location bar of the new profile, and open the page.
- Install the add-on using the install button and restart if necessary.
- Optionally, set up the add-on in a realistic way. For example, if this is a Facebook add-on, it may make sense to log in to a Facebook account since otherwise most of the add-on’s functionality would be inactive.
- Quit Firefox.
- Run Firefox using the console command.
- Note the result in the startup page.
- Quit Firefox. I prefer using the quit keyboard shortcut, to interact with Firefox as little as possible.
- Repeat the previous three steps: run, note the result, quit. I discard the first 2 runs, which are normally much slower than the rest, and measure the 10 runs after that.
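The timing page's logic is simple enough to sketch. This is a minimal reconstruction, not the exact page linked above, and `startupDelta` is a name I made up so the calculation can be tried outside a browser:

```javascript
// Minimal sketch of what the timing page does (a reconstruction, not the
// exact page): the launch command embeds a start timestamp after the '#'
// in the URL, and the page subtracts it from the time the script runs.
function startupDelta(hash, now) {
  // hash looks like "#1299876543210"; drop the '#' and parse the number
  const started = parseInt(hash.slice(1), 10);
  return now - started; // elapsed milliseconds since launch
}

// In the browser, the page would display something along these lines:
// document.body.textContent = startupDelta(location.hash, Date.now()) + " ms";
```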
For your results to make any sense, you also need a test run without any add-ons installed, to use as your baseline. It's also a good idea to run all the tests consecutively, so you have some certainty that they all ran under similar conditions. I record and compare my results in a spreadsheet, like this one where I tested both of my add-ons.
Looking at the results, Fire.fm has a somewhat noticeable impact on startup. This is not surprising: it is a complex add-on, with a heavy overlay and elaborate startup processes. I covered improving startup code in my old blog post, and we're planning on greatly simplifying its overlay soon(ish). I doubt we'll reach the coveted 5%, but we'll see. Remote XUL Manager is clearly simpler, and it shows why the results should not be taken at face value. Since all it does in the overlay is add a menu item that opens a separate window, it's understandable that its impact is negligible. But does it really improve startup? No, of course not. It just means that the error margin is larger than its real performance impact.
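To make the error-margin point concrete, here's one way to summarize a batch of runs. This is a hypothetical helper, not part of any tooling mentioned above; the warm-up discard count matches the protocol described earlier:

```javascript
// Hypothetical helper: discard the warm-up runs, then report the mean and
// standard deviation of the remaining startup timings (in milliseconds).
function summarize(runsMs, discard = 2) {
  const kept = runsMs.slice(discard);
  const mean = kept.reduce((a, b) => a + b, 0) / kept.length;
  const variance = kept.reduce((a, b) => a + (b - mean) ** 2, 0) / kept.length;
  return { mean, stddev: Math.sqrt(variance) };
}
```

If the difference between an add-on's mean and the baseline's mean is smaller than the spread of the runs, the apparent gain (or loss) is just noise, which is exactly the Remote XUL Manager case.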
The key takeaway here is that the results of manual tests shouldn’t be taken literally, but they’re still a good indicator of the performance impact of an add-on. Even if the error margin is not ideal (or even measurable under these conditions), you can still get a good idea of who’s fast and who’s slow. They have been very valuable to us when comparing them against Talos results.
How does this compare to Talos?
On one hand, these tests are influenced by how the testing system is set up. I keep several applications open at all times, and I don't close them all for testing; I just take care not to run anything heavy at the same time, like Time Machine or MobileMe Sync. There's also the fact that I have to spend time setting things up and running the tests: the longer the tests take, the more likely it is that some other process affects the results.
On the other hand, it's easier for me to spot errors during testing. Many of the complaints we've received about the testing system are about silly mistakes, like trying to install an add-on from an incorrect URL, or trying to install an add-on that is not compatible with the Firefox version being tested. These are things one can clearly see when testing manually, but they weren't obvious when running the tests automatically. The affected add-ons have been getting very good performance rankings because they're not really being loaded, so those results are not reliable.
Luckily, the people complaining about our testing are also filing bugs and talking to us directly, so we’re looking into the issues and trying to get them resolved as soon as possible. Special thanks to Wladimir and Nils, who have been very helpful filing and categorizing bugs. More details coming up in the Add-ons Blog.
As always, the developer community proves itself an invaluable asset for Mozilla (well, you are Mozilla). Even if our discussions can become harsh and are generally very public, the outcome is almost always a set of improvements on both our technical and communication fronts. Getting things right takes a lot of work and a lot of patience, and I hope we can quickly get to a place where we're all satisfied.