diff -r 000000000000 -r 4f2f89ce4247 SunSpider/TODO
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/SunSpider/TODO	Fri Sep 17 09:02:29 2010 +0300
@@ -0,0 +1,70 @@

* Add more test cases. Categories we'd like to cover (with reasonably
  real-world tests, preferably not microbenchmarks) include:

  (X marks the ones that are fairly well covered now.)

  X math (general)
  X bitops
  X 3-d (the math bits)
  - crypto / encoding
  X string processing
  - regexps
  - date processing
  - array processing
  - control flow
  - function calls / recursion
  - object access (unclear if it is possible to make a realistic
    benchmark that isolates this)

  I'd specifically like to add all the computer language shootout
  tests that Mozilla is using.

* Normalize tests. Most of the available test cases have a repeat
  count of some sort, so the time they take can be tuned. The tests
  should be tuned so that each category contributes about the same
  total time, and so that each test within a category contributes
  about the same amount. The question is which implementation should
  be the baseline. My current thought is to either pick a specific
  browser on a specific platform (IE 7 or Firefox 2, perhaps), or to
  target the average that some set of same-generation release
  browsers get on each test. The latter is more work. IE 7 is
  probably a reasonable normalization target since it is the latest
  version of the most popular browser, so results on this benchmark
  will tell you how much you have to gain or lose by using a
  different browser. (A sketch of this kind of repeat-count tuning
  appears at the end of this file.)

* Instead of using the standard error, the correct way to calculate a
  95% confidence interval for a small sample is the t-test. Basically
  this involves using values from a 2-tailed t-distribution table
  instead of 1.96 to multiply by the standard error (a sketch of this
  calculation appears at the end of this file); a table of critical
  values is available at

* Add support to compare two different engines (or two builds of the
  same engine) run interleaved.

* Add support to compare two existing sets of saved results.

* Allow the repeat count to be controlled from the browser-hosted
  version and the WebKitTools wrapper script.

* Add support to run only a subset of the tests (both command-line
  and web versions).

* Add a profile mode for the command-line version that runs the tests
  repeatedly in the same command-line interpreter instance, for ease
  of profiling.

* Make the browser-hosted version prettier, both in general design
  and maybe by using bar graphs for the output.

* Make it possible to track changes over time and generate a graph
  per result, showing the result and error bar for each version.

* Hook up to automated testing / buildbot infrastructure.

* Possibly... add the ability to download iBench from its original
  server, pull out the JS test content, preprocess it, and add it as
  a category to the benchmark.

* Profit.
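
Sketch: repeat-count tuning (referenced from the "Normalize tests"
item above). This is a minimal illustration, not the harness's actual
mechanism: the test names are real SunSpider tests, but the timings
and the per-test budget are made-up placeholders standing in for
measurements on whatever baseline browser gets chosen.

    // Per-test times, in milliseconds for a single repeat, as
    // hypothetically measured on the baseline browser.
    var baselineMsPerRepeat = {
        "bitops-bits-in-byte": 2.0,
        "string-base64": 8.0,
        "3d-cube": 25.0
    };

    // Assumed per-test time budget on the baseline, chosen so that
    // every test (and hence every category) contributes about the
    // same total.
    var targetMsPerTest = 250;

    function repeatCountFor(testName) {
        return Math.max(1,
            Math.round(targetMsPerTest / baselineMsPerRepeat[testName]));
    }

    // e.g. repeatCountFor("bitops-bits-in-byte") == 125 and
    // repeatCountFor("3d-cube") == 10, so both tests take roughly
    // 250 ms on the baseline.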
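
Sketch: t-based confidence interval (referenced from the t-test item
above). A minimal sketch of the calculation, not the harness's actual
reporting code: the critical values below are the standard two-tailed
95% points of Student's t-distribution for 1 to 10 degrees of
freedom; larger samples fall back to the normal-curve value 1.96.

    // Two-tailed 95% critical values of Student's t, indexed by
    // degrees of freedom minus one (df = 1 .. 10).
    var tTable95 = [12.706, 4.303, 3.182, 2.776, 2.571,
                    2.447, 2.365, 2.306, 2.262, 2.228];

    // Returns the sample mean and the half-width of its 95%
    // confidence interval; requires at least two samples.
    function confidenceInterval95(samples) {
        var n = samples.length;
        var sum = 0;
        for (var i = 0; i < n; i++)
            sum += samples[i];
        var mean = sum / n;

        var sumOfSquaredDeviations = 0;
        for (var j = 0; j < n; j++) {
            var deviation = samples[j] - mean;
            sumOfSquaredDeviations += deviation * deviation;
        }
        // Sample standard deviation (n - 1 in the denominator),
        // then the standard error of the mean.
        var standardError =
            Math.sqrt(sumOfSquaredDeviations / (n - 1)) / Math.sqrt(n);

        var df = n - 1;
        var t = df <= tTable95.length ? tTable95[df - 1] : 1.96;
        return { mean: mean, plusMinus: t * standardError };
    }

    // Example: five timing runs, in milliseconds.
    var ci = confidenceInterval95([102, 98, 105, 99, 101]);
    // ci.mean is 101; ci.plusMinus is about 3.4, so the interval
    // is roughly 101 +/- 3.4 ms.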