SunSpider/TODO
changeset 0 4f2f89ce4247
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/SunSpider/TODO	Fri Sep 17 09:02:29 2010 +0300
@@ -0,0 +1,70 @@
+
+* Add more test cases. Categories we'd like to cover (with reasonably
+  real-world tests, preferably not microbenchmarks) include:
+
+  (X marks the ones that are fairly well covered now).
+
+    X math (general)
+    X bitops
+    X 3-d (the math bits)
+    - crypto / encoding
+    X string processing
+    - regexps
+    - date processing
+    - array processing
+    - control flow
+    - function calls / recursion
+    - object access (it is unclear whether a realistic benchmark can
+      isolate this)
+
+  I'd specifically like to add all the Computer Language Shootout
+  tests that Mozilla is using.
+
+* Normalize tests. Most of the test cases available have a repeat
+  count of some sort, so the time they take can be tuned. The tests
+  should be tuned so that each category contributes about the same
+  total, and so each test in each category contributes about the same
+  amount. The question is, what implementation should be the baseline?
+  My current thought is to either pick some specific browser on a
+  specific platform (IE 7 or Firefox 2 perhaps), or try to target the
+  average that some set of same-generation release browsers get on
+  each test. The latter is more work. IE 7 is probably a reasonable
+  normalization target since it is the latest version of the most
+  popular browser, so results on this benchmark will tell you how much
+  you stand to gain or lose by switching to a different browser.
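+
+  A minimal sketch of the tuning arithmetic this implies, assuming
+  per-iteration baseline timings from the chosen reference browser are
+  already available (all names and numbers below are hypothetical, not
+  part of the harness):
+
+    // Per-iteration times (ms) measured on the baseline browser.
+    var baselineMsPerIteration = {
+        "string-example": 2.5,
+        "bitops-example": 0.8
+    };
+
+    // If the whole suite should take about totalTargetMs on the
+    // baseline, each of numTests tests should contribute equally.
+    var totalTargetMs = 1000;
+    var numTests = 2;
+    var targetMsPerTest = totalTargetMs / numTests;
+
+    // Repeat count that makes a test contribute roughly its share.
+    function suggestedRepeatCount(testName) {
+        return Math.max(1, Math.round(
+            targetMsPerTest / baselineMsPerIteration[testName]));
+    }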
+
+* Instead of multiplying the standard error by 1.96 (the normal
+  distribution value), the correct way to calculate a 95% confidence
+  interval for a small sample is to use Student's t-distribution
+  <http://en.wikipedia.org/wiki/Student%27s_t-test>. Basically this
+  involves multiplying the standard error by a value from a two-tailed
+  t-distribution table, chosen by degrees of freedom, instead of by
+  1.96; a table is available at
+  <http://www.medcalc.be/manual/t-distribution.php>.
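+
+  As an illustration only, a sketch of that calculation in JavaScript
+  (the sample times and the 2.776 value, the two-tailed 95% t value
+  for 4 degrees of freedom, i.e. 5 runs, are just examples):
+
+    // 95% confidence interval for the mean of a small sample, using a
+    // two-tailed t value instead of the normal-distribution 1.96.
+    function confidenceInterval95(samples, tValue) {
+        var n = samples.length;
+        var mean = 0;
+        for (var i = 0; i < n; i++)
+            mean += samples[i] / n;
+        var sumOfSquares = 0;
+        for (var i = 0; i < n; i++)
+            sumOfSquares += (samples[i] - mean) * (samples[i] - mean);
+        var standardError =
+            Math.sqrt(sumOfSquares / (n - 1)) / Math.sqrt(n);
+        return { mean: mean, error: tValue * standardError };
+    }
+
+    // Five hypothetical run times (ms); 4 degrees of freedom gives a
+    // two-tailed 95% t value of 2.776 rather than 1.96.
+    var result = confidenceInterval95([610, 598, 605, 612, 601], 2.776);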
+
+* Add support for comparing two different engines (or two builds of
+  the same engine) by running them interleaved.
+
+* Add support to compare two existing sets of saved results.
+
+* Allow the repeat count to be controlled from the browser-hosted
+  version and from the WebKitTools wrapper script.
+
+* Add support to run only a subset of the tests (both command-line and
+  web versions).
+
+* Add a profile mode for the command-line version that runs the tests
+  repeatedly in the same command-line interpreter instance, for ease
+  of profiling.
+
+* Make the browser-hosted version prettier, both in its general design
+  and perhaps by using bar graphs for the output.
+
+* Make it possible to track changes over time and to generate, for
+  each result, a graph showing the value and error bar at each version.
+
+* Hook up to automated testing / buildbot infrastructure.
+
+* Possibly... add the ability to download iBench from its original
+  server, pull out the JS test content, preprocess it, and add it as a
+  category to the benchmark.
+
+* Profit.