Figure 6: Purity as a function of dimension, parameterized by profile length, for the 4-gram representation when LSI (top) or ICA (bottom) dimension reduction is used on a typical dataset (RD-256).

Best N-gram Parameters

Overall, we are interested in selecting a good profile length and reduced dimensionality for the 3-gram and 4-gram representations. The general result is that, for both representations and for a mid-range profile length (around 2000), ICA is the method of choice. The choice between the 3-gram and 4-gram representations, however, appears to depend on the dataset. Typical results are shown in Fig. 6.
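
To make the quantity plotted in Fig. 6 concrete, here is a minimal sketch of the commonly used cluster purity measure: the fraction of items that fall in the majority class of their assigned cluster. This is the standard textbook form and may differ in detail from the definition used in this work. The function name `purity` and its arguments are illustrative assumptions, not part of the original experiments.

```python
from collections import Counter


def purity(cluster_ids, class_labels):
    """Return the fraction of items belonging to the majority class of their cluster.

    cluster_ids  : sequence of cluster assignments, one per item (hypothetical input)
    class_labels : sequence of ground-truth labels, one per item (hypothetical input)
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("purity is undefined for an empty clustering")

    # Count how often each true class occurs inside each cluster.
    counts_by_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        counts_by_cluster.setdefault(cluster, Counter())[label] += 1

    # Sum the size of the majority class in every cluster, then normalize.
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(cluster_ids)
```

Under this definition, a purity curve could be obtained by clustering the reduced representation at each candidate dimension and profile length and calling `purity` on the resulting assignments; the clustering step itself is not shown here.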