Table 2 Estimated sizes of CDF secondary data sets (from [12]).

Each site performs some distribution of job types. In the simulation, we modelled this distribution so that each site ran an equal number of jobs of each type, except for one preferred job type, which ran twice as often as each of the others. The preferred job type was chosen for each site based on storage considerations: for the replication algorithms to be effective, the local storage at each site had to be able to hold all the files for its preferred job type.
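
The job mix described above can be illustrated with a minimal sketch. It is not the simulator's actual code: the function names, the job type labels and the sizes in the usage note are hypothetical, chosen only for illustration. The sketch assumes that "twice as often" means the preferred type carries weight 2 and every other type weight 1, so that with n job types the preferred type is selected with probability 2/(n+1) and each other type with probability 1/(n+1). It also assumes the storage condition reduces to comparing one total data set size per job type against the site's local storage.

```python
import random


def job_type_probabilities(job_types, preferred):
    """Selection probability per job type for one site.

    Assumption: the preferred type has weight 2, all other types weight 1.
    """
    if preferred not in job_types:
        raise ValueError("preferred job type must be one of job_types")
    weights = {jt: (2 if jt == preferred else 1) for jt in job_types}
    total = sum(weights.values())
    return {jt: w / total for jt, w in weights.items()}


def sample_jobs(job_types, preferred, n_jobs, seed=0):
    """Draw n_jobs job types for a site according to the weighted mix."""
    rng = random.Random(seed)  # fixed seed so a run is reproducible
    probs = job_type_probabilities(job_types, preferred)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=n_jobs)


def eligible_preferred_types(dataset_sizes, site_storage):
    """Job types whose complete data set fits in the site's local storage.

    Sizes and storage must be given in the same unit.
    """
    return [jt for jt, size in dataset_sizes.items() if size <= site_storage]
```

For example, with four hypothetical job types `["a", "b", "c", "d"]` and preferred type `"a"`, `job_type_probabilities` gives `"a"` a probability of 0.4 and each other type 0.2. With placeholder sizes `{"a": 5, "b": 20, "c": 8}` and a site storage of 10 (same unit), `eligible_preferred_types` returns `["a", "c"]`, so only those types could serve as that site's preferred type.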