Some notes on running large simulations:

I. Choosing simulation settings
   A. Try to pick "unit-free" parameters, such as coefficients of variation or ratios.
   B. Try to get ranges of parameter settings from existing data.
   C. Try some extreme settings to test various hypotheses.
   D. Good luck.

II. Choosing the number of simulations to run
   A. Figure out what you want to estimate (means, variances, quantiles, AUCs).
   B. Do something like one of the following to decide when to stop:
      i. Option 1
         a. Run the simulation for some fixed amount (say, 1,000 iterations).
         b. Calculate the Monte Carlo error for the simulations.
         c. Run more if needed.
      ii. Option 2
         a. Calculate the Monte Carlo error while running the simulation, and use a stopping rule to decide when to stop.
         b. This takes a lot more time up front, but it is a more rigorous way to do simulations.

III. Reporting results
   A. Calculate the margin of error (MOE) implied by the Monte Carlo error for each quantity you're estimating.
   B. Report results only to that level of accuracy. For example, if your MOE is .01, report your estimates only to two decimal places.
   C. Try to make a compelling story from your Monte Carlo results, and reduce the amount of output you present as much as you can. Put extra material in supplementary appendices if it gets to be too much.

IV. Running the simulation
   A. On our cluster, we don't have any real parallel processing programs installed.
   B. Also, the scheduling software doesn't really allow for it.
   C. Here's how I run lots of simulation settings without writing lots of separate code.
      i. Create directories "toDo", "trying", "done", "log" and "error".
      ii. Write a program that creates one file per simulation setting and puts these in the "toDo" directory. These files could be:
         a. filenames that contain the parameter values: pv_0_1_2_.5_7
         b. files whose contents contain the parameter values
         c. actual programs for each simulation setting
         d. I like option a more. The Unix command "touch" will create an empty file.
         You usually want to write a program that populates the "toDo" directory.
      iii. Write one program that:
         a. Fetches a random file from the "toDo" directory and moves it to the "trying" directory.
         b. Gets the parameter values from the filename and tries the simulation with those parameter values.
         c. Puts an appropriately named log file in the "log" directory. If there was an error, it moves the file from "trying" to "error"; if the run completes without errors, it moves the file to "done". Your program will probably also create some output, perhaps in a directory called "output".
         d. When done (either with an error or finishing), goes back to step a.
      iv. Now you can launch multiple instances of this program, and they will grab the files in the "toDo" directory and work on them. If the programs crash or the cluster needs to be rebooted, just move the files in "trying" back to "toDo" and rerun the programs. You can look individually at the parameter values that caused errors.
      v. In the times I've done this, I've used a bash script to do the file moving and then R or Matlab or whatever to actually run the simulation. I attached an older version of this kind of approach that I previously used.
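A minimal Python sketch of Option 1 under II.B, assuming the quantity of interest is a mean: after a first batch of replicates, compute the Monte Carlo standard error (sample standard deviation divided by the square root of the number of replicates). Because that error shrinks like 1/sqrt(n), you can also estimate roughly how many total replicates a target error needs. The names `one_replicate` and `mc_summary`, the normal draws, and the 0.01 target are hypothetical stand-ins for your own simulation; variances, quantiles, and AUCs need their own standard-error formulas (or a bootstrap).

```python
import math
import random
import statistics


def one_replicate(rng: random.Random) -> float:
    """Hypothetical stand-in for one simulation run; returns one estimate."""
    return rng.gauss(0.0, 1.0)


def mc_summary(values: list[float], target_se: float) -> tuple[float, float, int]:
    """Return (estimate, Monte Carlo standard error, total reps needed for target_se)."""
    n = len(values)
    sd = statistics.stdev(values)
    se = sd / math.sqrt(n)
    # The standard error shrinks like 1/sqrt(n), so reaching target_se takes
    # roughly (sd / target_se)**2 replicates in total.
    n_needed = math.ceil((sd / target_se) ** 2)
    return statistics.fmean(values), se, n_needed


if __name__ == "__main__":
    rng = random.Random(42)
    # Step a: run a first batch of 1,000 iterations.
    values = [one_replicate(rng) for _ in range(1000)]
    # Step b: calculate the Monte Carlo error.
    est, se, n_needed = mc_summary(values, target_se=0.01)
    print(f"estimate={est:.4f}, MC SE={se:.4f}, reps needed ~ {n_needed}")
    # Step c: run more if needed (sd is itself estimated, so this is approximate).
    if n_needed > len(values):
        values += [one_replicate(rng) for _ in range(n_needed - len(values))]
        est, se, _ = mc_summary(values, target_se=0.01)
        print(f"after {len(values)} reps: estimate={est:.4f}, MC SE={se:.4f}")
```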
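Option 2 under II.B can be sketched with a running (online) estimate of the mean and variance, here Welford's algorithm, so the Monte Carlo error is available after every replicate without storing all of them. The stopping rule shown (stop once the standard error of the mean falls below a target, checked only after a minimum number of replicates) is one simple rule assumed for illustration; `run_until_precise`, `min_reps`, `check_every`, and `max_reps` are hypothetical names. The minimum number of replicates guards against stopping early on a noisy, underestimated error.

```python
import math
import random


class RunningMoments:
    """Welford's online algorithm for the running mean and variance."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def mc_se(self) -> float:
        """Monte Carlo standard error of the running mean."""
        return math.sqrt(self.variance / self.n) if self.n > 1 else float("inf")


def run_until_precise(one_replicate, target_se, min_reps=1000,
                      max_reps=1_000_000, check_every=100, seed=2024):
    """Run replicates until the MC standard error is at most target_se.

    Returns (estimate, mc_se, number_of_replicates).
    """
    rng = random.Random(seed)
    moments = RunningMoments()
    while moments.n < max_reps:
        moments.update(one_replicate(rng))
        # Consult the stopping rule only after min_reps, and then every
        # check_every replicates.
        if (moments.n >= min_reps and moments.n % check_every == 0
                and moments.mc_se <= target_se):
            break
    return moments.mean, moments.mc_se, moments.n


if __name__ == "__main__":
    # Hypothetical simulation: one standard normal draw per replicate.
    mean, se, n = run_until_precise(lambda rng: rng.gauss(0.0, 1.0), target_se=0.02)
    print(f"estimate={mean:.4f}, MC SE={se:.4f}, reps={n}")
```

With a standard deviation near 1 and a target of 0.02, this should stop somewhere around 2,500 replicates.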
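For Section III, here is a sketch of turning a Monte Carlo standard error into a margin of error and a reporting precision. It assumes a normal approximation with z = 1.96 (about 95%) and uses the rule "keep decimals down to the digit holding the MOE's first significant figure", which reproduces the example of an MOE of .01 giving two decimal places. Both the rule and the function names are illustrative choices, not part of the original notes.

```python
import math
import statistics


def margin_of_error(values: list[float], z: float = 1.96) -> float:
    """Approximate 95% margin of error for the Monte Carlo estimate of a mean."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return z * se


def decimals_for(moe: float) -> int:
    """Decimal places to report: stop at the digit of the MOE's first significant figure."""
    # Read the exponent from scientific notation, e.g. 0.01 -> '1.000000e-02' -> -2.
    exponent = int(f"{moe:e}".split("e")[1])
    return max(0, -exponent)


def report(estimate: float, moe: float) -> str:
    """Format an estimate and its MOE to the precision the MOE supports."""
    d = decimals_for(moe)
    return f"{estimate:.{d}f} (MOE {moe:.{d}f})"


if __name__ == "__main__":
    print(report(0.83417, 0.01))  # 0.83 (MOE 0.01)
```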
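The directory setup in IV.C.i and option a of IV.C.ii can also be sketched in Python (the notes themselves used a bash script plus R or Matlab). Each combination in a hypothetical parameter grid becomes one empty file in "toDo", created with `Path.touch`, the Python counterpart of Unix `touch`. The parameter names, values, and the "sim_root" directory are made up; whatever reads these names back must use the same parameter order.

```python
import itertools
from pathlib import Path

DIRS = ("toDo", "trying", "done", "log", "error")


def make_dirs(root) -> Path:
    """Create the queue directories under root (no error if they already exist)."""
    root = Path(root)
    for name in DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def setting_to_name(values) -> str:
    """(50, 0.5, 2) -> 'pv_50_0.5_2'"""
    return "pv_" + "_".join(str(v) for v in values)


def populate_todo(root, grid: dict) -> list[str]:
    """Create one empty file in toDo per combination of parameter values."""
    root = make_dirs(root)
    names = []
    # Parameter order in each filename follows the order of the grid's keys.
    for values in itertools.product(*grid.values()):
        name = setting_to_name(values)
        (root / "toDo" / name).touch()
        names.append(name)
    return names


if __name__ == "__main__":
    grid = {"n": [50, 100], "cv": [0.5, 1], "ratio": [2, 7]}  # hypothetical settings
    for name in populate_todo("sim_root", grid):
        print(name)  # pv_50_0.5_2, pv_50_0.5_7, ...
```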
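A Python sketch of the single worker program in IV.C.iii, assuming the `pv_...` filename scheme and a hypothetical `simulate` function that takes the parsed values as arguments. Claiming a file with `os.rename` is a deliberate choice: on a local POSIX filesystem the rename is atomic, so if two instances pick the same file, only one succeeds and the other gets `FileNotFoundError` and picks again. Shared network filesystems on a cluster may not give the same guarantee, so treat that as an assumption to check on your system.

```python
import os
import random
import traceback
from pathlib import Path


def parse_name(name: str) -> list[float]:
    """'pv_0_1_2_.5_7' -> [0.0, 1.0, 2.0, 0.5, 7.0]"""
    return [float(tok) for tok in name.split("_")[1:]]


def claim_next(root: Path):
    """Move a random file from toDo to trying; return its name, or None if toDo is empty."""
    todo, trying = root / "toDo", root / "trying"
    while True:
        candidates = os.listdir(todo)
        if not candidates:
            return None
        name = random.choice(candidates)
        try:
            # Only one worker can win this rename; a loser retries with another file.
            os.rename(todo / name, trying / name)
            return name
        except FileNotFoundError:
            continue


def worker(root, simulate) -> None:
    """Process settings until toDo is empty, sorting each into done or error."""
    root = Path(root)
    while (name := claim_next(root)) is not None:
        log_path = root / "log" / f"{name}.log"
        try:
            result = simulate(*parse_name(name))
            log_path.write_text(f"ok: {result}\n")
            os.rename(root / "trying" / name, root / "done" / name)
        except Exception:
            # Keep the traceback so the failing parameter values can be inspected.
            log_path.write_text(traceback.format_exc())
            os.rename(root / "trying" / name, root / "error" / name)


if __name__ == "__main__":
    def simulate(n, cv, ratio):  # hypothetical simulation
        return n * cv / ratio

    worker("sim_root", simulate)
```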
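For IV.C.iv, recovering after a crash or reboot amounts to moving everything in "trying" back to "toDo". A small sketch of that step plus a status count follows. It assumes no worker instances are still running, since otherwise it would re-queue settings that are still in progress; the function names and the "sim_root" directory are hypothetical.

```python
import os
from pathlib import Path


def requeue_trying(root) -> list[str]:
    """Move unfinished settings from trying back to toDo; return their names."""
    root = Path(root)
    moved = []
    for name in os.listdir(root / "trying"):
        os.rename(root / "trying" / name, root / "toDo" / name)
        moved.append(name)
    return sorted(moved)


def status(root) -> dict[str, int]:
    """Count the files waiting, in progress, finished, and failed."""
    root = Path(root)
    return {d: len(os.listdir(root / d)) for d in ("toDo", "trying", "done", "error")}


if __name__ == "__main__":
    print(requeue_trying("sim_root"))
    print(status("sim_root"))
```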