=====================================================================
C programs for simulation and analysis of SIR epidemic models.

October 2003, George Streftaris

Dr. George Streftaris
Actuarial Mathematics and Statistics
School of Mathematical and Computer Sciences
Scott Russell Building
Heriot-Watt University 
Edinburgh EH14 4AS, UK

Tel: 0131-451-3679, Fax: 0131-451-3249
Email: G.Streftaris@ma.hw.ac.uk
http://www.ma.hw.ac.uk/ams/people/pages/georges.html


=====================================================================
This document describes the input and output files for the data
simulation program data_simul.c and the MCMC estimation programs
sir_mcmc.c and smallpox_mcmc.c.


=====================================================================
Directory contents

- StreftarisGibson_data_programs/data:
  - sim_data_70.txt
  - sim_data_85.txt
  - sim_data_98.txt
  These are text files in the same format as the output file
  sim_data.txt produced by the data_simul program.

- StreftarisGibson_data_programs/programs:
  - data_simul.c (C source file); data_simul (executable file)
  - sir_mcmc.c (C source file); sir_mcmc (executable file)
  - smallpox_mcmc.c (C source file); smallpox_mcmc (executable file)
  - various C code source files containing functions used in 
    data_simul and sir_mcmc programs
  - text files containing input for running C programs

- StreftarisGibson_data_programs/smallpox_data.txt
  Text file containing the data of the smallpox epidemic example.

 
=====================================================================
*** NOTE:
*** To compile the source files, various C functions (as specified in 
*** the 'include' commands in the code) are required. These can be 
*** found in: Press et al. (1992) Numerical Recipes in C: The Art of 
*** Scientific Computing, 2nd ed. Cambridge: Cambridge University 
*** Press.

=====================================================================
Data simulation program: data_simul.c
Executable file: data_simul

The program simulates an SIR (Susceptible-Infective-Removed) epidemic
where the rate of infection per possible susceptible-infectious
contact is given by the parameter beta, and the infectious period of
each infected individual follows a Weibull(nu, lambda) distribution.


Input (at execution time):

- Seed for random number generator (must be a negative long integer;
  the data set in the paper uses -989849573) 
- N: size of population in which epidemic takes place
- T: length of time (in days) for which epidemic is observed
- beta: rate of infection per possible contact (transmission
  coefficient)
- nu: shape parameter for the Weibull(nu, lambda) distribution of
  infectious periods
- lambda: scale parameter for the Weibull(nu, lambda) distribution of
  infectious periods
- Time interval (in days) between successive diagnostic tests 
  Note: a value of '1.0', which gives daily tests, is recommended;
  the analysis program sir_mcmc.c can then use any subset of these
  tests


Output files:

- sim_data.txt
  Text file containing the following:
  - Seed used in random number generator
  - Epidemic characteristics as determined by user (population size;
    observation period; beta, nu, lambda parameters; test sensitivity
    and specificity; frequency of tests; mean of infectious periods;
    mean and standard deviation of simulated infectious periods)
  - Table with columns giving (in ascending order of removal time):
    individual index; status of each infected individual at the end
    of the observation period ('r' indicates a removed individual,
    '+' indicates an individual who is still infectious)
  - Final number of infections and removals at observation period end
  - Diagnostic test results for all individuals; given format is
    'test_index(test_result)' where test_result can be '+' for
    positive test, '-' for negative test, or 'r' for removed 
    individual 

- inf_times.txt
  Text file ready for use as input in program sir_mcmc.c. 
  It contains the following:
  - First line: number of infections at end of observation period
  - Second line: length of observation period; lower bound for
    infection times updating (subtracted from time origin in
    sir_mcmc.c program) 
  - Subsequent lines: time of infection and status ('1' for removed,
    '2' for still infective) for each infected individual (one
    individual per line, in ascending order of removal time)
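
A reader for this layout might look like the sketch below. It assumes
that all values are separated by whitespace and that the second line
holds exactly two numbers (observation length, then lower bound). The
struct and function names are hypothetical and are not taken from
sir_mcmc.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int n_infected;      /* first line */
    double obs_length;   /* second line, first value */
    double lower_bound;  /* second line, second value */
    double *inf_time;    /* one entry per infected individual */
    int *status;         /* 1 = removed, 2 = still infective */
} InfTimes;

void free_inf_times(InfTimes *d)
{
    free(d->inf_time);
    free(d->status);
    d->inf_time = NULL;
    d->status = NULL;
}

/* Returns 0 on success, -1 on a malformed file or allocation failure. */
int read_inf_times(FILE *fp, InfTimes *d)
{
    int i;
    size_t n;

    assert(fp != NULL && d != NULL);
    d->inf_time = NULL;
    d->status = NULL;

    if (fscanf(fp, "%d", &d->n_infected) != 1 || d->n_infected <= 0)
        return -1;
    if (fscanf(fp, "%lf %lf", &d->obs_length, &d->lower_bound) != 2)
        return -1;

    n = (size_t)d->n_infected;
    d->inf_time = malloc(n * sizeof *d->inf_time);
    d->status = malloc(n * sizeof *d->status);
    if (d->inf_time == NULL || d->status == NULL) {
        free_inf_times(d);
        return -1;
    }
    for (i = 0; i < d->n_infected; i++) {
        if (fscanf(fp, "%lf %d", &d->inf_time[i], &d->status[i]) != 2 ||
            (d->status[i] != 1 && d->status[i] != 2)) {
            free_inf_times(d);
            return -1;
        }
    }
    return 0;
}
```

The caller opens inf_times.txt with fopen, passes the stream to
read_inf_times, and releases the arrays with free_inf_times.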

- rem_times.txt
  Text file ready for use as input in program sir_mcmc.c. 
  It contains the following:
  - First line: size of population
  - Second line: number of removals at end of observation period
  - Subsequent lines: time of removal for each infected individual
    (one individual per line, in ascending order)

- test_matrix.txt
  Text file ready for use as input in program sir_mcmc.c. 
  It contains the following:
  - First line: total number of conducted diagnostic tests
  - Second line: sensitivity and specificity of tests
  - Third line: times at which tests were carried out
  - Subsequent lines: test results for each infected individual (one
    individual per line); '0' indicates negative test, '2' indicates
    positive test, '1' indicates removed individual
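
The numeric codes in test_matrix.txt correspond to the symbols used in
sim_data.txt ('-', '+', 'r'). A small helper for translating between
the two is sketched below. The function names are hypothetical, and
the '?' returned for an unrecognised code is a choice made for this
sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Map a test_matrix.txt code to the symbol used in sim_data.txt. */
char test_code_to_symbol(int code)
{
    switch (code) {
    case 0:  return '-';   /* negative test */
    case 2:  return '+';   /* positive test */
    case 1:  return 'r';   /* removed individual */
    default: return '?';   /* not a documented code */
    }
}

/* Count the positive results in one individual's row of codes. */
int count_positive(const int *row, size_t n_tests)
{
    size_t j;
    int count = 0;

    assert(row != NULL || n_tests == 0);
    for (j = 0; j < n_tests; j++)
        if (row[j] == 2)
            count++;
    return count;
}
```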


=====================================================================
Analysis program: sir_mcmc.c
Executable file: sir_mcmc

The program uses MCMC methodology to sample from the marginal
posterior distributions of the parameters of interest. It outputs the 
generated sequences of parameter values in file sample.out, with
parameter definitions in file sample.ind. These files are in a form
suitable for use in CODA or BOA software packages.  


Input (at execution time):

- Name of file containing diagnostic test information.
  This will normally be the file 'test_matrix.txt' produced as output 
  from program data_simul. Different files produced by running 
  data_simul with different test sensitivity and specificity values 
  can be copied to separate files before being used in program
  sir_mcmc.
  [Note: Files inf_times.txt and rem_times.txt produced as output 
  from program data_simul do not change with different test accuracy 
  specifications]

- Number of MCMC iterations

- Number of updates of infection times per MCMC cycle (iteration); 
  when the value zero is entered the infection times are assumed 
  observed (equal to their simulated values); otherwise the entered 
  number of updates of infection times take place within each single 
  MCMC iteration.

- Number of simulated values to be discarded, as 'burn-in' period, 
  from the MCMC chain.
  [Note: Choice of an appropriate burn-in value requires some MCMC 
  convergence information; when this is not available from previous 
  runs, the user may output the entire chain (entering the value 2) 
  and then experiment with an MCMC convergence criterion. The 
  software packages CODA and BOA offer the option to select a subset 
  from the imported chain at run time.]

- Number of iterations for MCMC sample thinning; if a value k is 
  entered, every k-th iteration from the produced MCMC sample will be 
  recorded in the output file sample.out.
  [Note: As for burn-in; enter value 1 for entire chain.]

- Number of diagnostic tests to be used in the analysis.
  This must be at most equal to the number of tests specified when 
  data were simulated with program data_simul.

- Number of days between successive diagnostic tests.
  This option is for regular between-test intervals and is activated 
  when the number of tests given above is not equal to the number 
  specified when running data_simul (in which case the latter 
  specifies tests distributed evenly within observation time period); 
  the user may specify non-regular testing by entering the value 
  '999' and proceeding to the next option.
  Note: set of tests to be used in analysis *must* be a subset of the
  set of tests specified when running data_simul program.

- Range of test indices to be used in the analysis.
  [E.g. '13 20' for all 8 tests indexed 13-20 to be used, '1 1' for 
  test 1 only to be used.]
  Program will prompt for range to be entered until the 'number of 
  diagnostic tests to be used' given by the user before is exhausted.
  Note: set of tests to be used in analysis *must* be a subset of the
  set of tests specified when running data_simul program.
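
To illustrate how the burn-in and thinning options above interact, the
sketch below decides which iterations would be written to the output.
It assumes iterations are numbered from 1, that the first 'burn_in'
iterations are dropped, and that every 'thin'-th iteration of the
remainder is kept. The function names are hypothetical, and sir_mcmc.c
may count iterations differently or interpret particular burn-in
values in its own way.

```c
#include <assert.h>

/* Returns 1 if iteration 'iter' (numbered from 1) is recorded,
   0 otherwise. ASSUMPTION: thinning is counted from the end of the
   burn-in period. */
int is_recorded(long iter, long burn_in, long thin)
{
    assert(iter >= 1 && burn_in >= 0 && thin >= 1);
    return iter > burn_in && (iter - burn_in) % thin == 0;
}

/* Number of recorded values per parameter for a chain of n_iter
   iterations under the same assumptions. */
long count_recorded(long n_iter, long burn_in, long thin)
{
    assert(n_iter >= 0 && burn_in >= 0 && thin >= 1);
    return n_iter > burn_in ? (n_iter - burn_in) / thin : 0;
}
```

Under these assumptions, 10000 iterations with a burn-in of 1000 and
thinning of 10 leave 900 recorded values per parameter.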


Input files:

- inf_times.txt
  Text file produced as output when running data_simul program.

- rem_times.txt
  Text file produced as output when running data_simul program.

- test_matrix.txt
  Text file produced as output when running data_simul program.
  User must specify its name at run time.

- prior_parameters.txt
  Text file containing the parameters of the prior distributions 
  beta~Gamma(alpha,beta), nu~Gamma(xi,phi) and lambda~Gamma(c,d).
  Here the 'beta' inside Gamma(alpha,beta) is the second parameter 
  of the prior, not the infection rate itself.
  The user must create this file, listing the values (separated by 
  spaces or newlines) in the order alpha, beta, xi, phi, c, d.

- param_start_values.txt
  Text file containing starting values for beta, nu and lambda to be 
  used in the MCMC algorithm.
  The user must create this file, listing the values (separated by 
  spaces or newlines) in the order beta, nu, lambda. 
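
Because both user-created files are plain lists of numbers, a single
routine can read either of them. The sketch below uses a hypothetical
function name and is not the reading code in sir_mcmc.c. It relies on
fscanf treating spaces and newlines alike, so it accepts either
separator.

```c
#include <assert.h>
#include <stdio.h>

/* Read n whitespace-separated numbers from fp into dst.
   Returns 0 on success, -1 if fewer than n values could be read. */
int read_values(FILE *fp, double *dst, int n)
{
    int i;

    assert(fp != NULL && dst != NULL && n >= 0);
    for (i = 0; i < n; i++)
        if (fscanf(fp, "%lf", &dst[i]) != 1)
            return -1;
    return 0;
}
```

With this helper, prior_parameters.txt would be read with n = 6 (order
alpha, beta, xi, phi, c, d) and param_start_values.txt with n = 3
(order beta, nu, lambda).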


Output files:

- sample.out
  File containing the sampler output.
  Output is in two columns: each line contains an index number and 
  a sampled value. The sequences for the parameters beta, nu and 
  lambda are written one after another.
    
- sample.ind
  File containing information on the sample sequences as these are 
  output in file sample.out.
  Each line gives the name of the parameter and the line where its 
  sequence starts and ends in file sample.out.

- sample_2.out
  As file sample.out, for parameters beta, mu and sigma.

- sample_2.ind
  As file sample.ind, for parameters beta, mu and sigma.
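
Besides CODA or BOA, the output files can be post-processed directly.
The sketch below computes the mean of one parameter's sequence, given
the start and end lines listed for it in the corresponding .ind file.
It assumes that every line of the .out file holds exactly an integer
index followed by a value. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Mean of the second column over lines start..end (1-based,
   inclusive) of an open .out file. Returns 0 on success, -1 if no
   line in that range could be read. */
int mean_of_lines(FILE *out, long start, long end, double *mean)
{
    long line = 0, index, n = 0;
    double value, sum = 0.0;

    assert(out != NULL && mean != NULL);
    assert(start >= 1 && end >= start);

    while (fscanf(out, "%ld %lf", &index, &value) == 2) {
        line++;
        if (line > end)
            break;
        if (line >= start) {
            sum += value;
            n++;
        }
    }
    if (n == 0)
        return -1;
    *mean = sum / (double)n;
    return 0;
}
```

The stream must be positioned at the start of the file (for example
with rewind) before each call.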


=====================================================================
Analysis program: smallpox_mcmc.c
Executable file: smallpox_mcmc

As for sir_mcmc.c above, except:

- To run the program, provide the input files 
  smallpox_prior_parameters.txt and smallpox_param_start_values.txt 
  (as for sir_mcmc above).

- All input file names should have the prefix 'smallpox_'.

- Enter '0' (zero) when prompted for 'Number of diagnostic tests to 
  be used in the analysis'.

=====================================================================
