reFABS = function(V, C, R=5, Q=1000):
V: Set of binding sites
C: Mapping criterion
R: No. of sampling runs for step A
Q: No. of sampling runs for step B
A. Estimate the number of sites to be sampled randomly
1. Map V using C and note the number of genes mapped (N)
2. Treat all binding sites mapped to the same gene as duplicates and let M be the number of non-duplicate binding sites
3. Sample M sites randomly
4. Map the M random sites using C and note the number of genes mapped
5. Repeat steps 2 to 4 R times
6. Let N’ be the average number of genes mapped in steps 2 to 4 over the R runs
7. Set M’, the number of random sites to be sampled, to M × N / N’
B. Estimate the significance of enrichment
1. Map V to genes using C. Let n_z be the number of mapped genes that belong to the predefined gene category Z
2. Randomly sample M’ sites from the reference genome
3. Map the M’ random sites to genes using C
4. Let n_iz’ be the number of genes in Z mapped by the M’ random sites in the i-th sampling run
5. Repeat steps 2 to 4 Q times
6. The p-value of category Z, P_z, is the fraction of the Q sampling runs that yielded n_iz’ ≥ n_z, i.e. P_z = |{i | i = 1…Q and n_iz’ ≥ n_z}| / Q
Table 1: Pseudocode for the reFABS procedure for unbiased enrichment analysis of transcription factor binding sites.
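
The two stages of Table 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The mapping criterion C is modelled as a hypothetical callable that returns one gene identifier per site, or `None` for an unmapped site. M is assumed to count one site per mapped gene plus every unmapped site, which is one possible reading of step A.2. M’ is rounded to the nearest integer. The names `refabs`, `mapped_genes`, `toy_criterion` and `toy_sampler`, and the toy genome itself, are introduced here for illustration only.

```python
import random


def mapped_genes(sites, criterion):
    """Return the set of genes hit by `sites`; `criterion` maps a site to a gene id or None."""
    return {g for g in map(criterion, sites) if g is not None}


def refabs(sites, criterion, sample_site, category, R=5, Q=1000, seed=None):
    """Sketch of Table 1. Returns (p_value, M_prime) for the gene category `category`."""
    rng = random.Random(seed)
    category = set(category)

    # Step A: estimate how many random sites to draw.
    genes = mapped_genes(sites, criterion)
    N = len(genes)
    unmapped = sum(1 for s in sites if criterion(s) is None)
    M = N + unmapped  # assumption: sites sharing a gene collapse to one site
    random_counts = []
    for _ in range(R):
        random_sites = [sample_site(rng) for _ in range(M)]
        random_counts.append(len(mapped_genes(random_sites, criterion)))
    N_prime = sum(random_counts) / R
    if N_prime == 0:
        raise ValueError("no random site mapped to a gene; cannot compute M'")
    M_prime = round(M * N / N_prime)  # assumption: round to a whole number of sites

    # Step B: empirical p-value for the category.
    n_z = len(genes & category)
    hits = 0
    for _ in range(Q):
        random_sites = [sample_site(rng) for _ in range(M_prime)]
        n_iz = len(mapped_genes(random_sites, criterion) & category)
        if n_iz >= n_z:
            hits += 1
    return hits / Q, M_prime


def toy_criterion(site, spacing=100, window=10):
    """Toy criterion: gene k has its TSS at 50 + 100*k; map a site within `window` bp of a TSS."""
    k, offset = divmod(site, spacing)
    return k if abs(offset - spacing // 2) <= window else None


def toy_sampler(rng, genome_length=10_000):
    """Toy reference genome: draw one position uniformly from [0, genome_length)."""
    return rng.randrange(genome_length)
```

A call such as `refabs(observed_sites, toy_criterion, toy_sampler, category={0, 1, 2}, seed=1)` returns the estimated P_z together with M’. In a real analysis the sampler would draw positions from the reference genome and the criterion would encode the chosen site-to-gene rule (for example, a distance window around the transcription start site).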