reFABS = function(V, C, R=5, Q=1000):
V: Set of binding sites
C: Mapping criterion
R: No. of sampling runs for step A
Q: No. of sampling runs for step B

  A. Estimate the number of sites to be sampled randomly
    1. Map V using C and note down the number of genes (N) mapped
    2. Treat all binding sites mapped to the same gene as duplicates, and let M be the number of non-duplicate binding sites.
    3. Sample M sites randomly
    4. Map the M random sites using C and note down the number of genes mapped
    5. Repeat steps 2 to 4 R times
    6. N’ is the average number of genes mapped in steps 2-4 over the R runs
    7. M’, the number of random sites to be sampled, is M × N / N’

 

  B. Estimate the significance of enrichment
    1. Map V to genes using C. Let nz be the number of mapped genes that belong to the predefined gene category Z
    2. Randomly sample M’ sites from the reference genome
    3. Map the M’ random sites to genes using C
    4. Let niz’ be the number of genes in Z mapped from the M’ random sites in the i-th sampling run
    5. Repeat steps 2-4 Q times
    6. The p-value of category Z, Pz, is the fraction of the Q sampling runs that yielded niz’ ≥ nz, i.e. Pz = |{i | i = 1…Q and niz’ ≥ nz}| / Q
Table 1: Pseudocode for reFABS procedure for unbiased enrichment analysis for binding sites of transcription factors.
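
The two steps in Table 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that a binding site is a (chromosome, position) tuple, that the criterion C is a callable returning the gene identifiers assigned to a site, that the reference genome is given as a mapping from chromosome name to length, and that random sites are drawn uniformly over the genome. The helper names (`map_sites`, `count_nonduplicate_sites`, `sample_sites`), the extra arguments `Z`, `genome` and `seed`, the rounding of M’ to an integer, and the treatment of unmapped sites in step A.2 are all assumptions introduced here.

```python
import random
from statistics import mean


def map_sites(sites, criterion):
    """Return the set of genes that the criterion assigns to any of the sites."""
    genes = set()
    for site in sites:
        genes |= set(criterion(site))
    return genes


def count_nonduplicate_sites(sites, criterion):
    """One possible reading of step A.2 (an assumption): sites mapped to the
    same gene(s) collapse into one; sites mapped to no gene count individually."""
    groups = set()
    unmapped = 0
    for site in sites:
        genes = frozenset(criterion(site))
        if genes:
            groups.add(genes)
        else:
            unmapped += 1
    return len(groups) + unmapped


def sample_sites(genome, k, rng):
    """Draw k positions uniformly over the genome (assumed sampling scheme)."""
    chroms = list(genome)
    lengths = [genome[c] for c in chroms]
    picks = rng.choices(chroms, weights=lengths, k=k)
    return [(c, rng.randrange(genome[c])) for c in picks]


def refabs(V, C, Z, genome, R=5, Q=1000, seed=None):
    """Return (Pz, M') for gene category Z, following steps A and B of Table 1."""
    rng = random.Random(seed)
    Z = set(Z)

    # Step A: estimate the number of random sites to sample.
    observed = map_sites(V, C)
    N = len(observed)                       # genes mapped by the real sites
    M = count_nonduplicate_sites(V, C)      # non-duplicate binding sites
    N_prime = mean(
        len(map_sites(sample_sites(genome, M, rng), C)) for _ in range(R)
    )
    if N_prime == 0:
        raise ValueError("random sites mapped to no genes; cannot compute M'")
    M_prime = round(M * N / N_prime)        # rounding is an assumption

    # Step B: estimate the significance of enrichment.
    n_z = len(observed & Z)
    hits = 0
    for _ in range(Q):
        random_genes = map_sites(sample_sites(genome, M_prime, rng), C)
        if len(random_genes & Z) >= n_z:
            hits += 1
    return hits / Q, M_prime
```

Passing a fixed `seed` makes the sampling reproducible, which helps when comparing p-values across gene categories. In practice the uniform sampler would probably need to exclude unmappable or masked regions of the reference genome; that refinement is left out of this sketch.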