Binding-site similarities are measured as the number of geometrically and chemically equivalent atoms detected in common between two binding sites [58,59]. For 21 of the 24 models, we identified an average of 36 atoms in common (average p-value 0.038 and Z-score 3.05) with binding sites of members of Pfam families [60] that include human proteins. In three cases (ispF, CD2549 and dapH), no significant degree of binding-site similarity was found to a Pfam family that includes human homologs (Additional file 1: Table S7). It is difficult to judge whether the detected matches are significant, because no unique threshold for binding-site similarity can be defined above which cross-reactivity is certain [59]. However, in 17 of the 24 cases, the top-scoring binding-site similarities involve binding sites in proteins that bind ligands similar to at least one of the substrates of the reaction catalyzed by the modelled C. difficile protein. In seven of these cases, the top-matching Pfam family that contains human homologs binds a similar ligand (Additional file 1: Table S6). Taking as an example the enzyme encoded by the asd gene, we detect 39 atoms in common (Z-score 3.92, p-value 0.012) with a glyceraldehyde-phosphate dehydrogenase from spinach (PDB ID 2PKQ) bound to NADPH, a member of Pfam family PF00044, which has human homologs (Figure 1). Five of the top seven most similar binding sites also bind NADPH or NADP, and all come from different Pfam families. Superimposing these binding sites on the basis of their similarities to the binding site of the asd gene product yields a very close superposition of their respective bound ligands (Figure 1, inset). The quality of this superposition, together with the detection of similarities across families that bind ligands similar to those bound by the C. difficile targets, reinforces our confidence in the biological significance of the predictions. The alignment of the NADP molecules across different families further suggests that the detected similarities capture the molecular determinants responsible for binding.
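
As a rough illustration of how a Z-score and a p-value can be attached to a count of atoms in common, the following Python sketch compares one observed count against a background set of counts. It is not the scoring procedure of the method cited in [58,59]: the function name `similarity_stats`, the empirical (rank-based) p-value, and the background values are all assumptions made for this example, and only the observed count of 39 atoms is taken from the text.

```python
from statistics import mean, pstdev


def similarity_stats(observed, background):
    """Return (z_score, empirical_p) for an observed count of atoms in common.

    `background` is a list of counts obtained from comparisons against
    other binding sites (hypothetical values in this sketch).
    """
    mu = mean(background)
    sigma = pstdev(background)  # population standard deviation
    if sigma == 0:
        raise ValueError("background counts have zero variance")
    z_score = (observed - mu) / sigma
    # One-sided empirical p-value with an add-one correction, so that
    # the estimate is never exactly zero for a finite background.
    n_at_least = sum(1 for count in background if count >= observed)
    empirical_p = (n_at_least + 1) / (len(background) + 1)
    return z_score, empirical_p


# Hypothetical background counts (illustrative only, not data from the study).
background = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
z, p = similarity_stats(39, background)
print(f"Z-score = {z:.2f}, empirical p-value = {p:.3f}")
```

With a background this small, the empirical p-value cannot fall below 1/(n+1), so a realistic analysis would need a much larger set of background comparisons.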

One of the principal goals in detecting critical genes is to assess their potential as therapeutic targets. One aspect weighing in favour of a potential therapeutic target is the lack of a human homolog, as this decreases the chance of unwanted effects caused by a drug acting off-target on the product of the human homolog. Here we again use two definitions of homology: standard sequence homology, which relates two genes through evolution, and functional homology, which relates two genes through a common function. It is nevertheless possible that potential cross-reactivity targets perform different functions (different EC numbers) and have little sequence similarity, yet still share sufficient 3D atomic binding-site similarity to be inhibited by a drug designed against a C. difficile target.