Protein-DNA complex benchmarks


HR-PDNA187 benchmark

The high-resolution protein-DNA complex benchmark (HR-PDNA187) comprises 187 structures of protein-(ds)DNA complexes that are non-redundant at 25% sequence identity. The 3D structures of the complexes were downloaded from the Protein Data Bank (http://www.rcsb.org/) and were determined by X-ray crystallography with a resolution better than 2.5 Å and an R-factor lower than 0.3. Each complex comprises at least one protein chain longer than 40 amino acids and a DNA molecule of at least 5 base pairs. The HR-PDNA187 benchmark covers all major groups of DNA-protein interactions according to the classification of Luscombe et al. (Luscombe NM, Austin SE, Berman HM, Thornton JM. An overview of the structures of protein-DNA complexes. Genome Biology. 2000;1(1):reviews001.1-reviews001.37.): helix-turn-helix (HTH), zinc-coordinating, zipper type, other α-helical, β-sheet, β-hairpin/ribbon, and other. Moreover, it spans a wide range of functional classes according to the Nucleic Acid Database (http://ndbserver.rutgers.edu/) classification: 100 enzymes, 78 regulatory proteins, 7 structural proteins, 1 protein with another function and 1 unclassified protein. In terms of protein stoichiometry, HR-PDNA187 comprises 109 monomers, 64 homo-2-mers, 3 hetero-2-mers, 3 homo-3-mers, 1 hetero-3-mer, 5 homo-4-mers, 1 hetero-5-mer (composed of 4 couples of homo-2-mers) and 1 homo-6-mer.
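
The inclusion criteria above can be read as a simple filter over candidate PDB entries. The Python sketch below is illustrative only and is not the procedure used to build the benchmark: the Complex record and its field names are hypothetical, and it assumes that the experimental method, resolution, R-factor, protein chain lengths and number of DNA base pairs have already been extracted for each entry. The redundancy reduction at 25% sequence identity is not covered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Complex:
    """Hypothetical summary of one candidate entry (field names are assumptions)."""
    pdb_id: str
    method: str                       # e.g. "X-RAY DIFFRACTION"
    resolution: float                 # in angstroms
    r_factor: float
    protein_chain_lengths: List[int]  # number of residues in each protein chain
    dna_base_pairs: int


def passes_hr_pdna_filters(c: Complex) -> bool:
    """Apply the thresholds stated in the text to one candidate entry."""
    return (
        c.method == "X-RAY DIFFRACTION"
        and c.resolution < 2.5                             # better than 2.5 angstroms
        and c.r_factor < 0.3                               # R-factor lower than 0.3
        and any(n > 40 for n in c.protein_chain_lengths)   # one chain longer than 40 aa
        and c.dna_base_pairs >= 5                          # at least 5 base pairs
    )


# Made-up entries, used only to exercise the filter.
good = Complex("demo1", "X-RAY DIFFRACTION", 2.0, 0.21, [35, 120], 12)
low_res = Complex("demo2", "X-RAY DIFFRACTION", 3.1, 0.21, [120], 12)
kept = [c.pdb_id for c in (good, low_res) if passes_hr_pdna_filters(c)]
```
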

HOLO-APO82 benchmark

This benchmark comprises the 82 HOLO (bound)-APO (unbound) pairs available in the Protein Data Bank (http://www.rcsb.org/) among the 187 protein-DNA complexes in HR-PDNA187. The collected APO forms are all X-ray crystallographic structures that share at least 95% sequence identity, a coverage ≥ 70% and a percentage of gaps ≤ 10% with the corresponding HOLO form. Two unbound forms in different oligomeric states, a dimer and a monomer, are available for the complex 2ISZ. We retained both APO forms because the literature reports that they coexist in equilibrium (Chou, C. James, et al. "Functional studies of the Mycobacterium tuberculosis iron-dependent regulator." Journal of Biological Chemistry 279.51 (2004): 53554-53561.). Within each HOLO-APO pair, the APO form may be in the same oligomeric state as the HOLO form or may have fewer chains, owing to the oligomerization process associated with DNA binding. The APO forms in the HOLO-APO82 dataset are divided into 52 monomers, 28 homo-2-mers, 1 homo-3-mer and 1 hetero-3-mer. Specifically, we observed a change in the stoichiometry in 4 HOLO-APO pairs: 3 homo-4-mers in the bound form are homo-2-mers in the unbound form, while 1 homo-4-mer, 1 homo-3-mer and 1 homo-2-mer in the bound form are monomers in the unbound form.
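
The three thresholds used to accept an APO form can be checked from a pairwise alignment of the HOLO and APO sequences. The Python sketch below is a minimal illustration under stated assumptions: the text does not define how identity, coverage and gaps were computed, so here identity is taken over aligned residue pairs, coverage is the fraction of HOLO residues aligned to an APO residue, and gaps is the fraction of alignment columns containing a gap. The function names and the "-" gap character are also assumptions.

```python
def alignment_stats(holo_aln: str, apo_aln: str) -> dict:
    """Compute percentages from two gapped, equal-length alignment strings.

    The definitions used here are assumptions, not the benchmark's own.
    """
    if len(holo_aln) != len(apo_aln):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(holo_aln, apo_aln))
    # Columns where both sequences carry a residue.
    aligned = [(h, a) for h, a in columns if h != "-" and a != "-"]
    identical = sum(1 for h, a in aligned if h == a)
    holo_len = sum(1 for h in holo_aln if h != "-")
    gap_columns = len(columns) - len(aligned)
    return {
        "identity": 100.0 * identical / len(aligned) if aligned else 0.0,
        "coverage": 100.0 * len(aligned) / holo_len if holo_len else 0.0,
        "gaps": 100.0 * gap_columns / len(columns) if columns else 0.0,
    }


def is_valid_apo(stats: dict,
                 min_identity: float = 95.0,
                 min_coverage: float = 70.0,
                 max_gaps: float = 10.0) -> bool:
    """Apply the thresholds quoted in the text (95%, 70%, 10%)."""
    return (
        stats["identity"] >= min_identity
        and stats["coverage"] >= min_coverage
        and stats["gaps"] <= max_gaps
    )


# Toy alignments, not taken from the benchmark.
full = alignment_stats("ACDEFGHIKL", "ACDEFGHIKL")
partial = alignment_stats("ACDEFGHIKL", "ACDE------")
```

With these definitions, the truncated toy alignment is rejected because its coverage and gap percentages fall outside the thresholds, even though every aligned residue is identical.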

Download

The folder with the HR-PDNA187 and HOLO-APO82 benchmarks is available here.
The Excel version of the table below is available here.

Table 1: List of the 187 complexes in the HR-PDNA187 dataset and the 82 HOLO-APO pairs.

The columns are, in order: 1) the Protein Data Bank (PDB) identifier of the HR-PDNA187 complex; 2) the protein chains considered in the complex; 3) the subset of the protein chains in column 2 that is non-redundant at 95% sequence identity; 4) the DNA chains considered in the complex; 5) the PDB identifier and the considered chains of the complex in the HOLO-APO82 dataset; 6) the PDB identifier and the considered chains of the unbound form in the HOLO-APO82 dataset; 7) the class and 8) the subclass to which the protein belongs, as derived from the Nucleic Acid Database. Columns 5 and 6 are empty for complexes that have no pair in HOLO-APO82.

PDB_ID Prot_chains Prot_nr_chains DNA_chains HOLO_ID:chains APO_ID:chains Class Subclass
1a3q AB A CD regulatory transcription_factor
1a73 AB A CDEF 1a73:BA 1evx:AB enzyme nuclease
1b3t AB A CD 1b3t:BA 1vhi:AB regulatory other
1bdt ABCD A EF 1bdt:CD 1myk:AB regulatory gene
1bl0 A A BC regulatory transcription_factor
1cez A A NT enzyme polymerase
1d02 AB A CD enzyme nuclease
1dc1 AB A CW enzyme nuclease
1dfm AB A CD enzyme nuclease
1egw AB A EF regulatory transcription_factor
1emh A A BC 1emh:A 3fci:A enzyme glycosylase
1esg AB A CD enzyme nuclease
1f4k AB A DE 1f4k:BA 2dqr:AB regulatory replication
1fiu ABCD A EFGHIJKL enzyme nuclease
1gu4 AB A CD regulatory transcription_factor
1gxp AB A CD 1gxp:B 1gxq:A regulatory other
1h6f AB A CD regulatory transcription_factor
1hlv A A BC structural other/centromere
1i3j A A BC enzyme nuclease
1iaw AB A CDEF 1iaw:BA 1ev7:AB enzyme hydrolase
1j3e A A BC regulatory replication
1je8 AB A CD 1je8:B 1a04:A regulatory transcription_factor
1jko C C AB enzyme recombinase
1jx4 A A PT 1jx4:A 2rdi:A enzyme polymerase
1k3x A A BC 1k3x:A 1q39:A enzyme nuclease
1k4t A A BCD enzyme isomerase
1ku7 A A BC 1ku7:A 1ku3:A regulatory transcription_factor
1kx5 ABCDEFGH B,D,A,C IJ structural histone
1l3l BD B EG regulatory transcription_factor
1lmb 34 3 12 regulatory other
1lq1 CD C EF regulatory transcription_factor
1mjo ABCD A FG 1mjo:AB 1mjl:AB regulatory transcription_factor
1mnn A A BC 1mnn:A 1mn4:A regulatory transcription_factor
1nkp AB B,A FG regulatory transcription_factor
1oe4 AB A EF enzyme glycosylase
1orn A A BC enzyme nuclease
1oup B B CD 1oup:B 1ouo:A enzyme nuclease
1owf AB A,B CDE regulatory transcription_factor
1ozj A A CD regulatory transcription_factor
1pp7 U U EF regulatory transcription_factor
1pt3 A A CDEFGH 1pt3:A 3zfk:A enzyme nuclease
1qna A A CD 1qna:A 1vok:A regulatory transcription_factor
1r71 AB A EFIJ regulatory transcription_factor
1rh6 B B CD regulatory recombination
1rxw A A BC enzyme nuclease
1sa3 A A CD enzyme nuclease
1skn P P AB regulatory transcription_factor
1sx5 AB A CDEF 1sx5:AB 1az3:AB enzyme nuclease
1sxq A A CE 1sxq:A 1jg7:A enzyme transferase
1t7p A A PT enzyme polymerase
1t9i AB A CD 1t9i:AB 2o7m:AB enzyme nuclease
1tc3 C C AB enzyme other
1tez A A IJK 1tez:A 1owl:A enzyme lyase
1u8b A A BCDE regulatory other
1uut A A C 1uut:A 1m55:A enzyme nuclease
1wb9 AB A EF regulatory repair
1xyi A A BC structural chromosomal
1yf3 A A CD 1yf3:A 1q0s:A enzyme methyl
1yo5 C C AB regulatory transcription_factor
1zme CD C AB regulatory transcription_factor
1zrf AB A WXYZ 1zrf:AB 4r8h:AB regulatory other
2aor A A CD enzyme methyl
2aq4 A A PT enzyme transferase
2bnw ABCD A EFGH 2bnw:CD 1irq:AB regulatory other
2dp6 A A CD 2dp6:A 2d3y:A enzyme glycosylase
2e52 AB A EG enzyme nuclease
2ex5 AB A XY enzyme nuclease
2fkc A A CD 2fkc:A 1ynm:A enzyme nuclease
2g1p A A FG 2g1p:A 4gom:D enzyme methyl
2gb7 AB A EF enzyme nuclease
2h27 A A BC enzyme transferase
2h7g X X YZ enzyme isomerase
2i06 A A BC regulatory replication
2ih2 A A BC 2ih2:A 1aqj:B enzyme methyl
2ihm A A DPT enzyme polymerase
2is6 A A CD 2is6:A 3lfu:A enzyme helicase
2isz AB A EF 2isz:BA,2isz:B 2isy:AB,1b1b:A regulatory transcription_factor
2noh A A BC 2noh:A 5an4:A enzyme glycosylase
2nq9 A A BCD 2nq9:A 1qtw:A enzyme nuclease
2o4a A A BC regulatory transcription_factor
2ofi A A BC 2ofi:A 2ofk:A enzyme glycosylase
2pi0 B B EF 2pi0:B 3qu6:A regulatory other
2pyj A A XY 2pyj:A 1xhx:A enzyme polymerase
2qhb B B CD 2qhb:B 2ckx:A structural telomere
2qoj Z Z XY enzyme nuclease
2r1j LR L AB regulatory transcription_factor
2r9l A A CD 2r9l:A 2iru:A enzyme polymerase
2rbf AB A CD 2rbf:BA 2gpe:AB regulatory other
2ve9 ABC A IJ 2ve9:A 2ve8:A structural other
2vla A A LM enzyme nuclease
2vs7 A A BC 2vs7:A 1b24:A enzyme nuclease
2w42 A A PQ 2w42:A 1w9h:A regulatory other
2w7n AB A EFGH 2w7n:BA 5ckt:AD regulatory gene
2xm3 CD C KLMN enzyme other/transposase
2xrz A A CD 2xrz:A 2xry:A enzyme lyase
2xzf A A BC enzyme glycosylase
2yvh AB A EFGH 2yvh:AB 2yve:AB regulatory transcription_factor
3aaf A A CD enzyme other
3bep AB A CD 3bep:BA 4k3l:AB enzyme polymerase
3bm3 AB A CD enzyme nuclease
3bs1 A A BC 3bs1:A 4g4k:A regulatory gene
3c0w A A BCD enzyme nuclease
3c25 AB A CD 3c25:AB 3bvq:AB enzyme nuclease
3coq AB A DE regulatory transcription_factor
3cw7 ABCD A EFGH 3cw7:B 1mpg:A enzyme glycosylase
3dsd AB A C 3dsd:BA 1ii7:AB regulatory repair
3dvo AB A EF enzyme nuclease
3eeo A A CD 3eeo:A 1hmy:A enzyme methyl
3f2b A A PT enzyme polymerase
3fde A A DE 3fde:A 2zkg:A enzyme ligase
3fdq AB A CD regulatory other
3g00 A A HI 3g00:A 3g8v:A enzyme nuclease
3g0q A A BC enzyme hydrolase
3g9m AB A CD regulatory transcription_factor
3gox AB A CD enzyme nuclease
3gxq AB A CD regulatory other
3h0d AB A CD regulatory transcription_factor
3i0w A A BC 3i0w:A 3f10:A enzyme glycosylase
3iag C C AB regulatory transcription_factor
3iay A A PT enzyme polymerase
3igm AB A CDWX regulatory transcription_factor
3ikt AB A CD 3ikt:AB 3ikv:AB regulatory other
3jso AB A CD 3jso:AB 1jhf:AB regulatory other
3jxy A A BC 3jxy:A 3bvs:A enzyme glycosylase
3k59 A A PT 3k59:A 3k5o:A enzyme polymerase
3kde C C AB enzyme other
3kxt A A BC structural other
3l2c A A BC regulatory transcription_factor
3lap ABCDEF A GHIJKL regulatory other
3m4a A A DE 3m4a:A 2f4q:A enzyme isomerase
3mfi A A PT 3mfi:A 1jih:A enzyme polymerase
3mln AB A CD regulatory transcription_factor
3mva O O DE regulatory transcription_factor
3mx4 AH A KL enzyme nuclease
3o1t A A BC 3o1t:A 4jht:A enzyme other
3o9x AB A EF 3o9x:AB 3gn5:AB regulatory gene
3od8 A A IJ enzyme other
3pov A A CD 3pov:A 3fhd:A enzyme other
3pvi AB A CD 3pvi:AB 1k0z:AB enzyme nuclease
3pvv A A CD regulatory replication
3qex A A PT 3qex:A 3cfo:A enzyme polymerase
3qmd A A BC regulatory other
3qqy A A BC enzyme nuclease
3qws AB A CN 3qws:AB 2hin:AB regulatory other
3rkq A A CD regulatory transcription_factor
3rmp AC A EFGH enzyme other
3s57 A A BC enzyme other
3s8q AB A CD 3s8q:BA 4i6r:AB regulatory other
3sjm A A CD structural telomere
3sm4 ABC A DE 3sm4:CAB 1avq:ABC enzyme nuclease
3spd A A EF 3spd:A 3sp4:A enzyme hydrolase
3ssc A A CD enzyme nuclease
3tan A A BC enzyme polymerase
3tq6 A A CD regulatory transcription_factor
3u2b C C AB regulatory transcription_factor
3vk8 A A CD 3vk8:A 3a42:A enzyme glycosylase
3vw3 LH L,H AB other antibody
3vxv A A BC enzyme hydrolase
3zvk FG F XY regulatory other
3zvn A A EFGHI 3zvn:A 3zvl:A enzyme hydrolase
4aij AB A CD 4aij:BA 4aih:AB regulatory transcription_factor
4dih H H D 4dih:H 3nxp:A enzyme thrombin
4e9f A A CD 4e9f:A 4e9e:A enzyme Hydrolase/glycosylase
4ecq A A PT enzyme polymerase
4esj A A CD enzyme nuclease
4fzx C C AB enzyme nuclease
4g92 ABC B,C,A DE 4g92:ABC 4g91:ABC regulatory transcription_factor
4gck AB A WZ 4gck:AB 4gfl:AB other other
4gjr AB A GHIJ regulatory transcription_factor
4glx A A BCD 4glx:A 5tt5:A enzyme ligase
4gzn C C AB regulatory transcription_factor
4h0e B B TU regulatory transcription_factor
4h10 AB B,A CD regulatory transcription_factor
4hf1 AB A CD 4hf1:AB 4hf0:AB regulatory transcription_factor
4hqe AB A CD 4hqe:BA 4hqm:AB regulatory transcription_factor
4htu A A CD 4htu:A 1zbf:A enzyme nuclease
4i2o AB A XW regulatory other
4ix7 AB A CD regulatory other
4j3n AB A CDEF enzyme isomerase
4jbm A A RT regulatory other
4jcy AB A CD 4jcy:BA 3lis:AB regulatory other
4k98 A A DE 4k98:A 4k8v:C enzyme transferase
4kb1 A A C enzyme hydrolase
4kli A A DPT enzyme polymerase
4kpy A A CDN NoClass NoClass
4qtj A A BC regulatory transcription_factor
4rkh CEF C AB enzyme ligase
6pax A A BC regulatory transcription_factor
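
The rows of Table 1 can be read programmatically. The Python sketch below is one possible parser, not an official loader. It assumes that the data rows are whitespace-separated, that a row has 8 fields when the complex belongs to HOLO-APO82 and 6 fields otherwise, and that a multi-valued cell is either comma-separated or continued on a following short line (a single chain identifier, or one extra HOLO/APO pair). The function name parse_table and the dictionary keys are hypothetical, and the header line must be skipped before calling it.

```python
def parse_table(lines):
    """Parse the data rows of Table 1 into a list of dictionaries."""
    records = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if len(tokens) in (6, 8):
            rec = {
                "pdb_id": tokens[0],
                "prot_chains": tokens[1],
                "prot_nr_chains": tokens[2].split(","),
                "dna_chains": tokens[3],
                "pairs": [],              # list of (HOLO ID:chains, APO ID:chains)
                "class": tokens[-2],
                "subclass": tokens[-1],
            }
            if len(tokens) == 8:
                rec["pairs"] = list(zip(tokens[4].split(","), tokens[5].split(",")))
            records.append(rec)
        elif len(tokens) == 1 and records:
            # Continuation line: an additional non-redundant protein chain.
            records[-1]["prot_nr_chains"].append(tokens[0])
        elif len(tokens) == 2 and records and all(":" in t for t in tokens):
            # Continuation line: an additional HOLO/APO pair for the same complex.
            records[-1]["pairs"].append((tokens[0], tokens[1]))
        else:
            raise ValueError(f"unexpected row layout: {line!r}")
    return records


# A few rows copied from the table, including two continuation lines.
rows = [
    "1a3q AB A CD regulatory transcription_factor",
    "1a73 AB A CDEF 1a73:BA 1evx:AB enzyme nuclease",
    "1nkp AB B FG regulatory transcription_factor",
    "A",
    "2isz AB A EF 2isz:BA 2isy:AB regulatory transcription_factor",
    "2isz:B 1b1b:A",
]
records = parse_table(rows)
apo_ids = [apo.split(":")[0] for rec in records for _, apo in rec["pairs"]]
```

Keeping the HOLO/APO pairs as a list per complex lets an entry such as 2isz carry both of its unbound forms without duplicating the row.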

Contact

For questions, comments or suggestions feel free to contact Flavia Corsi, Elodie Laine or Alessandra Carbone.