What is DIBS?

Disordered Binding Sites (DIBS) is a repository of protein complexes that are formed by ordered and disordered proteins. Intrinsically disordered proteins (IDPs) do not have a stable 3D structure in isolation and therefore defy structure determination by X-ray crystallography or NMR. However, many IDPs bind to ordered proteins and adopt a stable structure as a result of the interaction. Accordingly, structures of complexes involving binding sites in IDPs are available. DIBS offers a collection of these complexes in which exactly one protein chain is disordered and all other proteins are ordered (i.e. they are stable in their isolated form, as confirmed by available monomeric structures).

Why aren't there complexes with several disordered partners?

While the majority of known IDP-mediated interactions involve ordered proteins as interaction partners, a range of IDPs are known to form complexes with each other. In these cases structure formation is not dictated by a folded partner but it is mutually determined by all interacting proteins. While the exact molecular mechanisms involved are unclear, the underlying principles are thought to be markedly different. These interactions, where all constituent chains are disordered in isolation, are therefore collected in MFIB, a separate sister-site of DIBS available at http://mfib.enzim.ttk.mta.hu.

What counts as intrinsically disordered/unstructured and what supporting evidence is used?

One partner in every DIBS entry is required to be intrinsically disordered/unstructured. This means that in their monomeric state these protein regions lack a stable tertiary structure, and thus their structure cannot be determined. DIBS only incorporates cases where the disordered nature is supported by experimental data. This can be a direct measurement that proves the IDP status (e.g. NMR, small-angle X-ray scattering, etc.), in which case the protein is classified as ‘Confirmed’. If direct experimental verification of disorder is only available for a close homologue, the entry is labelled as ‘Inferred from homology’. If there is no direct or indirect structural evidence, but the interacting region contains a known functional instance of a linear motif (and is structurally compatible with protein disorder, i.e. the interacting residues form a contiguous segment), the entry is classified as ‘Inferred from motif’. While these motif-based cases have no structural evidence for the lack of structure, most functional linear motifs are found in disordered protein segments, and these cases are therefore highly likely to be true examples of IDP-mediated binding.

Structural or motif-based evidence is collected from disorder/motif-specific datasets (DisProt, IDEAL and ELM) and is complemented with the results of extensive manual literature searches. All relevant evidence supporting the disordered nature of the corresponding protein is listed in the ‘Evidence’ subsection of the entry pages.
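
The three evidence levels described above form a simple priority order. The following is a minimal sketch of that order; the function and parameter names are illustrative assumptions and not part of DIBS itself:

```python
from typing import Optional

def classify_disorder_evidence(
    direct_evidence: bool,
    homologue_evidence: bool,
    known_linear_motif: bool,
    contiguous_segment: bool,
) -> Optional[str]:
    """Return the evidence label for a candidate disordered partner.

    All four flags are hypothetical inputs standing in for the outcome
    of database lookups and manual curation.
    """
    if direct_evidence:
        return "Confirmed"
    if homologue_evidence:
        return "Inferred from homology"
    # Motif-based inference also requires the interacting residues
    # to form a contiguous segment.
    if known_linear_motif and contiguous_segment:
        return "Inferred from motif"
    return None  # no supporting evidence: not eligible for inclusion
```

A candidate with both direct and motif-based support is labelled ‘Confirmed’ here, since the strongest available evidence is checked first.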

What counts as ordered and what supporting evidence is used?

Apart from the one IDP partner, other interacting proteins in each DIBS entry should be ordered. Proofs for order were derived from the PDB. The ordered partner is required to have a determined structure in monomeric form (for at least a close homologue). If the disordered partner interacts with several proteins, then either all partner chains are required to be ordered in isolation, or they should form a stable complex without the disordered chain. In the former case, all proteins are assigned the ‘Ordered’ label. In the latter case (exemplified by PCNA that only forms a stable structure as a trimer), constituent chains of the ordered complex were labelled as ‘Ordered component’.

Order evidence, including links to the PDB structures of the monomeric form of the ordered partner(s), is also included in the ‘Evidence’ subsection of entry pages. The ordered partners typically coincide with known structured Pfam domains. In these cases links to the relevant Pfam pages are also included.

How are DIBS entries created?

Each entry in DIBS describes a specific interaction for which the constituent chains are annotated as either disordered or ordered (see the two sections above). These annotations are complemented by other information about their biological roles and sub-cellular localizations, post-translational modifications, Kd of the interaction (if known), domain type for the ordered partner(s), secondary structures, and a list of similar complexes (see the section about ‘related structures’).

How are biological annotations assigned?

Biological annotations of the complexes in DIBS are taken from the Gene Ontology (GO). All three types of annotations (biological process, cellular component, and molecular function) are assigned if possible. GO terms of the disordered protein are assigned to the whole complex if they match the terms of at least one of the ordered partners. In order to expand the number of annotations, ‘matches’ between GO terms are defined permissively. Two terms are considered to be matching if they are the same, or if they are in a descendant/ancestor relationship and their distance in the ontology is no more than 2 steps. For more information on GO ancestry and the full definitions of the ontology, see the Gene Ontology page.
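
The permissive matching rule can be illustrated on a toy ontology. The sketch below is an assumption-laden illustration, not the DIBS implementation: the term names are placeholders rather than real GO identifiers, every edge is treated as one step regardless of relation type, and all function names are hypothetical.

```python
from collections import deque

# Toy ontology as child -> parents. Placeholder names, not real GO IDs.
PARENTS = {
    "leaf": ["mid"],
    "mid": ["top"],
    "top": ["root"],
    "other": ["root"],
}

def ancestors_within(term, parents, max_steps):
    """Return {ancestor: distance} for ancestors at most max_steps away."""
    found = {}
    queue = deque([(term, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist == max_steps:
            continue  # do not walk further than the allowed distance
        for parent in parents.get(current, ()):
            if parent not in found:  # breadth-first, so first hit is shortest
                found[parent] = dist + 1
                queue.append((parent, dist + 1))
    return found

def terms_match(a, b, parents=PARENTS, max_steps=2):
    """Same term, or ancestor/descendant no more than max_steps apart."""
    if a == b:
        return True
    return (b in ancestors_within(a, parents, max_steps)
            or a in ancestors_within(b, parents, max_steps))

def complex_terms(disordered_terms, ordered_partner_terms, parents=PARENTS):
    """Terms of the disordered protein matching any term of any ordered partner."""
    return sorted(
        term for term in set(disordered_terms)
        if any(terms_match(term, other, parents)
               for partner in ordered_partner_terms
               for other in partner)
    )
```

In this toy ontology "leaf" matches "top" (two steps up) but not "root" (three steps up), and "other" does not match "top" because neither is an ancestor of the other.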

Why aren't there complexes with DNA/RNA/other macromolecules?

The primary focus of DIBS is the collection of complexes where the folding of a disordered protein is coupled to the binding of an ordered protein partner. While there are proteins that adopt a stable structure upon the interaction with DNA/RNA or other molecules (such as lipids or the membrane itself) instead of proteins, such complexes are not included. The primary reason behind this is that protein-protein interactions are markedly different from protein-DNA or protein-RNA interactions and we opted to keep DIBS specific to the former.

I know a certain complex fits the above criteria, but it still isn't included in DIBS. Why?

During the construction of DIBS several databases were integrated (such as PDB, UniProt, Pfam, IDEAL, DisProt and ELM) to provide a means for the systematic collection of protein complexes formed by IDPs and ordered proteins. The resulting complexes were manually curated and complemented with extensive literature searches to widen the coverage of DIBS as much as possible. However, there are undoubtedly complexes that would fit DIBS but are not included yet. If you know of such a complex, please let us know at dibs(at)ttk.mta.hu so we can include it.

Are IDPs in DIBS disordered along their entire length? Or can they contain domains?

Many proteins are modular and contain domains that act mostly independently from each other in a structural sense. Accordingly, inclusion in DIBS only requires that one of the interacting segments found in the complex be disordered in its monomeric form. Other regions of the same protein that do not form part of the complex can be either disordered or ordered, as they do not have a primary effect on the interaction covered by DIBS. The same holds for the ordered partners in the interactions: regions not taking part in the interaction directly can be either ordered or disordered.

While DIBS only concentrates on the directly interacting segments of the partner proteins, it also gives an indication of the extent of the surrounding protein regions. This is found as the 'UniProt coverage' for each protein of the entries. This value describes the fraction of the whole protein that directly contributes to the interaction (and hence is visible in the corresponding structure).
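
As a small illustration of this value, the sketch below computes the covered fraction from residue ranges. It assumes that coverage is expressed as a fraction between 0 and 1 and that the structured regions are given as inclusive (start, end) residue ranges; the function name is hypothetical.

```python
def uniprot_coverage(segments, protein_length):
    """Fraction of the full-length protein covered by the given segments.

    segments: iterable of inclusive (start, end) residue ranges.
    Overlapping ranges are counted only once.
    """
    covered = set()
    for start, end in segments:
        covered.update(range(start, end + 1))
    return len(covered) / protein_length
```

For example, a 20-residue binding segment in a 200-residue protein gives a coverage of 0.1, which indicates that most of the protein lies outside the interaction shown in the entry.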

How are DIBS accessions generated?

Each DIBS entry is assigned a unique accession, which is composed of the letters 'DI' at the beginning, followed by 7 digits. The first digit marks the oligomeric state of the ordered part of the complex. For example, if the interaction is between a single IDP and a single ordered protein, then this number is 1; if the ordered partner is a dimer, then this number is 2; and so on.

The second and third digits contain information about the taxonomic group(s) from which the interacting chains originate. The second digit shows the highest taxonomic group of all chains, with '0' corresponding to human, '1' corresponding to all other eukaryotes, '2' meaning bacteria, '3' meaning archaea and '4' denoting viral proteins. The third digit shows the heterogeneity of the origin species of the interacting chains. It is '0' if all interacting proteins are from the same species, '1' if they cover more than one species but all are from the same taxonomic domain, and '2' if the proteins in the complex cover more than one taxonomic domain. For example, the second and third digits of the entry containing the Doc:Phd toxin-antitoxin dimer from Enterobacteria phage P1 (DI1400002) are '40', as it only contains proteins from a single virus. In contrast, the second and third digits of the Retinoblastoma protein pocket domain in complex with adenovirus E1A CR1 domain (DI1020013) are '02', as it contains a human and a viral protein that belong to different taxonomic domains.

The last four digits form a randomly assigned number that guarantees the uniqueness of the accession.
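
The accession layout described above can be decoded mechanically. The following is a minimal sketch of such a decoder; the class, function and label names are illustrative assumptions, and only the digit meanings are taken from the description above.

```python
import re
from typing import NamedTuple

# Meaning of the second digit (highest taxonomic group of all chains).
TAXON_GROUPS = {
    "0": "human",
    "1": "other eukaryotes",
    "2": "bacteria",
    "3": "archaea",
    "4": "viruses",
}
# Meaning of the third digit (heterogeneity of the source species).
HETEROGENEITY = {
    "0": "single species",
    "1": "several species, one taxonomic domain",
    "2": "several taxonomic domains",
}

class DibsAccession(NamedTuple):
    oligomeric_state: int  # oligomeric state of the ordered part
    taxon_group: str
    heterogeneity: str
    serial: str            # last four digits, kept as text

def parse_dibs_accession(accession: str) -> DibsAccession:
    """Split an accession of the form 'DI' + 7 digits into its parts."""
    match = re.fullmatch(r"DI(\d)(\d)(\d)(\d{4})", accession)
    if match is None:
        raise ValueError(f"not a DIBS accession: {accession!r}")
    state, taxon, hetero, serial = match.groups()
    if taxon not in TAXON_GROUPS or hetero not in HETEROGENEITY:
        raise ValueError(f"unknown taxonomy digits in {accession!r}")
    return DibsAccession(int(state), TAXON_GROUPS[taxon],
                         HETEROGENEITY[hetero], serial)
```

Applied to the two examples above, DI1400002 decodes to a monomeric ordered partner with viral proteins from a single species, and DI1020013 decodes to a monomeric ordered partner with human as the highest taxonomic group and chains from several taxonomic domains.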

Why are certain PDB structures modified?

All protein complexes that are included in DIBS have a solved structure deposited in the PDB. However, in some cases the original PDB structure does not (or does not only) show the biologically relevant, core interaction. To remedy this, in these cases we generated a modified PDB file. A description of the transformations made on the PDB structure is given for each entry where relevant. These transformations can be the omission of protein chains (to reduce possible duplication present in the PDB structure, such as for DI1000039), the generation of protein chains (based on the biomatrices described in the PDB file, e.g. for DI4000001), or truncations of protein chains (to only include regions of proteins that mediate the highlighted interaction, e.g. DI2100002). For each entry the modified PDB files are available for download and are displayed in the embedded structure viewer.

What does 'related structures' mean?

For each complex the PDB was scanned for other highly similar structures, and the PDB IDs of such related structures are provided at the bottom of the entry pages. Two complexes are deemed related (or highly similar) if they contain the same number of proteins, and the proteins from the two structures show a sufficient degree of pairwise similarity, i.e. they belong to the same UniRef90 cluster (the full proteins exhibit at least 90% sequence identity) and contribute roughly the same region to their respective interactions (the two regions from the two proteins share a minimum of 70% overlap).
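
The relatedness criteria can be sketched as follows. Several details are assumptions made for illustration only: overlap is measured relative to the longer of the two regions, regions are inclusive residue ranges in a shared numbering, chains are paired by trying every one-to-one assignment, and the cluster names in the usage note are placeholders.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Chain:
    uniref90: str  # UniRef90 cluster name of the full protein
    start: int     # first residue of the interacting region
    end: int       # last residue of the interacting region (inclusive)

def overlap_fraction(a: Chain, b: Chain) -> float:
    """Shared residues divided by the length of the longer region."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    longer = max(a.end - a.start, b.end - b.start) + 1
    return max(shared, 0) / longer

def chains_similar(a: Chain, b: Chain, min_overlap: float = 0.7) -> bool:
    return a.uniref90 == b.uniref90 and overlap_fraction(a, b) >= min_overlap

def complexes_related(x, y, min_overlap: float = 0.7) -> bool:
    """True if the chains of x and y can be paired so that every pair is similar."""
    if len(x) != len(y):
        return False
    return any(
        all(chains_similar(a, b, min_overlap) for a, b in zip(x, candidate))
        for candidate in permutations(y)
    )
```

With these assumptions, a complex of Chain("UniRef90_X", 1, 100) and Chain("UniRef90_Y", 5, 20) is related to one containing Chain("UniRef90_Y", 6, 21) and Chain("UniRef90_X", 11, 100), regardless of chain order. Trying all permutations is only practical because complexes contain few chains.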

How is redundancy treated in DIBS?

The basis of DIBS is the PDB database which - in certain cases - exhibits a high degree of redundancy. In order to reduce this redundancy, DIBS groups certain complexes that share a high degree of similarity (these are called 'related structures' - see above). From each such group, one complex was chosen as a representative of the interaction based on structure determination methods, quality, and source organism (NMR structures were selected if available; in case of clusters with only X-ray structures, structures with better resolution were selected; and in case of structures with the same quality, proteins from higher order taxonomic groups were favoured over others).
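
The selection order described above can be expressed as a sort key. This is a sketch under stated assumptions rather than the procedure actually used: taxonomic preference is encoded as a hypothetical integer rank in which a lower number means a more favoured group, NMR structures carry no resolution value, and the identifiers used in the usage note are placeholders, not PDB IDs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Candidate:
    structure_id: str            # placeholder identifier
    method: str                  # "NMR" or "X-ray"
    resolution: Optional[float]  # in angstroms; None for NMR
    taxon_rank: int              # assumed: lower value = more favoured group

def selection_key(candidate: Candidate):
    """Order by: NMR first, then better (lower) resolution, then taxonomic rank."""
    not_nmr = candidate.method != "NMR"
    resolution = candidate.resolution if candidate.resolution is not None else 0.0
    return (not_nmr, resolution, candidate.taxon_rank)

def pick_representative(cluster):
    """Return the preferred structure of a group of related structures."""
    return min(cluster, key=selection_key)
```

For a cluster containing Candidate("xray_a", "X-ray", 2.5, 0), Candidate("xray_b", "X-ray", 1.8, 1) and Candidate("nmr_a", "NMR", None, 1), the NMR structure is chosen; without it, "xray_b" wins on resolution. How ties between several NMR structures are resolved is not stated above, so this sketch falls back to the taxonomic rank.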

As the criteria for being considered 'related' are very stringent, some level of redundancy is inevitable. We believe that this amount of redundancy is useful, as it aids the comparison between similar structures emerging through different sequences.

How can I search for similar sequences in DIBS?

At the moment DIBS supports only a limited way to search by sequence similarity. While no input sequence can be submitted to the server, all entries are annotated with UniRef90 cluster names and the search field accepts these cluster names as search terms. For example, using the search term ‘UniRef90_Q71DI3’ (the UniRef90 cluster ID for human histone H3.2 and its close homologues), the DIBS server returns complexes including human, murine and Drosophila histones as well.

How are ordered domain types defined/assigned?

DIBS entries are grouped according to the domain type of the ordered interacting partner. This assignment primarily relies on Pfam domain definitions (where available) and conventions used in the literature. All domain types are assigned during the manual curation step to ensure reliability. The domain types are marked for each entry and are listed for all interactions in DIBS on the ProteinMap page.

Can I use DIBS for my work?

DIBS is freely available for use in academic research - we only ask that you cite DIBS if it contributes substantially to your project. Please use the reference below:

Eva Schad, Erzsébet Fichó, Rita Pancsa, István Simon, Zsuzsanna Dosztányi and Bálint Mészáros:
DIBS: a repository of disordered binding sites mediating interactions with ordered proteins
Bioinformatics. 2018 February 1; 34(3):535-537
PMID: 29385418
doi: 10.1093/bioinformatics/btx640

If you would like to use DIBS in a non-academic environment, please contact us at dibs(at)ttk.mta.hu