University of Nebraska–Lincoln

ABOUT AllergenOnline

The Food Allergy Research and Resource Program (FARRP) Protein AllergenOnline Database was updated to version 8.0 in January 2008. Version 8.0 contains a comprehensive list of 1313 unique sequence entries of known and putative allergenic proteins (food, airway, venom/salivary and contact allergens). Some of the allergenic wheat gliadins and glutenins may also cause celiac disease; these are listed if there is evidence of IgE binding. This version replaces version 7.0 (1251 sequences), which was posted in January 2007. Sequences and other reference information for version 8.0 were downloaded in June 2007 and compared with all sequences contained in version 7.0, including those listed as allergens and putative allergens and those that were not listed because they were judged to have insufficient evidence to be defined as allergens. Duplicate entries were resolved and corrections due to updates in NCBI entries were entered, yielding a unique list of 2256 sequences. These 2256 sequences were divided into groups based on taxonomic identity (genus and species) and on sequence comparison, classifying the sequences into homologues similar to the isoallergen concept of the IUIS (International Union of Immunological Societies). Source and sequence information, as well as data from allergy studies, were gathered from AllergenOnline version 7.0, IUIS, Allergome, PubMed and individual publications. This information was reviewed for each group of sequences as described below. The resulting database includes 1313 sequences from 229 species, clustered into 483 protein groups, for which the panel judged the evidence sufficient to call them allergens or putative allergens. The remaining sequences were judged to have insufficient evidence to suggest that they cause allergy.

REMOVAL OF "FALSE" ENTRIES

Sequences that are only associated with an institute of allergy, or that are listed merely as hypothetical translation products, "homologues" of, or "similar to" an allergen, were not included in version 8.0 unless additional information, such as the source or IgE binding, suggested that the protein might be an allergen. Sequences from model genomes that are not reasonably associated with allergens (e.g. Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans) were not included. However, proteins from allergenic species that are also genetic models (rice, mice and corn) are included when there is information suggesting allergenicity. Proteins that are obviously only associated with an allergic response (e.g. cytokines, chemokines, transcription factors) were not included. The review was carried out by a panel of academic experts (see the peer review panelists listed below and on the Home page). The final candidate list of 2256 entries reviewed for version 8.0 was categorized into related taxonomic/protein homologue sequence groups (similar to the IUIS concept of isoallergens). The panel reviewed published studies reporting the "proof" of allergy associated with each sequence group and categorized the groups, leading to the final list of 1313 sequences included in version 8.0.
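The panel applied these exclusion rules through expert review, not software. Purely to illustrate how the rules combine, the following is a hypothetical pre-screen in Python. The `Candidate` fields, the keyword patterns and the organism list are assumptions drawn from the examples in the text, and the "institute of allergy" rule is not modelled.

```python
import re
from dataclasses import dataclass

# Hypothetical, non-exhaustive lists taken from the examples in the text.
NON_ALLERGENIC_MODELS = {
    "Drosophila melanogaster",
    "Arabidopsis thaliana",
    "Caenorhabditis elegans",
}
# Descriptions that only name a protein as hypothetical or as resembling an allergen.
UNINFORMATIVE = re.compile(r"\b(hypothetical|homolog(ue)?|similar to)\b", re.IGNORECASE)
# Proteins merely associated with the allergic response.
RESPONSE_MEDIATORS = re.compile(r"\b(cytokine|chemokine|transcription factor)\b", re.IGNORECASE)


@dataclass
class Candidate:
    description: str               # e.g. an NCBI definition line (assumed field)
    species: str
    has_supporting_evidence: bool  # e.g. allergenic source or IgE-binding data


def keep_for_review(c: Candidate) -> bool:
    """Return True if the candidate should go forward to panel review."""
    if c.species in NON_ALLERGENIC_MODELS:
        return False
    if RESPONSE_MEDIATORS.search(c.description):
        return False
    # "Hypothetical"/"similar to" entries survive only with extra evidence.
    if UNINFORMATIVE.search(c.description) and not c.has_supporting_evidence:
        return False
    return True
```

Allergenic model species such as rice are deliberately absent from the exclusion set, matching the exception described above.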

PEER REVIEW PROCESS FOR CATEGORIZING SEQUENCE GROUPS: PROOF OF ALLERGENICITY

Goal: To curate the list of sequences included in AllergenOnline version 8.0 and future updates, including only protein sequences supported by evidence that the protein is a proven allergen, or by substantial proof of allergy to the source of the protein together with immunoglobulin E (IgE) binding to the specific protein by sera from individuals allergic to the source.

Rationale: The AllergenOnline database is intended for use as a tool for evaluating the safety of proteins introduced into foods through processing or genetic modification. The Codex Alimentarius Guidelines (2003) established a process for evaluating potential allergenicity based on evidence that the protein is likely to cause allergic reactions in consumers. A key component of the evaluation process is comparison of the sequences of candidate proteins with those of known allergens, using local alignment tools such as FASTA or BLASTP, to identify proteins that would require further testing by serum IgE binding and/or clinical testing to evaluate safety. It is therefore important to have scientific evidence that the database entries are allergens or probable (putative) allergens in order to maximize the reliability of bioinformatics searches.

Peer Review Process: In 2005, FARRP brought together a panel of seven food allergy experts to define criteria for inclusion in future versions of the database. A protocol was developed for selecting sequences for consideration, classifying sequences into groups (allergen, putative allergen, insufficient evidence), collecting publications for review, providing information to reviewers and, finally, voting to accept sequences as allergens or putative allergens.

Criteria for three classes of assignment were agreed upon. An Allergen is a protein that has been demonstrated to specifically bind IgE in sera from individuals with clear allergies to the source of the gene/protein and that, further, causes basophil activation or histamine release, skin test reactivity or challenge test reactivity in subjects allergic to the source. A Putative Allergen is a protein that meets most of the criteria for an allergen but lacks one component, usually biological activity (basophil activation or in vivo reactivity). Both Allergens and Putative Allergens are retained in the list of sequence-searchable protein entries in version 8.0. Proteins in the third category, Insufficient Evidence of Allergenicity, have not been retained in the sequence-searchable protein list because they were judged to lack critical evidence of specific IgE binding; in addition, the serum donors had not been demonstrated to be allergic to the source, and no allergic biological activity had been demonstrated for the protein. The proteins categorized as "Insufficient" are kept in a separate list for future annual reviews as new candidate allergens are identified from NCBI and the published literature. If new evidence suggests a need for reclassification, "Insufficient" proteins would be included in future versions of the database through the annual review process.
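The three categories can be read as a decision rule. The sketch below is a simplification, assuming each kind of evidence reduces to a yes/no flag and that the missing component for a putative allergen is always biological activity (the text says "usually"). The `Evidence` fields are hypothetical; the panel's actual decisions relied on expert judgment of the literature.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    ALLERGEN = "Allergen"
    PUTATIVE = "Putative allergen"
    INSUFFICIENT = "Insufficient evidence"


@dataclass
class Evidence:
    ige_binding: bool          # specific IgE binding demonstrated
    donors_allergic: bool      # serum donors clearly allergic to the source
    biological_activity: bool  # basophil/histamine, skin test or challenge reactivity


def classify(ev: Evidence) -> Category:
    """Simplified reading of the panel's three-way criteria."""
    if ev.ige_binding and ev.donors_allergic and ev.biological_activity:
        return Category.ALLERGEN
    if ev.ige_binding and ev.donors_allergic:
        # Most criteria met; only biological activity is missing.
        return Category.PUTATIVE
    return Category.INSUFFICIENT
```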

The amount and quality of published objective data supporting the classification of various proteins as allergens vary widely. For many food, airway or contact allergens there are unquestionable objective data on the identity, characterization and purity of the protein, and clear evidence that human subjects with relevant allergic histories and symptoms were tested and reacted upon challenge, or at least clear evidence of specific IgE binding. However, a number of proteins are labeled as allergens in the literature or in the NCBI sequence database for which there are insufficient objective data characterizing the protein used in testing, or demonstrating human reactivity or specific IgE binding. The objective of this peer review process is to review the collective literature for individual proteins and classify them according to common criteria.

An initial review of all groups was performed by Dr. Goodman at FARRP. Each sequence group was then assigned to a primary reviewer (one of the other six panelists), who compiled any additional information and gave an opinion on classification. A secondary reviewer (one of the remaining five panelists) reviewed all data, including the information from the primary reviewer, added any further information and gave an opinion. Finally, all seven experts reviewed the classification of entries in the dataset. The final review of the 1313 sequences in the AllergenOnline version 8.0 database was completed in December 2007. All sequence and classification data are archived by FARRP at the University of Nebraska.

PEER REVIEW PANEL

Ebisawa, Motohiro, MD, Pediatric Allergy, National Sagamihara Hospital, Japan
Ferreira, Fatima, PhD, University of Salzburg, Austria
Goodman, Rick, PhD, FARRP, University of Nebraska, USA
Sampson, Hugh, MD, Pediatric Allergy, Mount Sinai Medical Center, New York, USA
Taylor, Steve, PhD, FARRP, University of Nebraska, USA
van Ree, Ronald, PhD, University of Amsterdam, The Netherlands
Vieths, Stefan, PhD, Paul-Ehrlich-Institut, Germany

ALLERGEN DATABASE SEARCH ROUTINES

This website includes a sequence comparison routine, FASTA (Pearson and Lipman, 1988), which may be used to compare a protein sequence (the query sequence) with entries in the allergen database. This version of the FASTA search interface uses the FASTA3 algorithm (Pearson, 2000). The purpose of the comparison is to evaluate whether the query protein sequence is identical to, or homologous with, known or putative allergens in the database. Alignments with high identity scores may indicate a potential for allergenic cross-reactions. However, there are not sufficient scientific data to establish a simple scoring boundary (E-score or percent identity) above which cross-reactivity is certain or below which cross-reactivity is not possible. Based on historical data, cross-reactivity is not likely for proteins with less than 50% identity over the entire protein sequence, and is fairly common above 70% identity (Aalberse, 2000). In most cases, experimental studies would be needed to confirm that two proteins with similar sequences cause allergic cross-reactions. A review of the literature on the matched allergen would help to identify appropriately allergic study subjects.
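To make the identity bands concrete, here is a minimal Python sketch, assuming the two sequences are already aligned (for example, taken from FASTA output) with `-` marking gaps. Counting gap columns in the denominator is one common convention; tools differ, so this is not necessarily how FASTA3 reports identity. The band labels paraphrase Aalberse (2000) and are not a safety boundary.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns holding identical residues.

    Gap columns ('-') count in the denominator, so gaps lower the score.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("empty alignment")
    identical = sum(
        1 for a, b in zip(aligned_query, aligned_subject) if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_query)


def cross_reactivity_note(identity: float) -> str:
    """Rough reading of full-length identity using the historical bands."""
    if identity < 50.0:
        return "cross-reactivity not likely"
    if identity > 70.0:
        return "cross-reactivity fairly common"
    return "no general rule; experimental studies needed"
```

For example, `percent_identity("MKT-LV", "MKTALV")` counts 5 identical columns out of 6 (about 83%), which falls in the "fairly common" band.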

In addition to the full-length FASTA search, we have added an option to automatically scan each possible 80 amino acid segment (1-80, 2-81, 3-82, etc.) of the query protein against the AllergenOnline database, looking for matches of at least 35% identity. The 35% identity criterion for 80 amino acid segments was recommended in a scientific advisory to regulators for evaluating proteins in genetically modified crops (see FAO/WHO 2001 and Codex 2003). This 80 amino acid segment matching routine appears to be quite conservative and precautionary, as discussed in Goodman et al. (2005) and Goodman and Hefle (2005). However, the 80 amino acid segment search appears far more likely to be informative than a search for short identical segments of 6 or 8 contiguous amino acids, as originally recommended by Metcalfe et al. (1996) or in the FAO/WHO 2001 approach, based on evaluations by Hileman et al. (2002) and Silvanovich et al. (2006). See also the summary report from the bioinformatics workshop on evaluating potential allergenicity (Goodman, 2006).
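The segment scan and the older short-match screen can be sketched in Python. This is an illustration under stated assumptions, not the site's implementation. The site aligns each 80-residue window with FASTA; here `ungapped_identity` is a simplified, gap-free stand-in, and `flag_windows` accepts any other identity function. Treating a query shorter than 80 residues as a single segment is also an assumption.

```python
from typing import Callable, Dict, Iterator, List, Set, Tuple

WINDOW = 80       # segment length (FAO/WHO 2001; Codex 2003)
THRESHOLD = 35.0  # minimum percent identity flagged for a segment


def windows(seq: str, size: int = WINDOW) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, segment) for 1-80, 2-81, 3-82, ..."""
    if len(seq) <= size:
        yield 1, seq  # assumption: short queries are scanned whole
        return
    for start in range(len(seq) - size + 1):
        yield start + 1, seq[start:start + size]


def ungapped_identity(segment: str, target: str) -> float:
    """Stand-in for a FASTA alignment: best identity of `segment` placed
    without gaps at every offset of `target`."""
    n = len(segment)
    if n == 0:
        return 0.0
    best = 0
    for off in range(len(target) - n + 1):
        same = sum(a == b for a, b in zip(segment, target[off:off + n]))
        best = max(best, same)
    return 100.0 * best / n


def flag_windows(
    query: str,
    allergens: Dict[str, str],
    identity: Callable[[str, str], float] = ungapped_identity,
    threshold: float = THRESHOLD,
) -> List[Tuple[int, str, float]]:
    """Return (window start, allergen name, % identity) for every hit."""
    hits = []
    for start, seg in windows(query):
        for name, seq in allergens.items():
            pid = identity(seg, seq)
            if pid >= threshold:
                hits.append((start, name, pid))
    return hits


def shared_exact_segments(query: str, allergen: str, k: int = 8) -> Set[str]:
    """Identical runs of k contiguous residues (the older 6- or 8-mer screen)."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    return q & {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
```

A 100-residue query yields 21 overlapping windows (starts 1 through 21). Because every window is tested against every entry, one stretch of similarity usually produces several consecutive hits. This is part of why the window search tends to be precautionary.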

REFERENCES

For bioinformatic analysis:

    • Pearson, W.R. and Lipman, D.J. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85:2440-2448.
    • Pearson, W.R. 2000. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132:185-219.

For protein sequence (structure) and allergenicity:

    • Aalberse, R.C. 2000. Structural biology of allergens. J. Allergy Clin. Immunol. 106:228-238.
    • Aalberse, R. C., and Stapel, S. O. 2001. Structure of food allergens in relation to allergenicity. Pediatr Allergy Immunol 12:10-4.
    • Codex Alimentarius Commission. 2003. Alinorm 03/34: Appendix III. Guideline for the conduct of food safety assessment of foods derived from recombinant DNA plants. Annex IV. Annex on the assessment of possible allergenicity. Rome, Italy.
    • Doolittle, R.F. in Methods in Enzymology Vol. 183. Molecular evolution: Computer analysis of protein and nucleic acid sequences, R. F. Doolittle, Ed. (Academic Press, Inc., San Diego, 1990), chap. 6.
    • FAO/WHO 2001. Evaluation of allergenicity of genetically modified foods derived from biotechnology. Rome, Italy.
    • Goodman, RE, Hefle, SL. 2005. Gaining perspective on the allergenicity assessment of genetically modified food crops. Expert Rev. Clin. Immunol. 1(4):561-578.
    • Goodman, RE, Hefle, SL, Taylor SL, van Ree, R. 2005. Assessing genetically modified crops to minimize the risk of increased food allergy. Int. Arch. Allergy Immunol. 137(2):153-166.
    • Goodman RE. 2006. Practical and predictive bioinformatics methods for the identification of potentially cross-reactive protein matches. Mol Nutr Food Res 50:655-660.
    • Goodman RE, Vieths S, Sampson HA, Hill D, Ebisawa M, Taylor SL, van Ree R. 2008. Allergenicity assessment of genetically modified crops – what makes sense? Nat. Biotechnol. 26(1):73-81.
    • Hileman, R.E., Silvanovich, A., Goodman, R.E., Rice, E.A., Holleschak, G., Astwood, J.D. and Hefle, S.L. 2002. Bioinformatic methods of allergenicity assessment using a comprehensive allergen database. Int. Arch. Allergy Immunol. 128:280-291.
    • Ladics GS, Bannon GA, Silvanovich A, Cressman RF. 2007. Comparison of conventional FASTA identity searches with the 80 amino acid sliding window FASTA search for the elucidation of potential identities to known allergens. Mol Nutr Food Res 51(8):985-998.
    • Metcalfe, D. D., Astwood, J. D., Townsend, R., Sampson, H. A., Taylor, S. L., and Fuchs, R. L. 1996. Assessment of the allergenic potential of foods derived from genetically engineered crop plants. Crit Rev Food Sci Nutr 36 Suppl:S165-86.
    • Silvanovich A, Nemeth MA, Song P, Herman R, Tagliani L, Bannon GA. 2006. The value of short amino acid sequence matches for prediction of protein allergenicity. Toxicol. Sci. 90(1):252-258.
    • Thomas K, Bannon G, Hefle S, Herouet C, Holsapple M, Ladics G, MacIntosh S, Privalle L. In silico methods for evaluating human allergenicity of novel proteins: International Bioinformatics Workshop Meeting Report: Mallorca, Spain, February 23-24, 2005. Toxicol. Sci.

Additional or alternative bioinformatics tools and databases may also be useful for the evaluation of potential allergens (see also LINKS):

    • Brusic, V., Petrovsky, N., Gendel, S.M., Millot, M., Gigonzac, O., Stelman, S.J. 2003. Computational tools for the study of allergens. Allergy 58:1083-1092.
    • Brusic, V., Petrovsky, N., Gendel, S.M., Millot, M., Gigonzac, O., Stelman, S.J. 2003. Allergen databases. Allergy 58:1093-1100.
    • Kleter, G.A. and Peijnenburg, A.A.C.M. 2002. Screening of transgenic proteins expressed in transgenic food crops for the presence of short amino acid sequences identical to potential, IgE-binding linear epitopes of allergens. BMC Structural Biology 2:8.
    • Ivanciuc, O., Schein, C.H., Braun, W. 2003. SDAP: database and computational tools for allergenic proteins. Nuc. Acids Res. 31:359-362.
    • Malandain, H. 2004. Basic immunology, allergen prediction and bioinformatics. Allergy 59:1011-1012.
    • Stadler, M.B. and Stadler, B.M. 2003. Allergenicity prediction by protein sequence. FASEB J. 17:1141-1143.