Harvest-Seq is a toolkit for manipulating protein sequences, sequence alignments, phylogenetic trees, and the information associated with protein sequences (identifiers, function names, and references to the literature). In addition, conserved residues and motifs for a protein family can be visualized on a representative protein structure, either in Jmol or by loading a VMD state file created by Harvest-Seq. Other bioinformatics analyses are planned. Harvest-Seq can be used through the web interface; a downloadable command-line program is coming soon.

Motivation: It is now common to find hundreds of protein sequences belonging to the same protein family or superfamily. One approach to handling this much data is to define a smaller representative set, but if care is not taken, this can lead to less robust or even erroneous conclusions. Moreover, even a representative set may still contain a large number of sequences. Harvest-Seq was developed to leverage the vast amount of data generated by genome sequencing projects and to support collaboration on large-scale, sequence-based bioinformatics research (from roughly 20 to thousands of sequences). Organizing the data produced by such projects can be overwhelming for researchers, but done properly it can reveal evolutionary relationships and protein structure-function relationships that are useful for protein design and rational drug design.