CD-HIT is a widely used program for clustering and comparing protein or nucleotide sequences. Originally developed in Dr. Adam Godzik's Lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute), CD-HIT is very fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks, and it aids in understanding the data structure and correcting bias within a dataset.





  • Bridges

To see what versions of CD-HIT are available, type:

module avail cd-hit

To see what other modules are needed, what commands are available, and how to get additional help, type:

module help cd-hit

To use CD-HIT, include a command like this in your batch script to load the CD-HIT module:

module load cd-hit

Be sure you also load any other modules needed, as listed by the module help cd-hit command.
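
As an illustration, the module load step might sit inside a batch script like the sketch below. This is a minimal example under stated assumptions, not a PSC-provided template: the Slurm directives (one node, 30 minutes), the input file proteins.fasta, and the output name proteins_nr90 are hypothetical placeholders. The cd-hit options shown are -i (input FASTA), -o (output name), -c (sequence identity threshold), and -n (word length). The guards let the script exit gently if the module command, the cd-hit binary, or the input file is missing.

```shell
#!/bin/bash
#SBATCH -N 1
#SBATCH -t 00:30:00

# Hypothetical file names; replace them with your own data.
INPUT="proteins.fasta"
OUTPUT="proteins_nr90"

# Load the CD-HIT module if the module command is available in this shell.
if command -v module >/dev/null 2>&1; then
    module load cd-hit
fi

# Cluster at 90% sequence identity (-c 0.9) using word length 5 (-n 5).
CDHIT_CMD="cd-hit -i $INPUT -o $OUTPUT -c 0.9 -n 5"
echo "Running: $CDHIT_CMD"

# Run only when both the program and the input file are present.
if command -v cd-hit >/dev/null 2>&1 && [ -f "$INPUT" ]; then
    $CDHIT_CMD
else
    echo "cd-hit or $INPUT not found; skipping run" >&2
fi
```

If the run succeeds, CD-HIT writes the representative sequences to the output file and the cluster listing to a file with the same name plus a .clstr extension.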

User Information

Connect to PSC systems:
Technical questions:

Send mail to PSC or call the PSC hotline: 412-268-6350.