Artemis is a free genome browser and annotation tool that allows visualization of sequence features, next generation data and the results of analyses within the context of the sequence, and also its six-frame translation.
Artemis is written in Java, and is available for UNIX, Macintosh and Windows systems. It can read EMBL and GENBANK database entries or sequence in FASTA, indexed FASTA or raw format. Other sequence features can be in EMBL, GENBANK or GFF format.
Please download the Artemis package manual.
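To make the term "six-frame translation" concrete, here is a minimal Python sketch that translates a DNA sequence in the three forward and three reverse-complement reading frames. It is an illustration only and is not Artemis code: the function names and the frame labels (`+1` to `-3`) are assumptions chosen for this example, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed for this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Map hypothetical frame labels '+1'..'+3' and '-1'..'-3' to protein strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are shown as `*`. A genome browser would normally draw these six strings aligned under the nucleotide sequence; the sketch only computes them.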
Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample. Bracken uses the taxonomy labels assigned by Kraken, a highly accurate metagenomics classification algorithm, to estimate the number of reads originating from each species present in a sample. Kraken classifies reads to the best matching location in the taxonomic tree, but does not estimate abundances of species. Bracken uses the Kraken database itself to derive probabilities that describe how much sequence from each genome is identical to other genomes in the database, and combines this information with the assignments for a particular sample to estimate abundance at the species level, the genus level, or above. Combined with the Kraken classifier, Bracken produces accurate species- and genus-level abundance estimates even when a sample contains two or more near-identical species.
Bracken is open source software under the GNU General Public License v3.0.
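The re-estimation idea described above can be sketched with Bayes' rule. The following Python example is a toy under stated assumptions, not Bracken's implementation: the function name, its arguments, and the numbers are hypothetical. It assumes we already know, for each species, the probability that one of its reads is classified only at the genus level, and it redistributes the genus-level reads accordingly.

```python
def redistribute_genus_reads(species_reads, genus_reads, p_genus_given_species):
    """Toy Bayesian redistribution of genus-level reads to species.

    species_reads: reads classified directly to each species (used as the prior).
    genus_reads: number of reads that could only be classified at the genus level.
    p_genus_given_species: hypothetical probability that a read from each
        species is classified at the genus level (in a real tool this would
        be derived from the reference database).
    """
    total = sum(species_reads.values())
    if total == 0:
        return dict(species_reads)
    # Unnormalised posterior: P(genus | species) * P(species)
    weights = {
        s: p_genus_given_species[s] * species_reads[s] / total
        for s in species_reads
    }
    norm = sum(weights.values())
    if norm == 0:
        return dict(species_reads)
    # Each species keeps its own reads and gains its share of the genus reads.
    return {
        s: species_reads[s] + genus_reads * weights[s] / norm
        for s in species_reads
    }


if __name__ == "__main__":
    estimate = redistribute_genus_reads(
        species_reads={"species_A": 60, "species_B": 20},
        genus_reads=20,
        p_genus_given_species={"species_A": 0.5, "species_B": 0.5},
    )
    print(estimate)
```

With equal ambiguity for both species, the 20 genus-level reads are split 3:1, following the species-level read counts.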
Dendroscope is an interactive program written in Java for viewing phylogenetic trees. It is designed to view trees of all sizes and is very useful for creating figures. Dendroscope can be used for a variety of analyses of molecular data sets but is particularly designed for metagenomics or analyses of uncultured environmental samples.
Dendroscope was developed by Daniel Huson and his colleagues at the University of Tübingen in Germany.
Please download the Dendroscope package manual.
FastQC aims to provide a simple way to do some quality control checks on raw sequence
data coming from high throughput sequencing pipelines. It provides a modular set of
analyses which you can use to give a quick impression of whether your data has any
problems of which you should be aware before doing any further analysis.
The main functions of FastQC are
- Import of data from BAM, SAM or FastQ files (any variant)
- Providing a quick overview to tell you in which areas there may be problems
- Summary graphs and tables to quickly assess your data
- Export of results to an HTML based permanent report
- Offline operation to allow automated generation of reports without running the interactive application
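As an illustration of the kind of check a quality-control tool performs, the sketch below computes the mean Phred quality at each read position from FASTQ quality strings. It is not FastQC code: the function name is hypothetical, and the Phred+33 encoding is an assumption (other FASTQ variants use a different offset).

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    quality_strings: iterable of FASTQ quality lines (one per read).
    offset: ASCII offset of the encoding; 33 is assumed here.
    Reads may have different lengths; each position is averaged
    over the reads that are long enough to cover it.
    """
    sums, counts = [], []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(sums):
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '5' encodes Phred 20 under Phred+33.
    print(mean_quality_per_position(["II5", "I5"]))
```

A drop in these means towards the end of the reads is a typical sign that trimming may be needed before further analysis.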
HMMER is a free and commonly used software package for sequence analysis written by Sean Eddy. Its general usage is to identify homologous protein or nucleotide sequences. It does this by comparing a profile-HMM to either a single sequence or a database of sequences. Sequences that score significantly better to the profile-HMM compared to a null model are considered to be homologous to the sequences that were used to construct the profile-HMM. Profile-HMMs are constructed from a multiple sequence alignment in the HMMER package using the hmmbuild program.
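The idea of scoring a sequence against a profile relative to a null model can be sketched as a log-odds score. The Python example below is a deliberate simplification and does not use HMMER: it builds a position-specific probability table from an ungapped alignment (no insert or delete states, so it is not a full profile-HMM), and compares it with a uniform null model. All names and the pseudocount value are assumptions for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_profile(alignment, pseudocount=1.0):
    """Per-column residue probabilities from an ungapped alignment."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return profile


def log_odds_score(profile, seq, null=None):
    """Sum of log2(P(residue | profile column) / P(residue | null model))."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    if null is None:
        null = {a: 1.0 / len(ALPHABET) for a in ALPHABET}
    return sum(math.log2(col[ch] / null[ch]) for col, ch in zip(profile, seq))


if __name__ == "__main__":
    profile = build_profile(["ACGT", "ACGT", "ACGA"])
    print(log_odds_score(profile, "ACGT"))  # positive: fits the profile
    print(log_odds_score(profile, "TTTT"))  # negative: fits the null model better
```

A positive score means the sequence is better explained by the profile than by the null model. Deciding whether a score is statistically significant is a separate step that this sketch does not attempt.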
Jmol is a free, open source molecule viewer for students, educators, and researchers in chemistry and biochemistry. It is cross-platform, running on Windows, Mac OS X, and Linux/Unix systems.
What Jmol can do:
- Free, open-source software licensed under the GNU Lesser General Public License
- Applet, Application, and Systems Integration Component
  - The JmolApplet is a web browser applet that can be integrated into web pages. It is ideal for development of web-based courseware and web-accessible chemical databases. The JmolApplet provides an upgrade path for users of the Chime plug-in.
  - The Jmol application is a standalone Java application that runs on the desktop.
  - The JmolViewer can be integrated as a component into other Java applications.
- Cross-platform (Windows, Mac OS X, Linux / Unix)
- Supports all major web browsers
- High-performance 3D rendering with no hardware requirements
- Accepts many file formats
- Animations, Vibrations, Surfaces, Orbitals
- Support for unit cell and symmetry operations
- Schematic shapes for secondary structures in biomolecules
- Measurements: distance, angle, torsion angle
- Support for RasMol/Chime scripting language
- Exports to jpg, png, gif, ppm, pdf, POV-Ray, Gaussian, Maya, vrml, x3d, idtf, web page
Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
Kraken is open source software under the GNU General Public License v3.0.
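The principle of classifying reads by exact k-mer matches can be sketched in a few lines of Python. This is a toy and not Kraken's algorithm: it uses a flat set of taxa and a simple majority vote, whereas Kraken places reads within a taxonomic tree. The function names, the reference sequences, and the choice of k are assumptions for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to its taxon; k-mers shared between taxa map to None."""
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index[km] = taxon if index.get(km, taxon) == taxon else None
    return index


def classify(read, index, k):
    """Return the taxon with the most exact k-mer hits, or None if no hits."""
    votes = Counter(
        index[km] for km in kmers(read, k) if index.get(km) is not None
    )
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    refs = {"taxon_A": "ACGTACGTGG", "taxon_B": "TTGACCATGA"}
    idx = build_index(refs, k=4)
    for read in ("CGTACG", "GACCAT", "AAAAAA"):
        print(read, classify(read, idx, k=4))
```

Because lookups are exact dictionary hits rather than alignments, classification time grows with the number of k-mers in the read, which is the source of the speed advantage described above.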
Mauve is a system for efficiently constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. Multiple genome alignment provides a basis for research into comparative genomics and the study of evolutionary dynamics. Aligning whole genomes is a fundamentally different problem than aligning short sequences.
Mauve has been developed with the idea that a multiple genome aligner should require only modest computational resources. It employs algorithmic techniques that scale well in the amount of sequence being aligned. For example, a pair of Y. pestis genomes can be aligned in under a minute, while a group of 9 divergent Enterobacterial genomes can be aligned in a few hours.
MEGAN (MEtaGenome ANalyzer) provides tools for optimized analysis of large metagenomic datasets.
Metagenomics is the analysis of genomic sequences from a usually uncultured environmental sample. A long-term goal of most metagenomic studies is to inventory and measure the extent and role of microbial biodiversity in the ecosystem, prompted by discoveries that the diversity of microbial organisms and viral agents in the environment is far greater than previously estimated. Tools such as MEGAN, which allow the investigation of very large data sets from environmental samples, in particular those produced by shotgun sequencing, are designed to sample and investigate the unknown biodiversity of environmental samples where more precise techniques that rely on smaller, better-known samples cannot be used.
Mothur seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.
Mothur is open source software under the GNU General Public License.
mothur_krona_XML.py: A script converting mothur’s taxonomy summaries into Krona-compatible XML files.
MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
Program features include:
- A common command-line interface across Macintosh, Windows, and UNIX operating systems;
- Extensive help available from the command line;
- Analysis of nucleotide, amino acid, restriction site, and morphological data;
- Mixing of data types, such as molecular and morphological characters, in a single analysis;
- Easy linking and unlinking of parameters across data partitions;
- An abundance of evolutionary models, including 4 X 4, doublet, and codon models for nucleotide data and many of the standard rate matrices for amino acid data;
- Estimation of positively selected sites in a fully hierarchical Bayesian framework;
- Full integration of the BEST algorithms for the multi-species coalescent;
- Support for complex combinations of positive, negative, and backbone constraints on topologies;
- Model jumping across the GTR model space and across fixed rate matrices for amino acid data;
- Monitoring of convergence during the analysis, and access to a wide range of convergence diagnostics tools after the analysis has finished;
- Rich summaries of posterior samples of branch and node parameters printed to majority rule consensus trees in FigTree format;
- Implementation of the stepping-stone method for accurate estimation of model likelihoods for Bayesian model choice using Bayes factors;
- The ability to spread jobs over a cluster of computers using MPI (for Macintosh (OS X) and UNIX environments only);
- Support for the BEAGLE library, resulting in dramatic speedups for codon and amino acid models on compatible hardware (NVIDIA graphics cards);
- Checkpointing across all models, allowing the user to seamlessly extend a previous analysis or recover from a system crash.
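To show what "using MCMC to estimate a posterior distribution" means in practice, here is a minimal Metropolis sampler in Python. It is a one-parameter toy, not MrBayes code and not a phylogenetic model: it samples the probability `p` of a binary outcome given a hypothetical data set of 7 successes in 10 trials under a uniform prior. The function names, step size, and seed are assumptions for illustration.

```python
import math
import random


def log_posterior(p, successes, trials):
    """Unnormalised log posterior: binomial likelihood with a uniform prior on (0, 1)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def metropolis(successes, trials, n_samples=20000, step=0.2, seed=1):
    """Draw correlated samples of p with a symmetric uniform proposal."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, successes, trials)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.uniform(-step, step)
        proposal_lp = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


if __name__ == "__main__":
    samples = metropolis(successes=7, trials=10)
    kept = samples[2000:]  # discard burn-in
    print("posterior mean estimate:", sum(kept) / len(kept))
    print("analytical posterior mean:", (7 + 1) / (10 + 2))
```

For this toy the posterior is known analytically (a Beta distribution), so the sampled mean can be checked against (7 + 1) / (10 + 2). In a phylogenetic analysis no such closed form exists, which is why convergence monitoring of the kind listed above matters.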
MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. For example, MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the 100s or 1000s of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system. If the species are too divergent for a DNA sequence alignment to detect similarity, then the PROmer program can generate alignments based upon the six-frame translations of both input sequences.
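The task of finding all exact matches of a minimum length between two sequences can be sketched with a simple seed-and-extend approach. The Python example below is an illustration only: it uses a dictionary of k-mers, which is an assumption made for brevity and is not the data structure MUMmer uses, and it would not scale to whole genomes. The function name and the test sequences are hypothetical.

```python
from collections import defaultdict


def exact_matches(ref, query, min_len):
    """Return sorted (ref_start, query_start, length) for every maximal
    exact match of at least min_len characters (0-based positions)."""
    # Index every min_len-mer of the reference by its start position.
    index = defaultdict(list)
    for i in range(len(ref) - min_len + 1):
        index[ref[i:i + min_len]].append(i)

    matches = []
    for j in range(len(query) - min_len + 1):
        for i in index.get(query[j:j + min_len], ()):
            # A seed that can be extended to the left belongs to a match
            # that is reported from an earlier seed, so skip it here.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            length = min_len
            while (
                i + length < len(ref)
                and j + length < len(query)
                and ref[i + length] == query[j + length]
            ):
                length += 1
            matches.append((i, j, length))
    return sorted(matches)


if __name__ == "__main__":
    ref = "TTACGTACGGA"
    query = "CCACGTACGTT"
    for ref_start, query_start, length in exact_matches(ref, query, min_len=5):
        print(ref_start, query_start, ref[ref_start:ref_start + length])
```

Each reported match cannot be extended in either direction without introducing a mismatch. Matches that occur more than once are all reported, so this sketch does not filter for uniqueness.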
Pilon is a software tool which can be used to:
- Automatically improve draft assemblies
- Find variation among strains, including large event detection
Pilon requires as input a FASTA file of the genome along with one or more BAM files of reads aligned to the input FASTA file. Pilon uses read alignment analysis to identify inconsistencies between the input genome and the evidence in the reads. It then attempts to make improvements to the input genome, including:
- Single base differences
- Small indels
- Larger indel or block substitution events
- Gap filling
- Identification of local misassemblies, including optional opening of new gaps
Pilon then outputs a FASTA file containing an improved representation of the genome from the read data and an optional VCF file detailing variation seen between the read data and the input genome.
Pilon is distributed under the GNU General Public License v2.0.
PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results. PLINK (one syllable) is being developed by Shaun Purcell at the Center for Human Genetic Research (CHGR), Massachusetts General Hospital (MGH), and the Broad Institute of Harvard & MIT, with the support of others.
Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee. Key features of Prodigal include:
- Speed: Prodigal is an extremely fast gene recognition tool (written in very vanilla C). It can analyze an entire microbial genome in 30 seconds or less.
- Accuracy: Prodigal is a highly accurate gene finder. It correctly locates the 3′ end of every gene in the experimentally verified Ecogene data set (except those containing introns). It possesses a very sophisticated ribosomal binding site scoring system that enables it to locate the translation initiation site with great accuracy (96% of the 5′ ends in the Ecogene data set are located correctly).
- Specificity: Prodigal’s false positive rate compares favorably with other gene identification programs, and usually falls under 5%.
- GC-Content Indifferent: Prodigal performs well even in high GC genomes, with over a 90% perfect match (5′+3′) to the Pseudomonas aeruginosa curated annotations.
- Metagenomic Version: Prodigal can run in metagenomic mode and analyze sequences even when the organism is unknown.
- Ease of Use: Prodigal can be run in one step on a single genomic sequence or on a draft genome containing many sequences. It does not need to be supplied with any knowledge of the organism, as it learns all the properties it needs to on its own.
- Open Source: Prodigal source code is freely available under the General Public License.
Prokka: rapid prokaryotic genome annotation
Whole genome annotation is the process of identifying features of interest in a set of genomic DNA sequences, and labelling them with useful information. Prokka is a software tool to annotate bacterial, archaeal and viral genomes quickly and produce standards-compliant output files.
QUAST evaluates genome assemblies.
QUAST stands for QUality ASsessment Tool. QUAST works both with and without a reference genome.
The tool accepts multiple assemblies and is therefore suitable for comparing them.
QUAST is open source software under the GPL v2.
RasMol is a molecular graphics program intended for the visualisation of proteins, nucleic acids and small molecules. The program is aimed at display, teaching and generation of publication-quality images. RasMol runs on a wide range of architectures and operating systems including Microsoft Windows, Apple Macintosh, UNIX and VMS systems.
The program reads in molecular coordinate files and interactively displays the molecule on the screen in a variety of representations and colour schemes. Supported input file formats include Protein Data Bank (PDB), Tripos Associates’ Alchemy and Sybyl Mol2 formats, Molecular Design Limited’s (MDL) Mol file format, Minnesota Supercomputer Center’s (MSC) XYZ (XMol) format, CHARMm format, CIF format and mmCIF format files.
RNAmmer predicts genes for ribosomal RNA (rRNA) in full genome sequences using hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project.
RNAmmer 1.2 for Linux
RNAmmer: consistent and rapid annotation of ribosomal RNA genes
Please read the RNAmmer installation manual.
Note: RNAmmer uses the HMMER tools, which you can download from the Other tools section.
RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
RStudio is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux) or in a browser connected to RStudio Server or RStudio Server Pro (Debian/Ubuntu, RedHat/CentOS, and SUSE Linux).
RStudio is open source software under the AGPL v3
Salmon is a tool for wicked-fast transcript quantification from RNA-seq data. It requires a set of target transcripts (either from a reference or de-novo assembly) to quantify. All you need to run Salmon is a FASTA file containing your reference transcripts and a (set of) FASTA/FASTQ file(s) containing your reads. Optionally, Salmon can make use of pre-computed alignments (in the form of a SAM/BAM file) to the transcripts rather than the raw reads.
Salmon is open source software under the GNU General Public License v3.0
Don’t forget to read the Salmon documentation!
SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines.
Staden is a fully developed set of DNA sequence assembly (Gap4 and Gap5), editing and analysis tools (Spin) for Unix, Linux, Mac OS X and MS Windows.
The Staden package consists of a number of different programs. The main components are:
- pregap4 – base calling with Phred, end clipping, and vector trimming.
- trev – trace viewing and editing
- gap4 – sequence assembly, contig editing, and finishing
- gap5 – assembly visualisation, editing and finishing of NGS data
- Spin – DNA and protein sequence analysis
Staden Package 2.0.0b10 (UNIX 32-bit)
Staden Package 2.0.0b10 (UNIX 64-bit)
Staden Package 2.0.0b10 (MacOS X)
Staden Package 2.0.0b10 (Windows 32-bit)
Staden Package 2.0.0b10 (Windows 64-bit)
Staden Package 2.0.0b10 (Sources)
Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments.
- High-performance visualization and data navigation.
- Display of reads in both packed and stacked formats.
- File format support for ACE, AFG, MAQ, SOAP2, SAM, BAM, FASTA, FASTQ, and GFF3.
- Import GFF3, VCF, GTF or BED features and quickly find/highlight/display them.
- Search and locate reads by name or subsequence across entire data sets.
- Paired end visualization support (for SAM/BAM).
- Entire-contig overviews, showing data layout or coverage information.
- Simple install routine via auto-updating graphical installers.
- Support for Windows, Apple Mac OS X and Linux, in 32 and 64-bit.
Tablet is open source software under the standard BSD 2-Clause License
tRNAscan-SE identifies transfer RNA genes in genomic DNA or RNA sequences. It combines the specificity of the Cove probabilistic RNA prediction package (Eddy & Durbin, 1994) with the speed and sensitivity of tRNAscan 1.3 (Fichant & Burks, 1991) plus an implementation of an algorithm described by Pavesi and colleagues (1994) which searches for eukaryotic pol III tRNA promoters (our implementation referred to as EufindtRNA). tRNAscan and EufindtRNA are used as first-pass prefilters to identify “candidate” tRNA regions of the sequence. These subsequences are then passed to Cove for further analysis, and output if Cove confirms the initial tRNA prediction. In this way, tRNAscan-SE attains the best of both worlds:
- a false positive rate of less than one per 15 billion nucleotides of random sequence
- the combined sensitivities of tRNAscan and EufindtRNA (detection of 99% of true tRNAs)
- search speed 1,000 to 3,000 times faster than Cove analysis and 30 to 90 times faster than the original tRNAscan 1.3 (tRNAscan-SE uses both a code-optimized version of tRNAscan 1.3 which gives a 650-fold increase in speed, and a fast C implementation of the Pavesi et al. algorithm).
This program and results of its analysis of a number of genomes have been published in Lowe & Eddy, Nucleic Acids Research 25: 955-964 (1997).
UGENE is free open-source bioinformatics software. It works on a desktop computer with Windows, Mac OS X or Linux.
UGENE helps biologists to analyze various biological data, such as sequences, annotations, multiple alignments, phylogenetic trees, NGS assemblies, and others. The data can be stored both locally (on a personal computer) and on a shared storage (e.g. a lab database).
UGENE integrates dozens of well-known biological tools and algorithms, as well as original tools in context of genomics, evolutionary biology, virology and other branches of life science. UGENE provides a graphical interface for the pre-built tools so biologists without programming skills can access those tools more easily.
Universal binary packages:
To use the UGENE binary package, unpack the archive and execute ./ugene -ui to launch the UGENE GUI.
By default, the ugene script launches the command-line version of UGENE.
Packages for Windows Vista, Windows 7 and higher Windows versions:
- Download 32-bit Standard or Full installer package
- Download 64-bit Standard or Full installer package
Packages for Windows XP (without support of CUDA and OpenCL):
- Download 32-bit portable Full package
Mac OS X
Packages for Mac OS X 10.7 and higher: