BioRuby mini-series: The Sequence class

BioRuby is a Ruby bioinformatics package for the analysis of biological sequences. In my quest to become a BioRuby guru, I have decided to poke at the BioRuby API and all available tutorials to better understand this fantastic library written by the BioRuby team of developers. My journey will be logged here as the BioRuby mini-series. We start with an introductory overview of the Sequence class.

To use the library you need a Ruby interpreter installed, preferably Ruby 1.8.5 or above. To install BioRuby as a gem, run:
sudo gem install bio

This will install BioRuby version 1.1.0, which also comes with its own shell.

Type bioruby at the command prompt and you will see this:

Loading config (/.bioruby/shell/session/config) … done
Loading object (/.bioruby/shell/session/object) … done
Loading history (/.bioruby/shell/session/history) … done

. . . B i o R u b y i n t h e s h e l l . . .

Version : BioRuby 1.1.0 / Ruby 1.8.6


Now we are ready to rock and roll! I dug into the API and extracted some useful information for us.

The Bio::Sequence class

This is the primary sequence class and deals with sequence translation and transformations. The Bio::Sequence::NA and Bio::Sequence::AA classes inherit from Ruby's String class, which means that you can use Ruby's String methods on a sequence just as you would on a string.

A Bio::Sequence object is a wrapper around the actual sequence, which is represented as either a Bio::Sequence::NA or a Bio::Sequence::AA. The wrapper responds to all the methods that are defined for the NA and AA classes. The class has the following methods:

  • auto – Guesses the type of the sequence provided and returns the appropriate Bio::Sequence class for the given string, either a Bio::Sequence::AA or a Bio::Sequence::NA.
  • new – Creates a new Bio::Sequence object. It does not initialize the wrapped sequence as any of the BioRuby objects; the sequence stays a plain string.
  • aa – Transforms your current Bio::Sequence object into a Bio::Sequence::AA object. It changes the current object in place, i.e. it will turn a Bio::Sequence::NA into a Bio::Sequence::AA, which is usually undesirable. Use it only when you are sure of the type of sequence you are working with.
  • na – Works the same as the aa method above, but the resulting object is a Bio::Sequence::NA.
  • output – Returns a string with the current Bio::Sequence object formatted in the given style. The supported styles are fasta, genbank and embl. The style argument is passed as a Ruby symbol, e.g. :fasta.
  • to_s – Returns the sequence as a string, leaving the original sequence unaltered. to_str is an alias for this method.
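
To make the idea behind auto more concrete, here is a minimal plain-Ruby sketch of how a sequence type could be guessed from a string. This is not BioRuby's actual implementation: the method name guess_sequence_type, the 0.9 threshold and the set of letters counted as nucleotides are all assumptions made for illustration.

```ruby
# Hypothetical heuristic, NOT BioRuby's code: call a string "nucleic acid"
# when at least `threshold` of its letters are a, c, g, t, u or n.
def guess_sequence_type(str, threshold = 0.9)
  letters = str.downcase.delete('^a-z') # keep letters only
  return nil if letters.empty?          # nothing to guess from

  na_fraction = letters.count('acgtun').to_f / letters.length
  na_fraction >= threshold ? :na : :aa
end

puts guess_sequence_type('atgcatgcaaaa')
puts guess_sequence_type('MKVLAAGIVGLNS')
```

A heuristic like this can be fooled, for example by a short peptide that happens to consist only of the letters A, C, G and T. That is presumably why na and aa exist: they let you state the type explicitly when you already know it.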

Bio::Sequence::NA class

This class wraps a nucleic acid sequence. It provides a number of methods to work with a DNA sequence as demonstrated in the example below.

Dr Optimist has finally finished his long-awaited sequencing project, code-named Sikwensi. The nucleic acid sequence of one chromosome, about which he won't reveal any further details, is shown below.

He calls his trusted Ruby programmer to help analyze the sequence and tear it apart base by base. The guy gets to work.

require 'bio'

# bio_seq is now a Bio::Sequence::NA object
bio_seq = Bio::Sequence::NA.new('gacagatggacatggactagagctgct')

# walk through the sequence codon by codon (window size 3, step size 3)
bio_seq.window_search(3, 3) { |codon| puts codon }

# complementary sequence (returns a Bio::Sequence::NA object)
bio_seq.complement

# gets the subsequence of positions 4 to 14
bio_seq.subseq(4, 14) # he thinks the subsequence is interesting and worth

bio_seq.gc_percent # what is the GC content?

bio_seq.composition # nucleic acid composition (returns a Hash)

bio_seq.translate # translation (returns a Bio::Sequence::AA object)
bio_seq.translate(2) # translation from frame 2 (the default is frame 1)
bio_seq.translate(1, 11) # using codon table No. 11 (bacteria)
bio_seq.translate.codes # shows three-letter codes (returns an Array)
bio_seq.translate.names # shows amino acid names (returns an Array)
bio_seq.translate.composition # amino acid composition (returns a Hash)
bio_seq.translate.molecular_weight # calculates the molecular weight (returns a Float)

bio_seq.complement.translate # translation of the complementary strand
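
If you are curious what these calls compute, the following plain-Ruby sketch reproduces a few of them without the bio gem. The helper names are hypothetical and not part of BioRuby, the codon table is deliberately partial (it only covers the codons in this sequence), and the assumptions that complement reads the complementary strand in reverse and that gc_percent truncates to an integer are worth checking against the API documentation.

```ruby
# Plain-Ruby sketch of some Bio::Sequence::NA operations.
# All helper names here are hypothetical; none of this is BioRuby code.

SEQ = 'gacagatggacatggactagagctgct'

# Non-overlapping windows of three bases, similar to window_search(3, 3).
def codons(seq)
  seq.scan(/.{3}/)
end

# Swap a<->t and g<->c, then read the strand in the opposite direction.
def reverse_complement(seq)
  seq.tr('atgc', 'tacg').reverse
end

# 1-based, inclusive positions, similar to subseq(4, 14).
def subseq(seq, from, to)
  seq[(from - 1)..(to - 1)]
end

# Number of occurrences of each base, as a Hash.
def composition(seq)
  seq.each_char.each_with_object(Hash.new(0)) { |base, counts| counts[base] += 1 }
end

# Share of g and c among a, t, g and c, truncated to an integer percentage.
def gc_percent(seq)
  counts = composition(seq)
  gc = counts['g'] + counts['c']
  at = counts['a'] + counts['t']
  return 0 if (gc + at).zero?

  100 * gc / (gc + at)
end

# Partial standard codon table: only the codons that occur in SEQ.
PARTIAL_CODON_TABLE = {
  'gac' => 'D', 'aga' => 'R', 'tgg' => 'W',
  'aca' => 'T', 'act' => 'T', 'gct' => 'A'
}.freeze

# Frame-1 translation; codons missing from the partial table become 'X'.
def translate(seq)
  codons(seq).map { |codon| PARTIAL_CODON_TABLE.fetch(codon, 'X') }.join
end

puts codons(SEQ).length
puts reverse_complement(SEQ)
puts subseq(SEQ, 4, 14)
puts gc_percent(SEQ)
p composition(SEQ)
puts translate(SEQ)
```

The point of the sketch is only to show what is going on under the hood. In real work the BioRuby methods are the better choice, since they also handle ambiguity codes, alternative codon tables and reading frames.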

A tutorial written by Katayama Toshiaki and translated to English by Naohisa Goto can be found here. (Thank you guys!)


