BioRuby mini-series: The Bio::Sequence::Common module


Sequence Transformation

Let's have a look at the Bio::Sequence::Common module, which provides most of the transformation methods for biological sequences.

Bio::Sequence::Common

This module implements the methods that are common to both Bio::Sequence::AA (amino acid) and Bio::Sequence::NA (nucleic acid) sequences.

A Bio::Sequence object is easily created like this:

require 'bio'

my_dna = Bio::Sequence.auto("actagatatttgat") #=> actagatatttgat

my_dna is now a Bio::Sequence object, and you can use the various methods available for this class, which we are going to explore shortly.

Bio::Sequence::Common non-modifying methods

  • to_s

    This method returns a sequence as a string. It does not modify the original sequence.

    puts my_dna.to_s #=> actagatatttgat

    puts my_dna.to_s.class #=> String

    An alias for this method is the to_str method.

    my_dna.to_str
    #=> actagatatttgat

  • seq

    This method returns a new Bio::Sequence::NA or Bio::Sequence::AA object. The original sequence remains unchanged. For example, to get a new sequence object, my_dna2, from the my_dna object that we created above, you would write:

    my_dna2 = my_dna.seq

    puts my_dna2 #=> actagatatttgat

    puts my_dna2.class #=> Bio::Sequence::NA
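The difference between to_s and seq can be sketched in plain Ruby without BioRuby installed. SeqSketch below is a hypothetical stand-in class invented for illustration, not BioRuby's own code; it only assumes that a sequence class behaves like a String subclass.

```ruby
# Hypothetical stand-in for a sequence class; not BioRuby's own code.
class SeqSketch < String
  # Return a plain String copy of the sequence.
  def to_s
    String.new(self)
  end

  # Return a new object of the same sequence class.
  def seq
    self.class.new(self)
  end
end

dna = SeqSketch.new("actagatatttgat")
text = dna.to_s
copy = dna.seq

puts text.class   # String
puts copy.class   # SeqSketch
copy << "aaa"     # changing the copy leaves dna untouched
puts dna          # actagatatttgat
```

Both calls hand back a separate object, which is why neither one can alter the original sequence.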

Bio::Sequence::Common modifying methods

  • normalize!

    This method removes all whitespace and converts every position to uppercase for an amino acid (AA) sequence or to lowercase for a nucleic acid (NA) sequence. It modifies the original sequence in place.

    For example:

    test_seq = Bio::Sequence::NA.new("ACTG")

    puts test_seq.normalize! #=> actg

  • Concatenating

    We often want to append a sequence or a run of bases/residues, e.g. a poly-A tail, to the end of an existing sequence, modifying the original. The concat method does this, and the << operator is another way of calling it. The two calls below are alternatives that give the same result.

    test_seq = Bio::Sequence::NA.new("actg")

    test_seq << "acagat"
    # or, equivalently: test_seq.concat("acagat")

    puts test_seq #=> actgacagat

Note that to create a new sequence by adding to an existing one without altering the original, you would use the + operator. In the example below, Ruby joins the two adjacent string literals inside the parentheses into a single string before + is applied.

test_seq = Bio::Sequence::NA.new("actg")

test_seq2 = test_seq + ("cttcccttttt" "tatatata")

puts test_seq2 #=> actgcttcccttttttatatata

puts test_seq #=> actg
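The in-place behaviour of the modifying methods, compared with +, can be sketched with plain Ruby strings, so the snippet runs without BioRuby. This works as an illustration because Bio::Sequence::NA and Bio::Sequence::AA are String subclasses. The helper normalize_sketch! is a hypothetical name that only mirrors the rule described for normalize!; it is not the library's implementation.

```ruby
# Plain-Ruby sketch; plain strings stand in for sequence objects.

# Hypothetical helper mirroring the normalize! rule: strip whitespace,
# then lowercase nucleic acids (:na) or uppercase amino acids (:aa).
def normalize_sketch!(seq, type)
  seq.gsub!(/\s+/, "")
  type == :na ? seq.downcase! : seq.upcase!
  seq
end

raw = "AC TG\n".dup
normalize_sketch!(raw, :na)
puts raw                                    # actg

original = "actg".dup
same_object = original << "acagat"          # appends in place
puts original                               # actgacagat
puts same_object.equal?(original)           # true

base = "actg".dup
joined = base + ("cttcccttttt" "tatatata")  # adjacent literals form one string
puts joined
puts base                                   # actg
```

The << call returns the very object it was called on, whereas + builds a new string and leaves base as it was.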

Working with subsequences

Please note that biological sequence numbering conventions are one-based, as opposed to Ruby's zero-based indexing. The coordinate convention used by BioSQL and Chado is zero-based.

  • subseq

    This method returns a new sequence containing the subsequence identified by the start and end values given as parameters. It works much like the String slice method, except that positions are one-based and both the start and the end position are included. For example:

    my_seq = Bio::Sequence::NA.new("agggatttc")

    puts my_seq.subseq(2,5) #=> ggga

    The first argument denotes the start and the second argument denotes the end of the subsequence. Both arguments must be positive integers.

    When this method is used without arguments, the start defaults to 1 and the end defaults to the last position of the sequence. Calling subseq without any arguments therefore returns a new sequence with the same content as the original.

    puts my_seq.subseq #=> agggatttc

  • window_search

    This method is typically used with a block. Use it to step through a sequence one window at a time. It accepts two arguments: window_size, the length of each subsequence, and step_size, the distance between the starts of consecutive windows. step_size is optional and defaults to one. Any sequence left over at the end is returned.

    For example, to print the average GC% of each 100 bp window of a nucleic acid sequence s, you can write:

    s.window_search(100) do |subseq|
      puts subseq.gc_percent
    end
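The coordinate mapping behind subseq and the stepping logic behind window_search can be sketched in plain Ruby. The helper names subseq_sketch, window_search_sketch and gc_percent_sketch are hypothetical and exist only for this illustration; they follow the behaviour described above and are not BioRuby's own code.

```ruby
# Plain-Ruby sketch; no BioRuby required. Helper names are hypothetical.

# One-based, inclusive coordinates mapped onto Ruby's zero-based slicing.
def subseq_sketch(seq, from = 1, to = seq.length)
  seq[(from - 1)..(to - 1)]
end

# Yield each window of window_size, advancing by step_size, and return
# whatever is left after the last window.
def window_search_sketch(seq, window_size, step_size = 1)
  last_start = 0
  0.step(seq.length - window_size, step_size) do |i|
    yield seq[i, window_size]
    last_start = i
  end
  seq[(last_start + window_size)..-1]
end

# Rounded percentage of g and c in a lowercase nucleotide string.
def gc_percent_sketch(seq)
  return 0 if seq.empty?
  (100.0 * seq.count("gc") / seq.length).round
end

puts subseq_sketch("agggatttc", 2, 5)   # ggga

windows = []
leftover = window_search_sketch("atgcgcatta", 4, 4) { |w| windows << w }
windows.each { |w| puts "#{w} #{gc_percent_sketch(w)}%" }
puts "leftover: #{leftover}"            # leftover: ta
```

Here the step size equals the window size, so the windows do not overlap. With the default step of one, consecutive windows would overlap by all but one position.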


6 comments

  1. Jay

    Thanks for the post, George. Could you also please post something that shows a newbie how to integrate BioRuby with Rails? For ex., if I have a database backed Rails app showing gene and protein sequences, organism, taxonomy etc, how do I integrate BioRuby with the app such that I can mine data from PubMed and GenBank and populate the tables created through Rails migrations
    Thanks a lot!

  2. biorelated

    You are welcome, Jay. Integrating BioRuby with Rails is really easy. Install the BioRuby gem:
    gem install bio

    Then add this line to your environment.rb file:
    require 'bio'

    and you are set.
    For example, assuming a model stores some sequence data, you can use BioRuby in it as follows:

    # Takes one or more feature ids and returns an array of
    # FASTA-formatted strings, one per feature.
    # The Feature model has a name field and a residues field.

    def self.send_to_fasta(*accessions)
      features = Feature.find(accessions)
      @my_fasta = []
      features.each do |feature|
        name = feature.name
        seq = Bio::Sequence.auto(feature.residues.to_s)
        @fasta_file = seq.to_fasta(name.to_s, 80)
        @my_fasta << @fasta_file
      end
      return @my_fasta
    rescue
      puts "wrong names or accession numbers"
    end

    A really easy example is this method within a model:

    # Returns the length of a sequence given a sequence string
    def self.bioseqlen(seq_value)
      bio_seq = Bio::Sequence.auto(seq_value)
      residues_count = bio_seq.seq.length
      return residues_count
    end

    It can be called as follows:
    Model.bioseqlen(sequence_string)

    Once I get some time over the weekend I will write up a more detailed example.

  3. shev

    Ah, this has come a bit late, but I agree with the guys saying thank you :)

    I have however one suggestion, can the colours be adjusted a bit? the thin bright-blue on white background is a bit difficult to read for my bad eyes :-)
