
Sequence Transformation
Let's have a look at the Bio::Sequence::Common module, which provides most of the sequence transformation methods for biological sequences.
Bio::Sequence::Common
This module implements methods that are common to both Bio::Sequence::AA and Bio::Sequence::NA.
A Bio::Sequence object is easily created like this:
require 'bio'
my_dna = Bio::Sequence.auto("actagatatttgat") #=> actagatatttgat
my_dna is now a Bio::Sequence object, and you can use the various methods available for this class, which we are going to explore shortly.
Bio::Sequence::Common non-modifying methods
-
to_s
This method returns a sequence as a string. It does not modify the original sequence.
puts my_dna.to_s #=> actagatatttgat
puts my_dna.to_s.class #=> String
An alias for this method is the to_str method.
my_dna.to_str
#=> actagatatttgat
-
seq
This method returns a new Bio::Sequence::NA or Bio::Sequence::AA object. The original sequence remains unchanged. For example, to create a second object, my_dna2, from the my_dna object created above, you would write:
my_dna2 = my_dna.seq
puts my_dna2 #=> actagatatttgat
puts my_dna2.class #=> Bio::Sequence::NA
Bio::Sequence::Common modifying methods
-
normalize!
This method removes all white space and converts every position to uppercase if the sequence is an amino acid (AA) sequence, or to lowercase if it is a nucleic acid (NA) sequence. The original sequence is modified in place.
For example:
test_seq = Bio::Sequence::NA.new("ACTG")
puts test_seq.normalize! #=> actg
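The behaviour described for normalize! can be imitated on a plain Ruby String, which may help to show what "in place" means. This is only a sketch that does not use BioRuby: the helper name normalize_na! is hypothetical and it covers just the nucleic acid case.

```ruby
# Plain-Ruby sketch of the nucleic acid behaviour described above.
# normalize_na! is a hypothetical helper, not part of BioRuby.
def normalize_na!(str)
  str.gsub!(/\s+/, "")  # strip all white space (returns nil if nothing changed)
  str.downcase!         # lowercase every position (also nil if unchanged)
  str                   # return the receiver itself, now modified
end

raw = String.new("AC TG\n")
result = normalize_na!(raw)
puts raw                  # the original object was changed
puts result.equal?(raw)   # checks that the result is the same object, not a copy
```

Because gsub! and downcase! return nil when they change nothing, the sketch returns str explicitly instead of relying on their return values.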
-
Concatenating
We often want to append a new sequence or a set of bases/residues, e.g. a poly-A sequence, to the end of an existing sequence, modifying the original sequence. This is achieved with the concat method, which is also available as the << method.
test_seq = Bio::Sequence::NA.new("actg")
test_seq << "acagat" # equivalent to test_seq.concat("acagat")
puts test_seq #=> actgacagat
Note that to create a new sequence that extends an existing sequence without altering the original, you would use the + operator. In the example below, Ruby joins the two adjacent string literals into a single string before it is passed to +. For example:
test_seq = Bio::Sequence::NA.new("actg")
test_seq2 = test_seq + ("cttcccttttt" "tatatata")
puts test_seq2 #=> actgcttcccttttttatatata
puts test_seq #=> actg
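The difference between the modifying << and the non-modifying + can be seen on ordinary Ruby Strings as well. The following is a minimal sketch that does not use BioRuby; the helper names append_in_place and append_copy are hypothetical and exist only to label the two cases.

```ruby
# Plain Ruby Strings, used here only to contrast << with +.
def append_in_place(seq, tail)
  seq << tail   # modifies seq itself
end

def append_copy(seq, tail)
  seq + tail    # builds a new String, seq is untouched
end

original = String.new("actg")
copy = append_copy(original, "aaaa")
puts original   # still the original sequence
puts copy       # the extended copy

append_in_place(original, "aaaa")
puts original   # now carries the poly-A tail
```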
Working with subsequences
Please note that biological sequence numbering conventions are one-based, as opposed to Ruby's zero-based indexing. The coordinate convention for BioSQL and Chado is zero-based.
-
subseq
This method returns a new sequence containing the subsequence identified by the start and end values given as parameters. It works in a similar way to the String#slice method, except that positions are one-based and both ends are inclusive. For example:
my_seq = Bio::Sequence::NA.new("agggatttc")
puts my_seq.subseq(2,5) #=> ggga
The first argument denotes the start and the second argument denotes the end of the subsequence. Both arguments must be positive integers.
When this method is used without arguments, the start defaults to 1 and the end defaults to the last element of the sequence. Therefore, when subseq is called without any arguments, it returns a new sequence with the same content as the original sequence.
puts my_seq.subseq #=> agggatttc
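To make the one-based versus zero-based difference concrete, here is a small sketch of how one-based, inclusive coordinates map onto Ruby's own String indexing. It does not use BioRuby; one_based_subseq is a hypothetical helper, and raising ArgumentError for non-positive positions is a choice made for this sketch only.

```ruby
# Hypothetical helper: one-based, inclusive coordinates on a plain String.
def one_based_subseq(str, first = 1, last = str.length)
  raise ArgumentError, "positions must be positive" if first < 1 || last < 1
  str[(first - 1)..(last - 1)]  # shift both ends to Ruby's zero-based indexing
end

puts one_based_subseq("agggatttc", 2, 5)  # positions 2 to 5 of the sequence
puts one_based_subseq("agggatttc")        # defaults cover the whole sequence
```

Both ends are shifted by one because the biological position 1 corresponds to Ruby index 0.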
-
window_search
This method is typically used with a block. It steps through a sequence, yielding subsequences of a given length. The method accepts two arguments: window_size, which defines the length of each subsequence, and step_size, which defines the size of your 'steps'. step_size is optional and defaults to one. Any remaining sequence at the terminal end is returned.
For example, to print the average GC% of each 100 bp window, you can write:
s.window_search(100) do |subseq|
  puts subseq.gc
end
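The stepping logic can be sketched in plain Ruby to show how the window size, the step size and the returned remainder relate. This is an illustration under assumptions, not the BioRuby implementation: each_window and gc_percent_of are hypothetical helpers, and the input is assumed to be a lowercase String at least window_size long.

```ruby
# Hypothetical plain-Ruby helper; it is not the BioRuby implementation.
# Assumes str is at least window_size characters long.
def each_window(str, window_size, step_size = 1)
  last_start = 0
  0.step(str.length - window_size, step_size) do |i|
    yield str[i, window_size]   # hand each full window to the block
    last_start = i
  end
  # Whatever follows the last full window is handed back to the caller.
  str[(last_start + window_size)..-1]
end

# Percentage of g and c in a lowercase sequence string.
def gc_percent_of(str)
  return 0.0 if str.empty?
  100.0 * str.count("gc") / str.length
end

leftover = each_window("atgcatgca", 4, 2) do |window|
  puts "#{window} #{gc_percent_of(window)}"
end
puts "leftover: #{leftover}"
```

With a window of 4 and a step of 2, consecutive windows overlap by two positions; setting step_size equal to window_size would give non-overlapping windows.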


