What’s new in Bioruby edge

Changes to Bio::Blast

Naohisa Goto has announced changes to the Bio::Blast.reports to support default -m 0 and tabular -m 8
formats in addition to XML (-m 7) form. I think this is really nice and convenient!

Previously it meant that for bioruby to parse a Blast file, you had to have your blast results in XML output which Bio::Blast::Reports would understand. However, by default Blast gives an  -m0 output and without that prior knowledge you may spend hours wondering what is wrong when parsing default blast output files. With Bioruby 1.2.1 this will not work

require 'rubygems'
require 'bio'

report_file = "/home/george/esther_blast_files/blast_output2.txt"
Bio::Blast.reports(report_file) do |report|
  puts report.class
end

Unless blast_output2.txt is in XML format

In the upcoming Bioruby 1.3 the default Blast file can now be parsed, for example,

require 'rubygems'
require 'bio'
report_file = "/home/george/esther_blast_files/blast_output2.txt"
Bio::FlatFile.open(Bio::Blast::Default::Report,report_file) do |ff|
ff.each do |rep|
   puts rep.statistics
   rep.iterations.each do |itr|
      puts itr.hits.size

    itr.hits.each_with_index do |hit,i|
      puts hit.hit_id
      puts hit.len
     end
   end
 end
end

Bio::Blast.remote now supports DDBJ in addition to Genomenet. It would be a nice idea to support NCBI as well.

Changes to Bio:sequence

It is possible to create  sequence objects from Bio::GenBank, Bio::EMBL, and Bio::FastaFormat by using the to_biosequence method

gb = Bio::GenBank.new(genbank_file.gb)
gb.to_biosequence

Bio::SQL Support

Thanks to Raoul and Naohisa, support for BioSQL has been rewritten by using  ActiveRecord.

#Make a connection

connection = Bio::SQL.establish_connection(path_to_database.yaml,'development')
#list databases

databases =Bio::SQL.list_databases

#retrieve a sequence
sample_seq = Bio::SQL.fetch_accession('some_accession_number')

#get number of seqeunces in the database
puts Bio::SQL.list_entries

#get references associated with an entry
puts sample_seq.references

#create an embl format
puts sample_seq.to_biosequence.output(:embl)

Changes to Bio::GFF2 and Bio::GFF3

GFF2/GFF3 formatted texts are now supported but there will be backward portability issues with bio 1.2.1 since some incompatible changes have been incorporated.  Bio::GFF::Record.comments has been renamed to comment and comments= is now comment=

Both Bio::GFF::GFF2::Record.new and Bio::GFF::GFF3::Records.new, can now take 9 arguments that correspond to GFF columns making it easy to create a Record object directly without need for  formatted text.

Both Bio::GFF::GFF2::Record#attributes and Bio::GFF::GFF3::Record#attributes have been changed to return a nested array containing tag, value pairs, to obtain a  hash, use the to_hash method

To support data output for GFF2/GFF3, new methods have been added:  Bio::GFF::GFF2#to_s, Bio::GFF::GFF3#to_s, Bio::GFF::GFF2::Record#to_s,and Bio::GFF::GFF3::Record#to_s

Lots of other changes have been incorporated for the GFF classes and you can view the change log at github

CodeML parser

A wrapper for PAML codeml program that is used for estimating evolutinary rate has been added.  The class provides methods  for generating the necessary configuration file. The new Bio::PAML::Codeml::Report and PAML::Codeml::Rates  provides simpel classes  for accessing the codeml report and rates file.  This example is from the example given in the source code

require 'bio'
# Reads multi-fasta formatted file and gets a Bio::Alignment object.
     alignment = Bio::FlatFile.open(Bio::Alignment::MultiFastaFormat, 'example.fst').alignment
     # Reads newick tree from a file
     tree = Bio::FlatFile.open(Bio::Newick, 'example.tree').tree
  # Creates a Codeml object
  codeml = Bio::PAML::Codeml.new
     # Sets parameters
     codeml.parameters[:runmode] = 0
     codeml.parameters[:RateAncestor] = 1
     # You can also set many parameters at a time.
   codeml.parameters.update({ :alpha => 0.5, :fix_alpha => 0 })
     # Executes codeml with the alignment and the tree
     report = codeml.query(alignment, tree)

Lots of Bugs  have been fixed and also support for Ruby 1.9 has been added. Its great thanks to the bioruby developers for their time and the excellent new changes!



Standalone BLAST with Ruby

 

 

BLAST is one of the most widely used search algorithms in molecular biology. So lets see how you can run and retrieve blast results via a simple Ruby script .I will assume you already have ruby 1.8.5 and above installed in your windows box and a standalone blast.exe which you can download from the NCBI’s ftp site here . The latest windows binaries as of this writing is 2.2.17. Create a new folder in C and call it NCBI_Blast. Paste the downloaded blast program in this folder. Double click the blast program and it will create a bin, doc and data folders inside your your NCBI_Blast folder. If this is your first time to install blast in your machine. You will need to do a little configuration. Follow these instructions for setting up blast .

 

Using your favorite text editor, (I use the Netbeans IDE 6.0 with Ruby support. It is a fantastic Ruby editor and totally free) .

 

#create a query sequence
myseq=”pcaatcacatyyawwqqffgghhhkllkl”

 

#create a temporary file
require ‘tempfile’

temp=Tempfile.new(“seqfile”)
#get the name of the temporary file
name=temp.path
#append the contents your sequence to this temporary file
temp.puts “#{myseq}”
temp.close

#since we have a protein query sequence, we will run a blastp. Please note that you will need to have a valid #database to query against. use the formatdb command to create your database before executing the lines #below.
@program = ‘blastp’
#path to blast

@database = ‘c:/path_to_databasefile’
#name of your query file
@input= name

#your blast output file
@output=’c:/path_output_file’

#assume your blast is in a folder called NCBI_Blast, execute
system( “c:/NCBI_Blast/bin/blastall.exe -p #{@program} -d #{@database} -i #{@input} -o #{@output}”)

#To capture the output in a variable execute this command instead.
#note that we have omitted the blast -o parameter

result=%x(c:/NCBI_Blast/bin/blastall.exe -p #{@program} -d #{@database} -i #{@input} )

#remember to delete the temporary file!

temp.close(true)

And that all folks!