Genomics (Ecol 553) Computational Lab
Week 8: Oct 12, 2006.
Course webpage: http://genomics.arizona.edu/553/computation
Homework 6
To be completed by noon on Thursday, Oct 19.
On amadeus, create a directory called ~/homework/homework6. Place all programs described below into this directory.
1) Write a program, called blast_smallest_pct.pl, which does the following:
* Reads in the blast result file at /tmp/week8/blast_result.out
* Prints out the smallest percent identity among all hits.
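Hint: the heart of this problem is keeping a running minimum while you loop over lines. Here is a minimal sketch of that idea. It works on a few made-up lines held in an array rather than on the real file, and it assumes each hit reports its identity in the form "Identities = n/m (p%)". That layout is only an assumption; check the actual contents of blast_result.out and adjust the pattern to match. The names smallest_pct and @sample_lines are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real layout of blast_result.out may differ.
my @sample_lines = (
    ' Score = 85.1 bits (209), Expect = 2e-17',
    ' Identities = 45/60 (75%), Positives = 50/60 (83%)',
    ' Identities = 30/58 (51%), Positives = 41/58 (70%)',
    ' Identities = 52/57 (91%), Positives = 55/57 (96%)',
);

# Return the smallest percent identity seen, or undef if no line matches.
sub smallest_pct {
    my @lines = @_;
    my $smallest;
    for my $line (@lines) {
        # Assumed pattern: "Identities = n/m (p%)"; skip every other line.
        next unless $line =~ /Identities\s*=\s*\d+\/\d+\s*\((\d+)%\)/;
        my $pct = $1;
        # The first match always becomes the minimum; later ones must beat it.
        $smallest = $pct if !defined $smallest || $pct < $smallest;
    }
    return $smallest;
}

print smallest_pct(@sample_lines), "\n";
```

In your own program the lines would come from reading the file rather than from an array.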
2) Write a program, called k_biggest.pl, which does the following:
* Takes a number as an argument (I'll call that number k);
* Reads the list of numbers from the file /tmp/week8/numbers.txt;
* Prints the k largest numbers in that file.
Sample program outputs:
> perl k_biggest.pl 4
2343, 1126, 996, 978
(i.e. the 4 biggest numbers found in numbers.txt should be printed.)
> perl k_biggest.pl 50
2343, 1126, 996, 978, 956, 926, 911, 906, 878, 868 ...
(i.e. the 50 biggest numbers found in numbers.txt should be printed.)
3) Write a program, called highest_gc.pl, which does the following:
* Opens the file /tmp/week8/gene_composition.csv
This is a comma-delimited file. Each row represents a gene, and contains the fields (in this order):
* name
* number of adenines
* number of cytosines
* number of guanines
* number of thymines
* For each row, calculate the gc% - that's (#gs + #cs) / (#gs + #cs + #as + #ts)
* Keep track of the highest gc%. Print the name and gc% of the gene with the highest gc%.
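Hint: the per-row calculation can be sketched as below, using split to break a row into its fields. The rows shown are made up for illustration and are not taken from gene_composition.csv, and the names gc_fraction and @sample_rows are hypothetical. The sketch returns the value as a fraction between 0 and 1, exactly as the formula above is written; multiply by 100 if you prefer to print a percentage.

```perl
use strict;
use warnings;

# Hypothetical rows in the order: name, adenines, cytosines, guanines, thymines.
my @sample_rows = (
    "geneA,10,20,30,40\n",
    "geneB,10,40,40,10\n",
    "geneC,25,25,25,25\n",
);

# Return (name, gc fraction) for one comma-delimited row.
sub gc_fraction {
    my ($row) = @_;
    chomp $row;
    my ($name, $as, $cs, $gs, $ts) = split /,/, $row;
    my $total = $as + $cs + $gs + $ts;
    return ($name, 0) if $total == 0;    # guard against dividing by zero
    return ($name, ($gs + $cs) / $total);
}

# Keep track of the best row seen so far.
my ($best_name, $best_gc);
for my $row (@sample_rows) {
    my ($name, $gc) = gc_fraction($row);
    if (!defined $best_gc || $gc > $best_gc) {
        ($best_name, $best_gc) = ($name, $gc);
    }
}
printf "%s\t%.2f\n", $best_name, $best_gc;
```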
4) Write a program, called highest_k_gc.pl, which does the following:
* Uses the same file as in #3
* Takes a number as an argument (I'll call that number k)
* Prints out the top k gc%s (don't worry about listing the associated gene names)
Hints:
This will require that you create an array to hold the gc%s, and add an entry to that array for each line in the input.
You'll need to use push (or unshift) for this purpose.
Details about push are found on pages 100-102 (chapter 3) of the text (and will be discussed in Tuesday's class).
Once you have the entire array filled in, the problem is roughly the same as #2.
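To make the hint concrete, here is a rough sketch of filling an array with push and then taking the k largest entries. The values are made up, and the names top_k and @gc_values are hypothetical. Sorting the whole array in descending order and slicing off the front is only one way to do it, not necessarily the approach expected in class. In your program, k would come from the command line (for example $ARGV[0]) rather than being written into the code.

```perl
use strict;
use warnings;

# Return the $k largest numbers from @values, largest first.
sub top_k {
    my ($k, @values) = @_;
    my @sorted = sort { $b <=> $a } @values;    # numeric sort, descending
    $k = @sorted if $k > @sorted;               # never ask for more than we have
    return @sorted[0 .. $k - 1];
}

# Build the array one entry at a time, as you would for each input line.
my @gc_values;
for my $value (0.42, 0.61, 0.38, 0.55) {        # hypothetical gc values
    push @gc_values, $value;
}

print join(', ', top_k(2, @gc_values)), "\n";
```

Note the numeric comparison operator <=> in the sort block. Perl's default sort compares values as strings, which would place 978 ahead of 2343.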
5) Write a program, called fetch_seqs.pl, which does the following:
* Reads in the contents of /tmp/week8/misc_genes.fa, storing it in a hash with the name as the key and sequence as the value, e.g.:
>14_2dorA
LSVKLPGLNLKNPIMPASGCFGFGKEYSEY
...
would be stored in the hash as:
$genes{'14_2dorA'} = "LSVKLPGLNLKNPIMPASGCFGFGKEYSEY ..."
* Accepts a list of arguments, where each argument is a gene name.
* For each gene name argument, prints out the gene name and the first 20 characters of the corresponding sequence.
(Hashes are covered on pages 104-109 in chapter 3 of the reading, and will be discussed on Tuesday.)
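As a starting point, the hash-building step might look something like the sketch below. It works on a few lines held in an array instead of the real file, and it assumes that a sequence may continue over several lines until the next ">" line. Apart from 14_2dorA and its first sequence line, the sample data is made up, and the names parse_fasta and @sample are hypothetical. Note that a key beginning with a digit, such as 14_2dorA, must be quoted when written literally inside the braces.

```perl
use strict;
use warnings;

# Build a hash of name => sequence from FASTA-style lines.
sub parse_fasta {
    my @lines = @_;
    my (%genes, $name);
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            $name = $1;                 # a header line starts a new gene
            $genes{$name} = '';
        }
        elsif (defined $name) {
            $genes{$name} .= $line;     # append sequence text to the current gene
        }
    }
    return %genes;
}

my @sample = (
    ">14_2dorA\n",
    "LSVKLPGLNLKNPIMPASGCFGFGKEYSEY\n",
    "AAAAACCCCC\n",                     # hypothetical continuation line
    ">made_up_gene\n",                  # hypothetical second record
    "MKT\n",
);
my %genes = parse_fasta(@sample);

# In the real program the names would come from @ARGV.
for my $wanted ('14_2dorA', 'made_up_gene', 'missing') {
    if (exists $genes{$wanted}) {
        print "$wanted\t", substr($genes{$wanted}, 0, 20), "\n";
    }
    else {
        print "$wanted\tnot found\n";
    }
}
```

Checking with exists before calling substr avoids warnings when someone asks for a gene name that is not in the file.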