Genomics (Ecol 553) Computational Lab
Week 10: Oct 25, 2006.
Course webpage: http://genomics.arizona.edu/553/computation
Homework 7
To be completed by noon on Thursday, Nov 2.
On amadeus, create a directory called ~/homework/homework7 and place all of the programs described below in it. ** Do not copy the input files into your directory. **
1) Write a program, called 99_words.pl, which does the following:
* Reads in the file at /tmp/week10/99redballoons.lyr
* Prints out a list of the words used in the song (in alphabetical order), and the number of times each was used.
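The counting step is commonly done with a hash keyed by word. The following is a minimal sketch on a made-up string, not the lyrics file. The sub name count_words and the rule for what counts as a word (lowercased runs of letters, digits and apostrophes) are assumptions of this example; the assignment does not specify how to treat case or punctuation.

```perl
use strict;
use warnings;

# Count how often each word occurs in a block of text.
# ASSUMPTION: words are lowercased, and anything other than letters,
# digits and apostrophes is treated as a separator.
sub count_words {
    my ($text) = @_;
    my %count;
    for my $word (split /[^a-z0-9']+/, lc($text)) {
        next if $word eq '';    # a leading separator yields one empty field
        $count{$word}++;
    }
    return %count;
}

# Made-up sample text, not the contents of the lyrics file.
my %count = count_words("The cat saw the dog. The dog ran!");

# sort without a comparison block orders the keys as strings,
# which is alphabetical here because every word is lowercase.
for my $word (sort keys %count) {
    print "$word\t$count{$word}\n";
}
```

For the real program, the text would come from the input file rather than a literal string.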
2) This problem looks at the impact of file I/O decisions. Write two programs, each of which does the following:
* Takes a single filename as an argument
* Reads the file, counts the number of lines in it, then prints out that number.
The two programs are:
a) count_lines_using_cat.pl - this should use the cat method we've used up to this week [ $var = `cat $filename`; @arr = split("\n", $var) ; ], then print out the number of elements in the array.
b) count_lines_using_fh.pl - this should use file handles as we discussed on Tuesday [ open (FH, "<$filename"); while( ... ) { } ], and count the number of lines by incrementing a variable named $i.
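The practical difference between the two approaches is that the cat method loads the whole file into memory before splitting it, while a file handle loop holds only one line at a time. Here is a minimal sketch of the file handle loop, run against a small scratch file so that it is self-contained. It uses a lexical file handle and the three-argument form of open, rather than the bareword FH form quoted above; the sub name count_lines and the scratch file are assumptions of this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Count lines by reading one line at a time, so only the current
# line is held in memory.
sub count_lines {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    my $i = 0;
    while (my $line = <$fh>) {
        $i++;
    }
    close($fh);
    return $i;
}

# Hypothetical three-line scratch file, used only for demonstration.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "one\ntwo\nthree\n";
close($tmp_fh);

print count_lines($tmp_name), "\n";
```

In the actual script, the filename would be taken from the command line (for example from $ARGV[0]) instead of a scratch file.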
After you've written these scripts (and are sure they work correctly), you can test their relative speeds using the unix command "time". Run these two commands:
3) Write a program, called filter_blast.pl, which does the following:
* Reads in the file at /tmp/week10/blast_results.out
* Prints out to a file "147.blastout" all lines that contain the string "ENSG00000198223", and have an alignment length of 147.
* Prints out to a file "343_369.blastout" all lines that contain the string "ENSG000001982", and have an alignment length of 343 or 369.
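The filtering test can be sketched as a small sub that checks a line for a substring and then compares one field against a list of accepted lengths. This sketch ASSUMES the file is tab-separated with the alignment length in the fourth column, as in tabular BLAST output; check the real file before relying on that. The sub name keep_line and the sample rows are made up for illustration, and the sketch prints to the screen rather than to the output files the assignment asks for.

```perl
use strict;
use warnings;

# Return true when $line contains $id and its alignment length is one
# of @lengths.
# ASSUMPTION: tab-separated fields, alignment length in column 4.
sub keep_line {
    my ($line, $id, @lengths) = @_;
    return 0 unless index($line, $id) >= 0;    # plain substring test
    chomp $line;
    my @fields = split /\t/, $line;
    return 0 unless defined $fields[3];
    my $matches = grep { $fields[3] eq $_ } @lengths;
    return $matches ? 1 : 0;
}

# Made-up four-column rows, not taken from blast_results.out.
my @sample = (
    "query1\tENSG00000198223\t99.0\t147\n",
    "query2\tENSG00000198223\t97.5\t343\n",
    "query3\tENSG00000000001\t95.0\t147\n",
);

for my $line (@sample) {
    print $line if keep_line($line, "ENSG00000198223", 147);
}
```

Passing several lengths, as in keep_line($line, $id, 343, 369), covers the case where either of two alignment lengths is acceptable.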
4) Write a program, called parse_longform_blast.pl, which does the following:
* Reads in the file at /tmp/week10/blast_longform.out
* For each hit associated with the query for "sequence3", print a line containing (tab-delimited):
* The name of the matching sequence
* The length, score, expectation and %-identities of the hit
* As an example, the first line of your output should look like this:
ENSG00000147027.ENSP00000275954 181 322 1e-89 86%
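One way to approach this kind of parsing is to walk the report line by line, remember which query is current, and collect fields with regular expressions as they appear. The sketch below runs on a hand-written sample modeled on classic NCBI BLAST text output ("Query=", ">name", "Length =", "Score =", "Identities ="); that layout is an ASSUMPTION, and the real blast_longform.out may differ. The sub name parse_hits and the numbers in the sample other than those in the example line above are invented.

```perl
use strict;
use warnings;

# Hand-written sample; layout ASSUMED to resemble classic BLAST text output.
my $report = <<'END';
Query= sequence3

>ENSG00000147027.ENSP00000275954
          Length = 181

 Score =  322 bits (826), Expect = 1e-89
 Identities = 156/181 (86%), Positives = 165/181 (91%)
END

# Return one tab-delimited row per alignment found under query $wanted.
sub parse_hits {
    my ($text, $wanted) = @_;
    my $query = '';
    my ($name, $length, $score, $expect);
    my @rows;
    for my $line (split /\n/, $text) {
        if ($line =~ /^Query=\s*(\S+)/) {
            $query = $1;                   # a new query block starts here
        }
        elsif ($query ne $wanted) {
            next;                          # ignore lines from other queries
        }
        elsif ($line =~ /^>(\S+)/) {
            $name = $1;                    # name of the matching sequence
        }
        elsif ($line =~ /^\s*Length\s*=\s*(\d+)/) {
            $length = $1;
        }
        elsif ($line =~ /Score\s*=\s*(\S+)\s+bits.*Expect\s*=\s*(\S+)/) {
            ($score, $expect) = ($1, $2);
        }
        elsif ($line =~ /Identities\s*=\s*\S+\s+\((\d+%)\)/) {
            # The Identities line is the last field needed for an alignment.
            push @rows, join("\t", $name, $length, $score, $expect, $1);
        }
    }
    return @rows;
}

print "$_\n" for parse_hits($report, 'sequence3');
```

Tracking the current query in a variable is a design choice that also extends naturally to reports containing several queries.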
Extra credit:
Modify parse_longform_blast.pl to loop through all the query sequences in blast_longform.out, printing out the same stats as above for each one. Be sure to clearly delineate where the stats for one query end and those for the next query begin.