Fall, 2009

COMP/MATH 452

Bioinformatics

Nick Peters

The idea behind exon chaining is that we are using previously sequenced genes as the template for recognizing unknown genes. These sequenced genes are given to us as a set of intervals which have already been weighted for us and may overlap. Given this set of intervals, this program is supposed to figure out which subset of nonoverlapping intervals gives us the closest match to the unknown genes. This is done by finding the subset of intervals that maximizes the given weight.

For this assignment the input was the set of weighted intervals. To represent the set, I used a list that consisted of 3-tuples. The first element of each tuple is the start of the interval, followed by the end of the interval, and lastly is the weight of the interval. From this set I constructed a graph. Each vertex was connected to it's neighbor to the right with a weight of 0. So vertex 1 was connected to vertex 2 (but not the other way around), vertex n was connected to vertex n + 1, etc (See page 202 to see what the graph looks like). Additional edges were defined by the set of weighted intervals. The start element connected the start vertex to the end vertex with the given weight as the weight for the edge.

Once the graph was setup, the ExonChaining algorithm finds the longest path from the first vertex to the last vertex. The length of the longest path for a given vertex is stored in an list called s, which allows us to use dynamic programming. With that said, by the time we get to the final node, on a single iteration through all of the nodes, we know what the weight of the longest path is. This value is the answer to the problem.

To test my program, I used two different sets. The first set I found online when trying to understand the problem. I first wrote it out on paper and then wrote my program and tests based on it. The second set is the one given in the book on page 202. One thing that I would like to note is that there is actually a mistake on the graph in the book. It says the weight for node 18 is 20, but my program said that it was actually 21. Upon examination of the graph on page 202, I saw that on node 16 the current weight was 17 and then it jumped to node 18 with a weight of 4. 17 + 4 = 21, not 20.

Cases