Homework 1, due November 10:
1. Horses and donkeys have different numbers of chromosomes. Occasionally they mate and produce offspring (mules, or rarely, hinnies). Why are those offspring almost always sterile, given what you know about meiosis? (5 points)
2. There are roughly 1591 Myc target genes in the human genome (assume about 21,000 genes in total). There are about 100 cell cycle genes in the genome. If Myc targets were distributed at random among all genes, what is the expected number of cell cycle genes that are Myc targets? (5 points)
3. Two DNA sequences have the same nucleotide frequencies: p(A) = 0.2, p(C) = 0.3, p(G) = 0.4, p(T) = 0.1. If the two sequences are independent and have the same length, and they are lined up residue by residue, starting at the first nucleotide of each and using no gaps, what is the probability of a match at a randomly chosen position? (8 points)
4. In a CpG island, the probability of observing a CG dinucleotide is about 0.08 (lots of vague assumptions here). Given that you are in a CpG island, what is p(CGCGTTACCG), given that other dinucleotide frequencies are evenly distributed (not exactly accurate)? (5 points) If you observe CGCGTTACCG, what is the probability that you are in a CpG island, if about 2% of the genome is covered by CpG islands, and in the non-CpG genome, CG dinucleotide frequency is 0.04 and the others are evenly distributed? (5 points)
6. Write an algorithm, 10 lines or more, in pseudocode or computer code, describing something that you do every day. Include at least one "for" or "while" statement. (5 points)
7. Given two sequences, of lengths n and m, show that there are (n+m)Cn ways to intercalate the sequences into a single long sequence, preserving the order of the symbols in each. Show that this is the same as creating all possible gapped alignments between the two sequences. (7 points) If you can demonstrate this convincingly for two sample sequences that are 5 residues long, you can get up to 5 points.
8. What happens if the parameters for a pairwise alignment are set so that mismatches are penalized more heavily than gaps? (5 points)
9. Manually compute a global alignment of the two sequences CCGTA and TCCAA, using match=3, mismatch=-1, gap=-2. Show the initial matrix with pointers as well as the traceback, with the final alignment(s) and score(s). (10 points)
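
For checking a hand-filled matrix in question 9, here is a minimal sketch of the score-matrix fill for global alignment (the standard Needleman-Wunsch recurrence with a linear gap penalty). It is not part of the assignment: the function name global_score_matrix and the toy sequences GAT and GT are made up for illustration, and it computes scores only, so the pointers and traceback still have to be done by hand.

```python
def global_score_matrix(a, b, match=3, mismatch=-1, gap=-2):
    """Fill the global-alignment score matrix for sequences a (rows) and b (columns).

    Hypothetical helper for illustration; defaults mirror the parameters in question 9.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each inner cell is the best of: diagonal (match/mismatch), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score


# Toy example (not the homework sequences): print the matrix row by row.
matrix = global_score_matrix("GAT", "GT")
for row in matrix:
    print(row)
print("optimal global score:", matrix[-1][-1])
```

The bottom-right cell holds the optimal global score; for the toy pair above it corresponds to aligning G with G, A with a gap, and T with T.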