Intro to bioinformatics.
- Implemented Eulerian path algorithm for spectral assembly allowing for k-mer multiplicity.
- Constructed and tested `euler_assemble_multi` for superstring assembly.
- Assembled the SARS-CoV-2 genome from reads, analyzed k-mer histograms, and identified the variant.
- Explored greedy assembly and correctness for read sets.
- Provided overlap graphs, edge orders, and analysis.
- Completed conceptual questions about greedy and spectral assembly paradigms.
- Developed and tested `overlap_align` for optimal pairwise overlap alignment with affine gap penalties.
- Calculated overlap graphs from real SARS-CoV-2 Nanopore reads and determined read ordering.
- Performed hand-computed global and local alignments for provided sequences.
- Modeled read overlap probabilities using a probabilistic framework and computed likelihoods and posteriors.
- Included code, worked examples, and matrix visualizations for alignments.
- Implemented a UPGMA-based alignment-order strategy, including tie-breaking.
- Constructed guide trees for progressive multiple alignments of SARS-CoV-2 spike proteins.
- Used substitution matrices (e.g., BLOSUM62) and pairwise edit distances.
- Performed multiple alignment, visualized domains, and analyzed evolutionary changes.
- Explored star alignment strategies and sum-of-pairs scoring.
- Analyzed DNA substitution matrix scenarios for zero, positive, and negative entries.
- Implemented tree likelihood calculations using dynamic programming.
- Constructed and applied a PhyloHMM to detect recombination in SARS-CoV-2 spike gene alignments, and analyzed the Viterbi paths.
- Compared HMM predictions under different transition probabilities.
- Manually performed branch-and-bound for unweighted parsimony on five taxa, documenting queue states and tree count comparisons.
- Estimated Markov chain parameters and likelihoods for DNA sequence data using uniform, MLE, and Laplace approaches.
- Estimating HMM parameters with Viterbi training
- K-means clustering
- Forward and backward algorithms
- The Baum–Welch algorithm
- Gaussian mixture model-based clustering
- Bottom-up hierarchical clustering
- Determining causal relationships
- Scoring Bayesian networks with model evidence
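
The spectral assembly item near the top of this list (an Eulerian path that respects k-mer multiplicity) can be illustrated with a short sketch. This is not the `euler_assemble_multi` implementation mentioned above; the function name `assemble_from_kmers`, the input format (a non-empty list of equal-length k-mers, with repeats listed once per occurrence), and the example string are all assumptions made for illustration. Each k-mer copy becomes one edge between its (k-1)-mer prefix and suffix, and an iterative Hierholzer walk uses every edge exactly once.

```python
from collections import Counter, defaultdict


def assemble_from_kmers(kmers):
    """Spell a string whose k-mer multiset equals `kmers`.

    Hypothetical helper for illustration; assumes a non-empty list of
    equal-length k-mers in which a repeated k-mer appears once per copy.
    """
    graph = defaultdict(list)  # (k-1)-mer prefix -> suffixes, one entry per k-mer copy
    balance = Counter()        # out-degree minus in-degree for each node
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1

    # Start at the node with one surplus outgoing edge; if every node is
    # balanced the graph has an Eulerian cycle and any start node works.
    start = next((n for n, b in balance.items() if b == 1), kmers[0][:-1])

    # Iterative Hierholzer: follow unused edges, backtrack when stuck.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # An Eulerian path over E edges visits exactly E + 1 nodes.
    if len(path) != len(kmers) + 1:
        raise ValueError("k-mers do not form a single Eulerian path")

    # The first node contributes k-1 characters, each later node one more.
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    # "ATG" occurs twice, so it is listed twice.
    print(assemble_from_kmers(["ATG", "TGA", "GAT", "ATG", "TGC"]))
```

When several Eulerian paths exist, the walk returns one of them, so the result is only guaranteed to reproduce the input k-mer multiset, not a particular genome.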