Getting Started

Install uv. Then add all the dependencies by running uv sync and activate the virtual environment with source .venv/bin/activate.

Test Cases

Simulator

Run python src/generate_test_graphs.py to generate test cases of differing distributions and save them to test_graphs/generated. It's possible, but not necessary, to specify custom parameters for the graphs that should be generated here. They are documented in the main function of the script.

Testing

For convenience, you can run the script run_tests.sh. It first calculates vertex clique cover numbers for all graphs in a path that can be specified (default: test_graphs/generated/perturbed) via the src/add_vertex_clique_cover_number.py script, or chromatic numbers via src/add_chromatic_number.py script if you pass the --chromatic-number flag. Note that the current ground truth calculations rely on the reduced ILP algorithms since they are faster and accurate according to our tests. It then runs through the tests you specify as command line arguments, or all available tests if none were specified. Adding ground truths relies on the Gurobi solver, so you'll need a license for larger graphs. You can also deviate from default path by running with a positional argument like this: run_tests.sh test_graphs/curated.

Warning

The --cluster-editing and --reduced-cluster-editing flags in the following example drastically increase the time it takes for all algorithms to finish.

Example Usage:

  ./run_tests.sh --ilp --reduced-ilp --chalupa --reduced-chalupa --minimal-cluster-editing --minimal-reduced-cluster-editing --cluster-editing --reduced-cluster-editing --chromatic-number test_graphs/generated/perturbed

This will run all the specified algorithms, which are made available through the src/wrappers.py script, on the specified test graph directory and save the results in results/raw/. These results are human-readable and already contain interesting information.

Note

We decided to include two different approaches to the cluster editing ILP and reductions since neither works perfectly. The minimal approach implements only triangle inequalities and has achieved similar performance to the other approach, but the edge cut kernelization conditions are too strict to be passed by any of our test files (especially that 2δ(v) + γ(N[v]) < |N[v]|). The other cluster editing approach is more complex and employs reductions based on Böcker and the cutting plane approach based on Grötschel, but it is currently much slower.

Testing on Rfam data

You can use the script src/wp5_rfam_to_graph.py to convert a .tsv file containing rfam data into a weighted graph. Pass as positional argument the path to the rfam file (default: test_graphs/rfam/RF02246.tsv) and as option the -m or --method flag to choose the method used to convert from dissimilarity to weight (linear, threshold or exp, default: linear).

Analysis

To compare results of different algorithms, you can run python src/comparison.py <results-file1> <results-file2> ... with two or more of these results files. This will output analysis plots in the results/analyses directory.

Curation

Curated test cases were found at houseofgraphs.org. Each .txt file contains the graph structure as well as their invariants. Feel free to add more test cases (choose Invariant values as file format)!

Architecture

File Structure

Files in src/algorithms pertain to algorithms, chalupa.py and helpers.py for the Chalupa heuristic, cluster_editing.py for the cluster editing problem, ilp_solver.py for the vertex clique cover (VCC) problem. The src/reductions directory contains code for reductions pertaining to the VCC problem. The cluster editing reductions are currently contained in the src/algorithms/cluster_editing.py file.

The files in the top-level src/ directory relate to the workflow infrastructure. The files src/test_cluster_editing.py and src/test_reductions.py were used during development to verify the cluster editing and VCC reduction algorithms. src/algorithms/helpers.py contains some convenience functions for test infrastructure.

Current State of Progress

Work Program

WP0

Write a simulator for test cases by starting from a disjoint union of cliques. Then introduce perturbations by removing edges from within the cliques and adding edges between them. Different distributions of cliques sizes should be generated, from uniform (all the same size) to skew distributions with few large and many small ones.

WP1 Exact versus heuristic solutions:

(1.a) Implement Chalupa’s heuristic algorithm (see [4, 8]).

(1.b) Implement the ILP for vertex clique coloring and compare the quality of solutions. What are practical limits on size and “perturbation strength” for the ILP solution?

(1.c) Compare the Chalupa to the ILP solutions. How often does the heuristic produce the exact solutions for larger instances. How well performs the heuristic on average?

WP2 Kernelizations for the vertex clique cover problem

(2.a) Implement the different reduction algorithms and test them thoroughly.

(2.b) Use the following workflow to get exact solutions: (i) Estimate an upper bound k on the clique cover number θ(G). (ii) Run the kernelization/data reduction. (iii) Run the exact solution. Does that significantly increase the size/perturbation level of the instances that can be processed?

(2.c) Extend your workflow to an interactive scheme: After data reduction/kernelization run Chalupa again. If you now get a smaller estimate for k, rerun the pipeline. Does that improve running time or problem size before feeding the kernel into the ILP?

WP3 Kernelizations for the Cluster editing problem.

(3.a) Implement the kernelizations for cluster editing and test them thoroughly.

(3.b) Quantify the improvements achieved by the kernelizations on the cluster editing problem for your instances.

WP4 Comparison of Vertex Clique Cover and Cluster Editing solutions.

The two problems are conceptually similar. How good are the solutions of the cluster editing compared to vertex clique cover. To this end compare the number of clusters C(G) obtained by cluster editing with the vertex clique cover number θ(G).

Bonus

Can you think of a good heuristic to get a better solution for one problem from an exact solution of the other?

WP5 Real data.

Once everything is up and running, apply the workflows to real data from shift-alignment predictions on Rfam RNA families. This data will by supplied around May 1st by Maria Waldl.

References

[1] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Mach. Learn., 56:89–113, 2004. [2] S. Böcker and J. Baumbach. Cluster editing. In P. Bonizzoni, V. Brattka, and B. Löwe, editors, The Nature of Computation. Logic, Algorithms, Applications, pages 33–44. Springer, Berlin, Heidelberg, 2013. [3] Y. Cao and J. Chen. Cluster editing: Kernelization based on edge cuts. Algorithmica, 64:152–169, 2012. [4] D. Chalupa. Construction of near-optimal vertex clique covering for real- world networks. Computing and Informatics, 34(6):1397–1417, 2016. [5] G. Fritzsch, M. Schlegel, and P. F. Stadler. Alignments of mitochondrial genome arrangements: Applications to metazoan phylogeny. J. Theor. Biol., 240:511–520, 2006. [6] R. Shamir, R. Sharan, and D. Tsur. Cluster graph modification problems. Discrete Applied Mathematics, 144:173–182, 2004. [7] A. Stoltzfus, J. M. Logsdon Jr, J. D. Palmer, and W. F. Doolittle. Intron “sliding” and the diversity of intron positions. Proc Natl Acad Sci U S A, 94:10739–10744, 1997. [8] D. Strash and L. Thompson. Effective data reduction for the vertex clique cover problem. In 2022 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), pages 41–53, 2022. [9] M. Waldl, S. Will, M. Wolfinger, I. L. Hofacker, and P. F. Stadler. Bi- alignments as models of incongruent evolution of RNA sequence and secondary structure. In P. Cazzaniga, D. Besozzi, I. Merelli, and L. Manzoni, editors, Computational Intelligence Methods for Bioinformatics and Biostatistics, 16th International Meeting, CIBB’19, volume 12313 of Lect. Notes Comp. Sci., pages 159–170, Cham, CH, 2020. Springer Nature

Name		Name	Last commit message	Last commit date
Latest commit History 258 Commits
results/public/raw		results/public/raw
src		src
test_graphs		test_graphs
.gitignore		.gitignore
.python-version		.python-version
Handbuch.md		Handbuch.md
README.md		README.md
pyproject.toml		pyproject.toml
run_tests.sh		run_tests.sh
setup.py		setup.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Getting Started

Test Cases

Simulator

Testing

Testing on Rfam data

Analysis

Curation

Architecture

File Structure

Current State of Progress

Work Program

WP0

WP1 Exact versus heuristic solutions:

WP2 Kernelizations for the vertex clique cover problem

WP3 Kernelizations for the Cluster editing problem.

WP4 Comparison of Vertex Clique Cover and Cluster Editing solutions.

Bonus

WP5 Real data.

References

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Getting Started

Test Cases

Simulator

Testing

Testing on Rfam data

Analysis

Curation

Architecture

File Structure

Current State of Progress

Work Program

WP0

WP1 Exact versus heuristic solutions:

WP2 Kernelizations for the vertex clique cover problem

WP3 Kernelizations for the Cluster editing problem.

WP4 Comparison of Vertex Clique Cover and Cluster Editing solutions.

Bonus

WP5 Real data.

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages