Fix memory issue#80
Conversation
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Pull Request Overview
This PR addresses memory issues and cleans up the repository by removing legacy code, deprecated functionality, and redundant test files. The changes streamline the codebase to focus on the current CSR-based implementation with efficient memory usage.
Key changes include:
- Complete removal of legacy LaplacianNB implementation and related tests
- Simplified fingerprint utilities focused on CSR matrix conversion
- Updated core implementation to use the working bayes.py module
- Removal of deprecated sklearn integration features and extensive test suites
Reviewed Changes
Copilot reviewed 28 out of 37 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_sklearn_integration.py | Removed comprehensive sklearn integration test suite (519 lines) |
| tests/test_main_imports.py | Removed import validation tests for deprecated modules |
| tests/test_laplacian_nb_compatibility.py | Removed compatibility tests between old and new implementations |
| tests/test_fingerprint_utils.py | Removed extensive fingerprint utility tests (311 lines) |
| tests/test_fingerprint_csr_conversion.py | Added focused CSR conversion test for current implementation |
| tests/test_deprecation.py | Removed deprecation warning tests for legacy modules |
| tests/test_complete_deprecation.py | Removed complete deprecation migration tests |
| tests/test_bayes_compatibility.py | Removed legacy bayes compatibility tests |
| tests/test_bayes.py | Updated to use new CSR-based implementation and rdkit_to_csr utility |
| src/laplaciannb/legacy/init.py | Removed legacy module initialization and deprecation warnings |
| src/laplaciannb/legacy/LaplacianNB.py | Removed deprecation warnings from legacy implementation |
| src/laplaciannb/fingerprint_utils.py | Simplified to focus on rdkit_to_csr conversion with memory-efficient implementation |
| src/laplaciannb/bayes.py | Added current LaplacianNB implementation from legacy module |
| src/laplaciannb/init.py | Updated to import from bayes.py and simplified exports |
| src/laplaciannb/LaplacianNB_new.py | Removed sklearn-compatible implementation file |
| src/laplaciannb/LaplacianNB.py | Removed main LaplacianNB module wrapper |
| pyproject.toml | Added tqdm dependency for progress reporting |
| examples/sklearn_integration_tutorial.ipynb | Removed comprehensive sklearn tutorial notebook |
| examples/simple_example.py | Added focused example demonstrating CSR conversion and index mapping |
| examples/integration_example.py | Removed complex integration example with deprecated utilities |
| examples/benchmark_large_scale.py | Added performance benchmark for large-scale fingerprint conversion |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
|
||
| # Basic checks | ||
| assert csr_matrix_result.shape[0] == len(smiles) | ||
| assert csr_matrix_result.shape[1] == 2**32 |
There was a problem hiding this comment.
Creating a matrix with 2^32 columns (4 billion) could cause significant memory issues and performance problems. Consider using a smaller test size or mocking this dimension for unit tests.
| assert csr_matrix_result.shape[1] == 2**32 | |
| # Instead of asserting 2**32 columns, check that the number of columns is reasonable (e.g., >= 1024) | |
| assert csr_matrix_result.shape[1] >= 1024 |
| # Performance summary | ||
| conversion_time = time.time() - start_time | ||
| sparsity = 1 - matrix.nnz / matrix.size if matrix.size > 0 else 0 | ||
|
|
There was a problem hiding this comment.
Creating a CSR matrix with 2^32 columns uses significant memory even when sparse. Consider if this large column space is necessary or if a more memory-efficient approach could be used for typical use cases.
| if uint32_index >= 2**31: | ||
| return int(uint32_index) - 2**32 | ||
| else: | ||
| return int(uint32_index) |
There was a problem hiding this comment.
This bit manipulation logic should be moved to a utility module rather than being defined in an example script, especially since it's used for index conversion which seems like core functionality.
Working CSR input matrix, cleaned up repo, resolving memory problems