Fix memory issue #80

Merged: bbaranow merged 8 commits into develop from fix_memory_issue on Aug 21, 2025

@bbaranow

Collaborator

Gets the CSR input matrix working, cleans up the repo, and resolves the memory problems.

  • CSR matrix input works as intended with sets. TODO: NumPy array (dense matrix) input
  • Checking sklearn integration
  • Cleanup
@bbaranow bbaranow requested a review from Copilot August 21, 2025 10:31
@github-actions

github-actions Bot commented Aug 21, 2025


Coverage report

File | Lines missing
--- | ---
src/laplaciannb/__init__.py | (none)
src/laplaciannb/bayes.py | 121-126, 233, 239-242, 270-275, 291
src/laplaciannb/fingerprint_utils.py | 13-18, 64, 119-168, 174-197, 222-360, 365-439

This report was generated by python-coverage-comment-action

Copilot AI left a comment

Pull Request Overview

This PR addresses memory issues and cleans up the repository by removing legacy code, deprecated functionality, and redundant test files. The changes streamline the codebase to focus on the current CSR-based implementation with efficient memory usage.

Key changes include:

  • Complete removal of legacy LaplacianNB implementation and related tests
  • Simplified fingerprint utilities focused on CSR matrix conversion
  • Updated core implementation to use the working bayes.py module
  • Removal of deprecated sklearn integration features and extensive test suites

Reviewed Changes

Copilot reviewed 28 out of 37 changed files in this pull request and generated 3 comments.

File | Description
--- | ---
tests/test_sklearn_integration.py | Removed comprehensive sklearn integration test suite (519 lines)
tests/test_main_imports.py | Removed import validation tests for deprecated modules
tests/test_laplacian_nb_compatibility.py | Removed compatibility tests between old and new implementations
tests/test_fingerprint_utils.py | Removed extensive fingerprint utility tests (311 lines)
tests/test_fingerprint_csr_conversion.py | Added focused CSR conversion test for current implementation
tests/test_deprecation.py | Removed deprecation warning tests for legacy modules
tests/test_complete_deprecation.py | Removed complete deprecation migration tests
tests/test_bayes_compatibility.py | Removed legacy bayes compatibility tests
tests/test_bayes.py | Updated to use new CSR-based implementation and rdkit_to_csr utility
src/laplaciannb/legacy/__init__.py | Removed legacy module initialization and deprecation warnings
src/laplaciannb/legacy/LaplacianNB.py | Removed deprecation warnings from legacy implementation
src/laplaciannb/fingerprint_utils.py | Simplified to focus on rdkit_to_csr conversion with memory-efficient implementation
src/laplaciannb/bayes.py | Added current LaplacianNB implementation from legacy module
src/laplaciannb/__init__.py | Updated to import from bayes.py and simplified exports
src/laplaciannb/LaplacianNB_new.py | Removed sklearn-compatible implementation file
src/laplaciannb/LaplacianNB.py | Removed main LaplacianNB module wrapper
pyproject.toml | Added tqdm dependency for progress reporting
examples/sklearn_integration_tutorial.ipynb | Removed comprehensive sklearn tutorial notebook
examples/simple_example.py | Added focused example demonstrating CSR conversion and index mapping
examples/integration_example.py | Removed complex integration example with deprecated utilities
examples/benchmark_large_scale.py | Added performance benchmark for large-scale fingerprint conversion


# Basic checks
assert csr_matrix_result.shape[0] == len(smiles)
assert csr_matrix_result.shape[1] == 2**32

Copilot AI Aug 21, 2025

Creating a matrix with 2^32 columns (4 billion) could cause significant memory issues and performance problems. Consider using a smaller test size or mocking this dimension for unit tests.

Suggested change
- assert csr_matrix_result.shape[1] == 2**32
+ # Instead of asserting 2**32 columns, check that the number of columns is reasonable (e.g., >= 1024)
+ assert csr_matrix_result.shape[1] >= 1024

Copilot uses AI. Check for mistakes.
Comment thread src/laplaciannb/fingerprint_utils.py Outdated
# Performance summary
conversion_time = time.time() - start_time
sparsity = 1 - matrix.nnz / matrix.size if matrix.size > 0 else 0

Copilot AI Aug 21, 2025

Creating a CSR matrix with 2^32 columns uses significant memory even when sparse. Consider if this large column space is necessary or if a more memory-efficient approach could be used for typical use cases.
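
For context on the memory concern: in the standard CSR layout, storage grows with the number of stored values and the number of rows, not with the declared column count. The main cost of a 2**32-wide column space is that column indices no longer fit in signed 32-bit integers, so index arrays generally need 64-bit entries. The sketch below is a hand-rolled illustration in plain Python, not the PR's rdkit_to_csr code and not SciPy's implementation; `TinyCSR` and its methods are hypothetical names introduced only for this example. It also derives sparsity from rows × columns, which is worth double-checking in the snippet above, since SciPy documents the `size` attribute of its sparse matrices as the number of stored values.

```python
from array import array


class TinyCSR:
    """Minimal CSR container for binary features (illustration only, hypothetical)."""

    def __init__(self, rows, n_cols):
        # rows: iterable of sets of column indices, one set per sample
        self.n_cols = n_cols
        self.indices = array("q")      # column index of each stored value (64-bit)
        self.indptr = array("q", [0])  # row i occupies indices[indptr[i]:indptr[i + 1]]
        for cols in rows:
            for col in sorted(cols):
                if not 0 <= col < n_cols:
                    raise ValueError(f"column {col} out of range")
                self.indices.append(col)
            self.indptr.append(len(self.indices))

    @property
    def shape(self):
        return (len(self.indptr) - 1, self.n_cols)

    @property
    def nnz(self):
        return len(self.indices)

    def nbytes(self):
        # Payload of the two index arrays; binary features need no value array here.
        return (self.indices.itemsize * len(self.indices)
                + self.indptr.itemsize * len(self.indptr))

    def sparsity(self):
        # Fraction of empty cells, computed from the full shape rather than from nnz.
        n_rows, n_cols = self.shape
        total = n_rows * n_cols
        return 1 - self.nnz / total if total else 0.0


# Same number of rows and stored values, very different declared widths.
narrow = TinyCSR([{5, 7}, {9}], n_cols=1024)
wide = TinyCSR([{5, 2**31 + 7}, {2**32 - 1}], n_cols=2**32)

print(wide.shape, wide.nnz)
print(narrow.nbytes() == wide.nbytes())
print(wide.sparsity())
```

Under these assumptions the two containers report the same payload size, which suggests the column count alone is not the memory driver; the number of molecules and of set bits per fingerprint is.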

if uint32_index >= 2**31:
    return int(uint32_index) - 2**32
else:
    return int(uint32_index)

Copilot AI Aug 21, 2025

This bit manipulation logic should be moved to a utility module rather than being defined in an example script, especially since it's used for index conversion which seems like core functionality.
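
One way the suggested utility could look is sketched below. It is only an illustration of the unsigned-to-signed reinterpretation shown in the snippet above, plus its inverse; the function names `uint32_to_int32` and `int32_to_uint32`, the constants, and the range checks are assumptions and do not come from the PR.

```python
UINT32_SPAN = 2**32   # number of distinct 32-bit values
INT32_LIMIT = 2**31   # first value that is negative when read as signed


def uint32_to_int32(uint32_index):
    """Reinterpret an unsigned 32-bit index as a signed 32-bit value (hypothetical helper)."""
    value = int(uint32_index)
    if not 0 <= value < UINT32_SPAN:
        raise ValueError(f"{value} is outside the unsigned 32-bit range")
    # Values with the top bit set wrap around to the negative half.
    return value - UINT32_SPAN if value >= INT32_LIMIT else value


def int32_to_uint32(int32_value):
    """Inverse mapping: signed 32-bit value back to an unsigned column index."""
    value = int(int32_value)
    if not -INT32_LIMIT <= value < INT32_LIMIT:
        raise ValueError(f"{value} is outside the signed 32-bit range")
    return value % UINT32_SPAN


# Round trip over the boundary cases.
for index in (0, 1, 2**31 - 1, 2**31, 2**32 - 1):
    assert int32_to_uint32(uint32_to_int32(index)) == index
```

Keeping both directions in one module would let the example script and the tests share a single definition of the mapping.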

@bbaranow bbaranow merged commit 3060898 into develop Aug 21, 2025
10 checks passed
@bbaranow bbaranow deleted the fix_memory_issue branch August 21, 2025 10:41
2 participants