Speed up compilation unit export and fix related CU range matching #360

Open
Raincarnator wants to merge 3 commits into joxeankoret:master from Raincarnator:optimize-compilation-units

@Raincarnator

Summary

This PR improves compilation unit handling in two places:

  1. Speeds up compilation unit export.
  2. Fixes the address range used by the related compilation unit diff heuristic.

Changes

Speed up compilation unit export

The exporter previously matched functions to LFA compilation-unit ranges by scanning all cached functions once per module. On large databases this becomes very expensive, because the cost grows with the number of modules multiplied by the number of functions (e.g. thousands of modules times hundreds of thousands of functions).

This PR builds sorted function-address arrays and uses binary search to find the functions inside each module range. It also batches the compilation_unit_functions and source_file updates with executemany().
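For reviewers, here is a minimal sketch of the approach described above. It is not the code from this PR. The helper `functions_in_range`, the demo table `cu_functions_demo`, its columns, and the sample addresses are all hypothetical, and it assumes module ranges are inclusive at both ends.

```python
import sqlite3
from bisect import bisect_left, bisect_right


def functions_in_range(sorted_eas, start_ea, end_ea):
    """Return the addresses in sorted_eas that fall in [start_ea, end_ea]."""
    lo = bisect_left(sorted_eas, start_ea)   # first index with ea >= start_ea
    hi = bisect_right(sorted_eas, end_ea)    # first index with ea > end_ea
    return sorted_eas[lo:hi]


# Hypothetical data: function start addresses and (cu_id, start_ea, end_ea) ranges.
function_eas = sorted([0x3000, 0x1000, 0x2010, 0x1040, 0x2000])
modules = [(1, 0x1000, 0x1FFF), (2, 0x2000, 0x2FFF)]

# One binary search per module instead of one full scan per module.
rows = [
    (cu_id, ea)
    for cu_id, start_ea, end_ea in modules
    for ea in functions_in_range(function_eas, start_ea, end_ea)
]

# Batch all rows into a single executemany() call (demo schema, not Diaphora's).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cu_functions_demo (cu_id INTEGER, func_ea INTEGER)")
conn.executemany("INSERT INTO cu_functions_demo VALUES (?, ?)", rows)
conn.commit()
inserted = conn.execute("SELECT COUNT(*) FROM cu_functions_demo").fetchone()[0]
```

With the addresses sorted once up front, each module costs two binary searches plus a slice, rather than a pass over every cached function.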

The source-string-to-module lookup was also changed to use sorted LFA module ranges instead of linearly scanning every module for each source-string function.

This keeps the exported compilation unit data equivalent, but avoids the large nested scans.
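The source-string-to-module lookup can be sketched the same way. This is an illustration only, not the PR's code: `build_range_index` and `find_module` are hypothetical names, ranges are assumed to be inclusive, and the sketch assumes module ranges do not overlap.

```python
from bisect import bisect_right


def build_range_index(modules):
    """Sort (name, start_ea, end_ea) tuples by start address for bisecting."""
    ordered = sorted(modules, key=lambda m: m[1])
    starts = [m[1] for m in ordered]
    return starts, ordered


def find_module(starts, ordered, ea):
    """Return the name of the module whose range contains ea, or None."""
    # Last module whose start address is <= ea is the only possible candidate
    # when ranges do not overlap.
    idx = bisect_right(starts, ea) - 1
    if idx < 0:
        return None
    name, start_ea, end_ea = ordered[idx]
    return name if start_ea <= ea <= end_ea else None


# Hypothetical module ranges, deliberately given out of order.
starts, ordered = build_range_index([
    ("mod_b", 0x2000, 0x2FFF),
    ("mod_a", 0x1000, 0x1FFF),
])
owner = find_module(starts, ordered, 0x1040)
```

If ranges could overlap, a single bisect would not be enough and the lookup would need to check more than one candidate.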

Fix related compilation unit range matching

find_related_compilation_unit() selected both start_ea and end_ea, but assigned the end variables from start_ea:

main_end_ea = main_row["start_ea"]
diff_end_ea = diff_row["start_ea"]

As a result, the later "between start and end" query covered only the start address, so the heuristic was effectively restricted to a single address instead of the whole compilation unit range.

This PR changes those assignments to use end_ea, matching the selected columns and the intended range-based heuristic.
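To illustrate the effect of the fix, here is a small self-contained reproduction. The table `functions_demo`, the helper `count_in_range`, and the addresses are hypothetical and are not taken from Diaphora's schema or queries. Only the `start_ea` / `end_ea` mix-up mirrors the bug described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE functions_demo (address INTEGER)")
conn.executemany(
    "INSERT INTO functions_demo VALUES (?)",
    [(0x1000,), (0x1040,), (0x1080,)],
)


def count_in_range(start_ea, end_ea):
    """Count demo functions whose address lies in [start_ea, end_ea]."""
    sql = "SELECT COUNT(*) FROM functions_demo WHERE address BETWEEN ? AND ?"
    return conn.execute(sql, (start_ea, end_ea)).fetchone()[0]


# Hypothetical compilation unit row with both columns selected.
row = {"start_ea": 0x1000, "end_ea": 0x10FF}

# Old behaviour: the end variable was assigned from start_ea.
buggy = count_in_range(row["start_ea"], row["start_ea"])

# Fixed behaviour: the end variable is assigned from end_ea.
fixed = count_in_range(row["start_ea"], row["end_ea"])
```

In this demo the buggy call matches only the function at the start address, while the fixed call matches all three functions in the range.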

Verification

  • Ran python -m py_compile diaphora.py diaphora_ida.py.

@Raincarnator

Author

Note: this PR includes one export-time optimization commit from #359, because this branch is based on that work.

Please review #359 first, or only consider the last two commits in this PR. @joxeankoret

@joxeankoret

Owner

Thank you very much! It will take me a while to review it, but it looks robust.
