Extract output file path logic into a single source of truth so that naming convention changes need only be made in one place. No behavior change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
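A minimal sketch of the single-source-of-truth idea: one function owns the naming rule, and every other caller (output writing, manifest generation) derives paths from it instead of rebuilding names ad hoc. The function names `output_path_for` and `all_output_paths`, the suffix table, and the `.parquet` extension are hypothetical illustrations. They are not TaxonoPy's actual `_resolve_output_paths_for_input` or `compute_output_paths`.

```python
from pathlib import Path

# Hypothetical naming rule: one suffix per command. Changing the convention
# means editing only this table and output_path_for below.
_SUFFIXES = {
    "resolve": ".resolved",
    "common-names": ".common_names",
}


def output_path_for(input_path: Path, output_dir: Path, command: str) -> Path:
    """Return the output path a command would write for one input file."""
    suffix = _SUFFIXES[command]
    return output_dir / f"{input_path.stem}{suffix}.parquet"


def all_output_paths(inputs: list[Path], output_dir: Path, command: str) -> list[Path]:
    """Derive every output name from output_path_for, never from a second rule."""
    return [output_path_for(p, output_dir, command) for p in inputs]
```

Because the manifest and the writer both call the same function, the list of intended files cannot drift from the files actually written.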
…nifest Before writing any output, each command writes a command-scoped manifest listing every file it intends to produce. On --full-rerun, only the files listed in that manifest are deleted — unrelated files in the output directory are never touched. Interrupted runs are handled cleanly since the manifest is written before any output files exist. The existing-output guard now checks for the manifest first, with a legacy glob fallback for output produced before this change. If no manifest is found on --full-rerun, a warning is logged and no files are removed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
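The write-before-output ordering and manifest-scoped deletion described above can be sketched as follows. This is a hypothetical illustration: the manifest filename pattern, the JSON layout (a `files` list of paths relative to the output directory), and the function names are assumptions, not the schema documented in docs/user-guide/io/reruns.md. It also performs no validation of manifest entries.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger(__name__)


def manifest_path(output_dir: Path, command: str) -> Path:
    # One manifest per command, so a resolve rerun never touches
    # common-names output (hypothetical filename pattern).
    return output_dir / f"taxonopy_{command}_manifest.json"


def write_manifest(output_dir: Path, command: str, intended: list[Path]) -> Path:
    """Record every file the command intends to produce, before producing any."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = manifest_path(output_dir, command)
    entries = [str(p.relative_to(output_dir)) for p in intended]
    path.write_text(json.dumps({"command": command, "files": entries}, indent=2))
    return path


def full_rerun_cleanup(output_dir: Path, command: str) -> list[Path]:
    """Delete only files listed in this command's manifest; return what was removed."""
    path = manifest_path(output_dir, command)
    if not path.exists():
        # No manifest: warn and leave the directory untouched.
        log.warning("No manifest found at %s; no files removed.", path)
        return []
    removed = []
    for rel in json.loads(path.read_text())["files"]:
        target = output_dir / rel
        existed = target.is_file()
        # Files from an interrupted run may never have been created.
        target.unlink(missing_ok=True)
        if existed:
            removed.append(target)
    return removed
```

Since the manifest exists before the first output file, an interrupted run still leaves a complete list of what the next --full-rerun may remove.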
Cover manifest filename constants, intended-file computation for both commands, write/read/delete lifecycle, tolerance of missing files, non-TaxonoPy file safety, command scoping, and the write-before-output ordering guarantee. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a dedicated Reruns page under the IO section covering the existing-output guard, what --full-rerun touches and does not touch, the manifest schema, and no-manifest behavior. Update surrounding pages and the quick reference guide to link to it. Update AGENTS.md to reflect the new module and --full-rerun semantics. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the [AI-assisted session] footer with a Co-Authored-By trailer identifying the model and provider, consistent with standard git co-authorship convention and applicable across any AI assistant. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
This PR addresses issue #28 by implementing a targeted deletion mechanism for the --full-rerun flag. Instead of aggressively deleting entire output directories, TaxonoPy now tracks its output files using command-specific manifest files and deletes only those files during a full rerun.
Changes:
- Introduced a manifest tracking system that records all intended output files before they are created
- Refactored output file naming to use a single source of truth in output_manager.py
- Updated --full-rerun behavior to delete only files listed in the manifest, preserving non-TaxonoPy files
- Added comprehensive test coverage for manifest lifecycle and deletion behavior
- Documented the new rerun mechanism and manifest schema
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/taxonopy/manifest.py | New module implementing manifest read/write/delete operations and file listing for both the resolve and common-names commands |
| src/taxonopy/output_manager.py | Refactored to establish _resolve_output_paths_for_input as the single source of truth for output naming, with compute_output_paths for manifest generation |
| src/taxonopy/cli.py | Integrated manifest writing before output generation and updated --full-rerun to use manifest-based deletion instead of directory removal |
| src/taxonopy/resolve_common_names.py | Added manifest writing before common-names output processing |
| tests/test_full_rerun.py | Comprehensive test suite covering manifest operations, file deletion, subdirectory handling, and edge cases |
| docs/user-guide/io/reruns.md | New documentation explaining the guard mechanism, --full-rerun behavior, and manifest schema |
| docs/user-guide/io/output.md | Updated to mention manifest files as part of normal output |
| docs/user-guide/io/cache.md | Updated --full-rerun description to clarify targeted deletion behavior |
| docs/user-guide/quick-reference.md | Added notes about manifest files and rerun behavior |
| mkdocs.yml | Added reruns documentation to navigation |
| AGENTS.md | Updated with manifest behavior details and commit message conventions |
…ctory Canonicalize each path before the containment check so that symlinks and .. sequences cannot be used to target files outside the output directory. Also replace the exists()+unlink() pair with unlink(missing_ok=True). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
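A minimal sketch of canonicalize-then-check, assuming manifest entries are strings relative to the output directory. `safe_targets` and `delete_targets` are hypothetical names, not functions from src/taxonopy/manifest.py. It relies on `Path.resolve()` (which follows symlinks and collapses `..`), `Path.is_relative_to()` (Python 3.9+), and `Path.unlink(missing_ok=True)` (Python 3.8+).

```python
from pathlib import Path


def safe_targets(output_dir: Path, entries: list[object]) -> list[Path]:
    """Return manifest entries whose canonical path lies inside output_dir.

    Non-string entries, and entries that escape the directory through '..',
    an absolute path, or a symlink, are skipped rather than deleted.
    """
    root = output_dir.resolve()
    safe = []
    for entry in entries:
        if not isinstance(entry, str):
            continue
        # Joining an absolute entry discards root, and resolve() follows
        # symlinks, so all three escape routes fail the containment check.
        candidate = (root / entry).resolve()
        if candidate == root or candidate.is_dir():
            continue  # never unlink the directory itself or a subdirectory
        if candidate.is_relative_to(root):
            safe.append(candidate)
    return safe


def delete_targets(targets: list[Path]) -> None:
    for target in targets:
        # A single call avoids the race between a separate exists() and unlink().
        target.unlink(missing_ok=True)
```

Checking the raw string instead (for example, rejecting entries that contain "..") would still miss a symlink inside the output directory that points elsewhere, which is why the check runs on the resolved path.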
Cover non-string entries, path traversal, absolute paths outside the output directory, and symlink escape. Symlink test skips gracefully when symlink creation is not permitted by the OS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A JSONDecodeError or OSError from read_manifest previously surfaced as a generic 'unexpected error' with no guidance. Now the error names the file, explains that automated cleanup cannot proceed, and tells the user to fix or delete the manifest or use a new output directory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
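A sketch of the error translation described above, assuming the manifest is JSON. `ManifestError` and `read_manifest_or_explain` are hypothetical names, and the exact message wording in TaxonoPy may differ. `UnicodeDecodeError` is caught as well because `read_text()` raises it for non-UTF-8 bytes, and it is neither a `JSONDecodeError` nor an `OSError`.

```python
import json
from pathlib import Path


class ManifestError(RuntimeError):
    """Raised when a manifest exists but cannot be used for cleanup."""


def read_manifest_or_explain(path: Path) -> dict:
    """Read a manifest, turning low-level failures into actionable guidance."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError, OSError) as exc:
        # Name the file, say why cleanup stops, and list the ways forward.
        raise ManifestError(
            f"Could not read manifest {path}: {exc}. Automated cleanup cannot "
            "proceed. Fix or delete the manifest, or use a new output directory."
        ) from exc
```

Chaining with `from exc` keeps the original traceback available for debugging while the top-level message stays readable.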
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses #28 robustly using the manifest approach introduced in the issue.
A command-specific metadata file, serving as a manifest of TaxonoPy output files, is written to the output directory. It is written before any output is produced, which safeguards against partial runs. The --full-rerun flag now deletes exactly the files listed in this manifest and nothing else.

The approach includes a refactor that establishes a single source of truth for output file naming, tests covering the manifest lifecycle and behavior under various conditions, and updated docs.