Fix/28 full rerun targeted deletion #32

Open
thompsonmj wants to merge 10 commits into dev from fix/28-full-rerun-targeted-deletion

@thompsonmj
Contributor

Addresses #28 robustly using the manifest approach introduced in the issue.

Each command now writes a command-specific metadata file to the output directory that serves as a manifest of its TaxonoPy output files. The manifest is written before any output is produced, which safeguards against partial runs. Using the --full-rerun flag now deletes only and exactly the files listed in this manifest.

The change includes a refactor that establishes a single source of truth for output file naming, tests covering the manifest life cycle and behavior under various conditions, and updated docs.
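
For readers skimming the description, the write-first, delete-only-listed idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the manifest filename, the `files` key, and the function names are all assumptions made for the example, and the sketch leaves the manifest itself in place.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical manifest name; the real filename is defined in the PR's manifest module.
MANIFEST_NAME = "example_resolve_manifest.json"


def write_manifest(output_dir: Path, intended_files: list[str]) -> Path:
    """Record every file the command intends to produce, before producing any."""
    output_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = output_dir / MANIFEST_NAME
    # The {"files": [...]} layout is an assumed schema for illustration only.
    manifest_path.write_text(json.dumps({"files": intended_files}, indent=2))
    return manifest_path


def delete_listed_files(output_dir: Path) -> list[str]:
    """Delete only the files named in the manifest; return the names removed."""
    listed = json.loads((output_dir / MANIFEST_NAME).read_text())["files"]
    removed = []
    for name in listed:
        target = output_dir / name
        if target.is_file():  # files that were never produced are skipped
            target.unlink()
            removed.append(name)
    return removed


def demo() -> tuple[list[str], list[str]]:
    """Simulate an interrupted run followed by a targeted cleanup."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        write_manifest(out, ["a.resolved.parquet", "b.resolved.parquet"])
        (out / "a.resolved.parquet").write_text("x")  # produced before the interruption
        # b.resolved.parquet is never produced
        (out / "notes.txt").write_text("user file")   # unrelated file in the same directory
        removed = delete_listed_files(out)
        remaining = sorted(p.name for p in out.iterdir())
        return removed, remaining
```

Because the manifest exists before any output does, an interrupted run still leaves a complete list of what may need cleaning up, and the unrelated `notes.txt` is never considered for deletion.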

thompsonmj and others added 5 commits February 20, 2026 17:04
Extract output file path logic into a single source of truth so that
naming convention changes need only be made in one place. No behavior
change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
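
As a rough illustration of what a single source of truth for output naming buys, the sketch below routes both the writer and the manifest through one path-building function. The function names and the `.resolved.parquet` / `.unsolved.parquet` suffixes are hypothetical and are not taken from the PR's output_manager.py.

```python
from pathlib import Path


def output_paths_for_input(input_file: Path, output_dir: Path) -> dict[str, Path]:
    """Hypothetical single place where the output naming convention lives."""
    stem = input_file.stem
    return {
        "resolved": output_dir / f"{stem}.resolved.parquet",
        "unsolved": output_dir / f"{stem}.unsolved.parquet",
    }


def intended_files(inputs: list[Path], output_dir: Path) -> list[str]:
    """List the manifest entries by reusing the same naming function as the writer."""
    names = []
    for input_file in inputs:
        for path in output_paths_for_input(input_file, output_dir).values():
            names.append(path.relative_to(output_dir).as_posix())
    return sorted(names)
```

With this shape, a naming change made in `output_paths_for_input` is picked up by the manifest automatically, so the list of files to delete cannot drift from the list of files written.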
…nifest

Before writing any output, each command writes a command-scoped manifest
listing every file it intends to produce. On --full-rerun, only the files
listed in that manifest are deleted — unrelated files in the output directory
are never touched. Interrupted runs are handled cleanly since the manifest
is written before any output files exist.

The existing-output guard now checks for the manifest first, with a legacy
glob fallback for output produced before this change. If no manifest is
found on --full-rerun, a warning is logged and no files are removed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
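
The guard and the no-manifest behavior described in this commit might look roughly like the following. It is a sketch under assumptions: the manifest filename, the legacy glob pattern, and both function names are invented for the example.

```python
import logging
from pathlib import Path

logger = logging.getLogger("rerun_sketch")

MANIFEST_NAME = "example_manifest.json"  # hypothetical manifest filename
LEGACY_GLOB = "*.resolved.*"             # hypothetical pattern for pre-manifest output


def has_existing_output(output_dir: Path) -> bool:
    """Check for the manifest first, then fall back to a legacy glob."""
    if (output_dir / MANIFEST_NAME).is_file():
        return True
    return any(output_dir.glob(LEGACY_GLOB))


def full_rerun_cleanup(output_dir: Path) -> bool:
    """Return True if manifest-based cleanup can proceed, False if nothing is removed."""
    if not (output_dir / MANIFEST_NAME).is_file():
        logger.warning("No manifest found in %s; no files were removed.", output_dir)
        return False
    return True
```

The asymmetry is deliberate in this sketch: the legacy glob is only used to detect existing output, never to choose what to delete.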
Cover manifest filename constants, intended-file computation for both
commands, write/read/delete lifecycle, tolerance of missing files,
non-TaxonoPy file safety, command scoping, and the write-before-output
ordering guarantee.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a dedicated Reruns page under the IO section covering the
existing-output guard, what --full-rerun touches and does not touch,
the manifest schema, and no-manifest behavior. Update surrounding pages
and the quick reference guide to link to it. Update AGENTS.md to reflect
the new module and --full-rerun semantics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the [AI-assisted session] footer with a Co-Authored-By trailer
identifying the model and provider, consistent with standard git
co-authorship convention and applicable across any AI assistant.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thompsonmj thompsonmj marked this pull request as draft February 20, 2026 22:26
@thompsonmj thompsonmj changed the base branch from main to dev February 24, 2026 18:43
@thompsonmj thompsonmj marked this pull request as ready for review February 24, 2026 18:43
@thompsonmj thompsonmj requested a review from Copilot February 24, 2026 18:43
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses issue #28 by implementing a targeted deletion mechanism for the --full-rerun flag. Instead of aggressively deleting entire output directories, TaxonoPy now tracks its output files using command-specific manifest files and deletes only those files during a full rerun.

Changes:

  • Introduced a manifest tracking system that records all intended output files before they are created
  • Refactored output file naming to use a single source of truth in output_manager.py
  • Updated --full-rerun behavior to delete only files listed in the manifest, preserving non-TaxonoPy files
  • Added comprehensive test coverage for manifest lifecycle and deletion behavior
  • Documented the new rerun mechanism and manifest schema

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| src/taxonopy/manifest.py | New module implementing manifest read/write/delete operations and file listing for both resolve and common-names commands |
| src/taxonopy/output_manager.py | Refactored to establish _resolve_output_paths_for_input as single source of truth for output naming, with compute_output_paths for manifest generation |
| src/taxonopy/cli.py | Integrated manifest writing before output generation and updated --full-rerun to use manifest-based deletion instead of directory removal |
| src/taxonopy/resolve_common_names.py | Added manifest writing before common-names output processing |
| tests/test_full_rerun.py | Comprehensive test suite covering manifest operations, file deletion, subdirectory handling, and edge cases |
| docs/user-guide/io/reruns.md | New documentation explaining the guard mechanism, --full-rerun behavior, and manifest schema |
| docs/user-guide/io/output.md | Updated to mention manifest files as part of normal output |
| docs/user-guide/io/cache.md | Updated --full-rerun description to clarify targeted deletion behavior |
| docs/user-guide/quick-reference.md | Added notes about manifest files and rerun behavior |
| mkdocs.yml | Added reruns documentation to navigation |
| AGENTS.md | Updated with manifest behavior details and commit message conventions |

thompsonmj and others added 5 commits February 24, 2026 14:32
…ctory

Canonicalize each path before the containment check so that symlinks
and .. sequences cannot be used to target files outside the output
directory. Also replace the exists()+unlink() pair with unlink(missing_ok=True).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
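
A containment check of the kind this commit describes could be sketched as follows. The function name and return convention are assumptions for illustration; only `Path.resolve`, `Path.is_relative_to` (Python 3.9+), and `Path.unlink(missing_ok=True)` are standard library calls.

```python
from pathlib import Path


def delete_if_inside(output_dir: Path, entry: object) -> bool:
    """Delete a manifest entry only if it resolves to a file inside output_dir.

    Returns False when the entry is rejected, True when it was accepted
    (whether or not the file still existed).
    """
    if not isinstance(entry, str):
        return False  # non-string manifest entries are ignored
    root = output_dir.resolve()
    # resolve() follows symlinks and collapses ".." before the check is made
    candidate = (root / entry).resolve()
    if candidate == root or not candidate.is_relative_to(root):
        return False  # path traversal, absolute path, or symlink escape
    if candidate.is_dir():
        return False  # this sketch only removes regular files
    candidate.unlink(missing_ok=True)  # already-missing files are not an error
    return True
```

Resolving before comparing is the important step: a plain string-prefix check on the unresolved path would accept `../outside.txt` style entries.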
Cover non-string entries, path traversal, absolute paths outside the
output directory, and symlink escape. Symlink test skips gracefully
when symlink creation is not permitted by the OS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A JSONDecodeError or OSError from read_manifest previously surfaced as
a generic 'unexpected error' with no guidance. Now the error names the
file, explains that automated cleanup cannot proceed, and tells the user
to fix or delete the manifest or use a new output directory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
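
The improved error path might be shaped like the sketch below. The exception class, function name, and message wording are hypothetical; the PR's actual read_manifest signature and error text are not shown in this conversation.

```python
import json
from pathlib import Path


class ManifestError(RuntimeError):
    """Hypothetical error type for an unreadable or malformed manifest."""


def read_manifest_or_explain(manifest_path: Path) -> dict:
    """Read the manifest, converting low-level failures into an actionable message."""
    try:
        return json.loads(manifest_path.read_text())
    except (json.JSONDecodeError, OSError) as exc:
        # Name the file, say why cleanup stops, and tell the user what to do next.
        raise ManifestError(
            f"Could not read manifest {manifest_path}: {exc}. "
            "Automated cleanup cannot proceed. Fix or delete the manifest, "
            "or use a new output directory."
        ) from exc
```

Chaining with `from exc` keeps the original traceback available for debugging while the user-facing message stays focused on next steps.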
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>