
Fix splicing hang, speed CI #8911

Open
rustyrussell wants to merge 8 commits into ElementsProject:master from rustyrussell:guilt/flakes31

@rustyrussell
Contributor

`test_splice_rbf` kept hanging. It turns out one side stopped sending messages before sending a `commitment_signed` the other side was waiting for, resulting in a hang. I clarified the logic around when we can send STFU and when we should defer new actions; it is now both simpler to understand and doesn't hang.

This started as an effort to track down a test flake, so it also includes an unrelated addition to help track down a trace test flake.

@rustyrussell rustyrussell requested a review from ddustin February 24, 2026 00:48
Collaborator

@ddustin ddustin left a comment

Awesome! Makes total sense to have a new state. Feels like a cleaner approach on top of fixing the bug.

ACK 26ea5ef

@rustyrussell rustyrussell force-pushed the guilt/flakes31 branch 2 times, most recently from 7c72d0c to 309d453 Compare February 24, 2026 01:24
@rustyrussell rustyrussell enabled auto-merge (rebase) February 24, 2026 01:24
I got a flake, but all I see is:
```

>           assert suspended == set()
E           AssertionError: assert {'c26485f2839c5a27'} == set()
E             
E             Extra items in the left set:
E             'c26485f2839c5a27'
E             
E             Full diff:
E             - set()
E             + {
E             +     'c26485f2839c5a27',
E             + }

tests/test_misc.py:5049: AssertionError
```

Add some more diagnostics. It is already clear, though, that something
is suspended and never terminates before the test finishes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
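The failing assertion only reports a bare span ID. A diagnostic that also reports what each leftover span did makes this kind of flake easier to read. Below is a minimal sketch, assuming trace events arrive as hypothetical `(kind, span_id)` tuples with kinds `"start"`, `"suspend"`, `"resume"` and `"end"`; the event format and the helper name `unterminated_spans` are illustrative, not what the test actually uses.

```python
from collections import defaultdict


def unterminated_spans(events):
    """Return {span_id: [event kinds...]} for spans still suspended at the end.

    `events` is an iterable of (kind, span_id) tuples.  This format is a
    hypothetical stand-in for whatever the trace test really records.
    """
    history = defaultdict(list)  # every event seen for each span, in order
    suspended = set()
    for kind, span_id in events:
        history[span_id].append(kind)
        if kind == "suspend":
            suspended.add(span_id)
        elif kind in ("resume", "end"):
            # A resumed or finished span is no longer suspended.
            suspended.discard(span_id)
    # Include each leftover span's history: more useful than the bare ID.
    return {sid: history[sid] for sid in sorted(suspended)}


events = [
    ("start", "a"), ("suspend", "a"), ("resume", "a"), ("end", "a"),
    ("start", "c26485f2839c5a27"), ("suspend", "c26485f2839c5a27"),
]
print(unterminated_spans(events))
# {'c26485f2839c5a27': ['start', 'suspend']}
```

In a test, `assert not leftovers, f"still suspended: {leftovers}"` would then show the span's history in the failure message instead of only its ID.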
This is useful for splicing: given an HTLC state, do we need to send
more messages to get it into a non-pending state?

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
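To illustrate the question this helper answers, here is a sketch that models an HTLC update's progress through the BOLT #2 commitment dance (update, `commitment_signed`, `revoke_and_ack`, counter-signature, counter-revocation) and asks whether the next message that moves it forward is ours. The state names and the `we_must_send_more` function are hypothetical and simplified. They are not Core Lightning's actual `htlc_state` enum or the helper added in this commit.

```python
from enum import Enum, auto


class HtlcState(Enum):
    """Hypothetical, simplified HTLC update states (illustrative names)."""
    # An update *we* proposed (an add or a removal).
    WE_SENT_UPDATE = auto()         # we still owe commitment_signed
    WE_SENT_COMMIT = auto()         # waiting for their revoke_and_ack
    THEY_REVOKED = auto()           # waiting for their commitment_signed
    THEY_SENT_COMMIT_BACK = auto()  # we owe revoke_and_ack
    OURS_DONE = auto()              # committed on both sides: not pending
    # An update *they* proposed.
    THEY_SENT_UPDATE = auto()       # waiting for their commitment_signed
    THEY_SENT_COMMIT = auto()       # we owe revoke_and_ack
    WE_REVOKED = auto()             # we owe commitment_signed
    WE_SENT_COMMIT_BACK = auto()    # waiting for their revoke_and_ack
    THEIRS_DONE = auto()            # committed on both sides: not pending


# States where the next message that advances the update must come from us.
WE_OWE_MESSAGE = frozenset({
    HtlcState.WE_SENT_UPDATE,
    HtlcState.THEY_SENT_COMMIT_BACK,
    HtlcState.THEY_SENT_COMMIT,
    HtlcState.WE_REVOKED,
})

DONE = frozenset({HtlcState.OURS_DONE, HtlcState.THEIRS_DONE})


def is_pending(state: HtlcState) -> bool:
    return state not in DONE


def we_must_send_more(state: HtlcState) -> bool:
    """True if we must send another message before this HTLC can leave
    the pending state; if we go quiet here, the peer waits forever."""
    return state in WE_OWE_MESSAGE
```

Splitting "pending" from "we owe a message" is the distinction that matters for splicing: a pending HTLC that is waiting on the peer is harmless, but one waiting on us will stall if we stop sending.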
And remove the `uncommitted_ok` flag, which was always false.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Otherwise, we can hang: we don't send commitment_signed, and they're
waiting to receive it.

1. We defer fee updates, blockheight updates and master requests
   (adding and closing htlcs) if we're *trying* or *started* to quiesce.
2. We only actually stop sending commitment_signed once we have sent
   STFU.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-EXPERIMENTAL: Protocol: avoid an occasional hang when splicing with a pending closing HTLC.
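The two rules above can be sketched as three small predicates. This is a minimal illustration, assuming a hypothetical three-valued quiescence state in which "trying" maps to `TRYING` and "started" maps to `STFU_SENT`. The names are not the PR's, and the STFU precondition is a simplification of the BOLT #2 quiescence rule (no updates we proposed may still be pending).

```python
from enum import Enum, auto


class Quiesce(Enum):
    """Hypothetical quiescence states; names are illustrative only."""
    NONE = auto()       # normal operation
    TRYING = auto()     # we want to quiesce but have not sent STFU yet
    STFU_SENT = auto()  # we have sent STFU ("started")


def may_start_new_update(q: Quiesce) -> bool:
    # Rule 1: defer fee updates, blockheight updates and new HTLC
    # adds/removals once we are trying to quiesce, or have started.
    return q is Quiesce.NONE


def may_send_commitment_signed(q: Quiesce) -> bool:
    # Rule 2: keep signing outstanding changes until STFU has actually
    # been sent; stopping earlier can leave the peer waiting forever.
    return q is not Quiesce.STFU_SENT


def may_send_stfu(q: Quiesce, our_updates_pending: bool) -> bool:
    # Simplified precondition (assumption): STFU goes out only once none
    # of the updates we proposed are still pending.
    return q is Quiesce.TRYING and not our_updates_pending


# The hang scenario: a closing HTLC is mid-dance when we decide to splice.
q = Quiesce.TRYING
assert not may_start_new_update(q)    # new work is deferred...
assert may_send_commitment_signed(q)  # ...but we still finish the dance
assert not may_send_stfu(q, True)     # STFU waits until we are caught up
```

Gating new updates on `TRYING` while gating `commitment_signed` only on `STFU_SENT` is what lets the in-flight dance drain, so the precondition for STFU eventually becomes true instead of both sides waiting on each other.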
It sometimes times out after 2 hours; this now makes it finish in 53
minutes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…f random delays.

Sometimes this times out after 30 minutes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell rustyrussell force-pushed the guilt/flakes31 branch 4 times, most recently from 1b28117 to a773f65 Compare February 25, 2026 03:45
Usually downloading and installing takes 90 seconds, but sometimes it
takes an hour! Use caching for this to keep it consistent.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
.github/scripts/setup.sh does this already, *and* it uses the cache now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell rustyrussell changed the title Fix splicing hang Fix splicing hang, speed CI Feb 25, 2026