Enable CoW by syntactically · Pull Request #1229 · hyperlight-dev/hyperlight

syntactically · 2026-02-11T18:28:02Z

There are still a number of things that need to be done before the performance improvements from CoW will actually be apparent (e.g. making it possible to mmap the snapshot mapping readonly directly from the page cache, rather than copy it into a SharedMemory).

Previously, these were inconsistent. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

These values could conflict with those in the SandboxConfiguration also passed in. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

These are, for now, still set up at certain distinguished addresses (at the beginning of the region) that the host and the guest can both compute. In the future, hopefully all of this will be replaced with a more flexible set up based on the ring-buffer I/O design doc. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

In 1f94fb3, some odd behaviour with MSHV and MAP_PRIVATE mappings was noticed. It turns out that this was actually due to the hypervisor pinning the old page structures, which meant that `madvise(MADV_DONTNEED)` resulted in the userspace view of the memory in question becoming completely divorced from the hypervisor view. Switching (back) to MAP_SHARED "fixed" this only because it meant that zeroing was actually ineffective for both the host and the guest (so at least host writes were reflected in the guest): `madvise(2)` notes that shared anonymous mappings will have their contents repopulated on access after an `MADV_DONTNEED`. This commit switches back to `MAP_PRIVATE` (so that the optimisation is correct on KVM, where it works) and disables the optimisation on MSHV, where the scratch region will always be zeroed by writing zeroes to it. The original intent of lazily zeroing/populating the memory will likely only be possible on MSHV with kernel/hypervisor changes for support. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

In preparation for the snapshot becoming immutable, this gets rid of on of the last places that assume identity mapping & require early memory writes in the sandbox. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

There are still a few changes needed to be able to ensure that the host doesn't try to write to the snapshot region before it can be mmap'd readonly into the guest. However, the guest no longer tries to write to the snapshot region. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

This is necessary for now on amd64, where processor page table walks of the guest page tables are treated as writes w.r.t. to the second-level page tables, meaning that no guest page table can ever be in the (mapped readonly at stage 2) snapshot region. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

This is still somewhat broken (see #721), since it relies on the flush_tlb function ending up in the big area around the code that is still identity mapped. So, this code should probably move somewhere else. However, this fixes a very significant bug in the prior code, where the flush_tlb() function called on guest function dispatch could return to the wrong address, due to the correct return address having been pushed onto the wrong page due to a stale TLB entry. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

Certain page-table updates that do not require TLB invalidation were previously missing any kind of fence at all to keep table walks coherent. This commit introduces a new barrier for this scenario. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

ludfjig

LGTM. No feedback other than some nits, feel free to ignore

ludfjig · 2026-02-12T20:06:24Z

src/hyperlight_host/src/error.rs


+    /// The memory request is too small to contain everything that is
+    /// required
+    #[error("Memory requested {0} exceeds maximum size allowed {1}")]


nit: this attribute has an incorrect message

ludfjig · 2026-02-12T20:07:50Z

src/hyperlight_host/src/mem/mgr.rs

+        self.shared_mem.with_exclusivity(|snap| {
+            self.scratch_mem.with_exclusivity(|scratch| {
+                let bytes = &snap.as_slice()[snapshot_pt_start..snapshot_pt_end];
+                scratch.copy_from_slice(bytes, self.layout.get_pt_base_scratch_offset())
+            })
+        })???;


nit: This looks a bit funky but it's probably correct

Curious what you meant here? Happy to take suggestions for a more clear way to do this. The same nested with_exclusivity pattern is used in snapshot.rs and I don't presently see a cleaner way to do it.

ludfjig · 2026-02-12T20:22:11Z

src/hyperlight_host/src/hypervisor/hyperlight_vm.rs

+    pub(crate) fn get_entrypoint(&mut self) -> NextAction {
+        self.entrypoint
+    }
+
+    /// Set the current entrypoint action
+    pub(crate) fn set_entrypoint(&mut self, entrypoint: NextAction) {
+        self.entrypoint = entrypoint
+    }
+


nit: this could take &self

andreiltd · 2026-02-13T10:05:42Z

src/hyperlight_guest_bin/src/paging.rs

        }
    }
-    fn phys_to_virt(&self, addr: u64) -> *mut u8 {
+    fn fallible_phys_to_virt(&self, addr: u64) -> Option<*mut u8> {


Suggested change

fn fallible_phys_to_virt(&self, addr: u64) -> Option<*mut u8> {

fn try_phys_to_virt(&self, addr: u64) -> Option<*mut u8> {

andreiltd

Nice! 🚀

simongdavies · 2026-02-13T17:24:43Z

src/hyperlight_host/src/mem/layout.rs

    #[instrument(skip_all, parent = Span::current(), level= "Trace")]
-    pub(crate) fn set_pt_size(&mut self, size: usize) {
-        self.pt_size = Some(size);
+    pub(crate) fn get_pt_size(&mut self) -> usize {


think this can just be get_pt_size(&self)

dblnz

LGTM! Great work!

dblnz · 2026-02-17T08:36:35Z

src/hyperlight_guest_bin/src/guest_function/call.rs

-    }
-
-    internal_dispatch_function();
-    halt();


I am not sure if this is true, but having eliminated the use of halt() here, can we remove the function also?

I'm referring to https://github.com/hyperlight-dev/hyperlight/blob/main/src/hyperlight_guest/src/exit.rs#L29.
And, if that is the case, I think this fixes #1074 , right?

dblnz · 2026-02-17T08:48:06Z

src/hyperlight_host/src/hypervisor/hyperlight_vm.rs

                Ok(VmExit::Debug { dr6, exception }) => {
+                    let initialise = match self.entrypoint {
+                        NextAction::Initialise(initialise) => initialise,
+                        _ => 0,


Nice, this gave me an idea: I think this can be used to add an option stop_on_entry for the sandbox under debug and stop on each call of a function (dispatch_function). So that it would solve the issue we have now that we only allow debugging of a newly created sandbox (we rely on the call to initialise to stop).

jsturtevant · 2026-02-17T22:47:26Z

src/hyperlight_guest_bin/src/guest_function/call.rs

-        let guest_start_tsc = hyperlight_guest_tracing::invariant_tsc::read_tsc();
-        // Reset the trace state for the new guest function call with the new start TSC
-        // This clears any existing spans/events from previous calls ensuring a clean state
-        hyperlight_guest_tracing::new_call(guest_start_tsc);


It looks like this was removed from the new implementation. Does this mean tracing might not work as expected going forward?

Yeah, you might be right there, good catch! @dblnz do you know if we can just move this into internal_dispatch_function, which is the first thing called with a normal ABI after the little entry stub?

jsturtevant · 2026-02-17T22:52:12Z

src/hyperlight_guest_bin/src/arch/amd64/exception/handle.rs

+        let target_virt = virt as *mut u8;
+        core::ptr::copy(
+            target_virt,
+            crate::paging::phys_to_virt(new_page).unwrap(),


This could cause a triple fault which might make it really hard to diagnose? Should we maybe abort with a clear error message here?

I'm not sure I see how this would cause a triple fault? If you are referring to the unwrap, then I think it will just abort? Or is the concern about something a bit second order here like a format call that allocates in the unwrap code path or something? If the latter & that is there then happy to change it.

Apart from that, I don't think we should need to worry about reading from target_virt (since by this point we have already confirmed that the original mapping of this page was a CoW mapping, so just reading from it should be fine) or writing to the new page in the physmap (since all the physmap entries are RW).

I was referring to the unwrap. Previously, when working on the handler I saw a panic combined with an alloc and it was hard to diagnose the initial error since it triggered the exception recursively till it stack overflowed. I think it would be easy enough to do a .or_else() and abort with a specific error message here?

OK, I didn't think there would necessarily be an alloc here, but perhaps there is. I don't think it's a huge deal, because this case should be impossible, but happy to make the change until we get around to proving that it is!

There may not be at this point. I think being defensive here might save us some heartache down the road if something acciendelty gets added later

jsturtevant · 2026-02-17T22:55:43Z

src/hyperlight_guest_bin/src/paging.rs

+            core::arch::asm!("
+                mov rax, cr0
+                mov cr0, rax
+            ", out("rax") _, out("rcx") _);


Suggested change

", out("rax") _, out("rcx") _);

", out("rax") _);

rcx isn't used here

Good catch, thanks, that is a holdover from a different implementation, will fix.

syntactically requested review from danbugs, dblnz, devigned, jprendes, ludfjig, marosset and simongdavies as code owners February 11, 2026 18:28

syntactically added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Feb 11, 2026

syntactically force-pushed the lm/enable-cow branch 2 times, most recently from 83cf717 to 3e8c518 Compare February 11, 2026 20:08

syntactically added 5 commits February 12, 2026 15:56

Update guest Cargo.lock files

30cbe4d

Previously, these were inconsistent. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

SandboxMemoryLayout: do not pass heap and scratch size explicitly

41dd438

These values could conflict with those in the SandboxConfiguration also passed in. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

syntactically force-pushed the lm/enable-cow branch from 9c8d8e3 to 306fac3 Compare February 12, 2026 15:58

syntactically added 5 commits February 12, 2026 17:16

amd64: Remove unused access/dirty flag updates

8d0da51

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>

syntactically force-pushed the lm/enable-cow branch from 306fac3 to c2b03ea Compare February 12, 2026 17:17

ludfjig approved these changes Feb 12, 2026

View reviewed changes

andreiltd reviewed Feb 13, 2026

View reviewed changes

andreiltd approved these changes Feb 13, 2026

View reviewed changes

simongdavies reviewed Feb 13, 2026

View reviewed changes

dblnz approved these changes Feb 17, 2026

View reviewed changes

jsturtevant reviewed Feb 17, 2026

View reviewed changes

jsturtevant mentioned this pull request Feb 18, 2026

Guest Copy on Write #1200

Open

16 tasks

	fn fallible_phys_to_virt(&self, addr: u64) -> Option<*mut u8> {
	fn try_phys_to_virt(&self, addr: u64) -> Option<*mut u8> {

Conversation

syntactically commented Feb 11, 2026

Uh oh!

ludfjig left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

syntactically Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ludfjig Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andreiltd left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dblnz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

syntactically Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Comments

syntactically Feb 16, 2026 •

edited

Loading

ludfjig Feb 12, 2026 •

edited

Loading

syntactically Feb 18, 2026 •

edited

Loading