Conversation
83cf717 to
3e8c518
Compare
Previously, these were inconsistent. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
These values could conflict with those in the SandboxConfiguration also passed in. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
These are, for now, still set up at certain distinguished addresses (at the beginning of the region) that the host and the guest can both compute. In the future, hopefully all of this will be replaced with a more flexible set up based on the ring-buffer I/O design doc. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
In 1f94fb3, some odd behaviour with MSHV and MAP_PRIVATE mappings was noticed. It turns out that this was actually due to the hypervisor pinning the old page structures, which meant that `madvise(MADV_DONTNEED)` resulted in the userspace view of the memory in question becoming completely divorced from the hypervisor view. Switching (back) to MAP_SHARED "fixed" this only because it meant that zeroing was actually ineffective for both the host and the guest (so at least host writes were reflected in the guest): `madvise(2)` notes that shared anonymous mappings will have their contents repopulated on access after an `MADV_DONTNEED`. This commit switches back to `MAP_PRIVATE` (so that the optimisation is correct on KVM, where it works) and disables the optimisation on MSHV, where the scratch region will always be zeroed by writing zeroes to it. The original intent of lazily zeroing/populating the memory will likely only be possible on MSHV with kernel/hypervisor changes for support. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
In preparation for the snapshot becoming immutable, this gets rid of on of the last places that assume identity mapping & require early memory writes in the sandbox. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
9c8d8e3 to
306fac3
Compare
There are still a few changes needed to be able to ensure that the host doesn't try to write to the snapshot region before it can be mmap'd readonly into the guest. However, the guest no longer tries to write to the snapshot region. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This is necessary for now on amd64, where processor page table walks of the guest page tables are treated as writes w.r.t. to the second-level page tables, meaning that no guest page table can ever be in the (mapped readonly at stage 2) snapshot region. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This is still somewhat broken (see #721), since it relies on the flush_tlb function ending up in the big area around the code that is still identity mapped. So, this code should probably move somewhere else. However, this fixes a very significant bug in the prior code, where the flush_tlb() function called on guest function dispatch could return to the wrong address, due to the correct return address having been pushed onto the wrong page due to a stale TLB entry. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Certain page-table updates that do not require TLB invalidation were previously missing any kind of fence at all to keep table walks coherent. This commit introduces a new barrier for this scenario. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
306fac3 to
c2b03ea
Compare
ludfjig
left a comment
There was a problem hiding this comment.
LGTM. No feedback other than some nits, feel free to ignore
|
|
||
| /// The memory request is too small to contain everything that is | ||
| /// required | ||
| #[error("Memory requested {0} exceeds maximum size allowed {1}")] |
There was a problem hiding this comment.
nit: this attribute has an incorrect message
| self.shared_mem.with_exclusivity(|snap| { | ||
| self.scratch_mem.with_exclusivity(|scratch| { | ||
| let bytes = &snap.as_slice()[snapshot_pt_start..snapshot_pt_end]; | ||
| scratch.copy_from_slice(bytes, self.layout.get_pt_base_scratch_offset()) | ||
| }) | ||
| })???; |
There was a problem hiding this comment.
nit: This looks a bit funky but it's probably correct
There was a problem hiding this comment.
Curious what you meant here? Happy to take suggestions for a more clear way to do this. The same nested with_exclusivity pattern is used in snapshot.rs and I don't presently see a cleaner way to do it.
| pub(crate) fn get_entrypoint(&mut self) -> NextAction { | ||
| self.entrypoint | ||
| } | ||
|
|
||
| /// Set the current entrypoint action | ||
| pub(crate) fn set_entrypoint(&mut self, entrypoint: NextAction) { | ||
| self.entrypoint = entrypoint | ||
| } | ||
|
|
There was a problem hiding this comment.
nit: this could take &self
| } | ||
| } | ||
| fn phys_to_virt(&self, addr: u64) -> *mut u8 { | ||
| fn fallible_phys_to_virt(&self, addr: u64) -> Option<*mut u8> { |
There was a problem hiding this comment.
| fn fallible_phys_to_virt(&self, addr: u64) -> Option<*mut u8> { | |
| fn try_phys_to_virt(&self, addr: u64) -> Option<*mut u8> { |
| #[instrument(skip_all, parent = Span::current(), level= "Trace")] | ||
| pub(crate) fn set_pt_size(&mut self, size: usize) { | ||
| self.pt_size = Some(size); | ||
| pub(crate) fn get_pt_size(&mut self) -> usize { |
There was a problem hiding this comment.
think this can just be get_pt_size(&self)
| } | ||
|
|
||
| internal_dispatch_function(); | ||
| halt(); |
There was a problem hiding this comment.
I am not sure if this is true, but having eliminated the use of halt() here, can we remove the function also?
I'm referring to https://github.com/hyperlight-dev/hyperlight/blob/main/src/hyperlight_guest/src/exit.rs#L29.
And, if that is the case, I think this fixes #1074 , right?
| Ok(VmExit::Debug { dr6, exception }) => { | ||
| let initialise = match self.entrypoint { | ||
| NextAction::Initialise(initialise) => initialise, | ||
| _ => 0, |
There was a problem hiding this comment.
Nice, this gave me an idea: I think this can be used to add an option stop_on_entry for the sandbox under debug and stop on each call of a function (dispatch_function). So that it would solve the issue we have now that we only allow debugging of a newly created sandbox (we rely on the call to initialise to stop).
| let guest_start_tsc = hyperlight_guest_tracing::invariant_tsc::read_tsc(); | ||
| // Reset the trace state for the new guest function call with the new start TSC | ||
| // This clears any existing spans/events from previous calls ensuring a clean state | ||
| hyperlight_guest_tracing::new_call(guest_start_tsc); |
There was a problem hiding this comment.
It looks like this was removed from the new implementation. Does this mean tracing might not work as expected going forward?
There was a problem hiding this comment.
Yeah, you might be right there, good catch! @dblnz do you know if we can just move this into internal_dispatch_function, which is the first thing called with a normal ABI after the little entry stub?
| let target_virt = virt as *mut u8; | ||
| core::ptr::copy( | ||
| target_virt, | ||
| crate::paging::phys_to_virt(new_page).unwrap(), |
There was a problem hiding this comment.
This could cause a triple fault which might make it really hard to diagnose? Should we maybe abort with a clear error message here?
There was a problem hiding this comment.
I'm not sure I see how this would cause a triple fault? If you are referring to the unwrap, then I think it will just abort? Or is the concern about something a bit second order here like a format call that allocates in the unwrap code path or something? If the latter & that is there then happy to change it.
Apart from that, I don't think we should need to worry about reading from target_virt (since by this point we have already confirmed that the original mapping of this page was a CoW mapping, so just reading from it should be fine) or writing to the new page in the physmap (since all the physmap entries are RW).
There was a problem hiding this comment.
I was referring to the unwrap. Previously, when working on the handler I saw a panic combined with an alloc and it was hard to diagnose the initial error since it triggered the exception recursively till it stack overflowed. I think it would be easy enough to do a .or_else() and abort with a specific error message here?
There was a problem hiding this comment.
OK, I didn't think there would necessarily be an alloc here, but perhaps there is. I don't think it's a huge deal, because this case should be impossible, but happy to make the change until we get around to proving that it is!
There was a problem hiding this comment.
There may not be at this point. I think being defensive here might save us some heartache down the road if something acciendelty gets added later
| core::arch::asm!(" | ||
| mov rax, cr0 | ||
| mov cr0, rax | ||
| ", out("rax") _, out("rcx") _); |
There was a problem hiding this comment.
| ", out("rax") _, out("rcx") _); | |
| ", out("rax") _); |
rcx isn't used here
There was a problem hiding this comment.
Good catch, thanks, that is a holdover from a different implementation, will fix.
There are still a number of things that need to be done before the performance improvements from CoW will actually be apparent (e.g. making it possible to mmap the snapshot mapping readonly directly from the page cache, rather than copy it into a SharedMemory).