| author | Natasha Moongrave <natasha@256phi.eu> | 2026-04-08 16:40:09 +0200 |
| committer | Natasha Moongrave <natasha@256phi.eu> | 2026-04-08 16:40:09 +0200 |
| commit | af1089a4262414b64714b87180f2223c8a40918f (patch) |
| tree | b208cdb34d0c4b43c5acff98127a824b4289e26a |
| parent | eb61ec76367731579eb585f39b251da629beb871 (diff) |
[Phase 2.1] GDT user space segments + heap growth
- Restructure GDT to add kernel_data/user_data/user_code in the order
required for SYSCALL/SYSRET ABI:
0x08 kernel code, 0x10 kernel data, 0x18 user data, 0x20 user code, 0x28 TSS
STAR MSR values: STAR[47:32]=0x08, STAR[63:48]=0x10
- Add TSS.privilege_stack_table[0] (RSP0) with 8 KiB static initial stack
for Ring3→Ring0 hardware interrupt transitions
- Expose GDT static and all Selectors fields as pub (needed by syscall module)
- Add set_kernel_stack(VirtAddr) for scheduler to update RSP0 per-process
- Grow HEAP_SIZE 100 KiB → 4 MiB to support process table + kernel stacks
- Fix pre-existing lifetime elision lint in allocator.rs
- Update flake.nix: add cpio, busybox, gdb, binutils, e2fsprogs
- Update NOTES.md with decisions and next steps
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
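The STAR values above fix which selectors SYSCALL and SYSRET load. As a quick sanity check of that arithmetic, here is a hosted Rust sketch (not code from this patch — `star_derived_selectors` is a hypothetical helper) that derives the resulting CS/SS values from the commit's STAR layout:

```rust
// Hypothetical helper: derive the selectors the CPU computes from a STAR value.
// Per the commit: STAR[47:32] = kernel CS base, STAR[63:48] = user base.
fn star_derived_selectors(star: u64) -> (u16, u16, u16, u16) {
    let kernel_base = ((star >> 32) & 0xffff) as u16;
    let user_base = ((star >> 48) & 0xffff) as u16;
    // SYSCALL loads CS = kernel base, SS = kernel base + 8 (both RPL 0).
    let syscall_cs = kernel_base;
    let syscall_ss = kernel_base + 8;
    // 64-bit SYSRET loads CS = user base + 16, SS = user base + 8, both with RPL = 3.
    let sysret_cs = (user_base + 16) | 3;
    let sysret_ss = (user_base + 8) | 3;
    (syscall_cs, syscall_ss, sysret_cs, sysret_ss)
}

fn main() {
    // STAR encoding from this commit: [47:32] = 0x08, [63:48] = 0x10.
    let star = (0x10u64 << 48) | (0x08u64 << 32);
    let (cs0, ss0, cs3, ss3) = star_derived_selectors(star);
    assert_eq!((cs0, ss0), (0x08, 0x10)); // kernel code / kernel data
    assert_eq!((cs3, ss3), (0x23, 0x1b)); // user code 0x20|3, user data 0x18|3
    println!("SYSRET CS={cs3:#x} SS={ss3:#x}");
}
```

This is why the GDT order kernel_code / kernel_data / user_data / user_code is mandatory: the CPU only gets two base offsets and derives everything else by fixed +8/+16 arithmetic.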
| -rw-r--r-- | CLAUDE.md | 15 |
| -rw-r--r-- | NOTES.md | 22 |
| -rw-r--r-- | StrixKernel/src/allocator.rs | 4 |
| -rw-r--r-- | StrixKernel/src/gdt.rs | 283 |
| -rw-r--r-- | flake.nix | 35 |
5 files changed, 195 insertions, 164 deletions
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-**Strix OS** is a bare-metal x86-64 operating system kernel written in Rust, aiming for Linux application compatibility. Currently in Phase 0-1 (foundational infrastructure).
+**Strix OS** is a bare-metal x86-64 operating system kernel written in Rust, aiming for Linux application compatibility. Currently in Phase 1 (memory management complete, heading into user space).
 
 ## Build Commands
 
@@ -45,13 +45,18 @@ cargo clean
 | `gdt.rs` | Global Descriptor Table + Task State Segment (double-fault stack) |
 | `interrupts.rs` | IDT setup, exception handlers, hardware interrupts (timer/keyboard via PIC 8259) |
 | `memory.rs` | Page table access (`OffsetPageTable`), `BootInfoFrameAllocator` for physical frames |
+| `allocator.rs` | Heap allocator dispatcher; `HEAP_START`/`HEAP_SIZE` constants; `init_heap()` |
+| `allocator/bump.rs` | Simple bump allocator (fast alloc, dealloc only when all freed) |
+| `allocator/linked_list.rs` | First-fit linked-list allocator (arbitrary alloc/dealloc) |
+| `allocator/fixed_size_block.rs` | **Active**: hybrid fixed-size block allocator with 9 size classes (8–2048 bytes) + linked-list fallback |
 | `vga_buffer.rs` | VGA text mode (80x25), `print!`/`println!` macros |
 | `serial.rs` | UART 16550 serial output, `serial_print!`/`serial_println!` for debugging |
 
 ### Memory Model
-- Uses bootloader's `map_physical_memory` feature: all physical memory mapped at a fixed offset
+- Uses bootloader's `map_physical_memory` feature: all physical memory mapped at a fixed offset (`0x0000256000000000`)
 - `OffsetPageTable` translates virtual ↔ physical addresses
 - `BootInfoFrameAllocator` provides frames from bootloader's memory map
+- Heap region: virtual address `0x4444_4444_0000`, size 100 KiB
 
 ### Test Framework
 - Custom `#[test_case]` attribute with `Testable` trait
@@ -74,7 +79,7 @@ cargo clean
 
 ## Roadmap
 
-See `roadmap.md` for the 6-phase development plan. Next priorities:
-1. Heap allocator (GlobalAlloc implementation)
-2. User space (Ring 3 transition, syscall interface)
+See `roadmap.md` for the 6-phase development plan. Phase 1 (memory management) is complete. Next priorities:
+1. User space (Ring 3 transition, syscall interface via SYSCALL/SYSRET MSRs)
+2. ELF loading (parser + execve)
 3. Process management (fork, exec, scheduler)
@@ -17,9 +17,9 @@
 
 ## Current Status
 
 **Branch**: `CLAUDE_TEST`
-**Phase**: Starting Phase 2 — User Space Foundation
-**Last commit**: `[Step 0] Add PLAN.md and NOTES.md`
-**Next task**: `[Phase 2.1]` — Extend GDT with user space segments
+**Phase**: Phase 2 — User Space Foundation
+**Last commit**: `[Phase 2.1] GDT user space segments + heap growth`
+**Next task**: `[Phase 2.2]` — SYSCALL/SYSRET MSR setup
 
 ---
@@ -75,4 +75,20 @@ User address limit: 0x0000_8000_0000_0000 (canonical boundary)
 **Next**: Phase 2.1 — Extend GDT.
 **Decisions**: None new.
+### [Phase 2.1] 2026-04-08 — GDT user space segments + heap growth
+**Done**:
+- Restructured `StrixKernel/src/gdt.rs`: added `kernel_data`, `user_data`, `user_code` segments in the correct order for SYSCALL/SYSRET ABI
+- Added `TSS.privilege_stack_table[0]` (RSP0) with an 8 KiB static initial stack
+- Exposed `GDT` static and `Selectors` fields as `pub` for use by syscall setup
+- Added `set_kernel_stack(VirtAddr)` for the scheduler to update RSP0 per-process
+- Grew `HEAP_SIZE` from 100 KiB → 4 MiB in `allocator.rs` (needed for process table)
+- Fixed pre-existing lifetime lint in `allocator.rs`
+- Updated `flake.nix` to add `cpio`, `busybox`, `gdb`, `binutils`, `e2fsprogs`
+- `basic_boot` integration test passes in QEMU via `nix develop`
+**Next**: Phase 2.2 — SYSCALL/SYSRET MSR setup (`src/syscall/mod.rs`)
+**Decisions**:
+- GDT order: kernel_code(0x08) / kernel_data(0x10) / user_data(0x18) / user_code(0x20) / TSS(0x28)
+- STAR MSR: `[47:32]=0x08`, `[63:48]=0x10` → SYSRET CS=0x20, SS=0x18
+- `set_kernel_stack` uses raw pointer write inside `unsafe {}` block; safe when interrupts disabled
+
 
 ---
diff --git a/StrixKernel/src/allocator.rs b/StrixKernel/src/allocator.rs
index 4bae6bd..f4c1b69 100644
--- a/StrixKernel/src/allocator.rs
+++ b/StrixKernel/src/allocator.rs
@@ -13,7 +13,7 @@ pub mod fixed_size_block;
 pub mod linked_list;
 
 pub const HEAP_START: usize = 0x_4444_4444_0000;
-pub const HEAP_SIZE: usize = 100 * 1024; // 100 KiB
+pub const HEAP_SIZE: usize = 4 * 1024 * 1024; // 4 MiB — needed for process table + kernel stacks
 
 #[global_allocator]
 static ALLOCATOR: Locked<FixedSizeBlockAllocator> = Locked::new(FixedSizeBlockAllocator::new());
@@ -69,7 +69,7 @@ impl<A> Locked<A> {
         }
     }
 
-    pub fn lock(&self) -> spin::MutexGuard<A> {
+    pub fn lock(&self) -> spin::MutexGuard<'_, A> {
         self.inner.lock()
     }
 }
diff --git a/StrixKernel/src/gdt.rs b/StrixKernel/src/gdt.rs
index 864dd57..63add29 100644
--- a/StrixKernel/src/gdt.rs
+++ b/StrixKernel/src/gdt.rs
@@ -1,108 +1,91 @@
 //! # Global Descriptor Table (GDT) Module
 //!
 //! This module sets up the Global Descriptor Table and Task State Segment for the
-//! Strix OS kernel, providing the foundation for memory segmentation and interrupt
-//! handling on x86-64.
+//! Strix OS kernel, providing the foundation for memory segmentation, interrupt
+//! handling, and Ring 3 (user space) transitions on x86-64.
 //!
-//! ## x86-64 Background
+//! ## Segment Layout
 //!
-//! ### Global Descriptor Table (GDT)
+//! The GDT entries are ordered to satisfy the `SYSCALL`/`SYSRET` ABI constraints.
+//! The `STAR` MSR encodes two base offsets from which the CPU derives selectors:
 //!
-//! The GDT is a legacy x86 structure that was originally used for memory segmentation.
-//! In 64-bit long mode, segmentation is mostly disabled, but the GDT is still required for:
+//! - `STAR[47:32]` = kernel CS selector → kernel SS = kernel CS + 8
+//! - `STAR[63:48]` = user base → user SS = base + 8, user CS = base + 16 (with RPL=3)
 //!
-//! - **Privilege Level Switching**: The code segment selector determines the Current
-//!   Privilege Level (CPL). Ring 0 (kernel) vs Ring 3 (user) transitions require
-//!   different code segments.
-//! - **Task State Segment (TSS)**: The GDT must contain a TSS descriptor for the
-//!   processor to locate the TSS.
+//! ```text
+//! GDT index │ Offset │ Descriptor    │ DPL
+//! ──────────┼────────┼───────────────┼────
+//! 0         │ 0x00   │ null          │ —
+//! 1         │ 0x08   │ kernel code   │ 0   ← STAR[47:32]
+//! 2         │ 0x10   │ kernel data   │ 0   ← SYSCALL auto SS = 0x10
+//! 3         │ 0x18   │ user data     │ 3   ← SYSRET SS = STAR[63:48]+8
+//! 4         │ 0x20   │ user code     │ 3   ← SYSRET CS = STAR[63:48]+16
+//! 5–6       │ 0x28   │ TSS (128-bit) │ —
+//! ```
 //!
-//! ### Task State Segment (TSS)
+//! STAR MSR values: `STAR[47:32] = 0x08`, `STAR[63:48] = 0x10`
 //!
-//! The TSS in 64-bit mode serves two main purposes:
+//! ## Task State Segment
 //!
-//! 1. **Interrupt Stack Table (IST)**: Up to 7 separate stack pointers that can be
-//!    used by specific interrupts. This is critical for handling double faults,
-//!    which cannot use the normal kernel stack (it might be corrupted or overflowed).
+//! The TSS serves two purposes:
 //!
-//! 2. **I/O Permission Bitmap**: Controls which I/O ports userspace can access
-//!    (not currently used in Strix OS).
+//! 1. **IST (Interrupt Stack Table)**: Provides safe stacks for specific exceptions.
+//!    IST entry 0 is used for double faults to survive stack-overflow scenarios.
 //!
-//! ## Module Structure
-//!
-//! - [`DOUBLE_FAULT_IST_INDEX`]: The IST index (0-6) used for the double fault handler
-//! - [`TSS`]: Static Task State Segment with the double fault stack configured
-//! - [`GDT`]: Static Global Descriptor Table with kernel code segment and TSS descriptor
-//! - [`init()`]: Loads the GDT and TSS into the CPU
-//!
-//! ## Safety Considerations
-//!
-//! The GDT and TSS must remain valid for the lifetime of the kernel. They are
-//! stored in `lazy_static` statics to ensure they have `'static` lifetime.
+//! 2. **RSP0 (`privilege_stack_table[0]`)**: The kernel stack the CPU switches to
+//!    when taking a hardware interrupt while in Ring 3. The scheduler must call
+//!    [`set_kernel_stack`] before entering each user process to point RSP0 at
+//!    that process's kernel stack.
 
 use lazy_static::lazy_static;
 use x86_64::structures::gdt::{Descriptor, GlobalDescriptorTable, SegmentSelector};
 use x86_64::structures::tss::TaskStateSegment;
 use x86_64::VirtAddr;
 
-/// The Interrupt Stack Table index used for the double fault handler.
-///
-/// The IST is an array of 7 stack pointers (indices 0-6) in the TSS. When an
-/// interrupt or exception specifies an IST index in its IDT entry, the CPU
-/// automatically switches to that stack before invoking the handler.
+/// IST index used for the double fault handler.
 ///
-/// We use index 0 for double faults. This ensures that even if the kernel
-/// stack overflows (causing a page fault that turns into a double fault),
-/// we have a valid stack to handle the exception.
+/// When the double fault IDT entry specifies this IST index, the CPU automatically
+/// switches to the stack stored in `TSS.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX]`
+/// before invoking the handler. This is critical for surviving stack overflows.
 pub const DOUBLE_FAULT_IST_INDEX: u16 = 0;
 
+/// Size of the double fault IST stack (20 KiB = 5 pages).
+const DOUBLE_FAULT_STACK_SIZE: usize = 4096 * 5;
+
+/// Size of the privilege stack / RSP0 (8 KiB = 2 pages).
+///
+/// This stack is used by the CPU for Ring 3 → Ring 0 transitions triggered by
+/// hardware interrupts. The scheduler updates RSP0 via [`set_kernel_stack`] to
+/// point at the current process's kernel stack before returning to user mode.
+const PRIV_STACK_SIZE: usize = 4096 * 2;
+
 lazy_static! {
     /// The Task State Segment for the kernel.
     ///
-    /// The TSS contains the Interrupt Stack Table (IST), which provides separate
-    /// stacks for specific interrupt handlers. This is essential for handling
-    /// faults that might occur due to stack issues (like stack overflow).
-    ///
-    /// ## IST Entry 0 (Double Fault Stack)
+    /// Contains:
+    /// - `interrupt_stack_table[0]`: 20 KiB stack for the double fault handler
+    /// - `privilege_stack_table[0]`: initial 8 KiB kernel stack for Ring 3 → Ring 0
     ///
-    /// We allocate a 20 KiB stack (5 × 4096 bytes) for handling double faults.
-    /// The stack grows downward on x86-64, so we store the *end* address
-    /// (highest address) in the IST entry.
-    ///
-    /// ## Why a Separate Stack?
-    ///
-    /// Consider what happens during a stack overflow:
-    /// 1. Code pushes to the stack, exceeding the guard page
-    /// 2. CPU triggers a page fault
-    /// 3. CPU tries to push the interrupt frame... but the stack is full!
-    /// 4. CPU triggers a double fault
-    /// 5. CPU tries to push the double fault frame... still no stack!
-    /// 6. CPU triggers a triple fault → system reset
-    ///
-    /// With an IST entry, step 4 switches to a known-good stack, avoiding the
-    /// triple fault.
+    /// The scheduler must call [`set_kernel_stack`] to update RSP0 before
+    /// launching each user-mode process.
     static ref TSS: TaskStateSegment = {
         let mut tss = TaskStateSegment::new();
 
-        // Configure IST entry 0 for the double fault handler
+        // IST entry 0: double fault stack.
+        // Must be a separate stack so that double faults triggered by a stack
+        // overflow (corrupted RSP) still have a valid stack to run on.
        tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = {
-            // Stack size: 20 KiB (5 pages)
-            const STACK_SIZE: usize = 4096 * 5;
-
-            // Allocate the stack as a static mutable array.
-            // SAFETY: This is only accessed here during TSS initialization,
-            // and the stack pointer is stored in the TSS for CPU use only.
-            static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];
-
-            // Get the stack boundaries
-            // SAFETY: We're taking a raw pointer to the static, which is safe.
-            // The `&raw const` syntax avoids creating a reference to mutable static.
+            static mut STACK: [u8; DOUBLE_FAULT_STACK_SIZE] = [0; DOUBLE_FAULT_STACK_SIZE];
             let stack_start = VirtAddr::from_ptr(&raw const STACK);
+            stack_start + DOUBLE_FAULT_STACK_SIZE as u64
+        };
 
-            // Stacks grow downward on x86-64, so we need the end (top) address
-            let stack_end = stack_start + STACK_SIZE as u64;
-
-            stack_end
+        // RSP0: kernel stack for Ring 3 → Ring 0 transitions via hardware interrupts.
+        // Replaced per-process by the scheduler via set_kernel_stack().
+        tss.privilege_stack_table[0] = {
+            static mut STACK: [u8; PRIV_STACK_SIZE] = [0; PRIV_STACK_SIZE];
+            let stack_start = VirtAddr::from_ptr(&raw const STACK);
+            stack_start + PRIV_STACK_SIZE as u64
         };
 
         tss
@@ -110,100 +93,108 @@ lazy_static! {
 }
 
 lazy_static! {
-    /// The Global Descriptor Table for the kernel.
+    /// The Global Descriptor Table and its associated segment selectors.
     ///
-    /// The GDT contains segment descriptors that define memory segments and their
-    /// access permissions. In 64-bit long mode, most segmentation features are
-    /// disabled, but we still need:
-    ///
-    /// 1. **Kernel Code Segment**: Required for the CPU to execute code in Ring 0.
-    ///    The segment selector is loaded into the CS register.
-    ///
-    /// 2. **TSS Segment**: A special system segment descriptor that points to the
-    ///    Task State Segment. Required for the CPU to find IST stacks.
-    ///
-    /// The GDT is stored alongside a [`Selectors`] struct containing the segment
-    /// selectors returned when adding entries. These selectors are loaded into
-    /// CPU segment registers during [`init()`].
-    static ref GDT: (GlobalDescriptorTable, Selectors) = {
+    /// Segments are added in the order required by the SYSCALL/SYSRET ABI.
+    /// The `pub` visibility allows the syscall module to read the selectors
+    /// needed to configure the `STAR` MSR.
+    pub static ref GDT: (GlobalDescriptorTable, Selectors) = {
         let mut gdt = GlobalDescriptorTable::new();
 
-        // Add kernel code segment (required for Ring 0 execution)
-        let code_selector = gdt.add_entry(Descriptor::kernel_code_segment());
-
-        // Add TSS segment (required for IST stack switching)
-        let tss_selector = gdt.add_entry(Descriptor::tss_segment(&TSS));
-
-        (gdt, Selectors { code_selector, tss_selector })
+        // Entries must be in exactly this order — see module-level doc for why.
+        let kernel_code_selector = gdt.add_entry(Descriptor::kernel_code_segment());
+        let kernel_data_selector = gdt.add_entry(Descriptor::kernel_data_segment());
+        let user_data_selector = gdt.add_entry(Descriptor::user_data_segment());
+        let user_code_selector = gdt.add_entry(Descriptor::user_code_segment());
+        let tss_selector = gdt.add_entry(Descriptor::tss_segment(&TSS));
+
+        (gdt, Selectors {
+            kernel_code_selector,
+            kernel_data_selector,
+            user_code_selector,
+            user_data_selector,
+            tss_selector,
+        })
     };
 }
 
-/// Segment selectors for the GDT entries.
-///
-/// A segment selector is a 16-bit value containing:
-/// - Bits 0-1: Requested Privilege Level (RPL)
-/// - Bit 2: Table Indicator (0 = GDT, 1 = LDT)
-/// - Bits 3-15: Index into the descriptor table
+/// Segment selectors for all GDT entries.
 ///
-/// These selectors are returned by [`GlobalDescriptorTable::add_entry()`] and
-/// must be loaded into the appropriate CPU registers.
-struct Selectors {
-    /// Selector for the kernel code segment.
-    /// Loaded into the CS (Code Segment) register.
-    code_selector: SegmentSelector,
-
-    /// Selector for the Task State Segment.
-    /// Loaded via the `ltr` (Load Task Register) instruction.
-    tss_selector: SegmentSelector,
+/// The selectors encode the GDT index plus the Requested Privilege Level (RPL):
+/// - Kernel selectors: RPL = 0
+/// - User selectors: RPL = 3 (set automatically by the x86_64 crate)
+pub struct Selectors {
+    /// Selector for the kernel code segment (DPL=0, RPL=0).
+    /// Loaded into CS on boot; also stored in `STAR[47:32]` for SYSCALL.
+    pub kernel_code_selector: SegmentSelector,
+
+    /// Selector for the kernel data segment (DPL=0, RPL=0).
+    /// `STAR[63:48]` is set to this value; SYSRET derives user SS from it (+8)
+    /// and user CS (+16). It is also the SS value on SYSCALL entry.
+    pub kernel_data_selector: SegmentSelector,
+
+    /// Selector for the user code segment (DPL=3, RPL=3).
+    /// Used in `iretq` frames when entering Ring 3.
+    pub user_code_selector: SegmentSelector,
+
+    /// Selector for the user data segment (DPL=3, RPL=3).
+    /// Used in `iretq` frames for SS when entering Ring 3.
+    pub user_data_selector: SegmentSelector,
+
+    /// Selector for the TSS descriptor.
+    /// Loaded via the `ltr` instruction during [`init`].
+    pub tss_selector: SegmentSelector,
 }
 
-/// Initializes and loads the Global Descriptor Table and Task State Segment.
-///
-/// This function must be called early in the kernel boot process, before any
-/// interrupts are enabled. It performs the following steps:
-///
-/// 1. **Load GDT**: Uses the `lgdt` instruction to load the GDT register with
-///    the address and size of our GDT.
+/// Updates `TSS.privilege_stack_table[0]` (RSP0) to point at the given kernel stack.
 ///
-/// 2. **Reload CS Register**: The code segment register must be reloaded after
-///    changing the GDT. This is done with a far jump/return that loads the new
-///    selector.
-///
-/// 3. **Load TSS**: Uses the `ltr` (Load Task Register) instruction to tell the
-///    CPU where to find our TSS. This enables IST stack switching.
+/// The CPU reads RSP0 when taking a hardware interrupt while executing in Ring 3.
+/// It must be set to the top of the *current process's* kernel stack before
+/// returning to user mode, so that any interrupt received in Ring 3 uses the
+/// correct per-process kernel stack.
 ///
 /// # Safety
 ///
-/// After this function returns:
-/// - The CPU uses our GDT for all segment lookups
-/// - Interrupts can use IST stacks defined in the TSS
-/// - The GDT and TSS must remain valid for the kernel's lifetime
+/// The caller must ensure:
+/// 1. Interrupts are disabled (use `x86_64::instructions::interrupts::disable()`).
+/// 2. `kernel_stack_top` points to the top of a valid, mapped kernel stack that
+///    will remain valid while the corresponding process runs in Ring 3.
 ///
-/// # Example
+/// Interrupts must be disabled because the CPU may read RSP0 between the write
+/// and the actual Ring 3 entry if an interrupt fires at that instant. Disabling
+/// interrupts makes the update atomic from the CPU's perspective.
+pub unsafe fn set_kernel_stack(kernel_stack_top: VirtAddr) {
+    // SAFETY: The TSS lives for the kernel's lifetime (lazy_static).
+    // We write via a raw pointer with interrupts disabled, so no concurrent
+    // reader can observe a partially-written value. The write is to a plain
+    // u64 field (VirtAddr wraps u64), so it is word-atomic on x86-64.
+    // SAFETY: See function-level SAFETY doc. Caller ensures interrupts are disabled
+    // and the stack pointer is valid.
+    unsafe {
+        let tss_ptr = &*TSS as *const TaskStateSegment as *mut TaskStateSegment;
+        (*tss_ptr).privilege_stack_table[0] = kernel_stack_top;
+    }
+}
+
+/// Initializes and loads the GDT and TSS.
+///
+/// Must be called before any interrupts are enabled.
 ///
-/// ```ignore
-/// // Called during kernel initialization
-/// gdt::init();
-/// interrupts::init_idt(); // IDT can now reference IST entries
-/// ```
+/// Steps:
+/// 1. Load the GDT via `lgdt`
+/// 2. Reload CS with the kernel code selector (required after GDT change)
+/// 3. Load the TSS via `ltr` (enables IST stack switching)
 pub fn init() {
     use x86_64::instructions::segmentation::{Segment, CS};
     use x86_64::instructions::tables::load_tss;
 
-    // Load the GDT into the GDTR register
     GDT.0.load();
 
-    // SAFETY: We're loading valid segment selectors from our GDT.
-    // The code_selector points to a valid kernel code segment.
-    // The tss_selector points to our TSS descriptor.
+    // SAFETY: Both selectors point to valid descriptors in our GDT.
+    // The kernel_code_selector is a valid Ring 0 code descriptor.
+    // The tss_selector is a valid system (TSS) descriptor.
     unsafe {
-        // Reload the code segment register with our kernel code selector.
-        // This is required because changing the GDT doesn't automatically
-        // update the hidden portion of segment registers.
-        CS::set_reg(GDT.1.code_selector);
-
-        // Load the task register with our TSS selector.
-        // This tells the CPU where to find IST stacks for interrupts.
+        CS::set_reg(GDT.1.kernel_code_selector);
         load_tss(GDT.1.tss_selector);
     }
 }
@@ -1,20 +1,39 @@
 {
-  description = "Dev shell for kernel/build environment";
+  description = "Dev shell for Strix OS kernel development";
 
   inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
 
-  outputs = { self, nixpkgs }:
+  outputs = { self, nixpkgs }:
     let
       system = "x86_64-linux";
      pkgs = import nixpkgs { inherit system; };
    in {
      devShells.${system}.default = pkgs.mkShell {
-        packages = with pkgs; [
-          rustup # Rust
-          qemu # VM
-        ];
-
+        packages = with pkgs; [
+          # Rust toolchain (nightly specified by rust-toolchain file)
+          rustup
+
+          # QEMU for running kernel and integration tests
+          qemu
+
+          # cpio: required by build.rs to pack the initramfs archive
+          cpio
+
+          # busybox statically linked: placed in initramfs/ for the embedded rescue shell
+          busybox
+
+          # Debugging and inspection tools
+          gdb
+          binutils # objdump, nm, readelf
+
+          # Disk image tools for testing ext2/ext4 (Phase 4+)
+          e2fsprogs # mkfs.ext2, mkfs.ext4, debugfs
+        ];
+
+        # Ensure cargo uses the nightly toolchain from rust-toolchain
+        shellHook = ''
+          rustup show
+        '';
      };
    };
 }
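The `HEAP_SIZE` growth in this commit (100 KiB → 4 MiB) is motivated by per-process kernel stacks of 8 KiB each. A back-of-envelope check of that capacity, as a hosted Rust sketch (`stacks_that_fit` is a hypothetical helper; the patch does not allocate stacks this way):

```rust
// Hypothetical helper: how many fixed-size kernel stacks fit in a heap of a
// given size, ignoring allocator overhead and the process table itself.
fn stacks_that_fit(heap_size: usize, stack_size: usize) -> usize {
    heap_size / stack_size
}

fn main() {
    const KIB: usize = 1024;
    const OLD_HEAP: usize = 100 * KIB;       // pre-commit HEAP_SIZE
    const NEW_HEAP: usize = 4 * 1024 * KIB;  // post-commit HEAP_SIZE (4 MiB)
    const KSTACK: usize = 8 * KIB;           // per-process kernel stack (RSP0 size)
    assert_eq!(stacks_that_fit(OLD_HEAP, KSTACK), 12);
    assert_eq!(stacks_that_fit(NEW_HEAP, KSTACK), 512);
}
```

So the old heap could hold at most a dozen 8 KiB stacks before exhausting itself, while the new 4 MiB heap leaves room for hundreds plus the process table.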
