lab4:PartB

MIT 6.828 lab walkthrough notes

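Part B: Copy-on-Write Fork

This part implements the following:

  1. The user-level page-fault upcall mechanism: from the kernel into the user-level handler, then back to the interrupted user code
  2. The COW handling flow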

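For copy-on-write fork, JOS takes the "copy the page tables" approach. The alternative is to "share the page tables": the user entries of the child's page directory point at the parent's page tables, instead of pointing at newly allocated pages that serve as the child's own page tables and are filled by copying the parent's.

With copied page tables, COW pages are normally mapped read-only, so writing to one raises a page fault. Here are some general situations (not necessarily JOS ones) in which a page fault can occur:

  • A write to a COW page

  • Cases where exec maps only part of the image:

    • BSS: often only a single all-zero page is mapped at first
    • STACK: usually only one page is mapped at first
    • HEAP: for example mmap and brk on Linux
    • TEXT: if the code is very long, it is reasonable to map only part of the on-disk contents at first

On Linux, each of these mappings is tracked by a vm_area_struct. brk, for example, only grows the region recorded in that structure and does not create any real mapping (no page table entries are added).

Back to JOS: this part of the lab ends with a working fork, and our fork reduces to the first case only. Every page below USTACKTOP is mapped COW in both parent and child, so page faults keep arriving and replace the shared pages, one page at a time, with private copies.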
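The share-then-copy behavior described above can be sketched with a toy model. This is a minimal sketch, not JOS code: `toy_page`, `toy_pte`, `toy_alloc`, `toy_fork_share` and `toy_write` are hypothetical names, the page size is shrunk to 16 bytes, and the "page fault" is just a branch inside `toy_write`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PGSIZE 16  /* hypothetical tiny page size, just for the demo */

struct toy_page {
    unsigned char data[TOY_PGSIZE];
    int refcount;      /* how many mappings point at this page */
};

struct toy_pte {
    struct toy_page *page;
    int writable;      /* plays the role of PTE_W */
    int cow;           /* plays the role of PTE_COW */
};

/* Map a fresh, zeroed, writable page. */
static struct toy_pte toy_alloc(void)
{
    struct toy_pte pte = { calloc(1, sizeof(struct toy_page)), 1, 0 };
    assert(pte.page != NULL);
    pte.page->refcount = 1;
    return pte;
}

/* "fork": share the page and mark both mappings read-only and COW. */
static struct toy_pte toy_fork_share(struct toy_pte *parent)
{
    parent->writable = 0;
    parent->cow = 1;
    parent->page->refcount++;
    return *parent;    /* the child's mapping is a copy of the parent's */
}

/* A write: on a COW mapping, take the "fault" path and copy first. */
static void toy_write(struct toy_pte *pte, size_t off, unsigned char val)
{
    assert(off < TOY_PGSIZE);
    if (!pte->writable) {
        assert(pte->cow);  /* a write to a plain read-only page would be fatal */
        struct toy_page *copy = malloc(sizeof(*copy));
        assert(copy != NULL);
        memcpy(copy->data, pte->page->data, TOY_PGSIZE);
        copy->refcount = 1;
        if (--pte->page->refcount == 0)
            free(pte->page);
        pte->page = copy;  /* this mapping now owns a private page */
        pte->writable = 1;
        pte->cow = 0;
    }
    pte->page->data[off] = val;
}
```

After `toy_fork_share`, parent and child point at the same page; the first `toy_write` through either mapping gives that side a private copy and leaves the other side's data untouched.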

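Exercise 8

We register a user-level page-fault handler. Page faults could be handled in the kernel or in user space, and JOS chooses user space. The flow is: on a page fault, the kernel's trap handler switches the environment onto its user-mode exception stack and runs the upcall there; the upcall then returns, still in user mode, to the state before the fault.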

    /*
     *    UTOP,UENVS ------>  +------------------------------+ 0xeec00000
     *    UXSTACKTOP -/       |     User Exception Stack     | RW/RW  PGSIZE
     *                        +------------------------------+ 0xeebff000
     *                        |       Empty Memory (*)       | --/--  PGSIZE
     *    USTACKTOP  --->     +------------------------------+ 0xeebfe000
     *                        |      Normal User Stack       | RW/RW  PGSIZE
     *                        +------------------------------+ 0xeebfd000
     *                        |                              |
     *                        |                              |
     *                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     *                        .                              .
     *                        .                              .
     *                        .                              .
     *                        |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
     *                        |     Program Data & Heap      |
     *    UTEXT -------->     +------------------------------+ 0x00800000
     */

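This is where the upcall function gets registered.

Remember the permission check, and do not forget to add the new call to the syscall dispatch as well (its number is among the syscall numbers in syscall.h).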

    static int
    sys_env_set_pgfault_upcall(envid_t envid, void *func)
    {
        // LAB 4: Your code here.
        struct Env *e;

        // checkperm = 1: the caller must be envid itself or its parent.
        if (envid2env(envid, &e, 1) < 0)
            return -E_BAD_ENV;
        e->env_pgfault_upcall = func;
        return 0;
    }
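The note above about wiring the new call into the dispatcher can be illustrated with a toy version. This is only a sketch under stated assumptions: `toy_env`, `toy_envid2env`, `toy_set_upcall`, `toy_syscall` and the numeric values of the `TOY_*` constants are all hypothetical stand-ins, not the JOS definitions, and the permission check is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the JOS constants; the values are illustrative only. */
enum { TOY_SYS_env_set_pgfault_upcall = 0, TOY_NSYSCALLS };
enum { TOY_E_BAD_ENV = 2, TOY_E_INVAL = 3 };

struct toy_env {
    int   id;
    void *pgfault_upcall;   /* mirrors env_pgfault_upcall */
};

static struct toy_env toy_envs[2] = { { 1, NULL }, { 2, NULL } };

/* Toy lookup by id; returns 0 on success, a negative error otherwise. */
static int toy_envid2env(int envid, struct toy_env **out)
{
    for (size_t i = 0; i < sizeof(toy_envs) / sizeof(toy_envs[0]); i++) {
        if (toy_envs[i].id == envid) {
            *out = &toy_envs[i];
            return 0;
        }
    }
    return -TOY_E_BAD_ENV;
}

/* Same shape as the handler above: look up the env, store the pointer. */
static int toy_set_upcall(int envid, void *func)
{
    struct toy_env *e;
    if (toy_envid2env(envid, &e) < 0)
        return -TOY_E_BAD_ENV;
    e->pgfault_upcall = func;
    return 0;
}

/* The dispatcher: one case per syscall number, unknown numbers rejected. */
static int32_t toy_syscall(uint32_t no, uintptr_t a1, uintptr_t a2)
{
    switch (no) {
    case TOY_SYS_env_set_pgfault_upcall:
        return toy_set_upcall((int)a1, (void *)a2);
    default:
        return -TOY_E_INVAL;
    }
}
```

The point of the sketch is the `switch`: a handler that is implemented but has no `case` in the dispatcher is unreachable from user space.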

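Exercise 9

This exercise handles user page faults. Until now the kernel simply gave up when a user page fault occurred; from here on the environment handles its own page faults in user mode. Most of the work is deciding where on the exception stack the frame goes, and the notes in the lab handout are very detailed.

The exception stack layout: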

                            <-- UXSTACKTOP
    trap-time esp
    trap-time eflags
    trap-time eip
    trap-time eax           start of struct PushRegs
    trap-time ecx
    trap-time edx
    trap-time ebx
    trap-time esp
    trap-time ebp
    trap-time esi
    trap-time edi           end of struct PushRegs
    tf_err (error code)
    fault_va                <-- %esp when the handler is run

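The upcall uses this user exception stack to get back to the code that was interrupted. If the page-fault handler itself faults (a recursive fault), further UTrapframes are pushed onto the same stack. Why reserve an extra 32-bit word (4 bytes) in the recursive case? Exercise 10 gives the answer: it holds the trap-time eip when returning from a recursive fault.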
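The placement rule can be checked with plain arithmetic. The sketch below assumes a 52-byte UTrapframe (4 + 4 + 32 + 4 + 4 + 4 bytes, following the layout above) and uses the range test from the lab text (trap-time esp between UXSTACKTOP-PGSIZE and UXSTACKTOP-1); `next_utf_addr` and `UTF_SIZE` are hypothetical names.

```c
#include <stdint.h>

#define PGSIZE      4096u
#define UXSTACKTOP  0xeec00000u  /* from the memory layout shown earlier */
#define UTF_SIZE    52u          /* assumed sizeof(struct UTrapframe) on i386 */

/* Address at which the kernel would place the next UTrapframe. */
static uint32_t next_utf_addr(uint32_t trap_esp)
{
    if (trap_esp >= UXSTACKTOP - PGSIZE && trap_esp < UXSTACKTOP) {
        /* Recursive fault: push below the current frame and leave
         * one empty 32-bit word for the return path to use. */
        return trap_esp - UTF_SIZE - 4;
    }
    /* First fault: the frame sits at the very top of the exception stack. */
    return UXSTACKTOP - UTF_SIZE;
}
```

A fault taken on the normal stack puts the frame at 0xeebfffcc. A second fault taken while esp is 0xeebfffcc puts the next frame at 0xeebfff94, with the word at 0xeebfff98 left free.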

page_fault_handler

    if (curenv->env_pgfault_upcall) {
        struct UTrapframe *utf;

        if (tf->tf_esp >= UXSTACKTOP - PGSIZE && tf->tf_esp < UXSTACKTOP) {
            // Recursive fault: we are already on the exception stack, so
            // push below the current frame and leave one empty 32-bit word.
            utf = (struct UTrapframe *)(tf->tf_esp - sizeof(struct UTrapframe) - 4);
        } else {
            utf = (struct UTrapframe *)(UXSTACKTOP - sizeof(struct UTrapframe));
        }

        // Destroys the environment if the exception stack is unmapped,
        // not writable, or has overflowed.
        user_mem_assert(curenv, (void *)utf, sizeof(struct UTrapframe), PTE_W);
        utf->utf_fault_va = fault_va;
        utf->utf_err = tf->tf_err;
        utf->utf_regs = tf->tf_regs;
        utf->utf_eip = tf->tf_eip;
        utf->utf_eflags = tf->tf_eflags;
        utf->utf_esp = tf->tf_esp;

        tf->tf_eip = (uintptr_t)curenv->env_pgfault_upcall;  // resume in the upcall
        tf->tf_esp = (uintptr_t)utf;                         // on the exception stack
        env_run(curenv);  // back to user mode; does not return
    }

    // Destroy the environment that caused the fault.
    cprintf("[%08x] user fault va %08x ip %08x\n",
        curenv->env_id, fault_va, tf->tf_eip);
    print_trapframe(tf);
    env_destroy(curenv);

Exercise 10

We implement _pgfault_upcall in lib/pfentry.S.

    // Exception stack when the restore code starts:
    //
    //                                 high addresses
    // +-------------------------+
    // | ...                     |
    // +-------------------------+
    // | trap-time esp      (4B) |  0x30(%esp)
    // +-------------------------+
    // | trap-time eflags   (4B) |  0x2c(%esp)
    // +-------------------------+
    // | trap-time eip      (4B) |  0x28(%esp)
    // +-------------------------+
    // | trap-time regs    (32B) |  0x08(%esp)
    // | ...                     |
    // | ...                     |
    // +-------------------------+
    // | err                (4B) |  0x04(%esp)
    // +-------------------------+
    // | fault_va           (4B) |  0x00(%esp)
    // +-------------------------+ <-- current %esp
    //                                 low addresses
    //
    // Trap-time stack after trap-time eip has been pushed onto it:
    //
    // +-------------------------+
    // | ...                     |
    // +-------------------------+
    // | trap-time eip      (4B) |
    // +-------------------------+ <-- adjusted trap-time esp

    // LAB 4: Your code here.
    // trap-time esp -= 4, making room for trap-time eip on the trap-time stack
    movl 0x30(%esp), %eax
    subl $0x4, %eax
    movl %eax, 0x30(%esp)
    // push trap-time eip onto the trap-time stack
    movl 0x28(%esp), %ebx
    movl %ebx, (%eax)

    // Restore the trap-time registers. After you do this, you
    // can no longer modify any general-purpose registers.
    // LAB 4: Your code here.
    addl $8, %esp           // skip fault_va and err
    popal

    // Restore eflags from the stack. After you do this, you can
    // no longer use arithmetic operations or anything else that
    // modifies eflags.
    // LAB 4: Your code here.
    addl $4, %esp           // skip trap-time eip
    popfl

    // Switch back to the adjusted trap-time stack.
    // LAB 4: Your code here.
    popl %esp

    // Return to re-execute the instruction that faulted.
    // LAB 4: Your code here.
    ret                     // pops the trap-time eip pushed above
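The offsets 0x28 and 0x30 used in the assembly follow from the UTrapframe layout. The sketch below re-declares the two structs with fixed 32-bit fields so that the check also holds on a 64-bit host; the field names follow the ones used in page_fault_handler and the register order follows the layout diagram, but the declarations themselves are a reconstruction made for this check.

```c
#include <stddef.h>
#include <stdint.h>

/* Registers in memory order, lowest address first (edi ... eax). */
struct PushRegs {
    uint32_t reg_edi, reg_esi, reg_ebp, reg_oesp;
    uint32_t reg_ebx, reg_edx, reg_ecx, reg_eax;
};

/* The frame the kernel builds on the user exception stack. */
struct UTrapframe {
    uint32_t        utf_fault_va;  /* 0x00 */
    uint32_t        utf_err;       /* 0x04 */
    struct PushRegs utf_regs;      /* 0x08 .. 0x27 */
    uint32_t        utf_eip;       /* 0x28 */
    uint32_t        utf_eflags;    /* 0x2c */
    uint32_t        utf_esp;       /* 0x30 */
};

/* Compile-time checks of the offsets the assembly relies on. */
_Static_assert(offsetof(struct UTrapframe, utf_regs) == 0x08, "regs offset");
_Static_assert(offsetof(struct UTrapframe, utf_eip) == 0x28, "eip offset");
_Static_assert(offsetof(struct UTrapframe, utf_eflags) == 0x2c, "eflags offset");
_Static_assert(offsetof(struct UTrapframe, utf_esp) == 0x30, "esp offset");
_Static_assert(sizeof(struct UTrapframe) == 0x34, "frame size");
```

This also explains the two `addl` instructions: `addl $8` skips fault_va and err so that `popal` consumes the 32 bytes of registers, and `addl $4` skips utf_eip so that `popfl` and `popl %esp` consume the last two words.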

Exercise 11

lib/pgfault.c

set_pgfault_handler

    void
    set_pgfault_handler(void (*handler)(struct UTrapframe *utf))
    {
        int r;

        if (_pgfault_handler == 0) {
            // First time through!
            // LAB 4: Your code here.
            // Allocate the exception stack, then register the assembly entry point.
            r = sys_page_alloc(0, (void *)(UXSTACKTOP - PGSIZE), PTE_U | PTE_P | PTE_W);
            if (r < 0)
                panic("set_pgfault_handler: sys_page_alloc failed: %e", r);
            r = sys_env_set_pgfault_upcall(0, (void *)_pgfault_upcall);
            if (r < 0)
                panic("set_pgfault_handler: sys_env_set_pgfault_upcall failed: %e", r);
        }
        // Save handler pointer for assembly to call.
        _pgfault_handler = handler;
    }
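The "first time through" branch means the exception stack and the upcall are set up once, while the C handler can be swapped freely afterwards. A toy model of that behavior, with counters standing in for the two system calls (every `toy_` name and both handler functions are hypothetical):

```c
#include <stddef.h>

typedef void (*toy_handler_t)(void);

static toy_handler_t toy_pgfault_handler;  /* mirrors _pgfault_handler */
static int toy_xstack_allocs;              /* counts simulated sys_page_alloc calls */
static int toy_upcall_registrations;       /* counts simulated upcall registrations */

static void toy_set_pgfault_handler(toy_handler_t handler)
{
    if (toy_pgfault_handler == NULL) {
        /* First call only: set up the exception stack and the upcall. */
        toy_xstack_allocs++;
        toy_upcall_registrations++;
    }
    /* Every call: remember which C function the entry point should run. */
    toy_pgfault_handler = handler;
}

static void toy_handler_a(void) {}
static void toy_handler_b(void) {}
```

Calling `toy_set_pgfault_handler(toy_handler_a)` and then `toy_set_pgfault_handler(toy_handler_b)` leaves both counters at 1 and the handler pointing at `toy_handler_b`.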

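Exercise 12

lib/fork.c

This clears up something I originally misunderstood. UVPT is a user-visible mapping that occupies 4 MB of virtual address space (see inc/memlayout.h), which is exactly 1M page table entries at 4 bytes each, one entry per virtual page. So uvpt must be indexed with the PGNUM macro, not PTX. uvpd really is the page directory; it is built from the kern_pgdir template and is read-only to the user. PTX is only needed when kernel code first finds a single page table through the page directory and then indexes into it. It is a rather roundabout design.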
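A quick way to see the difference between the macros is to evaluate them on concrete addresses. The definitions below are written out for a two-level i386 layout (10-bit directory index, 10-bit table index, 12-bit offset) so that the snippet compiles on its own; treat them as a restatement for this check rather than a copy of the JOS headers.

```c
#include <stdint.h>

#define PTXSHIFT  12   /* offset of the page table index in a linear address */
#define PDXSHIFT  22   /* offset of the page directory index */

/* Page number: the upper 20 bits, i.e. the index into the flat uvpt[] array. */
#define PGNUM(la) (((uint32_t)(la)) >> PTXSHIFT)
/* Page directory index: the top 10 bits, i.e. the index into uvpd[]. */
#define PDX(la)   ((((uint32_t)(la)) >> PDXSHIFT) & 0x3FF)
/* Page table index: the middle 10 bits, valid only inside ONE page table. */
#define PTX(la)   ((((uint32_t)(la)) >> PTXSHIFT) & 0x3FF)

/* PGNUM is the directory index and the table index glued together. */
static uint32_t pgnum_from_parts(uint32_t la)
{
    return PDX(la) * 1024u + PTX(la);
}
```

For UTEXT (0x00800000), PDX is 2, PTX is 0 and PGNUM is 0x800, so `uvpt[PTX(addr)]` would read entry 0 of the first page table instead of entry 0x800. For 0xeebfd000, the bottom of the normal user stack, PGNUM is 0xeebfd while PTX is only 0x3fd.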

    static void
    pgfault(struct UTrapframe *utf)
    {
        void *addr = (void *) utf->utf_fault_va;
        uint32_t err = utf->utf_err;
        int r;

        // Check that the faulting access was (1) a write, and (2) to a
        // copy-on-write page.  If not, panic.
        // Hint:
        //   Use the read-only page table mappings at uvpt
        //   (see <inc/memlayout.h>).

        // LAB 4: Your code here.
        uint32_t write_err = err & FEC_WR;
        uint32_t COW = uvpt[PGNUM(addr)] & PTE_COW;
        if (!(write_err && COW))
            panic("pgfault: not a write fault on a COW page\n");

        // Allocate a new page, map it at a temporary location (PFTEMP),
        // copy the data from the old page to the new page, then move the new
        // page to the old page's address.
        // Hint:
        //   You should make three system calls.

        // LAB 4: Your code here.
        // Allocate a fresh page at PFTEMP.
        addr = ROUNDDOWN(addr, PGSIZE);
        r = sys_page_alloc(0, PFTEMP, PTE_U | PTE_P | PTE_W);
        if (r < 0)
            panic("pgfault: sys_page_alloc failed!\n");

        // Copy the old contents, then remap the new page over the old address.
        memmove(PFTEMP, addr, PGSIZE);
        r = sys_page_map(0, PFTEMP, 0, addr, PTE_U | PTE_P | PTE_W);
        if (r < 0)
            panic("pgfault: sys_page_map failed!\n");

        // Drop the temporary mapping at PFTEMP.
        r = sys_page_unmap(0, PFTEMP);
        if (r < 0)
            panic("pgfault: sys_page_unmap failed!\n");
    }

    //
    // Map our virtual page pn (address pn*PGSIZE) into the target envid
    // at the same virtual address.  If the page is writable or copy-on-write,
    // the new mapping must be created copy-on-write, and then our mapping must be
    // marked copy-on-write as well.  (Exercise: Why do we need to mark ours
    // copy-on-write again if it was already copy-on-write at the beginning of
    // this function?)
    //
    // Returns: 0 on success, < 0 on error.
    // It is also OK to panic on error.
    //
    static int
    duppage(envid_t envid, unsigned pn)
    {
        int r;

        // LAB 4: Your code here.
        pte_t pte = uvpt[pn];
        void *addr = (void *) (pn * PGSIZE);

        // Writable or COW pages become read-only + COW; others keep their bits.
        uint32_t perm = pte & 0xfff;
        if (perm & (PTE_W | PTE_COW)) {
            perm &= ~PTE_W;
            perm |= PTE_COW;
        }

        // Map into the child first.
        r = sys_page_map(0, addr, envid, addr, perm & PTE_SYSCALL);
        if (r < 0)
            panic("duppage: sys_page_map into child failed\n");
        // Then remap in ourselves, so parent and child are both frozen.
        r = sys_page_map(0, addr, 0, addr, perm & PTE_SYSCALL);
        if (r < 0)
            panic("duppage: sys_page_map onto self failed\n");
        return 0;
    }

    //
    // User-level fork with copy-on-write.
    // Set up our page fault handler appropriately.
    // Create a child.
    // Copy our address space and page fault handler setup to the child.
    // Then mark the child as runnable and return.
    //
    // Returns: child's envid to the parent, 0 to the child, < 0 on error.
    // It is also OK to panic on error.
    //
    // Hint:
    //   Use uvpd, uvpt, and duppage.
    //   Remember to fix "thisenv" in the child process.
    //   Neither user exception stack should ever be marked copy-on-write,
    //   so you must allocate a new page for the child's user exception stack.
    //
    envid_t
    fork(void)
    {
        // LAB 4: Your code here.
        int r;
        uint32_t addr;
        extern void _pgfault_upcall(void);

        // 1. Install the page-fault handler in the parent.
        set_pgfault_handler(pgfault);

        // 2. Create the child; sys_exofork only copies the trapframe.
        envid_t envid = sys_exofork();
        if (envid < 0)
            panic("fork: sys_exofork failed: %e", envid);
        if (envid == 0) {
            // Child: runs only after the parent has marked it runnable below.
            thisenv = &envs[ENVX(sys_getenvid())];  // fix "thisenv" in the child
            return 0;
        }

        // 3. COW-map every present user page from 0 up to USTACKTOP
        //    (the exception stack just below UTOP is deliberately excluded).
        for (addr = 0; addr < USTACKTOP; addr += PGSIZE) {
            if ((uvpd[PDX(addr)] & PTE_P) &&
                (uvpt[PGNUM(addr)] & PTE_P) &&
                (uvpt[PGNUM(addr)] & PTE_U))
                duppage(envid, PGNUM(addr));
        }

        // 4. Give the child its own exception stack.
        if ((r = sys_page_alloc(envid, (void *)(UXSTACKTOP - PGSIZE), PTE_P | PTE_W | PTE_U)) < 0)
            panic("fork: sys_page_alloc: %e", r);

        // 5. Register the child's page-fault upcall.
        if ((r = sys_env_set_pgfault_upcall(envid, _pgfault_upcall)) < 0)
            panic("fork: sys_env_set_pgfault_upcall: %e", r);

        // 6. Mark the child runnable.
        if ((r = sys_env_set_status(envid, ENV_RUNNABLE)) < 0)
            panic("fork: sys_env_set_status: %e", r);
        return envid;
    }
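The two decisions that carry the COW logic, which permission bits duppage hands to sys_page_map and which faults pgfault accepts, can be pulled out into pure functions and checked in isolation. A minimal sketch: the bit values are written out under the assumption that they match the usual x86 and JOS definitions (PTE_COW being one of the software-available bits), and `cow_perm` and `is_cow_write_fault` are hypothetical helper names.

```c
#include <stdint.h>

#define PTE_P       0x001u  /* present */
#define PTE_W       0x002u  /* writable */
#define PTE_U       0x004u  /* user-accessible */
#define PTE_AVAIL   0xE00u  /* bits available to software */
#define PTE_COW     0x800u  /* copy-on-write marker, one of the PTE_AVAIL bits */
#define PTE_SYSCALL (PTE_AVAIL | PTE_P | PTE_W | PTE_U)
#define FEC_WR      0x2u    /* page-fault error code: caused by a write */

/* Permissions duppage passes to sys_page_map for a page with this PTE. */
static uint32_t cow_perm(uint32_t pte)
{
    uint32_t perm = pte & 0xFFFu;      /* keep only the flag bits */
    if (perm & (PTE_W | PTE_COW)) {
        perm &= ~PTE_W;                /* no direct writes any more */
        perm |= PTE_COW;               /* remember that a write must copy */
    }
    return perm & PTE_SYSCALL;         /* drop bits the syscall would reject */
}

/* The condition pgfault requires before it copies the page. */
static int is_cow_write_fault(uint32_t err, uint32_t pte)
{
    return (err & FEC_WR) != 0 && (pte & PTE_COW) != 0;
}
```

A writable user page (flags 0x007) comes out as 0x805, a read-only page (0x005) is passed through unchanged, and hardware-set accessed or dirty bits are masked off by PTE_SYSCALL.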

Summary

To summarize Part B's upcall mechanism and fork handling:

  1. Set the upcall

  2. Invoke the upcall

  3. The COW mechanism

1. What is in "lib/fork.c"?

  • pgfault: the user-level page-fault handler, installed with the help of system calls
  • duppage: maps a virtual page into the given env at the same address (it only modifies page tables)
  • fork: creates a child (with COW mappings)

2. What does "fork" do?

  • set the page fault handler
  • create a child
  • fix thisenv in the child so that it refers to the child's own env
  • set up the COW mappings
  • set up the child's page fault handler
  • set the child's state to RUNNABLE

3. Which system calls support the handler?

  • sys_env_set_status
  • sys_env_set_pgfault_upcall
  • sys_page_alloc (checks va and perm, then allocates a physical page)
  • sys_page_map
  • sys_page_unmap

So here is the question: the upcall runs in user mode, yet it makes the kernel do all this work and adds many more traps. Why do it this way?

Here is some material I looked up.

On kernel upcalls, a passage quoted from LKML:

An upcall is a mechanism that allows the kernel to execute a function in userspace, and potentially be returned information as a result.

An upcall is like a signal, except that the kernel may use it at any time, for any purpose, including in an interrupt handler.

A process asks to use upcalls, and passes the kernel the addresses of a series of stacks to execute upcalls on. The kernel wires down down the stacks. The process registers functions associated with a set of predefined events (such as a page fault or blocking I/O). When such an event happens, the thread for which the event occured to doesn’t call schedule(), but instead switches to an upcall stack, constructs a dummy trap return so that on return to user space it will execute the upcall, and returns to user space via a trap return.

Even Larry will, I hope, admit that this is a pretty fast process, much faster than a context switch, and way faster than a call to any schedule().

Note however that the function NEVER RETURNS TO THE KERNEL.

The same mailing-list thread also explains that upcalls can be used to implement scheduler activations and precise timing in user-space code:

Why would you want upcalls ? Well, we implemented upcalls specifically for a thread package that uses an idea called scheduler activations; every time a kernel thread blocks on I/O or suffers a page fault, the kernel “activates” the user level thread scheduler and tells it what happened. This way, the user level thread scheduler can continue to use the processor by deciding to run some other thread.

It would also allow much more precise timing for Linux user space code, because a process could register a function (and yes, it has to be a very carefully designed process) to be executed by the timer interrupt (probably the timer code BH), not whenever the process gets woken by the timer interrupt and then run.

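In short:

1. On blocking I/O there is no context switch any more, so it is faster.

2. For preemptive (time-slice) switching, a switch can happen at any time. If the handling moved into the kernel instead, the locks taken for synchronization would forbid switching, and time slicing would work poorly.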

  • Copyright: Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.
  • Copyrights © 2020-2024 环烷烃