Switching from User Mode to Kernel Mode

http://www.cnblogs.com/justcxtoworld/p/3155741.html

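This article examines the conditions under which Linux on x86 switches from user mode to kernel mode. It also covers the role that the kernel stack and the task state segment (TSS) play in the interrupt and task-switch mechanisms during that switch, and how the relevant registers change.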

I. Paths from user mode to kernel mode:

1: system calls  2: interrupts  3: exceptions

In the 3.3 kernel, the corresponding code can be found in /arch/x86/kernel/entry_32.S.
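The first of these paths is easy to trigger from an ordinary program. The following is a minimal sketch, assuming a Linux system with glibc; raw_getpid is a name made up for this illustration. Depending on the CPU and C library, the trap itself may use int $0x80, sysenter or syscall, but in every case the call crosses from user mode into the kernel and back.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: issue the getpid system call through the generic
 * syscall(2) wrapper. Every call enters kernel mode and returns. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Calling raw_getpid() should give the same value as the regular getpid() wrapper, since both end up in the same kernel entry point.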

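II. Kernel stack

Kernel stack: every process in Linux has two stacks, one used while it executes in user mode and one used while it executes in kernel mode. The kernel stack is the one used in kernel mode. It shares an area the size of two contiguous page frames with the process's thread_info structure, which in turn points to the process's task_struct.

The kernel source defines a C union that conveniently represents a process's thread_info together with its kernel stack:

In the 3.3 kernel this union is defined at line 2106 of include/linux/sched.h: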

union thread_union {
        struct thread_info thread_info;
        unsigned long stack[THREAD_SIZE/sizeof(long)];
};

The thread_info structure is defined as follows.

In the 3.3 kernel, line 26 of /arch/x86/include/asm/thread_info.h:

struct thread_info {
       struct task_struct      *task;          /* main task structure */
       struct exec_domain      *exec_domain;   /* execution domain */
       __u32                   flags;          /* low level flags */
       __u32                   status;         /* thread synchronous flags */
       __u32                   cpu;            /* current CPU */
       int                     preempt_count;  /* 0 => preemptable,
                                                  <0 => BUG */
       mm_segment_t            addr_limit;
       struct restart_block    restart_block;
       void __user             *sysenter_return;
#ifdef CONFIG_X86_32
       unsigned long           previous_esp;   /* ESP of the previous stack in
                                                  case of nested (IRQ) stacks
                                               */
       __u8                    supervisor_stack[0];
#endif
       unsigned int            sig_on_uaccess_error:1;
       unsigned int            uaccess_err:1;  /* uaccess failed */
};

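Their layout is roughly as follows: thread_info sits at the lowest address of the two-page area, and the kernel stack grows down toward it from the highest address.

The esp register is the CPU stack pointer; in kernel mode it holds the address of the top of the kernel stack. On x86 a stack starts at the high end of its memory area and grows toward the beginning of that area. Right after a switch from user mode to kernel mode the process's kernel stack is always empty, so esp points at the very top of the stack.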
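Because thread_info and the kernel stack share one aligned area, the base of that area can be recovered from any stack address by masking off the low bits. The sketch below models this in user space. The types with a _model suffix are hypothetical, heavily trimmed stand-ins, and THREAD_SIZE is assumed to be 8192 bytes (two 4 KiB pages).

```c
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumption: two 4 KiB pages */

/* Hypothetical stand-in for the real thread_info. */
struct thread_info_model {
    void    *task;
    uint32_t flags;
};

/* Same shape as thread_union: both members start at the area base. */
union thread_union_model {
    struct thread_info_model thread_info;
    unsigned long stack[THREAD_SIZE / sizeof(long)];
};

/* Round an address inside the stack area down to the area base,
 * which is where thread_info lives. */
uintptr_t thread_info_base(uintptr_t sp)
{
    return sp & ~(uintptr_t)(THREAD_SIZE - 1);
}
```

For example, a stack pointer of 0xc1234abc maps to an area base of 0xc1234000, and so does any other address up to 0xc1235fff.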

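On x86, a system call made with the int instruction pushes the user stack's %esp and related registers onto the kernel stack. The system call returns with iret, which pops the user %esp and the saved register state off the kernel stack and restores them. In other words, the process context must be saved on entry to kernel mode and restored when the interrupt finishes, and the kernel stack is what makes this possible.

One detail remains. To save the user-mode esp, eip and other registers on the kernel stack, the CPU first needs the kernel stack pointer. How does it obtain that pointer before it has entered kernel mode? The answer is the TSS.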

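III. TSS

The x86 architecture includes a special segment type, the task state segment (TSS), which holds hardware context. Among other things, it holds the stack pointers the CPU loads when the current process moves to a more privileged level.

Linux provides one TSS per CPU and keeps the selector of that segment in the tr register.

On a switch from user mode to kernel mode, the esp0 field of the TSS supplies the top-of-stack pointer of the current process's kernel stack, so that the user-mode cs, esp, eip and other context can be saved there.

Note: Linux provides one TSS per CPU rather than one per process mainly because tr can then point at the same TSS permanently. A task switch does not have to reload tr, which reduces overhead.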
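The per-CPU arrangement can be sketched as follows. This is a simplified user-space model, not kernel code: cpu_tss, task_model and switch_to_model are hypothetical names, and THREAD_SIZE is assumed to be 8192. In the real kernel the corresponding update is done by load_sp0() during a context switch.

```c
#include <stdint.h>

#define NR_CPUS_MODEL 4      /* hypothetical CPU count */
#define THREAD_SIZE   8192u  /* assumption: two 4 KiB pages */

struct tss_model  { uintptr_t sp0; };          /* only the field of interest */
struct task_model { uintptr_t kstack_base; };  /* base of the task's stack area */

/* One TSS per CPU; tr would point at the same entry forever. */
static struct tss_model cpu_tss[NR_CPUS_MODEL];

/* On a context switch only sp0 is rewritten to the top of the next
 * task's kernel stack. The TSS itself, and hence tr, stay put. */
void switch_to_model(int cpu, const struct task_model *next)
{
    cpu_tss[cpu].sp0 = next->kstack_base + THREAD_SIZE;
}

uintptr_t current_sp0(int cpu)
{
    return cpu_tss[cpu].sp0;
}
```

Switching CPU 1 between two tasks changes only cpu_tss[1].sp0; the entries of the other CPUs are untouched.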

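Next, let us look at how the Linux kernel implements the TSS on x86.

The TSS structure is defined in the kernel source as follows.

In the 3.3 kernel, line 248 of /arch/x86/include/asm/processor.h: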

 struct tss_struct {
       /*
        * The hardware state:
        */
       struct x86_hw_tss       x86_tss;

       /*
        * The extra 1 is there because the CPU will access an
        * additional byte beyond the end of the IO permission
        * bitmap. The extra byte must be all 1 bits, and must
        * be within the limit.
        */
       unsigned long           io_bitmap[IO_BITMAP_LONGS + 1];

       /*
        * .. and then another 0x100 bytes for the emergency kernel stack:
        */
       unsigned long           stack[64];

} ____cacheline_aligned;    

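Its main members are:
Hardware state structure: x86_hw_tss
I/O permission bitmap:    io_bitmap
Emergency kernel stack:   stack

On 32-bit x86, the hardware state structure x86_hw_tss is defined as follows.

Line 190 of /arch/x86/include/asm/processor.h: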

#ifdef CONFIG_X86_32
/* This is the TSS defined by the hardware. */
struct x86_hw_tss {
       unsigned short          back_link, __blh;
       unsigned long           sp0;               /* kernel stack top pointer of the current process */
       unsigned short          ss0, __ss0h;       /* kernel stack segment selector of the current process */
       unsigned long           sp1;
       /* ss1 caches MSR_IA32_SYSENTER_CS: */
       unsigned short          ss1, __ss1h;
       unsigned long           sp2;
       unsigned short          ss2, __ss2h;
       unsigned long           __cr3;
       unsigned long           ip;
       unsigned long           flags;
       unsigned long           ax;
       unsigned long           cx;
       unsigned long           dx;
       unsigned long           bx;
       unsigned long           sp;                /* saved ESP slot; not where Linux keeps the user-mode ESP */
       unsigned long           bp;
       unsigned long           si;
       unsigned long           di;
       unsigned short          es, __esh;
       unsigned short          cs, __csh;
       unsigned short          ss, __ssh;
       unsigned short          ds, __dsh;
       unsigned short          fs, __fsh;
       unsigned short          gs, __gsh;
       unsigned short          ldt, __ldth;
       unsigned short          trace;
       unsigned short          io_bitmap_base;

} __attribute__((packed));

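Linux uses only a few TSS fields, such as esp0 and the I/O map, and does not use the remaining fields to save registers. When a user process is interrupted and enters kernel mode, esp0 (the kernel stack top pointer) is read from the hardware state structure in the TSS and the CPU switches to that stack. The other registers are saved on the kernel stack that esp0 points to, not in the TSS.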
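The field positions matter because the CPU reads the TSS by fixed offsets. As a rough check, the structure can be replicated in user space with fixed-width types and its layout verified at compile time. This is a sketch only: hw_tss32 and the padding field names are made up here, and the block relies on the GCC/Clang packed attribute and on C11 _Static_assert.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space replica of the 32-bit hardware TSS (hypothetical names). */
struct hw_tss32 {
    uint16_t back_link, blh;
    uint32_t sp0;                 /* kernel stack pointer for ring 0 */
    uint16_t ss0, ss0h;           /* kernel stack segment selector */
    uint32_t sp1;
    uint16_t ss1, ss1h;
    uint32_t sp2;
    uint16_t ss2, ss2h;
    uint32_t cr3, ip, flags;
    uint32_t ax, cx, dx, bx, sp, bp, si, di;
    uint16_t es, esh, cs, csh, ss, ssh, ds, dsh, fs, fsh, gs, gsh;
    uint16_t ldt, ldth;
    uint16_t trace, io_bitmap_base;
} __attribute__((packed));

/* Offsets the hardware expects for a 32-bit TSS. */
_Static_assert(offsetof(struct hw_tss32, sp0) == 4, "sp0 offset");
_Static_assert(offsetof(struct hw_tss32, ss0) == 8, "ss0 offset");
_Static_assert(offsetof(struct hw_tss32, io_bitmap_base) == 102, "I/O map base offset");
_Static_assert(sizeof(struct hw_tss32) == 104, "TSS size");
```

If the file compiles, sp0 and ss0 are exactly where the CPU looks for the ring 0 stack on a privilege change.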

The code that defines one TSS per CPU is shown below.

In the 3.3 kernel, line 35 of /arch/x86/kernel/init_task.c:

/*
 * per-CPU TSS segments. Threads are completely 'soft' on Linux,
 * no more per-task TSS's. The TSS size is kept cacheline-aligned
 * so they are allowed to end up in the .data..cacheline_aligned
 * section. Since TSS's are completely CPU-local, we want them
 * on exact cacheline boundaries, to eliminate cacheline ping-pong.
 */

DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, init_tss) = INIT_TSS;

INIT_TSS is defined as follows.

In the 3.3 kernel, line 879 of /arch/x86/include/asm/processor.h:

#define INIT_TSS  {                                                       \
       .x86_tss = {                                                      \
               .sp0            = sizeof(init_stack) + (long)&init_stack, \
               .ss0            = __KERNEL_DS,                            \
               .ss1            = __KERNEL_CS,                            \
               .io_bitmap_base = INVALID_IO_BITMAP_OFFSET,               \
        },                                                               \
       .io_bitmap              = { [0 ... IO_BITMAP_LONGS] = ~0 },       \
}

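Here init_stack is a macro that refers to the kernel stack:

#define init_stack              (init_thread_union.stack)

As the initializer shows, the kernel stack top pointer, the kernel data segment selector and the kernel code segment selector are assigned to sp0, ss0 and ss1 respectively. When a process switches from user mode to kernel mode, the kernel stack top pointer can therefore be read from the TSS and the process context saved on the kernel stack.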
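The expression sizeof(init_stack) + (long)&init_stack yields the address just past the end of the stack array, which is the top of an empty downward-growing stack. The sketch below repeats the arithmetic in user space; init_stack_model and both helper functions are hypothetical names, and THREAD_SIZE is assumed to be 8192.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumption: two 4 KiB pages */

/* Hypothetical stand-in for init_thread_union.stack. */
static unsigned long init_stack_model[THREAD_SIZE / sizeof(long)];

/* Same arithmetic as the .sp0 initializer in INIT_TSS. */
uintptr_t initial_sp0(void)
{
    return sizeof(init_stack_model) + (uintptr_t)&init_stack_model;
}

/* Address one element past the end of the array, computed directly. */
uintptr_t one_past_end(void)
{
    return (uintptr_t)(init_stack_model + THREAD_SIZE / sizeof(long));
}
```

Both functions return the same address, THREAD_SIZE bytes above the start of the array.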

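Summary: with the groundwork above, the main steps in a switch from user mode to kernel mode are:

1: Read the tr register to locate the TSS.
2: Take the kernel stack top pointer of the process from sp0 in the TSS (and the stack segment from ss0).
3: The control unit switches to the kernel stack and saves the current ss, esp, eflags, cs and eip values on it.
4: The kernel code segment selector is loaded into CS and the linear address of the kernel entry point into EIP; ESP now points into the kernel stack.
5: The entry code uses SAVE_ALL to save the remaining registers on the kernel stack.

Once step 4 completes, the CPU is in kernel mode and begins executing the first instruction at the kernel entry point in EIP; saving the remaining registers in step 5 is the first thing that entry code does.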
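The hardware part of this sequence can be modelled in a few lines of user-space C. This is only a sketch under simplifying assumptions: cpu_model, kstack_model and enter_kernel_model are hypothetical names, the kernel stack is a small array, and the TSS lookup is reduced to starting from the empty top of that array.

```c
#include <stdint.h>

#define KSTACK_WORDS 64   /* hypothetical tiny kernel stack */

struct cpu_model {
    uint32_t ss, esp, eflags, cs, eip;   /* user-mode values on entry */
};

struct kstack_model {
    uint32_t  words[KSTACK_WORDS];
    uint32_t *sp;                        /* models ESP on the kernel stack */
};

/* The stack grows toward lower addresses: decrement, then store. */
static void push(struct kstack_model *k, uint32_t v)
{
    *--k->sp = v;
}

/* Models what the control unit does for an interrupt from user mode. */
void enter_kernel_model(struct cpu_model *cpu, struct kstack_model *k,
                        uint32_t kernel_cs, uint32_t entry_eip)
{
    k->sp = k->words + KSTACK_WORDS;     /* sp0 from the TSS: empty stack top */
    push(k, cpu->ss);                    /* user stack segment */
    push(k, cpu->esp);                   /* user stack pointer */
    push(k, cpu->eflags);
    push(k, cpu->cs);
    push(k, cpu->eip);                   /* return address for iret */
    cpu->cs  = kernel_cs;                /* now running kernel code ... */
    cpu->eip = entry_eip;                /* ... at the entry point */
}
```

After the call, the five saved words sit at the top of the model stack with the user eip at the lowest address, which is the order iret expects to pop them in. Saving the general-purpose registers (the SAVE_ALL part) would follow as further pushes and is left out here.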
