しゃべれる理系: Xv6: Modern OS-like Case Study

Yohei Yasukawa

Operating System

Dec. 15th, 2010

Case Study - Xv6

1. Introduction

Like case study sections in Modern Operating System, this paper introduces about Xv6, an teaching operating system, and shows how it runs in order to deeply understand a operating system concept.

A. What’s Xv6?

Xv6 is a teaching operating system that was developed by MIT in 2002, and being used as a material of operating system courses for graduate students since then. And it is becoming famous as a well-developed teaching operating system. In fact, in 2010, it is used as a teaching material of operating system courses in not only MIT, but also Rutgers University, Yale University, Johns Hopkins University, and Stanford University, because of its simplicity.

For many years, operating systems had been taught as one of essential courses for understanding computer science. However, there were few operating systems that can be covered in within a semester because the great operating systems, which are well-developed as a teaching material such as BSD and Linux, are too large to cover in a course. Also, the existing operating systems, such as Unix, were so obsolete that many students were struggled to understand the differences between old and new architectures, and between old and new programming languages. These obstacles made it difficult to teach operating system concepts with actual readable source codes.

To attack this problem, xv6 operating system was developed as a very compacted operating system that implements important operating system concepts which are mainly inspired from Unix version 6, but it is written in ANSI C programming language and runs on x86. So, Xv6 is becoming famous for a good teaching material that is written in readable source codes, and follows current architectures, but its line of codes keeps less than 10,000 and its commentary consists of less than 100 pages, which can be covered in a semester.

B. Related works

To understand perspective of xv6, knowing related works are helpful, so this section briefly introduces two other teaching operating systems, MINIX3 and Haribote OS.

MINIX 3

MINIX 3, which is an operating system developed since 1987, is one of other well-known operating systems that is intended for teaching. This operating system is thoroughly explained in the textbook, Operating Systems: Design and Implementation.

There are two key different features from xv6. One main difference is that MINIX 3 uses micro kernel structure, which is one of concepts on operating system that implementing less features in a kernel results in increasing performances rather than implementing many features in a kernel. In other words, it needs to have well abstracted organizations, although UNIX tends to put all key functions in a kernel. In fact, the kernel in MINIX 3 operating system consists of less than 4,000 lines. Thereby, MINIX 3 operating system is potentially easier to porting other architectures.

The other key difference is the concept of MINIX 3 operating system. MINIX 3 was developed not only for teaching, but also for embedded systems. Former versions of MINIX, such as MINIX 1 and MINIX 2, were intended only for teaching, but the modern organization, micro kernel, was fit to the purpose in embedded technology fields. So, from MINIX 3, MINIX expands its features to fit to the needs from embedded systems. Thereby, MINIX 3 users are increased from MINIX 3; on the other hand, the size of MINIX 3 is becoming larger, and now the line of codes keeps more than 30,000 lines.

Haribote OS

The other teaching operating system is Haribote OS, which was released with the book published in 2006 in Japan. This operating system contributes to understand operating system with actual source codes, because it does not fully cover general design and concepts that other operating systems do, but is intended for learning how operating systems are implemented. So, the book starts from implementing bootstrap from disks, and ends with implementing applications running on the Haribote OS.

Also, unlike MINIX 3 and xv6, this operating system does not need any previous knowledge. According to the author, this operating system and book was structured for being understood by not only university students, but also junior high school students. In fact, the book kindly explains computer architectures when the readers needs to understand to implement. So, this teaching operating system is friendly for operating system learners, especially at the very beginning level.

C. Overview of Xv6

Operating systems have different aspects in order to adapt a particular situation or to achieve a particular purpose. But most operating systems, especially unix and unix-like operating systems, have very similar designs. The designs should be slightly different one by one, but its essential concept is totally same.

Fundamental Concepts

Xv6 is, as it is inspired from Unix V6, designed based on such shared, essential concepts. Thereby, xv6 can be a concrete example of other general operating systems. So, understanding of xv6 can be useful for any kind of operating systems. On the other hand, because it is an aggregation of general operating systems, most organizations are so simple that it is very understandable but not effective. For example, many operating systems separately use read locks and write locks in order to decrease performance degradation. However, xv6 does not separately use locks but use just a lock, which increases readability and simplicity of source codes. But it decreases the performance.

Interfaces

From this reason, xv6 kernel provides just following system calls:

Obviously, these system calls are familiar with general operating systems, and they run in the almost same way actually. But as you see that the arguments of functions are equal to or less than general ones, each function is simplified.

Like other operating systems, these system calls serve as an interface between kernel space and user space. Processes in user space cannot use the resources held by kernel, but they can use them by calling system calls. For example, to achieve an xv6’s memory allocation organization that can be dynamic, kalloc() and kfree() system calls are used to implement.

In xv6, any kind of organizations follow this concepts. So, when seeing xv6 codes, you can easily understand essential concepts of process, scheduler, locks, traps, etc. in implementation level, but you need to consider how those concepts can be adapted to real worlds for advancing your knowledge.

2. Process in Xv6

Xv6’s process contains instructions, data, and stack in user-space memory. Also, the process information is stored at data structure in a kernel-space. For example, if a process wants to get its process identifier, it can get it by calling pid() system call. So, processes in xv6 implements several organizations by controlling the data with system calls. This section introduces fundamental functions and a few distinct organizations in xv6.

Fundamental Functions

Xv6’s process can create a new process with fork() system call. It lets a process create new child process. Each process has same memory contents but separately managed.

Also, process can stop with exit() and can wait for one of the calling process’s children to exit with wait().

As the program and sequence diagram above illustrates, fork() returns child process id to the parent process, and returns 0 to child process. So, the fork() behaves in the same way as general operating systems. However, exit() and wait() runs in a slightly different way. In general, they need status code in their arguments, but in xv6 they are not needed when calling exit() and wait(). So, when a process calls wait(), it just waits for one of the caller’s children to call exit().

Similarly, xv6’s process has exec() system call to replace the calling process’s memory with a new memory image. The different colors of data in user space illustrates, exec() system call replaces instructions, data, stack data in user space with new ones. So, in xv6, the memory image except data structure in kernel space are all replaced with a new program when calling exec().

Scheduling

As most operating systems implement multiple process concept, xv6 implements it, too. But the way of scheduling processes works in a different, simpler way. When scheduling processes, red colored members in the following data structures are used.

The colored members are basically same as general designs except ZOMBIE state of process. In xv6, when a process died, it turns into a ZOMBIE process, and it is not removed automatically. The processes in ZOMBIE states do nothing, and wait for being detected from scheduler. If the scheduler notices that there is a ZOMBIE process in a process table, it removes from the table then. Implementing the automatically process-detected system should be smart, but it requires to have some kind of interrupts, which will reduce the simplicity of scheduler. So, ZOMBIE process state functions as just waiting for being removed.

With the colored data members, xv6’s scheduler runs in the following way which corresponds to the program below.

1. If a hardware interrupt is waiting to be handled, the scheduler's CPU will handle it before continuing.

2. Check the process table looking for a runnable process.

3. Set up CPU's segment descriptors and current process task state.

4. Mark the process as RUNNING.

5. Call swtch() to start running new process.

(* swtch() function causes context switching; specifically, it saves the current context and switches to the chosen process’s context.)

These steps are basically same as the ones in general operating systems. The distinct point of xv6’s scheduler is that the scheduler seeks process table entries incrementally. It is so simple that the line of code are within just about 15 lines. However, in this scheduler, there is no way to assign process intentionally, such as priority-based scheduling. So, if there is a process that needs to be done earlier than other processes, it, however, has to wait for being assigned by xv6’s scheduler.

3. Memory Management in Xv6

As most operating systems have memory managers, xv6 operating system also has a memory manager in its kernel. If a process wants to get more memory spaces or returns its memory spaces to the kernel, they need to contact a memory manager. So, to accept a request from processes, a memory manager always manages used and free memory spaces.

In xv6, the memory manager manages its memory space with a linked list called freelist that contains list of memory area. The unit of its page size is 4KB because x86 segment size granularity is 4KB. So, for example, a page in the freelist is able to have huge size amount of memory spaces in one page, but the least size of a page should be 4KB. The actual structure is as follows:

struct run { struct run *next;int len; // bytes;} struct run *freelist;

So, when xv6‘s user processes want to request more memory space for doing something, they can get it by calling kalloc() function. Like the scheduling algorithm in xv6, the memory manager seeks the free list incrementally, and if it detects the page that is large enough to accept the request, it returns a kernel-segment pointer. If not, it return 0.

Similarly, if a process needed to release memory space, they can return it by calling kfree() function. But the kfree() runs a little more complex than that the kalloc() runs. When kfree() is called, not only the len bytes of memory spaces are freed, but also the memory manager sorts freelist, meaning that if there are free, adjacent pages, they are combined into one long page.

Obviously, this memory-allocating algorithm is too simple to adapt real world. But it matches xv6’s concept, an operating system as teaching material.

4. File System

As almost all of operating systems have file systems to maintain data consistently, xv6 also has its simple file system. So, this chapter introduces how the file system in xv6 works in terms of two sections: file system data, and file system call.

File System Data

In xv6, the file system composes of the following four elements:

1. Block Allocator

2. I-nodes

3. Directories

4. Path names

First, the block allocator is responsible for managing disk blocks, and keeping track of which blocks are in use. It functions as a memory allocator does. So, if balloc() function is called, the allocator finds and returns free blocks, and then, some data is allocated at the returned new disk block. Also, it can release blocks by calling bfree(). One of the signature has a block to be freed, so if it is called, the given block will be released. So, the block allocator in xv6 can manage disk blocks.

Second element is an i-node, which refer to am unnamed file in the file system. Precisely, it is managed by several functions. For example, when allocating, getting or putting data, the function of ialloc(), iget(), or iput() is used respectively. To allocate a new i-node, such as creating a file, xv6 will call ialloc() function. The ialloc() runs like balloc(), allocating a new i-node with the given type on device. In iget(), it finds the i-node by a given i-node number and returns its copy. And the xv6 can release a reference to an i-node by calling iput(). With such functions, xv6 can manage i-nodes.

The third is directories, which are special kind of i-nodes whose content is a sequence of directory entries. Each entry is a structure in which there are data of name and i-node number. So, for searching a directory, we can find it by calling dirlookup() with the name. Also, for writing, dirlink() function is provided. It enables us to write a new entry with the given name and i-node number into a directory. With those functions, the concept of directory is implemented in xv6.

Finally, there is a path name element, which serves as providing convenient syntax to identify particular files or directories. For example, you can directly refer to a file with the path name, “/xv6/fs.c”, by using the path name. In the case, xv6 first looks up root directory because it begins with a slash, and then, look up “xv6” directory. If the directory exits, then starts to refer to “fs.c” file. If the path name does not begin with a slash, xv6 assumes that the path is a relative path, not absolute path.

With those elements, xv6 implements the file system. So, the xv6 file system runs in almost same way that the current unix and unix-like operating system runs. But the algorithms are totally naive. The functions to seek something such as balloc() and dirlookup(), xv6 uses a linear seeking, in order to fit the concept of xv6. So, the file system behaves almost same as other operating systems, but it is not efficient.

File System Call

We learned how the data structure in the xv6 file system works in the previous section, but not how the system calls in xv6 are related. So, this section briefly introduces which system calls are used to implement the file system.

The following list shows some of the system calls and its brief behavior mainly used in xv6 file system.

In addition, there are some helper functions for system calls, which include argint(), argstr(), argptr(), and argfd(). They function as making a good abstraction and serve as connectors between lower level functions to upper level one. And they run in the same way as modern systems have, except that its algorithm is much simpler than the modern systems.

5. Input / Output

We described how the file system is structured and which system calls are used to access data in the previous chapter. Since this chapter, we will focus on how input and output works; specifically, how a buffer cache works in xv6.

Overview

Xv6 has a buffer cache organization to read data from and to write data into disks. This buffer cache functions as managing data temporarily before reading or writing in order to control data exclusively and increase performance. So, if two processes access the same data simultaneously, the organization can serve as serializing it. Also, it is beneficial for reducing the number of accessing physical devices, which is considered as one of the highest cost operations.

Data Structure

How does xv6 implement the buffer cache? Xv6 implements it by using a structure data called “buf”. The buf has three fields: dev, sector, and data. Dev specifies which disk device is referred, sector specifies which sector in a disk is referred, and data is a memory space for copying data from the sector. So, each buf corresponds to a sector in a disk.

The point of this data structure is that how it collaborates with other processes for exclusive control. For collaborating, the data field in the structure data has flags to maintain a status of read and write by B_VALID, B_DIRTY, and B_BUSY. B_VALID flag determines if data is already read, B_DIRTY determines if the data need to write out, and B_BUSY is used for exclusive control. For example, if a process is using a buf, the B_BUSY flag turns on in order to avoid the situation that the other process uses the buf.

Disk Driver

To access disks, an interface between operating systems and hardwares is required. Currently, there are several kinds of interface to access disks, such as SCSI and SATA, but xv6 uses IDE controller as an interface, because it is a little older but very simple. The actual steps to boot IDE device is as follows:

1. Control interrupts:

Initializes locks and forbids hardware interrupts.
Allows uni- and multi-processor to interrupt.

2. Boot and wait:

Boot IDE controller
Do polling status bits until IDE controller becomes ready

3. Check disk status:

Choose each disk incrementally
Check if status bits are changed
If not changed, recognize that the disk is absent

So, through these steps above, xv6 boots the IDE and control disks.

Now we understand how to boot the disk interface, IDE controller. So, let us start understand how it controls disk accesses with buffers. In xv6, the requests of reading and writing data to disks are managed by a queue. If buffers in a queue all finish, the controller notifies it with an hardware interrupt. Specifically, the buffer is added into a queue by calling iderw(), and the first buffer in the queue is started to read or write by calling idestart(), while other buffers in the queue wait for finishing the previous buffer.

The idestart() changes its behavior by the status of flags; if the write bit is on, it writes data, and then notify that it finished by calling ideintr(). If the read bit is on, with the sleep() system call, it waits for the notification that the disks are ready for being read, and then read data.

Interrupts and Locks

What happens if an interrupt occurs when reading or writing data? Obviously, the consistency of disk file system will be broken because the disk access cannot be stopped immediately. To avoid such situations, xv6 has a special lock called idklock; xv6 acquires the lock by calling acquire(), and releases it by release(). So, with the lock, xv6’s writing and reading operations can be synchronized.

Like general operating systems, because the idelock is used, we should avoid race condition when acquiring and releasing the lock. So, xv6 creates critical regions by adding pushcli() or popcli() operation to forbid interrupts. Specifically, pushcli() is inserted before acquire(), and popcli() is inserted after release(). Of course, the interrupts can be nested, so the puchcli() and popcli() operations can be stacked in xv6.

Buffer Cache

Now we grasp the idea about how buffer cache works to read and write data. But why does the buffer cache serve as serializing accesses? It is obvious that we need to establish the situation that only one kernel process can edit file system’s data, because if not, the file system will not be consistent and the data will be corrupted. To allow only one process to read and write, bread() is used when using buffer caches to block processes. For example, if two processes are about to call bread(), one process gets the buffer successfully, and the other process will wait for that the first process finished its operation.

The bread() function calls two helper functions; first, call bget() to obtain a buffer for a given sector, and then, call iderw() to add the buffer into a queue in order to read or write disks. When bget() function is called, it starts to scan buffers incrementally, and if the buffer for a given sector is found, it tries to acquire a lock. If a lock is acquired successfully, bget() sets B_BUSY flag true, and return the buffer. For exceptions, if cannot acquire the lock, sleep until the lock is released. Also, if not found the targeted buffer, create a new buffer for that section, and if there is a buffer never used recently, reuse the buffer as a new buffer. In other words, there are no buffers in the list at the beginning, but each buffer is created based on needs. Once created, it is remained in the list, or reused as a new buffer.

It is common that general operating systems have the list of buffers for input and output, but in xv6, the list is structured based on LRU algorithm. The reason why xv6 uses LRU algorithm is that it is simple to scan for two meanings. If the obtained buffer is used successfully, the buffer moves its entry in the list to the first entry, because xv6 assumes that recently used buffer will be used again in the near future. Also, if looking for buffers never used recently, xv6 can scan the list by the reverse order, because the reverse order of the list means the order of buffer recently not used. So, the LRU algorithm fits to the concept of xv6.

6. Summary

In this case study, we investigated the background and concept of xv6 first, and then showed how xv6 runs. Overall, because of its concept, xv6 would be too simple as an operating system; however, it is implemented by the understandable codes and algorithms and they are essences modern operating systems. Also, the lack of functions to be an real operating system could be good assignments for learners to implement. For those reasons, xv6 is becoming famous as a new teaching operation system, and becoming widely used in operating system classes.

7. References

1. Xv6: a teaching operating system (revision 3)

http://pdos.csail.mit.edu/6.828/xv6/

2. The MINIX 3 Operating System

http://www.minix3.org/

3. Haribote OS

http://hrb.osask.jp/

しゃべれる理系

2011-04-16

Xv6: Modern OS-like Case Study

0 件のコメント: