How to Read a File in Kernel Space Linux
read and write
The read and write methods perform a similar task, that is, copying data from and to application code. Therefore, their prototypes are pretty similar and it's worth introducing them at the same time:
    ssize_t read(struct file *filp, char *buff, size_t count, loff_t *offp);
    ssize_t write(struct file *filp, const char *buff, size_t count, loff_t *offp);
For both methods, filp is the file pointer and count is the size of the requested data transfer. The buff argument points to the user buffer holding the data to be written or the empty buffer where the newly read data should be placed. Finally, offp is a pointer to a "long offset type" object that indicates the file position the user is accessing. The return value is a "signed size type"; its use is discussed later on.
As far as data transfer is concerned, the main issue associated with the two device methods is the need to transfer data between the kernel address space and the user address space. The operation cannot be carried out through pointers in the usual manner, or through memcpy. User-space addresses cannot be used directly in kernel space, for a number of reasons.
One big difference between kernel-space addresses and user-space addresses is that memory in user space can be swapped out. When the kernel accesses a user-space pointer, the associated page may not be present in memory, and a page fault is generated. The functions we introduce in this section and in Section 5.1.4 in Chapter 5 use some hidden magic to deal with page faults in the proper way even when the CPU is executing in kernel space.
Also, it's interesting to note that the x86 port of Linux 2.0 used a completely different memory map for user space and kernel space. Thus, user-space pointers couldn't be dereferenced at all from kernel space.
If the target device is an expansion board instead of RAM, the same problem arises, because the driver must nonetheless copy data between user buffers and kernel space (and possibly between kernel space and I/O memory).
Cross-space copies are performed in Linux by special functions, defined in <asm/uaccess.h>. Such a copy is either performed by a generic (memcpy-like) function or by functions optimized for a specific data size (char, short, int, long); most of them are introduced in Section 5.1.4 in Chapter 5.
The code for read and write in scull needs to copy a whole segment of data to or from the user address space. This capability is offered by the following kernel functions, which copy an arbitrary array of bytes and sit at the heart of every read and write implementation:
    unsigned long copy_to_user(void *to, const void *from, unsigned long count);
    unsigned long copy_from_user(void *to, const void *from, unsigned long count);
Although these functions behave like normal memcpy functions, a little extra care must be used when accessing user space from kernel code. The user pages being addressed might not be currently present in memory, and the page-fault handler can put the process to sleep while the page is being transferred into place. This happens, for example, when the page must be retrieved from swap space. The net effect for the driver writer is that any function that accesses user space must be reentrant and must be able to execute concurrently with other driver functions (see also Section 5.2.3 in Chapter 5). That's why we use semaphores to control concurrent access.
The role of the two functions is not limited to copying data to and from user space: they also check whether the user-space pointer is valid. If the pointer is invalid, no copy is performed; if an invalid address is encountered during the copy, on the other hand, only part of the data is copied. In both cases, the return value is the amount of memory still to be copied. The scull code looks for this error return, and returns -EFAULT to the user if it's not 0.
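The return convention just described can be modelled in user space. The following is only a sketch and not kernel code: model_copy_to_user is a hypothetical stand-in that imitates nothing but the "bytes left uncopied" return value, and valid_limit is an invented parameter that simulates how many destination bytes are accessible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical user-space stand-in for copy_to_user(): copies at most
 * valid_limit bytes and returns how many bytes were NOT copied, which
 * is the convention the kernel helpers follow.
 */
static unsigned long model_copy_to_user(void *to, const void *from,
                                        unsigned long count,
                                        unsigned long valid_limit)
{
    unsigned long n = count < valid_limit ? count : valid_limit;

    memcpy(to, from, n);
    return count - n;
}

/* Mirrors the driver-side check: any nonzero remainder becomes -EFAULT. */
static ssize_t model_read(char *dst, const char *src, size_t count,
                          unsigned long valid_limit)
{
    if (model_copy_to_user(dst, src, count, valid_limit))
        return -EFAULT;
    return (ssize_t)count;
}
```

A fully valid destination yields the byte count, while a destination that becomes invalid partway through yields -EFAULT, even though some bytes were already copied.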
The topic of user-space access and invalid user-space pointers is somewhat advanced, and is discussed in Section 5.1.4 in Chapter 5. However, it's worth noting that if you don't need to check the user-space pointer you can invoke __copy_to_user and __copy_from_user instead. This is useful, for example, if you know you already checked the argument.
As far as the actual device methods are concerned, the task of the read method is to copy data from the device to user space (using copy_to_user), while the write method must copy data from user space to the device (using copy_from_user). Each read or write system call requests transfer of a specific number of bytes, but the driver is free to transfer less data; the exact rules are slightly different for reading and writing and are described later in this chapter.
Whatever the amount of data the methods transfer, they should in general update the file position at *offp to represent the current file position after successful completion of the system call. Most of the time the offp argument is just a pointer to filp->f_pos, but a different pointer is used in order to support the pread and pwrite system calls, which perform the equivalent of lseek and read or write in a single, atomic operation.
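From the application side, the difference between the two kinds of call can be observed directly. The sketch below assumes a POSIX system with a writable /tmp; the helper name position_after and the temporary file name are made up for illustration. It writes ten bytes, seeks to offset 2, performs either a read or a pread of four bytes, and reports where the file position ends up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns the file position seen after a 4-byte transfer that starts
 * with the position at offset 2. With use_pread nonzero the transfer is
 * a pread() at explicit offset 5; otherwise it is a plain read().
 * Returns -1 if any step fails.
 */
static off_t position_after(int use_pread)
{
    char path[] = "/tmp/offp_demo_XXXXXX";  /* hypothetical temp name */
    char buf[4];
    off_t pos = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                           /* file disappears on close */

    if (write(fd, "0123456789", 10) == 10 && lseek(fd, 2, SEEK_SET) == 2) {
        ssize_t n = use_pread ? pread(fd, buf, sizeof(buf), 5)
                              : read(fd, buf, sizeof(buf));

        if (n == (ssize_t)sizeof(buf))
            pos = lseek(fd, 0, SEEK_CUR);   /* query the current position */
    }
    close(fd);
    return pos;
}
```

With read the position advances from 2 to 6; with pread it stays at 2, because the offset was supplied separately from the descriptor's own file position.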
Figure 3-2 represents how a typical read implementation uses its arguments.
Figure 3-2. The arguments to read
Both the read and write methods return a negative value if an error occurs. A return value greater than or equal to 0 tells the calling program how many bytes have been successfully transferred. If some data is transferred correctly and then an error happens, the return value must be the count of bytes successfully transferred, and the error does not get reported until the next time the function is called.
Although kernel functions return a negative number to signal an error, and the value of the number indicates the kind of error that occurred (as introduced in Chapter 2 in Section 2.4.1), programs that run in user space always see -1 as the error return value. They need to access the errno variable to find out what happened. The difference in behavior is dictated by the POSIX calling standard for system calls and the advantage of not dealing with errno in the kernel.
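The user-space half of this convention is easy to observe. As a small sketch (the helper name errno_from_bad_read is invented here), calling read on a descriptor that is not open makes the call return -1 and leaves the cause in errno:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Issues read() on a file descriptor that is not open. On failure the
 * system call wrapper returns -1 and stores the cause in errno; this
 * helper returns that errno value, or 0 if the call did not fail.
 */
static int errno_from_bad_read(void)
{
    char byte;

    errno = 0;
    if (read(-1, &byte, 1) == -1)
        return errno;
    return 0;
}
```

The application never sees a negative error number as the return value of read itself; it sees -1 and then consults errno, which holds EBADF in this case.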
The read Method
The return value for read is interpreted by the calling application program as follows:
- If the value equals the count argument passed to the read system call, the requested number of bytes has been transferred. This is the optimal case.
- If the value is positive, but smaller than count, only part of the data has been transferred. This may happen for a number of reasons, depending on the device. Most often, the application program will retry the read. For example, if you read using the fread function, the library function reissues the system call till completion of the requested data transfer.
- If the value is 0, end-of-file was reached.
- A negative value means there was an error. The value specifies what the error was, according to <linux/errno.h>. These errors look like -EINTR (interrupted system call) or -EFAULT (bad address).
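An application that wants exactly count bytes has to apply these rules itself when it bypasses the standard I/O library. The following is a minimal sketch of such a retry loop; read_full is a hypothetical helper name, and retrying on EINTR is one reasonable policy rather than the only one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Keeps calling read() until count bytes have arrived, end-of-file is
 * reached, or an error occurs. Returns the number of bytes read, or -1
 * if an error happened before any data was transferred.
 */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);

        if (n == 0)
            break;                      /* 0 means end-of-file */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: just retry */
            return done ? (ssize_t)done : -1;
        }
        done += (size_t)n;              /* partial read: ask for the rest */
    }
    return (ssize_t)done;
}
```

If an error follows a partial transfer, the helper reports the bytes already read, which is the same rule the text gives for the driver methods.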
What is missing from the preceding list is the case of "there is no data, but it may arrive later." In this case, the read system call should block. We won't deal with blocking input until Section 5.2 in Chapter 5.
The scull code takes advantage of these rules. In particular, it takes advantage of the partial-read rule. Each invocation of scull_read deals only with a single data quantum, without implementing a loop to gather all the data; this makes the code shorter and easier to read. If the reading program really wants more data, it reiterates the call. If the standard I/O library (i.e., fread and friends) is used to read the device, the application won't even notice the quantization of the data transfer.
If the current read position is greater than the device size, the read method of scull returns 0 to signal that there's no data available (in other words, we're at end-of-file). This situation can happen if process A is reading the device while process B opens it for writing, thus truncating the device to a length of zero. Process A suddenly finds itself past end-of-file, and the next read call returns 0.
Here is the code for read:

    ssize_t scull_read(struct file *filp, char *buf, size_t count,
                       loff_t *f_pos)
    {
        Scull_Dev *dev = filp->private_data; /* the first list item */
        Scull_Dev *dptr;
        int quantum = dev->quantum;
        int qset = dev->qset;
        int itemsize = quantum * qset; /* how many bytes in the list item */
        int item, s_pos, q_pos, rest;
        ssize_t ret = 0;

        if (down_interruptible(&dev->sem))
            return -ERESTARTSYS;
        if (*f_pos >= dev->size)
            goto out;
        if (*f_pos + count > dev->size)
            count = dev->size - *f_pos;

        /* find list item, qset index, and offset in the quantum */
        item = (long)*f_pos / itemsize;
        rest = (long)*f_pos % itemsize;
        s_pos = rest / quantum;
        q_pos = rest % quantum;

        /* follow the list up to the right position (defined elsewhere) */
        dptr = scull_follow(dev, item);

        if (!dptr->data)
            goto out; /* don't fill holes */
        if (!dptr->data[s_pos])
            goto out;

        /* read only up to the end of this quantum */
        if (count > quantum - q_pos)
            count = quantum - q_pos;

        if (copy_to_user(buf, dptr->data[s_pos]+q_pos, count)) {
            ret = -EFAULT;
            goto out;
        }
        *f_pos += count;
        ret = count;

    out:
        up(&dev->sem);
        return ret;
    }
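The position arithmetic in scull_read can be tried out in isolation. The following user-space sketch repeats the same divisions and the same clamp to the end of the quantum; struct scull_pos, scull_locate, and clamp_to_quantum are names invented for this illustration, and the quantum of 4000 bytes and qset of 1000 used in the usage note are assumed values, not taken from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Where a file position falls inside the scull list (illustrative type). */
struct scull_pos {
    long item;   /* index of the list item */
    int s_pos;   /* index of the quantum within the item's qset */
    int q_pos;   /* byte offset within that quantum */
};

/* Same arithmetic as scull_read: split f_pos into item, s_pos, q_pos. */
static struct scull_pos scull_locate(long f_pos, int quantum, int qset)
{
    struct scull_pos p;
    long itemsize, rest;

    assert(f_pos >= 0 && quantum > 0 && qset > 0);
    itemsize = (long)quantum * qset;   /* bytes covered by one list item */
    rest = f_pos % itemsize;           /* offset inside that list item */
    p.item = f_pos / itemsize;
    p.s_pos = (int)(rest / quantum);
    p.q_pos = (int)(rest % quantum);
    return p;
}

/* A single call never transfers past the end of the current quantum. */
static size_t clamp_to_quantum(size_t count, int quantum, int q_pos)
{
    size_t room = (size_t)(quantum - q_pos);

    return count > room ? room : count;
}
```

With the assumed sizes, position 4012345 lands in list item 1, quantum 3, byte 345, and a request for 10000 bytes at that position is cut down to the 3655 bytes that remain in the quantum.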
The write Method
write, like read, can transfer less data than was requested, according to the following rules for the return value:
- If the value equals count, the requested number of bytes has been transferred.
- If the value is positive, but smaller than count, only part of the data has been transferred. The program will most likely retry writing the rest of the data.
- If the value is 0, nothing was written. This result is not an error, and there is no reason to return an error code. Once again, the standard library retries the call to write. We'll examine the exact meaning of this case in Section 5.2 in Chapter 5, where blocking write is introduced.
- A negative value means an error occurred; like for read, valid error values are those defined in <linux/errno.h>.
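A well-behaved application copes with partial writes by looping. The sketch below shows one way to do it; write_full is a hypothetical helper name. Following the rules above, a return of 0 is simply retried, so a production version might want to bound the number of retries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Keeps calling write() until all count bytes are accepted or an error
 * occurs. Returns the number of bytes written, or -1 if an error
 * happened before any data was transferred.
 */
static ssize_t write_full(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = write(fd, p + done, count - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: just retry */
            return done ? (ssize_t)done : -1;
        }
        done += (size_t)n;              /* n may be 0 or a partial count */
    }
    return (ssize_t)done;
}
```

Against a driver like scull, which accepts at most one quantum per call, this loop issues one write per quantum until the whole buffer has been stored.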
Unfortunately, there may be misbehaving programs that issue an error message and abort when a partial transfer is performed. This happens because some programmers are accustomed to seeing write calls that either fail or succeed completely, which is actually what happens most of the time and should be supported by devices as well. This limitation in the scull implementation could be fixed, but we didn't want to complicate the code more than necessary.
The scull code for write deals with a single quantum at a time, like the read method does:
    ssize_t scull_write(struct file *filp, const char *buf, size_t count,
                        loff_t *f_pos)
    {
        Scull_Dev *dev = filp->private_data;
        Scull_Dev *dptr;
        int quantum = dev->quantum;
        int qset = dev->qset;
        int itemsize = quantum * qset;
        int item, s_pos, q_pos, rest;
        ssize_t ret = -ENOMEM; /* value used in "goto out" statements */

        if (down_interruptible(&dev->sem))
            return -ERESTARTSYS;

        /* find list item, qset index and offset in the quantum */
        item = (long)*f_pos / itemsize;
        rest = (long)*f_pos % itemsize;
        s_pos = rest / quantum;
        q_pos = rest % quantum;

        /* follow the list up to the right position */
        dptr = scull_follow(dev, item);
        if (!dptr->data) {
            dptr->data = kmalloc(qset * sizeof(char *), GFP_KERNEL);
            if (!dptr->data)
                goto out;
            memset(dptr->data, 0, qset * sizeof(char *));
        }
        if (!dptr->data[s_pos]) {
            dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
            if (!dptr->data[s_pos])
                goto out;
        }

        /* write only up to the end of this quantum */
        if (count > quantum - q_pos)
            count = quantum - q_pos;

        if (copy_from_user(dptr->data[s_pos]+q_pos, buf, count)) {
            ret = -EFAULT;
            goto out;
        }
        *f_pos += count;
        ret = count;

        /* update the size */
        if (dev->size < *f_pos)
            dev->size = *f_pos;

    out:
        up(&dev->sem);
        return ret;
    }
readv and writev
Unix systems have long supported two alternative system calls named readv and writev. These "vector" versions take an array of structures, each of which contains a pointer to a buffer and a length value. A readv call would then be expected to read the indicated amount into each buffer in turn. writev, instead, would gather together the contents of each buffer and put them out as a single write operation.
Until version 2.3.44 of the kernel, however, Linux always emulated readv and writev with multiple calls to read and write. If your driver does not supply methods to handle the vector operations, they will still be implemented that way. In many situations, however, greater efficiency is achieved by implementing readv and writev directly in the driver.
The prototypes for the vector operations are as follows:
    ssize_t (*readv) (struct file *filp, const struct iovec *iov,
                      unsigned long count, loff_t *ppos);
    ssize_t (*writev) (struct file *filp, const struct iovec *iov,
                       unsigned long count, loff_t *ppos);
Here, the filp and ppos arguments are the same as for read and write. The iovec structure, defined in <linux/uio.h>, looks like this:
    struct iovec
    {
        void *iov_base;
        __kernel_size_t iov_len;
    };
Each iovec describes one chunk of data to be transferred; it starts at iov_base (in user space) and is iov_len bytes long. The count parameter to the method tells how many iovec structures there are. These structures are created by the application, but the kernel copies them into kernel space before calling the driver.
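On the application side, the iovec array is filled in before the call. The sketch below uses the user-space struct iovec from <sys/uio.h>, which has the same two fields; write_two and read_two are hypothetical helper names chosen for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gathers two strings into a single writev() call on fd. */
static ssize_t write_two(int fd, const char *a, const char *b)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)a;    /* start of the first chunk */
    iov[0].iov_len = strlen(a);     /* its length in bytes */
    iov[1].iov_base = (void *)b;
    iov[1].iov_len = strlen(b);
    return writev(fd, iov, 2);      /* the count is the number of iovecs */
}

/* Scatters incoming bytes into two buffers, filled in order, with readv(). */
static ssize_t read_two(int fd, char *a, size_t alen, char *b, size_t blen)
{
    struct iovec iov[2];

    iov[0].iov_base = a;
    iov[0].iov_len = alen;
    iov[1].iov_base = b;
    iov[1].iov_len = blen;
    return readv(fd, iov, 2);
}
```

Both calls return the total number of bytes transferred across all the chunks, so they follow the same partial-transfer rules as read and write.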
The simplest implementation of the vectored operations would be a simple loop that just passes the address and length out of each iovec to the driver's read or write function. Often, however, efficient and correct behavior requires that the driver do something smarter. For example, a writev on a tape drive should write the contents of all the iovec structures as a single record on the tape.
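The "simple loop" approach can be illustrated with a user-space analogue. This is only a sketch of the idea, not the kernel's emulation code; emulated_writev is an invented name, and it calls the ordinary write system call once per iovec.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Emulates a vectored write by issuing one write() per iovec entry.
 * Returns the total number of bytes written, or -1 if the very first
 * write fails. Stops early after a short write.
 */
static ssize_t emulated_writev(int fd, const struct iovec *iov, int count)
{
    ssize_t total = 0;

    for (int i = 0; i < count; i++) {
        ssize_t n = write(fd, iov[i].iov_base, iov[i].iov_len);

        if (n < 0)
            return total ? total : -1;
        total += n;
        if ((size_t)n < iov[i].iov_len)
            break;      /* short write: report the progress made so far */
    }
    return total;
}
```

Because every chunk becomes a separate write, a device for which each write is significant (such as the tape drive mentioned above) would see several records instead of one, which is the case where a real vectored method pays off.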
Many drivers, though, will gain no benefit from implementing these methods themselves. Thus, scull omits them. The kernel will emulate them with read and write, and the end result is the same.
Source: https://www.oreilly.com/library/view/linux-device-drivers/0596000081/ch03s08.html