There are a number of core Linux features or shell utilities that can help build extremely efficient and effective software tools with minimal effort. I find these particularly useful when building server systems, build utilities, or tools for processing large amounts of data. Also, they are great for quick “hacky” solutions.
Instead of reinventing the wheel, using these features lets you reduce your workload and build on the years of work that have gone into large open source projects like the Linux kernel. They also let you hand responsibility for functionality you need over to the operating system, which can usually do a much better job than you can.
Below, I’ll start with the more commonly known features and utilities and work my way toward the ones that are not as well known.
Table of Contents
- Hosts File
- Temporary Directory
- Special Device Files
- Job Control and Priority
- Symbolic Links
- Named Pipes or FIFOs
- Memory Mapped Files
- File Locks
- File System Event Monitor (or “Watcher”)
- Swap Space
- Loop Mount
- RAM Drive and Shared Memory Virtual File System
- Process Information Virtual File System
- Loopback Interface
- Miscellaneous Binary Formats
Cron is a scheduling system that lets you run commands and scripts at a specified regular schedule.
Temporary Directory
The /tmp/ temporary directory is a place where you can dump temporary files you don’t want to keep around for very long. It is usually cleared by the next time the machine is rebooted. Again, most developers know of this already.
Special Device Files
On Linux, everything is a file, so special file paths such as /dev/stderr correspond to a process’s standard streams. If a command-line program takes a file path as an argument, you can pass in /dev/stdin to make it operate on its standard input.
There are other special files like /dev/random, /dev/urandom, /dev/null, /dev/full, and /dev/zero that let you do things like read random bytes or zero bytes, or write to “null” (that is, nothing).
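As a rough sketch of how these device files behave, the snippet below reads and writes them like ordinary files. The function names are my own for illustration, and it assumes a typical Linux system where all four paths exist.

```python
import errno


def read_random_bytes(count):
    """Read `count` random bytes from the kernel via /dev/urandom."""
    with open("/dev/urandom", "rb") as source:
        return source.read(count)


def read_zero_bytes(count):
    """Read `count` zero bytes from /dev/zero."""
    with open("/dev/zero", "rb") as source:
        return source.read(count)


def discard(data):
    """Write data to /dev/null, which accepts and drops everything."""
    with open("/dev/null", "wb") as sink:
        return sink.write(data)


def disk_full_is_reported(data):
    """Return True if writing to /dev/full fails with 'no space left'."""
    try:
        # Unbuffered, so the error surfaces on the write itself.
        with open("/dev/full", "wb", buffering=0) as sink:
            sink.write(data)
    except OSError as error:
        return error.errno == errno.ENOSPC
    return False
```

/dev/full is handy for checking how a program reacts to a full disk without actually filling one.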
Job Control and Priority
You can use the bash shell to launch and manage multiple processes, using job-control commands such as bg, instead of rolling your own control system for quick runs.
The nice command can help you balance how much CPU time each process gets by setting its priority.
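A minimal sketch of launching a lower-priority process through the nice command from a script. The helper names are hypothetical, and it assumes the coreutils nice binary is on the PATH.

```python
import os
import subprocess
import sys


def current_niceness():
    """Return this process's niceness (os.nice(0) adds nothing to it)."""
    return os.nice(0)


def niceness_seen_by_child(increment):
    """Start a child Python process under `nice` and report its niceness."""
    completed = subprocess.run(
        ["nice", "-n", str(increment), sys.executable, "-c",
         "import os; print(os.nice(0))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())
```

A higher niceness means a lower scheduling priority, and the value is capped at 19.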
Symbolic Links
You can think of a symbolic link as a “reference” to a folder or file somewhere else on the file system. You can place the reference anywhere, and programs given a path to the reference will (usually) treat it as if they were looking at the linked folder or file. They can be created with the
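To illustrate the “reference” behaviour, here is a small sketch using Python’s os.symlink in a throwaway directory. The file names data.txt and alias.txt are made up for the example.

```python
import os
import tempfile


def read_through_symlink():
    """Create a file, link to it, and read the file through the link."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "data.txt")
        link = os.path.join(workdir, "alias.txt")
        with open(target, "w") as handle:
            handle.write("hello")
        os.symlink(target, link)  # alias.txt now points at data.txt
        with open(link) as handle:
            content = handle.read()  # open() follows the link
        # readlink shows where the link points without following it.
        return content, os.path.basename(os.readlink(link))
```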
Named Pipes or FIFOs
Named pipes, also known as FIFOs, are a method of one-way communication between multiple processes. The pipe is “backed” by a file.
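A minimal sketch of one-way communication over a named pipe: a child process writes into the FIFO and the parent reads from it. The path name "channel" and the helper are assumptions made for the example.

```python
import os
import subprocess
import sys
import tempfile

# Code run by the writer process: open the FIFO and write one message.
WRITER_CODE = (
    "import sys\n"
    "with open(sys.argv[1], 'w') as fifo:\n"
    "    fifo.write(sys.argv[2])\n"
)


def fifo_round_trip(message):
    """Send `message` from a child process to this one through a FIFO."""
    with tempfile.TemporaryDirectory() as workdir:
        fifo_path = os.path.join(workdir, "channel")
        os.mkfifo(fifo_path)
        writer = subprocess.Popen(
            [sys.executable, "-c", WRITER_CODE, fifo_path, message]
        )
        # Opening for reading blocks until the writer opens its end.
        with open(fifo_path) as reader:
            received = reader.read()  # returns once the writer closes
        writer.wait()
        return received
```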
Sockets allow for two-way communication between two processes and allow for communication over a network. Almost all languages support creating them and their interface is usually very simple, so it’s an easy way to talk between two processes.
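As a sketch of the two-way case, the snippet below uses a connected pair of Unix sockets between a parent and a forked child. The child’s “uppercase the request” behaviour is invented purely for illustration.

```python
import os
import socket


def ask_child(message):
    """Send bytes to a forked child and return its uppercased reply."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: read the request, answer on the same socket, then exit.
        try:
            parent_end.close()
            request = child_end.recv(1024)
            child_end.sendall(request.upper())
            child_end.close()
        finally:
            os._exit(0)
    # Parent: send the request and wait for the reply.
    child_end.close()
    parent_end.sendall(message)
    reply = parent_end.recv(1024)
    parent_end.close()
    os.waitpid(pid, 0)
    return reply
```

For communication over a network you would use a TCP socket instead, but the send and receive calls look the same.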
Memory Mapped Files
Memory mapped files are a way of loading a file’s content as a buffer of memory. This is useful because the file can then be treated as ordinary memory. With the MAP_SHARED flag, any changes made to the memory mapped file are reflected in other processes that have also memory mapped the file, resulting in a form of communication between processes.
Moreover, it has the nice property that the file’s contents are paged into volatile memory (RAM) as needed by the operating system, and pages are evicted using the eviction policy of the page cache. This gives you, for free, a caching system that brings the file’s contents into memory, where they can be operated on much faster.
The content stored in volatile memory is also shared between processes, so there aren’t multiple copies floating around in RAM. This makes it useful for sharing a large file between multiple processes when that file would not fit in memory many times over.
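A small sketch of the MAP_SHARED behaviour using Python’s mmap module. For brevity, two separate mappings in one process stand in for two processes here; the temporary file and its contents are made up for the example.

```python
import mmap
import tempfile


def shared_mapping_demo():
    """Write through one mapping and observe the change through another."""
    with tempfile.NamedTemporaryFile() as backing:
        backing.write(b"hello world")
        backing.flush()
        with open(backing.name, "r+b") as first, \
                open(backing.name, "r+b") as second:
            # Length 0 maps the whole file.
            with mmap.mmap(first.fileno(), 0, flags=mmap.MAP_SHARED) as writer_view, \
                    mmap.mmap(second.fileno(), 0, flags=mmap.MAP_SHARED) as reader_view:
                writer_view[0:5] = b"HELLO"  # plain slice assignment
                return bytes(reader_view[:])
```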
File System Event Monitor (or “Watcher”)
If you want to monitor or watch a directory and get notified when a file is modified (so that you could perhaps re-build a project), you could poll and scan over all the files in the directory recursively. However, for a large directory (thousands of files) and a short polling interval (every few seconds), this will end up chewing up a lot of CPU time. Instead, you can use inotify, which provides hooks for monitoring file system events in a much more efficient manner.
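Python’s standard library has no inotify wrapper, so the sketch below calls the libc functions through ctypes. It assumes Linux, watches a single directory (not recursively), and the collect_events helper and its trigger callback are hypothetical names for the example.

```python
import ctypes
import os
import struct

IN_MODIFY = 0x00000002  # a file in the directory was written to
IN_CREATE = 0x00000100  # a file was created in the directory
EVENT_HEADER = struct.Struct("iIII")  # wd, mask, cookie, name length

_libc = ctypes.CDLL(None, use_errno=True)


def _check(result):
    """Raise OSError if a libc call returned a negative value."""
    if result < 0:
        code = ctypes.get_errno()
        raise OSError(code, os.strerror(code))
    return result


def collect_events(directory, trigger):
    """Watch `directory`, call `trigger()`, return [(mask, name), ...]."""
    fd = _check(_libc.inotify_init())
    try:
        _check(_libc.inotify_add_watch(
            fd, os.fsencode(directory), IN_CREATE | IN_MODIFY))
        trigger()
        raw = os.read(fd, 4096)  # blocks until at least one event exists
        events, offset = [], 0
        while offset < len(raw):
            _wd, mask, _cookie, name_len = EVENT_HEADER.unpack_from(raw, offset)
            offset += EVENT_HEADER.size
            name = raw[offset:offset + name_len].rstrip(b"\0").decode()
            offset += name_len
            events.append((mask, name))
        return events
    finally:
        os.close(fd)
```

The kernel queues events between reads, so nothing is burned on scanning the directory while it is idle.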
A file system needs to implement some basic operations (list files at a location, read X number of bytes from a file, write X number of bytes to a file, seek to a specific position in a file, etc.). FUSE allows you to create virtual file systems that implement those basic operations but are backed by non-traditional storage. For example, you can mount things like FTP directories, AWS S3 buckets, and Google Cloud Storage buckets as a “local directory” on your machine.
Swap Space
Swap space is additional system memory backed by disk that the system can use when RAM might be full. You can add swap memory to your system using a file and choose the “swappiness” level of your system.
The system will automatically push out inactive portions of RAM to disk and prioritize keeping more actively used regions in RAM. Swap space works fine if you have a lot of content in RAM but only a small amount is actively needed at any given point. If you are truly trying to access a large amount of data at once, or quickly, using this method, it’s likely you’ll experience thrashing or swap death.
Loop Mount
Loop mounting allows you to mount a directory on your system that is backed by a single file somewhere else on your system. The file just needs to contain a valid file system. This lets you move around an entire file system as a file.
RAM Drive and Shared Memory Virtual File System
With tmpfs, you can mount a directory, backed by RAM, that you can read and write files to. Because it lives in RAM, read/write operations on this file system are much faster than on traditional disk storage, but the storage is volatile.
/dev/shm on your system is a built-in location that uses tmpfs and is backed by RAM.
You can memory map a file in tmpfs in multiple processes, which allows you to share memory regions between them for communication. This is slightly different from normal use of memory mapped files, since the contents are not backed by a file on disk.
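A sketch of sharing a memory region between two processes through a file in /dev/shm. The helper name is hypothetical, the directory is a parameter so another location can be substituted, and the payload must be non-empty because an empty file cannot be mapped.

```python
import mmap
import os
import tempfile


def child_writes_parent_reads(payload, directory="/dev/shm"):
    """A forked child fills a shared mapping; the parent reads it back."""
    with tempfile.NamedTemporaryFile(dir=directory) as backing:
        backing.truncate(len(payload))  # size the region up front
        pid = os.fork()
        if pid == 0:
            # Child: map the file and write the payload into memory.
            try:
                with open(backing.name, "r+b") as handle, \
                        mmap.mmap(handle.fileno(), 0) as view:
                    view[:] = payload
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        # Parent: map the same file and read what the child wrote.
        with open(backing.name, "r+b") as handle, \
                mmap.mmap(handle.fileno(), 0) as view:
            return bytes(view[:])
```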
Process Information Virtual File System
Following the “everything is a file” idea, the /proc/ path is a virtual file system whose virtual files contain information about the CPU and currently running processes on the system (like PIDs, environment variables, open files, etc.). You can get all of this information just by reading these virtual files.
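For instance, a minimal sketch that reads a process’s status fields and open file descriptors from /proc. The function names are my own, and "self" refers to whichever process does the reading.

```python
import os


def process_status(pid="self"):
    """Parse /proc/<pid>/status into a dict of field name to value."""
    fields = {}
    with open(f"/proc/{pid}/status") as handle:
        for line in handle:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def open_file_descriptors(pid="self"):
    """List the file descriptor numbers currently open in a process."""
    return sorted(int(entry) for entry in os.listdir(f"/proc/{pid}/fd"))
```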
Loopback Interface
In a Docker container, however, localhost will refer to the localhost of the container and not the host. You can alias the loopback interface to another IP address, which lets you access the host’s localhost in the container at that IP address.
CRIU is a utility that allows you to pause and save the state of a running process to a file (checkpoint) and then load and continue running the process at a later time (restore). This requires some specific kernel features to be enabled.
The LD_PRELOAD environment variable lets you inject custom implementations of symbols that are dynamically linked in a program. The library is loaded before libc, so you can even inject a custom malloc implementation. This provides a nice “hacking interface” for customizing the behavior of a compiled program, or a probing mechanism.
You can use the chroot operation to “jail” a process to a certain directory (so that it can only see, read, and write files in that directory).
You can use the cgroups feature to limit the use of system resources like CPU and memory by a process.
You can use the namespaces feature to isolate resources like hostnames and process IDs available to a process.
These are the underlying features that enable Docker containers to work.
Miscellaneous Binary Formats
The binfmt_misc feature of Linux allows you to register programs to run non-standard, unrecognized executable files. For example, you can register a virtual machine or emulator to run executables intended for different architectures. QEMU utilizes this feature.
If you have any others you think would be interesting to add to the list, you can contact me via email on the about page.