There are a number of core Linux features or shell utilities that can help build extremely efficient and effective software tools with minimal effort. I find these particularly useful when building server systems, build utilities, or tools for processing large amounts of data. Also, they are great for quick “hacky” solutions.
Instead of reinventing the wheel, you can use these features to reduce your workload and build on the years of work that have gone into large open source projects like the Linux kernel. They also let you hand off functionality you need to the operating system, which can usually do a much better job than you can.
Below, I’ll roughly go from the more commonly known features and utilities to the ones that are less well known.
Table of Contents
- Hosts File
- Cron
- Temporary Directory
- Special Device Files
- Job Control and Priority
- Symbolic Links
- Named Pipes or FIFOs
- Sockets
- Memory Mapped Files
- File Locks
- File System Event Monitor (or “Watcher”)
- FUSE
- Swap Space
- Loop Mount
- RAM Drive and Shared Memory Virtual File System
- Process Information Virtual File System
- Loopback Interface
- CRIU
- LD_PRELOAD
- Jailing
- Miscellaneous Binary Formats
Hosts File
The /etc/hosts file lets you map hostnames to IP addresses of your choosing. On most systems it is consulted before your DNS service (the lookup order is set by the hosts line in /etc/nsswitch.conf).
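Each non-comment line of the file is an IP address followed by one or more hostnames. The Python sketch below parses that format from a string purely to make it concrete. The addresses and names in SAMPLE are made up, and real programs should simply call the system resolver (for example socket.getaddrinfo), which reads /etc/hosts for them.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the address listed for it."""
    mapping = {}
    for line in text.splitlines():
        # Everything after '#' is a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the address, the rest are names/aliases for it.
        address, *names = line.split()
        for name in names:
            # Keep the first entry that lists a given name.
            mapping.setdefault(name, address)
    return mapping


# Hypothetical hosts-file contents for illustration.
SAMPLE = """
127.0.0.1      localhost
# Point a staging name at a machine on the LAN:
192.168.1.50   api.staging.internal   api-staging   # trailing comment
"""

if __name__ == "__main__":
    print(parse_hosts(SAMPLE)["api-staging"])  # 192.168.1.50
```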
Cron
Cron is a scheduling system that runs commands and scripts on a regular schedule you specify.
Temporary Directory
The /tmp/ directory is a place to dump temporary files you don’t want to keep around for very long. It is usually cleared when the machine reboots. Most developers already know about this one.
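Rather than hard-coding paths under /tmp/, a program can ask for a fresh, uniquely named directory so that concurrent runs never collide. A minimal Python sketch (the "scratch-" prefix and file name are arbitrary choices):

```python
import os
import tempfile


def scratch_roundtrip(data):
    """Write data into a fresh temporary directory, read it back, and clean up.

    Returns the bytes read back and the (now deleted) directory path.
    """
    # TemporaryDirectory creates a uniquely named directory under the default
    # temp location ($TMPDIR if set, usually /tmp otherwise) and removes it,
    # with everything inside it, when the with-block exits.
    with tempfile.TemporaryDirectory(prefix="scratch-") as tmpdir:
        path = os.path.join(tmpdir, "chunk.bin")
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            result = f.read()
    return result, tmpdir


if __name__ == "__main__":
    data, where = scratch_roundtrip(b"intermediate results")
    print(data, os.path.exists(where))  # b'intermediate results' False
```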
Special Device Files
On Linux, everything is a file. The special file paths /dev/stdin, /dev/stdout, and /dev/stderr correspond to a process’s standard streams, so if a command line program takes a file path as an argument, you can pass in /dev/stdin to make it operate on its standard input.
There are other special files, like /dev/random, /dev/urandom, /dev/null, /dev/full, and /dev/zero, that let you read random bytes or an endless stream of zero bytes, throw away output (/dev/null), or simulate a full disk (/dev/full).
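Because these are ordinary paths, any code that can open a file can use them. A small Python sketch of each (it assumes a Linux system with coreutils' wc on the PATH for the last function):

```python
import errno
import subprocess


def random_bytes(n):
    # /dev/urandom supplies random bytes on every read.
    with open("/dev/urandom", "rb") as f:
        return f.read(n)


def zero_bytes(n):
    # /dev/zero yields an endless stream of zero bytes.
    with open("/dev/zero", "rb") as f:
        return f.read(n)


def discard(data):
    # Anything written to /dev/null is thrown away.
    with open("/dev/null", "wb") as f:
        f.write(data)


def full_device_reports_enospc():
    # Writes to /dev/full always fail with ENOSPC ("No space left on device"),
    # which is handy for testing a program's error handling.
    try:
        with open("/dev/full", "wb", buffering=0) as f:
            f.write(b"x")
    except OSError as e:
        return e.errno == errno.ENOSPC
    return False


def count_lines_via_stdin(text):
    # `wc -l /dev/stdin` makes wc treat its standard input as a file argument.
    result = subprocess.run(["wc", "-l", "/dev/stdin"], input=text,
                            capture_output=True, text=True, check=True)
    return int(result.stdout.split()[0])
```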
Job Control and Priority
You can use the bash shell to launch and manage multiple processes with & (to start a command in the background), jobs, fg, and bg, instead of rolling your own control system for quick runs.
The nice command (and renice, for processes that are already running) lets you balance how much CPU time each process gets by adjusting its scheduling priority.
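The same priority adjustment is available from code when a tool launches its own workers. A Python sketch, assuming the goal is to start a child at lower priority without changing the parent's own niceness:

```python
import os
import subprocess
import sys


def run_niced(cmd, increment=10):
    """Run cmd with its niceness raised by `increment` (i.e. lower CPU priority)."""
    # preexec_fn runs in the child after fork() and before exec(), so only the
    # child's niceness changes. Roughly equivalent to `nice -n 10 cmd`.
    return subprocess.run(cmd, preexec_fn=lambda: os.nice(increment),
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    base = os.nice(0)  # os.nice(0) just reports the current niceness
    child = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"], 5)
    print("parent:", base, "child:", child.stdout.strip())
```

Niceness values are capped at 19, so a child of an already very nice parent will not go beyond that.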
Symbolic Links
You can think of a symbolic link as a “reference” to a folder or file somewhere else on the file system. You can place the link anywhere, and programs given a path to the link will (usually) treat it as if they were looking at the linked folder or file directly. Symbolic links are created with ln -s (plain ln creates a hard link instead).
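A common use is pointing a stable name at a versioned file. In the Python sketch below, the file names config.json and config-v2.json are hypothetical:

```python
import os
import tempfile


def symlink_demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "config-v2.json")  # hypothetical file names
        link = os.path.join(d, "config.json")
        with open(target, "w") as f:
            f.write('{"version": 2}')

        # Equivalent to `ln -s config-v2.json config.json` run inside d.
        # A relative target is resolved relative to the link's own directory.
        os.symlink("config-v2.json", link)

        # Opening the link transparently opens the target.
        with open(link) as f:
            content = f.read()
        return os.path.islink(link), os.readlink(link), content


if __name__ == "__main__":
    print(symlink_demo())  # (True, 'config-v2.json', '{"version": 2}')
```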
Named Pipes or FIFOs
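Named pipes, also known as FIFOs, provide one-way communication between processes. A named pipe appears as a special file in the file system (created with mkfifo), but the data written to it passes through a kernel buffer and is never stored on disk.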
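A minimal Python sketch of a writer and a reader sharing a FIFO. To keep it self-contained, the writer is a forked child of the reader; normally the two ends would be unrelated programs that only agree on the path (jobs.fifo here is a hypothetical name).

```python
import os
import tempfile


def fifo_demo(messages):
    """Send lines from a forked child to its parent through a named pipe."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "jobs.fifo")  # hypothetical name
        os.mkfifo(path)  # same as the `mkfifo` shell command

        pid = os.fork()
        if pid == 0:
            # Child (writer): open() blocks until a reader opens the other end.
            try:
                with open(path, "w") as pipe:
                    for message in messages:
                        pipe.write(message + "\n")
            finally:
                os._exit(0)

        # Parent (reader): iteration ends at EOF, once the writer closes its end.
        with open(path) as pipe:
            received = [line.rstrip("\n") for line in pipe]
        os.waitpid(pid, 0)
        return received


if __name__ == "__main__":
    print(fifo_demo(["build", "test"]))  # ['build', 'test']
```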
Sockets
Sockets allow two-way communication between processes, either on the same machine or across a network. Almost all languages support creating them and their interface is usually simple, so they are an easy way for two processes to talk.
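For two processes on the same machine, a connected pair of Unix domain sockets is the simplest case. The Python sketch below has a forked child answer a request from its parent over the same socket, which a one-way FIFO could not do. Processes on different machines would instead use AF_INET or AF_INET6 sockets with bind/listen/connect.

```python
import os
import socket


def socketpair_demo(message):
    # socketpair() returns two already-connected Unix domain sockets; each end
    # can both send and receive.
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: read a request and answer on the same socket.
        try:
            parent_end.close()
            request = child_end.recv(1024)
            child_end.sendall(request.upper())
        finally:
            os._exit(0)

    child_end.close()
    parent_end.sendall(message)
    reply = parent_end.recv(1024)
    parent_end.close()
    os.waitpid(pid, 0)
    return reply


if __name__ == "__main__":
    print(socketpair_demo(b"ping"))  # b'PING'
```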
Memory Mapped Files
Memory mapping a file lets you access its contents as if they were an ordinary buffer in memory. This is useful because you can operate on it like any other memory and, with the MAP_SHARED flag, any changes you make are visible to other processes that have mapped the same file, giving you a form of inter-process communication.
Moreover, the operating system pages the file’s contents into volatile memory (RAM) as they are needed and evicts pages according to the page cache’s eviction policy. In effect, you get a caching system for free: the parts of the file you are working with are brought into memory, where they can be operated on much faster.
The cached pages are also shared between processes, so there aren’t multiple copies floating around in RAM. This makes memory mapping useful for sharing a large file among several processes when RAM could not hold a separate copy for each of them.
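A Python sketch of the MAP_SHARED behaviour: a forked child opens and maps the file independently, writes into its mapping, and the parent sees the bytes through its own mapping without calling read(). The file name shared.bin is arbitrary.

```python
import mmap
import os
import tempfile


def mmap_shared_demo():
    """A forked child writes through its own mapping; the parent sees the change."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "shared.bin")
        with open(path, "wb") as f:
            f.write(b"\0" * mmap.PAGESIZE)  # an empty file cannot be mapped

        # MAP_SHARED is Python's default on Linux; it is spelled out here.
        with open(path, "r+b") as f, \
                mmap.mmap(f.fileno(), 0, flags=mmap.MAP_SHARED) as view:
            pid = os.fork()
            if pid == 0:
                # Child: map the same file on its own and write into it.
                try:
                    with open(path, "r+b") as cf, \
                            mmap.mmap(cf.fileno(), 0, flags=mmap.MAP_SHARED) as child_view:
                        child_view[:5] = b"hello"
                finally:
                    os._exit(0)
            os.waitpid(pid, 0)
            # Both mappings refer to the same page-cache pages.
            return view[:5]


if __name__ == "__main__":
    print(mmap_shared_demo())  # b'hello'
```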
File Locks
You can lock files (for example with flock or fcntl) to coordinate reads and writes between processes, or simply use a file as a “lock file” that acts as a lock between multiple processes.
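A lock-file sketch in Python using flock(2) through the fcntl module. These locks are advisory: they only coordinate processes that also call flock, and they do not stop anyone from reading or writing the file. The name job.lock is hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock on path, or None if it is taken."""
    f = open(path, "a")  # "a" creates the lock file if needed without truncating it
    try:
        # LOCK_EX = exclusive lock; LOCK_NB = fail immediately instead of waiting.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lock_path = os.path.join(d, "job.lock")  # hypothetical lock-file name
        first = try_lock(lock_path)
        # flock() locks belong to the open file description, so even a second
        # open() in the same process conflicts with the first one.
        second = try_lock(lock_path)
        print(first is not None, second is None)  # True True
        first.close()  # closing the file releases the lock
        third = try_lock(lock_path)
        print(third is not None)  # True
        third.close()
```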
File System Event Monitor (or “Watcher”)
If you want to watch a directory and get notified when a file is modified (so that you could, say, rebuild a project), you could poll by recursively scanning all the files in the directory. However, for a large directory (thousands of files) and a short polling interval (every few seconds), this ends up chewing through a lot of CPU time. Instead, you can use inotify, which provides hooks for monitoring file system events far more efficiently: the kernel tells you when something changes instead of you repeatedly checking.
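Python's standard library has no inotify wrapper (third-party packages such as watchdog provide one), so this sketch calls libc's inotify_init1 and inotify_add_watch through ctypes and decodes struct inotify_event by hand. It is one-shot: it runs an action, then reads whatever events were queued. A real watcher would loop on read() (or select/poll), and note that the read blocks if the action triggered nothing.

```python
import ctypes
import ctypes.util
import os
import struct
import tempfile

# Event bits from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_CLOSE_WRITE = 0x00000008
IN_CREATE = 0x00000100

# struct inotify_event { int wd; uint32_t mask; uint32_t cookie; uint32_t len; char name[]; }
EVENT_HEADER = struct.Struct("iIII")

_libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)


def watch_directory(directory, mask, action):
    """Watch `directory` for `mask` events, run `action()`, return [(name, mask)]."""
    fd = _libc.inotify_init1(0)
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1 failed")
    try:
        if _libc.inotify_add_watch(fd, os.fsencode(directory), mask) < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        action()  # events caused by action() are now queued on fd
        buf = os.read(fd, 64 * 1024)  # blocks until at least one event exists
    finally:
        os.close(fd)

    events, offset = [], 0
    while offset < len(buf):
        _wd, ev_mask, _cookie, length = EVENT_HEADER.unpack_from(buf, offset)
        offset += EVENT_HEADER.size
        # The name field is NUL-padded to `length` bytes.
        name = buf[offset:offset + length].rstrip(b"\0").decode()
        offset += length
        events.append((name, ev_mask))
    return events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        def save_file():
            with open(os.path.join(d, "main.c"), "w") as f:
                f.write("int main(void) { return 0; }\n")

        for name, ev_mask in watch_directory(d, IN_CREATE | IN_CLOSE_WRITE, save_file):
            print(name, hex(ev_mask))
```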
FUSE
A file system needs to implement some basic operations (list the files at a location, read a given number of bytes from a file, write a given number of bytes to a file, seek to a specific position in a file, etc.). FUSE (Filesystem in Userspace) lets you create virtual file systems that implement those operations in an ordinary user-space program, backed by non-traditional storage. For example, you can mount FTP directories, AWS S3 buckets, or Google Cloud Storage buckets as a “local directory” on your machine.
Swap Space
Swap space is additional system memory, backed by disk, that the system can use when RAM fills up. You can add swap to your system using a file, and tune how eagerly the kernel swaps with the “swappiness” setting (vm.swappiness).
The system automatically pushes inactive portions of RAM out to disk and keeps more actively used regions in RAM. Swap space works well when you have a lot of data in RAM but only a small part of it is actively needed at any given time. If you genuinely need to access a large amount of data at once, or access it quickly, through swap, you will likely experience thrashing or “swap death”.
Loop Mount
Loop mounting lets you mount, onto a directory, a file system that is stored inside a single regular file somewhere else on your system. The file just needs to contain a valid file system (for example, an ext4 or ISO 9660 image). This lets you move an entire file system around as a single file.
RAM Drive and Shared Memory Virtual File System
Using tmpfs you can mount a directory, backed by RAM, that you can read and write files to. For this reason, read/write operations on this file system are much faster than traditional disk storage, but the storage is volatile. /dev/shm on your system is a built-in location that uses tmpfs and is backed by RAM.
You can memory map a file in tmpfs from multiple processes, which lets them share memory regions for communication. This differs slightly from ordinary memory mapped files because the contents are never written back to a file on disk.
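Python's multiprocessing.shared_memory is one convenient way to do this: on Linux it creates the segment with shm_open(), which places a file under /dev/shm, and maps it MAP_SHARED. The sketch below assumes a Linux system where /dev/shm is mounted (the usual case) and shares the buffer with a forked child.

```python
import os
from multiprocessing import shared_memory


def shm_demo():
    """Share a tmpfs-backed buffer between a parent and a forked child."""
    # On Linux, SharedMemory uses shm_open(), which creates the segment as a
    # file under /dev/shm (a tmpfs mount) rather than on a disk-backed file system.
    shm = shared_memory.SharedMemory(create=True, size=4096)
    try:
        backing_file = os.path.join("/dev/shm", shm.name)
        in_dev_shm = os.path.exists(backing_file)

        pid = os.fork()
        if pid == 0:
            try:
                shm.buf[:5] = b"hello"  # child writes into the shared mapping
            finally:
                os._exit(0)
        os.waitpid(pid, 0)

        data = bytes(shm.buf[:5])  # parent sees the child's write
        return in_dev_shm, data
    finally:
        shm.close()
        shm.unlink()  # removes the /dev/shm entry


if __name__ == "__main__":
    print(shm_demo())  # (True, b'hello')
```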
Process Information Virtual File System
In keeping with “everything is a file,” the /proc/ path is a virtual file system whose files contain information about the CPU and the currently running processes on the system (like PIDs, environment variables, open files, etc.). You can read all of this information just by reading these virtual files.
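For example, a process can inspect itself through /proc/self. A Python sketch that parses /proc/[pid]/status and lists open file descriptors from /proc/[pid]/fd:

```python
import os


def read_status(pid="self"):
    """Parse /proc/<pid>/status ("Key:<tab>value" lines) into a dict."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def open_files(pid="self"):
    """Map each open file descriptor of a process to what it points at."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for entry in os.listdir(fd_dir):
        try:
            # Each entry is a symlink to a path, or to something like "pipe:[1234]".
            result[int(entry)] = os.readlink(os.path.join(fd_dir, entry))
        except FileNotFoundError:
            # The descriptor was closed after listdir() ran (listdir's own
            # directory handle shows up in the listing, for example).
            pass
    return result


if __name__ == "__main__":
    status = read_status()
    print(status["Name"], status["Pid"], status["Threads"])
    print(open_files())
```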
Loopback Interface
The loopback interface lets you access services running on your own machine. It is normally reached through localhost (127.0.0.1).
In a Docker container, however, localhost refers to the container’s own loopback interface, not the host’s. You can alias the loopback interface to another IP address, which then lets you reach the host’s localhost services from inside the container at that IP address.
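Binding a service to the loopback address rather than to all interfaces keeps it reachable only from the local machine. A Python sketch of a one-shot TCP echo server on 127.0.0.1; the server runs in a thread only to keep the example self-contained, and in practice it would be a separate process.

```python
import socket
import threading


def loopback_echo(message):
    """Echo `message` through a TCP server bound only to the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Binding to 127.0.0.1 (not 0.0.0.0) means only local processes can connect.
        server.bind(("127.0.0.1", 0))  # port 0 asks the kernel for any free port
        server.listen(1)
        port = server.getsockname()[1]

        def serve_one():
            conn, _addr = server.accept()
            with conn:
                conn.sendall(conn.recv(1024))

        worker = threading.Thread(target=serve_one)
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
        return reply


if __name__ == "__main__":
    print(loopback_echo(b"hello"))  # b'hello'
```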
CRIU
CRIU is a utility that lets you pause and save the state of a running process to disk (checkpoint), then load it and continue running the process at a later time (restore). This requires some specific kernel features to be enabled.
LD_PRELOAD
The LD_PRELOAD environment variable lets you inject custom implementations of dynamically linked symbols into a program. The libraries it lists are loaded before any others, including libc, so their definitions take precedence; you can even inject a custom malloc implementation. This provides a nice “hacking interface” for customizing the behavior of a compiled program, or a way to probe it.
Jailing
You can use the chroot operation to “jail” a process to a certain directory, so that the directory becomes the process’s root and it can only see, read, and write files beneath it.
You can use the cgroups feature to limit how much of the system’s resources, like CPU and memory, a process can use.
You can use the namespaces feature to isolate resources like hostnames and process IDs available to a process.
These are among the underlying features that make Docker containers work.
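Creating new namespaces or cgroups generally needs root (the util-linux unshare command is one way to start a process in new namespaces), but any process can inspect which ones it belongs to through /proc. A read-only Python sketch:

```python
import os


def namespace_ids(pid="self"):
    """Return {namespace type: identifier} for a process, read from /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    # Each entry is a symlink such as "pid:[4026531836]"; two processes share a
    # namespace exactly when their links for that type are identical.
    return {name: os.readlink(os.path.join(ns_dir, name)) for name in os.listdir(ns_dir)}


def cgroups(pid="self"):
    """Return the lines of /proc/<pid>/cgroup (e.g. "0::/user.slice/..." on cgroup v2)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    for ns_type, ident in sorted(namespace_ids().items()):
        print(f"{ns_type:20} {ident}")
    print(cgroups())
```

Running this on the host and inside a container shows different identifiers for types such as pid, mnt, and uts, which is the isolation the namespaces feature provides.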
Miscellaneous Binary Formats
The binfmt_misc feature of Linux lets you register programs that the kernel should use to run executable files it does not natively recognize, matched by a magic byte sequence or a file extension. For example, you can register a virtual machine or emulator to run executables built for different architectures. QEMU’s user-mode emulation uses this feature.
If you have any others you think would be interesting to add to the list, you can contact me via email on the about page.