Ajay Patel

Useful Linux Features and Utilities
Software Engineering · September 14, 2020

There are a number of core Linux features or shell utilities that can help build extremely efficient and effective software tools with minimal effort. I find these particularly useful when building server systems, build utilities, or tools for processing large amounts of data. Also, they are great for quick “hacky” solutions.

Instead of re-inventing the wheel, using these features lets you reduce your workload and build upon the years of work that have gone into large open source projects like the Linux kernel. They also let you hand off functionality you need to the operating system, which can usually do a much better job than you can.

Below, I’ll start with the more commonly known features and utilities and work my way toward the ones that are not as well known.

Table of Contents

  1. Hosts File
  2. Cron
  3. Temporary Directory
  4. Special Device Files
  5. Job Control and Priority
  6. Symbolic Links
  7. Named Pipes or FIFOs
  8. Sockets
  9. Memory Mapped Files
  10. File Locks
  11. File System Event Monitor (or “Watcher”)
  12. FUSE
  13. Swap Space
  14. Loop Mount
  15. RAM Drive and Shared Memory Virtual File System
  16. Process Information Virtual File System
  17. Loopback Interface
  18. CRIU
  19. LD_PRELOAD
  20. Jailing
  21. Miscellaneous Binary Formats

Hosts File

The /etc/hosts file lets you map hostnames to IP addresses of your choosing. By default, it is consulted before your DNS service.
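
To make the file format concrete, here is a minimal sketch of a parser for hosts-file text. The sample entries (including the api.internal.test name and its address) are hypothetical, and keeping the first entry seen for a name is an assumption about typical resolver behavior, not something this sketch verifies.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname found in hosts-file text to its IP address."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        # Everything after '#' is a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Format: <ip> <canonical name> [aliases...]
        ip, *names = line.split()
        for name in names:
            # Assumption: the first entry for a name wins.
            mapping.setdefault(name, ip)
    return mapping


# Hypothetical hosts-file content used only for illustration.
SAMPLE = """\
127.0.0.1   localhost
# point a development hostname at a staging box
10.0.0.5    api.internal.test api   # trailing comment
"""

if __name__ == "__main__":
    print(parse_hosts(SAMPLE))
```

On a real machine you could pass the contents of /etc/hosts to parse_hosts instead of SAMPLE.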

Cron

Cron is a scheduling system that lets you run commands and scripts on a regular schedule that you specify.
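
A schedule is written as a crontab line: five time fields followed by the command. The sketch below only splits such a line into its parts to show the layout. It does not evaluate schedules, and the backup script path in the example is hypothetical.

```python
from typing import NamedTuple


class CronEntry(NamedTuple):
    minute: str
    hour: str
    day_of_month: str
    month: str
    day_of_week: str
    command: str


def parse_crontab_line(line: str) -> CronEntry:
    """Split a user crontab line into its five schedule fields and the command."""
    # maxsplit=5 keeps any spaces inside the command intact.
    parts = line.split(None, 5)
    if len(parts) != 6:
        raise ValueError(f"expected 5 schedule fields and a command: {line!r}")
    return CronEntry(*parts)


# Hypothetical entry: run a backup script at 02:30 every day.
EXAMPLE = "30 2 * * * /usr/local/bin/backup.sh --quiet"

if __name__ == "__main__":
    print(parse_crontab_line(EXAMPLE))
```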

Temporary Directory

The /tmp/ temporary directory is a place where you can dump temporary files you don’t want to keep around for very long. Usually it gets cleared by the next time the machine is rebooted. Most developers know of this already.
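
As a small illustration, Python’s tempfile module can place uniquely named files in the temporary directory for you. This is only a sketch; the scratch- prefix and the helper name are made up for the example.

```python
import os
import tempfile


def write_scratch_file(data: bytes) -> str:
    """Write data to a uniquely named file in the temp directory and return its path."""
    # delete=False keeps the file after it is closed so other tools can read it;
    # the caller is then responsible for removing it.
    with tempfile.NamedTemporaryFile(prefix="scratch-", suffix=".bin", delete=False) as f:
        f.write(data)
        return f.name


if __name__ == "__main__":
    print(tempfile.gettempdir())  # usually /tmp unless TMPDIR says otherwise
    path = write_scratch_file(b"intermediate results")
    print(path)
    os.remove(path)
```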

Special Device Files

On Linux, everything is a file. The special file paths /dev/stdin, /dev/stdout, and /dev/stderr correspond to a process’s standard streams, so if a command line program takes a file path as an argument, you can pass in /dev/stdin to make it operate on its standard input.

There are other special files like /dev/random, /dev/urandom, /dev/null, /dev/full, and /dev/zero that let you do things like read random bytes and zero bytes or write to “null” or nothing.
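
Because these are ordinary paths, any language’s regular file API works on them. The sketch below assumes a Linux system where these device files exist; the helper names are made up for the example.

```python
import errno


def read_device(path: str, n: int) -> bytes:
    """Read up to n bytes from a device file using unbuffered I/O."""
    with open(path, "rb", buffering=0) as f:
        return f.read(n)


def disk_full_errno() -> int:
    """Write one byte to /dev/full and return the errno of the resulting failure."""
    try:
        with open("/dev/full", "wb", buffering=0) as f:
            f.write(b"x")
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    print(read_device("/dev/zero", 4))            # zero bytes
    print(read_device("/dev/urandom", 4).hex())   # random bytes
    with open("/dev/null", "wb") as sink:
        sink.write(b"discarded")                  # silently thrown away
    print(disk_full_errno() == errno.ENOSPC)      # simulates a full disk
```

/dev/full is handy for checking how a program behaves when a write fails with "no space left on device".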

Job Control and Priority

You can use the bash shell to launch and manage multiple processes using jobs, fg, and bg instead of rolling your own control system for quick runs.

Using the nice command can help you weigh / balance how much CPU time each process gets by setting their priority.
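
Job control itself lives in the interactive shell, but the priority side is easy to show from a program. This sketch assumes the nice command is on the PATH and launches a child Python process under it; the helper name is made up for the example.

```python
import os
import subprocess
import sys


def child_niceness(increment: int) -> int:
    """Run a child Python process under `nice -n <increment>` and return the niceness it sees."""
    result = subprocess.run(
        ["nice", "-n", str(increment), sys.executable, "-c", "import os; print(os.nice(0))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # os.nice(0) adds 0 to the current niceness and returns it, so it works as a read.
    print("parent niceness:", os.nice(0))
    print("child niceness:", child_niceness(5))
```

A higher niceness value means a lower scheduling priority, and the value is capped at 19.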

Symbolic Links

You can think of a symbolic link as a “reference” to a folder or file somewhere else on the file system. You can place the reference anywhere, and programs that are given a path to the reference will (usually) treat it as if they were looking at the linked folder or file. Symbolic links can be created with the ln command.
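
The same thing can be done programmatically. In this sketch, os.symlink plays the role of ln -s, and the file names are hypothetical.

```python
import os
import tempfile


def demo_symlink() -> tuple[str, bool, str]:
    """Create a file and a symlink to it, then read the file through the link."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "config-v2.txt")  # hypothetical file name
        link = os.path.join(d, "current.txt")      # hypothetical link name
        with open(target, "w") as f:
            f.write("version 2")
        os.symlink(target, link)  # like `ln -s config-v2.txt current.txt`
        with open(link) as f:     # opening the link follows it to the target
            content = f.read()
        return content, os.path.islink(link), os.path.basename(os.readlink(link))


if __name__ == "__main__":
    print(demo_symlink())
```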

Named Pipes or FIFOs

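Named pipes, also known as FIFOs, are a method of one-way communication between multiple processes. The pipe appears as a special file on the file system that processes open by path, but the data passes through the kernel rather than being stored on disk.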
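
Here is a minimal sketch of a round trip through a FIFO. A thread stands in for the second process to keep the example self-contained, and the demo.fifo name is made up.

```python
import os
import tempfile
import threading


def fifo_round_trip(message: str) -> str:
    """Send a message through a named pipe from a writer thread to a reader."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.fifo")
        os.mkfifo(path)

        def writer() -> None:
            # Opening for writing blocks until a reader opens the other end.
            with open(path, "w") as f:
                f.write(message)

        t = threading.Thread(target=writer)
        t.start()
        with open(path) as f:
            received = f.read()  # returns once the writer closes its end
        t.join()
        return received


if __name__ == "__main__":
    print(fifo_round_trip("hello through the pipe"))
```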

Sockets

Sockets allow for two-way communication between two processes and allow for communication over a network. Almost all languages support creating them and their interface is usually very simple, so it’s an easy way to talk between two processes.
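
To show the two-way part, this sketch uses socket.socketpair(), which returns two sockets that are already connected to each other. In a real setup each end would live in a different process.

```python
import socket


def ping_pong() -> tuple[bytes, bytes]:
    """Exchange one message in each direction over a connected socket pair."""
    left, right = socket.socketpair()
    with left, right:
        left.sendall(b"ping")
        request = right.recv(4)
        right.sendall(b"pong")  # the same connection carries the reply back
        reply = left.recv(4)
    return request, reply


if __name__ == "__main__":
    print(ping_pong())
```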

Memory Mapped Files

Memory mapped files are a way of loading a file’s content as a buffer of memory. This is useful because it can be treated as memory and, using the MAP_SHARED flag, any changes made to the memory mapped file will be reflected in other processes that have also memory mapped the file, resulting in a form of communication between processes.

Moreover, it has the nice property that the file’s contents are paged into volatile memory (RAM) as needed by the operating system, and pages are evicted using the eviction policy of the page cache. This gives you, for free, a caching system that brings the file’s contents into memory, where they can be operated on much faster.

The content stored in volatile memory is also shared between multiple processes so there also aren’t multiple copies floating around in RAM. This makes it useful for sharing a large file between multiple processes that would not fit in memory many times over.
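
The MAP_SHARED behavior can be sketched as follows. Two independent mappings of the same file stand in for two processes, which is a simplification to keep the example in one process.

```python
import mmap
import tempfile


def shared_mapping_demo() -> bytes:
    """Write through one MAP_SHARED mapping and read the change through another."""
    with tempfile.TemporaryFile() as f:
        f.write(b"\x00" * mmap.PAGESIZE)  # the file must be non-empty to be mapped
        f.flush()
        a = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        b = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        try:
            a[0:5] = b"hello"      # plain memory write, no explicit file I/O
            return bytes(b[0:5])   # visible through the other mapping
        finally:
            a.close()
            b.close()


if __name__ == "__main__":
    print(shared_mapping_demo())
```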

File Locks

You can perform “locking” on files for coordination of writing / reading to them or just use the file as a “lock file” to act as a lock between multiple processes.
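
One way to do this is flock, sketched below under the assumption of a local file system. These locks are advisory, so they only coordinate programs that also ask for the lock. A second open() of the same file stands in for a second process here.

```python
import fcntl
import tempfile


def second_lock_is_refused() -> bool:
    """Hold an exclusive flock, then try to take it again through a second open file."""
    with tempfile.NamedTemporaryFile() as first:
        fcntl.flock(first.fileno(), fcntl.LOCK_EX)
        with open(first.name, "rb") as second:
            try:
                # LOCK_NB makes the call fail immediately instead of waiting.
                fcntl.flock(second.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                return True
            fcntl.flock(second.fileno(), fcntl.LOCK_UN)
            return False


if __name__ == "__main__":
    print(second_lock_is_refused())
```

Without LOCK_NB the second caller would simply block until the first holder releases the lock or closes the file.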

File System Event Monitor (or “Watcher”)

If you want to monitor or watch a directory and get notified when a file is modified (so that you could perhaps re-build a project), you could poll and scan over all the files in the directory recursively. However, for a large directory (thousands of files) and a short polling interval (every few seconds), this will end up chewing up a lot of CPU time. Instead, you can use inotify, which provides hooks for monitoring file system events in a much more efficient manner.
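
Python’s standard library has no inotify wrapper, so this sketch calls the libc functions directly through ctypes. It assumes Linux, watches a temporary directory for file creation only, and hard-codes the IN_CREATE constant from <sys/inotify.h>. In practice you would more likely reach for an existing inotify library or tool.

```python
import ctypes
import os
import struct
import tempfile

IN_CREATE = 0x00000100                # event mask bit from <sys/inotify.h>
EVENT_HEADER = struct.Struct("iIII")  # wd, mask, cookie, length of name


def watch_for_create(filename: str) -> list[str]:
    """Watch a temp directory, create a file in it, and return the names inotify reports."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    try:
        with tempfile.TemporaryDirectory() as d:
            wd = libc.inotify_add_watch(fd, os.fsencode(d), IN_CREATE)
            if wd < 0:
                raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
            open(os.path.join(d, filename), "w").close()  # triggers IN_CREATE
            data = os.read(fd, 4096)                      # event is already queued
    finally:
        os.close(fd)

    names, offset = [], 0
    while offset < len(data):
        _wd, mask, _cookie, length = EVENT_HEADER.unpack_from(data, offset)
        offset += EVENT_HEADER.size
        name = data[offset:offset + length].rstrip(b"\0")  # name is NUL-padded
        offset += length
        if mask & IN_CREATE:
            names.append(os.fsdecode(name))
    return names


if __name__ == "__main__":
    print(watch_for_create("new.txt"))
```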

FUSE

A file system needs to implement some basic operations (list files at a location, read X number of bytes from a file, write X number of bytes to a file, seek to a specific position in a file, etc.). FUSE (Filesystem in Userspace) is a framework that allows you to create virtual file systems that implement those basic operations but are backed by non-traditional storage. For example, you can mount things like FTP directories, AWS S3 buckets, or Google Cloud Storage buckets as a “local directory” on your machine.

Swap Space

Swap space is additional system memory backed by disk that the system can use when RAM might be full. You can add swap memory to your system using a file and choose the “swappiness” level of your system.

The system will automatically push out inactive portions of RAM to disk and prioritize keeping more actively used regions in RAM. Swap space works fine if you have a lot of contents in RAM, but only a small amount is actively needed at any given point. If you are truly trying to access a large amount of data at once or quickly using this method, it’s likely you’ll experience thrashing or swap death.

Loop Mount

Loop mounting allows you to mount a directory on your system that is backed by a single file somewhere else on your system. The file just needs to hold a valid file system format. This lets you move around an entire file system as a file.

RAM Drive and Shared Memory Virtual File System

Using tmpfs you can mount a directory, backed by RAM, that you can read and write files to. For this reason, read/write operations on this file system are much faster than traditional disk storage, but the storage is volatile. /dev/shm on your system is a built-in location that uses tmpfs and is backed by RAM.

You can memory map a file in tmpfs in multiple processes, which lets you share memory regions between them for communication. This is slightly different from the normal use of memory mapped files, since the contents are not backed by a file on disk.
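
A rough sketch of that idea: the parent creates a file under /dev/shm and maps it, then a forked child opens the same path, maps it too, and writes a message that the parent reads back. It assumes /dev/shm exists and is writable, and the helper name is made up.

```python
import mmap
import os
import tempfile


def share_through_shm(message: bytes) -> bytes:
    """Have a child process write `message` (non-empty) into a tmpfs-backed mapping."""
    with tempfile.NamedTemporaryFile(dir="/dev/shm") as f:
        f.truncate(len(message))  # size the file before mapping it
        with mmap.mmap(f.fileno(), len(message)) as parent_view:
            pid = os.fork()
            if pid == 0:
                # Child: open the tmpfs file by path and write through its own mapping.
                status = 1
                try:
                    with open(f.name, "r+b") as g:
                        with mmap.mmap(g.fileno(), len(message)) as child_view:
                            child_view[:] = message
                    status = 0
                finally:
                    os._exit(status)  # never fall through into the parent's code
            os.waitpid(pid, 0)
            return parent_view[:]  # the child's write is visible here


if __name__ == "__main__":
    print(share_through_shm(b"hello from the child"))
```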

Process Information Virtual File System

Following the “everything is a file” idea, the /proc/ path is a virtual file system whose virtual files contain information about the CPU and the processes currently running on the system (like PIDs, environment variables, open files, etc.). You can get all of this information just by reading these virtual files.
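
For example, /proc/self/status describes the process that reads it. The sketch below parses that file into a dictionary; the helper name is made up, and only a few of the available fields are printed.

```python
import os


def read_status(pid: str = "self") -> dict[str, str]:
    """Parse /proc/<pid>/status into a mapping of field name to value."""
    fields: dict[str, str] = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            # Each line looks like "Name:\tpython3".
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    status = read_status()
    print(status["Name"], status["Pid"], status.get("VmRSS"))
```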

Loopback Interface

The loopback interface is a feature that allows you to access services running on your own system. It is normally accessed through localhost.

In a Docker container, however, localhost will refer to the localhost of the container and not the host. You can alias the loopback interface to another IP address which lets you now access the host’s localhost in the container at that IP address.
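
The basic loopback case can be sketched as a tiny echo service bound to 127.0.0.1 and a client that talks to it without any traffic leaving the machine. It does not cover the Docker aliasing setup, and the helper name is made up.

```python
import socket
import threading


def loopback_echo(message: bytes) -> bytes:
    """Start a one-shot echo server on 127.0.0.1 and send it a message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 lets the kernel pick a free port
        server.listen(1)
        port = server.getsockname()[1]

        def serve_once() -> None:
            conn, _addr = server.accept()
            with conn:
                conn.sendall(conn.recv(1024))  # echo back what was received

        t = threading.Thread(target=serve_once)
        t.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        t.join()
        return reply


if __name__ == "__main__":
    print(loopback_echo(b"hello, localhost"))
```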

CRIU

CRIU is a utility that allows you to pause a running process and save its state to a file (checkpoint), and then load that state and continue running the process at a later time (restore). This requires some specific kernel features to be enabled.

LD_PRELOAD

The LD_PRELOAD environment variable lets you inject custom implementations of symbols that are dynamically linked in a program. The preloaded library is loaded before all other shared libraries, including libc, so you can even inject a custom malloc implementation. This provides a nice “hacking interface” for customizing the behavior of a compiled program, and it can also serve as a probing mechanism.

Jailing

You can use the chroot operation to “jail” a process to a certain directory (so that it can only see, read, and write files in that directory).

You can use the cgroups feature to limit the use of system resources like CPU and memory by a process.

You can use the namespaces feature to isolate resources like hostnames and process IDs available to a process.

These are the underlying features that enable Docker containers to work.

Miscellaneous Binary Formats

The binfmt_misc feature of Linux allows you to register programs to run non-standard, unrecognized executable files. For example, you can register a virtual machine or emulator to run executables intended for different architectures. QEMU utilizes this feature.


If you have any others you think would be interesting to add to the list, you can contact me via email on the about page.


Copyright © 2021 Ajay Patel. All rights reserved.