Where?

Find the code here: http://github.com/bwlewis/shim.
Find PVFS2 here: http://PVFS.org/.

What?

A shim file system for Linux (work in progress).

Shim provides a simple way to add basic memory mapping capability to almost any file system, without tinkering with the file system itself.

The shim module defines an intermediary file system for the Linux kernel that sits between applications and an underlying file system. It implements address space operations on its files as read and write operations on corresponding backing files in the underlying file system. We wrote shim to let us memory-map files in a PVFS2 file system, but it can be used with any underlying file system that supports read and write operations.
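To make the idea concrete, here is a minimal, hypothetical sketch (not taken from the shim source) of how a stacked file system can service the readpage address space operation with an ordinary read on a backing file. The helper shim_lower_file() is an assumption for illustration, and the kernel interfaces shown (kernel_read, kmap, SetPageUptodate) follow roughly the 4.14-5.x era APIs and differ across kernel versions.

    #include <linux/fs.h>
    #include <linux/pagemap.h>
    #include <linux/highmem.h>
    #include <linux/string.h>

    /* Hypothetical helper: return the backing file for a shim inode. */
    struct file *shim_lower_file(struct inode *inode);

    static int shim_readpage(struct file *file, struct page *page)
    {
        struct inode *inode = page->mapping->host;
        struct file *lower = shim_lower_file(inode);
        loff_t pos = page_offset(page);
        char *buf = kmap(page);
        ssize_t n;

        /* Fill the page by reading the same byte range from the
           backing file in the underlying file system. */
        n = kernel_read(lower, buf, PAGE_SIZE, &pos);
        if (n < 0) {
            kunmap(page);
            SetPageError(page);
            unlock_page(page);
            return n;
        }

        /* Zero any tail beyond end of file so the mapping sees
           well-defined data. */
        if (n < PAGE_SIZE)
            memset(buf + n, 0, PAGE_SIZE - n);

        kunmap(page);
        SetPageUptodate(page);
        unlock_page(page);
        return 0;
    }

A matching writepage operation would do the reverse, flushing a dirty page back to the backing file with an ordinary write.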

Why would one want this?

Memory-mapped files are often quite useful, but not every file system supports file mapping. Consider, for example, the PVFS2 file system. PVFS2 is an elegant, high-performance, parallel file system. It lets one aggregate storage across multiple networked computers into a single parallel file system that many clients can use simultaneously.

PVFS2 focuses intensely on performance, especially in HPC settings and for applications using MPI. More general-purpose capabilities like memory-mapped files are left out by design. The shim file system provides basic memory-mapped file support on top of existing PVFS2 installations without requiring modification of PVFS2 code or settings. The following example inspired the name shim.

Example: Parallel Virtual Shared Memory with PVFS2

Let's say we have a small cluster of GNU/Linux nodes and a problem that would benefit from access to a very large pool of relatively fast, RAM-backed storage. We can use PVFS2 and shim to provide a virtual pool of RAM drawn from across the cluster as follows:
  1. Configure PVFS2 across the cluster, using /dev/shm on each node as a backing store.
  2. Mount the PVFS2 file system on one or more nodes.
  3. Copy the problem data into the PVFS2 directory.
  4. Mount the shim file system on one or more nodes.
  5. Programs running on the nodes mounting shim may now memory-map the problem data (something not possible with PVFS2), as sketched in the example program below.
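For step 5, here is a minimal sketch of a user-space program that memory-maps a file through the shim mount. The path /mnt/shim/problem.dat is a made-up example; substitute your own mount point and file name.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/mnt/shim/problem.dat";  /* example path only */
        int fd = open(path, O_RDWR);
        if (fd < 0) { perror("open"); return EXIT_FAILURE; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return EXIT_FAILURE; }

        /* Map the whole file; loads and stores through the mapping
           become reads and writes on the PVFS2-backed file. */
        char *data = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

        printf("first byte: %d\n", data[0]);

        munmap(data, st.st_size);
        close(fd);
        return EXIT_SUCCESS;
    }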

Of course, the real benefit of a system like PVFS2 is to allow high-performance parallel use of memory. The shim file system does not explicitly coordinate parallel access to memory from multiple processes. It does not enforce cache consistency, and memory-mapped read and write operations occur at page granularity. However, shim does provide tools that help client applications manage cache state on their own. See the README document in the source code directory for more information.
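As one illustration of what a client application can do on its own, the following sketch uses only the standard POSIX msync() call to flush modified pages back to the underlying PVFS2 file. This is not the shim-specific tool set mentioned above (see the README for that); it simply shows the kind of explicit synchronization a cooperating application might perform.

    #include <string.h>
    #include <sys/mman.h>

    /* 'data' is assumed to be a MAP_SHARED mapping of 'len' bytes,
       as created in the previous example. */
    void publish_update(char *data, size_t len)
    {
        memset(data, 0, len);          /* modify the mapped region */
        msync(data, len, MS_SYNC);     /* flush dirty pages so the
                                          backing PVFS2 file sees
                                          the update */
    }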