Semantic file systems are file systems used for information persistence that structure data according to its semantics and intent, rather than its location as in current file systems. They allow data to be addressed by content (associative access). Traditional hierarchical file systems tend to impose a burden, for example when a subdirectory layout contradicts a user's perception of where files should be stored. A tag-based interface alleviates this hierarchy problem and lets users query for data in an intuitive fashion.
Semantic file systems raise technical design challenges: indexes of words, tags or other elementary signs have to be created, kept up to date and cached for performance in order to offer the desired random, multivariate access to files. These indexes are maintained in addition to the underlying, usually traditional, block-based file system.
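The indexing requirement can be illustrated with a minimal sketch of an in-memory inverted index from tags to files. The class name TagIndex and its methods are hypothetical and not taken from any particular implementation; a real system would also have to persist the index and update it on every file change.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical inverted index: tag -> set of file paths."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)
        self._tags_by_file = defaultdict(set)

    def add(self, path, tags):
        # Record each tag in both directions so that removal stays cheap.
        for tag in tags:
            self._files_by_tag[tag].add(path)
            self._tags_by_file[path].add(tag)

    def remove(self, path):
        # Keep the index consistent when a file is deleted.
        for tag in self._tags_by_file.pop(path, set()):
            self._files_by_tag[tag].discard(path)
            if not self._files_by_tag[tag]:
                del self._files_by_tag[tag]

    def query(self, *tags):
        """Return the files that carry every one of the given tags."""
        if not tags:
            return set(self._tags_by_file)
        sets = [self._files_by_tag.get(tag, set()) for tag in tags]
        return set.intersection(*sets)


index = TagIndex()
index.add("/home/u/a.txt", ["draft", "thesis"])
index.add("/home/u/b.txt", ["thesis"])
matches = index.query("draft", "thesis")  # only a.txt carries both tags
```

Keeping a reverse map from file to tags is one possible design choice: it costs extra memory but avoids scanning every tag when a file is removed.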
A semantic file system can be envisioned as a part of a semantic desktop.
History
The notion of a semantic file system was proposed in 1991 by researchers at MIT and the École des Mines de Paris.[1] They proposed an integrated system whose main query interface looked like a traditional file system interface, using a virtual directory system that interpreted a path as a conjunctive query. Their implementation automatically extracted the relevant metadata by means of what they called file type-specific transducers.
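The idea of reading a path as a conjunctive query can be sketched as follows. The path layout used here (alternating "field:" and value components) is a simplification chosen for illustration and is not claimed to match the exact syntax of the 1991 system; the function names and the CATALOG data are likewise hypothetical.

```python
def parse_virtual_path(path):
    """Split 'field:/value/field:/value' into a list of (field, value) terms."""
    parts = [p for p in path.split("/") if p]
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating 'field:' and value components")
    terms = []
    for field, value in zip(parts[0::2], parts[1::2]):
        if not field.endswith(":"):
            raise ValueError(f"not a field name: {field!r}")
        terms.append((field[:-1], value))
    return terms


def list_virtual_directory(path, catalog):
    """Return the names of files whose metadata satisfies every term (logical AND)."""
    terms = parse_virtual_path(path)
    return sorted(
        name
        for name, attrs in catalog.items()
        if all(value in attrs.get(field, ()) for field, value in terms)
    )


# Hypothetical metadata catalog: file name -> field -> set of values.
CATALOG = {
    "report.txt": {"owner": {"smith"}, "text": {"semantic", "index"}},
    "notes.txt": {"owner": {"smith"}, "text": {"index"}},
    "main.c": {"owner": {"jones"}, "exports": {"main"}},
}

listing = list_virtual_directory("owner:/smith/text:/semantic", CATALOG)
```

Each additional path component narrows the result set, which is what makes the directory listing behave like a conjunction of attribute tests.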
Starting around 2004, a new wave of implementations centered on manual tagging of files and folders.
In 2008, researchers proposed to integrate semantic file systems with Semantic Web technologies.[2]
Types of metadata
Tags
Tags can be used instead of folders to circumvent the limits of a hierarchical model.
File type-specific
Gifford et al.[1] suggested the idea of file type-specific metadata automatically extracted by a file type-specific transducer.
For instance, for a source code text file, metadata could include the names of the procedures that the program exports or imports, procedure types, and the files included by the program. For a document, metadata could include its date, author, title and structure (sections and subsections). For an e-mail, it could include the sender, recipient and subject.
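A minimal sketch of such transducers is given below, using Python's standard ast and email modules. Dispatching on the file name suffix, the TRANSDUCERS table and the returned field names are assumptions made for illustration; they do not reproduce the original implementation.

```python
import ast
from email import message_from_string


def python_source_transducer(text):
    """Extract top-level function names and imported modules from Python source."""
    tree = ast.parse(text)
    defined = [n.name for n in tree.body if isinstance(n, ast.FunctionDef)]
    imported = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imported.append(node.module)
    return {"defines": defined, "imports": sorted(imported)}


def email_transducer(text):
    """Extract sender, recipient and subject from an RFC 822 style message."""
    msg = message_from_string(text)
    return {"from": msg["From"], "to": msg["To"], "subject": msg["Subject"]}


# Hypothetical dispatch table: file name suffix -> transducer.
TRANSDUCERS = {".py": python_source_transducer, ".eml": email_transducer}


def extract_metadata(filename, text):
    """Run the transducer registered for the file type, if there is one."""
    for suffix, transducer in TRANSDUCERS.items():
        if filename.endswith(suffix):
            return transducer(text)
    return {}  # unknown file type: no type-specific metadata
```

The extracted dictionaries could then be fed into whatever index the file system maintains, so that queries on fields such as "imports" or "subject" become possible.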
Lineage
In scientific workflows, the provenance of a data file is important. A scientist might want to select a results file by filtering on the input dataset from which it was derived.
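Filtering by lineage can be sketched as a walk over a graph that records, for each file, the files it was computed from. The file names, the LINEAGE mapping and both function names are hypothetical examples.

```python
def derived_from(path, dataset, inputs_of):
    """Return True if `dataset` is a direct or indirect input of `path`."""
    seen = set()
    stack = [path]
    while stack:
        current = stack.pop()
        for parent in inputs_of.get(current, ()):
            if parent == dataset:
                return True
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.add(parent)
                stack.append(parent)
    return False


def select_by_input(candidates, dataset, inputs_of):
    """Keep only the candidate files whose lineage includes `dataset`."""
    return [p for p in candidates if derived_from(p, dataset, inputs_of)]


# Hypothetical lineage records: file -> files it was computed from.
LINEAGE = {
    "cleaned_a.csv": ["survey_a.csv"],
    "cleaned_b.csv": ["survey_b.csv"],
    "results_1.csv": ["cleaned_a.csv"],
    "results_2.csv": ["cleaned_b.csv"],
    "results_3.csv": ["cleaned_a.csv", "cleaned_b.csv"],
}

selected = select_by_input(
    ["results_1.csv", "results_2.csv", "results_3.csv"], "survey_a.csv", LINEAGE
)
```

Here results_1.csv and results_3.csv are selected because both descend from survey_a.csv through cleaned_a.csv, while results_2.csv is not.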
Architecture
Vasudevan and Pazandak[3] introduce the distinction between integrated and augmented approaches:
In integrated approaches, semantics are a feature of the file system.
Tightly coupled integrated systems are implemented within a file system.
Loosely coupled integrated systems are implemented on top of a classical file system, but hide its interface.
In augmented approaches, semantics are an abstraction on top of a classical file system. Access to the classical file system interface is maintained, so the user can choose which interface to use.
They suggest that an open systems architecture is well adapted to semantic file system implementations.
Compatibility with hierarchical file systems
Even integrated semantic file systems may choose to expose an interface for compatibility with existing local or distributed file system protocols. For instance, Gifford et al.’s 1991 implementation was fully compatible with NFS.[1]