15-712: Advanced Operating Systems & Distributed Systems
This page lists a bunch of potential ideas for projects. Feel
free to use one, propose something completely different, or
refine one of these into your own idea.
Each project should have an implementation component and an
There are two basic types of projects that are suitable for this
- reproduce results from a good systems research paper
interesting if you change some of the assumptions, or part of
the system set-up)
- build and evaluate a novel system.
Coming up with a Good Project
In preparation for your term project, read
project startup documents written by John Wilkes at HP
Labs. These guidlines could make your project selection and proposal
writing much easier.
Efficient syscall interposition for Linux programs. Explore
the efficiency and flexibility trade-offs for system call
interposition mechanisms (e.g. dynamic library, kernel module
for C/Linux programs. Evaluate the ability of such mechanisms
to support manager applications wanting to insert new
functionality (e.g., monitoring, sandboxing) for normal user
I rarely use suspend mode on my laptop because my network
connections all get messed up. An alternative idea would be a
suspend mode that preserved network connections properly.
There are many suitable projects in the space of mobile
computing. A project on power aware computing or location
aware computing is well-suited for this course.
Look at the
project and the
project on campus to get a sense of potential projects and
people you may want to talk with.
Resource logging: Systems like Abacus
could potentially benefit from historical resource
availability and usage databases. For example, application
profiles (resources used over time) could allow Abacus to (a)
proactively relocate objects if there's high confidence that
the application will change behavior and (b) prevent migrating
objects belonging to applications that will terminate shortly.
Monitoring resource availability is similarly useful.
Construct and evaluate a distributed database of significant
system attributes. Part of such a project is determining what
data is useful to monitor and determining how to monitor it.
Study agility vs. stability. Systems like Odyssey
try to adapt quickly to environmental changes, because
reacting slowly could squander some of the benefit of
adaptation. However, adapting to transient conditions can
lead to instability (frequent adaptations). Create a model
and simulator to study the conditions under which instability
occurs and ways in which one can adaptively determine how
agile to be while maintaining a certain degree of stability.
For example, if network latencies are highly variable, perhaps
one needs to be less agile. (Well done, such a project would
exploit an understanding of control systems.)
File system clustering: Clustering or coalescing of multiple
disk reads is a very big win. C-FFS
clustered the objects of a directory into a contiguous unit
for prefetching. Implement and test. Now consider running
C-FFS on an NFS server. Determine whether NFS clients
experience improved performance vs. accessing an NFS server
that uses a traditional FFS. If running on an NFS server with
C-FFS isn't significantly faster, determine why, and propose
changes to the NFS protocol so that it might achieve better
performance. Evaluate your changes.
Process migration requires identical processors, or a
transformation system, identical operating system support such
as open files and sockets, redirected input and output, and
distributed shared memory. Build a simple implementation on a
system like Linux and evaluate it in contrast to alternative
load balancing tools, such as Condor.
Serializability service in a multi-threaded RVM:
CMU's RVM tool (part of the Coda project) provides only atomicity
and durability under a software crash fault model. It does
not provide serializability and argues against sharing the
service across different address spaces (processes). However,
multi-threaded applications are increasingly common,
especially in servers, so perhaps RVM should offer
serializability. Add serializability to RVM and evaluate its
overhead with a multi-threaded application (file server, mail
server, web server, etc).
Explore the feasibility of soft-error tolerance in software.
Soft errors occur, for example, when gamma rays cause bits of main
memory to change values. (This is an increasingly real
concern, as transistors scale down in size. For obvious reasons,
applications find such an occurrance quite disturbing.) One way to
address this would be to have the OS execute two copies of the same
batch application, while keeping them relatively synchronized. Every
once in a while, the OS could cross-check their data pages for errors
(e.g., when a page is reclaimed or a system call is made).
Consider approaches to keeping them in synch and dealing with
interactions with the outside world. Consider what changes when the
copies run on different systems instead of the same. Evaluate
performance for different system configurations (e.g., single CPU
Direct data transfer for NFSv4. Large storage environments are
moving towards networked storage devices that could be accessed
by clients as well as their managing servers. Explore extensions
to the NFSv4 protocol for supporting direct client access to
such devices while maintaining the NFSv4 semantics. [Gibson]
Parallel NFSFv4. Explore and evaluate approaches for creating
a "super NFS server" out of multiple NFS servers, wherein files
are striped and/or redundant. Any departures from the base NFS
semantics should of course be understood, and such a project is
interesting for either NFSv3 or particularly NFSv4.
Virtualized AFS volumes. Create and evaluate an AFS client
extension that automatically splits ovlumes as they approach
their quota. Doing so exploits AFS's level of indirection to
hide space limitations/expansions from users; clarify when
it does and does not succeed.
Collective I/O: parallel file system designs sometimes add a
collective synchronization operation to allow the file system
maximal opportunities for reordering. Construct such a
system, perhaps using modified NFS service, and test the power
of these optimizations. Implement support for multiple failure
Distributed transactions for NFS: Quicksilver (discussed later
in term) tried to give a fully general system for transactions
in the OS, but its principle clients were storage based.
Perhaps a much simpler implementation, restricted to the file
system, will provide the right tradeoff between utility and
simplicity. Construct an NFS variant that provides abortable
transactions to its clients and evaluate its overhead for
non-aborting clients and aborting clients.
There are many emerging distributed storage architectures:
peer-to-peer, content distribution networks (CDNs), and
publish-subscribe systems. There are no agreed upon benchmarks
with which to evaluate any of these types of systems.
Analyzing the performance, algorithms, and or workloads of any
such system could be a project. Such a project would likely
involve emulating/simulating systems rather than building or
directly testing such a system. [Srini Seshan is interested
in some of these ideas]
Service discovery with dynamic attributes. One of the
desirable features of service discovery is to be able to do
things like print to the printer with the shortest
queue. Service attributes for the printer such as location and
memory do not change over time. However, attributes such as
queue length are dynamic. Service descriptions tend to very
simple in most systems. What if we made them into code
segments that would be evaluated by either the service
discovery server or client. This would enable service
discovery servers to pull information about service attributes
on demand. Pulling information would make more sense for
something like a compute server where the load attribute
changes frequently but few queries either match the server's
other attributes. It may also be useful to support attributes
such as price which may vary based on a variety of parameters
including client characteristics. Issues to resolve include:
how is the pull code defined (Java maybe)?, what are the
security restrictions on this code, are these pulled values
cached?, is the caching defined by the code? if so, what type
of persistent state is allowed across invocations? [Srini
Service composition and service browsing. Path creation is the
composition of multiple services to make a new
service. Another useful feature is the ability to browse
available services. There seem to be reasonable solutions to
both these problems in centralized service discovery
systems. Are there reasonable ways to enable these in a
distributed service discovery system? [Srini Seshan]
Determine the propogation semantics of a specific type of
virus (e.g., email). Given knowledge of what a virus looks
like to the computer system (OS, network, etc.), are there
changes that can be made to some system component (OS,
firewall, router, etc.) to facilitate detection of viruses or
to protect against propogating such viruses. Alternately,
consider what ISPs can do to protect the world against their
clients being zombies in DDOS attacks.
Characterizing DOS/DDOS attacks.
Develop a model of the 'workload' of a denial of service
Consider how servers act under high load.
How should servers act under high load, if the load may be a
denial of service attack?
Last modified: Fri Sep 19 14:00:22 EDT 2003