15-712: Advanced Operating Systems & Distributed Systems
This page lists a bunch of potential ideas for projects. Feel
free to use one, propose something completely different, or
refine one of these into your own idea.
Each project should have an implementation component and an
There are two basic types of projects that are suitable for this
(1) reproduce results from a good systems research paper
interesting if you change some of the assumptions, or part of
the system set-up) and
(2) build and evaluate a novel system.
Evaluate support for I/O prefetch and release in Linux.
Angela Demke Brown's I/O prefetching system has been ported to
Linux. However, the system still needs to be instrumented and
evaluated to determine the performance gain of I/O prefetching
on Linux. See Angela's
for her publications about I/O prefetching. [Angela Brown,
An "instant-on" kernel: i.e. by the time I have opened my
laptop (or turned on my monitor), the system has booted.
I rarely use suspend mode on my laptop because my network
connections all get messed up. An alternative idea would be a
suspend mode that preserved network connections properly.
There are many suitable projects in the space of mobile
computing. A project on power aware computing or location
aware computing is well-suited for this course.
Look at the
project and the
project on campus to get a sense of potential projects and
people you may want to talk with.
In the context of Aura, there is a wireless bandwidth advisor
and people location service. Build an application on top of
this functionality. For example, you could build a conference
room reservation system that would verify that, based on
historical data, your conference room will have enough
wireless bandwidth (no guarantees of course). Another
possibility (again in the context of Aura) is that people
could develop an example "network aware" application, using
Remos for collecting information about the network.
Bullet proof a Linux kernel (make it robust - where robust
means that you cannot crash the kernel by passing poorly
formed arguments into the system API).
Quantify the performance cost of making the Linux kernel robust.
This should demonstrate that the performance penalty objections to
writing robust code is "hogwash".
John DeVale's thesis work has identified some cheap ways to improve OS
Extend the MS Windows stand-alone Ballista testing capability.
There are lots and lots of functions/calls to test beyond what
group plans on testing. Someone might want to
test lots of other functions looking for system killers.
(Would involve understanding the API, learning about
robustness testing, and creating test strategies for various
data types beyond the standard POSIX set.) [Koopman]
Resource logging: Systems like Abacus
could potentially benefit from historical resource
availability and usage databases. For example, application
profiles (resources used over time) could allow Abacus to (a)
proactively relocate objects if there's high confidence that
the application will change behavior and (b) prevent migrating
objects belonging to applications that will terminate shortly.
Monitoring resource availability is similarly useful.
Construct and evaluate a distributed database of significant
system attributes. Part of such a project is determining what
data is useful to monitor and determining how to monitor it.
Study agility vs. stability. Systems like Odyssey
try to adapt quickly to environmental changes, because
reacting slowly could squander some of the benefit of
adaptation. However, adapting to transient conditions can
lead to instability (frequent adaptations). Create a model
and simulator to study the conditions under which instability
occurs and ways in which one can adaptively determine how
agile to be while maintaining a certain degree of stability.
For example, if network latencies are highly variable, perhaps
one needs to be less agile. (Well done, such a project would
exploit an understanding of control systems.)
Learning of disk mappings. The firmware of disk drives uses
fairly regular schemes for mapping logical block numbers (from
1 to N) to physical disk locations, but almost every disk uses
a slightly different scheme. DIXtrac
includes a simple expert system to determine the mapping
schemes for modern SCSI disks, but it still fairly frequently
runs into schemes its authors have never seen (thus the
expertise isn't in it yet). Extend DIXtrac with a simple
learning system to discover minimized
representations/algorithms for expressing a particular disk's
mapping, and evaluate it on real disks.
File system clustering: Clustering or coalescing of multiple
disk reads is a very big win. C-FFS
clustered the objects of a directory into a contiguous unit
for prefetching. Implement and test. Now consider running
C-FFS on an NFS server. Determine whether NFS clients
experience improved performance vs. accessing an NFS server
that uses a traditional FFS. If running on an NFS server with
C-FFS isn't significantly faster, determine why, and propose
changes to the NFS protocol so that it might achieve better
performance. Evaluate your changes.
Process migration requires identical processors, or a
transformation system, identical operating system support such
as open files and sockets, redirected input and output, and
distributed shared memory. Build a simple implementation on a
system like Linux and evaluate it in contrast to alternative
load balancing tools, such as Condor.
Serializability service in a multi-threaded RVM:
CMU's RVM tool (part of the Coda project) provides only atomicity
and durability under a software crash fault model. It does
not provide serializability and argues against sharing the
service across different address spaces (processes). However,
multi-threaded applications are increasingly common,
especially in servers, so perhaps RVM should offer
serializability. Add serializability to RVM and evaluate its
overhead with a multi-threaded application (file server, mail
server, web server, etc).
Study centralized vs. decentralized resource allocation.
Intuitively, centralized resource allocators can make a better
use of cluster resources because they are aware of all the
workloads and all the resources. However, centralization
introduces a bottleneck and a central point of failure.
Validate these common beliefs. Explore approaches to
decentralizing resource management while still finding stable
(if not globally optimal) resource allocation points.
Collective I/O: parallel file system designs sometimes add a
collective synchronization operation to allow the file system
maximal opportunities for reordering. Construct such a
system, perhaps using modified NFS service, and test the power
of these optimizations. Implement support for multiple failure
Distributed transactions for NFS: Quicksilver (discussed later
in term) tried to give a fully general system for transactions
in the OS, but its principle clients were storage based.
Perhaps a much simpler implementation, restricted to the file
system, will provide the right tradeoff between utility and
simplicity. Construct an NFS variant that provides abortable
transactions to its clients and evaluate its overhead for
non-aborting clients and aborting clients.
Data race warnings are useful in local systems. Consider
debugging tools for distributed system. One useful tool would
be a "replay" system that reruns a "trace" of a distributed
system with identical asynchronous events. Such a system
needs to control when signals reach applications and control
the outcome of synchronization. Build such a tool for
collections of Linux (or Java) systems and evaluate the
slowdown during recording and play back.
There are many emerging distributed storage architectures:
peer-to-peer, content distribution networks (CDNs), and
publish-subscribe systems. There are no agreed upon benchmarks
with which to evaluate any of these types of systems.
Analyzing the performance, algorithms, and or workloads of any
such system could be a project. Such a project would likely
involve emulating/simulating systems rather than building or
directly testing such a system. [Srini Seshan is interested
in some of these ideas]
Service discovery with dynamic attributes. One of the
desirable features of service discovery is to be able to do
things like print to the printer with the shortest
queue. Service attributes for the printer such as location and
memory do not change over time. However, attributes such as
queue length are dynamic. Service descriptions tend to very
simple in most systems. What if we made them into code
segments that would be evaluated by either the service
discovery server or client. This would enable service
discovery servers to pull information about service attributes
on demand. Pulling information would make more sense for
something like a compute server where the load attribute
changes frequently but few queries either match the server's
other attributes. It may also be useful to support attributes
such as price which may vary based on a variety of parameters
including client characteristics. Issues to resolve include:
how is the pull code defined (Java maybe)?, what are the
security restrictions on this code, are these pulled values
cached?, is the caching defined by the code? if so, what type
of persistent state is allowed across invocations? [Srini
Service composition and service browsing. Path creation is the
composition of multiple services to make a new
service. Another useful feature is the ability to browse
available services. There seem to be reasonable solutions to
both these problems in centralized service discovery
systems. Are there reasonable ways to enable these in a
distributed service discovery system? [Srini Seshan]
Determine the propogation semantics of a specific type of
virus (e.g., email). Given knowledge of what a virus looks
like to the computer system (OS, network, etc.), are there
changes that can be made to some system component (OS,
firewall, router, etc.) to facilitate detection of viruses or
to protect against propogating such viruses. Alternately,
consider what ISPs can do to protect the world against their
clients being zombies in DDOS attacks.
Characterizing DOS/DDOS attacks.
Develop a model of the 'workload' of a denial of service
Consider how servers act under high load.
How should servers act under high load, if the load may be a
denial of service attack?
Last modified: Tue Oct 09 16:57:41 Eastern Daylight Time 2001