Agenda

March 9th: Agenda

9:00 – 10:30am EE-related research talks (DOP Center, Cory Hall)

Speakers:

Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters
Christina Delimitrou (Stanford, Computer Architecture / Systems)

Crime and Punishment for Cognitive Radio
Kristen Woyach (UC Berkeley, Communications)

High-temperature Gallium Nitride Soot Particulate Sensors for Energy Efficient Combustion and Low Soot Emission
Minmin Hou (Stanford, MEMS / devices)

Algorithms and Codes for Reliable and Efficient Distributed Data Storage
Rashmi K Vinayak (UC Berkeley, Coding theory for distributed data storage)

WiFi on steroids: Characterizing Spectrum Goodness for Dynamic Spectrum Access
Aakanksha Chowdhery (Stanford, Communications and Networking)

11:00am – 12:30pm CS-related research talks (DOP Center, Cory Hall)

Speakers:

Designing Text Analysis Interfaces for Humanities Scholars
Aditi Muralidharan (UC Berkeley, HCI / NLP)

Opinion Formation Games
Kshipra Bhawalkar (Stanford, Theory)

Beyond K-means and beyond clustering: BP-means and related ideas
Tamara Broderick (UC Berkeley, Machine Learning)

RAMCloud: A low-latency datacenter storage system
Ankita Kejriwal (Stanford, Distributed Systems)

Automating Exploratory Data Analysis
Sara Alspaugh (UC Berkeley, Information Retrieval)

12:30 – 2:00pm Lunch (521 Hogan Room, Cory Hall). We are fortunate to have Marti Hearst, Tsu-Jae King Liu, and Ana Arias from Berkeley, and Debbie G. Senesky from Stanford, join us for lunch.

2:00 – 3:30pm Poster Session (posters from graduate students and undergraduate researchers)

Presenters:

Lavanya Jose (Stanford, Networks)

Caroline Suen and Sandy Huang (Stanford, AI / Data Mining)

Graph learning with corruptions
Po-Ling Loh (Berkeley, Machine Learning)

Brittany Judoprasetijo (Berkeley, Circuits / Power Electronics)

Kinetic Polyacrylamide Gel Electrophoresis: Microfluidic Binding Assay Enables Measurements of Kinetic Rates for Immunoreagent Quality Screening
Monica A. Kapil (Berkeley, Bioengineering)

Expanding the Range of microWestern Blotting to Low Molecular Mass Species
Rachel Gerver (Berkeley, Bioengineering)

Restricting Brain Tumor-Initiating Cell Motility By Rewiring Cell-Matrix Mechanosensing
Sophie Wong (Berkeley, Bioengineering)

Social Game Based Building Energy Saving System
An Evolved Rectenna for Wireless Sensor Networks
Multi-focus and Multi-window Techniques for Interactive Network
Sheryl Root

3:30 – 4:00pm Social mixer

Abstracts

Electrical Engineering Session:

Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters
Christina Delimitrou (Stanford, Computer Architecture / Systems)

Large-scale datacenters provide the compute and storage for an increasing number of diverse applications. Ideally, these systems should provide strict Quality of Service (QoS) guarantees for each hosted workload. Unfortunately, to sidestep interference issues, unknown workloads, and changes in user load, many cloud operators either provide only soft QoS guarantees or disallow application co-scheduling, resulting in serious server underutilization. As these systems become more prominent, this lack of resource efficiency results in significant energy and capital losses. We argue that resource efficiency, i.e., achieving high utilization while enforcing QoS guarantees, is a critical target for cloud systems. Much of the potential for improving resource efficiency resides in the cluster management system of a datacenter, which schedules incoming applications.
In this talk we present Paragon, a QoS-aware scheduler for large-scale datacenters. Paragon accounts for interference between co-scheduled workloads and for heterogeneity in server platforms, and it preserves QoS constraints while improving utilization. Paragon is derived from analytical collaborative filtering methods and operates like an online recommendation system, similar to the ones used by Amazon or Netflix. Requiring only a minimal signal about a new application, the scheduler uses the rich information it has about previously scheduled workloads to find similarities between them and the new application. Paragon keeps scheduling overheads low: within a minute of its arrival, a workload is scheduled on a server of favorable configuration and is co-scheduled with applications that do not induce destructive interference through contention for shared resources. We show that Paragon scales to tens of thousands of servers and applications, and preserves QoS guarantees while significantly improving server utilization.
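
For readers curious how a recommender-style scheduler can work, the following is a minimal sketch of the collaborative-filtering step, using SVD-based matrix completion on a workload-by-platform score matrix. The data, rank, and scoring are invented for illustration; this is the general recommender-system technique the abstract names, not Paragon's actual implementation.

import numpy as np

# Rows: previously scheduled workloads; columns: server configurations.
# Entries: measured performance scores; np.nan marks unmeasured pairs.
scores = np.array([
    [0.9, 0.4, np.nan, 0.7],
    [0.2, 0.8, 0.6, np.nan],
    [np.nan, 0.7, 0.9, 0.3],
])

def predict_missing(scores, rank=2):
    """Fill unknown workload/platform scores with a truncated SVD,
    the classic recommender-system trick."""
    filled = np.where(np.isnan(scores), np.nanmean(scores), scores)
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Keep observed entries; use low-rank estimates for the rest.
    return np.where(np.isnan(scores), approx, scores)

completed = predict_missing(scores)
new_workload = 2                      # the "minimal signal" row
best_platform = int(np.argmax(completed[new_workload]))
print(f"schedule workload {new_workload} on platform {best_platform}")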

Crime and Punishment for Cognitive Radio
Kristen Woyach (UC Berkeley, Communications)

Cognitive radio promises to alleviate spectrum shortages by allowing different networks to intelligently share the same frequency band. Regulating this sharing paradigm can no longer mean just a long-term frequency assignment and spectral mask; it must also include defining (possibly prioritized) access rules to limit unavoidable interference from other networks. Sharing spectrum will cause some degradation in capacity, or overhead, to the networks involved. The quality of sharing rules can therefore be judged by how close they come to the minimum possible overhead.

One way to limit overhead could be to force all networks to use one standardized PHY/MAC layer — like LTE. However, in order to serve niche applications and to take advantage of new information-theoretic discoveries, property rights must include implementation freedom. But freedom to implement implies freedom to implement strategic misbehavior. So, we must also build rights that are credibly enforceable — mechanisms must exist so that rules are followed. We believe that unless every packet can be centrally scheduled (an unrealistic coordination condition), enforceability implies another necessary overhead, regardless of the mechanism chosen.

This work explores this enforceability overhead given a specific enforcement scheme: a “database” issues tokens that allow a strategic secondary access to a certain number of bands for a certain amount of time. These tokens are given or withheld based on the secondary’s past behavior. We then explore the relationship between the operating freedom provided by the token and the overhead required to guarantee that all secondaries will follow sharing rules.
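
As a rough intuition for why enforceability costs something even for honest players, here is a toy Monte Carlo sketch: an honest secondary occasionally trips an imperfect detector and loses its token for a penalty period. All parameters are invented for illustration and this is not the paper's actual model.

import random

def enforcement_overhead(rounds=100_000, p_false_alarm=0.01,
                         penalty=20):
    """Fraction of time an honest secondary loses access because
    imperfect detection occasionally withholds its token."""
    lost = 0
    t = 0
    while t < rounds:
        if random.random() < p_false_alarm:   # wrongly flagged
            lost += penalty                   # token withheld
            t += penalty
        else:
            t += 1                            # normal token use
    return lost / t

print(f"overhead ~ {enforcement_overhead():.3f}")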

High-temperature Gallium Nitride Soot Particulate Sensors for Energy Efficient Combustion and Low Soot Emission
Minmin Hou (Stanford, MEMS / devices)

The development of gallium nitride (GaN) high electron mobility transistor (HEMT) sensing technology can aid in real-time monitoring of combustion processes and exhaust gases, which are challenging to assess. Combustion accounts for over 80% of U.S. energy demand, and advanced approaches for reducing fuel consumption and emissions are required. State-of-the-art sensors based on silicon technology are limited to temperatures below 200 °C and are not suitable for combustion and exhaust gas conditions. As a result, new material platforms that utilize temperature-tolerant ceramic semiconductor materials are proposed for efficient combustion energy applications (e.g., automotive engines, jet engines, gas turbines, heaters, and boilers). In this project, a micro-scale GaN HEMT-based soot particulate sensor that can withstand combustion exhaust environments will be developed to provide feedback to combustion control modules. The scope of this research project is to 1) develop a GaN HEMT-based soot particulate sensor that can operate within the harsh combustion exhaust gas environment, 2) develop a fabrication process for GaN HEMT-based sensors and electronics, 3) explore the effects of nanostructured surfaces and catalysts on sensor performance, and 4) perform experimental characterization of sensor materials and actual sensor devices in experimental flames. These tasks aid in the realization of advanced sensors for combustion monitoring. Ultimately, the GaN HEMT-based sensor technology developed in this project can lead to greener combustion, greener transportation, and a more sustainable society. The outcomes of this project will also shed light on the design and fabrication of GaN-based sensors, circuits, resonators, energy harvesters, and related devices, and can lead to complete solutions for harsh-environment sensing by integrating these devices on a single chip.

Algorithms and Codes for Reliable and Efficient Distributed Data Storage 
Rashmi K Vinayak (UC Berkeley, Coding theory for distributed data storage)

Today’s large-scale distributed storage systems comprise thousands of nodes storing hundreds of petabytes of data. In these systems, component failures are common, which makes it essential to store the data in a redundant fashion to ensure reliability. The most common way of adding redundancy is replication. However, replication is highly inefficient in terms of storage utilization, and hence many large-scale distributed storage systems, such as those of Google, Microsoft, and Facebook, are now turning to Reed-Solomon (erasure) codes. While these codes are optimal in terms of storage utilization, they perform poorly in terms of bandwidth utilization: during recovery of a failed node, they require downloading the entire data to recover the small fraction that was stored on the failed node. Recovering (partial) data stored on a particular node is of interest not only during recovery but also for many other applications, such as latency-critical read operations when the system is in degraded mode.

My research deals with designing new algorithms (i.e., erasure codes) for distributed storage that are efficient in the (partial) recovery of failed nodes while also being optimal in terms of storage utilization. We have designed new algorithms focusing on two important dimensions: (i) the amount of data accessed during recovery, which is critical for achieving high throughput in datacenters, and (ii) the network bandwidth required during recovery, which is important in wide-area networks such as peer-to-peer networks.
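
To make the bandwidth cost concrete, here is a minimal sketch using the simplest possible erasure code, a single XOR parity block over k data blocks: recovering one lost block already requires reading all k surviving blocks, which is exactly the overhead this research attacks. Reed-Solomon codes and the codes studied in this work generalize the same idea; the block contents below are illustrative.

from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_blocks):
    """k data blocks -> the same blocks plus one XOR parity block."""
    return data_blocks + [reduce(xor, data_blocks)]

def recover(surviving_blocks):
    """Rebuild the one lost block. Note that the repair must read
    every surviving block, the bandwidth cost discussed above."""
    return reduce(xor, surviving_blocks)

blocks = [b"node0---", b"node1---", b"node2---"]
stored = encode(blocks)
lost = stored.pop(1)                    # node 1 fails
assert recover(stored) == lost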

WiFi on steroids: Characterizing Spectrum Goodness for Dynamic Spectrum Access 
Aakanksha Chowdhery (Stanford, Communications and Networking)

The combination of exclusive-use spectrum licensing and growing demand for voice, data, and video applications is leading to artificial spectrum scarcity. A recent approach to alleviating this artificial scarcity innovatively uses unused TV spectrum, also called the TV white spaces, through dynamic spectrum access (DSA) techniques. Wireless devices can use DSA techniques such as sensing and geo-location databases to learn about available TV channels for wireless communication. One obvious question to ask is whether the technology enabler for white space networking, i.e., dynamic spectrum access, is viable in other portions of the spectrum.

This research work (with Microsoft Research) investigates white spaces in other licensed spectrum bands between 30 MHz and 6 GHz. Typically, the goodness of licensed spectrum bands is measured using spectrum occupancy as a goodness metric, but the DSA opportunities in different bands can depend on several factors. We propose a novel DSA goodness metric to compare the opportunity of capitalizing on available spectrum using DSA techniques in various licensed bands. Further, we use this metric to evaluate data collected over one year by the ongoing spectrum measurement campaign at Microsoft Research.
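
For a sense of what the baseline occupancy metric measures, here is an illustrative computation on synthetic data: count the fraction of time each frequency bin exceeds a power threshold. The threshold, data, and 10% cutoff are made up, and the paper's proposed goodness metric deliberately goes beyond this naive measure.

import numpy as np

# power[t, f]: synthetic measured power (dBm) per time slot and bin
rng = np.random.default_rng(0)
power = rng.normal(-100, 5, size=(1000, 64))   # mostly quiet band
power[:, ::4] += 20                            # every 4th bin is busy

occupancy = (power > -90).mean(axis=0)   # per-bin duty cycle
whitespace = (occupancy < 0.1).mean()    # bins that look usable
print(f"{whitespace:.0%} of bins look open under a naive metric")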

Computer Science Session:

Designing Text Analysis Interfaces for Humanities Scholars
Aditi Muralidharan (UC Berkeley, HCI / NLP)

Increasing numbers of texts for study in the humanities have been digitized in recent years. Humanities scholars who want to work with these new collections need computational assistance because of their large scale. I’ll describe my dissertation project, WordSeer, a web-based text analysis tool for humanities scholars that combines information visualization, information retrieval, and natural language processing. So far, WordSeer has been used to analyze language use patterns in a collection of North American slave narratives, the complete works of Shakespeare, and the works of American author Stephen Crane, but the technique is applicable to any text collection. Our user studies with humanities scholars show that WordSeer makes it easier for them to translate their questions into queries and find answers compared to a standard keyword-based search interface. This talk presents the system currently under development and describes text analysis features we plan to include in the next iteration.

Opinion Formation Games
Kshipra Bhawalkar (Stanford, Theory)

I will present a fairly general game-theoretic model of opinion formation in social networks, in which opinions evolve in response to the opinions of neighbors. In this model, nodes form their opinions by maximizing agreement with friends, weighted by the strength of their relationships.

We consider the equilibrium of this process and characterize existence, uniqueness, and efficiency with respect to a global objective. Our results provide general inefficiency bounds characterized by how players model their value for agreement. A simple lower-bound construction shows that the bounds we obtain are exact. Our results generalize recent work of Bindel et al., FOCS 2011.

This talk is based on joint work with Sreenivas Gollapudi (Microsoft Research Search Labs) and Kamesh Munagala (Duke University) that will appear in STOC 2013.
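
As background, here is a minimal sketch of best-response dynamics in the Bindel et al. model that this work generalizes: each node repeatedly averages its internal opinion with its neighbors' expressed opinions, weighted by relationship strength. The graph, weights, and iteration count below are illustrative.

import numpy as np

s = np.array([0.0, 1.0, 0.5])             # internal opinions
w = np.array([[0, 1, 2],                  # symmetric edge weights
              [1, 0, 1],
              [2, 1, 0]], dtype=float)

z = s.copy()
for _ in range(100):                      # iterate to equilibrium
    z = (s + w @ z) / (1 + w.sum(axis=1))

# each z_i minimizes (z_i - s_i)^2 + sum_j w_ij (z_i - z_j)^2
print(z)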

Beyond K-means and beyond clustering: BP-means and related ideas
Tamara Broderick (UC Berkeley, Machine Learning)

K-means is fast and conceptually straightforward, but it is designed to find a known number of equally sized spherical clusters (mutually exclusive and exhaustive groups of data points that reflect latent structure). Bayesian methods have proven effective at discovering many other types of useful latent structure in data, but these methods are slower and require more background knowledge than K-means. We have designed a method to approximate Bayesian solutions to these problems in a form close to the K-means objective function; this form yields faster and simpler learning algorithms. We show how our method produces an objective function for learning groups of data points, called features, that need not be exclusive or exhaustive, and we demonstrate novel, fast performance in experiments.
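
The contrast between the two objectives can be written down compactly. The sketch below is illustrative, with the penalty lam and toy data invented; it follows the general shape of this line of work rather than the exact formulation in the talk.

import numpy as np

X = np.random.default_rng(1).normal(size=(50, 2))

def kmeans_cost(X, mu, assign):
    # Exclusive, exhaustive clusters: each point pays the squared
    # distance to its single assigned center.
    return ((X - mu[assign]) ** 2).sum()

def bp_means_cost(X, A, Z, lam=1.0):
    # Binary feature matrix Z (n x K): a point may own several
    # features, or none; a lam^2 penalty per feature controls K.
    return ((X - Z @ A) ** 2).sum() + lam ** 2 * A.shape[0]

# One-cluster / one-feature baseline, just to evaluate both costs.
mu = X.mean(axis=0, keepdims=True)
assign = np.zeros(50, dtype=int)
Z = np.ones((50, 1))
A = mu.copy()
print(kmeans_cost(X, mu, assign), bp_means_cost(X, A, Z))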

RAMCloud: A low-latency datacenter storage system
Ankita Kejriwal (Stanford, Distributed Systems)

In recent years DRAM has played a larger and larger role in storage systems, driven by the demands of large-scale Web applications. However, DRAM is still used primarily in limited or special-purpose ways, such as a cache for some other backing store. In this talk I will describe RAMCloud, a general-purpose storage system where all data lives in DRAM at all times and large-scale systems are created by aggregating the main memories of thousands of commodity servers. RAMCloud provides durable and available DRAM-based storage for the same cost as volatile caches, and it offers performance 10-100x faster than existing storage systems. By combining low latency and large scale, RAMCloud will enable a new class of applications that manipulate large datasets more intensively than has ever been possible.
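
To illustrate the aggregation idea (and only that; this is not RAMCloud's actual API, and durability and recovery are omitted), here is a toy client that hashes keys across the simulated main memories of many servers.

import hashlib

class Cluster:
    def __init__(self, n_servers):
        # each dict stands in for one server's DRAM
        self.servers = [{} for _ in range(n_servers)]

    def _server(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.servers[h % len(self.servers)]

    def write(self, key, value):
        self._server(key)[key] = value      # data lives in memory

    def read(self, key):
        return self._server(key)[key]       # one hash, one hop

cluster = Cluster(n_servers=1000)
cluster.write("user:42", b"profile-bytes")
assert cluster.read("user:42") == b"profile-bytes"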

Automating Exploratory Data Analysis
Sara Alspaugh (UC Berkeley, Information Retrieval)

The uses for data are innumerable, but one of the most common user scenarios is data exploration: a user has new data and wishes to gain an intuition for the high-level trends, important patterns, and anomalies hidden within. Data exploration, like any data analysis task, involves many steps, but at a high level these are (1) preparing, (2) analyzing, and (3) presenting data. The goal of this project is to help automate the second step by creating an interactive tool for quick, easy, and accessible data exploration. This help will come in the form of suggestions made by an analysis recommendation tool (ART): the user provides a data set they want to explore, and the tool provides an ordered list of visualizations that display different patterns, descriptions, and correlations within the data set, much like any content recommendation system. To inform the design of the ART, we have begun by studying the ways in which data analysis experts structure their queries, through a large-scale trace collection effort at Splunk, Conviva, and Cloudera. This talk will cover our goals for the tool and the results of our initial studies.
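
A hedged sketch of what such a recommendation loop might look like: score candidate column pairs with a simple interestingness statistic and return visualizations in ranked order. The scoring rule, column names, and data are invented for illustration and are not the project's design.

import numpy as np
from itertools import combinations

def recommend(table):
    """table: dict of column name -> 1-D numeric array."""
    suggestions = []
    for a, b in combinations(table, 2):
        # toy interestingness score: absolute correlation
        r = np.corrcoef(table[a], table[b])[0, 1]
        suggestions.append((abs(r), f"scatter plot of {a} vs {b}"))
    return [plot for _, plot in sorted(suggestions, reverse=True)]

rng = np.random.default_rng(2)
x = rng.normal(size=200)
table = {"latency": x,
         "errors": 2 * x + rng.normal(size=200),
         "region": rng.normal(size=200)}
print(recommend(table)[0])   # strongest correlation first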

Poster Session:

Lavanya Jose (Stanford, Networks)

Most network management tasks in software-defined networks (SDN) involve two stages: measurement and control. While many efforts have focused on network control APIs for SDN, little attention has gone to measurement. The key challenge in designing a new measurement API is to strike a careful balance between generality (supporting a wide variety of measurement tasks) and efficiency (enabling high link speed and low cost). We propose a software-defined traffic measurement architecture, OpenSketch, which separates the measurement data plane from the control plane. In the data plane, OpenSketch provides a simple three-stage pipeline (hashing, filtering, and counting) which can be implemented with commodity switch components and supports many measurement tasks. In the control plane, OpenSketch provides a measurement library that automatically configures the pipeline and allocates resources for different measurement tasks. Our evaluations on real-world packet traces, our prototype on NetFPGA, and the implementation of five measurement tasks on top of OpenSketch demonstrate that OpenSketch is general, efficient, and easily programmable.
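
As a software illustration of the three-stage pipeline, the sketch below hashes a packet's flow key, filters for traffic of interest, and updates a small counter array (here a count-min sketch). The widths, depths, and port filter are invented for the sketch; OpenSketch implements these stages with commodity switch components rather than in Python.

import hashlib

DEPTH, WIDTH = 4, 1024
counters = [[0] * WIDTH for _ in range(DEPTH)]

def h(key, row):
    digest = hashlib.sha1(f"{row}:{key}".encode()).hexdigest()
    return int(digest, 16) % WIDTH

def process(packet):
    if packet["dst_port"] != 80:          # filtering stage
        return
    key = (packet["src_ip"], packet["dst_ip"])
    for row in range(DEPTH):              # hashing + counting stages
        counters[row][h(key, row)] += packet["bytes"]

def estimate(key):
    return min(counters[row][h(key, row)] for row in range(DEPTH))

process({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
         "dst_port": 80, "bytes": 1500})
print(estimate(("10.0.0.1", "10.0.0.2")))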

Caroline Suen and Sandy Huang (Stanford, AI / Data Mining)

Real-time information is a fundamental emerging issue in the creation and management of online media content. The real-time information on news sites, blogs and social networking sites changes dynamically and spreads rapidly through the Web. Developing methods for handling such information at a massive scale requires that we think about how information content varies over time, how it is transmitted, and how it mutates as it spreads.

We developed the “News Information Flow Tracking, Yay!” (NIFTY) system for large scale real-time tracking of “memes” – short textual phrases that travel and mutate through the Web.

NIFTY employs a novel highly-scalable incremental meme-clustering approach to efficiently extract and identify mutational variants of a single meme. It runs orders of magnitude faster than the previous Memetracker system, while also maintaining better consistency and quality of extracted memes.

We demonstrated the effectiveness of our approach by processing a 20 terabyte dataset of 6.1 billion blog posts and news articles that we have been continuously collecting for the last four years. NIFTY extracted 2.9 billion unique textual phrases and identified more than 9 million memes. Our meme-tracking algorithm was able to process the entire dataset in less than five days using a single machine. Furthermore, we also created a live deployment of the NIFTY system that allows users to explore the dynamics of online news in near real-time.
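
A toy version of incremental meme clustering conveys the flavor: each incoming phrase joins the first cluster whose representative it closely matches, else it starts a new cluster. The similarity cutoff and phrases are made up, and NIFTY's real algorithm is far more scalable than this sketch.

from difflib import SequenceMatcher

clusters = []   # list of (representative phrase, [variants])

def add_phrase(phrase, cutoff=0.8):
    for rep, variants in clusters:
        if SequenceMatcher(None, rep, phrase).ratio() >= cutoff:
            variants.append(phrase)     # a mutation of a known meme
            return
    clusters.append((phrase, [phrase])) # a brand-new meme

for p in ["yes we can", "yes we can!", "yes, we can",
          "a completely different meme"]:
    add_phrase(p)
print([rep for rep, _ in clusters])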

Graph learning with corruptions
Po-Ling Loh (Berkeley, Machine Learning)

Graphical models are used in many application domains, running the gamut from computer vision and civil engineering to political science and epidemiology. In many applications, estimating the edge structure of an underlying graphical model is of significant interest. For instance, a graphical model may be used to represent friendships between people in a social network or links between organisms with the propensity to spread an infectious disease.

However, data in real-world applications are often observed imperfectly, and observations may be systematically corrupted by mechanisms such as additive noise or missing data. Running standard machine learning algorithms on such corrupted data often leads to systematically biased solutions, which are inconsistent even in the limit of infinite data. We present new methods for edge recovery in graphical models with systematically corrupted observations. We show how to modify existing machine learning algorithms such as regularized linear regression and the graphical Lasso to account for systematic corruptions, and we demonstrate the theoretical and practical consequences of our corrected algorithms for learning in Gaussian and discrete-valued graphs.
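
For the additive-noise case, the correction has a compact form (this is a sketch in the spirit of Loh and Wainwright's corrected regression, not the poster's exact method): the usual Gram matrix Z'Z/n computed from noisy covariates is biased by the noise covariance, so subtract it before solving. The synthetic data, step size, and penalty below are illustrative.

import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.normal(size=n)

sigma_w = 0.5
Z = X + sigma_w * rng.normal(size=(n, p))    # corrupted covariates

gamma = Z.T @ y / n
Gram = Z.T @ Z / n - sigma_w**2 * np.eye(p)  # bias correction

# iterative soft-thresholding on the corrected objective
beta = np.zeros(p)
step, lam = 0.05, 0.05
for _ in range(2000):
    beta = beta - step * (Gram @ beta - gamma)
    beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0)

print(np.round(beta, 2))   # close to beta_true despite corruption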

Brittany Judoprasetijo (Berkeley, Circuits / Power Electronics)

Future office spaces and buildings will collect energy consumption data from the electrical devices used by their occupants, from refrigerators to the humble cell phone charger. The goal of this project is to develop and evaluate the devices enabling dense measurement of energy consumption throughout a building. We have started developing two hardware platforms. First, we designed a sensor circuit board that can be quickly installed by placing it between a plug and the outlet. Our second design is a custom-designed surge protector that can measure six independent outlets as well as turn them on and off. Both devices transmit their data over a wireless mesh network. The data will be fed into a central server, which filters the signal and eventually uses the information to make smart control decisions that reduce the building's energy usage.

Kinetic Polyacrylamide Gel Electrophoresis: Microfluidic Binding Assay Enables Measurements of Kinetic Rates for Immunoreagent Quality Screening
Monica A. Kapil (Berkeley, Bioengineering)

Antibodies are critical reagents used every day, from bench to bedside, in approaches such as immunohistochemistry (IHC), protein arrays, and immunoprecipitation (IP) to detect antigens of interest for a variety of research and diagnostic applications. These approaches, however, suffer from problems with accuracy, due to non-specific or weak binding, and with reproducibility, due to variability in antibodies; both problems are rooted in poor antibody selection. An antibody with high association and low dissociation for a given analyte of interest will attach target antigens well, yielding a greater signal and thus a more confident and accurate result. Therefore, technologies that can validate the selection of reliable and consistent antibodies against uncharacterized proteins of interest are greatly needed. Selecting antibodies based on their antigen-binding kinetic properties, such as their association and dissociation rate constants, kon and koff, provides a quantitative metric that can further optimize and validate immunoreagent selection. These rate constants quantify the ability of an antibody to associate (bind) or dissociate (unbind) with a target analyte and determine inherent binding strength. A metric such as this has the power to eliminate many of the problems seen in antibody-based approaches and can inform assay design and improve overall performance. My goal is therefore to introduce a rapid, quantitative quality-assessment assay for the screening and selection of improved immunoreagents.

Expanding the Range of microWestern Blotting to Low Molecular Mass Species
Rachel Gerver (Berkeley, Bioengineering)

Bench-top Western blotting is time consuming, involving multiple steps, transfers, and long incubation times. Nevertheless, Western blotting is a workhorse assay, used for high-specificity protein detection in complex proteinaceous fluids. By reinventing Western blotting with microfluidic design, the Herr Lab significantly reduces assay times and enables quantitation while reducing sample consumption. Here, we expand the capabilities of our recently developed microfluidic Western blot (μWestern) to advance the detection of important yet difficult-to-detect low molecular mass proteins.

Restricting Brain Tumor-Initiating Cell Motility By Rewiring Cell-Matrix Mechanosensing
Sophie Wong (Berkeley, Bioengineering)

Glioblastoma multiforme (GBM) is the most aggressive primary brain tumor and is characterized by poor survival even in the setting of surgery, radiation, and chemotherapy. While GBM tumors are very heterogeneous, recent work has shown that a specific subpopulation of tumor cells – so-called “brain tumor-initiating cells” (BTICs, also called brain tumor stem cells) – is particularly resistant to conventional treatments, is uniquely capable of initiating new GBM-like tumors following transplantation, and may directly participate in the invasive process. Thus, there is tremendous interest in identifying microenvironmental factors that regulate BTIC self-renewal and motility. Specifically, studies have shown that Rho family GTPases can influence brain tumor progression, but results from these studies have been contradictory. Here, we investigate the sensitivity of BTICs to extracellular matrix (ECM) mechanics and the role of Rho GTPases in regulating this sensitivity.

We cultured BTIC lines on laminin-coated polyacrylamide ECMs ranging in stiffness from 80 Pa to 119 kPa. We found no significant differences in spreading area, random motility speed, or proliferation as a function of ECM stiffness. Remarkably, all BTIC lines proliferated robustly, underwent rapid mesenchymal motility, and developed vinculin-positive adhesions on extremely soft ECMs (~80 Pa) normally regarded as non-permissive for spreading, adhesion maturation, and migration. Lentiviral transduction of BTICs with a constitutively active (CA) mutant of RhoA strongly restored mechanosensitivity, abrogating BTIC spreading and migration on soft ECMs of stiffness comparable to brain tissue. Administration of the myosin II inhibitor blebbistatin offset CA RhoA-mediated mechanosensitivity and restored BTIC spreading and migration on soft ECMs. AFM measurements suggest that BTICs are very soft on both stiff and soft ECMs, thus failing to exhibit normal tensional homeostasis. These findings are consistent with a model in which BTICs evade the anti-tumorigenic effects of soft ECMs but may be rendered susceptible through activation of RhoA-dependent cell contractility.

In summary, the spreading, motility, and proliferation of BTICs are much less sensitive to ECM-based biomechanical cues than continuous culture models of GBM, with BTICs overcoming soft ECM-induced limitations on cell migration and proliferation. These results provide the first evidence that BTICs can uniquely resist mechanical suppression of motility and other culture behaviors relevant to tumor spread, in the same way they can resist radio- and chemotherapy. In addition, motility on ECMs of stiffness comparable to brain tissue may be restricted by forcing activation of myosin-dependent contractility, implying that proteins in this pathway may represent a novel set of therapeutic targets.

Sakshi Jain (Berkeley, Computer Security)

As the usage and importance of social networking sites grows, so does our need for usable and secure authentication systems. In this paper, we propose an authentication system for Facebook using your Facebook activity. The proposed system tackles some of the problems in existing systems like a) people forgetting their passwords/ answers to secret questions and b) users having the same set of passwords for multiple sites. We perform a thorough user survey of the proposed system using Amazon Mechanical Turk to study the usability and reliability of the system.