ML in the Next-Generation Cloud

As AI grows increasingly popular, Machine Learning (ML) is becoming essential to many research areas, including computer vision, speech recognition, and natural language processing. Deploying ML models at scale typically requires knowledge of virtualization, networking, and the underlying infrastructure. The scale and complexity of ML workflows make it hard to provision and manage resources, a burden that hinders the productivity and effectiveness of ML practitioners. Meanwhile, cloud computing has changed the way we build software and solutions. First-generation “serverful” cloud services (e.g., IaaS, PaaS, and SaaS) provide efficient and economical platforms for deploying ML models. Recently, following in the footsteps of traditional cloud computing, the next-generation cloud, “serverless” computing, represented by AWS Lambda and Google Cloud Functions, has emerged as a burgeoning computation model that further reduces cost and improves manageability.
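
As a minimal sketch of the serverless approach, the Python handler below serves predictions from an AWS Lambda-style function. The bucket name, object key, and the use of a pickled scikit-learn model are illustrative assumptions, not part of any specific deployment.

    # Hypothetical AWS Lambda handler serving a pickled scikit-learn model.
    # Bucket and key names are illustrative placeholders.
    import json
    import pickle

    import boto3

    s3 = boto3.client("s3")
    _model = None  # cached across warm invocations to avoid repeated downloads


    def _load_model():
        global _model
        if _model is None:
            obj = s3.get_object(Bucket="ml-models", Key="model.pkl")
            _model = pickle.loads(obj["Body"].read())
        return _model


    def handler(event, context):
        """Entry point: expects {"features": [[...], ...]} in the request body."""
        body = json.loads(event.get("body", "{}"))
        model = _load_model()
        preds = model.predict(body["features"])
        return {"statusCode": 200,
                "body": json.dumps({"predictions": preds.tolist()})}

Because the platform scales such functions on demand, the practitioner pays only per invocation and never manages a server, which is the cost and manageability benefit described above.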

AI Over Big Data

Today, users are building ever deeper and more complex neural networks to take advantage of the massive amount of data they have access to. In practice, Big Data clusters (e.g., Apache Hadoop or Apache Spark) are ubiquitously deployed as the global data platform, where all production data are stored and made available to all users. This project enables powerful distributed Deep Learning (DL) on Big Data stacks, with benefits such as easy integration with other Big Data components and local data access on existing clusters. As shown in the figure, the platform provides comprehensive support for deep learning functionality (neural network operations, layers, losses, and optimizers). Users can directly run existing models defined in other frameworks (e.g., TensorFlow, Keras, Caffe, and PyTorch) on Big Data clusters (e.g., Hadoop and Spark) in a distributed fashion.
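
The project's own API is not reproduced here, but the generic PySpark sketch below illustrates the underlying idea: broadcast a pre-trained PyTorch model's weights to the executors and run inference in parallel with mapPartitions. The toy network and sample data are assumptions for illustration only.

    # Generic sketch: distributed inference of a pre-trained PyTorch model
    # on a Spark cluster (not this project's actual API).
    import torch
    import torch.nn as nn
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("distributed-inference").getOrCreate()
    sc = spark.sparkContext

    def build_model():
        # Illustrative two-layer network; a real model would be loaded from a file.
        return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

    # Broadcast only the weights; each executor rebuilds the model locally.
    weights = {k: v.numpy() for k, v in build_model().state_dict().items()}
    bc_weights = sc.broadcast(weights)

    def predict_partition(rows):
        model = build_model()
        model.load_state_dict({k: torch.tensor(v) for k, v in bc_weights.value.items()})
        model.eval()
        with torch.no_grad():
            for features in rows:
                x = torch.tensor(features, dtype=torch.float32).unsqueeze(0)
                yield model(x).squeeze(0).tolist()

    samples = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]  # toy feature vectors
    predictions = sc.parallelize(samples).mapPartitions(predict_partition).collect()

In a real deployment the records would come from data already resident on the Hadoop or Spark cluster, so inference runs where the data lives instead of moving the data to a separate ML cluster.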

In-Memory Computing

The emerging field of memory-intensive computing has attracted interest from both industry and academia, driven largely by emerging non-volatile memory (NVM) technologies. Memory-intensive computing targets Machine Learning (ML) applications in particular, leveraging their unique properties to improve distributed performance by orders of magnitude. ML applications process large volumes of data from disk drives and thus suffer high latency from disk access delays. This project proposes a hybrid NVM-based computing architecture with effective data sharing and communication strategies to optimize file management, resource allocation, and data communication for ML applications. The research centers on two key designs: 1) a new file and data management system built on a hybrid NVM pool consisting of byte- and block-addressable devices; 2) efficient data sharing and communication management among memories to guarantee data consistency in the hybrid NVM memory pool.
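
As a highly simplified illustration of the first design point, the sketch below routes small objects to a byte-addressable device and larger ones to a block-addressable device. The mount paths, the 4 KB cutoff, and the class itself are hypothetical, not the project's actual policy.

    # Hypothetical placement policy over a hybrid NVM pool: small objects go to
    # the byte-addressable device, large objects to the block-addressable device.
    import os

    BYTE_DEV = "/mnt/nvm-byte"    # e.g., DAX-mounted persistent memory (illustrative)
    BLOCK_DEV = "/mnt/nvm-block"  # e.g., NVMe SSD (illustrative)
    BLOCK_THRESHOLD = 4096        # assumed size cutoff, in bytes


    class HybridNVMStore:
        """Toy file/data manager over a byte- and block-addressable NVM pool."""

        def put(self, name: str, data: bytes) -> str:
            root = BYTE_DEV if len(data) < BLOCK_THRESHOLD else BLOCK_DEV
            path = os.path.join(root, name)
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # persist before acknowledging, for consistency
            return path

        def get(self, name: str) -> bytes:
            for root in (BYTE_DEV, BLOCK_DEV):
                path = os.path.join(root, name)
                if os.path.exists(path):
                    with open(path, "rb") as f:
                        return f.read()
            raise FileNotFoundError(name)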

Real-Time Stream Processing

Today, there is an increasing demand for real-time stream data analytics from website statistics/analytics, e-commerce, social media, and many other practical applications. For example, real-time information from Twitter and Facebook is used to analyze stock market changes and to warn of natural hazards. Correspondingly, many popular stream processing systems have been developed, following one of two models: the record-by-record model and the micro-batch model. In this project, we focus on the following challenging question: how to best store, manage, and process data records in batch-based streaming systems to provide high service availability and low latency.
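
To make the two execution models concrete, the toy sketch below processes the same in-memory record source both record-by-record and in micro-batches; the batch size and interval are arbitrary choices for illustration.

    # Toy contrast of the two stream-processing models on a shared record source.
    import time
    from collections import deque

    def process(records):
        print(f"processed batch of {len(records)}: {records}")

    # Record-by-record model: handle each record as soon as it arrives
    # (lowest latency, per-record overhead on every operation).
    source = deque(f"record-{i}" for i in range(10))
    while source:
        process([source.popleft()])

    # Micro-batch model: buffer records and process them at a fixed interval
    # (higher throughput per batch; latency is bounded by the batch interval).
    source = deque(f"record-{i}" for i in range(10))
    BATCH_SIZE, BATCH_INTERVAL = 4, 0.1  # arbitrary values
    while source:
        batch = [source.popleft() for _ in range(min(BATCH_SIZE, len(source)))]
        process(batch)
        time.sleep(BATCH_INTERVAL)

How the buffered records of each micro-batch are stored, replicated, and recovered after a failure is exactly where the storage and availability question above arises.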

Green Datacenter

Today, major cloud service operators have taken various initiatives to partially power their datacenters with renewable energy. Google, Facebook, and Apple have started to build their own green power plants to support the operation of their datacenters. Researchers envision that, in the near future, datacenters, or at least micro-clouds, can be completely powered by renewable energy and become self-sustainable. Most green power plants use wind turbines and/or solar panels for power generation. Unlike traditional energy, the availability of green energy varies widely with the time of day, the season of the year, and the geographical location of the power plant. Such intermittency makes it very hard for sustainable datacenters to use green energy effectively. This project develops elastic power-aware resource provisioning approaches and flexible workload placement policies that aim to maximize overall system performance by effectively prioritizing the power budget of workloads with respect to the dynamic green power supply.
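
As a simplified illustration of power-budget prioritization, the greedy sketch below grants power to workloads in priority order, capped by the currently available green supply. The workloads, priorities, and supply values are invented for the example and do not reflect the project's actual policies.

    # Simplified greedy power-budget allocation under a dynamic green supply.
    # Workloads, priorities, and supply values are illustrative, not measured data.
    from dataclasses import dataclass

    @dataclass
    class Workload:
        name: str
        priority: int        # higher = more latency-critical
        demand_watts: float

    def allocate(workloads, green_supply_watts):
        """Grant power in priority order until the green supply is spent."""
        grants = {}
        remaining = green_supply_watts
        for w in sorted(workloads, key=lambda w: w.priority, reverse=True):
            grant = min(w.demand_watts, remaining)
            grants[w.name] = grant
            remaining -= grant
        return grants

    workloads = [
        Workload("web-frontend", priority=3, demand_watts=120.0),
        Workload("ml-training", priority=2, demand_watts=150.0),
        Workload("batch-analytics", priority=1, demand_watts=200.0),
    ]

    # Re-run the policy whenever the wind/solar supply level changes.
    for supply in (400.0, 250.0, 100.0):
        print(supply, allocate(workloads, supply))

Under this policy, as the supply drops, deferrable work such as batch analytics loses its budget first while latency-critical services keep running, which matches the intuition behind prioritizing the power budget against an intermittent supply.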