|
The Glean project focuses on analytics at scale over Big Data. The datasets we consider are on the order of petabytes and encompass billions of files representing trillions of observations, measurements, or simulation data points. Glean achieves this by combining innovations in large-scale storage systems, cloud computing, machine learning, and statistics. A particular focus of this effort is real-time analytics over streaming data representing time-series observations. The data we consider include those generated by radars and satellites, medical sensors, and epidemiology simulations. The data are voluminous, have diverse formats, and are produced at different rates.
|
|
The Approximate Queries paper will appear in the IEEE Transactions on Cloud Computing journal. The Analytics Queries paper appears in IEEE Transactions on Knowledge and Data Engineering. The paper on spatially constrained queries appears in the IEEE CiSE special issue on Extreme Data, Vol. 16(4).
|
|