Glean

| INTRODUCTION | PEOPLE | DOCUMENTS | SOFTWARE |

Introduction

The Glean project focuses on performing analytics at scale over Big Data. The datasets we consider are on the order of petabytes and encompass billions of files representing trillions of observations, measurements, or simulation datapoints. Glean achieves this by combining innovations in large-scale storage systems, cloud computing, machine learning, and statistics. A particular focus of this effort is real-time analytics over streaming data representing time-series observations.

The data we consider include those generated by radars and satellites, medical sensors, and epidemiology simulations. The data are voluminous, have diverse formats, and are produced at different rates.

Key features

  • Alleviation of storage and retrieval hotspots for petascale datasets
  • Support for probabilistic and approximate queries
  • Use of ensemble methods
  • Visual analytics
  • Significance evaluation and hypothesis testing
  • Kernel density estimation
  • Support for variants of the MapReduce cloud programming model
  • Efficient orchestration of processing loads
  • Support for radial, proximity, and geometry constraints
  • Scalable anomaly detection and autonomous adaptation to evolving data
  • Forecasts based on multiple linear regression with LASSO and ridge regularizations
  • Parameter space exploration using Latin Hypercube and Monte Carlo sampling
  • Support for density-based (EM and GMMs) and distance-based clustering
  • Tuning of, and forecasts based on, artificial neural networks and multistage neural networks
  • Probabilistic classifications based on logistic regression
  • Use of random forests
  • Support for Allen’s Interval Algebra in time-series queries
  • Conditional probability and naïve Bayes classification
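
As an illustration of one item above, the relations of Allen's Interval Algebra can be sketched in a few lines. This is a minimal, hypothetical example and not Glean's implementation or API: the `Interval` type and the `allen_relation` function are names assumed here for illustration, and intervals are assumed to be closed with start strictly less than end.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A time interval with start < end (hypothetical helper type)."""
    start: float
    end: float


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the one Allen relation (out of 13) that holds from a to b."""
    if not (a.start < a.end and b.start < b.end):
        raise ValueError("intervals must satisfy start < end")
    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.end == b.start:
        return "meets"
    if b.end == a.start:
        return "met-by"
    # Intervals that share one or both endpoints.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    # Strict containment in either direction.
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Only partial overlap remains.
    return "overlaps" if a.start < b.start else "overlapped-by"
```

A time-series query such as "observations recorded during a given storm window" would, under these assumptions, keep the observation intervals for which `allen_relation(observation, window)` returns `"during"`.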

Project news

The approximate queries paper will appear in IEEE Transactions on Cloud Computing.

The analytics queries paper appears in IEEE Transactions on Knowledge and Data Engineering.

The paper on spatially constrained queries appears in the IEEE CiSE special issue on Extreme Data, Vol. 16(4).

Department of Computer Science
Colorado State University