Computing Variance Online

In my previous post, A Fable of an Aggregator, I dedcribed the properties of an abstract data type (ADT) that enables concurrent execution of aggregations, such as sum, mean, max. For example, if we want the mean of a collection of values, it sufficies for us to accumluate its sum and count - dividing the former by the latter gives us the answer. More importantly, this accumulation can be done concurrently - and hence, it’s parallelizable. ...

September 5, 2020 · guidj

A fable of an Aggregator

Many parallel data computing tasks can be solved with one abstract data type (ADT). We will describe how an Aggregator does that by walking through a problem we want to solve with parallelism and uncovering the ideal properties of an ADT that enable us to do so. Relevance of Aggregations: The Desirable ADT In the world of analytics and machine learning, data processing makes up a significant chunk of the plumbing required to do both. In the world of big data, or medium-sized data for that matter, parallel processing enables efficient usage of disparate computing resources. Quite frequently, the data we’re referring to is represented by a collection of records. ...

June 13, 2020 · guidj