Showing posts from April, 2018

Big Data

Big Data Analytics

Introduction

The 5 Vs of Big Data: Volume, Velocity, Variety, Veracity, Value.

Big Data Infrastructure (Slide P38)

Reliable distributed file system: data is kept in “chunks” spread across machines, and each chunk is replicated on different machines, so recovery from a machine or disk failure is seamless.

Bring computation directly to the data: chunk servers also serve as compute servers.

Societal concerns: privacy, algorithmic black boxes, filter bubbles.

MapReduce

Programming model: MapReduce is both a programming model and a parallel, data-aware, fault-tolerant implementation.

Mappers (Slide P11): The Map function takes an input element as its argument and produces zero or more key-value pairs. The types of keys and values are each arbitrary. Further, keys are not “keys” in the usual sense; they do not have to be unique. Rather, a Map task can produce several key-value pairs with the same key, even from the same element.

Reducers (Slide P11): The Reduce function’s argument is a p