High Performance and Scalable Analytics Module

Big data mining has become an active research area. Current analytical methods and software tools, when run on a single personal computer, cannot efficiently handle very large datasets. Distributed computing platforms offer a scalable solution for big data mining: a large problem is divided into smaller ones that are solved concurrently by many processors or machines. This course teaches the basic theoretical concepts behind the MapReduce distributed computing paradigm, and Hadoop in particular, and builds expertise in the practical use of high-performance computing tools for data engineering, analysis and mining. In particular, students will learn how classical data mining algorithms can be applied to big data using Hadoop (Spark and MLlib).
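The divide-and-combine idea behind MapReduce can be illustrated with a minimal word-count sketch in plain Python. It is not taken from the course material: the function names (`split_into_chunks`, `map_chunk`, `merge_counts`, `word_count`) are hypothetical, and a local thread pool merely stands in for the cluster of machines that Hadoop or Spark would coordinate.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(lines, n_chunks):
    """Divide the input into at most n_chunks parts of roughly equal size."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_chunk(chunk):
    """Map phase: count the words of one chunk independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce phase: combine two partial results into one."""
    return left + right


def word_count(lines, n_workers=4):
    """Count words by mapping over chunks concurrently, then reducing."""
    chunks = split_into_chunks(lines, n_workers)
    # Assumption: threads stand in for cluster nodes; in CPython they
    # illustrate the structure rather than giving a real speed-up.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partial_counts, Counter())
```

For example, `word_count(["big data", "big mining"])["big"]` evaluates to 2. Because each chunk is processed without reference to the others, the map phase could in principle be moved to separate machines, which is the property that distributed platforms exploit.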

This module is part of the Master in Big Data Analytics & Social Mining at the University of Pisa (https://www.masterbigdata.it).

The author did not intend to violate any copyright on figures or content. If you are the legal owner of any copyrighted content, please contact info@sobigdata.eu and we will remove it immediately.

Data and Resources
  • Introduction to Parallel Computing (PDF)

    This lecture provides an introduction to parallel computing.

  • Introduction to Hadoop (PDF)

    This lecture focuses on providing an introduction to Hadoop, an open-source...

  • Hadoop Patterns (PDF)

    This lecture focuses on Hadoop Patterns

  • Remote Connection and HDFS (PDF)

    This lecture focuses on Remote Connection and the Hadoop Distributed File...

  • Exercises for Remote Connection and HDFS Lecture (ZIP)

    This .zip file contains an exercise to be carried out while exploring the...

  • Introduction to Spark (PDF)

    This lecture provides an introduction to Spark, which consists of a driver...

  • Exercises for Introduction to Spark (ZIP)

    This .zip file contains exercises to be carried out while exploring the...

  • Introduction to Spark SQL (PDF)

    This lecture provides an introduction to Spark SQL, Relational Data...

  • Exercises for Introduction to Spark SQL (ZIP)

    This .zip file contains an exercise to be carried out while exploring the...

  • Hadoop Ecosystem and Architecture (PDF)

    This lecture focuses on Hadoop Ecosystem and Architecture

  • Data Mining with Spark (MLLIB) (PDF)

    This lecture focuses on Data Mining with Spark (MLLIB), which is Spark's...

  • Exercises for Data Mining with Spark (MLLIB) (ZIP)

    This .zip file contains an exercise to be carried out while exploring the...

Additional Info
Field                        Value
Availability                 On-Site
Course                       UNIPI Master in Big Data Analytics & Social Mining
Keywords                     Big Data, Social Mining, Distributed, Real-use Cases, Parallel Computing, Hadoop, Spark, Spark SQL, Machine Learning, Correlation, Support Vector Machines, Regression, Classification, Clustering, Word2Vec
Length                       418 slides, 4 exercise repositories
Lesson number                9
Prerequisites                None
Provider Institution         ISTI-CNR
Target users                 Social Scientists, Data Scientists, Professionals, Other
Thematic Cluster             Text and Social Media Mining [TSMM], Social Network Analysis [SNA], Human Mobility Analytics [HMA], Web Analytics [WA], Social Data [SD]
Training material typology   Slides, Other
system:type                  TrainingMaterial
Management Info
Field          Value
Author         BRAGHIERI MARCO
Maintainer     BRAGHIERI MARCO
Version        1
Last Updated   8 October 2021, 13:11 (CEST)
Created        29 June 2018, 11:34 (CEST)