Information Retrieval Module

This module covers the study, design and analysis of IR systems that efficiently and effectively process, mine, search, cluster and classify big-data document collections drawn from textual and other unstructured domains. The lectures analyze the main components of a modern search engine: Crawler, Parser, Compressor, Indexer, Query resolver, Query and Document annotator, and Results Ranker. The course also digs into basic algorithmic techniques for data compression, indexing and sketching that are now ubiquitous in IR applications. It then describes a few other IR tools that build upon those techniques and are used either as components of a search engine or as independent tools, such as Classification, Clustering, Recommendation, Random Sampling and Locality Sensitive Hashing. Further information can be found at http://didawiki.di.unipi.it/doku.php/magistraleinformatica/ir/ir18/start

It is part of the Master in Big Data Analytics & Social Mining at the University of Pisa (https://www.masterbigdata.it).

The author did not intend to violate any copyright on figures or content. If you are the legal owner of any copyrighted content, please contact info@sobigdata.eu and we will remove it immediately.

Data and Resources
  • Introduction (PDF)

    This lecture provides an introduction to Information Retrieval

  • Parsing (PDF)

    This lecture focuses on Parsing

  • Crawling (PDF)

    This lecture focuses on Crawling

  • Query Processing (PDF)

    This lecture focuses on Query Processing

  • Index Construction: Sorting (PDF)

    This lecture introduces Index Construction focusing on Sorting

  • Random Walks, Ranking and Summarisation I (PDF)

    This lecture is the first part of a focus on Random Walks, Ranking and...

  • Random Walks, Ranking and Summarisation II (PDF)

    This lecture is the second part of a focus on Random Walks, Ranking and...

  • Topic Annotation: Concepts and Knowledge Graphs (PDF)

    This lecture focuses on Topic Annotation, exploring Concepts and Knowledge...

  • Document Compression and Clustering (PDF)

    This lecture focuses on Document Compression and Clustering

  • Document Compression and Clustering II (PDF)

    This lecture is the second part of a focus on Document Compression and...

Additional Info
Field Value
Availability On-Site
Course UNIPI Master in Big Data Analytics & Social Mining
Keywords Crawling
Keywords Information Retrieval
Keywords Search Engine
Keywords Web
Keywords Data compression
Keywords PageRank
Keywords Algorithm and Data Structure
Keywords Inverted Lists
Keywords Text Statistical Properties
Keywords Parsing
Keywords NLP RAKE
Keywords Web-graph
Keywords BFS
Keywords Query Processing
Keywords Query auto-completion
Keywords Soundex
Keywords IR Evaluation
Keywords Indexing
Keywords Dynamic Indexing
Keywords Document Ranking
Keywords Recommendation System
Keywords Topic Annotation
Keywords TagMe
Keywords Text Comparison
Keywords User Profiling
Keywords Document Compression
Keywords Document Replication
Length 320 slides for a 40-hour module
Lesson number 10
Prerequisites Basic Algorithms course; Basic notions of Statistics and functional analysis
Provider Institution UNIPI
Target users Social Scientists
Target users Data Scientists
Target users PhD Students
Target users Other
Thematic Cluster Text and Social Media Mining [TSMM]
Thematic Cluster Web Analytics [WA]
Training material typology Slides
system:type TrainingMaterial
Management Info
Field Value
Author BRAGHIERI MARCO
Maintainer BRAGHIERI MARCO
Version 1
Last Updated 8 October 2021, 13:05 (CEST)
Created 29 June 2018, 11:34 (CEST)