Data Mining & Knowledge Discovery (8ΕΠ01)
Instructor: Vassilis Plagianakos
Assistant: Pigadas Vasileios
Course type: Elective
Semester: 8
Term: Spring Semester
ECTS: 5
Teaching hours: 3
Laboratory hours
Description
The course provides an introduction to data mining and knowledge discovery from data. The key data mining methods of clustering, classification and prediction are illustrated, together with practical tools for their execution. Next, we focus on particular aspects of Big Data, such as high volume, high dimensionality and high frequency, and incorporate tools built to deal with such data (dimensionality reduction, incremental clustering) into data mining methodologies. Finally, the key methods for Big Data sensing and acquisition are discussed, together with the basics of applications in social media mining, text mining and biomedicine. We conclude with an introduction to Big Data visualization.
Syllabus:
1. Data mining and the knowledge discovery process. Overview of data mining and machine learning techniques. Exemplar studies in clustering, classification and pattern mining.
2. Clustering. Taxonomy of clustering concepts: distance-based (separation, centroids, contiguity), density-based, partitional vs. hierarchical. Methods for centroid-based clustering (k-means), hierarchical clustering (agglomerative and divisive), density-based clustering (DBSCAN).
3. Classification and prediction models. Model learning and model validation. Explanation vs. prediction. Rule-based classifiers and decision trees. Naïve Bayes classifiers. Basic machine learning models (k-nearest neighbors, linear discriminant analysis, support vector machines, ensemble methods).
4. Dimensionality reduction in Big Data (PCA, random projection, parallelized methods).
5. Pattern mining and association rules. Apriori principle. Mining high-frequency patterns and high-confidence rules. Interestingness measures for patterns and rules.
6. Big Data and social sensing. Big Data acquisition. Web scraping, crawling, crowdsourcing, crowdsensing. Big Data technologies and platforms.
7. Social media mining and text mining. Listening to social media sources. Monitoring social trends. Basics of opinion mining and sentiment analysis. Recommender systems.
8. Applications in biomedicine. Population genomics, DNA sequence data mining.
9. Data visualization and visual analytics. Basics of visual representation of data: hierarchies, networks, maps, time series, spatio-temporal data, text. Exemplar case studies.
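As a small illustration of the centroid-based clustering named in syllabus item 2, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not taken from the course material: the function name `kmeans`, the toy data points and the choice of initial centroids are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Illustrative k-means: alternate assignment and centroid update.

    `points` and `centroids` are sequences of equal-length numeric tuples.
    The initial centroids are supplied by the caller (an assumption of this
    sketch; practical tools usually pick them randomly or with k-means++).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(xs) / len(members) for xs in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

On this toy input the first three points end up in one cluster and the last three in the other. Real data sets are rarely this well separated, so the result generally depends on the initial centroids.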
Course objectives
  • To enable students to understand, select and use appropriate data mining methodologies.
  • To introduce the students to the basic concepts of Big Data analytics.
  • To give the students experience of using clustering algorithms along with dimensionality reduction in practice.
  • To give the students experience in clustering high-frequency data streams.
  • To extend students’ knowledge of real-life Big Data applications and visualization.
Textbooks/Bibliography
  • Data Mining, Εισαγωγικά και Προηγμένα Θέματα Εξόρυξης Γνώσης από Δεδομένα, Margaret H. Dunham, ΕΚΔΟΣΕΙΣ ΝΕΩΝ ΤΕΧΝΟΛΟΓΙΩΝ ΜΟΝ. ΕΠΕ, 1η/2004, ΑΘΗΝΑ
  • Εισαγωγή στην Εξόρυξη Δεδομένων και τις Αποθήκες Δεδομένων, Αλ. Νανόπουλος - Γ. Μανωλόπουλος, ΕΚΔΟΣΕΙΣ ΝΕΩΝ ΤΕΧΝΟΛΟΓΙΩΝ ΜΟΝ. ΕΠΕ, 1η/2008, ΑΘΗΝΑ
  • Εξόρυξη Γνώσης από Βάσεις Δεδομένων και τον Παγκόσμιο ιστό, Βαζιργιάννης Μιχάλης, Χαλκίδη Μαρία, Γ. ΔΑΡΔΑΝΟΣ - Κ. ΔΑΡΔΑΝΟΣ Ο.Ε., 2η έκδ./2005, ΑΘΗΝΑ
  • Εισαγωγή στην εξόρυξη δεδομένων, Tan Pang-Ning, Steinbach Michael, Kumar Vipin, ΕΚΔΟΣΕΙΣ Α. ΤΖΙΟΛΑ & ΥΙΟΙ Α.Ε., 1η έκδ./2010, ΘΕΣ/ΝΙΚΗ
Assessment method
Written examination at the end of the semester and optional tasks.