Data Mining & Knowledge Discovery

Course ID
8ΕΠ01
Level
Undergraduate
Type
Optional (compulsory)
Semester
8
Period
Spring Semester
ECTS
5
Lecture Hours
3
Laboratory Hours
-

Description

The course provides an introduction to data mining and knowledge discovery from data. The key data mining methods of clustering, classification and prediction are illustrated, together with practical tools for their execution. Next, we focus on particular aspects of Big Data, such as high volume, high dimensionality and high frequency, and incorporate tools built to deal with such structures (dimensionality reduction, incremental clustering) into data mining methodologies. Finally, the key methods for Big Data sensing and acquisition are discussed, together with the basics of applications in social media mining, text mining and biomedicine. We conclude with an introduction to Big Data visualization.

Syllabus:

1. Data mining and the knowledge discovery process. Overview of data mining and machine learning techniques. Exemplar studies in clustering, classification and pattern mining.
2. Clustering. Taxonomy of clustering concepts: distance-based (separation, centroids, contiguity), density-based, partitional vs. hierarchical. Methods for centroid-based clustering (k-means), hierarchical clustering (agglomerative and divisive), density-based clustering (DBSCAN).
3. Classification and prediction models. Model learning and model validation. Explanation vs. prediction. Rule-based classifiers and decision trees. Naïve Bayes classifiers. Basic machine learning models (k-nearest neighbors, linear discriminant analysis, support vector machines, ensemble methods).
4. Dimensionality reduction in Big Data (PCA, random projection, parallelized methods).
5. Pattern mining and association rules. Apriori principle. Mining frequent patterns and high-confidence rules. Interestingness measures for patterns and rules.
6. Big Data and social sensing. Big Data acquisition. Web scraping, crawling, crowdsourcing, crowdsensing. Big Data technologies and platforms.
7. Social media mining – Text mining. Listening to social media sources. Monitoring social trends. Basics of opinion mining and sentiment analysis. Recommender systems.
8. Applications in biomedicine. Population genomics, DNA sequence data mining.
9. Data visualization and visual analytics. Basics of visual representation of data: hierarchies, networks, maps, time series, spatio-temporal data, text. Exemplar case studies.
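
As an illustration of the Apriori principle named in syllabus item 5, the following is a minimal sketch, not course material. It assumes that support is counted as the number of transactions containing an itemset; the function name `frequent_itemsets`, the `min_support` parameter and the shopping-basket data are hypothetical and chosen only for the example. The principle states that every subset of a frequent itemset is itself frequent, so any candidate with an infrequent subset can be discarded before it is counted.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori principle): keep a candidate only if every
        # (k-1)-subset of it is already known to be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the surviving candidates against the transactions.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        result.update(current)
        k += 1
    return result


# Hypothetical toy data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
itemsets = frequent_itemsets(baskets, min_support=3)
for items, count in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(items), count)
```

With this toy data, the candidates {bread, diapers, beer} and {milk, diapers, beer} are pruned without scanning the transactions, because {bread, beer} and {milk, beer} are already known to be infrequent.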

Course objectives

  • To enable students to understand, select and use appropriate data mining methodologies.
  • To introduce the students to the basic concepts of Big Data analytics.
  • To give the students practical experience in using clustering algorithms together with dimensionality reduction.
  • To give the students experience in clustering high-frequency data streams.
  • To extend students’ knowledge of real-life Big Data applications and visualization.

Textbooks/Bibliography

  • Data Mining: Introductory and Advanced Topics in Knowledge Mining from Data (in Greek), Margaret H. Dunham, ΕΚΔΟΣΕΙΣ ΝΕΩΝ ΤΕΧΝΟΛΟΓΙΩΝ ΙΔΙΩΤΙΚΗ ΚΕΦΑΛΑΙΟΥΧΙΚΗ ΕΤΑΙΡΕΙΑ, 1st ed., 2004, Athens, 395
  • Introduction to Data Mining and Data Warehouses (in Greek), Al. Nanopoulos, Y. Manolopoulos, ΕΚΔΟΣΕΙΣ ΝΕΩΝ ΤΕΧΝΟΛΟΓΙΩΝ ΙΔΙΩΤΙΚΗ ΚΕΦΑΛΑΙΟΥΧΙΚΗ ΕΤΑΙΡΕΙΑ, 1st ed., 2008, Athens, 3079
  • Data Mining and Analysis: Fundamental Concepts and Algorithms (in Greek), Mohammed J. Zaki, Wagner Meira Jr., ΕΚΔΟΣΕΙΣ ΚΛΕΙΔΑΡΙΘΜΟΣ ΕΠΕ, 1st ed., 2017, Athens, 68386089
  • Introduction to Data Mining (in Greek), Pang-Ning Tan, Michael Steinbach, Vipin Kumar, ΕΚΔΟΣΕΙΣ Α. ΤΖΙΟΛΑ & ΥΙΟΙ Α.Ε., 1st ed., 2010, Thessaloniki, 18549105

Assessment method

Written examination at the end of the semester, plus optional assignments.