|Short description: Machine learning is the science of making computers act without being explicitly programmed. Instead, algorithms are used to find patterns in data. It is so pervasive today that you probably use it dozens of times a day without knowing it, for instance in web search, speech recognition, and (soon) self-driving cars. It is also a crucial component of data-driven industry (Big Data), scientific discovery, and modern healthcare. |
Learning objectives: In this class, you will learn the foundations of how data mining and machine learning work internally, understand when and how to use key concepts and techniques, and gain hands-on experience in getting them to work for yourself. You'll learn about the theoretical underpinnings of data analysis and use them to tackle new problems quickly and effectively.
Upon completion of this course you will be able to:
- Identify and classify data mining problems.
- Understand the theoretical foundations of data analysis.
- Build and evaluate predictive models and clusterings.
- Use data mining tools such as R and Python to build machine learning systems.
While there are no strict requirements, it is highly recommended to have a working knowledge of statistics, and to have programming experience. Programming is part of the assignments. The course will mostly feature examples from R and Python.
Entrance requirements
Entrance requirements tests: -
Assumed previous knowledge
This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include:
- Similarity and distances for non-Euclidean data (e.g., text documents)
- Efficient nearest neighbor search in non-Euclidean spaces
- Challenges of high-dimensional data analysis; metric embeddings and dimensionality reduction
- Unsupervised learning (clustering, hierarchical clustering, clustering in metric spaces)
- Supervised learning (classification, decision trees, Bayesian learners, support vector machines, kernel methods, ensemble methods)
- Evaluation of predictive models (cross-validation, overfitting, ROC space, bias/variance theory)
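As a small illustration of the first topic above (similarity and distances for text documents), the following is a minimal sketch, not course material. It assumes a simple bag-of-words representation and uses cosine distance, which is only one of several possible choices; the function name `cosine_distance` is hypothetical.

```python
import math
from collections import Counter


def cosine_distance(doc_a: str, doc_b: str) -> float:
    """Return 1 minus the cosine similarity of two word-count vectors.

    Assumption: documents are tokenized by lowercasing and splitting on
    whitespace, which is a deliberately naive choice for illustration.
    """
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())

    # Dot product over the words that occur in both documents.
    shared_words = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared_words)

    # Euclidean norms of the two count vectors.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty document has no direction; treat it as maximally distant.
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)
```

Documents with identical word counts get a distance of (approximately) zero, and documents with no words in common get a distance of one. Note that cosine distance does not satisfy the triangle inequality, so it is not a metric in the strict sense.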
Previous knowledge can be gained by
Resources for self study
|Bachelor College or Graduate School||Required materials - Recommended materials|
|Charu Aggarwal: Data Mining - The Textbook|
|Course materials will be provided throughout the course via Sakai.|
|Hopcroft, Kannan: Computer Science Theory for the Information Age|
|Matousek: Embedding Finite Metric Spaces into Normed Spaces, from Lectures on Discrete Geometry.|
|Peter Flach: Machine Learning. https://www.cs.bris.ac.uk/~flach/mlbook/|
|Lecture with notebook / PC|
|Number of opportunities||1|
|Test duration in minutes||-|
Remark: Graded individual assignments and written tests.