A Data Scientist is a professional who combines many types of technical and industry competencies to turn data, which is very often idiosyncratic and ambiguous, into actionable intelligence in a business environment. The skills needed to make this transformation draw from mathematics, statistics, computer science, and business, and require the ability to communicate technical information to people with a range of technical competence. The Master of Science in Data Science is a rigorous program designed to rapidly bring students to the point of functioning in the role of a data scientist and then, building upon that initial growth, to develop expertise in their data science skills.
The program in Data Science has several components. It requires coursework over a two-year period in mathematics, statistics, and computer science that supports the program outcomes. The program is centered on core data science courses including an introduction to data mining and applied data analytics. Supporting courses include applied statistics, applied linear algebra, computer programming, and databases. It also requires coursework that uses core knowledge and skills in a professional environment, such as communication, professional writing, research methods, and project management. The program includes a capstone project that provides a substantive professional context for students to apply their data science knowledge.
The most competitive candidates will satisfy the prerequisites and core competencies as follows:
Candidates who meet some but not all of the prerequisites and core competencies are encouraged to apply and will be considered conditionally. The Program Director can identify opportunities for those candidates to gain familiarity in the relevant area(s).
Applications open in September for entry into the program the following fall. The application deadline is rolling, and applications will be accepted as long as seats are available in the entering class. The early action deadline is January 15. The priority application deadline is February 15.
Saint Mary’s College students meeting the prerequisites set forth below may apply to the Master of Science in Data Science program as second-semester juniors or first-semester seniors. If admitted to the graduate program, students will complete two graduate courses in Data Science in the senior year prior to baccalaureate graduation. The student will continue Data Science courses in the summer term immediately following her baccalaureate graduation and continue for the next fall, spring, and summer terms to complete the bachelor’s and graduate degrees in five years. In the fifth year, the student will be charged the per-credit-hour rate of the cohort she is joining.
All students are required to participate in a summer immersion experience on campus the week following the summer term. The summer immersion is an intensive experience during which students work in teams and consult on data science-related projects from regional businesses and non-profit organizations.
All students are required to give a formal presentation about the project completed for the DSCI 599 Practicum. The presentation shall be given during the summer orientation/symposium in August following the enrollment in DSCI 599 Practicum.
|CPSC 507||Computer Programming||3|
|CPSC 529||Database Systems||3|
|DSCI 501||Data Mining||3|
|DSCI 502||Data Mining at Scale||3|
|DSCI 511||Data Preprocess/Visualization||3|
|MATH 527||Applied Linear Algebra||3|
|MATH 546||Applied Statistics I||3|
|MATH 547||Applied Statistics II||3|
|Six credits of the following:||6|
|Communication and Data Science|
|Essential Calculus for Data Science|
|Essential Probability Theory for Data Science|
|Professional & Tech Writing|
|Data Analytics and Outcomes Improvement|
|DSCI 599||Practicum (at least 3 credits)||3-6|
|Additional graduate credits to total 36 credit hours||0-3|
The Master of Science in Data Science program is committed to providing graduates with the range and depth of expertise to be leaders in data-driven industries. Students who successfully complete the program will demonstrate high levels of mathematical, analytical, technical, and professional skills and knowledge. Upon completion of the program, students will be able to:
48 Madeleva Hall
C. Fitzpatrick, C. Hoover, J. Juszkiewicz, K. Kuter, D. Mallot, E. Misiolek, R. Rohatgi, S. Rohr, B. Vajiac, C. Wedrychowicz, M. Zwart
The course develops the competencies and skills for planning and controlling projects and understanding interpersonal issues that drive successful project outcomes. Focusing on the introduction of new products and processes, it examines the project management life cycle, defining project parameters, matrix management challenges, effective project management tools and techniques, and the role of a project manager.
Industry experts stress the importance of often-overlooked communication skills in data science. Rachel Hawley, Analytic Solutions Architect at the SAS Institute, states “it is extremely important that potential candidates have effective communication and presentation skills. It’s not enough to just have the technical chops, a data scientist must be able to effectively explain how he or she came to a specific conclusion and convince the internal or external customer that their results should be leveraged.” This course is designed to explore this intersection between communication and data science. Topics will include assessing and improving communication skills, interpersonal and intercultural communication, teamwork, and leadership. The development of effective presentational skills, particularly oral skills, will be stressed.
A problem-solving approach to learning computer programming. Topics include variables, data types, conditional statements, loops, arrays, recursion, principles of software engineering, object-oriented programming, data structures, algorithms, and the use of standard libraries available in a variety of programming languages. The course will use commercially common programming languages and integrated development environments (IDEs).
Basic concepts of databases. Topics include conceptual data modeling, database design and normalization, and database implementation. Use of SQL for data definition, manipulation, and query processing. While primary emphasis will be on the relational model and traditional RDBMS, discussion will also include a survey of techniques for handling non-relational data models, massive datasets, and unstructured data, including data warehousing, in-memory databases, NewSQL, NoSQL, and Hadoop.
This course introduces the concepts from differential, integral, and multivariate calculus essential for the study of data science. Elements of linear algebra, such as vectors, planes, and matrices, are also included. Emphasis on computation and application.
This course introduces concepts from probability theory essential for the study of data science. Topics include probability spaces, Bayes’ Theorem, random variables, discrete and continuous distributions, specifically the normal distribution, and the Central Limit Theorem. Emphasis on computation and application.
This course is about mining knowledge from data in order to gain useful insights and predictions. From theory to practice, the course investigates all stages of the knowledge discovery process, which includes data preprocessing, exploratory data analysis, prediction and discovery through regression and classification, clustering, association analysis, anomaly detection, and postprocessing.
A second semester of data mining introducing tools and techniques necessary for mining large-scale data sources. Prerequisite: DSCI 501
An introduction to basic scientific and statistical research methods when dealing with measurements of human and corporate activity. Students read and evaluate current research and translate their ideas into viable research projects. Topics include scholarly writing and presentation, descriptive research methods, quasi-experimental and experimental design, ethical issues, and analytical methods.
Thesis credit may be earned for significant work toward the writing of a master’s thesis. This thesis may be used to fulfill the culminating project requirement.
The practicum is an opportunity to directly experience the work of a data scientist or data analytics professional. It consists of project-based learning on a significant and contributory business objective in conjunction with practicing professionals in one of many appropriate industries. May be repeated up to 6 credits.
This course teaches skills in written, visual, and verbal communication of particular importance to data science professionals. It engages with foundational concepts of rhetoric, composition, and design that can be applied in any setting while also addressing the forms and conventions of technical writing in a professional setting that students will encounter as practicing researchers and data analysts. The course stresses the seamless continuity between analysis of data and communication about that analysis.
An application-focused approach to linear algebra in a variety of fields. Topics include matrices, Gaussian elimination, vector spaces, determinants, inner products, orthogonality, least squares solutions, eigenvalue problems, the Gram-Schmidt process, matrix decomposition/factorization, Jordan canonical forms, methods of dimension reduction such as singular value decomposition or principal component analysis, quadratic forms, pseudo-inverses, Markov processes, data/image processing, and other advanced topics pertinent to data analysis.
An introduction to the foundations and applications of statistics. Topics include basic concepts of data collection, sampling, and experimental design, descriptive analysis and graphical displays of data, probability concepts and expectations, normal and binomial distributions, sampling distributions and the Central Limit Theorem, confidence intervals and hypothesis testing, likelihood-based statistics, ANOVA, correlation, and simple linear regression.
An application-focused approach to regression analysis and related techniques. Topics include simple and multiple linear regression, weighted and generalized least squares estimators, polynomial regression, exponential regression, model selection, categorical variables and ANOVA, logistic regression, principal component analysis, time series analysis, and other applications of statistics as relevant. Prerequisite: MATH 546
This course is designed to provide the DNP student with an opportunity to examine the lifecycle of data and the use of data analytics to measure healthcare delivery and improve patient outcomes. Transformation of healthcare outcomes that arise from changes in healthcare delivery systems will be driven by insights from existing large data sets that optimize clinical, financial, operational, and behavioral perspectives. Students will examine the process by which the DNP gains insight from data and the role of analytics in supporting a data-driven healthcare system as a component of healthcare reform. Students will explore the application of data to value-based innovation projects that maximize the use of data for quality improvement and cost-effective, sustainable change in healthcare delivery systems. The use of the internet in healthcare settings, ethical and legal issues associated with working with large data sets, and the focus on the individual patient as the center of evidence-based practice in nursing are emphasized. Prerequisites: NURS 612; NURS 620; NURS 622.
Data about us is collected continuously, and in many ways makes our lives as we know them possible—enabling your doctors to treat you efficiently, letting Amazon show you what you need to buy before you even know it exists, helping Spotify introduce you to the next music you’ll love. But is there a dark side to all this data-driven convenience? In this one credit hour course, students will engage with the ethical challenges posed by data collection, analysis and use, through class discussion, case study analysis and course readings. We begin by considering various ethical frameworks, including utilitarianism and deontology. We then engage with the history of data collection, looking at the abuse of humans, particularly from marginalized groups, in the Nazi experiments, the Tuskegee syphilis experiments and the history of eugenics in the US. Turning to contemporary methods of collecting and using data, we consider key areas of ethical concern including: issues of autonomy and consent, privacy and surveillance, artificial intelligence and machine learning, disinformation and bias, and algorithmic discrimination. Students demonstrate mastery of the material in online discussion, brief writing assignments, and analysis of a self-chosen contemporary case study.