
Foundations of Data Science Part III

Project management for data analysts




Program Info

Program Title Program Evaluation and Data Analytics

Course Info

Course Title Foundations of Data Science Part III
Course Number CPP 528
Canvas Shell https://canvas.asu.edu/courses/82561
Course Level Graduate
Course Start-End March 8 to April 23, 2021
Class Meeting Times Asynchronous
Class Location https://asu.zoom.us/meeting/557182841

Course Instructors

Cristian E. Nuno Manager of Data Engineering and Information Systems / Faculty Associate
Office Location: virtual

Office Hours

Cristian E. Nuno By appointment (see appointment app) https://asu.zoom.us/meeting/557182841 SCHEDULE

Lab Sessions

Discussion Session Time TBD
Discussion Session Location Zoom
Assignment Discussion Board SUBMIT A QUESTION

Textbooks

R Cookbook. Proven recipes Paul Teetor 2015 Not Required
R for Data Science Wickham, H., & Grolemund, G. Free Online Not Required
The Art of Data Science Peng, R. D., & Matsui, E. Free Online Not Required

I. Course Description, Course Goal and Course Learning Objectives

The Foundations of Data Science course sequence will cover the fundamentals of data programming – building unique datasets using APIs and custom tools, importing data from the cloud, linking multiple data sources, and wrangling processes to clean, transform, and reshape datasets. Advanced topics will be introduced such as writing functions, running simulations, writing packages for R, and de-bugging techniques. We will spend roughly a third of the units on graphing procedures and reporting packages.

This course, Foundations of Data Science III, reviews material from CPP 526 and 527 while emphasizing project management and team work-flow skills.

After completing the course students will be able to:

  • Deploy modern project management frameworks to ensure data projects run efficiently and avoid errors.
  • Store data processing steps in .R files, as opposed to .RMD files, and export R objects as .rds files to be called later.
  • Create .RMD files that use inline R code so that reports automatically pick up changes.
  • Initiate, participate in, and lead code reviews via GitHub Pull Requests in order to perform continuous quality assurance of your team’s work.
  • Host their work on a GitHub pages website.
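As a minimal sketch of the .R, .rds, and .RMD workflow named in the objectives above: a processing script saves its result with saveRDS(), and a report reads it back with readRDS(). The data frame, column names, and file name are hypothetical and do not come from the course materials.

```r
# process_data.R (hypothetical file name): data steps live in a plain .R script.
# Hypothetical example data; the course project uses its own data sources.
home_values <- data.frame(
  tract = c("A", "B", "C"),
  value_2000 = c(100000, 150000, 200000),
  value_2010 = c(120000, 165000, 190000)
)

# Derive new columns in the script rather than in the report
home_values$pct_change <- 100 *
  (home_values$value_2010 - home_values$value_2000) / home_values$value_2000

# Export the R object as an .rds file.
# tempdir() is used here only so the sketch runs on any machine.
rds_path <- file.path(tempdir(), "home_values.rds")
saveRDS(home_values, rds_path)

# Later, for example in the setup chunk of an .RMD file, read the object back
home_values_loaded <- readRDS(rds_path)
median_change <- median(home_values_loaded$pct_change)
```

In the .RMD file, a sentence containing inline code such as `r round(median_change, 1)` would then update on the next knit whenever the .rds file is regenerated, with no hand-edited numbers in the report text.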

Course Prerequisites:

This course builds upon basic R programming material from Data Science I (CPP 526), Data Science II (CPP 527), Program Evaluation I (CPP 523), and Community Analytics (CPP 529).

II. Assessment of Student Learning Performance & Proficiency: Keys to Student Success

Assessment of student performance in this course is based on indications that the course learning objectives stated above have been achieved. Several areas of measurement will be used to produce a final student performance rating. These areas of performance assessment include the following:

  • Working collaboratively and asynchronously with your team
  • Building a research database by combining several data sources.
  • Perform descriptive analysis.
  • Use regression to make inferences about programs.
  • Use version control to manage ongoing change within a project via GitHub Desktop
  • Export list objects as .rds files, which are then imported into .RMD files whenever possible
  • Use of RStudio Projects, GitHub Issues, Pull Requests, and commit messages to communicate information about each other’s code
  • Providing proper documentation on analysis performed.

Students will demonstrate competency in understanding, producing and communicating results of their analyses through the following assignments:

  • Weekly labs that provide opportunities to consolidate and apply material from the lectures.
  • A final project that integrates several components of the learning objectives above.
  • Participation in discussion boards.

Assigned work, including the course final project, and the quality of active participation in the regular online discussion sessions, which are a critical part of the course learning strategy, are the tools the instructors will use to measure comprehension and skill; the student’s course grade is a direct reflection of demonstrated performance. Students should take stated expectations seriously regarding preparation, conduct, and academic honesty in order to receive a grade reflective of outstanding performance. Students should be aware that merely completing assigned work in no way guarantees an outstanding grade in the course. To receive an outstanding course grade (using the grading scheme described below and the performance assessment approach noted above), all assigned work should be completed on time with careful attention to assignment details.

III. Course Structure and Operations; Performance Expectations

A. Format and Pedagogical Theory

Mastering analytic techniques is like learning a language. You start by mastering basic vocabulary that is specific to statistics and data science. Through your coursework you will become conversant in the domains of regression analysis, research design, and data science. Progress might be slow at first as you work to master core concepts, integrate the building blocks into a coherent mental model of real-world problems, learn to translate technical results into clear narratives for non-technical audiences, and become comfortable with data programming skills. Over time you will find that your thought processes change as you approach problem-solving in a more structured and evidence-based manner, you apply counter-factual reasoning to performance problems, and you start reading the news and viewing scientific evidence differently. You begin to think and speak like a program evaluator.

By the end of this degree you will be conversant in statistics, research design, and data programming. Fluency takes time and will be developed through professional experience. It requires you to practice these skills to develop muscle memory. You can do this through participating in evaluations on the job and gaining experience building and cleaning data sets from scratch. Understand, though, that this degree focuses on building foundations for your career. Don't be nervous if it feels like it's impossible to master all of the material in this program – it is impossible to learn everything in this field in a year.

Similar to immersion in a language, the best way to learn the material is to be consistent in doing course work each day. The more frequently you revisit concepts and practice data programming the more you will absorb. The curriculum has been designed around this approach. Lectures are split into small units, and each unit includes questions to test your understanding of the material. Weekly labs allow you to spend some time applying the material to a specific problem. The final exam at the end of the semester is designed to help you make connections between concepts and consolidate knowledge. You will be much better off spending a small amount of time each day on the material instead of trying to cram everything into a couple of days a week.

Online discussion boards are designed for students to engage with the material together. The purpose of online discussion sessions is threefold: (1) the online discussion sessions allow students to interact with their peers and share ideas and interpretations of the assigned material, (2) such peer-to-peer discussion online helps build professional relationships with potential future colleagues in the field, and (3) the discussions permit the instructor to assess student engagement with the assigned material.

The online discussions are explicitly intended to meet the objectives stated above. They are not intended as another form of “lecture” where the instructor provides commentary and students simply react to that. Rather, the discussions are a chance for peer-to-peer interaction and proactive engagement by each individual student.

B. Assigned Reading Materials

All assigned reading materials are provided on the Schedule page.

There are no required textbooks for this class.

The following texts are recommended as reference material for this course:

  • Wickham, H., & Grolemund, G. (2016). R for Data Science. O'Reilly Press. (free online)
  • Peng, R. D., & Matsui, E. (2015). The Art of Data Science. A Guide for Anyone Who Works with Data. Skybrude Consulting, 200, 162.

C. Course Grading System for Assigned Work, including Final Project:

Letter grades comport with a traditional set of intervals:

100 – 99% A+*
98 – 94% A
93 – 90% A-
89 – 87% B+
86 – 84% B
83 – 80% B-
Below 80% C, D, F

*An A+ is given at the discretion of the professors, and is normally limited to one or two students per term.

The assigned work for the term comes in the form of three elements, described below:

  • Weekly Labs (25%): Each week there will be a lab designed to guide you through the project step for the week. They will require data wrangling to build your research database, descriptive analysis, and regression. They are graded pass / fail by the instructors based upon an assessment of whether you have sincerely attempted the lab and answered over half of the questions correctly. This is designed to hold you accountable for the material, but not create anxiety about perfection. There are six labs total, each worth 5%. You can drop one lab during the term but at least two of your team members must be working on a weekly lab to avoid having one team member working on a lab on behalf of the entire team.

  • Applied Project Deliverables (65%): Your report on the impact of federal tax credit programs designed to stimulate community revitalization will serve as your primary deliverable for the semester. Each week you will be asked to complete new steps on your final project. You may submit them for feedback and guidance, and revise them before the final deadline. Your final grade will be based upon how clearly results are presented and how easy it is for a non-team member to replicate results from your study using your GitHub repository.

  • Yellowdig Discussions (10%): Yellowdig will be used this term to guide discussions on the substantive policy issues regarding community revitalization efforts, and on the challenges of using administrative data to conduct causal analysis to measure program impact. Points are earned as follows:

    • 5 points for a new pin.
    • 3 points for a comment made to another pin.
    • 2 points if you receive a comment on your pin.
    • 1 point for liking another pin.
    • 5 points if you earn an instructor badge for an informative post.
    • A maximum of 20 points can be earned each week.
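The point scheme above can be sketched as a small calculation. This is only an illustration: the function and variable names are hypothetical, and it assumes the weekly cap applies to the combined total across all activity types. How Yellowdig points convert into the 10% grade share is not stated in this syllabus and is not modeled here.

```r
# Point values copied from the list above
point_values <- c(
  new_pin = 5,
  comment = 3,
  comment_received = 2,
  like = 1,
  instructor_badge = 5
)
weekly_cap <- 20

# Hypothetical helper: total points for one week, capped at the weekly maximum.
# activity_counts is a named vector of how many times each activity occurred.
weekly_points <- function(activity_counts) {
  raw_total <- sum(point_values[names(activity_counts)] * activity_counts)
  min(raw_total, weekly_cap)
}

# Hypothetical week: 2 new pins, 3 comments, 4 likes (10 + 9 + 4 = 23, capped at 20)
busy_week <- weekly_points(c(new_pin = 2, comment = 3, like = 4))

# Hypothetical week: 1 new pin and 1 comment (5 + 3 = 8, below the cap)
quiet_week <- weekly_points(c(new_pin = 1, comment = 1))
```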


D. General Grading Rubric for Written Work

In general, any submitted written work (assignments and/or exams) is assessed on these evaluative criteria:

  • Assignment completeness – all elements of the assignment are addressed
  • Quality of analysis – substantively rigorous in addressing the assignment
  • Demonstrated synthesis of core concepts from lecture notes and ability to apply to new problems

Most assignments in this course are labs that are graded pass-fail based upon completeness and correctness of responses (every attempt must be made to complete labs, and they must be more than 50% correct to receive credit). Discussion boards accumulate points through each activity on the board.

The final project will be accompanied by a rubric describing the allocation of points and criteria for evaluation.

E. Late and Missing Assignments

Grades for the course are largely based on weekly labs. Assigned work is accompanied by detailed instructions, adequate time for completion and opportunities to consult the instructor with questions. As a result, each assignment element in the course is expected to be completed in a timely fashion by the due date. Once solutions are posted it is no longer possible to receive points for assignments.

F. Course Communications and Instructor Feedback:

Course content is hosted on this website. Lecture files, assignments and other course communications will be transmitted via this site and/or through the class email list. All assignment submissions will be made through the Canvas shell.

Please post lab questions on the Get Help page on this site, schedule individual office hours using the Calendly link provided above, and email the instructor directly instead of using the Canvas system.

Students should be aware that the course instructor will attempt to respond to any course-related email as quickly as possible. Students are asked to allow between 24 and 48 hours for replies to direct instructor emails, generally, as a reasonable time to reply to questions or other issues posed in an email. Additionally, the general timeline for instructor grading or other feedback on assignments, either written work or online discussion work, is between 5 and 10 work days.

G. Student Conduct: Expectation of Professional Behavior:

Respectful conversation and tolerance of others' opinions are expected, and this expectation will be strictly enforced. Any inappropriate language or threatening, harassing, or otherwise inappropriate behavior during discussion could result in the student(s) being administratively dropped from the course with no refund, per ASU policy USI 201-10. Students are required to adhere to the behavior standards listed in the Arizona Board of Regents Policy Manual Chapter V - Campus and Student Affairs.

H. Academic Integrity and Honesty

ASU expects the highest standards of academic integrity. Violations of academic integrity include but are not limited to cheating, plagiarism, fabrication, etc. or facilitating any of these activities. This course relies heavily on writing and original critical thought. Any student who is suspected of not producing his or her own original work will be reported to the College of Public Programs for investigation. Plagiarism will not be tolerated. Any student who plagiarizes or otherwise fabricates his or her work will receive no credit for that assignment. It will be recorded as zero points—and the student will risk a failing grade for the course. For more information, refer to http://provost.asu.edu/academicintegrity.

I. Student Learning Environment: Accommodations

Disability Accommodations: Students should be fully aware that Arizona State University, the MA in EMHS program, and all program course instructors are committed to providing reasonable accommodation and access to programs and services to persons with disabilities. Students with disabilities who wish to seek academic accommodations must contact the ASU Disability Resources Center directly. Information on the Center's procedures, resources and how to contact its staff can be found here: https://eoss.asu.edu/drc/. The Disability Resources Center is responsible for reviewing any student's requests; once that review has taken place, the Center will provide the student with appropriate information on academic accommodations, which in turn will be provided to the course instructor.

Religious accommodations: Students will not be penalized for missing an assignment due solely to a religious holiday/observance, but as this class operates with a fairly flexible schedule, all efforts should be made to complete work within the required timeframe. If this is not possible, students must notify the instructor as far in advance as possible in order to make an alternative arrangement.

Military Accommodations: A student who is a member of the National Guard, Reserve, or other branch of the armed forces and is unable to complete classes because of military activation may request complete or partial unrestricted administrative withdrawals or incompletes depending on the timing of the activation. For more information see ASU policy USI 201-18.

IV. Course Schedule and Unit-specific Learning Objectives

A. Schedule: Overview of Readings and Assignments

As students are all aware, ASU Online courses are typically offered on a seven-and-a-half-week schedule. A schedule for each week of the term is outlined here; the course is divided into seven units with specific learning objectives for each unit.

Please note: the course instructor may from time to time adjust assigned readings or adjust the due dates for assignments. The basic course content approach and learning objectives will not change, but slight modifications are possible if circumstances warrant an adjustment.

Use the Schedule tab on the navigation bar for detailed information each week.

Course Schedule

  • Week 01: Introduction to project management
  • Week 02: Introduction to data management
  • Week 03: Descriptive analysis of neighborhood change
  • Week 04: Predicting median home value change, 2000 to 2010
  • Week 05: Adding federal program data to your predictive models
  • Week 06: Test reproducible work flow with a parameter change
  • Week 07: Finalize project website and project requirements