
DATA 510: Data Science Capstone
May 18, 2026
By the end of this session, you will be able to:
The DATA 510 capstone is a semester-long culminating project: you propose, build, evaluate, and communicate a consequential data science outcome that draws together your MS coursework as one coherent story, not a pile of disconnected homework.
The project is flexible by design. You should focus effort on a problem you (or a small team you choose) care about, in a domain where you can sustain curiosity through August. That passion is an asset, not a loophole.
You still owe a defensible, integrated capstone: engineering, analytics or ML, visualization and communication, ethics, and research design should all show up somewhere in the plan. Silo pitches (“only a dashboard,” “only a notebook”) get sent back to be deepened before approval.
| Dimension | Rule |
|---|---|
| Execution | Solo, or a self-selected team of 2 to 3 |
| Scope bar | Multi-person teams need noticeably higher scope than a comparable solo project; I approve scope at the proposal |
| Meta-project cluster | Assigned by me: your project sits beside two peer capstones for the whole term |
You are not merged into one mega-team with those peers. You remain owners of your backlog and milestones. You do follow each other’s boards and weekly summaries and give structured feedback every week, the way parallel teams stay visible in real organizations.
Select your data sources early and get my approval for them. If access, license, or ethics look shaky, surface that in tonight’s snapshot and in next week’s charter work.

You are here: project forming tonight. The project proposal locks direction, data plan, ethics, and methods and is due by the end of week 4. The data summary shows that ingestion is stable enough to stop firefighting data; it is due in week 7.
Data-Driven Scrum is how you steer week to week toward those graded milestones with visible process.
Goal: Move from last week’s wide exploration to a defensible direction you can charter and propose. You may keep and sharpen an idea, merge threads you heard from classmates, or scrap a direction that no longer survives feasibility or ethics checks.
Bring: Your submission from last week’s brainstorming session.
On your own, answer in writing (notebook or device; not submitted separately):
If last week’s idea still feels vague, that is fine. Name what you need to learn in the next two weeks to choose.
Talk with two or three different classmates (fewer than week one; deeper this time).
Exchange:
Capture one line per conversation for yourself. You will reuse this in the Canvas snapshot.
Do
Avoid
Use PRIDE on the question you are actually considering for the proposal:
| Step | Tonight’s focus |
|---|---|
| P Problem and impact | Who is affected? What improves if you succeed? |
| R Review and gap | What is known? What is unknown and matters for your setting? |
| I Inquiry | Primary (and optional secondary) research question(s) in plain language |
| D Data and ethics | What data exist or could exist? Consent, fairness, retention risks? |
| E Evidence plan | What would count as answering each question (metrics, baselines, design)? |
Draft at least P, I, and one bullet each for D and E. You do not need a full literature review tonight.
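If you keep your notes in a notebook, the minimum for tonight can be tracked with a small checklist. The following is only a sketch: the `PrideDraft` class, its field names, and the placeholder text are hypothetical and are not a course-provided format. It checks the stated minimum (P, I, and at least one bullet each for D and E) and treats R as optional tonight.

```python
from dataclasses import dataclass, field


@dataclass
class PrideDraft:
    """Hypothetical container for tonight's PRIDE notes (not a course-provided format)."""

    problem: str = ""          # P: who is affected, what improves
    review: str = ""           # R: optional tonight
    inquiry: str = ""          # I: research question(s) in plain language
    data_ethics: list[str] = field(default_factory=list)    # D: bullets
    evidence_plan: list[str] = field(default_factory=list)  # E: bullets

    def missing_for_tonight(self) -> list[str]:
        """Return the steps still missing from tonight's minimum: P, I, one D bullet, one E bullet."""
        missing = []
        if not self.problem.strip():
            missing.append("P")
        if not self.inquiry.strip():
            missing.append("I")
        if not any(bullet.strip() for bullet in self.data_ethics):
            missing.append("D")
        if not any(bullet.strip() for bullet in self.evidence_plan):
            missing.append("E")
        return missing


# Placeholder text only; replace with your own draft.
draft = PrideDraft(
    problem="Placeholder: who is affected and what improves",
    inquiry="Placeholder: primary research question in plain language",
    data_ethics=["Placeholder: candidate data source and one consent or retention risk"],
)
print(draft.missing_for_tonight())  # prints ['E']
```

An empty list means the draft meets tonight's minimum and is ready to trade for peer review.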
Pair with someone not in your proposed project team (if you already have one).
Trade drafts. Each reviewer answers:
Revise your question once based on the review.
Data-Driven Scrum keeps capstone work transparent, prioritized, and iterative: in short cycles you move from questions to experiments, observe the results, and reprioritize the backlog.
Your repo and boards: You own the backlog and task flow. Weekly README summaries make progress legible to me and to your meta-project peers.
Meta-project cluster (assigned): Two other capstone projects read your board and summary each week. You do the same for them. They are not co-authors on your code; they are a standing review panel.
In class: Standups and backlog refinement finish during the scheduled meeting when we run them, including cross-team touchpoints in your cluster.
Shared execution: One backlog and task board the team maintains together. Division of labor should be visible on the board, not only in private chats.
Higher scope expectation: More integration, evaluation, or surface area than a solo project at the same quality bar. Proposal week is where scope gets approved.
Meta-project cluster (still assigned): As with solo projects, two outside projects follow your work weekly. Your teammates are not a substitute for meta-project feedback.
Good peer comments tie to visible evidence (board column, README section, plot, schema sketch), separate curiosity from blocking concerns, and surface ethics and engineering risks early.
Goal: One concise submission so I can form Canvas project groups and meta-project clusters before chartering and proposal work next week.
Due: End of tonight’s class on Canvas (see assignment page for the upload link).
Who submits: If you are proposing a team, one person submits for the group; list all members in the form. If you are working solo, you submit individually.
Use the Canvas assignment instructions. Include:
Optional but useful: one sentence on the biggest open risk (data, ethics, scope).
Tonight: Project direction snapshot on Canvas.
Next week: Project chartering exercise (part of the proposal package) and proposal activities in class. Come with the same domain and questions unless you document a pivot.
Week 4: Project proposal due by the end of class.
Keep last week’s brainstorming notes and tonight’s PRIDE draft. You will reuse the language in charter sections and backlog items.
Questions on data approval, teaming, or scope: note them in the Canvas form or email after class.