Introductory Workshop: Asking Questions about Data
Rough Draft. Created with / Forked from Colin Jemmott.
- CSE 91 Section C
- TuTh 5:00-6:20, EBU3B 4140
This sequence of projects familiarizes you with the process of answering questions using data. The projects are hands-on and open-ended. They are meant to be completed first using Tableau (Project 1), then Jupyter notebooks (Projects 2-4). The projects attempt to answer the three big questions:
- What happened?
- Why did it happen?
- What will happen?

The projects use a variety of data types and expose you to data's limitations and messiness.
By the time you finish this class, you will be able to:
- Identify problems that are good candidates for data science,
- Reframe the problem in a way that can be answered with the available data,
- Evaluate the limitations and quirks of the data,
- Manipulate the data to answer relevant questions, and
- Communicate the results clearly.
This class operates as a data science lab class, meaning that the bulk of time in class is spent doing data science. Formal lectures will be minimal and are meant to help you understand the tasks and their context.
Unless we get behind, there should not be mandatory coding or analytics work outside of class (though you are encouraged to expand on the projects and see how much you can do!). There is significant reading outside of class, and it is important to keep up with the reading because of the limited formal instruction during class time.
Projects are worked on during class and must be turned in to the instructor via email by the due dates below. Please include "[CSE 91 Fa18 Project]" in the subject line and attach your Tableau workbook or Jupyter notebook. You should not need to work on the projects outside of class, but you are encouraged to expand on a project and clean up its presentation if you are motivated. A little extra work can make a great project to show off on your resume.
- SDPD traffic stops Tableau project (Due 10/23)
- Your favorite of the "Why did it happen?" projects (Due 11/20)
- SDPD traffic stops final project (Due 12/06)
| Week | Section | Topic |
|---|---|---|
| 0/1 | What happened? | Getting Started |
| 1/2 | | Data is Messy |
| 2/3 | | Questions to Metrics |
| 3/4 | | Communicating Results |
| 4/5 | Why did it happen? | Images |
| 5/6 | | Audio |
| 6/7 | | Unstructured Text |
| 7/8 | | A/B Testing |
| 8/9 | What will happen? | Prediction |
| 9/10 | | |
There are loosely related readings scheduled concurrently with the
projects, intended to get you thinking about larger issues in data
science. These are found in the readings.md file in each project. A
journal response to the readings is due six days after starting a
module (one per line of the schedule) -- i.e. 24 hours before every
Thursday class.
Each week, at least 24 hours before the start of the Thursday class, you must write me a paragraph or two about the weekly reading and send it via a private note in Piazza. The goal of this is to have a more in-depth conversation than our class time allows. Think of these journals more as emails to discuss something you read with a colleague than as a formal essay.
The topic is up to you, but examples include:
- Relating a topic in the reading to something in your life or in the news,
- Asking a thoughtful question about the reading, or
- Picking a quote from the reading that you agree or disagree with and explaining why.
I may bring up what you write to me in class unless you explicitly ask me in that note not to.
This class is graded on a Pass/Not Pass scale. Your course grade consists of the following components:
| Percentage | Component |
|---|---|
| 10% | Class Participation |
| 20% | Journals |
| 30% | SDPD traffic stops Tableau project |
| 20% | Best assignment of “Why did it happen?” |
| 20% | SDPD traffic stops final project |
The final grade is determined using the following ranges:
- Pass: 70%-100%
- Not Pass: 0%-69%
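
As a rough illustration of how the weights and ranges above combine, here is a minimal Python sketch. The component scores in `example_scores` are made up, the function names are hypothetical, and treating any total of 70 or above as a Pass (with no rounding) is an assumption, since the ranges above do not say how fractional totals are handled.

```python
# Weights copied from the grading table (percent of the course grade).
WEIGHTS = {
    "Class Participation": 10,
    "Journals": 20,
    "SDPD traffic stops Tableau project": 30,
    "Best assignment of 'Why did it happen?'": 20,
    "SDPD traffic stops final project": 20,
}

# Assumption: a total of 70 or more is a Pass; no rounding is applied.
PASS_THRESHOLD = 70


def course_percentage(scores):
    """Return the weighted total, given each component's score on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def course_outcome(scores):
    """Map the weighted total onto the Pass / Not Pass ranges."""
    return "Pass" if course_percentage(scores) >= PASS_THRESHOLD else "Not Pass"


# Hypothetical scores for one student, for illustration only.
example_scores = {
    "Class Participation": 100,
    "Journals": 80,
    "SDPD traffic stops Tableau project": 70,
    "Best assignment of 'Why did it happen?'": 60,
    "SDPD traffic stops final project": 75,
}

print(course_percentage(example_scores), course_outcome(example_scores))
```

With these made-up scores the weighted total is 74, which falls in the Pass range.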
Attendance is critical because the bulk of the coding happens in a collaborative manner during class time. Of course, you may have to miss class due to illness, a family emergency, or a similar reason. If this happens, you should let the instructor know by email as soon as possible (preferably before class). You will still be responsible for completing the in-class work. Without the collaboration and explanations that happen in class, this will be much more difficult, so I strongly recommend coming to office hours for help. In-class assignments from a missed class will not be accepted after the following class meeting.
Some coding will be by yourself, and some will be pair programming, meaning that you will work together with a partner to complete tasks. In both cases you are encouraged to ask for help from the instructor or from other students. This is a collaborative environment, which means that while in this classroom it is ok to show your work to other students and discuss it openly.
However, even in this collaborative environment, the work you do must be your own. Specifically, you must do the actual work of completing the assignment (i.e. typing out the code, moving the mouse) and understand what your code or analysis is doing.
If you are unsure whether what you are doing is ok, just ask! You will never be reprimanded in this class for asking for clarification. Also note that this is likely a different standard than in your other classes.
For this class, the key to academic integrity is accurately representing the status and authorship of your work. I strongly encourage you to read the official UCSD policy on integrity of scholarship.
I am committed to an inclusive learning environment that respects our diversity of perspectives, experiences and identities. You, as a student in this course, are also responsible for maintaining an environment where your fellow students feel safe and respected.
In my opinion, the key to this is recognizing the inherent worth and dignity of every person. If there is a way you could feel more included please let me know, either in person, via email/discussion board, or even in a note under the door.
If you have a disability for which you are or may be requesting accommodations, please contact the Office for Students with Disabilities. You must have documentation from the Office before accommodations can be granted.