Group Project: Exploratory Data Analysis & Visualization
Working in groups of 3 to 5 members, visually explore a dataset and confirm or disconfirm hypotheses about the data. Fill in your group members' details in [this spreadsheet].
The task in this assignment is to formulate and answer a series of specific questions about a dataset of your choice. You should use visualization both to answer the questions (exploration) and to communicate your answers to others (presentation). Create a comment (one comment per group) that documents all the questions you asked and the steps you performed from start to finish. The goal of this assignment is to better understand the process of using visualizations to perform exploratory data analysis and to communicate insights effectively. If (and only if) the comment space is insufficient or unsuitable, you can hyperlink parts of the work as external documents, all of which should be openly accessible and remain available permanently.
Step 1. Pick a domain and dataset that you are interested in.
Peruse the provided datasets below. Choose the one of greatest interest to you. If you would like to explore a different dataset, you are free to do so. If you are unsure about your choice, consult with me.
Step 2. Pose an initial question that you would like to answer.
For example: Is there a relationship between festivals and fatalities on Mumbai's local trains? What do NOTA votes tell us about electoral contests? Which rule change led to the most dramatic change in batting performance in cricket? Who is the least expensive, most valuable player in the IPL? Is there a relationship between melting point and atomic number? Are the brightness and color of stars correlated? Are there different patterns of nucleotides in different regions of human DNA?
Step 3. Assess the fitness of the data for answering your question.
Inspect the data; it is invariably helpful to look at the raw values first. Does the data seem appropriate for answering your question? If not, you may need to start the process over. If so, does the data need to be reformatted or cleaned prior to analysis? Perform any steps necessary to get the data into shape before visual analysis.
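As one possible first pass, the sketch below prints a few raw rows and counts, per column, how many cells are missing and how many are present but not numeric. It assumes a CSV file; the inline sample, its column names (`city`, `year`, `value`) and the set of "missing" markers are all hypothetical stand-ins for your own data, and only the Python standard library is used.

```python
import csv
import io

# Hypothetical sample standing in for your real file; with a real dataset you
# would pass open("your_data.csv", newline="") to csv.DictReader instead.
SAMPLE = """city,year,value
Pune,2018,12
Nagpur,2018,
Nashik,2019,n/a
Pune,2019,7
"""

# Strings treated as "missing" (an assumption; adjust to your data's conventions).
MISSING_MARKERS = {"", "na", "n/a", "null", "none", "-"}


def is_missing(cell):
    # csv.DictReader fills absent trailing fields with None, so guard for it.
    return (cell or "").strip().lower() in MISSING_MARKERS


def is_number(cell):
    try:
        float(cell)
        return True
    except ValueError:
        return False


def profile(rows):
    """Per column, count missing cells and present-but-non-numeric cells."""
    if not rows:
        return {}
    report = {}
    for col in rows[0]:
        cells = [row[col] for row in rows]
        present = [c for c in cells if not is_missing(c)]
        report[col] = {
            "missing": len(cells) - len(present),
            "non_numeric": sum(1 for c in present if not is_number(c)),
        }
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# 1. Eyeball the raw values before plotting anything.
for row in rows[:3]:
    print(row)

# 2. Summarize gaps and type problems that may need cleaning.
for col, stats in profile(rows).items():
    print(col, stats)
```

Columns that are supposed to be numeric but report non-numeric cells, or columns with many missing cells, are the usual candidates for cleaning before any visual analysis.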
Exploratory Analysis Process
After you have an initial question and a dataset, construct a visualization that provides an answer to your question. As you construct the visualization, you will find that your question evolves; often it will become more specific. Keep track of this evolution and of the other questions that occur to you along the way. Once you have answered all the questions to your satisfaction, think of a way to present the data and the answers as clearly as possible.
Before starting, write down the initial question clearly. As you go, document what you had to do to construct the visualizations and how the questions evolved. State which dataset you chose and describe any transformations or rearrangements of the dataset that you needed to perform. In particular, describe how you got the data into the format needed by the visualization system. Post any intermediate visualizations that helped you refine your question.
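A common example of such a transformation is reshaping a "wide" table (one column per year, say) into a "long" table with one observation per row, which many plotting tools (for example ggplot2 and Vega-Lite) tend to work with most naturally. The sketch below is a minimal standard-library version of this unpivot step; the table, its `category` column and the year columns are hypothetical, and it assumes the values have already been cleaned so they convert to numbers. In pandas, `DataFrame.melt` performs the same reshaping.

```python
# Hypothetical wide table: one row per category, one column per year.
WIDE = [
    {"category": "A", "2018": "12", "2019": "15"},
    {"category": "B", "2018": "9", "2019": "4"},
]


def wide_to_long(rows, id_col, var_name="variable", value_name="value"):
    """Unpivot every column except id_col into (id, variable, value) rows."""
    long_rows = []
    for row in rows:
        for key, cell in row.items():
            if key == id_col:
                continue
            long_rows.append({
                id_col: row[id_col],
                var_name: key,
                # Assumes cleaned data: float() raises on missing markers.
                value_name: float(cell),
            })
    return long_rows


long_rows = wide_to_long(WIDE, "category", var_name="year")
for r in long_rows:
    print(r)
```

Each output row now carries its category, its year and a single value, so a tool can map `year` to the x-axis, `value` to the y-axis and `category` to color without further restructuring.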
After you have constructed the final visualization for presenting your answer, write a caption and a paragraph describing the visualization, and how it answers the question you posed. Think of the figure, the caption and the text as material you might include in a research paper.
Datasets
Grading Criteria
All projects are graded on the same scale, irrespective of the number of members in the group. This is meant to encourage you to work in groups: the larger the group, the less work for each member. Grading will be based on the analysis process and the final visualization.
Analysis Process
- Clear questions applicable to the chosen data set
- Appropriate data diagnostics and transformation
- Sufficient breadth of analysis, exploring multiple questions
- Sufficient depth of analysis, with appropriate follow up questions
- Clear explanation of data exploration process
Final Visualization
- Image answers the chosen question in a compelling manner
- Visualization can function as a "stand alone" figure
- Expressive and effective visualization, good choice of visual encodings
- Appropriate caption, labels and description
(Adapted from Jeffrey Heer's CSE512 Data Visualization course at the University of Washington: https://courses.cs.washington.edu/courses/cse512/)