Within the scope of this course, I have learned how to effectively critique data visualizations and create some of my own, but for this assignment, I will be doing both to critique an existing visualization via redesign. On this page, I will post the initial visualization and critique, delve into user research and feedback on new sketches I’ve created, and recreate the original visualization according to the feedback I gave and received.
Below is the original data visualization I chose to critique via redesign. The graph comes from a New York State Education Department slide deck outlining spring 2020 digital equity survey results. The data were submitted by New York State schools “to the best of their ability and knowledge” in June and July of 2020. I picked this visualization because I am interested in digital equity as a policy topic and, in my opinion, this particular visualization didn’t fully represent the gravity of internet access barriers. I know that it’s very important for schools to understand the challenges their students face at home (including connecting to the internet to complete schoolwork), and I felt that this visualization didn’t effectively tell the data’s story.
Describe your overall observations about the data visualization here. What stood out to you? What did you find worked really well? What didn’t? What, if anything, would you do differently?
Upon first glance, the data visualization was confusing and felt chaotic due to the many colors, data labels, and labels on the x-axis. I didn’t immediately know what “NRC” meant from the title, so it took a bit of time to understand what was being said with the data. The “#N/A” category on the x-axis felt lazy and led me to question if the person organizing the data was reliable. In terms of things that worked well, the big, bold title is easy to read and draws my eye to the top left corner immediately. I think it’s wise that the creator disaggregated the data by different ‘type’ of school to help identify that schools (and the students at those schools) may have differing needs as they relate to internet access. The color palette feels natural to the eye as we tend to see red-green color schemes in everyday life.
That said, the bright colors make it hard for me as a reader to identify what’s important about the data. Does red signify the “worst” barrier? Not in every category. Additionally, the data itself is hard to interpret because the y-axis is expressed in percentages, but the data labels show raw counts to convey sample sizes. This discrepancy could work for a more advanced audience, but I think it is generally confusing. The categories of school type also feel disparate and somewhat unrelated, which makes me wonder if there is a type of school I should be focusing on as a viewer. Regarding preliminary changes, some things that come to mind are to cut down on the types of schools represented, delete the #N/A data, represent the data as percentages to cut down on the mental interpretation the audience has to do, and change the color scheme to highlight certain data over others.
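To make the “represent the data as percentages” idea concrete, here is a minimal Python sketch of how raw count labels could be converted into percentage labels for each type of school. The counts below are made up purely for illustration (they are not the survey’s figures), and the names `barrier_counts` and `to_percentages` are my own hypothetical choices. The sketch also assumes the shares are computed only across the barriers shown, which may not match how the original chart calculated its y-axis.

```python
# Hypothetical counts for illustration only; these are NOT the survey's figures.
barrier_counts = {
    "Low needs": {"Cost": 120, "Availability": 60, "Other": 20},
    "Rural high needs": {"Cost": 150, "Availability": 300, "Other": 50},
}


def to_percentages(counts):
    """Convert raw counts for one school type into percentage shares."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when a school type has no responses.
        return {barrier: 0.0 for barrier in counts}
    return {barrier: round(100 * n / total, 1) for barrier, n in counts.items()}


# Percentage labels for every school type, ready to use as data labels.
percent_labels = {
    school_type: to_percentages(counts)
    for school_type, counts in barrier_counts.items()
}

for school_type, shares in percent_labels.items():
    label = ", ".join(f"{barrier}: {share}%" for barrier, share in shares.items())
    print(f"{school_type} -> {label}")
```

With labels in the same unit as the axis, the reader no longer has to translate between counts and percentages; sample sizes could still be noted once per school type in a caption.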
Who is the primary audience for this tool? Do you think this visualization is effective for reaching that audience? Why or why not?
For context, this visualization is part of a larger slide deck exploring the spring 2020 digital equity survey results for the New York State Education Department. Thus, the intended audience seems to be staff that work in the department who want to learn more about the demography of their students (perhaps administrators like principals). I think this visualization is somewhat effective for reaching the audience. On one hand, the data is helpful for learning about students’ challenges to internet connectivity. On the other hand, the data isn’t presented in a particularly clear way that would leave staff with an impression about the important components of the data.
How successful was this method at evaluating the data visualization you selected? Are there measures you feel are missing or not being captured here? What would you change? Provide 1-2 recommendations (color, type of visualization, layout, etc.).
I think Stephen Few’s Data Visualization Effectiveness Profile was very successful at capturing the various aspects of quality, and I wouldn’t add any additional measures. Regarding changes to the existing measures, I think the “aesthetics” category should explicitly mention the use of color as a highlight (e.g., does the use of color effectively highlight the most important data?). Additionally, I might change the “usefulness” category to include something about including TOO much data in the visualization (e.g., is the entirety of data presented in this visualization useful to the intended message?).
To aid in my redesign process, I created three separate sketches that display the data differently from the original visualization, and differently from one another by iterating on each sketch slightly. To gather feedback and test the readability of my sketches, I asked two separate users a series of questions that remained the same for all three visualizations. The users have the following demographics:
User 1: Male, 27 years old, software engineer
User 2: Female, 52 years old, sixth grade middle school counselor
Below are the sketches and feedback.
To begin, I knew I had to do a fair amount of cleaning in terms of data presentation and overall chart junk. I immediately decided to remove the data for “None” and “Not Reported” from each type of school category and cut down on the number of categories represented in this visualization. It seemed to me that the categories most homogeneous with one another were those referencing the “need” of schools rather than their location. Thus, I kept “low needs,” “average needs,” and both “high needs” school types in my sketch.
Further, I decided that instead of focusing on all barriers to internet access, I could highlight the data on cost as a barrier. Ostensibly, cost as a barrier to internet access is something the district can more directly impact (as opposed to internet availability). The proportions of students who experienced barriers to internet access via “availability” or “other” were represented with shades of gray, while the data of interest (the proportion of students for whom cost was a barrier) was represented with green.
Can you describe what this visualization is telling you?
User 1: This visualization is a breakdown of the prominence of what prevents students (or schools?) from accessing the internet.
User 2: Cost is a significant factor among all types of schools in accessing internet.
Is there anything you find surprising or confusing?
User 1: I can’t tell if the x-axis and data labels are referring to students or schools–seems conflicting and that’s unclear to me. Are the schools trying to get access? Or people who ATTEND the schools trying to get access?
User 2: I’m taken aback by the large amount of rural high needs schools for whom availability is an issue. Why isn’t that data highlighted? I’m curious about it but left with questions. Also, I’m not totally understanding what the labels on the x-axis mean; what defines a ‘low needs’ school?
Based on the data represented here, who do you think is the intended audience?
User 1: Ideally, people who have the power to allocate funding to schools and/or students or internet service providers who are price-gouging subscription costs.
User 2: School administrators and teachers/staff who are trying to learn more about the demography of their district–and the differences between each ‘type’ of school.
What would you change about this visualization, if anything?
User 1: I would change the labels on the x-axis to be more clear for the population you’re trying to represent (schools or students at the schools), keep percentage signs on all data labels (because the percentages don’t add up to 100 so I think you need them), and find a way to represent high needs schools together and THEN disaggregate. Does it make sense to represent high needs schools together and then have a separate visualization breaking down how high needs schools differ by a variable?
User 2: I would make the title and categories of schools more clear, but I’m not really sure how to do that. I think you could mention in the title that the population of interest is students in these schools, and then amend the school category labels to reflect the change in the title. I think the green for the cost could be a little brighter because it seems like the dark gray and dark green are almost of the same significance.
My second sketch has many similarities with the first sketch in terms of type of graphic, layout, and axes. However, for this redesign, I wanted to highlight the majority barrier for each type of school rather than only cost, because the proportion of students in rural high-needs schools experiencing a lack of internet availability is a significant piece of the data’s story. I chose to omit the “other” barrier option (due to its lack of clarity or elucidation about the root problem) and instead focus on cost and availability. The majority proportion per type of school is represented as a solid bar, while the other source of inaccess is represented with dotted lines. I still wanted to show the proportion of the other type of inaccess without highlighting it, per se.
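The solid-versus-dotted rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the percentages are invented (not the survey’s figures), the names `shares_by_school_type` and `style_bars` are hypothetical, and “majority” here simply means the larger of the two barriers shown for a school type, not necessarily more than half of students.

```python
# Hypothetical percentages for illustration only; NOT the survey's figures.
shares_by_school_type = {
    "Average needs": {"Cost": 35.0, "Availability": 15.0},
    "Rural high needs": {"Cost": 30.0, "Availability": 45.0},
}


def style_bars(shares):
    """Mark the largest barrier as 'solid' and every other barrier as 'dotted'."""
    majority = max(shares, key=shares.get)  # barrier with the largest share
    return {
        barrier: ("solid" if barrier == majority else "dotted")
        for barrier in shares
    }


# Bar style for each barrier within each school type.
bar_styles = {
    school_type: style_bars(shares)
    for school_type, shares in shares_by_school_type.items()
}

for school_type, styles in bar_styles.items():
    print(school_type, styles)
```

Deriving the style from the data, rather than assigning it by hand, keeps the emphasis consistent if a category is added or removed in a later iteration.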
Can you describe what this visualization is telling you?
User 1: Okay, I already like this one better. This visualization is telling me that different types of schools have different needs. Most students who can’t access the internet struggle with cost as an issue, except for rural high needs schools where students struggle with access/availability of internet.
User 2: Hm, interesting. You highlighted the data I wanted for rural schools! This graph is telling me that rural high needs schools have different needs than the other types of schools and the other categories of school all struggle with cost. I would still make the green brighter, though.
Is there anything you find surprising or confusing?
User 1: I still think there’s too much text on the x-axis that makes the information hard to absorb. Like, in the last visualization, I didn’t even really register that the last two categories represented urban and rural high-needs schools. I know you underlined them, but it wasn’t clear to me until now.
User 2: I find it surprising that among students in rural high needs schools, of course the majority struggle with internet availability and that is highlighted very clearly, but a huge group of students struggle with cost, too–comparable to the majority barrier for average needs schools.
Based on the data represented here, who do you think is the intended audience?
User 1: For this visualization, the audience seems to be policy makers at the state or federal level who need to know that rural schools have different needs and should be considered as such. I think the message is still related to funding for schools, but compared to the last graph, this one seems to highlight MSA status in a significant way that screams policy to me.
User 2: Same population as the last one–school administrators and teachers/staff who need to know how to treat students differently based on their different barriers to internet access and completing assignments at home.
What would you change about this visualization, if anything?
User 1: Consider removing low-needs and average-needs schools from this visualization altogether. Are they really necessary to make the point? Also, I might move rural high-needs schools to the end of the chart to further highlight the significance of the difference. Oh, and I think if you keep all 4 categories, both high needs categories should still be represented together in some way.
User 2: I really like this one. I’m still a little confused by who is being represented by this data (students or schools) so I’d make that more clear via title or label alterations.
Finally, I tried to represent said ‘majority’ proportions per school type in a visualization other than a bar chart. I was attempting creativity but I’m not satisfied with the product. I agree with the users who provided feedback below in that this matrix is confusing due to a lack of data rather than too much data like in the original visualization. For instance, should the data be read across the rows or columns?
I do like the use of color for the left-justified labels denoting low, average, and high need schools. Also, the rural high-needs school data is clearly highlighted without the use of additional color.
Can you describe what this visualization is telling you?
User 1: Depending on whether a school is categorized as low-need, average-need, or high-need, the majority of students will struggle with internet availability or internet cost.
User 2: Half of rural high-need students struggle with internet cost. Well, actually, I’m not sure. No, it’s confusing.
Is there anything you find surprising or confusing?
User 1: I see where you were going with this, but the lack of data across the rows confuses me.
User 2: I’m thrown off by the blank cells for non-majority barriers and left feeling confused about how to add up the data–across row or column?
Based on the data represented here, who do you think is the intended audience?
User 1: I’d say the same audience as the first visualization (For reference, User 1 said: “Ideally, people who have the power to allocate funding to schools and/or students or internet service providers who are price-gouging subscription costs”).
User 2: Um, maybe someone who has the ability to make internet more accessible and available to rural students.
What would you change about this visualization, if anything?
User 1: I’d add light gray stats to the blank sections of the matrix, like barely perceptible, but enough so the reader knows there is data there. And, I’d add in some sort of consistent hierarchy for low, average, and high-needs categories by adding one big “high-needs” label and then the urban and rural labels smaller under the high-needs label.
User 2: Maybe add additional data to the empty parts of this chart and show how the reader should sum the percentages, like with an arrow pointing across the rows or an equals sign and total percentages across either row or column data.
A few key patterns have emerged in the feedback from the two users. First, the way I’ve represented the data in the titles and axes is confusing and leaves it unclear who is struggling with access: the schools themselves or the students attending the schools? Similarly, the representation of low, average, and high needs schools should be made clearer by removing text from the axis or using color/bold to differentiate type.
Rural and urban high needs schools should be sub-labels under a greater “high needs” label to make it clearer that those two columns of data are distinct from the other two (low and average needs). Further, a larger label would help the reader pay attention to what makes the two types of schools different, which will hopefully make the rural/urban distinction clearer. My classmates suggested that I remove words from the x-axis labels to further clarify the meaning of the graph and reduce unnecessary text.
Finally, there was some feedback about the organization of the data in terms of what gets included on the chart. I will base my final chart on the layout in sketch 2, keeping low and average needs school data and maintaining the position of rural high needs schools on the axis. While I agree with User 1 in that moving the rural data to the rightmost position would further mark it as ‘other,’ I think the different colored majority data sandwiched in between average needs and urban high needs schools is more powerful in showing its uniqueness. Additionally, my classmates suggested that instead of representing the ‘minority’ data with dashed lines, I can instead represent both cost and availability data with solid colored bars to reduce confusion.
Below is my updated redesign:
I attempted to clarify the message of the data in the title and cut down on text within the visualization to reduce confusion resulting from chart junk. I picked colors that don’t have a clear association with one another but that still effectively highlight important facets of the cost and availability barriers. I did find that I was slightly limited by the functionality of Flourish. For example, I wanted to emphasize certain words within the y-axis labels (i.e., bold urban/suburban and rural and potentially change the color of the high, average, and low needs text), but I was only able to edit the entire label.
Overall, I believe this final redesign is both effective on its own (sans comparison within the context of this course) and addresses the shortcomings of the original visualization and subsequent sketches.