
Tyler Su

Over the past three years, the Designing Education Lab has been conducting a national longitudinal study of the post-graduation experience of engineers. I had the unique opportunity to work with the data, the first and only dataset of its kind to explore the experiences of engineers post-graduation.

My project developed a visualization framework to better map and understand the pathways of engineers post-graduation, derived insights by combining qualitative and quantitative techniques, and identified future areas of investigation using an evidence-based framework. These areas are currently being explored further at DEL. Please check back soon for more updates!

  • Team
    Designing Education Lab
    @ Stanford Center of Design Research
  • Role
    Data Science Intern
  • Timeline
    June-September 2018 (3 months)
  • Tools
    Python, Gensim, Gephi,
    NVivo 12, Excel, Photoshop

My Role

As a data science research intern, I was initially brought on to statistically code the unstructured qualitative responses of the EMS dataset. However, I saw an opportunity to also develop a new methodology combining user research and data science methods to uncover and validate insights gathered on the experience of higher education. Developing this methodology became my primary focus throughout the internship. Since leaving, I've shared my method with other graduate researchers and elements of it have since been adapted into projects beyond my own.

New Method

This new methodology addressed the limitations of using a purely qualitative or purely quantitative approach by allowing me to:

1. Generate hypotheses through qualitative data that can be validated quantitatively

2. Explain ambiguous patterns in data through rich qualitative analysis

3. Use storytelling to bring insights to life in compelling, impactful, and actionable ways

🎓

This case study illustrates just a few ways a methodology combining user research and data science can be powerful, an approach I hope to continue to use in future projects.

Throughout this independent project, I had the wonderful opportunity of working closely with Prof. Sheri Sheppard and Dr. Shannon Gilmartin, and alongside several other incredible researchers at the Stanford Center of Design Research who stretched my thinking in unimaginable ways.

Let's jump in!

The Data

I was given access to the EMS dataset, drawn from a survey initially administered to over 30,000 engineering students. In EMS 3.0, participants were asked to write an optional reflection on their post-graduation experience. Hundreds chose to write a substantial response, about 1 in 3 of all survey respondents (395/663).

Goals
  • Propose an evidence-based framework
    to better understand the complex journey of engineers and their evolving needs, and that can be generalized and scaled into a useful tool for other datasets
  • Develop insights
    on the post-graduation experience of engineers from an unstructured qualitative dataset
  • Identify actionable next steps
    so that educators and employers can better support engineers in their post-graduation pathways
Primary Goal

Deliver a compelling data-driven narrative that brings discovered insights to life and mobilizes stakeholders to take action.

Step 1: Generating hypotheses through qualitative data

I selected a random, representative sample of responses from the data and hand-coded them into thematic codes. Reading the responses allowed me to form hypotheses about them and to note major themes in the dataset (e.g. various extrinsic and intrinsic goals). This process is vital to qualitative research because hand-coding allows researchers to capture nuances in language that other methods cannot. However, it faces scrutiny for its lack of consistency, its susceptibility to bias, and its tedious, time-consuming nature.

Step 2: Validating Patterns:
Natural Language Processing for Emergent Themes

The hand-coded responses from the previous step served as a training set of (document, code) pairs. I then used Gensim, a Python library, to analyze the latent semantic structure of the data. My script transforms each response into a vector using a bag-of-words representation. Feature vectors are then created with a TF-IDF (term frequency–inverse document frequency) transformation model, which reflects how important a word is to a document in the dataset. I then transform this new TF-IDF corpus via Latent Semantic Indexing (LSI), an indexing and retrieval method that identifies patterns in the relationships between terms and concepts contained in an unstructured collection of text. For an unclassified document, the script returns the probability of the document belonging to each code, and the highest probabilities are noted.

This step allowed me to validate the patterns I was qualitatively noting quickly and consistently over a large unstructured dataset.
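
To make the vectorization in this step concrete, here is a minimal sketch of bag-of-words counting, TF-IDF weighting, and scoring an unclassified response against hand-coded examples. It is not my original script: it uses only the Python standard library rather than Gensim, it omits the LSI projection entirely, and the scores it returns are cosine similarities rather than probabilities. The example responses, the function names, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def fit_idf(token_lists):
    """Inverse document frequency per term (one common smoothed variant)."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Bag-of-words counts weighted by IDF, scaled to unit length."""
    counts = Counter(t for t in tokens if t in idf)  # unseen terms are dropped
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def score_codes(text, labelled, idf):
    """For each code, the best similarity to any hand-coded example of it."""
    query = tfidf_vector(tokenize(text), idf)
    scores = defaultdict(float)
    for doc_vec, code in labelled:
        scores[code] = max(scores[code], cosine(query, doc_vec))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented (document, code) pairs standing in for the hand-coded sample.
hand_coded = [
    ("my internship led directly to a full time job offer", "job offer"),
    ("the career fair helped me find my first job", "job search"),
    ("i wish my courses had taught more practical skills", "job readiness"),
]
tokens = [tokenize(text) for text, _ in hand_coded]
idf = fit_idf(tokens)
labelled = [(tfidf_vector(tok, idf), code) for tok, (_, code) in zip(tokens, hand_coded)]

ranked = score_codes("i got a job offer after my internship", labelled, idf)
print(ranked[0][0])  # the code whose examples are most similar to the query
```

In a Gensim-based pipeline, the TF-IDF vectors would additionally be projected into a lower-dimensional LSI space before similarities are computed, which lets responses match on related vocabulary rather than only on shared words.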

Step 3: Discover Latent Patterns:
Co-occurrence Matrix of Coding Scheme

As it stood, my coding scheme could identify how often respondents mentioned a given code, but not in what ways or in what contexts a code was discussed, which is where richer themes in the data might be found. To resolve this, I built a co-occurrence matrix of my coding scheme to see if I could uncover latent patterns in the dataset. I define co-occurrence as a response being categorized into two different ‘codes’ simultaneously.

Fig. 3 This response would be coded into ‘internships’, ‘initiative’, and ‘job offer’. Then, (‘internships’, ‘initiative’), (‘initiative’, ‘job offer’), and (‘internships’, ‘job offer’) would be considered co-occurring pairs of codes.

A high number of co-occurrences between two codes indicated a stronger relationship than a low number, allowing me to identify which experiences, attitudes, skills, and resources co-occurred frequently with different goals and motivations.
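
The counting behind such a matrix can be sketched in a few lines. This is a minimal illustration under stated assumptions, not my original code: the function name and the two extra responses are invented, and only the first response mirrors the Fig. 3 example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(coded_responses):
    """Count how many responses were assigned each unordered pair of codes."""
    pair_counts = Counter()
    for codes in coded_responses:
        # set() ignores a code repeated within one response;
        # sorted() makes ('a', 'b') and ('b', 'a') the same key.
        for pair in combinations(sorted(set(codes)), 2):
            pair_counts[pair] += 1
    return pair_counts


coded_responses = [
    ["internships", "initiative", "job offer"],  # the Fig. 3 example
    ["internships", "job offer"],                # invented
    ["initiative"],                              # invented, no pairs
]
pairs = cooccurrence_counts(coded_responses)
for pair, count in pairs.most_common():
    print(pair, count)
```

Each key is one cell of the upper triangle of the co-occurrence matrix, so the most frequent pairs can be read straight off `most_common()`.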

From these connections, I could infer relationships between different themes mentioned in the responses.

However, this still doesn't mean anything to a simple onlooker.

Where a qualitative approach excels is in its ability to use storytelling to bring insights to life in compelling, impactful, and actionable ways. I needed to develop a framework that could communicate the complex relationships shown in the co-occurrence matrix in an accessible way.

This is where it got difficult.

A majority of this project was spent trying to develop a good framework for this data that could possibly be scaled or generalized to apply to other datasets.

[Then I finally found it.]

Step 4: Explaining data through rich qualitative analysis: Directed Network Visualization of Relationships

I transformed the co-occurrence matrix into a directed network visualization for each of the four most frequently mentioned goals and motivations. Each co-occurrence became a ‘connection’ in the visualization. The network visualization makes it easy to map out the connections, or co-occurrences, between goals and different tools and resources.

With this framework, we could literally, visually trace paths from resources to goals, without ever drawing a single line.

I next sorted all codes into three tiers: goals, tools, and resources. I was interested in the directed relationships from resources to tools, and tools to goals.
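
One way to turn undirected co-occurrence counts into the directed connections described above is sketched here. The tier assignments and counts are hypothetical placeholders, and dropping pairs that do not sit in adjacent tiers is an assumption of this sketch rather than a rule taken from the project.

```python
# Lower rank points to higher rank: resources -> tools -> goals.
TIER_RANK = {"resource": 0, "tool": 1, "goal": 2}


def directed_edges(pair_counts, tier_of):
    """Orient each co-occurring pair from the lower tier to the next tier up.

    Pairs within one tier, or pairs that skip a tier, are dropped
    (an assumption of this sketch).
    """
    edges = {}
    for (a, b), count in pair_counts.items():
        rank_a, rank_b = TIER_RANK[tier_of[a]], TIER_RANK[tier_of[b]]
        if rank_b - rank_a == 1:
            edges[(a, b)] = count
        elif rank_a - rank_b == 1:
            edges[(b, a)] = count
    return edges


# Hypothetical tier assignments and counts, for illustration only.
tier_of = {
    "university resources": "resource",
    "network": "tool",
    "job search": "goal",
}
pair_counts = {
    ("network", "university resources"): 4,
    ("job search", "network"): 9,
    ("job search", "university resources"): 2,  # skips the tool tier
}
edges = directed_edges(pair_counts, tier_of)
print(edges)
```

The resulting (source, target) pairs can then be written out as an edge list for a graph tool such as Gephi.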

Step 5:
Use storytelling to bring insights to life in compelling, impactful, and actionable ways
Top four goals

Developing Motivation or Direction, Exposure to Different Pathways, Job Readiness, Job Search

Key
  • The size of bubble
    represents total number of responses coded at that node
  • The graph is separated into three tiers:
    goals (top), tools (middle), resources (bottom)
  • The thickness of the connection
    represents the proportion of all responses from the source node that co-occurred with the target node. A high value means that a larger proportion of the responses coded under the source node were also coded under the target node.
  • The opaque
    bubbles/connections are the most significant connections in the map
  • The call-out excerpts
    represent a common response or theme among the responses that co-occurred between the given source and target nodes.
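
As a rough sketch of how the connection thickness in this key could be computed, the snippet below divides each connection's co-occurrence count by the total number of responses coded at its source node. The node totals and counts are made-up numbers, and the function name is hypothetical.

```python
def edge_thickness(edge_counts, node_totals):
    """Share of the source node's responses that were also coded at the target."""
    return {
        (source, target): count / node_totals[source]
        for (source, target), count in edge_counts.items()
    }


# Made-up totals (bubble sizes) and directed co-occurrence counts.
node_totals = {"university resources": 10, "network": 20, "job search": 30}
edge_counts = {
    ("university resources", "network"): 4,
    ("network", "job search"): 9,
}
thickness = edge_thickness(edge_counts, node_totals)
for (source, target), share in thickness.items():
    print(f"{source} -> {target}: {share:.0%} of source responses")
```

Normalizing by the source node keeps a heavily mentioned code from dominating the map simply because it appears in more responses.
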
Exposure to Different Pathways

Exposure to different pathways was the second largest intrinsic goal (33% of responses coded for intrinsic goals). Interestingly, despite many students expressing a desire for exposure to different pathways, only a small subset of responses had relationships to other defined tools, and only three tools had relationships with the goal. Is there a lack of relevant tools for achieving it?

Job Search

A successful job search was the second largest extrinsic goal (38% of responses coded for extrinsic goals). The most prominent relationships in this map are those between the job search and internships, co-ops, and network. What we don't see are any major connections with skills (blue). Why is that the case? Are skills perceived to be less important than other factors? Is this intended, or is it evidence of a mismatch between students' priorities and what employers actually desire?

Job Readiness

Job readiness was the largest goal mentioned overall (44% of responses coded for extrinsic goals). Interestingly, there is a strong relationship to university resources. However, when you look closely at the responses coded between the two, you see that students expressed a lack of workforce-relevant courses that prepared them for their careers.

Developing Goals or Direction

Developing goals was the largest intrinsic goal mentioned (33% of responses coded for intrinsic goals). Despite the strong relationship with exploration and reflection, there was no relationship between these tools and any resources. Is there a lack of relevant resources that help foster exploration and reflection in students?

More to come

These visualizations and findings represent only the tip of the iceberg of an ongoing research probe into engineering education.

My project developed a visualization framework to better map and understand the pathways of engineers post-graduation, derived insights by combining qualitative and quantitative techniques, and identified future areas of investigation using an evidence-based framework. These areas are currently being explored further at DEL. Please check back soon for more updates!