The data steward often serves as the unsung hero within scientific organizations. The role is typically filled by a scientist with technological expertise who is dedicated to developing processes and standards for organizing the vast amounts of data collected in pursuit of breakthrough discoveries.
A data steward’s responsibilities primarily revolve around managing metadata and scientific context surrounding how data is captured, stored, and analyzed. It’s a role that comes with unique challenges that can test even the most patient professional, including:

- The incredible diversity of data types and datasets generated throughout the scientific investigation process
- The sheer volume and complexity of data gathered from various experimental methods and instrumentation
- The numerous, often firmly held opinions among scientists and investigators regarding data organization and naming conventions (e.g., whether to use “scientist” or “investigator”), which can lead to surprisingly heated debates

Having served as a data steward in one role or another for most of my career, I thought I’d share some hard-won insights and observations from the trenches.
Stop Chasing the ‘Right’ Standard: Just Make It Work
“It’s more effective to aim for good enough to be useful rather than getting too fixated on getting it perfectly right.”
As a newly minted data steward at a small-molecule drug discovery company, I was tasked with organizing all of the historical data. I eagerly interviewed scientists across the company and synthesized their input into a proposed data standard. When I presented my proposal, however, I got my first exposure to the deep-seated religious wars that exist between scientific disciplines when it comes to nomenclature and data organization.
Questions arose such as whether to call the plateau of a dose-response curve the “top” or the “upper asymptote,” whether the compound identifier should include the salt form, and how to interpret averages when results fell outside the linear range of the curve. These weren’t merely technical questions, but rather fundamental battles over scientific identity and methodology. My biggest error was believing that finding the “right” answer was paramount.
“I soon realized that my role as data steward was to build enough consensus within the organization to enable us to move forward with our critical work at hand.”
While scientists will passionately advocate for their vision of how data should be structured and categorized, ultimately, what they truly desire is a standard that is both understandable and workable.
Valuable lesson learned: Listen to your scientific community, but don’t get bogged down trying to make everything perfectly “right.” Aim for “good enough to be functional” rather than unattainable perfection.
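One way to make “good enough” concrete is to stop arguing over which name wins and simply record the competing names as aliases of a single agreed field. The sketch below is only an illustration of that idea: the canonical names, the “bottom” and “lower asymptote” pair, and the helper functions are all hypothetical, not a standard any organization actually uses.

```python
# Hypothetical canonical field names and their accepted aliases.
# A real list would come from whatever consensus the organization reaches.
CANONICAL_ALIASES = {
    "upper_asymptote": {"top", "upper asymptote"},
    "lower_asymptote": {"bottom", "lower asymptote"},  # illustrative assumption
}


def normalize(name):
    """Lowercase, treat underscores as spaces, and collapse whitespace."""
    return " ".join(name.strip().lower().replace("_", " ").split())


def build_lookup(canonical_aliases):
    """Map every normalized alias (and canonical name) to its canonical name."""
    lookup = {}
    for canonical, aliases in canonical_aliases.items():
        for name in aliases | {canonical}:
            lookup[normalize(name)] = canonical
    return lookup


def canonicalize(record, lookup):
    """Rename known aliases to canonical names; leave unknown keys untouched."""
    return {lookup.get(normalize(key), key): value for key, value in record.items()}


LOOKUP = build_lookup(CANONICAL_ALIASES)
```

With this in place, a biologist can keep writing “Top” and a modeler “upper asymptote”; `canonicalize({"Top": 98.5}, LOOKUP)` stores both under the same field, and nobody has to concede the argument.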
Don’t Drown in Data Details – Start with the End Game
When beginning as a data steward, it’s tempting to focus on every experimental detail. I’ve observed data stewards meticulously recording environmental details, such as temperature and humidity, for experiments that aren’t sensitive to either metric. While this attention to experimental context is laudable and understandable, it often overlooks the core objective: enabling the organization to collect and interpret data for product development purposes.
Instead, I recommend beginning by envisioning the final data representation, i.e., the format your organization will use for crucial decision-making. For many, this is a SAR (structure-activity relationship) table, accompanied by visualizations such as IC50 curves or stability-over-time charts.
By outlining the ultimate data dashboard that will guide your organization’s decisions, you clarify precisely which characteristics are truly essential. From this endpoint, you can trace back through the data’s journey all the way to the lab bench, ensuring all vital experimental factors are captured to generate clear, actionable insights.
Valuable lesson learned: Always start with the end goal in mind.
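To illustrate working backward from the decision table, here is a minimal sketch. It assumes a hypothetical SAR table with three columns and lists, for each column, the raw fields that would have to be captured upstream to compute it. Every column and field name is invented for the example.

```python
# Hypothetical mapping: each column of the final SAR table -> the raw
# fields that must be captured at the bench to produce it.
SAR_COLUMNS = {
    "compound_id": {"compound_id"},
    "ic50_nM": {"compound_id", "concentration_nM", "response_pct"},
    "half_life_min": {"compound_id", "timepoint_min", "pct_remaining"},
}


def required_capture_fields(columns):
    """Union of all upstream fields that the final table depends on."""
    required = set()
    for dependencies in columns.values():
        required |= dependencies
    return required


def unneeded_fields(proposed, columns):
    """Proposed capture fields that no column of the final table depends on."""
    return sorted(set(proposed) - required_capture_fields(columns))
```

Running `unneeded_fields` over a proposed capture form flags candidates such as a humidity reading that nothing in the final table uses. That is a prompt for a conversation with the scientists, not an automatic deletion; some fields may matter for reasons the table does not show.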
Let Data Be Data: Work with its Natural Format
“Not all data is created equal, and trying to force everything into a single mold is a recipe for frustration and inefficiency.”
Some data is best kept in its native format, especially where a robust tool ecosystem exists around that format. ’Omics data, for instance, going back to the original FASTA files used with the BLAST aligner, often benefits from being kept in files that inherently support querying and indexing, much like a database, without needing one. Other data is inherently tabular, like IC50s, metabolic stability, and pharmacokinetic measurements. (I’m setting aside more complex hybrid data types, such as histopathology, which combine images with tabulated measurements.)
My key takeaway is that for each type of experiment, we should aim to capture data in the most natural way it emerges from the experimental process rather than trying to fit everything into a single model. This means that data stewards need to plan for systems that can accommodate diverse data types and be mindful of what metadata scientists will naturally and accurately capture during their work.
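As a rough sketch of what “accommodating diverse data types” can look like in practice, the example below keeps each data type in its own format and picks a reader by file extension rather than converting everything into one model. The `READERS` registry and `load` function are hypothetical names, the FASTA parser is deliberately minimal (standard library only), and a production system would more likely lean on established tools for each format.

```python
import csv
import os


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_table(handle):
    """Yield one dict per row of a CSV stream, keyed by the header row."""
    yield from csv.DictReader(handle)


# Hypothetical registry: one reader per native format.
READERS = {".fasta": read_fasta, ".csv": read_table}


def load(filename, handle):
    """Read an open text stream with the reader registered for its extension."""
    suffix = os.path.splitext(filename)[1].lower()
    reader = READERS.get(suffix)
    if reader is None:
        raise ValueError(f"no reader registered for {suffix!r}")
    return list(reader(handle))
```

Supporting a new data type then means registering one more reader, while sequences stay in FASTA and assay results stay tabular.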
Additionally, we should avoid adding unnecessary process steps to capture information that the experimenting scientist might perceive as irrelevant. While it’s tempting to include “nice-to-have” information (I often hear “we might need that additional metadata someday”), every extra required field can slow scientists down and reduce their compliance with the process.
Valuable lesson learned: Work with the natural flow of scientific work, not against it.

“The role of data steward demands a unique blend of technical expertise, diplomatic finesse, and strategic foresight. Although challenges are inherent in this position, the profound impact of effective data stewardship on scientific advancement is undeniable. The lessons I’ve learned through experience, including trial and error (and the occasional bruised ego), can offer valuable guidance to the next generation of data stewards as they navigate this critical, multifaceted role.”