A key feature of any style is how it capitalizes words in the titles of books, articles, and other works. Most recommend a variation of title case, or what CMOS until very recently referred to as headline style (before the publication of the 18th edition).
And though there are some differences among the major styles—for example, AP and APA capitalize prepositions of four letters or more in a title, whereas for Chicago it’s now five or more—they all specify an initial capital for verbs, regardless of length.
This includes the word “is,” as in the song title “Breaking Up Is Hard to Do” (Neil Sedaka and Howard Greenfield, 1962).* When such a title is mentioned in ordinary text or in a source citation, there are generally no exceptions (see CMOS 13.89). But there are some nuances to consider, including some graphical contexts where it may be appropriate to leave “is” lowercase.
“Is” is a mere linking verb, the textual equivalent of an equals sign—and it’s only two letters long. So it’s an easy word to forget to capitalize.
Nor does “is” appear all that frequently in titles, considering its ubiquity in ordinary prose. When it is used, it’s sometimes contracted, which is a good way of minimizing its impact. Take the title of the iconic movie It’s a Wonderful Life (1946). Without the contraction, and particularly with a capital I, the emphasis would shift toward the verb: It Is a Wonderful Life.
“Is” is spelled out in the title of the 1997 movie Life Is Beautiful (a translation from the original Italian), so it gets a capital I in Chicago style.‡ But the word is de-emphasized in the poster art for the theatrical release. Notice how the movie’s title is in caps and small caps except for the word “is,” which is in all small caps—and in a smaller font than any of the other letters in the title:
That works well: “Life” and “Beautiful” are the words that matter most.
Consider also the cover for Sue Grafton’s novel Y Is for Yesterday (G. P. Putnam’s Sons, 2017):**
The connecting words “is” and “for” are both lowercase, which allows the more important elements in the title (namely the Y’s) to stand out. (The preposition “for” would be lowercase in Chicago and most other styles.) In Grafton’s title—as in each of the titles mentioned in this post—“is” plays more of a supporting than a leading role.
A lowercase “is” like the one on the Grafton cover, where the small i alone signals that the word is unimportant, would be unlikely to make it past Chicago’s editorial team. But our publications tend to be scholarly in nature; in fiction and other creative contexts, rules are made to be broken.
What’s the Verdict?
“Is” is a verb, so unless it’s hiding behind a contraction, it should always be capitalized in titles mentioned in the text or in a Chicago-style source citation. But it’s a humble little word that doesn’t always like to stand out. In a graphical setting like a book cover or a movie poster, bigger isn’t necessarily better.
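The rule discussed in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Chicago's (or anyone's) official algorithm: the word lists and the names `title_case` and `prep_cap_len` are hypothetical, and a word list cannot identify parts of speech. The sketch capitalizes every word it does not recognize as an article, conjunction, or short preposition, which is why "is" always gets a capital.

```python
# Hypothetical, deliberately tiny word lists; a real style guide covers far more.
ARTICLES = {"a", "an", "the"}
CONJUNCTIONS = {"and", "but", "nor", "or"}
# "up" is left out on purpose: in "Breaking Up" it is an adverb, and a plain
# word list cannot tell adverbs from prepositions.
PREPOSITIONS = {"at", "by", "for", "from", "in", "into", "of", "on",
                "over", "to", "with", "about", "after", "under"}


def title_case(title: str, prep_cap_len: int = 5) -> str:
    """Apply a simplified title case.

    prep_cap_len is the length at which a preposition gets a capital:
    5 for the Chicago rule described above, 4 for the AP/APA rule.
    """
    words = title.split()
    last = len(words) - 1
    result = []
    for i, word in enumerate(words):
        lower = word.lower()
        if i in (0, last):
            keep_lower = False          # first and last words are capitalized
        elif lower in ARTICLES or lower in CONJUNCTIONS:
            keep_lower = True
        elif lower in PREPOSITIONS:
            keep_lower = len(lower) < prep_cap_len
        else:
            keep_lower = False          # everything else, verbs like "is" included
        result.append(lower if keep_lower else word[0].upper() + word[1:])
    return " ".join(result)


print(title_case("breaking up is hard to do"))
print(title_case("y is for yesterday"))
print(title_case("gone with the wind", prep_cap_len=4))
```

Because "is" appears in none of the lowercase lists, it is capitalized wherever it falls in the title. Changing `prep_cap_len` from 5 to 4 only affects four-letter prepositions such as "with."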
* Note that “Up” is an adverb, not a preposition, in the title phrase “Breaking Up”—and therefore capitalized (see also CMOS 8.160, rule 3).
† The subheads in this post are in title case, but sentence case is also an option for subheads, provided it’s consistently applied across a document (see CMOS 2.22 and 8.159).
‡ Wikipedia’s entry for Life Is Beautiful, as of August 23, 2021 (the day before this post was originally published), mentioned or cited that title twenty-eight times (up to and including the bibliography); in thirteen of those instances—or nearly half—the word “is” was spelled with a small i. Apparently, it’s natural to want to lowercase “is” in a title. (As of July 27, 2025, most of these had been fixed.) Such inconsistency isn’t a problem with the Italian title—La vita è bella—where sentence case (and, by extension, lowercase for è, “is”) is the norm (see CMOS 11.8).
** According to CMOS 7.67, letters used as letters are normally italicized (as when mentioned in text). Ditto for “Yesterday,” a word used as a word—which, according to CMOS 7.66, would normally be set in either italics or quotation marks. In an italic title, however, these distinctions are unnecessary (see CMOS 8.175).
Top image: Life Is Beautiful, by Linnaea Mallette (public domain).
The brain builds conversational meaning across multiple timescales, from short phrases to full narratives.
While brief segments rely on shared brain regions, longer stretches engage different systems for speaking and listening.
These findings explain how people keep track of conversations and shift fluidly between roles.
“Happy talk,
Keep talkin’ happy talk,
Talk about things you’d like to do.”
These lyrics from South Pacific hint at something deeply human: Our lives unfold through talk.
Our conversations give form to our thoughts and tie us to one another. But beneath the surface of every spoken exchange lies a complex neural process, one that shapes how we create and interpret meaning together.
A new study published in Nature Human Behaviour reveals that the brain organizes this exchange by adapting to the timescale of the conversation. At shorter intervals, the brain uses overlapping systems for both speaking and listening. But as the dialogue stretches into full thoughts or stories, speaking and listening begin to rely on distinct processes. This layered structure helps explain how people carry out fluid, responsive conversations.
How the Brain Follows Conversations
To explore the inner mechanics of dialogue, researchers in Japan invited pairs of individuals to engage in unscripted conversation while lying in separate scanners, speaking through headphones and microphones. Their goal was not to study isolated words or scripted exchanges, but to capture the fluid, spontaneous rhythm of human communication as it unfolds in daily life.
The researchers segmented each conversation into varying lengths, from fleeting phrases to full narrative arcs. They then examined how the brain responded to these different timescales. During short exchanges, the same neural systems were active whether a person was speaking or listening. It seemed that, in the early moments of a conversation, both parties relied on a shared set of circuits to manage the rapid flow of words. However, as the conversation deepened and the timescale lengthened, the brain began to diverge in its treatment of each role.
Listening, in particular, was more demanding. As stories unfolded into complex ideas, listeners recruited a broader set of brain regions involved in memory retrieval, sustained attention, and social cognition. These included areas like the angular gyrus and posterior cingulate cortex, which help link incoming language to stored knowledge, and the medial prefrontal cortex, which supports imagining other people’s thoughts and intentions.
These networks allowed the listener not only to absorb the speaker’s words but to track their meaning over time, integrate it with prior knowledge, and infer intention. Speaking did not require the same level of integration. It remained more localized, focused on generating language and responding to immediate context. This involved regions like Broca’s area in the left frontal lobe, which helps plan speech, and nearby motor areas responsible for controlling the muscles used in speaking.
In this asymmetry lies a profound insight. To speak is to project thought outward, but to listen is to reconstruct another person’s inner world. It is no surprise, then, that the brain allocates its deepest resources to the act of listening.
Why Speaking and Listening Feel So Different
To uncover how this works, the researchers constructed computational models capable of predicting whether a person was speaking or listening based solely on their brain activity.
Even the smallest acknowledgments, like “right,” “uh-huh,” and “you know,” elicit stable patterns in the brain. These fragments serve a subtle but vital purpose. They signal presence, mark engagement, and keep the rhythm of dialogue intact. In doing so, they reflect the fundamentally social nature of language: We do not speak into a void, but to be heard, understood, and affirmed.
As conversations become emotionally charged or intellectually complex, the gap between speaker and listener widens. The listener, more than the speaker, must navigate shifting layers of meaning. This involves not only cognitive effort, but emotional attunement.
Brain areas like the anterior insula and amygdala become more active during emotionally rich moments, helping the listener register tone and affect. Other regions, such as the temporoparietal junction, help track the speaker’s perspective, allowing the listener to imagine what the speaker might be feeling or intending. To listen well is to hold another person’s experience in mind, to mirror their emotions without losing oneself.
A Brain Designed for Dialogue
Conversation is more than the exchange of words. It is a layered, time-dependent process involving memory, emotion, attention, and the ability to switch between speaker and listener. The brain makes this possible by drawing on flexible systems: some geared for rapid responses, others tuned for extended stretches of meaning.
What emerges is a brain finely shaped for connection. As South Pacific reminds us, “Happy talk, keep talkin’ happy talk.” The complex choreography within the brain allows us not only to speak, but to understand and be understood.
References
Yamashita, M., Kubo, R., & Nishimoto, S. (2025). Conversational content is organized across multiple timescales in the brain. Nature Human Behaviour, 1-13.
Researchers examined more than 200 federal datasets and found that nearly half of them were altered between January and March. In most, the term “gender” was replaced with “sex.”
For months now, researchers and journalists have been documenting the disappearance of federal health data and monitoring changes to government websites. Now, a new analysis finds that some of the existing datasets have also been modified, most of them lacking a notice or log about the change.
Researchers compared more than 200 federal datasets that were available between January and March with their archived versions and found that nearly half were altered. In most cases, the word “gender” was changed to “sex.” Only 15 of the altered datasets included a note about the modification.
“The lack of transparency is a particular concern,” says Janet Freilich, a professor at Boston University School of Law, and co-author of the study, which was published in The Lancet in July.
Alterations were made across multiple federal agencies, including the Department of Veterans Affairs and the Centers for Disease Control and Prevention. The reason for the modifications was not documented in the datasets, but they coincide with a January 20 presidential directive instructing federal agencies to use the term “sex” instead of “gender.”
Federal health datasets have been a major source of information for scientists, and undocumented changes to existing data can undermine confidence in government statistics and distort research.
“There are two levels of harm here,” says Freilich, a patent lawyer by training, who has been following changes to the federal data in recent months. “If you think you’re looking for whatever the column title reflects, but the column — the underlying data — actually reflects something else, then you’re going to get a wrong answer. But the second level of harm is, this really impairs trust in federal data.”
A screenshot of CDC’s Youth Risk Behavior Surveillance System, captured on Aug. 18, 2025.
In March, Freilich co-wrote a paper in The New England Journal of Medicine on the disappearing data, finding that from Jan. 21 to Feb. 11, 2025, the Centers for Disease Control and Prevention had removed 203 databases.
“I’m not expecting this information to come back,” Freilich says. “I just plead for transparency.”
Michelle Kaufman, an associate professor and director of the Gender Equity Unit at the Johns Hopkins Bloomberg School of Public Health, who was not involved in the Lancet study, said that while most people are aware that several federal datasets have been taken down, “this actual doctoring of it takes it to the next level.”
“I’ve been telling my students, ‘You might want to find other data sets that aren’t connected to the U.S. government, because we don’t know the accuracy at this point,’” Kaufman says.
She has also been advising her students to immediately download federal datasets they might need for research.
“You don’t know if it’s going to be there tomorrow,” she says.
The study and its findings
Freilich and her co-author Aaron Kesselheim, a professor of medicine at Harvard Medical School, examined metadata from more than 200 datasets from the Department of Health and Human Services, the Centers for Disease Control and Prevention, and the Department of Veterans Affairs, covering Jan. 20 to March 25, 2025.
Under the OPEN Government Data Act, federal agencies keep lists of information about all their datasets, called their metadata, including a unique ID, title, creation date, description, and content of the dataset. These lists are collected from each agency regularly by Data.gov, which acts as a central hub that brings together datasets from across the federal government and other sources.
Using Microsoft Word’s comparison tool, the authors then manually compared current datasets to the archived versions recorded by the Internet Archive. They focused on word changes, not numerical data. Researchers also didn’t track changes to the wording on government websites.
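A word-level comparison of this kind can also be approximated in code. The sketch below is not the authors' method (they used Microsoft Word's comparison tool and reviewed results manually); it is a minimal illustration using Python's standard `difflib` module. The function name `word_changes` and the sample strings are made up for the example and are not taken from any real dataset.

```python
import difflib


def word_changes(archived: str, current: str) -> list[tuple[str, str]]:
    """Return (old, new) pairs for every stretch of words that differs."""
    old_words = archived.split()
    new_words = current.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An insertion has an empty "old" side; a deletion an empty "new" side.
            changes.append((" ".join(old_words[i1:i2]),
                            " ".join(new_words[j1:j2])))
    return changes


# Made-up metadata descriptions, for illustration only.
archived_text = "Health care use by gender and age"
current_text = "Health care use by sex and age"
print(word_changes(archived_text, current_text))
```

Like the study, this approach looks only at wording, not at numerical values, and it still needs a person to judge whether a flagged change is meaningful.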
In one example, the authors identified a modified Department of Veterans Affairs dataset about veteran health care use in 2021, in which a column titled “gender” was renamed “sex.” Those words were also changed in the dataset’s title and description. Before March 5, the dataset had not changed since it was published in 2022.
Because many datasets did not have an archived copy, the Lancet study may not be representative of all datasets in federal repositories, the authors note. But in addition to documenting undisclosed changes to some of the existing datasets, the study reveals an increase in the pace of data alterations since January: 4% of changes happened in late January, while 72% occurred in March.
Researchers also found:
In 25% of altered datasets, the change from “gender” to “sex” made the data descriptions more consistent, as the word “gender” had been applied to data also labeled as “sex.”
In four datasets, “social determinants of health” was changed to “non medical factors.” In one, “socioeconomic status” was changed to “socioeconomic characteristics.” In another existing dataset, the question “Are PTSD clinical trials gender diverse?” was changed to “Do PTSD clinical trials include men and women?”
Of the altered datasets, 89 involved changes in classification or categorization, such as changed column headers. About 25 had modified descriptive text, such as tags and overview paragraphs.
To safeguard data integrity, Freilich and Kesselheim call for stronger transparency measures, independent archiving, and international alternatives.
“Gender” and “sex” in research
Sex and gender capture different information in research.
Sex usually refers to a person’s biological characteristics, whereas gender refers to socially constructed roles and norms, according to a 2023 paper by Kaufman, published in the Bulletin of the World Health Organization.
“So just because you were born as a designated sex category at birth, it does not mean that, psychologically, that’s how you feel, and that’s where the separation of biological sex comes in as separate to the social construction of gender,” Kaufman says.
Gender has been a focus of research, particularly in psychology, since the 1970s. Researchers still conflate the two concepts, which can make it difficult to compare studies. Overall, however, gender and sex are not interchangeable in most studies and surveys. Gender captures a wider range of people’s social experiences than sex, which captures only male and female.
“Whether you’re talking about intersex people biologically, or nonbinary, third gender, transgender people in terms of identity, it erases that experience because you have to fit people into one of those two categories, male or female,” Kaufman says.
In addition, if a study aims to investigate the social constructions of gender and how roles and norms might have impacted health outcomes, using “sex” would make it difficult to interpret the results.
“Is it about the biology, the hormones, the chemical makeup of the person that led to these health outcomes, or was it their roles as a woman, or expectations as a man, that then led them down a certain path to those health outcomes?” Kaufman says. “By going back to this sort of gender essentialism of sex being a binary and that lining up completely with gender is sort of backtracking a lot of the research that’s been done over the past several decades.”
Where to find archived data
There’s no perfect alternative to the government databases.
“There’s a lot that can be done on the non-governmental side, but the government has such a leg up in the scope of information it can gather and its authority to gather information that others just can’t get access to,” Freilich says.
Since January, several volunteer groups and newsrooms have also been downloading and archiving government datasets and making them available to the public.
We’ve curated some of those resources below.
The Data Rescue Project is a collaboration among a group of data organizations and members of the Data Curation Network. The project — a clearinghouse for preserving at-risk public information — has created a Data Rescue Tracker and a Portal to catalogue ongoing public data rescue efforts.
Harvard Dataverse is a large publicly available repository of data from researchers at Harvard University and around the world, covering a range of topics from astronomy to engineering to health and medicine.
DataLumos is a crowdsourced repository for at-risk US federal government data. DataLumos is hosted by ICPSR, an international consortium of more than 800 academic institutions and research organizations.
Public Environmental Data Project: Run by a coalition of volunteers from several organizations, including Boston University and the Harvard Climate and Health CAFE Research Coordinating Center, the project has compiled a large list of federal databases and tools, including the CDC’s Social Vulnerability Index and Environmental Justice Index.
Dataindex.us is a collaborative effort to monitor changes to federal datasets.
The 19th, an independent nonprofit newsroom reporting on gender, politics, and policy, has archived government documents, including the CDC’s maternal mortality data, the CDC’s abortion and contraception data, research studies on teens, and guidelines from the National Academies on how to collect data on gender and sexuality.
Investigative Reporters & Editors: The nonprofit journalism organization has downloaded more than 120 datasets from federal websites, some as recently as November. They include the Adverse Event Reporting System, Behavioral Risk Factor Surveillance System, Medical Device Reports, Mortality Multiple Cause-of-Death Database, National Electronic Injury Surveillance System (NEISS), National Practitioner Databank, Nuclear Materials Events Database, OSHA Workplace Safety Data, and Social Security Administration Death Master File. IRE members can contact the organization to order the datasets. The organization has been providing data to members since the early 1990s.
Naseem Miller is the senior editor for health at The Journalist’s Resource. She joined JR in 2021 after working as a health reporter in local newspapers and national medical trade publications for two decades. Immediately before joining JR, she was a senior health reporter at the Orlando Sentinel, where she was part of the team that was named a 2016 Pulitzer Prize finalist for its coverage of the Pulse nightclub mass shooting.