2019 Science of Science Funding Summary and Notes

NBER Summer Institute

Science of Science Funding

Day 1: Jul 18, 2019


A. Relationship of funding to Patents:

  • Government-funded Research Increasingly Fuels Innovation; Lee Fleming, University of California at Berkeley; Hillary Greene, University of Connecticut; Matt Marx, Boston University; Dennis Yao, Harvard University

Tracking policy actions from Wellcome funded research is a priority.

Ambition #7: health is improved through changes in policy and practices

1bn GBP per year budget; 80% of funding goes towards science research

Wellcome focuses on 9 ambitions to stay accountable, 'what success looks like at wellcome'

Malaria case: artemisinin combination therapy (ACT) for malaria was recommended by WHO in 2006

  • Wellcome-funded studies showed ACT worked, reduced deaths by malaria
  • it took 15 years

Can funders play larger active role to make sure research findings get taken up into policy

Wellcome trying to do more for real policy, and track outcomes that make it into policy/society

  • Danil Mikhailov, Wellcome Trust

Data Analysis

Wellcome data labs - interdisciplinary team of scientists, data scientists, engineers, ML people

Results presented purposely include errors and mistakes, because this is a meta analysis

Went through 220,886 documents on WHO website, counts as large 'scale', which justified use of ML and automation


    1. made some software tools to scrape policy docs from WHO, UNICEF, NICE, and MSF
    2. developed ML tools using naive Bayes to find citation sections, split them, parse them into buckets of title, author, date, etc., and match them with what Wellcome funded
    3. built a pipeline to automate it and make it fast
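
The final matching step in the list above can be sketched roughly as follows. This is a minimal sketch, not the Wellcome Data Labs code: it skips the scraping and the naive Bayes section finder and assumes citation titles have already been parsed out. The function names, the 0.9 threshold, and the sample titles are all hypothetical, and the fuzzy matching uses Python's standard `difflib` rather than whatever the team actually used.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and keep only letters, digits, and single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(cleaned.split())


def match_citations(cited_titles, funded_titles, threshold=0.9):
    """Pair each cited title with its most similar funded title.

    Returns (cited, funded, score) tuples whose similarity is at least
    `threshold` (a hypothetical cutoff, to be tuned on labelled data).
    """
    matches = []
    for cited in cited_titles:
        best_title, best_score = None, 0.0
        for funded in funded_titles:
            score = SequenceMatcher(None, normalize(cited), normalize(funded)).ratio()
            if score > best_score:
                best_title, best_score = funded, score
        if best_title is not None and best_score >= threshold:
            matches.append((cited, best_title, best_score))
    return matches


# Hypothetical titles, for illustration only.
funded = ["Artemisinin combination therapy for uncomplicated malaria"]
cited = [
    "Artemisinin Combination Therapy for Uncomplicated Malaria.",
    "Household water treatment in rural settings",
]
results = match_citations(cited, funded)
```

A lower threshold catches more reformatted citations but also produces more false matches, which is one plausible source of the inaccurate results mentioned in the talk.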

All code is available on GitHub

Digital Science tools for funding data.

"Reach" is a potential connection of influence, not necessarily 'impact'


Even though 4 of 10 were wrong/inaccurate, they still found projects and outcomes that they funded that they didn't know about (lol)

Some papers were inaccurately claiming to be funded by Wellcome, when they weren't. (???)

Since this first round in February, they've done more rounds of iterations.

They want to add features: find subcategories, e.g. disease types or document types (white papers, conf talks, panel proceedings), expand to other funders, maybe make a usable front-end service?

August/September beta for a Wellcome Reach Tool as open and free service

Denny: curious about confirmed dead-ends or disputed using a positive/negative citation correlation

Practical ethics in prod dev

They also had an intentional ethical analysis as part of the data science team and product development practices, e.g. if they found negative outcomes

To do this

  • paired social researchers with user researcher embedded in tech team
  • anticipatory analysis: uses, edge cases, misuses (accidental breaking of the tool), abuses (intentional using the tool wrong), e.g. making certain labs publicly known/tied to certain sensitive research topics
  • using github to prioritize concerns

  • Jonas Hjort, Columbia University and NBER
  • Diana B. Moreira, University of California at Davis
  • Gautam Rao, Harvard University and NBER
  • Juan Franc. Santini, Innovations for Poverty Action

Do Research Findings Influence Policy? Experimental Evidence from 2,145 Brazilian Municipalities

Are policy makers paying attention to social science research - childhood development programs, housing vouchers, school choice, etc

Do policy makers value research?

Does it actually shift policy makers beliefs?

Do they act on the research?

Interested in information friction - barriers to adoption of scientific evidence in politics.

Ran 2 experiments in municipalities in Brazil:

  1. demand belief experiment: 900 muni gov officials (mayors, secretaries, council members) in 2017, 2018. asks how sophisticated research consumption is

  2. policy adoption experiment: field exp with 1818 new mayors in 2016. around a specific research content: remind citizens to pay taxes. then measured policy changes 18-24 mos later.

In Brazil, mayors have a lot of autonomy. They worked with a non-partisan conference organization for mayors and local legislators.

The experiment only had mayors interact with a tablet. They were randomly assigned studies and asked questions. Steps: elicit priors (beliefs about the policies), offer a randomly selected study out of 4 for purchase, elicit willingness to pay ("how much do you want to pay to get access to the findings"), then measure posterior beliefs.


findings: they on average pay 45 lottery tickets, and actually update their posterior beliefs based on the findings.

2nd experiment

Ran during a 2-day conference with lots of training sessions. Participating mayors were from municipalities with 5-10k inhabitants (54% of municipalities).


45-min info session about sending reminder letters to tax payers, how to implement it, effect on tax compliance, etc

So who showed up?

38% attended the sessions: younger and college-educated mayors are more likely to attend. Things that didn't matter: term limited, poverty, education of the constituents, etc. (figure)


found that they communicated the findings to staff, mayors and secretaries of finance or more educated

  1. There is demand for research findings (~40%) with some price (either money or time)
  2. Policymakers are indeed quite sophisticated about research consumption
  3. "translated-to-policy" research findings increased the tax-reminder policy by 10 percentage points


Daniel Goroff, Alfred P. Sloan Foundation

Sloan Fdn spiel - Sloan doesn't fund policy work or policy research ("policy research is the manifestation of confirmation bias").

Mentions how haphazard WH OSTP policy practices are: very opportunistic and somewhat unpredictable.

Sloan doesn't even get papers or outcomes from grantees; in practice it's hard to track down what happens to funding.

"Would any of this have happened without Wellcome" - difficult to prove the counterfactual. This kind of tool is looking at observational data.

Niche papers are cited most, not necessarily quality ones.

For ML implementation: wanted more info on training sets, cross validation, false negatives/positives.

Would this kind of tool actually change author citing behavior? "If funders/Wellcome is measuring this behavior, maybe we should change our current writing behavior..."

2nd paper

Methodological comments:

  • "looks like a good randomized trial"

  • Interested in giving them papers with null results and see what they do with that (lol)

  • peer effects (sitting in a room with lots of excited people)? spillovers (going home to tell people)? potential bias

  • Discussant: Kei Koizumi, American Association for the Advancement of Science

Spent time in WH OSTP

"Citation is just the tip of the iceberg" - citations don't weigh a lot in policymakers' practice; phone calls and face-to-face time happen too.

Executive orders don't have footnotes, so it's hard to find the sources behind them (e.g. all the literature). There isn't a strong link between policy and scientific support. They tried in the past to use administrative records to document this, but (for some reason) this practice hasn't stuck.

"Policymakers don't read scientific papers" (I dispute this among younger politicians)

"Science advice is a contact sport"

AAAS is actively trying to connect scientists to policymakers in the moment, on various issues. "We need evidence that policymakers will act on", seeing some hope that (lol) 40% of politicians respond to scientific evidence.

"Confirmation bias is a real thing" the 2nd paper did a good job of addressing beliefs. You've already decided you're going to pursue a path and looking for citations that confirm that path.


  • Chair: Chiara Franzoni, Politecnico di Milano

Risk is highest where market fails most often.

High-risk, high-reward science: currently there are diminishing industry investments in science (Arora, 2018)

It's gotten more critical for scientists to participate - the consequences of research failure for scientists are getting worse: soft money, publish-or-perish bibliometrics, spread of tenure track systems to other countries.

Risk has many economic definitions, speculative, preventative, psychological, bayesian/confidence: finance, insurance, behavioral economics, statistics.

When talking about risk, it's easy to misunderstand because of the many perspectives of risk. e.g. Policy experts see risk in a speculative sense, program officers (speculative + bias against risk), panelists, scientists, etc. This means that each of these actors develops different strategies based on their goals and priorities (speculating versus minimizing risk) ⇒ monopolization, diversification, minimize bias through procedure, etc.

  • NIH Approaches to Support High Impact High Risk Research; Nicole Garbarini (NIH)

4 different awards for high risk, high reward.

They don't require preliminary data.

They do have a third level of peer review.

Pioneer Award - single investigators for any career stage, transformative research goals, paradigm shift. First person-based award program.

New Innovator Award - within 10 years of last degree / clinical training

"Skip a postdoc award" - common in life sciences

Transformative R01 - multi investigator proposal with larger budget sizes.


R21 - short pilot project grants

R35 - person based awards, renew after 5 years for extension.

Small business/SBIR - funds tech transfer out of universities, derisk startups, incubators, etc

Research challenges - interdisciplinary goal-oriented funding, eg retinoid challenge

  • Pew, Kara Coleman (Pew Charitable Trusts)

Pew Scholars program in Biomedical Sciences - founded in 1985

22 scholars awarded each year for early-career biomed researchers, 700+ total

Eligibility: PhD or MD in biomedical sciences, first 3 years for assistant professors

$300k = $75k per year for 4 years

Minimal restrictions: 8% indirects, limit for use on PI salary is $12.5k per year or $50k total

Where are biases located?

Institutional Biases: Pew program is nomination based, 145 institutions can nominate (9% funding rate), 33 alumni can nominate (27% funding rate). They noticed sometimes institutions take turns among departments, which isn't ideal. Some also pool all their nominations (e.g. "James gets Pew, Sally gets Sloan", etc). Alumni rate is 200% better than institutional nominations, which means either 1) alumni are better pickers, or 2) Pew just follows their alumni network
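
As a quick check on the "200% better" figure, the comparison of the two funding rates can be worked through as below. This is a minimal sketch: the two rates come from the talk, while the helper name `relative_increase` is only illustrative.

```python
def relative_increase(rate_a: float, rate_b: float) -> float:
    """Percent by which rate_a exceeds rate_b."""
    return (rate_a - rate_b) / rate_b * 100


alumni_rate = 0.27         # funding rate for alumni nominations (from the talk)
institutional_rate = 0.09  # funding rate for institutional nominations

ratio = alumni_rate / institutional_rate
increase = relative_increase(alumni_rate, institutional_rate)
print(f"ratio = {ratio:.1f}x, increase = {increase:.0f}%")
```

So a 27% rate is three times a 9% rate, which is the same thing as being 200% higher.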

Applicant Biases: they just won't put forth a risky proposal

Overcoming biases

Institutional: webinars, creation of alumni nominations

Scientific Review Board: 6 year term on the board, 2 reviewers per application (option to discuss other apps), adding an option for reviewers to call on specific experts for opinion if needed. Final selection meeting is in person.

Fostering risk taking

Annual meeting for grantees; requirement of the grant contract. keynote speakers, informal mentorship, scientific advisory board attends

Evan Michelson, Sloan Foundation (standing in for Richard Wiener at Research Corporation for Science Advancement, flight cancelled)

Founded by Frederick Gardner Cottrell in 1912 (1 year before the Rockefeller Foundation)

Supported 20,000 scientists, including 41 Nobel laureates

2 main programs: SCIALOG, Cottrell Scholars programs, both focus on early career faculty.

Scialog (science & dialogue): annual workshops on interdisciplinary areas of research. Build networks of researchers across disciplines. Seed some 'blue sky' projects.

Scialog fellows: 1 year 'canvassing selection process' to talk to professors, deans, funders, etc to invite people in.

Past and current topics include: solar energy, time domain astrophysics, advanced energy storage, chemical machinery of the cell, molecules come to life.

After the meeting, next morning, they have a list of 25-30 collaborative projects which will be supported with $100-150k, and bring in other funders (Sloan, Moore, Simons Foundation).

"the rapidfire environment is not overedited", "remove caution" which might constrain ideas

They focus on diversity of participants: fellows, disciplines, institutions, geography

Before the meeting, they make people list who else they know. Intended to design 'collisions' before and during the meeting. Foster risky thinking within a bounded structure - provide a safe space to experiment with different kinds of grantmaking. 'It's very experiential'

They have a lot of data - all the proposals, attendees, outcomes, etc.

  • ERC, Theodore Papazoglou

ERC supports frontier research through bottom-up individual-based pan-european competition.

Lots of different grant classes, from Starting to Advanced (€2.5M for 5 years).

"High-gain/high-risk research" is part of their ERC Annual Work Programme, first paragraph of the whole text.

Stepped out, only caught the results

Found that there is funder panel selection away from novelty and risk.

Bias against PI with novel profile is definitely an issue, even when agency's mission is high risk/high reward

Inducing more novel research cannot be taken for granted.

Novelty is just one indicator for risk-taking

  • Discussant: Karim Lakhani, Harvard University and NBER

Science funding decisions are still understudied.

This study did both selection and treatment effects.

How do you hold funding agencies and selection panels accountable? "Selection panels have captured the agencies.", driving towards "selection hell".

Big implications for the rate and direction of science.

Showed a Nature article with ERC's amazing figures.


Focus on selection process and subsequent treatment

  • selection stage 1: CV + 5 page proposal
  • selection stage 2: CV + 15 page proposal + interviews (interviews are extremely bias inducing: gender, seniority, etc)
  • treatment: is novel/risky research being done?

Findings on selection and treatment-

  • Novel PIs suffer
  • second stage PI quality still matters
  • junior scholars get penalized for novel profiles
  • no treatment effects ("is this basically expected")

HMS internal funding (Boudreau et al 2016)


What is the problem with bias against novelty?

"maybe it's not a problem" ⇒ example about how novelty pushes boundary tests for needed areas, like an early map of California showing it as an island.

Basically just saying we need more empirics, theory, and policy collaboration

  • who submits? who should submit? what are the burdens of submission?
  • what is submitted?
  • evaluation: who gets to evaluate? what's the structure? selection approach... etc
  • research contracts: length, amounts, control & oversight over contracts, renewal
  • outcomes: what are the appropriate outcomes, students vs papers/citations/patents, and broader impact

"We take for a given that processes at NIH work", "Make science funding more scientific"


  • Chair: Benjamin Jones, Northwestern University and NBER
  • Data-methodology session: what is currently the state of the art on how to link science and technology? Best practices? Pitfalls? How to move forward?

Measurement via citations: how to measure the use of scientific knowledge? Linking it to patents is easy to do but introduces spillover.

Is science only beneficial when there is a marketable product?

  • alternative to marketplace innovation: policy innovations (economics papers don't turn into patents, they turn into policy), direct consumption (people enjoy scientific discoveries)
  • picture of pillars of heaven → science is beautiful

Trying to measure public interest in science with a 3D pyramid

  • government, news, and patents; based on fields
  • e.g. computer science is high on patents, low on news and government
  • e.g. economics high on government and news, low on patents
  • e.g. physics or biology, big on all three

Tried to do predictive analysis of patent:news:government based on the funding amount (NIH/NSF); very positively correlated and high R^2.

  • Felix Pöge, Max Planck Institute for Innovation and Competition
  • Dietmar Harhoff, Max Planck Institute for Innovation and Competition
  • Fabian Gaessler, Max Planck Institute for Innovation and Competition
  • Stefano Baruffaldi, Max Planck Institute for Innovation and Competition
  • Science Quality and the Value of Inventions

In Europe, lots of questions and pressure from citizens: what are we getting in return for this science funding? What are the results?

Initial answer: papers and people

"This is a very old debate" 🙃

Zoned out

"Breakthrough technologies are based on science" marconi's wireless telegraph, fertilizer, gene therapy, etc.

The overall contribution of science to technology is debated.

What is the value of science for patented inventions?

  • monetary value
  • novelty


Analyzed private market value and novelty of a patent.

combined with:

Distance-to-science measure (Ahmadpoor and Jones 2017): minimum citation distance between patents and articles
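
A minimum citation distance of this kind can be computed with a breadth-first search over the citation graph. The sketch below is a minimal illustration under my own assumptions, not the authors' code: a patent that cites a paper directly gets distance 1, a patent that cites a distance-1 patent gets distance 2, and so on. The function name, the input layout, and the toy graph are hypothetical.

```python
from collections import deque


def distance_to_science(patent_cites_papers, patent_cites_patents):
    """Minimum citation distance from each patent to the paper literature.

    patent_cites_papers: dict patent -> set of cited papers
    patent_cites_patents: dict patent -> set of cited patents
    Patents with no citation path to any paper are left out of the result.
    """
    # Reverse patent-to-patent edges: cited patent -> patents citing it.
    cited_by = {}
    for citing, cited_set in patent_cites_patents.items():
        for cited in cited_set:
            cited_by.setdefault(cited, set()).add(citing)

    # Multi-source breadth-first search starting from distance-1 patents.
    distance = {p: 1 for p, papers in patent_cites_papers.items() if papers}
    queue = deque(distance)
    while queue:
        current = queue.popleft()
        for citing in cited_by.get(current, ()):
            if citing not in distance:
                distance[citing] = distance[current] + 1
                queue.append(citing)
    return distance


# Hypothetical toy graph: P1 cites a paper, P2 cites P1, P3 cites P2,
# and P4 only cites a patent that never reaches a paper.
papers = {"P1": {"paper_a"}, "P2": set()}
patents = {"P2": {"P1"}, "P3": {"P2"}, "P4": {"P9"}}
dist = distance_to_science(papers, patents)
```

Because the search expands one citation layer at a time, the first distance assigned to a patent is also its minimum.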


  1. more science intensive patents have higher private market value
  2. more science intensive patents are more novel
  3. novelty predicts private market value of patents

1: concern: some technologies are close to science and more valuable

  • found average dollar value based on a distance to science (1 to 4). it quickly drops off.

extrapolations: science contributes $720 per capita per year

2: idea - can we use unusual word combinations as an indicator of novelty? e.g. "mouse trap", "mouse display", "mouse transgenic"

Developed a new measure of novelty - found all 2-word bigrams, count each pair in all patents, take average.
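
One way to read this measure is sketched below, with the caveat that the notes do not record the exact definition. The assumptions here are mine: bigrams are adjacent word pairs, each pair is counted by the share of patents it appears in, and a patent's score is the average of those shares, so a lower score means rarer word combinations. The function names and the mini-corpus are hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Set of adjacent lowercase word pairs in a text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def commonness_scores(patent_texts):
    """Average corpus frequency of each patent's bigrams.

    Lower scores mean the patent uses rarer word combinations, which this
    sketch reads as higher novelty.
    """
    per_patent = [bigrams(t) for t in patent_texts]
    doc_freq = Counter(pair for pairs in per_patent for pair in pairs)
    n = len(patent_texts)
    scores = []
    for pairs in per_patent:
        if not pairs:
            scores.append(None)  # too short to score
            continue
        scores.append(sum(doc_freq[p] / n for p in pairs) / len(pairs))
    return scores


# Hypothetical mini-corpus built from the examples in the talk.
corpus = ["mouse trap", "mouse trap", "mouse trap", "mouse transgenic"]
scores = commonness_scores(corpus)
```

In this toy corpus "mouse transgenic" appears in one patent out of four, so it scores lower (more novel) than "mouse trap".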

Finding: More science intensive patents are novel. Patents that are close to science, are more novel, using this measure.

3: novelty predicts market value

Very clear correlation using a binned graph of likelihood of word combinations vs. average dollar value: more novel patents are more valuable.

This study highlights that inventors stand on the shoulders of science. Basic science creates the applicative science. "Basic research is the pacemaker of technological science"


  • Bhaven N. Sampat, Columbia University and NBER

Patents are shaky: the behaviors aren't very explicitly described with the patent distance to science. E.g. bad actors will flood a patent office to drown out and cause workers to miss the important patent applications.

What exactly is a citation? We're not talking about the fact that citations could be general, theory, background, tools, methods, etc. Unbundling these things will be very useful for the field.

In Europe, they have categories for how citations are used. (! nice)

Using ML to do sentiment analysis on in-text citations in patents would help create a more directed graph.

  • Andrew Toole, U.S. Patent and Trademark Office

"on the policy side, NIH wants to know how much do we spend, where do we spend it?"

Science and markets are distinct things; economics concerns itself with markets, do science and economics connect?

Wanted to make the distinction to be accurate about USPTO terminology - a patent publish date vs granted date are different things

  • Matt Marx, Boston University

PhD students can't actually share citations and data sets; Thomson Reuters Web of Science is expensive (6 figures).

For Google Scholar you can't even buy the data.

Microsoft is taking 150M papers and giving it out through their academic

Open patent-paper linkage dataset

16.7MM patent to paper linkages since 1947, for frontpage citations

  • identifiers:

    • doi
    • pubmed ID
    • WoS

applicant examiner indicator



stepped out

  • Discussant: Danielle Li, Massachusetts Institute of Technology and NBER

Designed for impact, short 6 page paper.


main finding: 50-60% of biotech and pharma products rely on NIH funding, traced through patent citations to NIH-funded work

  1. Is NIH funding worth it?

  2. paper doesn't address failed NIH outcomes that don't yield products

  3. how much do private firms rely on public funding?

  4. would industry still be able to be productive without public funding?

Project Presentations

  • Adam B. Jaffe, Brandeis University and NBER; Bev Holmes, Michael Smith Foundation for Health Research

Michael Smith Foundation funds people!

Lots of programs, this is focusing on just 2 of them: postdoc program, young investigator program.

British Columbia govt funds it.

Don't have data to share yet because of international data privacy complexity.

  • Charles Ayoubi, EPFL; Fabiana Visentin, Maastricht University, UNU-MERIT; Michele Pezzoni, Université Nice; Sandra Barbosu, Alfred P. Sloan Foundation

Scholars teaming up with funding agencies

Off the record (preliminary results), but doing cool things about correlating content of research e.g. abstracts with funding likelihood.

  • Chiara Franzoni, Politecnico di Milano; Henry Sauermann, European School of Management and Technology

Pilot studies for funding science

Fathom fund

  • Valentina Tartari, Copenhagen Business School; Henrik Barslund Fosse, Novo Nordisk Foundation; Rikke Nording Christensen, Novo Nordisk Foundation

DAY 2: Jul 19, 2019

Leibniz Prize, German Research Foundation (DFG): most important research prize in Germany - 7 Nobel laureates, a few Fields Medals. "bestows honor for past achievement, but also provides funds for future experiments".

In 2007, a reform increased the funding amount by about €1M and the funding period by 2 years.

Diff-in-diff: comparison of Leibniz winners before 2007 and after 2007.
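
The basic difference-in-differences arithmetic looks like this. It is a minimal sketch with made-up numbers, not the paper's estimates or its regression specification: "treated" stands for post-2007 winners, "control" for earlier winners, and "before"/"after" for the periods before and after receiving the prize.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical mean publications per year, NOT the paper's numbers.
estimate = diff_in_diff(
    treated_before=10.0, treated_after=8.0,
    control_before=9.0, control_after=11.0,
)
print(estimate)  # -4.0
```

Subtracting the earlier winners' change removes whatever would have happened to output after a prize anyway, leaving the part associated with the larger, longer grant.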

Treated cohorts, with more and longer funding, publish less overall (a decrease of more than half) but more in top journals (more than double). Fewer but higher quality outputs.

Overview: cannot apply directly, must be nominated by a German institution. The grant came with "truly legendary freedom" - can't spend it on yourself, can't buy a vacation home, but you do have complete discretion over how to spend it. Rent a new lab, buy new equipment.

Post 2007: funding from 1.55M to 2.5M, mainly for inflation. Extended the funding period from 5 to 7 years due to complaints. Selection criteria and process remained the same.


This paper focused on 257 winners - 36 post 2007 - between 1986 and 2010. Age, gender, field from CVs, etc. Count of all types of publications, irrespective of outlet. Count journal pubs by journal quality by ranking journals in each year (based on impact factor). Count all pubs in top 1%, top 2%, etc.

Identifying assumption: earlier prize winners are good counterfactual for later prize winners

Prior to prize: Balancing table


Some things to note: More women now in academia than in the prior period. Recent sci's are more productive too (num of pubs per year).


Distributions are reasonably similar between two groups.

Main results:

Treatment group is definitely publishing less post leibniz prize!


Diff-in-diff where we change the dependent variable. See decrease for overall num of publications. Zooming in on the top 20% you see more publications in higher impact top 1% of journals. There is a pretty clear quantity-quality shift.

Concern: is this just a cohort effect?

  • peak career researchers might behave differently in 2007 than in 1986
  • Introduction of erc grants in 2007 one major event
  • to rule this out, look at other scientists in the same field and group who did not receive a leibniz prize

To do this, we use the German Wikipedia for text matching. Match on gender, year of birth, broad scientific field, academic scientist in Germany. Totalled 1,819 matched control scientists. This is good because haven't conditioned on publication outcomes. Is this a reasonable control group?
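
The matching step could look roughly like the sketch below. It assumes exact matching on the listed characteristics, which the notes do not confirm, and it leaves out the Wikipedia text extraction entirely. The record layout, the function name, and the sample people are hypothetical.

```python
from collections import defaultdict


def match_controls(winners, candidates, keys=("gender", "birth_year", "field")):
    """Return, for each winner name, the candidates sharing all `keys`."""
    pool = defaultdict(list)
    for person in candidates:
        pool[tuple(person[k] for k in keys)].append(person["name"])
    return {w["name"]: pool.get(tuple(w[k] for k in keys), []) for w in winners}


# Hypothetical records, for illustration only.
winners = [{"name": "W1", "gender": "F", "birth_year": 1960, "field": "physics"}]
candidates = [
    {"name": "C1", "gender": "F", "birth_year": 1960, "field": "physics"},
    {"name": "C2", "gender": "M", "birth_year": 1960, "field": "physics"},
    {"name": "C3", "gender": "F", "birth_year": 1960, "field": "physics"},
]
controls = match_controls(winners, candidates)
```

Note that nothing about publications enters the match, which is the point made above about not conditioning on outcomes.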

Controls from wiki are comparable to Leibniz recipients.


See two things:

  1. big difference in level of publications, overall trends are similar.

  2. placebo people are very close to Leibniz winners in their respective time periods.

Placebo test: no evidence of cohort effects


Triple diff with the placebo group taken into account.
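
A triple difference subtracts the same before/after, early/late comparison computed on the placebo scientists from the one computed on the prize winners. The sketch below uses made-up numbers and hypothetical names; it only illustrates the arithmetic, not the paper's regression.

```python
def did(early_before, early_after, late_before, late_after):
    """Late-cohort change minus early-cohort change."""
    return (late_after - late_before) - (early_after - early_before)


def triple_diff(winners, placebo):
    """Winners' diff-in-diff minus the placebo group's diff-in-diff."""
    return did(**winners) - did(**placebo)


# Hypothetical mean publications per year, NOT the paper's numbers.
winners = dict(early_before=9.0, early_after=11.0, late_before=10.0, late_after=8.0)
placebo = dict(early_before=4.0, early_after=5.0, late_before=5.0, late_after=6.0)
triple = triple_diff(winners, placebo)
print(triple)  # -4.0
```

If the placebo diff-in-diff is near zero, as in this toy case, the triple difference is close to the plain diff-in-diff, which is what "no evidence of cohort effects" would imply.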


Disentangling the effect of additional time and money. Later prize winners publish less overall, but more in top journals. But is it money, time, or both that matter? Looking at pairwise comparisons of funding amount and duration. So far, it looks like funding amount is more important than duration, but neither shows the same trend on its own. However, both together have a clear combined effect.

How to split budgets across fields: thorniest question in funding. Two most common methods:

  1. Fixed budget: set by field-by-field budgets ex ante, top down (e.g. NSF)
  2. Proportional: budget allocated by field applications, bottom up (e.g. ERC, NSERC in Canada, EU Marie Skłodowska-Curie postdoc fellowships)

What is the impact of proportional allocation on application incentives?

Concern: proportional allocation might favor certain fields. e.g. economics might be disfavored.

For example, proportional allocation equalizes success rates. The NIH payline is an automatically equalized budget across fields.
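
The claim that proportional allocation equalizes success rates can be checked with a small sketch. The assumptions are mine: success rate is measured as funded amount over requested amount, and the field names and numbers are hypothetical.

```python
def proportional_budgets(total_budget, requested_by_field):
    """Split the budget in proportion to what each field requested."""
    total_requested = sum(requested_by_field.values())
    return {
        field: total_budget * requested / total_requested
        for field, requested in requested_by_field.items()
    }


def success_rates(budgets, requested_by_field):
    """Funded amount divided by requested amount, per field."""
    return {field: budgets[field] / requested_by_field[field] for field in budgets}


# Hypothetical requests and a fixed split for comparison.
requested = {"LS": 400.0, "PE": 500.0, "SSH": 100.0}
proportional = success_rates(proportional_budgets(100.0, requested), requested)
fixed = success_rates({"LS": 40.0, "PE": 40.0, "SSH": 20.0}, requested)
```

Under the proportional rule every field ends up at the same rate (total budget over total requested), while the fixed split gives different rates whenever requests do not line up with the preset shares. The flip side is that a field can raise its budget simply by submitting more applications.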

Developing a model: too much math

Proportional funding system: rewards noise and allocation orthogonal to social benefit.

  • Discussant: Timothy Simcoe, Boston University and NBER

The proportionality problem

Nice piece of applied theory. Intuition: partial equilibrium with perfect information. If the process has perfect information, everyone in a room applies for a grant, and you know the top 2 people will win, then no one else will apply and it very quickly unravels to the equilibrium. If you add more noise to the process and say instead that the selector rolls a die, then there will be a payoff point where you encourage everyone to apply.

When you introduce more noise, you generate more applications.
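
The unraveling intuition can be illustrated with a toy simulation. This is a sketch under my own assumptions, not the paper's model: reviewers see quality plus Gaussian noise and fund the top scores, and an applicant drops out when the expected prize falls below the application cost. All names and parameter values are hypothetical.

```python
import random


def win_probabilities(qualities, n_winners, noise_sd, trials=2000, seed=0):
    """Monte Carlo estimate of each applicant's chance of being funded."""
    rng = random.Random(seed)
    wins = [0] * len(qualities)
    for _ in range(trials):
        scores = [q + noise_sd * rng.gauss(0, 1) for q in qualities]
        ranked = sorted(range(len(qualities)), key=lambda i: scores[i], reverse=True)
        for i in ranked[:n_winners]:
            wins[i] += 1
    return [w / trials for w in wins]


def equilibrium_applicants(qualities, n_winners, noise_sd, prize=10.0, cost=1.0):
    """Drop applicants whose expected prize is below the application cost,
    and repeat until nobody else wants to drop out."""
    active = list(range(len(qualities)))
    while True:
        probs = win_probabilities([qualities[i] for i in active], n_winners, noise_sd)
        staying = [i for i, p in zip(active, probs) if p * prize >= cost]
        if len(staying) == len(active):
            return len(active)
        active = staying


qualities = list(range(10))  # ten applicants of increasing quality
no_noise = equilibrium_applicants(qualities, n_winners=2, noise_sd=0.0)
high_noise = equilibrium_applicants(qualities, n_winners=2, noise_sd=100.0)
```

With no noise only the two best applicants remain; with heavy noise everyone has roughly a 2-in-10 chance, which exceeds the cost-to-prize ratio here, so all ten keep applying.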

Benefits of collective ignorance.

This process doesn't screen quality, it's about consensus. Economists might not be better at identifying quality - philosophers might just be better at distributing ex post.

The empirics of the paper could be better.

Flipping it: why don't we see more unraveling? Sub-fields, grantor's explicit objectives (change the dimensions that you are evaluating on), own-type knowledge / overconfidence ⇒ dynamics where we learn our own type (do enough grant applications where you learn where you are / what your type is).

Truly legendary freedom

Leibniz prize case study: total pubs decline, top pubs increase, marginal change in "direction".

Diff-in-diff in both comparisons is kind of neat methodologically.

Average treatment effect and generalizability (e.g. age).

What we want to know: are they really hitting more home runs? What's happening with the papers? Is the stuff they're doing at the top 1% of journals really important? Could weight the journals with their impact factors to reveal the answer.

  • Discussant: Reinhilde Veugelers, KU Leuven

"If I was a funding agency, how would I use these papers?"

Truly legendary freedom

Have to be careful about using this because this is a prize, not a grant: selection criteria are not on project for the future but from past work only. Awards to peak career researchers.

Understanding the case better:

  1. Check the stated aim of the prize for Leibniz - "aims to improve working conditions..."
  2. who selects the review panel? what criteria do they need to use to select? (a breakthrough, risk taking, interdisciplinary, etc)

Check explicitly the selection process.

What about impact on research: do they change their research post prize?

Resource allocation across fields

She was on the ERC council where they discussed the proportionality funding rule!

Task force at ERC in 2007: three domains (life sciences, physical sciences, humanities) on a fixed ratio

In 2014, ERC switched to demand-led funding rules. Wanted equal opportunity and equal value for all fields. Every applicant should have the same success rate irrespective of the field. Don't weigh one field more than the other. All fields should have applicants with same inclination to apply → always have a measure of the true demand.

Every panel has the same quality: need for regular ex post assessment of the quality of the selection panels.

Budget $ allocation changes in ERC

2014 → 2018:

LS 39% → 31%

PE 44% → 46%

SSH 17% → 23%

Number of applications grew a lot in SSH, most of the growth there.

In 2018, a task force was set up to monitor and evaluate the allocation rule's effectiveness. There were worries that certain fields get too little. Conclusions: do not change the budget allocation rule yet, but continue to monitor its effects.

Signal dispersion: unravelling. The model identifies noise in the evaluation system and differences across panels in the quality distribution of researchers.

ERC evidence on the link between noise and panel's budget share acquired. Some evidence within ERC points to this.

How to avoid unravelling?

  1. think carefully about panel construction. as similar as possible in quality of evaluation, distribution of quality of research, incentives to apply. also within panels, avoid heterogeneity. too diverse, quality of evaluation may go down
  2. guard the quality of the selection to avoid noise (differences)
  3. problem is lower when there is a high incentive to apply, then reduce cost of applying to increase benefit of applying (e.g. less time consuming, budgetary supplements)
  4. adjust the budget allocation rule - but requires measures for the critical parameters: should be able to empirically measure it.


Chair: Henry Sauermann, European School of Management and Technology

  • Julian Kolev, SMU Cox School of Business
  • Yuly Fuentes-Medel, Massachusetts Institute of Technology
  • Fiona Murray, Massachusetts Institute of Technology and NBER

Is Blinded Review Enough? How Gendered Outcomes Arise Even Under Anonymous Evaluation

Are women able to access resources/funding despite bias?

Lots of evidence

  • Symphony auditions (Goldin and Rouse 2000)
  • Holding content fixed, change applicant identity: CVs (McIntyre 1980), Venture pitches (Brooks 2014)

Blinded Review: Holy Grail for how to do it? Blind code review, blind job interviews, etc. However, if men and women communicate differently, then maybe not enough. (NYTimes 2017, Hengel 2017)


  1. Is blinded review of ideas sufficient? If not, what other mechanisms are contributing. To what extent are gender differences driving gender disparities in selection?
  2. What impact on scientific outcomes would eliminating the gender gap in selection have?

Gates GCE program provided funding.


2-page proposals, 4-6 topics available each round, twice a year. 6,794 proposals, (33% female). 5,058 applicants (34% female). 132 reviewers (20 female, 16%)

Total; 21,453 reviewer proposal pairs covering 19 rounds of GCE programs. Each 2 page proposal is independently reviewed by 3-4 experts. Each expert reviews 100 proposals, 1 week to review, 5-10 mins per proposal. (caveat: fairly certain it was anonymous)

Reviewer decisions: 1% gold rating and 5% silver rating per round. Fdn aggregates the proposals, 7% of them win awards ($100k grant).

Results under blind review: Female: 30% silver ratings, 25% gold ratings. Silver ratings 11% gender gap, gold 33% gender gap. Anonymity does not remove gender gap! 15-20% persists. Female reviewers do lead to selection of more women in a blind-review process.

Mechanism 1: Choice of topics

Are women applying to topics where they'll receive lower scores, less funding? Maybe - slight relationship, looks like low R-squared.

Mechanism 2: Ex-ante differences

Female applicants were less experienced and had fewer publications in the 3 years leading to applying, but controlling for this difference does not explain the disparity.

Mechanism 3: Word choice

Men and women do communicate differently in this context - looked at narrow/broad words based on top-100 most frequent words, and based on presence across topic areas. Broad: 'medical', narrow: 'genotyping'. Reviewers are also sensitive to word choice: they score proposals higher when they use words the reviewer favors and lower when they use words the reviewer disfavors.
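A minimal sketch of how a broad/narrow split like this could be computed, assuming "broad" means a frequent word that shows up in most topic areas and "narrow" means one concentrated in a few. The function name `classify_words`, the 50% cutoff, and the toy proposals are all my own assumptions for illustration, not the authors' method or data.

```python
from collections import Counter, defaultdict


def classify_words(proposals, top_n=100, broad_share=0.5):
    """Label the top_n most frequent words as 'broad' or 'narrow'.

    proposals: list of (topic, text) pairs.
    A word is 'broad' if it appears in at least `broad_share` of all
    topics (hypothetical cutoff), otherwise 'narrow'.
    """
    counts = Counter()
    topics_by_word = defaultdict(set)
    all_topics = set()

    for topic, text in proposals:
        all_topics.add(topic)
        for word in text.lower().split():
            counts[word] += 1
            topics_by_word[word].add(topic)

    labels = {}
    for word, _ in counts.most_common(top_n):
        share = len(topics_by_word[word]) / len(all_topics)
        labels[word] = "broad" if share >= broad_share else "narrow"
    return labels


# Toy corpus, invented for illustration only.
toy = [
    ("malaria", "medical screening with genotyping"),
    ("nutrition", "medical intervention for children"),
    ("sanitation", "medical outreach and toilets"),
]
labels = classify_words(toy)
print(labels["medical"], labels["genotyping"])  # broad narrow
```

A real analysis would also need stop-word removal and stemming, which this sketch leaves out.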

Once we control for word choice (broad and narrow words) the gender gap disappears. Controlling additionally for reviewers and for reapplying after rejection (women reapply less often), the word choice explanation is still dominant.

Empirical design: ex-post outcomes

Interesting - women are encouraged to write at a lower text grade level, but this ends up negatively correlated with outcomes (publishing, grants, etc).


  1. hire more female reviewers

  2. ask for narrower and more technical descriptions from applicants

  3. train reviewers

  • Discussant: Donna Ginther, University of Kansas and NBER

This paper examines the gender differences in Gates Fdn GCE life science funding and finds that female applicants receive worse scores and are less likely to receive funding, a gap attributed to word choice.

Women who do receive funding have better publications and are more likely to get NIH funding

Paper concludes: comm style offers valuable info, reviewers focus on wrong metrics

"Each field is a separate market": women are doing better in life sciences than in other fields like CS and economics.

Ceci 2014, overall found no differences in grant funding. "lots of evidence that there's no difference".

Gates vs. NIH review process

  • Ginther 2016 found women are 1.3 ppt more likely to receive R01 awards, women new investigators are 2.7 ppt more likely to receive an R01.
  • Heggeness 2018 similar findings
  • Pohlhaus 2011, women are less likely to get Type 2 R01 (continuation)
  • Ley and Hamilton 2008, women applied less for grants than men

Why is there a major difference between NIH and Gates?

N: not blinded, G: double-blinded

N: narrow reviewers, G: diverse reviewers

N: consensus-based decisions for top competitive grants, G: champion-based review

N: biosketch lists contributions, G: biosketch n/a

N: 10-12 proposals per panel, G: 100 proposals is a lot

Mounting evidence that rules provide better outcomes. Rules over discretion.

Other misc points:

  • Applicant pool also is less female.
  • Self-citations may be throwing off the blindness procedure. Women are less likely to self-cite.
  • Rookie female applicants may be driving the results
  • First author cites still matter a lot, women disadvantaged there
  • Word choice:

    • controlling word choice across length of careers between men and women
    • evidence that men are overconfident in reporting

Main concern: Gates review process is shit

Misc Thoughts

  • high-risk, high-reward consensus as good
  • "impossible to know the counterfactual"
  • economists are obsessed with oldest foundation, running meme for the day
  • everyone's hawking their own shit: 5 people referenced their own papers
  • for the most part, everyone is still at the stage of describing mechanisms but not understanding the behaviors underneath and why those behaviors persist.
  • people feel weird saying direct consumption is a goal to optimize/design for (bill nye drew the biggest laugh)
  • common themes: OCR, big data analysis (900k-2M datapoints), "text analysis is the train and it's going", ML, data cleaning, "distance to science"
  • common problems: citations that cite themselves/author, their particular ML parsing model isn't that accurate
  • citations and patents dominate, traditional bibliometrics still the only real thing to examine
  • public services for data sharing: aggregate, centralize, and serve still dominates as the mental model
  • no one was angry or rude, everyone is very friendly, collegial, and nice 🙂

Martin Watzinger, University of Munich Monika Schnitzer, University of Munich

Standing on the Shoulders of Science
