All information in the original files was entered using the Latin alphabet, so we did not have to take different writing systems into account.

In theory, any biographical variable can be added to WhoGov. Depending on the scope, this will require a lot of manual work. Please contact the creators of WhoGov if you are interested in adding further biographical variables, and we will share our experiences.

Yes! There are several ways of doing so.

First, we provide ISO alpha-3 codes and calendar years so that the dataset can be merged with other country-year datasets. For harmonizing the country codes with other datasets, we recommend the R packages “countrycode” and “democracyData”. Our data is (mostly) from July, which may not coincide with the reference date of other datasets. When merging other datasets with WhoGov, users are therefore encouraged to consult any documentation that details, for example, the start and end dates of political regimes, since the value of the merged variable may otherwise be attributed to the wrong cabinet.

Second, we use the abbreviations from the PartyFacts database (Döring and Regel 2019), which allows researchers to merge WhoGov with that database.

Third, we provide the leader start date and end date based on Archigos (Goemans et al. 2009). Thus, the data can be connected with this database.

Fourth, we provide government start and end dates based on ParlGov (2019) and Bértoa (2020), enabling researchers to merge WhoGov with these datasets.

There may also be other ways of merging the data with other datasets.
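The first option above, a country-year merge, can be sketched as follows. This is only an illustration in plain Python: the column names (`iso3`, `year`, `minister`, `regime`) and the example rows are assumptions made for the sketch, not the actual WhoGov variable names or data. The `matched` flag makes it easy to see which WhoGov rows found no partner in the other dataset.

```python
# Hypothetical rows: column names and values are illustrative assumptions,
# not the actual WhoGov variables.
whogov_rows = [
    {"iso3": "DNK", "year": 2000, "minister": "A"},
    {"iso3": "DNK", "year": 2001, "minister": "B"},
    {"iso3": "NOR", "year": 2000, "minister": "C"},
]
other_rows = [
    {"iso3": "DNK", "year": 2000, "regime": "democracy"},
    {"iso3": "NOR", "year": 2000, "regime": "democracy"},
]


def merge_country_year(left, right, keys=("iso3", "year")):
    """Left-join `right` onto `left` using the given key columns."""
    # Index the right-hand dataset by its (country, year) key.
    index = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        combined = {**match, **row}        # left-hand values win on conflicts
        combined["matched"] = bool(match)  # False when no partner row exists
        merged.append(combined)
    return merged


merged = merge_country_year(whogov_rows, other_rows)
unmatched = [(r["iso3"], r["year"]) for r in merged if not r["matched"]]
print(unmatched)
```

Checking the unmatched keys after every merge is a cheap way to catch country codes that were not harmonized. It does not catch the timing problem described above, which still requires consulting the documentation of the other dataset.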

The classifications (such as junior minister and full rank minister) and portfolios are coded on a country-by-country basis. However, the prestige variables are general and do not vary across countries.

No! It is a generic variable. If researchers want, for example, to differentiate between full rank and junior ministers in terms of prestige, they need to add this themselves.

To code the gender variable, we wrote a script that printed the list of people recorded for every country. We added the gender to these lists and then merged them back into the original datasets. We coded gender primarily from each person’s first name, using a script that matched the first names in our dataset against the "World Gender Name Dictionary" (Raffo and Lax-Martinez 2018). This method classified the majority of the names. However, some names did not exist in the dictionary or were gender neutral. In these cases, we looked up the person and used biographical information to classify the name. Some countries use names not covered by the dictionary. In these cases, we got help from people who were familiar with the language and the country. These countries are Albania, Bangladesh, Cambodia, China, Cyprus, Ethiopia, Fiji, Finland, Greece, Hungary, India, Indonesia, Japan, Laos, Nepal, Mauritius, Myanmar, Mongolia, South Korea, Taiwan, Thailand and Vietnam. In addition, we looked up all ministers classified as female to make sure that no minister was mistakenly coded as female. As a result, we might slightly underestimate the number of female ministers.
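The name-matching step described above can be sketched roughly as follows. This is not our original script: the tiny `NAME_GENDER` table, its codes ("F", "M", "?") and the function name are assumptions made for illustration, and the real World Gender Name Dictionary is far larger and laid out differently. The sketch also assumes that the first name comes first in the string, which does not hold for every country.

```python
# Hypothetical mini-dictionary standing in for the World Gender Name
# Dictionary. "?" marks a gender-neutral name (an assumed code).
NAME_GENDER = {"anna": "F", "john": "M", "kim": "?"}


def classify_first_name(full_name, name_gender=NAME_GENDER):
    """Return "F" or "M", or None when the name needs a manual lookup."""
    parts = full_name.strip().split()
    if not parts:
        return None
    # Assumption: the first token of the string is the first name.
    code = name_gender.get(parts[0].lower())
    return code if code in ("F", "M") else None


people = ["Anna Jensen", "John Smith", "Kim Lee", "Zoltan Kovacs"]
coded = {person: classify_first_name(person) for person in people}

# Names that are missing from the dictionary or gender neutral are set
# aside for biographical lookup instead of being guessed.
needs_manual_review = [p for p, g in coded.items() if g is None]
print(needs_manual_review)
```

Returning `None` instead of a best guess keeps the automatic and the manual coding clearly separated.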

We refer to Appendix F in the online appendix for a thorough answer to this question.

The earliest available version dates back to 1966, and the directory has been updated at least half-yearly ever since. The versions dating back to 2001 are freely available on the CIA's website. The versions before 2001 were either downloaded from the HathiTrust Digital Library or obtained through Freedom of Information (FOI) requests to the CIA.

As with all datasets, and despite our best efforts, mistakes cannot be ruled out with absolute certainty, and readers are encouraged to contact the authors if they find any. We discuss the sources of error and bias in depth in Appendix E of the online appendix.

We are currently applying for funding to start updating the data, and plan to begin collecting the data in 2021. Our main priority is to update the data through 2020. In addition, we are looking at extending the data further back in time (hopefully back to WWI), making the data half-yearly and adding extra background variables. However, since we are still applying for funding, we cannot guarantee this.

We use July because data was most widely available for this month. There are two exceptions. The directory began in September 1966, so we use September for 1966. Furthermore, it has not been possible to obtain the files for July 1970, so we use January for 1970.
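When aligning WhoGov with dated events from other sources, the month rule above can be written down as a small lookup. The sketch below is only an illustration: the function names are hypothetical, and using the first day of the month as the reference date is an assumption made for convenience, not a property of the data.

```python
import datetime

# Years in which the source month differs from July, as described above.
EXCEPTION_MONTHS = {1966: 9, 1970: 1}
FIRST_YEAR = 1966


def reference_month(year):
    """Return the month (1-12) the data for `year` is taken from."""
    if year < FIRST_YEAR:
        raise ValueError("the directory starts in 1966")
    return EXCEPTION_MONTHS.get(year, 7)


def reference_date(year):
    """Approximate reference date; day 1 is an assumption of this sketch."""
    return datetime.date(year, reference_month(year), 1)


print(reference_date(1966), reference_date(1970), reference_date(1999))
```

Comparing such a reference date with, for example, the start date of a regime or government helps decide which cabinet a merged value belongs to.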