GreyFish Data


These data sets include fields for year, number of authors, and full citation, plus a link to the report where available. Each set comes with a CSV file for importing into R and a TXT file with background information (e.g. source, download date, search parameters).

Authorship Mapping

We are in the process of extracting first author information from the citation text strings and identifying unique authors. This step is complicated by inconsistencies in formatting and the inevitable typos (e.g. WE Ricker, W Ricker, B Ricker, BE Ricker, and WE Rikker likely all refer to the same author, but J Smith could be several people). So far, this is only an issue for a few dozen entries per data set, and we are reviewing them on a case-by-case basis.
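
As a rough illustration of how candidate duplicates could be surfaced for this kind of case-by-case review, the Python sketch below compares surnames with a simple string-similarity ratio. It is not the GreyFish workflow: it assumes first-author names have already been pulled out of the citation strings in an "initials surname" form, and the function names (normalize, candidate_pairs) and the 0.8 threshold are hypothetical choices made for this example.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Split a name such as 'WE Ricker' or 'W.E. Ricker' into (initials, surname).

    Assumes the surname is the last token; this is an assumption of the
    sketch, not a rule taken from the GreyFish data.
    """
    tokens = re.sub(r"[.,]", " ", name).split()
    if not tokens:
        return "", ""
    return "".join(tokens[:-1]).upper(), tokens[-1].lower()


def candidate_pairs(names, threshold=0.8):
    """Return pairs of distinct name strings whose surnames look alike.

    The pairs are only suggestions for manual review; a high similarity
    does not prove that two strings refer to the same person.
    """
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        similarity = SequenceMatcher(None, normalize(a)[1], normalize(b)[1]).ratio()
        if similarity >= threshold:
            pairs.append((a, b, round(similarity, 2)))
    return pairs


# Name variants taken from the example in the text above.
names = ["WE Ricker", "W Ricker", "B Ricker", "BE Ricker", "WE Rikker", "J Smith"]
for a, b, similarity in candidate_pairs(names):
    print(a, "|", b, "|", similarity)
```

With these names, every Ricker/Rikker variant is paired with every other, while J Smith is paired with none of them. The sketch deliberately ignores initials, so it cannot tell whether two identical "J Smith" strings are one person or several; that question needs additional evidence beyond the name itself.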

As GreyFish grows, we plan to extract co-author names and explore more formal methods for authorship disambiguation (e.g. likelihood based on publication year, keywords, and co-authors).
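
To make the idea concrete, here is a minimal Python sketch that combines publication year, keywords, and co-authors into a single score for whether two records share an author. It uses a simple weighted heuristic rather than a fitted likelihood model, and everything in it is an assumption made for illustration: the Record fields, the weights (0.2, 0.3, 0.5), the 40-year window, and the example records, which are invented and not drawn from any GreyFish data set.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical record layout; the field names are not GreyFish's."""
    author: str
    year: int
    keywords: set = field(default_factory=set)
    coauthors: set = field(default_factory=set)


def jaccard(a, b):
    """Share of items two sets have in common; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_author_score(r1, r2, max_gap=40):
    """Heuristic score in [0, 1]; higher suggests the same person.

    The weights are illustrative guesses, not estimated from data.
    """
    year_closeness = max(0.0, 1 - abs(r1.year - r2.year) / max_gap)
    return (0.2 * year_closeness
            + 0.3 * jaccard(r1.keywords, r2.keywords)
            + 0.5 * jaccard(r1.coauthors, r2.coauthors))


# Invented records sharing the ambiguous name "J Smith".
a = Record("J Smith", 1985, {"salmon", "escapement"}, {"A Lee"})
b = Record("J Smith", 1988, {"salmon", "stock assessment"}, {"A Lee", "R Chan"})
c = Record("J Smith", 2015, {"aquaculture"}, {"M Patel"})

print(same_author_score(a, b))  # close in time, shared keyword and co-author
print(same_author_score(a, c))  # 30 years apart, nothing else in common
```

In this toy example the first pair scores far higher than the second, which is the behaviour a real method would need. A more formal approach would estimate the weights, or a full probability model, from records whose authorship is already known.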

The use of unique author IDs, such as ORCID, is growing, but none of the fisheries agencies covered so far have implemented formal authorship tracking.