Global, regional, and cryptic population structure in a high gene-flow transatlantic fish

Lumpfish (Cyclopterus lumpus) is a transatlantic marine fish displaying large population sizes and a high potential for dispersal and gene-flow. These features are expected to result in weak population structure. Here, we investigated population genetic structure of lumpfish throughout its natural distribution in the North Atlantic using two approaches: I) 4393 genome wide SNPs and 95 individuals from 10 locations, and II) 139 discriminatory SNPs and 1,669 individuals from 40 locations. Both approaches identified extensive population genetic structuring with a major split between the East and West Atlantic and a distinct Baltic Sea population, as well as further differentiation of lumpfish from the English Channel, Iceland, and Greenland. The targeted discriminatory loci displayed ∼2-5 times higher divergence than the genome wide approach, revealing further evidence of local population substructures. Lumpfish from Isfjorden in Svalbard were highly distinct but resembled most fish from Greenland. The Kattegat area in the Baltic transition zone, formed a previously undescribed distinct genetic group. Also, further subdivision was detected within North America, Iceland, West Greenland, Barents Sea, and Norway. Although lumpfish have considerable potential for dispersal and gene-flow, the observed high levels of population structuring throughout the Atlantic suggests that this species displays natal homing and potentially local populations with adaptive differences. This fine-scale population structure calls for consideration when defining management units for lumpfish fishing stocks and in decisions related to sourcing and moving lumpfish for farming and cleaner fish use.


1
Abstract 20 Lumpfish (Cyclopterus lumpus) is a transatlantic marine fish displaying large population 21 sizes and a high potential for dispersal and gene-flow. These features are expected to result in 22 weak population structure. Here, we investigated population genetic structure of lumpfish 23 throughout its natural distribution in the North Atlantic using two approaches: I) 4393 24 genome wide SNPs and 95 individuals from 10 locations, and II) 139 discriminatory SNPs 25 and 1,669 individuals from 40 locations. Both approaches identified extensive population 26 genetic structuring with a major split between the East and West Atlantic and a distinct Baltic 27 Sea population, as well as further differentiation of lumpfish from the English Channel, 28 Iceland, and Greenland. The targeted discriminatory loci displayed ~2-5 times higher 29 divergence than the genome wide approach, revealing further evidence of local population 30 substructures. Lumpfish from Isfjorden in Svalbard were highly distinct but resembled most 31 fish from Greenland. The Kattegat area in the Baltic transition zone, formed a previously 32 undescribed distinct genetic group. Also, further subdivision was detected within North 33 America, Iceland, West Greenland, Barents Sea, and Norway. Although lumpfish have 34 considerable potential for dispersal and gene-flow, the observed high levels of population 35 structuring throughout the Atlantic suggests that this species displays natal homing and 36 potentially local populations with adaptive differences. This fine-scale population structure 37 calls for consideration when defining management units for lumpfish fishing stocks and in 38 decisions related to sourcing and moving lumpfish for farming and cleaner fish use. 39 Introduction 40 Understanding population demographics and population genetic structure is important for 41 effective management and sustainable exploitation of wild species. Globally, it has been 42 estimated that 34% of the assessed fish stocks are currently overfished, and another 60% fully 43 utilized (FAO, 2018). Stock assessments have traditionally been based on changes in 44 abundance over time, but to be accurate, they need to account also for population genetic 45 structure and connectivity patterns among populations, and their underlying drivers 46 (Allendorf et al., 2008;Lowe and Allendorf, 2010;Lowe et al., 2017). Therefore, and where 47 available, knowledge of population genetic structure should be included in management 48 regimes (Waples et al., 2008;Reiss et al., 2009;Pinsky and Palumbi, 2014). This applies 49 especially to marine species that display wide distribution ranges and large census population 50 sizes, factors that can lead to weak population structure (Waples and Gaggiotti, 2006). 51 Genetic studies on marine fish have shown that, I) the effective population size reflecting 52 population resilience may be only a fraction of the census size (Palstra and Ruzzante, 2008; 53 Portnoy et al., 2008), II) the rate of effective migration might be lower than anticipated also 54 in species with pelagic eggs and larvae (Spies et  Adult lumpfish are solitary and inhabit the upper 50m of the water column (Davenport, 1985) 85 in offshore waters. Lumpfish are generally found in low densities and spread over a large 86 geographical area (Nøttestad et al., 2020). When the adults are ready to spawn, they migrate 87 to shallow coastal waters. During this migration, they will make extensive vertical 88 movements through the water column, and show a greater association with the sea bed 89 (Kennedy et al., 2016). Males arrive at the coast earlier than females in order to seek out and 90 establish a territory/nesting site. When females arrive, they deposit their eggs in the male´s 91 nests, where the male fertilizes the eggs and the females leave quickly (Mitamura et al., 92 2012). Males guard the eggs that they fertilized until they hatch. Following hatching, the 93 larvae attach to substrates including seaweed and floating seaweed clumps (Ingólfsson,  94 2000). Juveniles remain in shallow water areas for approximately 6-12 months before 95 gradually making their way to the feeding grounds offshore. In this study, we aim to improve our understanding of connectivity and population genetic 119 structure of lumpfish across the entire north Atlantic, with a focus on the eastern Atlantic and 120 the Norwegian coast where lumpfish are increasingly being used as cleaner fish, increasing 121 the risk of human mediated gene-flow. In order to achieve this, we first identified a set of 122 genome-wide SNP markers using 2b-RAD sequencing (Davey and Blaxter, 2010). 123 Thereafter, we selected a set of putatively discriminatory SNPs (hereon referred to as the 124 targeted SNPs) to resolve regional population structure, to be used in a geographically much 125 more comprehensive study. We genotyped a high number of individuals (N=1,669), collected 126 over the entire species' distribution ( Figure 1) including previously unexplored northernmost 127 areas in the north-eastern Atlantic. Finally, both datasets were analyzed jointly with several 128 environmental variables to identify patterns and drivers of local adaptation. 129 130 3 Material and methods 131

Sampling 132
In total, 1,669 lumpfish collected in the period 2008-2020 from 40 locations were included in 133 this study ( Figure 1; Table 1). For lumpfish collected over a larger area, the average latitude  134 and longitude was used (Figure 1; Supplementary Figure 2a

Population genetic analysis 199
The genome-wide, sequence-derived dataset consisting of 95 fish from 10 locations and 200 4,393 SNPs, is here after referred as the "genome-wide dataset". The genotype data from the 201 final 139 SNP panel is referred to as the "targeted dataset". All analyses were conducted with 202 both datasets unless otherwise stated. Besides analysis of samples, several analyses were 203 carried out using larger pooled groups based on geographic regions (as given in Table 1). 204 When applicable, grouping was made differently based on results received from the upstream 205 analyses. 206

Genetic variation and its division 207
Expected and observed heterozygosity, allelic richness, and inbreeding coefficient (FIS) were 208 estimated with the diveRsity package ( run after 20 000 repeats were discarded as burn-in. K was set from 1 to 10 or 12 (depending if 236 whole or partial dataset was used, see below), and the number of iterations was set to 5. To 237 determine the optimal K, bar plots were inspected visually and runs analyzed with the 238 StructureSelector software (Li and Liu, 2018 2,000 repeats. 245 As higher levels of structure can effectively mask lower levels of structure (Puechmaille,246 2016) a hierarchical approach was employed. The above-mentioned clustering analyses were 247 therefore performed first for the whole dataset followed by separate analysis for sampling 248 sites that were geographically close and showed similar admixture profiles. For each of the 249 investigated regions, the most informative loci behind the observed structure were determined 250 with help of PCA loadings. The threshold for being informative was set to 0.1. 251

Population graphs 252
Genetic structure for the targeted SNP data was also analyzed with Population Graphs, a 253 graph-theoretic approach that uses conditional genetic distance (Dyer and Nason, 2004) as 254 response variable, which is primarily created by both gene-flow and shared ancestry, and has 255 been shown to capture underlying demographic processes more accurately than methods 256 based on pairwise estimates of genetic structure or various genetic distance metrics (Dyer et  257 al., 2010). The identification of genetic covariance structure among populations is 258 independent of evolutionary assumptions aiming to minimize Hardy-Weinberg and linkage 259 disequilibrium within populations (Dyer and Nason, 2004). Population Graphs were 260 constructed using the R packages popgraph and gstudio (Dyer, 2015a; b) and topological 261 analyses were performed using the igraph package (Csardi and Nepusz, 2006 genetic divergence among these three groups were consistently higher, about two to five-fold, 333 for the targeted dataset (Table 2) Table 1. 337 Besides these three major branches, the targeted SNP dataset also showed separation between 338 North America (USA and Canada) and West Greenland/Svalbard-Isfjorden. Also, Kattegat in 339 the Baltic transition zone, the English Channel, Iceland and three fjords in southwestern 340 Norway showed separation from their neighboring samples (Figure 2 Skagerrak, and open Arctic waters (Svalbard and Barents Sea) formed one group with 353 relatively low genetic divergence among samples (Figure 2), referred to as the East Atlantic 354 Group below. 355 13    The results from STRUCTURE were consistent with those from the PCA/DAPC approaches 383 but provided additional insights. In the genome-wide dataset, the division between western 384 and eastern Atlantic was clear, but large parts of genetic information were also shared among 385 all individuals (Figure 3). In the East Atlantic, all lumpfish shared at least ~ 50% of their 386 genetic information. K=7 was deemed as the best fit for the data (Supplementary Table 5), as 387 this was the highest value creating visually discrete clusters, corresponding to six genetic 388 clusters: North America, Greenland, Iceland, English Channel, Baltic Sea, and finally 389 Norway and the Outer Hebrides which clustered together. 390  structure runs for different regions were a set of samples, which due to their genetic and geographic closeness were analyzed together. Each pie 397 chart shows one samples assignment to the selected number of clusters (K) averaged across individuals. Note that some of the samples are 398 included in multiple assignments but that Isfjorden and the English Channel samples were not included in any regional structure analysis due to 399 their uniqueness. 400 The targeted SNP dataset revealed nine major clusters (K=9), showing an additional cluster in 401 Kattegat, the Norwegian southwestern fjords and an east-west divergence across Iceland. 402 Hierarchical clustering analysis of West Atlantic samples showed clear separation between 403 US and Canada, and between northwest and southwest Greenland (Figure 4; Supplementary 404 Table 5). In the sample from USA-Maine, a possible migrant from Canada was observed 405 together with a possible first-generation hybrid (also clearly visible in DAPC plot; S. File 1). 406 The Baltic Sea and the North Sea transition zone, Kattegat and Skagerrak, separated into 407 three clusters (Figure 4; Supplementary

Outlier locus detection and association with environment 441
Based on the correlation analysis among the 14 environmental factors, five independent 442 synthetic variables (Productivity, Oxygen_MinTemp, Velocity, Temp_max_mean, and 443 Salinity_Temp_range; Supplementary significantly associated (p < 0.001) with the observed patterns of genetic structure. Together 457 they explained 55.5% of the total variation ( Figure 6). Partial analysis removing the effect of 458 geography still showed significant albeit much weaker association (R 2 = 0.086, p < 0.001). 459 Outliers were defined as being at least 2.5 standards deviations away from the mean SNP 460 loadings along the significant axes. Only a single SNP (Lump-165)  We found that the global population subdivision of lumpfish based on the genome-wide data 500 alone, would best be represented by six groups: North America, Western Greenland, Iceland, 501 English Channel, Baltic Sea, and Norway together with the Outer Hebrides. While, using the 502 targeted dataset and applying a hierarchical approach, we identified additional structuring 503 with a total of nine major genetic groups globally: North America, Greenland together with 504 Isfjorden in Svalbard, eastern and western Iceland split in two, English Channel, Baltic Sea, 505 Kattegat in the North Sea transition zone, fjords in southwestern Norway and finally, an East 506 Atlantic group covering a larger geographic area and including samples from Norway, Arctic 507 open waters, Skagerrak, North Sea and Outer Hebrides. 508

Regional population structures 509
We detected different degrees of sub-structuring within the large regional groups. Some 510 finer-scale population patterns for lumpfish have already been described in previous studies 511 where isolation-by-distance was detected in West Greenland (Garcia-Mayoral et al., 2016), 512 and also suggested in Norway where Averøy in mid-Norway was found to be different from 513 other Norwegian samples (Whittaker et al., 2018). Here we found further structuring within 514 regions that has not been previously described. When we examined the genetic results jointly 515 with geographic location and information on life history stages (Table 1), we found that 516 samples with only breeding and juvenile lumpfish formed regional and clear-cut groups, 517 whereas migrating fish, mid-sea samples and samples known to contain both juveniles and 518 adults were more admixed. Genetic divergence of regional breeding populations is thus in 519 line with observations of natal homing in lumpfish (Kennedy et al., 2015;Kennedy and 520 Ólafsson, 2019). 521

West Atlantic 522
In the western Atlantic, we found significant genetic differentiation among samples collected 523 from Greenland, Canada and USA, as well as a north-south division along Greenland's west-524 coast Furthermore, the targeted SNP panel produced distinct clustering patterns for the groups 526 (Figure 4; Supplementary Table 5), suggesting good regional resolution, and could likely be 527 used to e.g., detect migrants. Indeed, two individuals collected in Maine, USA, had high 528 admixture proportions (~50 and ~80%) and resembled the Canadian sample, suggesting that 529 they could be migrants or hybrids between the two populations. 530

Isfjorden 531
Despite being geographically closer to several samples from the East Atlantic, the sample 532 from Isfjorden in Svalbard clustered closely with samples from western Greenland. As this 533 sample was highly divergent from off-shore Svalbard samples, and we did not detect any 534 individuals with similar genetic make-up in any nearby samples, it is possible that this fjord 535 population represents a regionally divergent gene pool, perhaps specially adapted to its Arctic 536 environment (Skogseth et al., 2020). Svalbard is surrounded by strong Arctic currents that go 537 north and west towards the Arctic Ocean and the Greenland Sea (Supplementary Figure 6). 538 However, the closest sample in Greenland in this study was over 3000 kilometers away 539 (along the shortest ice-free waterway). It is not known how far north lumpfish could currently 540 reach but historical Arctic connectivity could explain the observed pattern. Either Isfjorden is 541 an isolated relic of past gene-flow or a manifestation of still on-going connectivity. Strong 542 cold polar currents have also been proposed as the barrier separating the West and East 543 Atlantic ( which showed significant divergence and no likely migrants across these borders. 546

Iceland 547
We found weak but statistically significant east-west subdivision across the Westfjords in 548 Iceland. Breiðafjörður appeared most distinct, whereas fish from Sandgerði and Bolungarvik 549 were more admixed (Figure 4). This pattern was driven by only three outlier loci 550 (Supplementary Table 2c), all of them in chromosome 13, and thus possibly suggesting a 551 structural variant or a genomic region for local selection. A region in this chromosome, close 552 to the outliers reported here, was recently discovered to be highly associated with sex 553 (Holborn et al., 2021), indicating that the observed structure could be related to differences 554 between males and females. 555 556

English Channel 557
In the English Channel, lumpfish were genetically differentiated from all other samples 558 (Table 2). This pattern was also observed using microsatellites in a previous study by 559 Whittaker et al. (2018), which used the same individuals as in this study. As suggested 560 therein, we agree that a plausible mechanism behind the southernmost population 561 differentiation is warmer water (Supplementary Table 1a) causing separation in spawning 562 time, and thus temporal isolation restricting gene-flow. 563 with temperature range, as well as minimum temperature with dissolved oxygen level, were 572 significant factors explaining the genetic population structure of lumpfish. This was true even 573 when the geographic variation was accounted for (Supplementary Figure 5), indicative of 574 local adaptation in Baltic lumpfish. Nonetheless, one individual of Baltic Sea genetic origin 575 was found in the Skagerrak off-shore sample, despite the large difference in salinity between 576 the two areas, suggesting that adult Baltic Sea lumpfish can survive in a much more saline 577 environment than they inhabit. 578

Baltic Sea and the transition zone Skagerrak and Kattegat
The salinity transition zone between the North Sea and Baltic Sea showed surprisingly high 579 levels of structuring. While the Skagerrak area, adjacent to the North Sea, was clearly 580 clustered with the larger East Atlantic group (see below), the Kattegat formed its own genetic 581 cluster. Individuals from Orust in the northernmost parts of Kattegat showed some degree of 582 assignment to the Skagerrak cluster. However, all other samples in Kattegat, including 583 samples next to the entrance of the Baltic Sea, were highly differentiated from both the Baltic 584 Sea and Skagerrak (Figure 4) plausible explanation for the observed two-cluster model in the Arctic, as well as for 607 mainland Norway (see below) for several reasons: First, groupings were far from equal in 608 size nor were they randomly distributed but constantly found within certain samples. 609 Moreover, in the samples with several genetic groups, inbreeding coefficients were 610 consistently higher than elsewhere (Supplementary Table 3 these samples are more connected than suggested by their spatial distance (Supplementary  616  Table 6), our data supported some internal subdivision within the group (Supplementary  617  Table 5; see results for hierarchical clustering for "Similar East Atlantic sites"). As 618 STRUCTURE estimates both ancestral gene frequencies and admixture proportions for each 619 individual in relation to other samples, interpretation of weakly differing admixture 620 proportions can easily be misleading, and over-interpretation should be avoided (Lawson et  621 al., 2018). 622

Norway 623
Along the coast of Norway, we discovered two main groups: fish from large fjords in 624 southwestern Norway; Sognefjord, Hardangerfjord and Boknafjord (Supplementary Table 5),  625 and fish which clustered with the large East Atlantic group described above (Alta, Flatanger  626 and Flekkefjord). The southwestern fjords had two highly differentiated genetic clusters, 627 where all individuals clearly assigned to one genetic group or another. We had detailed catch 628 information of all fish from one of the fjords, Boknafjord (data not shown). Examination of 629 the genetic clusters did not reveal any relationship with the status of the fish (juvenile vs 630 adult) nor sex but when we compared the genetic groups with the geographic location of each 631 fish, spatial division was evident. Within this large fjord, ~100 km in length and covering an 632 area of 1579 km 2 , fish that were caught closer to the open sea clustered with the large East 633 Atlantic group, whereas the group found mainly deeper inside the fjord formed another, 634 distinct regional group (Supplementary Figure 7). This is likely the same genetic group as 635 reported by Whittaker et al. (2018) who also found a distinct genetic group from this same 636 region. 637 It is unclear what the regionally diverged coastal group in the southwestern fjords represents, 638 but there are some possibilities: First, these fish could represent local lumpfish ecotypes. 639 Fjord ecotypes have been described for other fish species such as Atlantic cod (Knutsen et al.,640 2018a), and are likely upheld by innate differences in feeding and movement ecologies 641 (Kristensen et al., 2021 is also possible that this diverged group originates from another colonization event than the 645 East Atlantic group. It has been shown for another marine fish, Corkwing wrasse (Symphodus 646 melops) inhabiting the same region that re-establishment followed by genetic drift due to 647 some isolation mechanism such as assortative mating or difference in the time of spawning is 648 enough to maintain highly distinct gene pools (Mattingsdal et al., 2020). high numbers in several adjacent fjords simultaneously is unlikely, given that broodstock are 656 largely sourced from wild populations and thus unlikely to be this inbred. 657

Two datasets -pitfalls and advantages 658
Targeting outlier loci that often reflect divergent selection processes can offer a powerful way 659 to study species with large population sizes, high dispersal capabilities and frequent gene- (see Table 1) and in the SNP selection phase were determined based on their location and 669 previous knowledge of genetic population structure for this species. This selection process 670 has a few potential pitfalls: First, because genetic diversity is unequally distributed across 671 populations, the populations in which SNPs are discovered may contribute to ascertainment 672 bias. Second, as RADseq is a non-selective process in relation to which genomic regions are 673 to be included, the final genomic representation is random. Thus, population processes 674 mediated by neutral genetic markers, such as demographic history, can reliably be studied as 675 they usually affect the entire genome. However, selection-related processes can be missed, 676 since selection often affects relatively small and targeted parts of genome, many of which 677 may not be included in random RADseqs (Lowry et al., 2017). 678 It was apparent that the targeted dataset performed equally well, or even better than the 679 genome-wide dataset at separating lumpfish populations regionally. This is despite the fact 680 that only a small portion (3-17 SNPs; Supplementary Table 2c) of the selected 139 SNPs  681 were informative within each region. The targeted dataset displayed ~2 to 5 times higher 682 level of divergence (Table 2), and greater clarity than the genome-wide dataset (Figure 3). In 683 addition to the previously described population structure, our analyses revealed further 684 genetic structure. Furthermore, our analyses revealed evidence of likely migrants and hybrids 685 on both sides of the Atlantic. 686

Conclusions and management implications 687
Lumpfish occupy vast and highly variable marine environments, and has high migration 688 ability. However, based on its trophic level and long estimated population doubling time, 689 lumpfish is expected to have low resilience to fishing pressure (Powell et al., 2018a). The 690 impact of past and present fisheries on populations of lumpfish is largely unknown, and likely 691 to be highly varying among regions and countries due to differences in management 692 (Kennedy et  Blakeslee et al., 2020)). Local adaptation together with the observed strong, clear regional 716 population subdivision likely upheld by natal fidelity emphasize the need to manage the 717 species on a regional level despite its wide transatlantic distribution. Considering on-going 718 climate change, for cold-adapted species like lumpfish, the standing genetic variation 719 especially in the southernmost latitudes of the species range might prove to be an invaluable 720 resource in the future. The results from this study improves upon existing knowledge of 721 lumpfish populations, which should be taken in consideration for the future use and 722 translocation of lumpfish to be used as cleaner fish. This study also provides a baseline for 723 future monitoring of wild populations. The datasets analyzed in this study will be made publicly available upon acceptance to a 756 peer-reviewed journal. 757 Dyer, R.J., and Nason, J.D. (2004). Population Graphs: the graph theoretic shape of genetic structure.