The discernible and hidden effects of clonality on the genotypic and genetic states of populations: improving our estimation of clonal rates

Partial clonality is widespread across the tree of life, but most population genetics models are conceived for exclusively clonal or sexual organisms. This gap hampers our understanding of the influence of clonality on evolutionary trajectories and the interpretation of population genetics data. We performed forward simulations of diploid populations under increasing rates of clonality ($c$), analysed their relationships with genotypic and genetic indices, and tested predictions of $c$ from population genetics data through supervised machine learning. Two complementary behaviours emerged from the probability distribution of genotypic and genetic indices under increasing $c$. While the impact of $c$ on genotypic descriptors ($R$ and $\beta$) was easily described by simple mathematical equations, it was noticeable only at the highest levels ($c>0.95$) on genetic indices ($F_{IS}$ and Linkage Disequilibrium). Consequently, genotypic richness allowed reliable estimates of $c$, while genetic descriptors led to poorer performances when $c<0.95$. These results provide clear baseline expectations for genotypic and genetic diversities and dynamics under partial clonality. Worryingly, however, the use of realistic sample sizes to acquire empirical data systematically led to gross underestimates (often of one to two orders of magnitude) of $c$, calling for a reappraisal of many interpretations hitherto proposed in the literature and mostly based on genotypic richness. We propose future avenues through which to derive realistic confidence intervals for $c$ and show that although still approximate, a supervised learning method would greatly improve the estimation of $c$ from population genetics data.


Introduction
Clonality (also referred to as asexuality) occurs across the entire tree of life (Avise & Nicholson, 2008; Schön, Van Dijk, & Martens, 2009; Tibayrenc, Avise, & Ayala, 2015). Most, if not all, clonal eukaryote species combine it with some sexuality at the population scale over a handful of generations. This mode of reproduction, called partial clonality (PC), is particularly relevant for understanding ecosystems and the evolution of life and for ensuring human development, as it encompasses, for example, invasive and pathogenic species and is of some importance in challenging environments or at the leading edges of distributions (Barrett, 2016; Barrett, 2015; Tibayrenc & Ayala, 2012; Yu, Roiloa, & Alpert, 2016). Despite its prevalence, the consequences of PC for the evolution of species and the ecological dynamics of their natural populations have been subject to little in-depth theoretical or empirical development (Yonezawa, Ishii, & Nagamine, 2004). This lack of formal development leaves a substantial number of studies on partially clonal species inconsistent or confused when analysing population genetics data and interpreting them in terms of demographic and evolutionary dynamics (Avise, 2015; Fehrer, 2010; Yu et al., 2016). Nevertheless, the influences of PC are likely to be extremely important at all spatial and temporal scales. For example, evolutionarily speaking, the ability of a given genotype to persist across generations adds a new target for natural selection, namely, the genotype (Ayala, 1998).
There are three gaps related to PC: diagnosing it in species where its occurrence is not obvious based on classical naturalistic observations (e.g., human pathogens in contrast to rhizomatic clonal plants); quantifying its extent once a given species is determined to be partially clonal; and understanding its influence on the ecological and evolutionary trajectories of partial clonals by investigating their population genetics. These gaps have only been partly filled during the past 30 years. The use of molecular markers in a population genetics framework paved the way towards easier detection of PC (De Meeûs, Lehmann, & Balloux, 2006; Halkett, Simon, & Balloux, 2005; Tibayrenc, Kjellberg, & Ayala, 1990) through the discrimination of clonal lineages and detailed analysis of the genotypic and genetic compositions of species suspected of having PC (Arnaud-Haond et al., 2005; Bailleul, Stoeckel, & Arnaud-Haond, 2015; Tibayrenc et al., 1990). However, the conditions allowing (or not) the detection of PC and its consequences for the trajectories of natural populations over different time scales are likely important yet still poorly understood (Avise, 2015; Dia et al., 2014; Fehrer, 2010; Yu et al., 2016). We also still struggle to infer the rate of clonal multiplication (denoted c) versus sexual recombination (1 − c), or an approximate but consistent proxy for it (i.e., the "level of clonality"), which deprives us of the empirical information necessary to compare the ecological dynamics and evolutionary trajectories of partially clonal populations living in different environments (McMahon et al., 2017). To understand the effect of PC on the fate of natural populations and species, its extent c must first be estimated.
Estimates of the rate of clonality in natural populations of clonal plants have sometimes been obtained at local scales through extremely time-consuming and tedious mark-recapture studies on rhizomatic clonal plants (Eckert, 2002; Marbà & Duarte, 1998). However, using this method at large spatial scales and for most species exhibiting PC through fragmentation or multiplication at microscopic stages is unrealistic. Therefore, indirect reconstruction through population genetics is the only solution for the vast majority of species. Unfortunately, although population genetics studies can shed light on the occurrence of PC in nature, no method has been developed thus far to reliably infer (or at least estimate) this potentially crucial parameter in natural populations using indices gathered through a classical one-time step sampling strategy. Two recently developed methods allow the rate of clonality to be quantitatively inferred in populations genotyped at two time steps, but they require sampling the population twice, at least one generation apart, and, more importantly, a comprehensive knowledge of major life history traits, such as generation time, which is seldom available except for well-known macroscopic species for which extensive field data have been collected (Ali et al., 2016; Becheler et al., 2017).
Most empirical studies thus infer the importance of clonal reproduction in populations using a one-time step sampling strategy to compute the ratio of the number of genotypes ($G$) to the number of sampling units ($N$), $G/N$, as an estimate of genotypic (i.e., clonal) richness, often implicitly assumed to bear a linear relationship with the rate of sexual reproduction $1 - c$ and to be comparable among natural populations subjected to the same sampling strategy. Theoretical studies have shown a strong influence of only high clonality rates (c>0.95) on parameters such as FIS and linkage disequilibrium (LD) (Balloux, Lehmann, & de Meeûs, 2003; De Meeûs et al., 2006; Navascués, Stoeckel, & Mariette, 2010) but no noticeable departure from expectations under purely sexual reproduction at lower rates. However, more recent mathematical developments have shown that the distribution of FIS is wider at high clonality rates but is actually affected at all clonality rates (Stoeckel & Masson, 2014), depending on the strength of departure from equilibrium (Reichel, Masson, Malrieu, Arnaud-Haond, & Stoeckel, 2016).
This research led to a present-day paradox in the literature on PC. Many populations exhibit average or elevated genotypic diversity, leading several authors to conclude a large incidence of sexual reproduction, whereas in the same studies, consistent departure from Hardy-Weinberg equilibrium (HWE), when reported (which is much rarer), would instead have led them to conclude a negligible occurrence of sexual recombination versus clonal reproduction (e.g., Orantes, Zhang, Mian, & Michel, 2012;Villate, Esmenjaud, Van Helden, Stoeckel, & Plantard, 2010 , 2015). These two studies demonstrated this worrying effect by using two empirical datasets (of seagrasses and corals) where the true rates of clonality were unknown; assessing the order of magnitude of these rates thus requires further investigation.

A roadmap to fill the gaps
Given the current state of knowledge, unravelling the genotypic and genetic compositions of populations is both the target and the proxy of population genetics studies aiming to understand the influence of clonal reproduction on the dynamics and evolution of natural populations.
Reconciling both families of parameters in a robust theoretical framework is thus required to shed light on the concomitant changes in their respective estimators depending on the rate of clonality.
Here, we propose a simulation-based exploratory approach to both enhance our understanding of the consequences of clonality and improve our ability to reliably assess its rate within natural populations. We aim to provide the first exploration of the effect of increasing c on the genotypic and genetic compositions of populations to provide baseline expectations of the composition of natural populations depending on the extent of clonality. We used comprehensive forward individual-based simulations to obtain the theoretical distribution of genotypic (genotypic richness and size distribution of lineages) and genetic (departure from HWE and LD) parameters describing the population composition at increasing rates of clonality from 0 to 1. We explored the temporal evolution of these populations along trajectories towards equilibrium and under various levels of drift (population sizes spanning three orders of magnitude). To move from insights about the expected effects of PC on natural populations towards more reproducible and formalised arguments, we assessed the signature of PC in the genotypic and genetic index distributions using a classical and robust Bayesian supervised learning method. This method allowed the selection of descriptors that were more clearly affected to in turn develop sound estimates of the extent of clonality. Finally, we tested the robustness of the method in examining the influence of sample size on the accuracy of estimates and proposed further improvements based on realistic sample sizes.

Material and methods
PC is empirically known to affect genotypic and genetic descriptors commonly used in population genetics studies (Halkett et al., 2005): 1) the number of different genotypes per population, as characterised by the genotypic indices R and $\beta$ (Arnaud-Haond et al., 2007), and two genetic indices, namely, 2) the inbreeding coefficient FIS and its moments (Balloux et al., 2003; Stoeckel & Masson, 2014) and 3) the LD index $\bar{r}_d$ (Navascués et al., 2010).
To date, no analytical formalisation has been developed to predict the theoretical probability distributions of these descriptors under varying rates of clonality. We thus used simulations to (i) synthesize the effects of varying rates of clonality on the ranges and dynamics of these genotypic and genetic descriptors, (ii) assess whether these descriptors actually provide the ability to discriminate and quantify rates of clonality using a classic supervised learning method, and (iii) determine which descriptors best account for specific ranges of rates of clonality with the aim of providing recommendations for future analyses and interpretations.

Simulations
Theoretical results were obtained using forward individual-based simulations run over $10^4$ nonoverlapping generations. In the initial generations, alleles at all loci were randomly drawn from a uniform distribution (i.e., maximum genetic diversity merged at random within individuals).
In these simulations, all diploid individuals lived in finite populations of constant size. In each generation, each population produced the next generation using clonal or panmictic sexual reproduction following a fixed rate of clonality. During clonal reproduction, new independent individuals were produced as full genetic copies of their only parent, with somatic mutations occurring at a fixed rate of $10^{-6}$ mutations per generation per locus. During panmictic sexual reproduction, new independent individuals descended from two parents chosen at random within the previous generation, from which the individuals inherited half their genomes, mutated at a rate of $10^{-3}$ mutations per generation per locus. Genomes were coded as 100 independent loci.
Alleles mutated following a K-allele mutation (KAM) model (Putman & Carbone, 2014;Weir & Cockerham, 1984), which has the advantage of well simulating the behaviour of both microsatellites and single nucleotide polymorphisms (SNPs) and best approximates the "disturbing factor of gene frequencies" (in the sense of Wright 1931) in finite-sized populations.
Mutating alleles during both clonal and sexual reproduction were drawn at random from the respective pools of clonal and sexual offspring. During simulations, the clonality rate, genetic drift and mutation rate were applied homogeneously across generations and loci.
Each scenario was run 100 times and characterised by a set of parameters (N, c). When subsampling populations, we performed 10 independent resamplings of each generation and sample size, resulting in 1000 independent data points per single set of parameters for each sample size.
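The reproduction scheme described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the simulator used for the results: the function names, the number of allelic states (`n_alleles=4`), the short default run length and the application of the mutation rate to each allele copy are all hypothetical choices, and the two sexual parents are drawn with replacement (so occasional selfing is allowed).

```python
import random

def next_generation(pop, c, n_alleles, mu_clonal=1e-6, mu_sexual=1e-3, rng=random):
    """Return the next non-overlapping generation, keeping population size constant.

    pop is a list of individuals; an individual is a list of loci and each
    locus is a tuple holding the two allele labels of a diploid genotype.
    """
    def mutate(allele, mu):
        # K-allele model: a mutating allele jumps to any other allelic state.
        # Assumption: the rate is applied to each allele copy (needs n_alleles >= 2).
        if rng.random() < mu:
            return rng.choice([a for a in range(n_alleles) if a != allele])
        return allele

    offspring = []
    for _ in range(len(pop)):
        if rng.random() < c:
            # Clonal event: full copy of a single parent, somatic mutation only.
            parent = rng.choice(pop)
            child = [(mutate(a, mu_clonal), mutate(b, mu_clonal)) for a, b in parent]
        else:
            # Panmictic sexual event: one random allele per locus from each parent;
            # loci segregate independently.
            mother, father = rng.choice(pop), rng.choice(pop)
            child = [(mutate(rng.choice(m), mu_sexual), mutate(rng.choice(f), mu_sexual))
                     for m, f in zip(mother, father)]
        offspring.append(child)
    return offspring

def simulate(n_ind, c, n_loci=100, n_alleles=4, generations=50, seed=0):
    """Run a short forward simulation from a maximally diverse random start."""
    rng = random.Random(seed)
    pop = [[(rng.randrange(n_alleles), rng.randrange(n_alleles)) for _ in range(n_loci)]
           for _ in range(n_ind)]
    for _ in range(generations):
        pop = next_generation(pop, c, n_alleles, rng=rng)
    return pop
```

A call such as `simulate(200, 0.9, generations=100)` returns the final population as nested lists; reaching the stationary values discussed in the text would require far more generations and larger populations than this toy default.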

Genotypic and genetic descriptors
To account for the genotypic composition and genetic state of populations, we computed two indices describing the number and distribution of genotypes (genotypic richness $R$ and slope of the size distribution of lineages $\beta$; Arnaud-Haond et al., 2007) and two genetic descriptors referring to intra-individual genetic variation (as the first four moments of the inbreeding coefficient distribution; Stoeckel and Masson, 2014) and LD (as the summarised unbiased multilocus LD $\bar{r}_d$; Agapow & Burt, 2001).

Genotypic richness
The R index of clonal diversity (Dorken & Eckert, 2001) was defined as follows:
$$R = \frac{G - 1}{N - 1}$$
where G is the number of distinct genotypes (genets) and N is the number of genotyped individuals.

Size distribution of lineages
The parameter $\beta$ describes the slope of the power-law inverse cumulative distribution of the size of lineages (Arnaud-Haond et al., 2007):
$$N_{\geq X} = a X^{-\beta}$$
where $N_{\geq X}$ is the number of sampled ramets belonging to genets containing X or more ramets in the sample of the population studied, and the parameters a and β are fitted by regression analysis.
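As an illustration of the two genotypic descriptors, the sketch below computes R and the Pareto β from a list of multilocus genotype labels. It assumes the usual definition R = (G − 1)/(N − 1) and a power law N≥X = a·X^(−β); fitting β by ordinary least squares on log-transformed values is an assumption made here for simplicity, and the function names are hypothetical.

```python
import math
from collections import Counter

def genotypic_richness(genotypes):
    """R = (G - 1) / (N - 1) for a list of hashable multilocus genotypes."""
    n = len(genotypes)
    g = len(set(genotypes))
    return (g - 1) / (n - 1)

def pareto_beta(genotypes):
    """Slope of log10(N>=X) against log10(X), returned as a positive beta."""
    sizes = Counter(genotypes).values()      # number of ramets per genet
    xs = sorted(set(sizes))                  # observed lineage sizes X
    # N>=X: ramets belonging to genets that contain X or more ramets.
    ys = [sum(s for s in sizes if s >= x) for x in xs]
    if len(xs) < 2:
        return float("nan")                  # a slope needs two size classes
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return -slope

sample = ["a", "a", "a", "b", "b", "c", "d"]
print(genotypic_richness(sample), pareto_beta(sample))
```

When every sampled unit carries a distinct genotype, R equals 1 and β is undefined (returned as NaN here), which is the situation in which these descriptors carry no information on clonality.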

Genetic variance apportionment
The inbreeding coefficient FIS (Wright, 1921, 1969) accounts for intra-individual genetic variation as a departure from Hardy-Weinberg assumptions of the genotyped populations. We summarised its distribution by its first four moments (mean, variance, skewness and kurtosis).
Linkage disequilibrium was studied using $\bar{r}_d$ (Agapow & Burt, 2001). This mean correlation coefficient (r) of genetic distance (d) between unordered alleles at loci ranges from 0 to 1. This metric has the advantage of limiting the dependency of the correlation coefficient on the number of alleles and loci and is well suited to studies of partially clonal populations:
$$\bar{r}_d = \frac{V_D - \sum_j var_j}{2 \sum_j \sum_{k>j} \sqrt{var_j \, var_k}}, \quad V_D = \frac{\sum D^2 - \left(\sum D\right)^2 / \nu}{\nu}, \quad var_j = \frac{\sum d_j^2 - \left(\sum d_j\right)^2 / \nu}{\nu}$$
where $D$ is the number of loci at which two individuals, namely, $a$ and $b$, differ (genetic distance between two individuals over all their loci), $d_j$ is the number of different alleles between two individuals at locus $j$ (for diploids, $d_j$ can be 0, 1 or 2), and ν is the number of unique possible pairs of individuals $a$ and $b$ where $a \neq b$ within a population; the sums in $V_D$ and $var_j$ run over these ν pairs.
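The two genetic descriptors can be illustrated with the following sketch. It assumes the standard Agapow and Burt (2001) form of the index, i.e. the excess of the variance of pairwise multilocus distances over the sum of per-locus variances, scaled by twice the sum of the square roots of the products of per-locus variances, and a simple per-locus FIS = 1 − Ho/He without small-sample correction. Both estimators and all function names are assumptions for illustration and are not necessarily those used to produce the results.

```python
import math
from collections import Counter
from itertools import combinations

def allele_distance(g1, g2):
    """Number of different alleles (0, 1 or 2) between two diploid genotypes."""
    shared = sum((Counter(g1) & Counter(g2)).values())  # multiset intersection
    return 2 - shared

def rbar_d(pop):
    """Multilocus linkage disequilibrium index over all pairs of individuals."""
    pairs = list(combinations(pop, 2))
    nu = len(pairs)
    n_loci = len(pop[0])
    # d[j][p] is the distance at locus j for pair p.
    d = [[allele_distance(a[j], b[j]) for a, b in pairs] for j in range(n_loci)]

    def var(values):
        return (sum(x * x for x in values) - sum(values) ** 2 / nu) / nu

    var_j = [var(dj) for dj in d]
    big_d = [sum(dj[p] for dj in d) for p in range(nu)]   # distance over all loci
    denom = 2 * sum(math.sqrt(var_j[j] * var_j[k])
                    for j, k in combinations(range(n_loci), 2))
    return (var(big_d) - sum(var_j)) / denom if denom > 0 else float("nan")

def fis(pop, j):
    """Simple FIS = 1 - Ho/He at locus j (no small-sample correction)."""
    n = len(pop)
    h_obs = sum(1 for ind in pop if ind[j][0] != ind[j][1]) / n
    counts = Counter(a for ind in pop for a in ind[j])
    h_exp = 1 - sum((k / (2 * n)) ** 2 for k in counts.values())
    return 1 - h_obs / h_exp if h_exp > 0 else float("nan")
```

Individuals are represented as lists of loci, each locus being a tuple of two allele labels. Computing `fis` for every locus and taking the mean, variance, skewness and kurtosis of the resulting values would give the four moments referred to in the text.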

Genotypic descriptors as empirical functions of the rate of clonality
To assess the relation between c and the genotypic descriptors, we explored the mean results of simulations as a function of c using known shapes of curves. To assess the accuracy of our empirically inferred formula to describe the relationships, we computed the mean absolute error
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{s,i} - y_{f,i} \right|$$
where n is the number of pseudo-observed simulations per scenario, $y_s$ is the computed value of the considered genotypic descriptor obtained by simulation, and $y_f$ is the computed value of the considered genotypic descriptor obtained by calculation.

Identifiable signals in genotypic and genetic descriptors, inferences and machine learning
Our second objective was to test for the ability of genotypic and genetic descriptors to estimate specific rates of clonality. These descriptors were commonly used in previous studies to roughly assess the importance of clonality in determining population reproductive modes, but no theoretical development has demonstrated the existence of identifiable signals allowing such descriptors to be used as key parameters with which to estimate rates of clonality. To assess the existence of identifiable signals in these descriptors and demonstrate their potential usefulness in inferring rates of clonality from one episode of genotyping, we used the results obtained from simulations to train a Bayesian supervised learning classifier. We used simulation results to compute the approximated nonparametric probability distributions of genotypic and genetic descriptors. Provided that dependencies between the seven genotypic and genetic descriptors are evenly distributed or cancel each other out or that their distributions sufficiently segregate over their means per class, we can approximate the joint probability model using the conditional independence between features (Hand & Yu, 2001; Webb, Boughton, & Wang, 2005; Zhang, 2004). The posterior probability of the i-th class, given that the seven measured features are known, can be expressed, up to a normalising constant, as the product of the seven likelihoods of each feature weighted by the class prior probability.
From this joint posterior probability, we identified the maximum a posteriori (MAP) to discern the class ('rate of clonality' and 'population size' pair) most likely to explain the measured features.
We assumed a uniform prior distribution, i.e., an equal prior probability of 1/12 for each class, to place the algorithm in an initial state of complete ignorance of the likely values that the pair of parameters might take.
We built training and test databases of 100 and 30 replicates, respectively, per (rate of clonality, population size) pair. We explored by cross-validation whether there was enough identifiable signal in the features of our classifier to infer the true rates of clonality knowing only the values of population genotypic ($R$, $\beta$) and genetic (FIS, $\bar{r}_d$) indices, taken one by one and all combined.
Posterior distributions of the thirty test pseudo-observed datasets per pair of (rates of clonality and population size) were combined to obtain plotted results.
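The inference step described in this section (nonparametric per-feature likelihoods, conditional independence between features, a uniform prior and a MAP decision) can be sketched as a small naive Bayes classifier. The histogram approximation of the likelihoods, the add-one smoothing, the number of bins and the class name `HistogramNaiveBayes` are assumptions made for this illustration; the source does not specify how the nonparametric distributions were approximated.

```python
import math

class HistogramNaiveBayes:
    """Naive Bayes with histogram likelihoods and a uniform class prior."""

    def __init__(self, n_bins=10):
        self.n_bins = n_bins

    def fit(self, samples):
        """samples maps a class label, e.g. (c, N), to a list of feature vectors."""
        n_feat = len(next(iter(samples.values()))[0])
        rows_all = [row for rows in samples.values() for row in rows]
        # Shared bin edges per feature so that classes are directly comparable.
        self.lo = [min(r[f] for r in rows_all) for f in range(n_feat)]
        self.hi = [max(r[f] for r in rows_all) for f in range(n_feat)]
        self.log_lik = {}
        for label, rows in samples.items():
            tables = []
            for f in range(n_feat):
                counts = [1.0] * self.n_bins       # add-one smoothing avoids log(0)
                for r in rows:
                    counts[self._bin(f, r[f])] += 1
                total = sum(counts)
                tables.append([math.log(k / total) for k in counts])
            self.log_lik[label] = tables
        self.log_prior = -math.log(len(samples))   # uniform prior over classes
        return self

    def _bin(self, f, x):
        span = self.hi[f] - self.lo[f]
        if span == 0:
            return 0
        i = int((x - self.lo[f]) / span * self.n_bins)
        return min(max(i, 0), self.n_bins - 1)     # clamp out-of-range values

    def posterior(self, features):
        """Normalised posterior probability of each class given the features."""
        logs = {label: self.log_prior
                + sum(t[f][self._bin(f, x)] for f, x in enumerate(features))
                for label, t in self.log_lik.items()}
        m = max(logs.values())
        z = sum(math.exp(v - m) for v in logs.values())
        return {label: math.exp(v - m) / z for label, v in logs.items()}

    def predict(self, features):
        """Maximum a posteriori (MAP) class."""
        post = self.posterior(features)
        return max(post, key=post.get)
```

In use, `fit` would receive the training replicates keyed by (rate of clonality, population size) pair, and `posterior` would be evaluated on each test replicate; summing the posteriors over the test replicates of a class gives the kind of combined distribution described above.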

Results
We first explored the results at equilibrium to understand the influence of clonality on various parameters depending on the population size (Figure 1, Figure S1) and then examined the evolutionary dynamics of parameters over generations to determine the effect of clonality at different time steps and quantify the time needed to converge towards stationary values (Figure 2, Figure S2). We assessed which genotypic and genetic parameters contain the best identifiable signal allowing accurate inferences (Figure 3, Figure S3 and S4). Finally, we approached the issue of sampling strategy to determine its effects on the accuracy of estimates that can be expected for datasets obtained from natural populations (Figure 4, Figure S4, Figure 5).


Evolution of genotypic states and the distribution of clonal size at equilibrium under an increasing rate of clonality
In terms of genotypic diversity, our results showed a clear, progressive, and even stepwise decrease with increasing rates of clonality ( Figure 1, Figure S1).
When genotyping the entire population, the relationship between R and c (Figure 3) does not follow a linear trend such as $R = 1 - c$, as might have been assumed in some previous studies.

Evolution of the genetic composition of populations under an increasing rate of clonality
In contrast to the genotypic results but in agreement with previous studies on populations at equilibrium with a realistic low mutation rate (Balloux et al., 2003;Navascués et al., 2010;Stoeckel & Masson, 2014)   FIS would be best for obtaining a good estimate of c for any natural population when no a priori information on its extent is available.

Figure 4.
Subsampling effects on the estimates of genotypic indices (R and Pareto β) and genetic indices (mean, variance, skewness and kurtosis of FIS distribution, and linkage disequilibrium as $\bar{r}_d$) for five levels of subsampling (n=10, 20, 30, 50 and 100) applied to the dataset with $N=10^5$ at equilibrium (generation g=10000).

Subsampling
The inference method described above assumes that all individuals from the population have been sampled and genotyped. In particular, theoretical identifications were based on simulations and in silico populations for which all multilocus genotypes (MLGs) were known and used for inference. Biased information could emerge with subsampling (Figure 4, Figure S5). Genetic parameters (which proved to be less informative) were nearly unaffected by realistic sample sizes, whereas genotypic parameters (which were most informative) were considerably overestimated when using realistic sample sizes, leading to a gross underestimate of c from real datasets collected from natural populations. The R parameter is so susceptible to sampling bias that an already high sampling effort of 50 units cannot reliably estimate an R value lower than 0.9, with the exception of highly clonal populations (c>0.8). A correct and unbiased estimate of R can be achieved only by genotyping the entire population (Figure 5).
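The direction of this sampling bias can be reproduced with a toy example. The sketch below builds a hypothetical population of 10 000 individuals made of 100 equally sized clonal lineages (an arbitrary composition chosen for illustration, not one of the simulated scenarios), subsamples it without replacement and compares the mean sampled R, assuming R = (G − 1)/(N − 1), with the value obtained from the whole population.

```python
import random

def richness(labels):
    """R = (G - 1) / (N - 1) computed on genotype labels."""
    return (len(set(labels)) - 1) / (len(labels) - 1)

def mean_sampled_richness(population, n, n_resamples=10, seed=0):
    """Mean R over independent subsamples of n individuals drawn without replacement."""
    rng = random.Random(seed)
    draws = (richness(rng.sample(population, n)) for _ in range(n_resamples))
    return sum(draws) / n_resamples

# Hypothetical population: 100 lineages of 100 ramets each (labels are lineage ids).
population = [lineage for lineage in range(100) for _ in range(100)]
true_r = richness(population)

for n in (50, 500, len(population)):
    print(n, round(mean_sampled_richness(population, n), 3), round(true_r, 3))
```

Because a small sample rarely draws two ramets of the same lineage, the sampled R stays far above the population value until the sample size approaches the population size, which is the pattern summarised in Figure 5.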

Figure 5.
Relationship between rates of clonality and the mean expected R values sampled using 50 (light blue), 100 (blue), 500 (dark blue) and 10000 (purple) individuals. The true unbiased values computed using the whole population (N=100 000) are shown in black.

Discussion
The present work sheds new light on the important influence of PC on the genotypic and genetic composition of natural populations across a broad range of possible c values and in turn allows the parameters that would be preferentially used to estimate it in a wide range of conditions to be selected. Nevertheless, this work also demonstrates that the most useful parameters, i.e., those describing genotypic diversity, are severely impacted by sampling density, raising questions as to our ability, with the analytical tools currently available, to even detect clonality in large populations, let alone estimate its rate. These findings stimulate new interpretations of some published data and perspectives of improvement that are required to further understand the dynamics and evolution of the broad range of species exhibiting PC.

Parameters mostly influenced by c and the consequent accuracy of inferences based on these parameters
The index R (clonal richness; Dorken and Eckert, 2001) is the most widely used index to assess the level or even rate of clonality within natural populations, especially in correlation with environmental drivers to decipher the impacts of ecological features on the level of clonality (McMahon et al., 2017). Using entire populations, we empirically formalised the mathematical relationship between c and the genotypic indices R and $\beta$ (Figure 1, Figure S1). The relationship with R is not linear (as sometimes seemingly assumed in the literature) but follows $R = \sqrt{1 - c^2} \pm \epsilon$ (with $\epsilon$ being almost zero depending on the extent of genetic drift). Clonal evenness, represented by the Pareto $\beta$, is also not a linear function of the rate of clonality.
$\beta$ instead follows a custom sigmoid curve with three domains (c ranging over 0-0.15, 0.15-0.9 and 0.9-1), with the first and last showing a strong decrease in $\beta$ with increasing c. In contrast, in the smooth linear domain ranging from c~0.15 to c~0.9, the relationship is almost flat, suggesting limited changes in evenness in populations with a balanced amount of sexual and clonal events.
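To make the consequence of the non-linear relationship concrete, the sketch below reads the relationship above as R = √(1 − c²), neglects the small drift-dependent error term, and compares the rate of clonality obtained by inverting it with the value 1 − R implied by a linear assumption. The function names are hypothetical, and the comparison only applies to R computed on an entire population, not to subsampled R.

```python
import math

def expected_r(c):
    """Whole-population genotypic richness expected at clonality rate c."""
    return math.sqrt(1.0 - c * c)

def c_from_r(r):
    """Clonality rate obtained by inverting R = sqrt(1 - c^2)."""
    return math.sqrt(1.0 - r * r)

for r in (0.9, 0.5, 0.1):
    linear_c = 1.0 - r          # what a linear reading of R would suggest
    print(f"R={r}: linear c={linear_c:.3f}, inverted c={c_from_r(r):.3f}")
```

Under these assumptions the linear reading gives a lower rate of clonality than the inverted relationship for any R strictly between 0 and 1, with the largest gap at intermediate to high R.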
In contrast, genetic parameters are on average largely unaffected below extreme rates of clonality (c<0.95), yet the variance in FIS and $\bar{r}_d$ hints at PC and allows estimation of its extent under a high prevalence of clonality (c≥0.95; Figure 1, Figure S1). Clonality acts by releasing the coercive effects of sexuality that constrain and channel the evolutionary trajectories of genotype frequencies towards Hardy-Weinberg proportions, which in turn increases the range of genetic indices. This effect results in broader distributions of genetic indices with higher variance and unusual shapes but with nearly unaffected mean values. Logically, genetic parameters reach their equilibrium value with lower temporal variation and faster than genotypic indices, even at small population sizes (N≤1000 in our simulations). Accounting for genetic indices may thus limit the risk of misinterpretation when estimating c out of equilibrium. However, these indices are poorly informative about the rate of clonality below extreme values, in agreement with previous findings (Balloux et al., 2003; De Meeûs et al., 2006).

Detecting clonality in realistic conditions
Based on our results, clonal richness (R) and clonal evenness ($\beta$) are very sensitive to sampling, even when using relatively large sample sizes (from 100 to 500 individuals), which still provide very deeply biased estimates of the true R and $\beta$ and thus c values: R was always greatly overestimated, by some orders of magnitude more than previously demonstrated with empirical datasets for which the rates of clonality remained unknown (Arnaud-Haond et al., 2007; Gorospe et al., 2015), and $\beta$ was underestimated in nearly strictly sexual populations but greatly
In contrast, the distribution moments of FIS and mean LD for common sample sizes (more than 20 individuals) produced values consistent with those obtained from genotyping the whole population, yet, as noted earlier, they could be interpreted only for extreme rates of clonality (i.e., beyond c=0.95). Consequently, when analysing samples from populations with more than 1000 individuals, most genetic descriptors would remain informative and, sometimes together with R values lower than 1, should indicate a high prevalence of clonality (i.e., beyond c=0.95).
This worrying limitation resembles, for example, the results recently reported by Dia et al. (2014) for a unicellular phytoplankton species involved in harmful algal blooms (HABs), Alexandrium minutum. This species, which produces paralytic shellfish poisoning (PSP) blooms, shows an alternation of clonal and sexual phases and was sampled throughout bloom events during which it passed from being nearly undetectable to exhibiting a concentration of $10^4$ to $10^5$ cells per litre.
Of the more than 1000 strains cultivated, 265 were fully genotyped, among which no replicated genotypes were found, driving the estimate of clonal diversity to R=1. Without extensive knowledge of the biology of this species, clonality would not have been diagnosed on the basis of this sampling, questioning the occurrence of clonality. Unfortunately, no FIS values could be reported in this study because only the haploid phase could be sampled, and the LD detected suggested the occurrence of recombination in this species. However, according to the present results, genetic descriptors allow the detection or estimation of clonality only when its prevalence is extreme: the results by Dia et al. (2014) thus mainly suggest that the clonal rate during the bloom event did not exceed 0.95, leading to large uncertainty as to the prevalence of sexual or clonal reproduction in this species.
Most target species in the literature, including clonal plants and invasive and pathogenic species, exhibit extremely large population sizes, thus seriously questioning our ability to detect clonality based on realistic sample sizes, let alone infer its importance. The importance of sample size is reflected in the guidelines provided by the pioneering work of Tibayrenc et al. (1991), who listed 8 criteria to detect clonality, among which fixed heterozygosity, deviation from HWE and LD were expected to be important for diagnosing clonality. Nevertheless, these criteria would only apply to diploid species with extreme rates of clonality, excluding diploid species with c<0.95 and, even more so, studies on haploid lineages.
One may consider the clonal mechanisms and the way clonal replicates spatially disperse to better estimate the effect of the joint incidence of the sampling density and scale of dispersal of clones (driving the scale of spatial autocorrelation of genotypes compared to the grain size of sampling) on the ability of a given strategy to detect clonal replicates and therefore on the conclusions derived from population genetics data as to the incidence of sexual versus clonal reproduction. Along a continuum of dispersal from microorganisms such as unicellular algae, to flying aphids, to clonal plants with strong rhizomatic connections and ramets more often clumped than dispersed, the spatial autocorrelation of clones increases, as does the ability of sampling to reveal clonal replicates at equal sampling densities. As a consequence, at the first end of this continuum, where spatial dispersal is not limited (as is the case for A. minutum), genotypic parameters alone may not be informative on the existence or extent of clonality except for nearly strictly clonal organisms such as the human pathogen Trypanosoma cruzi. Such power would be gained as the spatial distance of clonal dispersal becomes lower than the sampling mesh size (for an example of the influence of sampling strategy in corals, see Gorospe et al., 2015; see Riginos, 2015 for a comment), and clonal replicates would become decreasingly randomly diluted at large population sizes and across vast spatial scales.

Quantifying clonality or merely evaluating its extent: how wrong can we be?
In many studies, R may reflect the orders of magnitude separating sample size and population size (sometimes together with the clonal size and/or clumping of clonal replicates) rather than the prevalence of sexual reproduction. As exemplified in the present work, suggest rates of clonality exceeding 95% in all these populations, which better agree with naturalistic knowledge (Villate et al., 2010).
In fact, revising the numerous data acquired on clonal plants, including seagrasses, in light of the present results reveals very frequent negative FIS values, suggesting a much higher contribution of clonality than previously thought (Evans et al., 2014; Sinclair, Krauss, Anthony, Hovey, & Kendrick, 2014; Stoeckel et al., 2006). Interestingly, clonal richness is the proxy used in one of the two available methods requiring a population to be sampled twice at two time steps (CloNcaSe, Ali et al., 2016). The difficulties in both reaching the expected equilibrium values and providing estimates for realistic sample sizes may explain the methodological difficulties in inferring rates of clonality and population sizes (later discussed by Becheler et al., 2017). Massa, Paulino, Serrão, Duarte, & Arnaud-Haond, 2013). Severe overestimation of genotypic diversities may thus have led to strongly misleading conclusions as to the resilience of the studied populations enhanced by their supposedly high R value as well as to their ability to rely on dispersal of seeds due to recurrent events of sexual reproduction (Kendrick et al., 2017, 2012; McMahon et al., 2017). A case-by-case re-evaluation is thus needed, as what may hold true for some species, depending on their life history traits (particularly longevity and turnover), may be completely incorrect for others.

Conclusion
To conclude, our results showed a large impact of PC on the genotypic composition of natural populations across the whole spectrum of possible rates of clonality, supporting its strong influence on the tuning of evolutionary forces acting on them at different spatial and temporal scales, even at lower c as conjectured by Lewis (1987). By affecting the main path of emergence of new variants (somatic mutations rather than recombination), the targets of natural selection and migration ("…the entity that persists and evolves is the clonal lineage…"; Ayala, 1998), and the influence of drift (through the potentially much longer-term retention of polymorphism; Reichel et al., 2016; Yonezawa et al., 2004; and the present results), PC has the potential to profoundly influence both the short-term dynamics and the evolutionary trajectories of natural populations, even at a modest rate of clonality. Unravelling the occurrence of clonality and understanding the extent of clonality are thus of paramount importance to reconstruct, understand and forecast the demography, ecology and evolution of the vast number of (and possibly still often undiagnosed) partially clonal species across the tree of life.
Unfortunately, given the present state of knowledge and existing analytical tools, the prospects of inferring rates of clonality from a single episode of population genotyping are remote. These results also clarify the paradox of the often-reported (but also often-overlooked) combination of high genotypic diversities, suggesting significant rates of sexual reproduction, and significant heterozygote excess, supporting nearly strict clonality (Dia et al., 2014; Orantes et al., 2012).
Many partially clonal organisms studied to date may rely on a much higher prevalence of clonal reproduction than initially thought, and clonal richness in these organisms may have been overestimated due to the limited sampling power at hand. This work thus calls for a reappraisal of previously published data and conclusions on a broad range of clonal organisms. Perspectives on how to infer the importance of clonality using one episode of genotyping may, however, exist and can be summarised with the following guidelines:
1) PC can be detected or quantified with the usual sampling power and existing methods mostly when the rate of clonality exceeds 95%.
2) Departure from HWE towards heterozygote excess, particularly together with large variance in FIS across loci, indicates the occurrence and prevalence of clonality.
3) The joint examination of genotypic and genetic descriptors is often necessary when PC detection is still needed (a recommendation reminiscent of those formulated long ago for human pathogens (Tibayrenc et al., 1991; see also Tibayrenc and Ayala, 2012) but seldom used in ecological studies).
4) Only taking into account both families of parameters may help better estimate the extent of clonal reproduction, but this requires accepting large uncertainty, particularly when the rate of clonal reproduction is not extreme.
5) As such departures are expected under clonality, FIS should not be used (a) to estimate psex (as initially proposed by Douhovnikoff and Dodd, 2003, and relayed by Arnaud-Haond et al., 2007), as the departure may in most cases be due to clonality rather than to non-random pairing of gametes; (b) (perhaps not as strictly) to filter next-generation sequencing (NGS) data when PC is possible, as such filters, which do not fit the case of PC, would lead at best to a very large number of informative loci being discarded and at worst to the occurrence of PC in the dataset being completely ignored; or (c) to detect technical artefacts such as null alleles and to correct data or select loci using models based on pure sexuality, including those implemented in software such as Micro-Checker (Van Oosterhout et al., 2004).
6) Finally, c slightly below 95% leaves a faint signature in the second and higher moments of FIS and, to a lesser extent, in r̄d; this signature remains visually undetectable but can be detected by machine learning methods, so improvement is expected from machine learning trained on informed databases covering the broadest possible range of scenarios.
Such development represents a promising avenue and will require large and versatile databases to accommodate the diversity of life history traits associated with clonality and subsampling to allow accounting for sampling effects.
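As an illustration of the genetic descriptors referred to in guidelines 2 and 6, the following minimal sketch computes FIS per locus and the first four moments of its distribution across loci. It assumes the standard estimator FIS = 1 - Ho/He with He = 1 - Σ p_i² and no small-sample correction, population (not sample) moments, and non-excess kurtosis; the function names and the three toy loci are hypothetical and do not reproduce the pipeline used in this study.

```python
from statistics import fmean


def fis_per_locus(genotypes):
    """FIS = 1 - Ho/He at one locus; genotypes is a list of (allele, allele) pairs."""
    n = len(genotypes)
    allele_counts = {}
    for a, b in genotypes:
        allele_counts[a] = allele_counts.get(a, 0) + 1
        allele_counts[b] = allele_counts.get(b, 0) + 1
    # Expected heterozygosity from allele frequencies (no sample-size correction).
    he = 1.0 - sum((count / (2 * n)) ** 2 for count in allele_counts.values())
    if he <= 0.0:
        return None  # monomorphic locus: FIS is undefined
    ho = sum(1 for a, b in genotypes if a != b) / n
    return 1.0 - ho / he


def fis_moments(fis_values):
    """Mean, variance, skewness and (non-excess) kurtosis of FIS across loci."""
    vals = [v for v in fis_values if v is not None]
    if not vals:
        raise ValueError("no polymorphic locus")
    m = fmean(vals)
    var = fmean([(v - m) ** 2 for v in vals])
    if var == 0.0:
        return m, var, None, None  # higher moments undefined without spread
    skew = fmean([(v - m) ** 3 for v in vals]) / var ** 1.5
    kurt = fmean([(v - m) ** 4 for v in vals]) / var ** 2
    return m, var, skew, kurt


# Toy data (assumption): three biallelic loci with contrasting heterozygosity.
loci = [
    [("A", "B")] * 10,                                   # all heterozygotes
    [("A", "A"), ("A", "B"), ("A", "B"), ("B", "B")],    # HWE proportions
    [("A", "A")] * 2 + [("B", "B")] * 2,                 # all homozygotes
]
fis_values = [fis_per_locus(locus) for locus in loci]
print(fis_values)
print(fis_moments(fis_values))
```

Reporting the variance and higher moments alongside the mean follows the guidelines above; no decision threshold is implied by this sketch.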

Figure S3.b. Machine learning inferences of c for N = 10^4 and for each parameter used for the inference: the genotypic parameters (a) R and (b) Pareto β and the genetic parameters (c) FIS distribution and (d) r̄d, as well as (e) the combination of all four parameters. The inferred values are plotted against the simulated ones, with the density gradient from black to light grey indicating the most to least probable values.

Figure S4. Machine learning inferences of c for N = 10^5 using the first four moments of the FIS distribution: (a) mean (Mean(FIS)), (b) variance (Var(FIS)), (c) skewness (Skew(FIS)) and (d) kurtosis (Kurt(FIS)). The inferred values are plotted against the simulated ones, with the density gradient from black to light grey indicating the most to least probable values.

Figure S5.a. Subsampling effects on the estimates of genotypic indices (R and Pareto β) and genetic indices (mean, variance, skewness and kurtosis of the FIS distribution, and linkage disequilibrium as r̄d) for five levels of subsampling (n = 10, 20, 30, 50 and 100) applied to the dataset with N = 10^3 at equilibrium (generation g = 10000).
Figure S5.b. Subsampling effects on the estimates of genotypic indices (R and Pareto β) and genetic indices (mean, variance, skewness and kurtosis of the FIS distribution, and linkage disequilibrium as r̄d) for five levels of subsampling (n = 10, 20, 30, 50 and 100) applied to the dataset with N = 10^4 at equilibrium (generation g = 10000).