vostok
Cannabis labelling is associated with genetic variation in terpene synthase genes
Abstract
Analysis of over 100 Cannabis samples quantified for terpene and cannabinoid content and genotyped for over 100,000 single nucleotide polymorphisms indicated that Sativa- and Indica-labelled samples were genetically indistinct on a genome-wide scale. Instead, we found that Cannabis labelling was associated with variation in a small number of terpenes whose concentrations are controlled by genetic variation at tandem arrays of terpene synthase genes.
Main
Cannabis has been consumed for its psychoactive properties for over 2,500 years, and its estimated global market value is US$340 billion [1,2,3]. Because it is a widely used drug that is increasingly being legalized for medicinal and recreational use, it is critical that Cannabis’s genetic and chemical variation be accurately quantified and communicated. The vernacular labels Sativa and Indica (not to be confused with the taxonomic names C. sativa sativa L. and C. sativa indica Lam.) are routinely assigned to Cannabis cultivars by breeders, retailers and users to describe a cultivar’s morphology, aromas and/or psychoactive effects [4]. However, it is unclear whether these labels capture meaningful information about Cannabis genetic and chemical variation.
Cannabis genomics research has thus far largely focused on the characterization of genes underlying the production of the cannabinoids cannabidiol (CBD) and tetrahydrocannabinol (THC) [5,6,7,8]. However, Cannabis produces hundreds of aromatic terpenes that drive consumer preference and are frequently associated with Sativa and Indica labels [4,9]. In addition, there is evidence to suggest that a cultivar’s terpene profile affects its psychoactive properties [10,11]. To date, various terpene synthase genes have been identified in Cannabis; however, the genetic control of terpene variation across Cannabis cultivars remains largely unexplored [12,13,14,15].
Here we re-analysed 297 samples of drug-type Cannabis that were previously quantified for 40 terpenes and cannabinoids using gas chromatography–mass spectrometry (GC–MS) [16] (Supplementary Table 1 and Extended Data Fig. 1), and we paired these data with 116,296 newly generated single nucleotide polymorphisms (SNPs) from 137 of these samples from which sufficient high-quality DNA could be extracted. We determined the degree to which the genomic and GC–MS data corresponded to a five-point labelling scale ranging from 1 (100% Sativa) to 5 (100% Indica) as reported by sample sources.
Principal component analysis (PCA) of the genomic data showed no clear clustering according to sample labels (Fig. 1a). Even though PC1 and PC2 were significantly correlated with the Sativa–Indica scale, the variance explained by the primary PCs was low (PC1: R² = 0.12, P = 2.1 × 10⁻⁵; PC2: R² = 0.12, P = 1.8 × 10⁻⁵). Furthermore, the overall genetic structure (captured by including the first ten PCs of the genomic data in a linear model) explained only 37% of the variance in labelling (Fig. 1c). Sativa–Indica labels thus do not accurately reflect genetic relatedness, which is consistent with previous work [17,18]. In addition, we determined that pairs of samples with identical cultivar names (for example, OG Kush) were often as genetically and chemically distant from each other as pairs of samples with different names (Extended Data Fig. 2). This is consistent with previous studies indicating that cultivar names were not reliable indicators of a sample’s genetic or chemical identity [17,19,20,21].
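To make the cultivar-name comparison concrete, here is a minimal sketch of splitting pairwise distances into same-name and different-name pairs. It is not the authors' method: the excerpt does not say which distance measure was used, so Euclidean distance is an assumption, and the function name `split_pairwise_distances`, the toy feature vectors and the name "Hypothetical A" are made up for illustration.

```python
import numpy as np
from itertools import combinations

def split_pairwise_distances(features, names):
    """Split pairwise Euclidean distances by whether the two samples share a name.

    features: (n_samples, n_features) array (e.g. genotype or terpene values).
    names:    list of cultivar names, one per sample.
    Returns (same_name_distances, different_name_distances).
    """
    same, diff = [], []
    for i, j in combinations(range(len(names)), 2):
        d = float(np.linalg.norm(features[i] - features[j]))
        (same if names[i] == names[j] else diff).append(d)
    return np.array(same), np.array(diff)

# Toy data only: four samples, two invented features each.
features = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0], [6.0, 8.0]])
names = ["OG Kush", "OG Kush", "Hypothetical A", "Hypothetical A"]

same, diff = split_pairwise_distances(features, names)
print("mean same-name distance:     ", same.mean())
print("mean different-name distance:", diff.mean())
```

If names were reliable, the same-name distances would sit well below the different-name ones; the paper reports that they often do not.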
Fig. 1. a, Genome-wide SNP data. b, Terpenes and cannabinoids. Each dot represents a Cannabis sample and is coloured by the labelling scale ranging from 100% Sativa to 100% Indica. c, The percent variance explained by PCs from the genome-wide SNP data (blue), from the terpene and cannabinoid data (green) and from both the genetic and chemical data (purple). The y axis shows the percent variance explained as PCs are added to linear models where the Sativa–Indica labelling scale is the dependent variable.
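The analysis behind panel c (adding PCs one at a time to a linear model of the labelling scale) can be sketched roughly as below. This is only an illustration on random synthetic data, not the authors' pipeline: the helper names `pca_scores` and `cumulative_r2` are invented, the genotype matrix is made up, and plain (unadjusted) R² is an assumption because the excerpt does not say which variant was reported.

```python
import numpy as np

def pca_scores(X, n_components):
    """First n_components principal-component scores of X (samples x features)."""
    Xc = X - X.mean(axis=0)                       # centre each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def cumulative_r2(scores, y):
    """R^2 of a least-squares fit of y on the first k PCs, for k = 1..n."""
    y = np.asarray(y, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = []
    for k in range(1, scores.shape[1] + 1):
        A = np.column_stack([np.ones(len(y)), scores[:, :k]])  # intercept + k PCs
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2.append(1.0 - np.sum(resid ** 2) / ss_tot)
    return np.array(r2)

# Synthetic stand-ins: 137 samples, 500 SNPs coded as 0/1/2 allele counts,
# and a random 1..5 Sativa-Indica label per sample.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(137, 500)).astype(float)
labels = rng.integers(1, 6, size=137)

scores = pca_scores(genotypes, 10)
r2_by_k = cumulative_r2(scores, labels)
for k, r2 in enumerate(r2_by_k, start=1):
    print(f"first {k:2d} PCs: R^2 = {r2:.3f}")
```

Because each model nests the previous one, the curve can only rise as PCs are added, so the useful reading is how high it gets, which the paper reports as 37% for ten genomic PCs.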