Background: Widespread commercialization of cannabis has led to the introduction of brand names based on users' subjective experience of psychological effects and flavours, but this process has occurred in the absence of agreed standards. The objective of this work was to leverage information extracted from large databases to evaluate the consistency and validity of these subjective reports, and to determine their correlation with the reported cultivars and with estimates of their chemical composition (delta-9-THC, CBD, terpenes). Methods: We analyzed a large publicly available dataset extracted from Leafly.com, where users freely reported their experiences with cannabis cultivars, including different subjective effects and flavour associations. This analysis was complemented with information on the chemical composition of a subset of the cultivars extracted from Psilabs.org. The structure of this dataset was investigated using network analysis applied to the pairwise similarities between reported subjective effects and/or chemical compositions. Random forest classifiers were used to evaluate whether reports of flavours and subjective effects could identify the species label of each cultivar. We applied natural language processing (NLP) tools to free narratives written by the users to validate the subjective effect and flavour tags. Finally, we explored the relationship between terpenoid content, cannabinoid composition and subjective reports in a subset of the cultivars. Results: Machine learning classifiers distinguished between the species tags "Cannabis sativa" and "Cannabis indica" based on the reported flavours: <AUC> = 0.828 ± 0.002 (p < 0.001); and effects: <AUC> = 0.9965 ± 0.0002 (p < 0.001).
A significant relationship between terpene and cannabinoid content was suggested by positive correlations between subjective effect and flavour tags (p < 0.05, false discovery rate (FDR)-corrected); these correlations clustered the reported effects into three groups that represented unpleasant, stimulant and soothing effects. The use of predefined tags was validated by applying latent semantic analysis tools to unstructured written reviews, which also yielded breed-specific topics consistent with their purported subjective effects. Terpene profiles matched the perceptual characterizations made by the users, particularly for the terpene-flavours graph (Q = 0.324). Conclusions: Our work represents the first data-driven synthesis of self-reported and chemical information for a large number of cannabis cultivars. Since terpene content is robustly inherited and less influenced by environmental factors, flavour perception could represent a reliable marker to indirectly characterize the psychoactive effects of cannabis. Our novel methodology helps meet demands for reliable cultivar characterization in the context of an ever-growing market for medicinal and recreational cannabis.
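
Illustrative note: the abstract reports a value Q for the terpene-flavours graph without defining it. Assuming Q denotes Newman's modularity of a graph partition (an assumption, not stated above), the quantity can be sketched as follows. The function name `modularity`, the toy graph and its node labels are all hypothetical and are not taken from the study's data or code.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of an undirected, weighted graph partition.

    edges: iterable of (u, v, weight), each undirected edge listed once.
    community: dict mapping every node to a community label.
    Assumption: this is the Q reported in the text; the source does not say so.
    """
    total_weight = 0.0                    # m, the sum of all edge weights
    degree = defaultdict(float)           # weighted degree of each node
    internal = defaultdict(float)         # edge weight inside each community
    for u, v, w in edges:
        total_weight += w
        degree[u] += w
        degree[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    if total_weight == 0:
        raise ValueError("graph has no edge weight")

    community_degree = defaultdict(float)  # summed degree per community
    for node, k in degree.items():
        community_degree[community[node]] += k

    two_m = 2.0 * total_weight
    # Q = sum over communities of (internal fraction - expected fraction)
    return sum(
        internal[c] / total_weight - (community_degree[c] / two_m) ** 2
        for c in community_degree
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Under this definition, a partition that places all nodes in a single community scores Q = 0, and higher values indicate that more edge weight falls within communities than expected from node degrees alone.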