Simulation methods have long been instrumental in finance, but data-driven methods with minimal model specification, commonly referred to as generative models, have attracted increasing attention, especially after the success of deep learning in a broad range of fields. However, the adoption of these models in practice has not kept pace with the growing interest, likely due to the unique complexities and challenges of financial markets. This paper aims to contribute to a deeper understanding of the development, use, and evaluation of generative models, particularly in portfolio and risk management. To this end, we begin by presenting theoretical results on the importance of the initial sample size, and we point out the potential pitfalls of generating far more data than was originally available. We then argue that model development is inseparable from the intended use case, illustrating this with a notable paradox: generic generative models inherently place less weight on exactly what matters most for constructing portfolios (at least the interesting ones, i.e., long-short portfolios). Based on these findings, we propose a pipeline for generating multivariate returns that meets conventional evaluation standards on a large universe of US equities, while also offering insights into the stylized facts observed in asset returns and into how a few statistical factors account for their existence. Recognizing the need for more delicate evaluation methods, we use the example of mean-reversion strategies to propose a method for identifying models that are unsuitable for a given application, based on regurgitative training: retraining the model on the data it has itself generated.
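
To make the regurgitative-training idea concrete, the sketch below shows the evaluation loop in its simplest form. Everything specific in it is an illustrative assumption rather than the paper's actual pipeline: the "generative model" is a stand-in i.i.d. multivariate Gaussian (fit = sample mean and covariance), and the application-relevant statistic is the average lag-1 autocorrelation across assets, a quantity mean-reversion strategies depend on. Only the loop structure, fit on data, generate synthetic data, refit on the synthetic data, and track the statistic across generations, reflects the method named above.

```python
# Minimal sketch of regurgitative-training evaluation (illustrative
# assumptions only; see the lead-in paragraph above).
import numpy as np

rng = np.random.default_rng(0)

def fit(returns):
    """Fit the stand-in generative model: sample mean and covariance."""
    return returns.mean(axis=0), np.cov(returns, rowvar=False)

def generate(params, n_obs):
    """Draw synthetic i.i.d. return paths from the fitted model."""
    mu, sigma = params
    return rng.multivariate_normal(mu, sigma, size=n_obs)

def lag1_autocorr(returns):
    """Average lag-1 autocorrelation across assets (the hypothetical
    statistic a mean-reversion strategy would care about)."""
    x0, x1 = returns[:-1], returns[1:]
    num = ((x0 - x0.mean(0)) * (x1 - x1.mean(0))).mean(0)
    den = returns.std(0, ddof=0) ** 2
    return float((num / den).mean())

# Toy "real" data with genuine mean reversion: AR(1) with negative phi.
n_obs, n_assets, phi = 1000, 5, -0.3
real = np.zeros((n_obs, n_assets))
for t in range(1, n_obs):
    real[t] = phi * real[t - 1] + rng.standard_normal(n_assets) * 0.01

# Regurgitative loop: refit the model on its own output, generation by
# generation, and track the statistic across generations.
data = real
for gen in range(4):
    print(f"generation {gen}: mean lag-1 autocorr = {lag1_autocorr(data):+.3f}")
    data = generate(fit(data), n_obs)

# A model that cannot represent mean reversion (like this i.i.d. Gaussian)
# lets the statistic collapse from roughly -0.3 toward zero after a single
# generation, flagging it as a bad model for mean-reversion applications.
```

In this toy setting the degradation is visible after one generation; for richer models the point of iterating is that deficiencies too small to detect in one generation compound across generations until they become measurable.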