Klingberg Malmer, Oliver; Pettersson, Gustav
2020-07-01
http://hdl.handle.net/2077/65398
Over 300 firm characteristics exist that provide significant information about average asset returns. John Cochrane refers to this as a “factor zoo” and challenges researchers to find the independent characteristics that can explain average returns, that is, the unsubsumed and non-nested firm characteristics that are highly predictive of asset returns. In this thesis we take up this challenge using a data-driven approach. We apply two machine learning methods to create sparse factor models composed of a small set of these characteristics: one unsupervised learning method, Principal Component Analysis (PCA), and one supervised learning method, LASSO regression. The study uses the S&P 500 index constituents and 54 firm characteristics over the period 2009-07-01 to 2019-07-01. The performance of the factor models is measured out of sample. Using established methods of post-LASSO regression and newly developed techniques for variable selection based on PCA, we generate four new factor models. The PCA-based variable selection method is, to our knowledge, an original contribution of this thesis. The generated factor models are compared against the Fama French factors in the out-of-sample test, and all are shown to outperform them. The best performer is a LASSO-generated factor model containing 6 factors.
By analysing the results we find that momentum factors, such as price relative to the 52-week high price, are highly predictive of returns and are commonly selected, which confirms the results of previous responses to the same challenge.
eng
Asset pricing; Factor models; Machine learning; PCA; LASSO; Variable selection; Dimension reduction; Fama French Three Factor model; Fama French Five Factor model
Tidying up the factor zoo: Using machine learning to find sparse factor models that predict asset returns.
Uppordnande av faktorer: Användande av maskininlärning för framtagande av faktormodeller som förklarar avkastning.
text
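The post-LASSO procedure and out-of-sample evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the data are synthetic, and the function name `lasso_cd`, the penalty `alpha`, the sample sizes, and the train/test split are all hypothetical choices made for illustration. Only the number of characteristics (54) is taken from the abstract. The LASSO is fitted here by plain coordinate descent, which is one common way to solve it and not necessarily the solver the authors used.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate-descent LASSO for (1/2n)*||y - X b||^2 + alpha*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual when all coefficients are zero
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]  # take feature j out of the fit
            rho = X[:, j] @ resid / n
            # Soft-thresholding update for coefficient j
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]  # put feature j back in
    return beta


# Synthetic stand-in for a panel of returns and 54 characteristics (assumption).
rng = np.random.default_rng(0)
n_obs, n_chars = 600, 54
X = rng.standard_normal((n_obs, n_chars))
true_beta = np.zeros(n_chars)
true_beta[[0, 5, 10]] = [0.5, -0.4, 0.3]  # only three characteristics matter
y = X @ true_beta + 0.5 * rng.standard_normal(n_obs)

# Chronological split: fit on the first part, evaluate on the rest.
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Standardise with training-sample statistics only, to avoid look-ahead.
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_test = (X_train - mu) / sd, (X_test - mu) / sd

# Step 1: LASSO selects a sparse set of characteristics.
beta = lasso_cd(X_train, y_train - y_train.mean(), alpha=0.1)
selected = np.flatnonzero(beta)

# Step 2 (post-LASSO): refit by OLS on the selected characteristics only,
# which removes the shrinkage bias of the LASSO coefficients.
A_train = np.column_stack([np.ones(len(X_train)), X_train[:, selected]])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Step 3: out-of-sample R^2 against the training-sample mean as benchmark.
A_test = np.column_stack([np.ones(len(X_test)), X_test[:, selected]])
pred = A_test @ coef
oos_r2 = 1.0 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_train.mean()) ** 2)

print("selected characteristics:", selected)
print("out-of-sample R^2:", round(float(oos_r2), 3))
```

In practice the penalty would be chosen by cross-validation rather than fixed, and the benchmark in the out-of-sample comparison would be a competing factor model such as the Fama French factors instead of the sample mean.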