Overview

A cross-section forecast problem

In finance, predicting asset price returns is a fascinating yet very hard problem. For this reason, alternative prediction problems have emerged that attempt to circumvent these difficulties while still producing predictions with tradeable potential. One of the most interesting alternatives is identifying the relative ordering, by performance, of the investment vehicles in a pool or subset of them: the cross-section forecast problem. In this setting, we track, at different dates, a pool of investment vehicles that is generally defined through some rule (for example, the S&P 500 tracks the stock performance of the 500 largest companies in the US). This pool is known as the universe in financial jargon, and its definition is an object of study in itself.

The goal of this competition is to rank the performance of all assets in the universe, from best to worst, at each given date. The target to predict is the ranking of the future performance of each asset, remapped to the interval [-1, 1], and the scoring function is Spearman's rank correlation between the predicted and true rankings.
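
The following is a minimal sketch of this scoring logic. The column names (`date`, `prediction`, `target`) and the data are toy assumptions for illustration; the platform's exact scoring implementation may differ in detail.

```python
import pandas as pd
from scipy.stats import spearmanr

def rank_to_unit_interval(values: pd.Series) -> pd.Series:
    """Rank a cross-section and remap the ranks linearly to [-1, 1]."""
    ranks = values.rank(method="average")  # ranks in 1..n
    return 2.0 * (ranks - 1.0) / (len(ranks) - 1.0) - 1.0

def daily_spearman(df: pd.DataFrame) -> pd.Series:
    """Spearman rank correlation between predictions and targets, per date."""
    return df.groupby("date")[["prediction", "target"]].apply(
        lambda g: spearmanr(g["prediction"], g["target"])[0]
    )

# Toy example: one date, four assets; the target is the future-performance
# ranking remapped to [-1, 1].
toy = pd.DataFrame({
    "date": ["2024-01-01"] * 4,
    "prediction": [0.9, 0.1, -0.5, -1.0],
    "target": rank_to_unit_interval(pd.Series([0.04, -0.01, 0.02, -0.03])),
})
print(daily_spearman(toy))  # 0.8 for this toy cross-section
```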

To illustrate an interesting use case of this problem, imagine an investment strategy that is long on the best-performing element of the universe and short on the worst. In this setting, no matter the direction of the market, it is still possible to obtain positive returns, or at least to minimize losses.
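
For instance, with purely hypothetical numbers, an equal-weight long position on the top-ranked asset funded by a short position on the bottom-ranked one yields a positive return even when both assets fall:

```python
# Hypothetical returns in a falling market (toy numbers, not real data).
best_return = -0.01   # top-ranked asset loses 1%
worst_return = -0.05  # bottom-ranked asset loses 5%

# Equal-weight long-short position: long the best, short the worst.
strategy_return = best_return - worst_return
print(f"{strategy_return:+.2%}")  # +4.00%, despite both assets falling
```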

The dataset presented to the competitors is an obfuscated version of high-quality market data. Therefore, details such as the nature of each investment vehicle, the constant frequency at which dates are sampled, and the definition of each feature are not available. We hope you enjoy the challenge!

Competition phases and format

This competition is focused on forecasting and has two phases. The first is the submission phase, in which participants can submit and test their models. The second phase, which is automatic, runs the submitted models against unobserved live market data.

Submission phase - 12 weeks:

In the first phase, participants are required to submit either a Python notebook (.ipynb) or a Python script (.py). This file should contain the code needed to build, load, or update their models trained on the data. The code will be executed by the CrunchDAO platform for every submission to obtain predictions on unseen data. Participants can use either static models, trained only once on the initial training set, or dynamic models that update or retrain themselves on the unseen data, as explained further in the documentation. A sketch of what such a file might look like is shown below.
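
As a rough illustration, a submitted file could expose a training step and an inference step along the following lines. This is a minimal sketch only: the entry-point names (`train`, `infer`), their signatures, and the model choice are assumptions made for illustration, not the platform's actual interface, which is specified in the documentation.

```python
# Hypothetical submission skeleton; names and signatures are illustrative.
import joblib
import pandas as pd
from sklearn.linear_model import Ridge

def train(X_train: pd.DataFrame, y_train: pd.Series, model_path: str) -> None:
    """Fit a model on the training data and persist it. A static model would
    only run this once; a dynamic model could retrain here on new data."""
    model = Ridge()
    model.fit(X_train, y_train)
    joblib.dump(model, model_path)

def infer(X_test: pd.DataFrame, model_path: str) -> pd.Series:
    """Load the persisted model and produce predictions on unseen data."""
    model = joblib.load(model_path)
    return pd.Series(model.predict(X_test), index=X_test.index)
```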

Out-of-Sample phase - 12 weeks:

In the second phase, also called Out-of-Sample (OOS), the participants' code will be run automatically by the platform on live market data and evaluated. In this phase, participants won't be able to modify their code.

Why the two-phase approach?

  • Only the performance on Out-of-Sample data will be taken into account.

  • Reproducibility of the winning solution is ensured.

  • Participants won't be able to exploit data leaks.

CrunchDAO is acting as a third-party intermediary in this competition and will never communicate the code to the organizer in any way.
