Get a score on the leaderboard
Once you have submitted, it's time to run your model in the cloud.
To get a score on the leaderboard, you need to run your code on the competition server. Your code will be fed previously unseen data, and your predictions will be scored on this private test set.
During the submission phase, you can submit up to five times a day and receive a score on the private test set each time. This is useful for iterating on various solutions, but it may result in overfitting to the private test set. It's crucial to have a robust solution for this portion of the data, rather than focusing solely on the submission leaderboard. A high-quality submission should demonstrate consistent performance on both the test set and the training data. A large discrepancy in performance between the two data segments may indicate an inadequate model.
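One way to check this consistency before submitting is to compute the same score on the rows used for fitting and on rows held out from fitting, then look at the gap. The sketch below is illustrative only: it uses a plain Pearson correlation as a stand-in metric (the competition's actual metric is not stated here), and the function names and toy numbers are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equally sized sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def score_gap(train_pred, train_true, valid_pred, valid_true):
    """Return (train score, held-out score, difference between them)."""
    train_score = pearson(train_pred, train_true)
    valid_score = pearson(valid_pred, valid_true)
    return train_score, valid_score, train_score - valid_score


# Hypothetical predictions and targets, for illustration only.
train_score, valid_score, gap = score_gap(
    train_pred=[1, 2, 3, 4], train_true=[1, 2, 3, 4],
    valid_pred=[1, 2, 3, 4], valid_true=[2, 1, 4, 3],
)
print(f"train={train_score:.2f} held-out={valid_score:.2f} gap={gap:.2f}")
```

A gap that stays large across several held-out splits suggests the model is fitting noise in the training data rather than a signal that generalizes.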
For this reason, only your last submission is taken into account for the Out-of-Sample scoring.
To run your submission in the cloud and get a score, click on a submission and then on the Run in the Cloud button.
Click on Run in the Cloud to launch your run
Your code is called once for each individual date. The calls go through the dates sequentially, but are otherwise independent. Recall that, for each individual date, the data contains the cross-section of the investment vehicles in the universe at that time.
At each date, your code will access only the data available up to that point.
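The two access patterns described above (a single cross-section at inference time, and only past data at training time) can be sketched with pandas. This is a minimal illustration under stated assumptions: dates are encoded as integers, the column names `id` and `feature` and the helper names are made up, and `embargo` is taken to be a number of dates.

```python
import pandas as pd

# Toy panel: three dates, two hypothetical investment vehicles per date.
X = pd.DataFrame({
    "date": [0, 0, 1, 1, 2, 2],
    "id": ["a", "b", "a", "b", "a", "b"],
    "feature": [0.1, 0.4, 0.2, 0.5, 0.3, 0.6],
})


def cross_section(frame, date):
    """Rows for a single date: the slice an inference call would see."""
    return frame[frame["date"] == date]


def history_before(frame, date, embargo=0):
    """Rows strictly older than `date - embargo`: the slice training may use."""
    return frame[frame["date"] < date - embargo]


current = cross_section(X, 2)
past = history_before(X, 2, embargo=1)
print(current)
print(past)
```

With `embargo=1`, the call for date 2 may only train on date 0: date 1 is held back by the embargo.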
Here is a high-level overview of how your code will be called:
import os

import pandas

# Loop over the private test set dates to avoid leaking the x of future periods
for date in dates:
    # The wrapper blocks the logging of the user's code after the first 5 dates
    if date >= log_treshold:
        log = False

    # If the user asked for a retrain on the current date
    if retrain:
        # Cut the sample so that the user's code only accesses the right part of the data
        X_train_cut = X_train[X_train.date < date - embargo]
        y_train_cut = y_train[y_train.date < date - embargo]

        # This is where your `train` code is called
        train(X_train_cut, y_train_cut, model_directory_path)

    # Only the current date
    X_test_date = X_test[X_test.date == date]

    # This is where your `infer` code is called
    prediction = infer(model_directory_path, X_test_date)

    if date > log_treshold:
        ...

# Concat all of the individual predictions
prediction = pandas.concat(predictions)

# Upload it to our servers
...

# Upload the model's files to our servers
for file_name in os.listdir(model_directory_path):
    ...
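To rehearse this calling pattern locally, the loop can be reproduced on a toy panel. The sketch below is not the platform's wrapper. It assumes integer dates, hypothetical `id`, `feature` and `y` columns, a retrain at every date, and a deliberately trivial `train`/`infer` pair that stores a mean in a JSON file. Only the `train(X_train, y_train, model_directory_path)` and `infer(model_directory_path, X_test)` call shapes come from the overview above.

```python
import json
import os
import tempfile

import pandas as pd


def train(X_train, y_train, model_directory_path):
    """Toy model: store the mean target of the rows it was given."""
    mean_target = float(y_train["y"].mean())
    with open(os.path.join(model_directory_path, "model.json"), "w") as f:
        json.dump({"mean_target": mean_target}, f)


def infer(model_directory_path, X_test):
    """Predict the stored mean for every row of one cross-section."""
    with open(os.path.join(model_directory_path, "model.json")) as f:
        mean_target = json.load(f)["mean_target"]
    return X_test[["date", "id"]].assign(prediction=mean_target)


def dry_run(X, y, test_dates, embargo):
    """Call train and infer date by date, as in the overview (simplified)."""
    predictions = []
    with tempfile.TemporaryDirectory() as model_directory_path:
        for date in test_dates:
            # Training only sees rows strictly older than date - embargo.
            cutoff = date - embargo
            train(X[X["date"] < cutoff], y[y["date"] < cutoff], model_directory_path)
            # Inference only sees the current cross-section.
            predictions.append(infer(model_directory_path, X[X["date"] == date]))
    return pd.concat(predictions, ignore_index=True)


# Toy panel: dates 0..4, two assets per date, target equal to the date.
dates = [d for d in range(5) for _ in range(2)]
ids = ["a", "b"] * 5
X = pd.DataFrame({"date": dates, "id": ids, "feature": [0.1 * d for d in dates]})
y = pd.DataFrame({"date": dates, "id": ids, "y": [float(d) for d in dates]})

result = dry_run(X, y, test_dates=[3, 4], embargo=1)
print(result)
```

Slicing into new frames on each iteration, rather than reassigning `X` and `y`, keeps the full panel available for the next date.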