Get a score on the leaderboard

Once submitted, it's time to run your model in the cloud.
To get a score on the leaderboard, you need to run your code on the competition server. Your code will be fed never-before-seen data, and your predictions will be scored on this private test set.

The Submission Phase Leaderboard

During the submission phase, you can submit up to five times a day and receive a score on the private test set. This is useful for iterating on different solutions, but it can lead to overfitting the private test set. It's crucial to build a solution that is robust on this portion of the data, rather than focusing solely on the submission leaderboard. A high-quality submission should perform consistently on both the test set and the training data; a large gap between the two may indicate an inadequate model.
For this reason, only your last submission is taken into account for the Out-of-Sample scoring.
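As a rough illustration of the consistency check described above, you can compare your training score with the leaderboard score and flag suspiciously large drops. The `looks_overfit` helper and its 30% threshold below are arbitrary illustrative choices, not a platform rule:

```python
def looks_overfit(train_score: float, test_score: float,
                  max_relative_drop: float = 0.3) -> bool:
    """Flag a submission whose test-set score drops too far below its
    training score. `max_relative_drop` is an arbitrary illustrative
    threshold, not a platform rule."""
    if train_score <= 0:
        return False  # degenerate case: nothing meaningful to compare
    return (train_score - test_score) / train_score > max_relative_drop

# A model scoring 0.10 in training but only 0.02 on the test set is suspect.
print(looks_overfit(0.10, 0.02))  # True
print(looks_overfit(0.10, 0.09))  # False
```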

Running your submission in the Cloud to get a score

To run your submission in the cloud and get a score, click on a submission and then on the Run in the Cloud button.
Click on Run in the Cloud to launch your run

How your code is called by the system

Your code is called on each individual date. Code calls go through the dates sequentially, but are otherwise independent. Keep in mind that the data contains, for each individual date, the cross-section of the investment vehicles of the universe at that time.
At each date, your code can only access the data available up to that point.
Here is a high-level overview of how your code will be called:
# Loop over the private test set dates so that your code never sees
# the X of future periods. (Pseudocode: assumes a `date` column.)
predictions = []
for date in dates:
    # The wrapper blocks the logging of user code after the first 5 dates.
    if date >= log_threshold:
        log = False
    # If the user asked for a retrain on the current date:
    if retrain:
        # Cut the sample so that the user's code only accesses data
        # strictly before the current date, minus the embargo period.
        X_train = X_train[X_train["date"] < date - embargo]
        y_train = y_train[y_train["date"] < date - embargo]
        # This is where your `train` code is called.
        train(X_train, y_train, model_directory_path)
    # Keep only the current date's cross-section.
    X_test_date = X_test[X_test["date"] == date]
    # This is where your `infer` code is called.
    prediction = infer(model_directory_path, X_test_date)
    predictions.append(prediction)
# Concatenate all of the individual predictions and upload them to our servers.
prediction = pandas.concat(predictions)
# Upload the model's files to our servers.
for file_name in os.listdir(model_directory_path):
    ...
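To make the loop above concrete, here is a minimal sketch of the `train` and `infer` functions a submission implements, matching the signatures used in the loop. The mean-of-target "model" and the `model.pkl` file name are purely illustrative assumptions, not the platform's required logic:

```python
import os
import pickle

import pandas as pd

def train(X_train: pd.DataFrame, y_train: pd.DataFrame,
          model_directory_path: str) -> None:
    # Toy "model": remember the overall mean of the training target
    # (a hypothetical stand-in for real fitting logic).
    model = {"mean": float(y_train.to_numpy().mean())}
    with open(os.path.join(model_directory_path, "model.pkl"), "wb") as f:
        pickle.dump(model, f)

def infer(model_directory_path: str, X_test: pd.DataFrame) -> pd.DataFrame:
    # Reload whatever `train` persisted to the model directory.
    with open(os.path.join(model_directory_path, "model.pkl"), "rb") as f:
        model = pickle.load(f)
    # One prediction per row of the current date's cross-section.
    return pd.DataFrame({"prediction": model["mean"]}, index=X_test.index)
```

Because `infer` only receives the model directory and the current date's data, anything it needs must be written to `model_directory_path` during `train`.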