The Model Fine-Tuning page allows you to further train your agent’s retrieval model and improve its performance. Using conversation history between the agent and users, the system automatically generates training data that is used to fine-tune the retrieval model.
Fine-tuning is available for Enterprise customers only. For more information, please contact our Sales Team.
Click the Fine-tune button to configure the required training parameters.
You can specify which data to use when fine-tuning the retrieval model.
Selecting a date range allows the system to use the conversation logs from that period for training.
Training time depends on the volume of data; in general, expect training to take at least 15 minutes.
You can also restrict training to data associated with a specific folder.
For example, selecting the “Product Guide” folder will extract conversation logs associated with documents in that folder, improving the model’s ability to answer product-related queries.
Number of Positive Samples
Indicates how many correct documents are linked to each question. This value is fixed at 1: each question is paired with exactly one correct document, so the model learns clear, consistent question-to-document mappings.
Number of Easy Negative Samples
Randomly sampled unrelated documents are added as “easy negative” data. This helps improve generalization performance by exposing the model to diverse data. If the value is too low, generalization drops; if too high, training efficiency decreases.
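To illustrate, here is a minimal sketch of what random easy-negative sampling could look like. The function name, arguments, and sampling strategy are illustrative assumptions, not the platform’s actual implementation:

```python
import random

def sample_easy_negatives(positive_doc_id, all_doc_ids,
                          num_easy_negatives=5, seed=None):
    # Hypothetical helper: uniformly sample unrelated documents from
    # the corpus to serve as easy negatives for one question.
    rng = random.Random(seed)
    # Never let the correct document slip in as a negative.
    candidates = [d for d in all_doc_ids if d != positive_doc_id]
    return rng.sample(candidates, k=min(num_easy_negatives, len(candidates)))
```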
Number of Hard Negative Samples
Defines how many “hard negative” samples (documents contextually similar to the correct answer but not actually correct) are used per question. Increasing this value helps the model better distinguish fine-grained differences between relevant and irrelevant documents but increases training time and complexity.
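As a rough sketch, hard negatives are typically mined by ranking documents by similarity to the question and keeping the top-ranked non-answers. The cosine-similarity approach and all names below are assumptions for illustration, not the platform’s internals:

```python
import numpy as np

def mine_hard_negatives(question_vec, doc_vecs, doc_ids,
                        positive_doc_id, num_hard_negatives=3):
    # Hypothetical sketch: score every document against the question
    # using cosine similarity over precomputed embeddings.
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(question_vec)
    sims = doc_vecs @ question_vec / np.clip(norms, 1e-9, None)
    # Most similar first; the closest non-answers are the hard negatives.
    order = np.argsort(-sims)
    hard = [doc_ids[i] for i in order if doc_ids[i] != positive_doc_id]
    return hard[:num_hard_negatives]
```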
Negative Sample Exclusion Threshold
To prevent overly similar documents from being incorrectly treated as negatives, the system uses Jaccard similarity to exclude documents that are too close to the correct answer. Lowering the threshold makes the exclusion criterion stricter, ensuring only clearly irrelevant documents are used as negatives.
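The Jaccard similarity of two documents is the size of the intersection of their token sets divided by the size of the union. Below is a minimal sketch of the exclusion step; the function names and whitespace tokenization are assumptions, and the platform’s tokenizer may differ:

```python
def jaccard(a_tokens, b_tokens):
    # |A ∩ B| / |A ∪ B| over token sets.
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def exclude_near_duplicates(positive_text, candidates, threshold=0.5):
    # Keep only candidates whose similarity to the correct answer is at
    # or below the threshold; lowering the threshold excludes more
    # near-duplicates, i.e. a stricter criterion.
    pos = positive_text.split()
    return [c for c in candidates if jaccard(pos, c.split()) <= threshold]
```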
Batch Size
Specifies how many samples are processed per update step. Larger batches train faster and use hardware more efficiently but require more memory; smaller batches reduce memory load but can make training slower or less stable. Choose a balanced value based on available GPU/TPU resources and stability needs.
Number of Trainable Layers
Defines how many of the pretrained model’s layers to fine-tune. Training fewer layers preserves the base model’s stability, while training more layers allows deeper adaptation but increases both the risk of overfitting and the training cost.
Number of Epochs
Controls how many times the model iterates over the full dataset. More epochs may improve performance but can also lead to overfitting or longer training times.
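The three parameters above map directly onto standard training-loop concepts. The PyTorch sketch below is illustrative only; the model, its loss interface, and the optimizer settings are assumptions about how such a loop might look, not the platform’s internals:

```python
import torch
from torch.utils.data import DataLoader

def fine_tune(model, dataset, num_trainable_layers=2,
              batch_size=32, num_epochs=3, lr=2e-5):
    # Number of Trainable Layers: freeze everything, then unfreeze
    # only the last N top-level layers of the pretrained model.
    for p in model.parameters():
        p.requires_grad = False
    for layer in list(model.children())[-num_trainable_layers:]:
        for p in layer.parameters():
            p.requires_grad = True

    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=lr)

    for _ in range(num_epochs):       # Number of Epochs: full dataset passes
        for batch in loader:          # Batch Size: samples per update step
            loss = model(batch)       # assumes the model returns a loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```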
Temperature
Adjusts how sharply the model converts similarity scores into probabilities. Lower values concentrate probability more heavily on the top-ranked document, reinforcing clear right-versus-wrong distinctions, but values that are too low may cause unstable training.
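A worked example using standard softmax-with-temperature scaling (whether the platform uses exactly this form is an assumption):

```python
import numpy as np

def softmax_with_temperature(scores, temperature):
    # Divide similarity scores by the temperature, then apply softmax.
    z = np.asarray(scores, dtype=float) / temperature
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = [0.9, 0.7, 0.2]              # similarity scores for three documents
print(softmax_with_temperature(scores, 1.0))  # soft:  ~[0.43, 0.35, 0.21]
print(softmax_with_temperature(scores, 0.1))  # sharp: ~[0.88, 0.12, 0.001]
```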
Hard Example Focus
Controls how the model weights training samples based on prediction difficulty. Values below 1 focus learning on harder samples (those the model finds difficult), while values above 1 focus more on easier samples it already predicts well.
Minimum Training Weight
Prevents extremely low weights (caused by the “hard example focus” parameter) from completely excluding samples from training. Ensures that even difficult samples contribute minimally to learning. A value of 0 disables this safeguard.
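The exact weighting formula is not documented here, but one scheme consistent with the two descriptions above is w = p ** (focus - 1), where p is the model’s predicted probability for the correct document, with the result floored at the minimum training weight. Treat everything below as an illustrative assumption:

```python
import numpy as np

def sample_weight(p_correct, focus=1.0, min_weight=0.0):
    # Assumed form: w = p ** (focus - 1).
    #   focus == 1 -> every sample weighted equally
    #   focus  < 1 -> hard samples (low p) weighted more
    #   focus  > 1 -> easy samples (high p) weighted more
    w = np.power(p_correct, focus - 1.0)
    # Minimum Training Weight: keeps hard samples from being excluded
    # entirely; 0.0 disables the floor.
    return np.maximum(w, min_weight)

p = np.array([0.1, 0.5, 0.9])
print(sample_weight(p, focus=0.5))                  # ~[3.16, 1.41, 1.05]
print(sample_weight(p, focus=2.0))                  # [0.1, 0.5, 0.9]
print(sample_weight(p, focus=2.0, min_weight=0.2))  # [0.2, 0.5, 0.9]
```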