Amazon SageMaker Debugger docs

Amazon SageMaker Debugger allows you to detect anomalies while training your machine learning model by emitting relevant data during training, storing the data, and then analyzing it. Debugger helps you develop and optimize model performance and computation: it provides profiling features for performance optimization that identify computation issues, such as system bottlenecks and underutilization, and help optimize hardware resource utilization at scale. Available deep learning frameworks are Apache MXNet, TensorFlow, PyTorch, and XGBoost. This topic walks you through a high-level overview of the Amazon SageMaker Debugger workflow; refer to the Get Started section of the SageMaker developer guide for setup.

The SageMaker Debugger Rule class configures debugging rules to debug your training job. The debugging rules analyze tensor outputs from your training job and monitor conditions that are critical for its success. The Debugger rules monitor training job status, and a CloudWatch Events rule watches the Debugger rule evaluation status. Amazon CloudWatch collects Amazon SageMaker AI model training job logs and Amazon SageMaker Debugger rule processing job logs; for more information about monitoring training jobs using CloudWatch, see Monitor Amazon SageMaker. The following sections outline the process needed to automate training job termination using CloudWatch and Lambda.

The smdebug client library is an open source library that powers SageMaker Debugger by calling the saved training data from training jobs. Amazon SageMaker Debugger provides functionality to save tensors during training of machine learning jobs and analyze those tensors (see the awslabs/sagemaker-debugger repository). The Amazon SageMaker Debugger Python SDK and its client library smdebug fully support TensorFlow 2.3 with the latest version release.

This section walks you through the Debugger profiling report section by section. You can receive profiling reports autogenerated by Debugger, and you can track the system utilization rates, statistics overview, and built-in rule analysis through the Insights dashboard. When you close a SageMaker Debugger Insights tab, the corresponding kernel session also shuts down.

SageMaker Experiments automatically tracks the inputs, parameters, and configurations of your training jobs, and you can filter the list of experiments by entity name, type, and tags. TensorBoard can be accessed in SageMaker AI either programmatically through the sagemaker.interactive_apps.tensorboard module or through the TensorBoard landing page in the SageMaker console, and it automatically finds and displays all training job output data in a compatible format.

When you write a PyTorch training script, it is recommended to use the torch.nn modules instead of the torch.nn.functional API operations, because SageMaker Debugger cannot collect model output tensors from torch.nn.functional operations.

You can use Shapley values to determine the contribution that each feature made to model predictions. These attributions can be provided for specific predictions and at a global level for the model as a whole.

Find more information and references about using Amazon SageMaker Debugger in the following topics. For a tutorial on what you can do after creating the trial and how to visualize the results, see SageMaker Debugger - Visualizing Debugging Results. Despite the SDK providing a simplified workflow, you might encounter various exceptions or errors.
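To make the Rule workflow above concrete, here is a minimal sketch (not taken from the original docs) of attaching a built-in rule to an estimator with the SageMaker Python SDK; the training script name, IAM role, and instance settings are placeholder assumptions.

```python
# Hypothetical sketch: attach the LossNotDecreasing built-in rule to a training job.
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import Rule, rule_configs

rule = Rule.sagemaker(rule_configs.loss_not_decreasing())

estimator = TensorFlow(
    entry_point="train.py",  # assumed training script
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.3",
    py_version="py37",
    rules=[rule],  # Debugger starts a rule evaluation job alongside training
)
estimator.fit()
```

While the job runs, the rule job evaluates the saved loss tensors and emits an evaluation status that the CloudWatch Events and Lambda integration mentioned above can react to.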
Use SageMaker Debugger to create output tensor files that are compatible with TensorBoard. smdebug retrieves and filters the tensors generated from Debugger, such as gradients, weights, and biases. You can conduct an online or offline analysis by loading collected output tensors from S3 buckets paired with training jobs, during or after training. Amazon SageMaker Debugger provides transparent visibility into training jobs and saves training metrics into your Amazon S3 bucket.

Amazon SageMaker Debugger offers the capability to debug machine learning and deep learning models during training by identifying and detecting problems with the models in real time. Debugger's debugging functionality for model optimization is about analyzing non-converging training issues. You can use the Debugger built-in rules, provided by Amazon SageMaker Debugger, to analyze metrics and tensors collected while training your models. The following lists the Debugger rules, including information and an example of how to configure and deploy each built-in rule. In the following sections, notebooks and code samples show how to use Debugger rules to monitor SageMaker training jobs. Configure Debugger with Amazon CloudWatch Events and AWS Lambda to take action based on Debugger rule evaluation status.

The following videos provide a tour of Amazon SageMaker Debugger capabilities using SageMaker Studio and SageMaker AI notebook instances. Use the Amazon SageMaker Debugger dashboard in Amazon SageMaker Studio Classic to analyze the computational performance of your training job on Amazon EC2 instances. Studio Classic lets you build, train, debug, deploy, and monitor your ML models.

Configuring SageMaker Debugger: regardless of which of the two ways above you used to enable SageMaker Debugger, you can configure it using the SageMaker Python SDK. You can configure an estimator with parameters for basic profiling using the Amazon SageMaker Debugger Python modules: configure SageMaker Debugger profiling, monitor resource utilization metrics, configure and activate built-in profiler rules, adjust the basic profiling configuration, configure framework profiling, and update the profiling configuration.

To use Debugger with customized containers, you need to make a minimal change to your training script to implement the Debugger hook callback and retrieve tensors from training jobs. Save tensors using Debugger built-in collections: you can use built-in collections of tensors through the CollectionConfig API and save them using the DebuggerHookConfig API. The following example shows how to use the default settings of Debugger hook configurations to construct a SageMaker AI TensorFlow estimator.
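The original example code is not reproduced in this compilation, so the following is a minimal sketch, under assumed names, of what such a configuration looks like: a TensorFlow estimator that saves the built-in weights and gradients collections through DebuggerHookConfig and CollectionConfig. The bucket, role, script, and save_interval values are placeholders.

```python
# Hypothetical sketch: save built-in tensor collections with the Debugger hook.
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import DebuggerHookConfig, CollectionConfig

hook_config = DebuggerHookConfig(
    s3_output_path="s3://amzn-s3-demo-bucket/debugger-output",  # placeholder bucket
    collection_configs=[
        CollectionConfig(name="weights"),
        CollectionConfig(name="gradients", parameters={"save_interval": "50"}),
    ],
)

estimator = TensorFlow(
    entry_point="train.py",  # assumed training script
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="2.3",
    py_version="py37",
    debugger_hook_config=hook_config,
)
```

Calling estimator.fit() then uploads the configured collections to the S3 output path, where they can be analyzed during or after training.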
The Amazon SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. Amazon SageMaker AI is a fully managed machine learning service.

While constructing a SageMaker AI estimator, activate SageMaker Debugger by specifying the debugger_hook_config parameter. Configuring the hook using the SageMaker Python SDK: after you make the minimal changes to your training script, you can configure the hook with parameters to the SageMaker Debugger API operation, DebuggerHookConfig. As shown in the example code above, add the built-in tensor collections you want to debug. This SageMaker Debugger module provides high-level methods to set up Debugger configurations to monitor, profile, and debug your training job.

Learn how to create an estimator for default system monitoring and customized framework profiling with different profiling options using Debugger. The profiling report is generated based on the built-in rules for monitoring and profiling. The Debugger reports provide insights into your training jobs and suggest recommendations to improve your model performance. Receive training reports autogenerated by Debugger.

SageMaker Experiments enables you to call the training information as trials through SageMaker Studio and supports visualization of the training job. Amazon SageMaker Studio Classic is a web-based integrated development environment (IDE) for machine learning (ML); Studio Classic includes all of the tools you need to take your models from data preparation to experimentation to production with increased productivity. Each SageMaker Debugger Insights tab runs one Studio Classic kernel session. You can also use the SageMaker AI console UI to open the TensorBoard application.

Amazon SageMaker Model Monitor automatically monitors machine learning (ML) models in production and notifies you when quality issues happen.

The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker XGBoost algorithm. The rule_parameters argument adjusts the default key values of the built-in rules listed in the List of Debugger Built-in Rules. The following procedure shows how to access the related CloudWatch logs.

The following estimator class methods are useful for accessing your SageMaker training job information and retrieving output paths of training data collected by Debugger.
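For example, a fitted estimator exposes the S3 path of its Debugger artifacts, which can be handed to the smdebug trial API for analysis. This is a minimal sketch assuming a fitted estimator named `estimator` with Debugger enabled and a saved tensor named "loss"; both names are illustrative.

```python
# Hypothetical sketch: load tensors saved by Debugger for offline analysis.
from smdebug.trials import create_trial

# `estimator` is assumed to be a fitted SageMaker estimator with Debugger enabled.
tensors_path = estimator.latest_job_debugger_artifacts_path()
trial = create_trial(tensors_path)

print(trial.tensor_names())   # every tensor Debugger saved for this job
loss = trial.tensor("loss")   # assumes a tensor named "loss" was saved
for step in loss.steps():
    print(step, loss.value(step))  # step-by-step loss values
```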
Amazon SageMaker Debugger's built-in rules analyze tensors emitted during the training of a model. Debugger provides automatic detection of training problems through its built-in rules, and you can find a full list of the built-in rules for debugging in the List of Debugger Built-in Rules. Debug training jobs in real time, detect non-converging conditions, and improve model performance using Amazon SageMaker Debugger. Amazon SageMaker AI provides two debugging tools to help identify convergence issues and gain visibility into your models. Amazon SageMaker Debugger comes with a client library called the sagemaker-debugger Python SDK.

Build a Custom Training Container and Debug Training Jobs with Amazon SageMaker Debugger: Amazon SageMaker Debugger enables you to debug your model through its built-in rules and tools (the smdebug hook and core features) to store and retrieve output tensors in Amazon Simple Storage Service (S3). Using Amazon SageMaker Debugger with your own PyTorch container: Amazon SageMaker is a managed platform to build, train, and host machine learning models. (Optional) Install the SageMaker and SMDebug Python SDKs: to use the Debugger profiling features released in December 2020, ensure that you have the latest versions of the SageMaker and SMDebug SDKs installed.

DebugHookConfig: configuration information for the Amazon SageMaker Debugger hook parameters, metric and tensor collections, and storage paths.

The current release of SageMaker XGBoost is based on the original XGBoost versions 1.0, 1.2, 1.3, and 1.5. The following screenshot shows the full view of the SageMaker AI Data Manager tab in the TensorBoard application. The Amazon SageMaker Studio user interface is split into three distinct parts; this page gives information about the distinct parts and their components. Model Monitor uses rules to detect drift in your models and alerts you when it happens.

To use Amazon SageMaker Experiments with training jobs run on SageMaker or SageMaker Autopilot, all you need to do is add a parameter to the Estimator that defines which experiment the trial is associated with. class sagemaker.experiments.Run(experiment_name, run_name=None, experiment_display_name=None, run_display_name=None, tags=None, sagemaker_session=None, artifact_bucket=None, artifact_prefix=None): a collection of parameters, metrics, and artifacts to create an ML model; constructs a Run instance.

Amazon SageMaker Training Compiler is a feature of SageMaker Training that speeds up training jobs by optimizing model execution. class sagemaker.huggingface.TrainingCompilerConfig(enabled=True, debug=False), bases: TrainingCompilerConfig: the SageMaker Training Compiler configuration class. This class initializes a TrainingCompilerConfig instance.

Amazon SageMaker Debugger built-in rules can be configured for a training job using the DebugHookConfig, DebugRuleConfiguration, ProfilerConfig, and ProfilerRuleConfiguration objects through the SageMaker CreateTrainingJob API operation, or using the create_training_job() function of the AWS Boto3 SageMaker AI client. You need to specify the right image URI in the RuleEvaluatorImage parameter; the following example walks you through how to set up the request body for the create_training_job() function.
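The original request-body example is not included here, so the following is a minimal sketch of such a call. The job name, role, bucket, and image URIs are placeholder assumptions; both the training image and the Debugger rule evaluator image are Region-specific, so verify them against the AWS documentation before use.

```python
# Hypothetical sketch: configure a Debugger built-in rule via the low-level
# Boto3 create_training_job() call.
import boto3

sm = boto3.client("sagemaker", region_name="us-west-2")

sm.create_training_job(
    TrainingJobName="debugger-boto3-demo",
    RoleArn="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    AlgorithmSpecification={
        # Placeholder framework container; the URI varies by Region and version.
        "TrainingImage": "763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-training:2.3-cpu-py37",
        "TrainingInputMode": "File",
    },
    OutputDataConfig={"S3OutputPath": "s3://amzn-s3-demo-bucket/output"},
    ResourceConfig={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 30},
    StoppingCondition={"MaxRuntimeInSeconds": 86400},
    DebugHookConfig={
        "S3OutputPath": "s3://amzn-s3-demo-bucket/debugger",
        "CollectionConfigurations": [
            {"CollectionName": "losses", "CollectionParameters": {"save_interval": "50"}},
        ],
    },
    DebugRuleConfigurations=[
        {
            "RuleConfigurationName": "LossNotDecreasing",
            # Debugger rule evaluator image; the account differs per Region.
            "RuleEvaluatorImage": "895741380848.dkr.ecr.us-west-2.amazonaws.com/sagemaker-debugger-rules:latest",
            "RuleParameters": {"rule_to_invoke": "LossNotDecreasing"},
        },
    ],
)
```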
Amazon SageMaker Debugger tutorials: the following topics walk you through tutorials from the basics to advanced use cases of monitoring, profiling, and debugging SageMaker training jobs using Debugger. Explore the Debugger features and learn how you can debug and improve your machine learning models efficiently by using Debugger.

The preceding topics focus on using Debugger through the Amazon SageMaker Python SDK, which is a wrapper around the AWS SDK for Python (Boto3) and the SageMaker API operations. If you need to, you can instead configure the SageMaker API operations manually using AWS Boto3 or the AWS Command Line Interface (CLI). SageMaker AI Debugger offers the Rule API operation that monitors training job progress and errors for the success of training your model. To learn more about how to configure the DebugHookConfig parameter, see Use the SageMaker and Debugger Configuration API Operations to Create, Update, and Debug Your Training Job.

You can track and debug model parameters, such as weights, gradients, biases, and scalar values of your training job. Batch size: in distributed training, as more nodes are added, batch sizes should increase proportionally.

The SageMaker Debugger Insights dashboard runs a Studio Classic app on an ml.m5.4xlarge instance to process and render the visualizations; multiple kernel sessions for multiple SageMaker Debugger Insights tabs run on that single instance. This guide walks you through the content of the SageMaker Debugger Insights dashboard tab by tab, starting with System Metrics.

Amazon SageMaker Clarify provides tools to help explain how machine learning (ML) models make predictions.

Load the files to visualize in TensorBoard and analyze your SageMaker training jobs. Make sure you determine which output tensors and scalars to collect, and modify code lines in your training script using any of the following tools: TensorBoardX, TensorFlow Summary Writer, PyTorch Summary Writer, or SageMaker Debugger. The sagemaker-debugger Python SDK provides tools for adapting your training script before training and analysis tools after training.
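To make the script adaptation concrete, here is a minimal sketch of a PyTorch training script with the smdebug hook registered. In SageMaker-managed framework containers this registration happens automatically (the Zero Script Change experience); the explicit hook is only needed for customized containers, and the model, data, and hyperparameters below are illustrative.

```python
# Hypothetical sketch: register the smdebug hook in a custom PyTorch script.
import torch
import torch.nn as nn  # use torch.nn modules so Debugger can collect output tensors
import smdebug.pytorch as smd

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Build the hook from the JSON config that SageMaker places in the container,
# then register the model and loss module so their tensors are saved.
hook = smd.Hook.create_from_json_file()
hook.register_module(model)
hook.register_loss(loss_fn)

for step in range(100):
    x, y = torch.rand(32, 10), torch.rand(32, 1)  # dummy batch for illustration
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```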
From training jobs, Debugger allows you to run your own training script (the Zero Script Change experience) using the Debugger built-in features, Hook and Rule, to capture tensors; it gives you the flexibility to build customized Hooks and Rules for configuring tensors as you want, and makes the tensors available for analysis. The base_config argument is where you call the built-in rule methods. Use the CollectionConfig API operation to configure tensor collections. To turn off debugging, set the debugger_hook_config parameter to False.

Train a model using the input training dataset: the fit call invokes the Amazon SageMaker CreateTrainingJob API to start model training, using the configuration you provided to create the estimator and the specified input training data to send the CreateTrainingJob request to Amazon SageMaker. This is a synchronous operation.

The AWS CLI, SageMaker AI Estimator API, and the Debugger APIs enable you to use any Docker base images to build and customize containers to train your models. To enable remote debugging for your training job, SageMaker AI needs to start the SSM agent in the training container when the training job starts, so provide an IAM role with SSM permissions.

Download the SageMaker Debugger profiling report while your training job is running or after the job has finished, using the Amazon SageMaker Python SDK and AWS Command Line Interface (CLI). The report shows result plots only for the rules that found issues.

Amazon SageMaker Model Monitor continuously monitors the quality of machine learning models in production; to learn about SageMaker Model Monitor, see Data and model quality monitoring with Amazon SageMaker Model Monitor. Model Monitor is integrated with SageMaker Clarify to improve visibility into potential bias. This notebook shows how to:

* Host a machine learning model in Amazon SageMaker and capture inference requests, results, and metadata
* Schedule a Clarify bias monitor to monitor predictions for bias drift on a regular basis
* Schedule a Clarify explainability monitor to monitor predictions for feature attribution drift on a regular basis

Amazon SageMaker Model Card is integrated with SageMaker Model Registry; if you're registering a model within Model Registry, you can use the integration to add auditing information. For more information, see Update the Details of a Model Version.

The SageMaker Python SDK also provides an XGBoost estimator that executes a training script in a managed XGBoost environment, and this section walks you through the Debugger XGBoost training report. SageMaker is composed of many services named in the form "SageMaker X", and accurately understanding each service's features and use cases is the key to passing the certification exam.

You can use the training and Debugger rule job status in the CloudWatch logs to take further actions when there are training issues. If you want access to the hook to configure certain things that cannot be configured through the SageMaker SDK, you can retrieve the hook as follows.
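The original retrieval snippet is not included in this compilation, so this is a minimal sketch using smdebug's get_hook helper, shown here for a TensorFlow Keras script; the scalar name and value are illustrative assumptions.

```python
# Hypothetical sketch: retrieve the hook inside the training script to configure
# things the SageMaker SDK does not expose.
import smdebug.tensorflow as smd

# Retrieve (or create) the Keras hook from the SageMaker-provided JSON config.
hook = smd.get_hook(hook_type="keras", create_if_not_exists=True)
if hook is not None:
    # Example: save an extra scalar not covered by the collections configuration.
    hook.save_scalar("my_custom_scalar", 0.42)  # illustrative name and value
```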
In this tutorial, you will learn how to use SageMaker Debugger and its built-in rules to debug your model. Amazon SageMaker Debugger automates the debugging process of machine learning training jobs. On this page, you'll learn how to adapt your training script using the client library. Debugger provides pre-built tensor collections that cover a variety of regular expressions (regex) for parameters when you use Debugger-supported deep learning frameworks and machine learning algorithms. The collections_to_save argument takes in a tensor configuration through the CollectionConfig API, which requires name and parameters arguments. For any hook configuration you customize for saving output tensors, Debugger has the flexibility to create scalar summaries for TensorBoard. Also make sure that you specify the TensorBoard data output path as the log directory (log_dir) for the callback in the training container. To activate or update the Debugger monitoring configuration for a training job that is currently running, use the SageMaker AI estimator extension methods.

Clarify's explainability tools can help ML modelers, developers, and other internal stakeholders understand model characteristics as a whole prior to deployment and debug predictions provided by the model after it's deployed.

With Amazon SageMaker AI, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production-ready hosted environment. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don't have to manage servers. Use TensorBoard within Amazon SageMaker AI to debug and analyze your machine learning model and the training job of the model. Amazon SageMaker Studio Classic provides an experiments browser that you can use to view lists of experiments and runs; you can choose one of these entities to view detailed information about it, or choose multiple entities for comparison. This section provides guidance on managing SageMaker HyperPod through the SageMaker AI console UI or the AWS Command Line Interface (CLI); you'll learn how to perform various tasks related to SageMaker HyperPod, whether you prefer a visual interface or working with commands.

This notebook will walk you through creating a TensorFlow training job with the SageMaker Debugger profiling feature enabled; it creates a multi-GPU, multi-node training run using Horovod. Another notebook demonstrates how you can use SageMaker Debugger and SageMaker Experiments to perform iterative model pruning; let's start first with a quick introduction to model pruning. To run these notebooks, you will need a SageMaker Notebook Instance or SageMaker Studio. To learn more about the programming model for analysis using the SageMaker Debugger SDK, see SageMaker Debugger Analysis.

For SageMaker AI XGBoost training jobs, use the Debugger CreateXgboostReport rule to receive a comprehensive training report of the training progress and results. Following this guide, specify the CreateXgboostReport rule when you construct the estimator.
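Here is a minimal sketch of that configuration, with placeholder script, role, and instance settings; the framework version shown is one of the supported XGBoost releases listed above.

```python
# Hypothetical sketch: request the Debugger XGBoost training report via the
# CreateXgboostReport built-in rule.
from sagemaker.xgboost import XGBoost
from sagemaker.debugger import Rule, rule_configs

estimator = XGBoost(
    entry_point="train_xgb.py",  # assumed training script
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="1.5-1",
    rules=[Rule.sagemaker(rule_configs.create_xgboost_report())],
)
```

After the job finishes, Debugger writes the XGBoost training report to the job's rule output path in S3, where you can download it alongside the profiling report.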
The report is automatically aggregated depending on the output tensor regex, recognizing whether your training job is binary classification, multiclass classification, or regression. The following screenshot shows a collage of the Debugger profiling report. To learn more, see the SageMaker Debugger interactive report. SageMaker Clarify provides feature attributions based on the concept of the Shapley value.

SageMaker Canvas is an AutoML service that gives people with no coding experience the ability to build models and make predictions with them.

Configure the Debugger-specific parameters when constructing a SageMaker estimator to gain visibility and insights into your training job; for more information about the Debugger-specific parameters, see SageMaker AI Estimator in the Amazon SageMaker Python SDK. This offers a high-level experience of accessing the Amazon SageMaker API operations. Warning: if you disable the Debugger hook, you won't be able to view the comprehensive Studio Debugger Insights dashboard and the autogenerated profiling report.

Debugger automatically generates output tensor files that are compatible with TensorBoard. Amazon SageMaker AI with TensorBoard: to offer greater compatibility with open-source community tools within the SageMaker AI Training platform, SageMaker AI hosts TensorBoard as an application in the SageMaker AI domain. When you open the TensorBoard application, TensorBoard opens with the SageMaker AI Data Manager tab.

Examples on how to use SageMaker Debugger: this site is based on the SageMaker Examples repository on GitHub. This troubleshooting guide aims to help you understand and resolve common issues that might arise when working with the SageMaker Python SDK. To learn more about Debugger, see Amazon SageMaker Debugger.

The following topics show how to use the CollectionConfig and DebuggerHookConfig API operations, followed by examples of how to use the Debugger hook to save, access, and visualize output tensors. In the following topics, you'll learn how to use the SageMaker Debugger built-in rules. If you want to adjust the built-in rule parameter values and customize the tensor collection regex, configure the base_config and rule_parameters parameters for the ProfilerRule.sagemaker and Rule.sagemaker classmethods.
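A minimal sketch of that customization for a debugging rule follows; the rule choice, parameter value, and save interval are illustrative, and the same base_config pattern applies to ProfilerRule.sagemaker.

```python
# Hypothetical sketch: customize a built-in rule with base_config,
# rule_parameters, and collections_to_save.
from sagemaker.debugger import Rule, rule_configs, CollectionConfig

rules = [
    Rule.sagemaker(
        base_config=rule_configs.loss_not_decreasing(),  # built-in rule method
        rule_parameters={"num_steps": "100"},            # override a default key value
        collections_to_save=[
            CollectionConfig(name="losses", parameters={"save_interval": "50"}),
        ],
    )
]
```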
To see an example of using Debugger in a SageMaker training job, you can reference one of the notebook examples in the SageMaker Notebook Examples GitHub repository. This site highlights example Jupyter notebooks for a variety of machine learning use cases that you can run in SageMaker.

In addition, you can configure alerts so you can troubleshoot violations as they arise and promptly initiate retraining. The following figure shows how this process works in the case that your model is deployed to a real-time endpoint.

You are encouraged to configure the hook from the SageMaker Python SDK, so you can run different jobs with different configurations without having to modify your script. You can use the SageMaker Python SDK to interact with Amazon SageMaker AI within your Python scripts or Jupyter notebooks, and you can use the same approach for MXNet, PyTorch, and XGBoost estimators.

Code Editor extends Studio so that you can write, test, debug, and run your analytics and machine learning code in an environment based on Visual Studio Code - Open Source ("Code-OSS").

When you initiate a SageMaker training job, SageMaker Debugger starts monitoring the resource utilization of the Amazon EC2 instances by default.
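To go beyond the default system monitoring, you can pass a profiler configuration when constructing the estimator. The following is a minimal sketch with illustrative values; the script, role, and instance settings are placeholder assumptions.

```python
# Hypothetical sketch: adjust the system monitoring interval and enable
# framework profiling for a window of training steps.
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import ProfilerConfig, FrameworkProfile

profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500,  # system metrics sampling interval
    framework_profile_params=FrameworkProfile(start_step=5, num_steps=10),
)

estimator = TensorFlow(
    entry_point="train.py",  # assumed training script
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.3",
    py_version="py37",
    profiler_config=profiler_config,
)
```

After the job runs, the profiling report described earlier can be downloaded from the job's output path with the SageMaker Python SDK or the AWS CLI.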