{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(lightning_advanced_example)=\n",
    "\n",
    "# Finetune a BERT Text Classifier with LightningTrainer\n",
    "\n",
    ":::{note}\n",
    "\n",
    "This is an advanced example for {class}`LightningTrainer <ray.train.lightning.LightningTrainer>` that demonstrates how to use LightningTrainer with {ref}`Dataset <data>` and {ref}`Batch Predictor <air-predictors>`.\n",
    "\n",
    "If you just want to quickly convert your existing PyTorch Lightning scripts to Ray AIR, refer to this starter example:\n",
    "{ref}`Train a PyTorch Lightning Image Classifier <lightning_mnist_example>`.\n",
    "\n",
    ":::\n",
    "\n",
    "In this demo, we will finetune a pretrained BERT model as a text classifier on the [CoLA (Corpus of Linguistic Acceptability)](https://nyu-mll.github.io/CoLA/) dataset.\n",
    "In particular, we will:\n",
    "- Create Ray Datasets from the original CoLA dataset.\n",
    "- Define a preprocessor to tokenize the sentences.\n",
    "- Finetune a BERT model using LightningTrainer.\n",
    "- Construct a BatchPredictor with the checkpoint and preprocessor.\n",
    "- Run batch prediction on multiple GPUs and evaluate the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "SMOKE_TEST = True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the following line in order to install all the necessary dependencies:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install numpy datasets \"transformers>=4.19.1\" \"pytorch_lightning>=1.6.5\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's start by importing the needed libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ray\n",
    "import torch\n",
    "import pytorch_lightning as pl\n",
    "import torch.nn.functional as F\n",
    "from torch.utils.data import DataLoader, random_split\n",
    "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
    "from datasets import load_dataset, load_metric\n",
    "import numpy as np"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-process CoLA Dataset\n",
    "\n",
    "CoLA is a binary sentence classification task with about 10.6K sentences in total (8,551 of them in the training split). First, we download the dataset and metric using the Hugging Face API, and create a Ray Dataset for each split."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = load_dataset(\"glue\", \"cola\")\n",
    "metric = load_metric(\"glue\", \"cola\")\n",
    "\n",
    "ray_datasets = ray.data.from_huggingface(dataset)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, define a preprocessor that tokenizes the input sentences with the `bert-base-cased` tokenizer and pads (or truncates) each ID sequence to length 128. The preprocessor is applied to all datasets that we later provide to the LightningTrainer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ray.data.preprocessors import BatchMapper\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"bert-base-cased\")\n",
    "\n",
    "\n",
    "def tokenize_sentence(batch):\n",
    "    # Tokenize each sentence, truncating or padding it to exactly 128 tokens.\n",
    "    encoded_sent = tokenizer(\n",
    "        batch[\"sentence\"].tolist(),\n",
    "        max_length=128,\n",
    "        truncation=True,\n",
    "        padding=\"max_length\",\n",
    "        return_tensors=\"pt\",\n",
    "    )\n",
    "    # Store model inputs and labels as NumPy arrays for Ray Data.\n",
    "    batch[\"input_ids\"] = encoded_sent[\"input_ids\"].numpy()\n",
    "    batch[\"attention_mask\"] = encoded_sent[\"attention_mask\"].numpy()\n",
    "    batch[\"label\"] = np.array(batch[\"label\"])\n",
    "    # Drop the raw text column, which the model does not consume.\n",
    "    batch.pop(\"sentence\")\n",
    "    return batch\n",
    "\n",
    "\n",
    "preprocessor = BatchMapper(tokenize_sentence, batch_format=\"numpy\")"
   ]
  },
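  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the `truncation=True` and `padding=\"max_length\"` settings concrete, here is a toy sketch of what `tokenize_sentence` does to a batch. It assumes a hypothetical whitespace tokenizer with a vocabulary built on the fly (not BERT's WordPiece tokenizer) and a small `max_length` of 8 instead of 128, so the IDs themselves are meaningless; only the shapes and the attention-mask pattern carry over.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "PAD_ID = 0  # hypothetical padding ID for this toy tokenizer\n",
    "\n",
    "\n",
    "def toy_tokenize(sentences, max_length=8):\n",
    "    # Hypothetical stand-in for the Hugging Face tokenizer call above:\n",
    "    # split on whitespace, truncate to max_length, right-pad with PAD_ID.\n",
    "    vocab = {}\n",
    "    input_ids, attention_mask = [], []\n",
    "    for sentence in sentences:\n",
    "        ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in sentence.split()]\n",
    "        ids = ids[:max_length]  # truncation=True\n",
    "        n_pad = max_length - len(ids)\n",
    "        input_ids.append(ids + [PAD_ID] * n_pad)  # padding='max_length'\n",
    "        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding\n",
    "    return {'input_ids': np.array(input_ids), 'attention_mask': np.array(attention_mask)}\n",
    "\n",
    "\n",
    "def toy_tokenize_sentence(batch):\n",
    "    # Same batch-level transformation as tokenize_sentence, using the toy tokenizer.\n",
    "    encoded = toy_tokenize(batch['sentence'].tolist())\n",
    "    batch['input_ids'] = encoded['input_ids']\n",
    "    batch['attention_mask'] = encoded['attention_mask']\n",
    "    batch['label'] = np.array(batch['label'])\n",
    "    batch.pop('sentence')\n",
    "    return batch\n",
    "\n",
    "\n",
    "out = toy_tokenize_sentence(\n",
    "    {'sentence': np.array(['The cat sat .', 'a b c d e f g h i j']), 'label': [1, 0]}\n",
    ")\n",
    "print(sorted(out))  # ['attention_mask', 'input_ids', 'label']\n",
    "print(out['input_ids'].shape)  # (2, 8): every row has the same length\n",
    "print(out['attention_mask'][0])  # [1 1 1 1 0 0 0 0]\n",
    "```\n",
    "\n",
    "The second sentence has 10 tokens and is cut to 8, while the first is padded from 4 to 8 tokens; the attention mask tells BERT to ignore the padded positions."
   ]
  },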
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define a PyTorch Lightning Model\n",
    "\n",
    "You don't need to make any changes to your `LightningModule` definition. Just copy and paste your code here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "class SentimentModel(pl.LightningModule):\n",
    "    def __init__(self, lr=2e-5, eps=1e-8):\n",
    "        super().__init__()\n",
    "        self.lr = lr\n",
    "        self.eps = eps\n",
    "        self.num_classes = 2\n",
    "        self.model = AutoModelForSequenceClassification.from_pretrained(\n",
    "            \"bert-base-cased\", num_labels=self.num_classes\n",
    "        )\n",
    "        self.metric = load_metric(\"glue\", \"cola\")\n",
    "        self.predictions = []\n",
    "        self.references = []\n",
    "\n",
    "    def forward(self, batch):\n",
    "        input_ids, attention_mask = batch[\"input_ids\"], batch[\"attention_mask\"]\n",
    "        outputs = self.model(input_ids, attention_mask=attention_mask)\n",
    "        logits = outputs.logits\n",
    "        return logits\n",
    "\n",
    "    def training_step(self, batch, batch_idx):\n",
    "        labels = batch[\"label\"]\n",
    "        logits = self.forward(batch)\n",
    "        loss = F.cross_entropy(logits.view(-1, self.num_classes), labels)\n",
    "        self.log(\"train_loss\", loss)\n",
    "        return loss\n",
    "\n",
    "    def validation_step(self, batch, batch_idx):\n",
    "        labels = batch[\"label\"]\n",
    "        logits = self.forward(batch)\n",
    "        preds = torch.argmax(logits, dim=1)\n",
    "        self.predictions.append(preds)\n",
    "        self.references.append(labels)\n",
    "\n",
    "    def on_validation_epoch_end(self):\n",
    "        predictions = torch.concat(self.predictions).view(-1)\n",
    "        references = torch.concat(self.references).view(-1)\n",
    "        matthews_correlation = self.metric.compute(\n",
    "            predictions=predictions, references=references\n",
    "        )\n",
    "\n",
    "        # self.metric.compute() returns a dictionary:\n",
    "        # e.g. {\"matthews_correlation\": 0.53}\n",
    "        self.log_dict(matthews_correlation, sync_dist=True)\n",
    "        self.predictions.clear()\n",
    "        self.references.clear()\n",
    "\n",
    "    def configure_optimizers(self):\n",
    "        return torch.optim.AdamW(self.parameters(), lr=self.lr, eps=self.eps)"
   ]
  },
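  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`on_validation_epoch_end` logs the Matthews correlation coefficient (MCC) returned by the GLUE CoLA metric. For binary labels, MCC is computed from the confusion-matrix counts as `(TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))`; it ranges from -1 to 1, with 0 meaning no better than chance. The pure-Python sketch below is for intuition only and is not the `datasets` implementation (which, as far as we know, delegates to scikit-learn). It also illustrates a caveat: each worker computes the MCC on its own validation shard, and `sync_dist=True` then reduces those values across workers (by averaging, Lightning's default), which in general differs from the MCC over the full validation set. The two shards below are made-up toy data.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def matthews_corrcoef(preds, labels):\n",
    "    # Binary MCC from confusion-matrix counts (predictions and labels are 0 or 1).\n",
    "    pairs = list(zip(preds, labels))\n",
    "    tp = sum(1 for p, y in pairs if p == 1 and y == 1)\n",
    "    tn = sum(1 for p, y in pairs if p == 0 and y == 0)\n",
    "    fp = sum(1 for p, y in pairs if p == 1 and y == 0)\n",
    "    fn = sum(1 for p, y in pairs if p == 0 and y == 1)\n",
    "    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))\n",
    "    # Common convention: return 0.0 when a row or column of the confusion matrix is empty.\n",
    "    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom\n",
    "\n",
    "\n",
    "# Hypothetical (predictions, labels) on two workers' validation shards\n",
    "shards = [([1, 1, 0, 0], [1, 0, 0, 0]), ([1, 1, 1, 0], [1, 1, 0, 1])]\n",
    "per_shard = [matthews_corrcoef(p, y) for p, y in shards]\n",
    "mean_of_shards = sum(per_shard) / len(per_shard)\n",
    "\n",
    "all_preds = [p for preds, _ in shards for p in preds]\n",
    "all_labels = [y for _, labels in shards for y in labels]\n",
    "global_mcc = matthews_corrcoef(all_preds, all_labels)\n",
    "\n",
    "print(round(mean_of_shards, 3))  # 0.122 (average of per-shard MCCs)\n",
    "print(round(global_mcc, 3))  # 0.258 (MCC over all predictions)\n",
    "```\n",
    "\n",
    "If you need the exact MCC over the whole validation set, one option is to gather predictions and labels from all workers (for example with `LightningModule.all_gather`) before calling `self.metric.compute()`; the model above does not do this."
   ]
  },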
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configure your LightningTrainer\n",
    "\n",
    "Define a LightningTrainer with the necessary configurations, including hyperparameters, checkpointing, and compute resource settings.\n",
    "\n",
    "You may find the API of {class}`LightningConfigBuilder <ray.train.lightning.LightningConfigBuilder>` and the discussion {ref}`here <lightning-config-builder-intro>` useful.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ray.train.lightning import LightningTrainer, LightningConfigBuilder\n",
    "from ray.air.config import RunConfig, ScalingConfig, CheckpointConfig\n",
    "\n",
    "# Define the configs for LightningTrainer\n",
    "lightning_config = (\n",
    "    LightningConfigBuilder()\n",
    "    .module(cls=SentimentModel, lr=1e-5, eps=1e-8)\n",
    "    .trainer(max_epochs=5, accelerator=\"gpu\")\n",
    "    .checkpointing(save_on_train_epoch_end=False)\n",
    "    .build()\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ":::{note}\n",
    "The `lightning_config` is created on the head node and later passed to the worker nodes. Be aware that environment variables and hardware settings may differ between the head node and the worker nodes.\n",
    ":::\n",
    "\n",
    ":::{note}\n",
    "{meth}`LightningConfigBuilder.checkpointing() <ray.train.lightning.LightningConfigBuilder.checkpointing>` creates a [ModelCheckpoint](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.ModelCheckpoint.html#lightning.pytorch.callbacks.ModelCheckpoint) callback. This callback defines the checkpoint frequency and saves checkpoint files in Lightning style. \n",
    "\n",
    "If you want to save AIR checkpoints for Batch Prediction, please also provide an AIR {class}`CheckpointConfig <ray.air.config.CheckpointConfig>`.\n",
    ":::"
   ]
  },
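  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `CheckpointConfig` in the next cell keeps at most `num_to_keep=2` AIR checkpoints, ranked by the reported `matthews_correlation`, where higher is better (`checkpoint_score_order=\"max\"`). As a rough mental model, the retention policy behaves like the pure-Python sketch below. This is an illustration under assumptions, not Ray's implementation (the real checkpoint manager may, for example, treat ties or the most recent checkpoint differently), and the per-epoch scores are made up.\n",
    "\n",
    "```python\n",
    "import heapq\n",
    "\n",
    "\n",
    "def keep_best(scores, num_to_keep=2, order='max'):\n",
    "    # scores: (epoch, score) pairs in the order they are reported.\n",
    "    # Returns the epochs whose checkpoints would be retained, best first.\n",
    "    sign = 1 if order == 'max' else -1\n",
    "    kept = []  # min-heap keyed on signed score: the worst kept checkpoint is kept[0]\n",
    "    for epoch, score in scores:\n",
    "        heapq.heappush(kept, (sign * score, epoch))\n",
    "        if len(kept) > num_to_keep:\n",
    "            heapq.heappop(kept)  # evict the worst-scoring checkpoint\n",
    "    return [epoch for _, epoch in sorted(kept, reverse=True)]\n",
    "\n",
    "\n",
    "# Hypothetical validation MCC after each of 5 epochs\n",
    "history = [(0, 0.48), (1, 0.55), (2, 0.59), (3, 0.57), (4, 0.58)]\n",
    "print(keep_best(history))  # [2, 4]\n",
    "```\n",
    "\n",
    "A min-heap of size `num_to_keep` makes each update O(log num_to_keep), and the entry at the top of the heap is always the next checkpoint to evict."
   ]
  },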
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Keep the two AIR checkpoints with the highest validation Matthews correlation\n",
    "run_config = RunConfig(\n",
    "    name=\"ptl-sent-classification\",\n",
    "    checkpoint_config=CheckpointConfig(\n",
    "        num_to_keep=2,\n",
    "        checkpoint_score_attribute=\"matthews_correlation\",\n",
    "        checkpoint_score_order=\"max\",\n",
    "    ),\n",
    ")\n",
    "\n",
    "# Scale the DDP training workload across 4 GPUs\n",
    "# You can change this config based on your compute resources.\n",
    "scaling_config = ScalingConfig(\n",
    "    num_workers=4, use_gpu=True, resources_per_worker={\"CPU\": 1, \"GPU\": 1}\n",
    ")"
   ]
  },
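  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With `num_workers=4`, each training worker processes roughly a quarter of the training data per epoch, and the global batch size is the per-worker batch size multiplied by the number of workers. The back-of-the-envelope sketch below assumes the 8,551-example CoLA training split, an even split across workers, and a hypothetical per-worker batch size of 16; the real value is whatever `batch_size` ends up being passed to `iter_torch_batches()` through `datasets_iter_config`, so substitute your own.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def steps_per_epoch(num_rows, num_workers, per_worker_batch_size):\n",
    "    # Assumes rows are split as evenly as possible across workers and that\n",
    "    # each worker's final, partial batch is still processed.\n",
    "    rows_per_worker = math.ceil(num_rows / num_workers)\n",
    "    return math.ceil(rows_per_worker / per_worker_batch_size)\n",
    "\n",
    "\n",
    "NUM_TRAIN_ROWS = 8551  # size of the GLUE CoLA training split\n",
    "NUM_WORKERS = 4  # matches ScalingConfig(num_workers=4) above\n",
    "BATCH_SIZE = 16  # hypothetical per-worker batch size\n",
    "\n",
    "print(NUM_WORKERS * BATCH_SIZE)  # 64 (global batch size)\n",
    "print(steps_per_epoch(NUM_TRAIN_ROWS, NUM_WORKERS, BATCH_SIZE))  # 134\n",
    "```\n",
    "\n",
    "Because DDP averages gradients across workers, changing `num_workers` changes the effective batch size, which can interact with the learning rate (`lr=1e-5` above) when you scale up or down."
   ]
  },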
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "if SMOKE_TEST:\n",
    "    lightning_config = (\n",
    "        LightningConfigBuilder()\n",
    "        .module(cls=SentimentModel, lr=1e-5, eps=1e-8)\n",
    "        .trainer(max_epochs=2, accelerator=\"gpu\")\n",
    "        .checkpointing(save_on_train_epoch_end=False)\n",
    "        .build()\n",
    "    )\n",
    "\n",
    "    for split, ds in ray_datasets.items():\n",
    "        ray_datasets[split] = ds.random_sample(0.1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fine-tune the model with LightningTrainer\n",
    "\n",
    "Train the model with the configuration we specified above. \n",
    "\n",
    "To feed data into LightningTrainer, we need to configure the following arguments:\n",
    "\n",
    "- `datasets`: A dictionary of the input Ray datasets, with special keys \"train\" and \"val\".\n",
    "- `datasets_iter_config`: The keyword arguments for {meth}`iter_torch_batches() <ray.data.Dataset.iter_torch_batches>`, which define how each worker iterates over its dataset shard.\n",
    "- `preprocessor`: The preprocessor that is applied to the input datasets.\n",
    "\n",
    ":::{note}\n",
    "Here we use Ray Data for data ingestion to speed up preprocessing, but you can also continue to use a native PyTorch `DataLoader` or a `LightningDataModule`. See {ref}`this example <lightning_mnist_example>`.\n",
    "\n",
    ":::\n",
    "\n",
    "\n",
    "Now, call `trainer.fit()` to initiate the training process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"tuneStatus\">\n",
       "  <div style=\"display: flex;flex-direction: row\">\n",
       "    <div style=\"display: flex;flex-direction: column;\">\n",
       "      <h3>Tune Status</h3>\n",
       "      <table>\n",
       "<tbody>\n",
       "<tr><td>Current time:</td><td>2023-04-24 10:42:50</td></tr>\n",
       "<tr><td>Running for: </td><td>00:06:26.94        </td></tr>\n",
       "<tr><td>Memory:      </td><td>23.8/186.6 GiB     </td></tr>\n",
       "</tbody>\n",
       "</table>\n",
       "    </div>\n",
       "    <div class=\"vDivider\"></div>\n",
       "    <div class=\"systemInfo\">\n",
       "      <h3>System Info</h3>\n",
       "      Using FIFO scheduling algorithm.<br>Logical resource usage: 0/48 CPUs, 0/4 GPUs (0.0/1.0 accelerator_type:T4)\n",
       "    </div>\n",
       "    \n",
       "  </div>\n",
       "  <div class=\"hDivider\"></div>\n",
       "  <div class=\"trialStatus\">\n",
       "    <h3>Trial Status</h3>\n",
       "    <table>\n",
       "<thead>\n",
       "<tr><th>Trial name                  </th><th>status    </th><th>loc              </th><th style=\"text-align: right;\">  iter</th><th style=\"text-align: right;\">  total time (s)</th><th style=\"text-align: right;\">  train_loss</th><th style=\"text-align: right;\">  matthews_correlation</th><th style=\"text-align: right;\">  epoch</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td>LightningTrainer_87ecf_00000</td><td>TERMINATED</td><td>10.0.60.127:67819</td><td style=\"text-align: right;\">     5</td><td style=\"text-align: right;\">         376.028</td><td style=\"text-align: right;\">   0.0119807</td><td style=\"text-align: right;\">              0.589931</td><td style=\"text-align: right;\">      4</td></tr>\n",
       "</tbody>\n",
       "</table>\n",
       "  </div>\n",
       "</div>\n",
       "<style>\n",
       ".tuneStatus {\n",
       "  color: var(--jp-ui-font-color1);\n",
       "}\n",
       ".tuneStatus .systemInfo {\n",
       "  display: flex;\n",
       "  flex-direction: column;\n",
       "}\n",
       ".tuneStatus td {\n",
       "  white-space: nowrap;\n",
       "}\n",
       ".tuneStatus .trialStatus {\n",
       "  display: flex;\n",
       "  flex-direction: column;\n",
       "}\n",
       ".tuneStatus h3 {\n",
       "  font-weight: bold;\n",
       "}\n",
       ".tuneStatus .hDivider {\n",
       "  border-bottom-width: var(--jp-border-width);\n",
       "  border-bottom-color: var(--jp-border-color0);\n",
       "  border-bottom-style: solid;\n",
       "}\n",
       ".tuneStatus .vDivider {\n",
       "  border-left-width: var(--jp-border-width);\n",
       "  border-left-color: var(--jp-border-color0);\n",
       "  border-left-style: solid;\n",
       "  margin: 0.5em 1em 0.5em 1em;\n",
       "}\n",
       "</style>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "(pid=67819) /home/ray/anaconda3/lib/python3.9/site-packages/xgboost/compat.py:31: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n",
      "(pid=67819)   from pandas import MultiIndex, Int64Index\n",
      "(LightningTrainer pid=67819) 2023-04-24 10:36:31,679\tINFO backend_executor.py:128 -- Starting distributed worker processes: ['68396 (10.0.60.127)', '68397 (10.0.60.127)', '68398 (10.0.60.127)', '68399 (10.0.60.127)']\n",
      "(RayTrainWorker pid=68396) 2023-04-24 10:36:32,731\tINFO config.py:86 -- Setting up process group for: env:// [rank=0, world_size=4]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "(LightningTrainer pid=67819) 2023-04-24 10:36:34,052\tINFO streaming_executor.py:87 -- Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[BatchMapper] -> AllToAllOperator[RandomizeBlockOrder]\n",
      "(LightningTrainer pid=67819) 2023-04-24 10:36:34,052\tINFO streaming_executor.py:88 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=None), locality_with_output=False, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "(LightningTrainer pid=67819) 2023-04-24 10:36:34,053\tINFO streaming_executor.py:90 -- Tip: To enable per-operator progress reporting, set RAY_DATA_VERBOSE_PROGRESS=1.\n",
      "(RayTrainWorker pid=68396) /home/ray/anaconda3/lib/python3.9/site-packages/xgboost/compat.py:31: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n",
      "(RayTrainWorker pid=68396)   from pandas import MultiIndex, Int64Index\n",
      "(RayTrainWorker pid=68399) /home/ray/anaconda3/lib/python3.9/site-packages/xgboost/compat.py:31: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead. [repeated 3x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)\n",
      "(RayTrainWorker pid=68399)   from pandas import MultiIndex, Int64Index [repeated 3x across cluster]\n",
      "(RayTrainWorker pid=68398) Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias']\n",
      "(RayTrainWorker pid=68398) - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "(RayTrainWorker pid=68398) - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "(RayTrainWorker pid=68398) Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
      "(RayTrainWorker pid=68398) You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
      "(RayTrainWorker pid=68396) Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']\n",
      "(RayTrainWorker pid=68396) Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
      "(RayTrainWorker pid=68397) Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']\n",
      "(RayTrainWorker pid=68399) Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight']\n",
      "(RayTrainWorker pid=68398) Missing logger folder: /home/ray/ray_results/ptl-sent-classification/LightningTrainer_87ecf_00000_0_2023-04-24_10-36-23/rank_2/lightning_logs\n",
      "(RayTrainWorker pid=68396) GPU available: True, used: True\n",
      "(RayTrainWorker pid=68396) TPU available: False, using: 0 TPU cores\n",
      "(RayTrainWorker pid=68396) IPU available: False, using: 0 IPUs\n",
      "(RayTrainWorker pid=68396) HPU available: False, using: 0 HPUs\n",
      "(RayTrainWorker pid=68398) LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3]\n",
      "(RayTrainWorker pid=68399) - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). [repeated 3x across cluster]\n",
      "(RayTrainWorker pid=68399) - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). [repeated 3x across cluster]\n",
      "(RayTrainWorker pid=68399) LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3]\n",
      "(RayTrainWorker pid=68399) You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. [repeated 3x across cluster]\n",
      "(RayTrainWorker pid=68397) LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3]\n",
      "(RayTrainWorker pid=68399) Missing logger folder: /home/ray/ray_results/ptl-sent-classification/LightningTrainer_87ecf_00000_0_2023-04-24_10-36-23/rank_3/lightning_logs [repeated 3x across cluster]\n",
      "(RayTrainWorker pid=68396) \n",
      "(RayTrainWorker pid=68396)   | Name  | Type                          | Params\n",
      "(RayTrainWorker pid=68396) --------------------------------------------------------\n",
      "(RayTrainWorker pid=68396) 0 | model | BertForSequenceClassification | 108 M \n",
      "(RayTrainWorker pid=68396) --------------------------------------------------------\n",
      "(RayTrainWorker pid=68396) 108 M     Trainable params\n",
      "(RayTrainWorker pid=68396) 0         Non-trainable params\n",
      "(RayTrainWorker pid=68396) 108 M     Total params\n",
      "(RayTrainWorker pid=68396) 433.247   Total estimated model params size (MB)\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:36:59,628\tINFO streaming_executor.py:87 -- Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[BatchMapper] -> AllToAllOperator[RandomizeBlockOrder]\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:36:59,629\tINFO streaming_executor.py:88 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=None), locality_with_output=False, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:36:59,629\tINFO streaming_executor.py:90 -- Tip: To enable per-operator progress reporting, set RAY_DATA_VERBOSE_PROGRESS=1.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "(RayTrainWorker pid=68396) /home/ray/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.\n",
      "(RayTrainWorker pid=68396)   rank_zero_warn(\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "(RayTrainWorker pid=68396) /home/ray/anaconda3/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.\n",
      "(RayTrainWorker pid=68396)   rank_zero_warn(\n",
      "(RayTrainWorker pid=68399) LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3] [repeated 3x across cluster]\n",
      "(RayTrainWorker pid=68399) 2023-04-24 10:36:59,628\tINFO streaming_executor.py:87 -- Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[BatchMapper] -> AllToAllOperator[RandomizeBlockOrder] [repeated 3x across cluster]\n",
      "(RayTrainWorker pid=68399) 2023-04-24 10:36:59,628\tINFO streaming_executor.py:88 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=None), locality_with_output=False, preserve_order=False, actor_locality_enabled=True, verbose_progress=False) [repeated 3x across cluster]\n",
      "(RayTrainWorker pid=68399) 2023-04-24 10:36:59,629\tINFO streaming_executor.py:90 -- Tip: To enable per-operator progress reporting, set RAY_DATA_VERBOSE_PROGRESS=1. [repeated 3x across cluster]\n",
      "(RayTrainWorker pid=68398) [W reducer.cpp:1298] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())\n",
      "(RayTrainWorker pid=68396) 2023-04-24 10:37:27.091660: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA\n",
      "(RayTrainWorker pid=68396) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "(RayTrainWorker pid=68399) [W reducer.cpp:1298] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) [repeated 3x across cluster]\n",
      "(RayTrainWorker pid=68396) 2023-04-24 10:37:27.373013: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
      "(RayTrainWorker pid=68396) 2023-04-24 10:37:28.763569: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64\n",
      "(RayTrainWorker pid=68396) 2023-04-24 10:37:28.763761: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64\n",
      "(RayTrainWorker pid=68396) 2023-04-24 10:37:28.763770: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:38:01,220\tINFO streaming_executor.py:87 -- Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[BatchMapper] -> AllToAllOperator[RandomizeBlockOrder]\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:38:01,221\tINFO streaming_executor.py:88 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=None), locality_with_output=False, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:38:01,221\tINFO streaming_executor.py:90 -- Tip: To enable per-operator progress reporting, set RAY_DATA_VERBOSE_PROGRESS=1.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div class=\"trialProgress\">\n",
       "  <h3>Trial Progress</h3>\n",
       "  <table>\n",
       "<thead>\n",
       "<tr><th>Trial name                  </th><th>_report_on    </th><th>date               </th><th>done  </th><th style=\"text-align: right;\">  epoch</th><th style=\"text-align: right;\">  experiment_tag</th><th>hostname      </th><th style=\"text-align: right;\">  iterations_since_restore</th><th style=\"text-align: right;\">  matthews_correlation</th><th>node_ip    </th><th style=\"text-align: right;\">  pid</th><th>should_checkpoint  </th><th style=\"text-align: right;\">  step</th><th style=\"text-align: right;\">  time_since_restore</th><th style=\"text-align: right;\">  time_this_iter_s</th><th style=\"text-align: right;\">  time_total_s</th><th style=\"text-align: right;\">  timestamp</th><th style=\"text-align: right;\">  train_loss</th><th style=\"text-align: right;\">  training_iteration</th><th>trial_id   </th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "<tr><td>LightningTrainer_87ecf_00000</td><td>validation_end</td><td>2023-04-24_10-42-46</td><td>True  </td><td style=\"text-align: right;\">      4</td><td style=\"text-align: right;\">               0</td><td>ip-10-0-60-127</td><td style=\"text-align: right;\">                         5</td><td style=\"text-align: right;\">              0.589931</td><td>10.0.60.127</td><td style=\"text-align: right;\">67819</td><td>True               </td><td style=\"text-align: right;\">   670</td><td style=\"text-align: right;\">             376.028</td><td style=\"text-align: right;\">           70.6609</td><td style=\"text-align: right;\">       376.028</td><td style=\"text-align: right;\"> 1682358165</td><td style=\"text-align: right;\">   0.0119807</td><td style=\"text-align: right;\">                   5</td><td>87ecf_00000</td></tr>\n",
       "</tbody>\n",
       "</table>\n",
       "</div>\n",
       "<style>\n",
       ".trialProgress {\n",
       "  display: flex;\n",
       "  flex-direction: column;\n",
       "  color: var(--jp-ui-font-color1);\n",
       "}\n",
       ".trialProgress h3 {\n",
       "  font-weight: bold;\n",
       "}\n",
       ".trialProgress td {\n",
       "  white-space: nowrap;\n",
       "}\n",
       "</style>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "(RayTrainWorker pid=68398) 2023-04-24 10:39:03,705\tINFO streaming_executor.py:87 -- Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[BatchMapper] -> AllToAllOperator[RandomizeBlockOrder] [repeated 4x across cluster]\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:39:03,706\tINFO streaming_executor.py:88 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=None), locality_with_output=False, preserve_order=False, actor_locality_enabled=True, verbose_progress=False) [repeated 4x across cluster]\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:39:03,706\tINFO streaming_executor.py:90 -- Tip: To enable per-operator progress reporting, set RAY_DATA_VERBOSE_PROGRESS=1. [repeated 4x across cluster]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "(RayTrainWorker pid=68398) 2023-04-24 10:40:09,873\tINFO streaming_executor.py:87 -- Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[BatchMapper] -> AllToAllOperator[RandomizeBlockOrder] [repeated 4x across cluster]\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:40:09,873\tINFO streaming_executor.py:88 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=None), locality_with_output=False, preserve_order=False, actor_locality_enabled=True, verbose_progress=False) [repeated 4x across cluster]\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:40:09,873\tINFO streaming_executor.py:90 -- Tip: To enable per-operator progress reporting, set RAY_DATA_VERBOSE_PROGRESS=1. [repeated 4x across cluster]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "(RayTrainWorker pid=68398) 2023-04-24 10:41:18,552\tINFO streaming_executor.py:87 -- Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[BatchMapper] -> AllToAllOperator[RandomizeBlockOrder] [repeated 4x across cluster]\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:41:18,552\tINFO streaming_executor.py:88 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=None), locality_with_output=False, preserve_order=False, actor_locality_enabled=True, verbose_progress=False) [repeated 4x across cluster]\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:41:18,552\tINFO streaming_executor.py:90 -- Tip: To enable per-operator progress reporting, set RAY_DATA_VERBOSE_PROGRESS=1. [repeated 4x across cluster]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "(RayTrainWorker pid=68398) 2023-04-24 10:42:29,325\tINFO streaming_executor.py:87 -- Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[BatchMapper] -> AllToAllOperator[RandomizeBlockOrder] [repeated 4x across cluster]\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:42:29,325\tINFO streaming_executor.py:88 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=None), locality_with_output=False, preserve_order=False, actor_locality_enabled=True, verbose_progress=False) [repeated 4x across cluster]\n",
      "(RayTrainWorker pid=68398) 2023-04-24 10:42:29,325\tINFO streaming_executor.py:90 -- Tip: To enable per-operator progress reporting, set RAY_DATA_VERBOSE_PROGRESS=1. [repeated 4x across cluster]\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2023-04-24 10:42:50,016\tINFO tune.py:1010 -- Total run time: 387.00 seconds (386.94 seconds for the tuning loop).\n"
     ]
    }
   ],
   "source": [
    "trainer = LightningTrainer(\n",
    "    lightning_config=lightning_config,\n",
    "    run_config=run_config,\n",
    "    scaling_config=scaling_config,\n",
    "    datasets={\"train\": ray_datasets[\"train\"], \"val\": ray_datasets[\"validation\"]},\n",
    "    datasets_iter_config={\"batch_size\": 16},\n",
    "    preprocessor=preprocessor,\n",
    ")\n",
    "result = trainer.fit()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ":::{note}\n",
     "We use Ray Data for data ingestion here for faster preprocessing, but you can also keep using the native PyTorch `DataLoader` or a `LightningDataModule`. See {ref}`this example <lightning_mnist_example>`.\n",
    "\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Result(\n",
       "  metrics={'_report_on': 'validation_end', 'train_loss': 0.011980690062046051, 'matthews_correlation': 0.5899314497879129, 'epoch': 4, 'step': 670, 'should_checkpoint': True, 'done': True, 'trial_id': '87ecf_00000', 'experiment_tag': '0'},\n",
       "  path='/home/ray/ray_results/ptl-sent-classification/LightningTrainer_87ecf_00000_0_2023-04-24_10-36-23',\n",
       "  checkpoint=LightningCheckpoint(local_path=/home/ray/ray_results/ptl-sent-classification/LightningTrainer_87ecf_00000_0_2023-04-24_10-36-23/checkpoint_000004)\n",
       ")"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Do Batch Inference with a Saved Checkpoint"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "Now that we have fine-tuned the model, we can load the checkpoint into a `BatchPredictor` and run fast inference on multiple GPUs. When you call `predict()`, the `BatchPredictor` distributes the inference workload across multiple workers, which run prediction on different shards of the data in parallel.\n",
    "\n",
     "You can find more details in {ref}`Using Predictors for Inference <air-predictors>`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from ray.train.batch_predictor import BatchPredictor\n",
    "from ray.train.lightning import LightningCheckpoint, LightningPredictor\n",
    "\n",
     "# Use the in-memory checkpoint object\n",
    "checkpoint = result.checkpoint\n",
    "\n",
    "# You can also load a checkpoint from disk:\n",
    "# YOUR_CHECKPOINT_DIR = result.checkpoint.path\n",
    "# checkpoint = LightningCheckpoint.from_directory(YOUR_CHECKPOINT_DIR)\n",
    "\n",
    "batch_predictor = BatchPredictor(\n",
    "    checkpoint=checkpoint,\n",
    "    predictor_cls=LightningPredictor,\n",
    "    use_gpu=True,\n",
    "    model_class=SentimentModel,\n",
    "    preprocessor=preprocessor,\n",
    ")\n",
    "\n",
     "# Use 2 scoring workers with 1 GPU each (2 GPUs in total) for batch inference\n",
    "predictions = batch_predictor.predict(\n",
    "    ray_datasets[\"validation\"],\n",
    "    feature_columns=[\"input_ids\", \"attention_mask\", \"label\"],\n",
    "    keep_columns=[\"label\"],\n",
    "    batch_size=16,\n",
    "    min_scoring_workers=2,\n",
    "    max_scoring_workers=2,\n",
    "    num_gpus_per_worker=1,\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "`batch_predictor.predict()` returns a Ray Dataset containing the predictions. We can now evaluate the results with just a few lines of code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
     "# Internally, BatchPredictor calls the forward() method of the LightningModule.\n",
     "# Convert each row's logits into a predicted label with argmax.\n",
    "def argmax(batch):\n",
    "    batch[\"predictions\"] = batch[\"predictions\"].apply(lambda x: np.argmax(x))\n",
    "    return batch\n",
    "\n",
    "\n",
    "results = predictions.map_batches(argmax, batch_format=\"pandas\").to_pandas()"
   ]
  },
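  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `apply` call above invokes `np.argmax` once per row. A vectorized alternative is sketched below. It assumes each entry of the `predictions` column is a 1-D NumPy array of logits. It stacks the column into a 2-D array and takes the argmax along the class axis in a single call. The function name `argmax_vectorized` and the toy DataFrame are hypothetical and only illustrate the idea. They are not outputs of the trained model:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def argmax_vectorized(batch: pd.DataFrame) -> pd.DataFrame:\n",
    "    # Stack the per-row logit arrays into a (num_rows, num_classes) matrix.\n",
    "    logits = np.stack(batch['predictions'].to_numpy())\n",
    "    # Take the index of the largest logit in every row at once.\n",
    "    batch['predictions'] = logits.argmax(axis=1)\n",
    "    return batch\n",
    "\n",
    "\n",
    "# Hypothetical logits for three sentences and two classes.\n",
    "toy = pd.DataFrame({\n",
    "    'predictions': [np.array([0.2, 1.5]), np.array([2.0, -1.0]), np.array([0.1, 0.3])],\n",
    "    'label': [1, 0, 0],\n",
    "})\n",
    "print(argmax_vectorized(toy)['predictions'].tolist())  # [1, 0, 1]\n",
    "```\n",
    "\n",
    "If Ray Data stores the column as a tensor extension type rather than as plain NumPy objects, the conversion step may need to be adapted."
   ]
  },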
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   predictions  label\n",
      "0            1      1\n",
      "1            1      1\n",
      "2            0      1\n",
      "3            1      1\n",
      "4            0      0\n",
      "5            1      0\n",
      "6            1      0\n",
      "7            1      1\n",
      "8            1      1\n",
      "9            1      1\n",
      "\n",
      "{'matthews_correlation': 0.5899314497879129}\n"
     ]
    }
   ],
   "source": [
    "matthews_corr = metric.compute(\n",
    "    predictions=results[\"predictions\"], references=results[\"label\"]\n",
    ")\n",
    "print(results.head(10))\n",
    "print(matthews_corr)"
   ]
  },
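  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The reported score is the Matthews correlation coefficient (MCC), the metric GLUE uses for CoLA. For binary labels, it is computed from the confusion-matrix counts as `(TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))`. It ranges from -1 to 1, where 0 means no better than chance. The plain-Python sketch below applies the formula to the first 10 rows printed above, for illustration only. The helper `matthews_corrcoef` is hypothetical, and returning 0.0 when the denominator is zero is a convention chosen here:\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def matthews_corrcoef(predictions, labels):\n",
    "    # Count the four cells of the binary confusion matrix.\n",
    "    pairs = list(zip(predictions, labels))\n",
    "    tp = sum(1 for p, y in pairs if p == 1 and y == 1)\n",
    "    tn = sum(1 for p, y in pairs if p == 0 and y == 0)\n",
    "    fp = sum(1 for p, y in pairs if p == 1 and y == 0)\n",
    "    fn = sum(1 for p, y in pairs if p == 0 and y == 1)\n",
    "    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))\n",
    "    # Convention used here: return 0.0 when any marginal count is zero.\n",
    "    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom\n",
    "\n",
    "\n",
    "# The first 10 rows printed above (predictions vs. labels).\n",
    "preds = [1, 1, 0, 1, 0, 1, 1, 1, 1, 1]\n",
    "labels = [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]\n",
    "print(round(matthews_corrcoef(preds, labels), 4))  # 0.2182\n",
    "```\n",
    "\n",
    "The score on these 10 rows is expected to differ from the score on the full validation set. Calling `matthews_corrcoef(results['predictions'], results['label'])` should agree with `metric.compute()` if both follow the standard definition."
   ]
  },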
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What's next?\n",
    "\n",
    "- {ref}`Fine-tune a Large Language Model with LightningTrainer and FSDP <dolly_lightning_fsdp_finetuning>`\n",
     "- {ref}`Hyperparameter Searching with LightningTrainer + Ray Tune <tune-pytorch-lightning-ref>`\n",
     "- {ref}`Experiment Tracking with Weights & Biases, CometML, MLflow, and TensorBoard in LightningTrainer <lightning_experiment_tracking>`"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "build",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.16"
  },
  "vscode": {
   "interpreter": {
    "hash": "178108d354ddc93ba36c4b7bfc5283800982aac0e7ca92cc0cf312ad1b8f8b20"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}