
"togethercomputer/RedPajama-Data-1T-Sample" dataset does not exist. #33

Description

@y-vectorfield

I ran qtip on LLM models, but the following error occurred: the "togethercomputer/RedPajama-Data-1T-Sample" dataset no longer exists.

Loading checkpoint shards: 100%|██████████| 16/16 [00:03<00:00,  5.16it/s]
I1127 04:29:45.708108 3912 quantize_finetune_llama.py:134] loaded model
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/qtip/quantize_llama/quantize_finetune_llama.py", line 214, in <module>
    main(args)
  File "/root/qtip/quantize_llama/quantize_finetune_llama.py", line 136, in main
    devset = utils.sample_rp1t(tokenizer, args.devset_size, args.ctx_size,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/qtip/lib/utils/data_utils.py", line 197, in sample_rp1t
    dataset = load_dataset('togethercomputer/RedPajama-Data-1T-Sample',
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/py/.venv/lib/python3.11/site-packages/datasets/load.py", line 2594, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/py/.venv/lib/python3.11/site-packages/datasets/load.py", line 2266, in load_dataset_builder
    dataset_module = dataset_module_factory(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/py/.venv/lib/python3.11/site-packages/datasets/load.py", line 1908, in dataset_module_factory
    raise e1 from None
  File "/root/.local/share/py/.venv/lib/python3.11/site-packages/datasets/load.py", line 1858, in dataset_module_factory
    raise DatasetNotFoundError(f"Dataset '{path}' doesn't exist on the Hub or cannot be accessed.") from e
datasets.exceptions.DatasetNotFoundError: Dataset 'togethercomputer/RedPajama-Data-1T-Sample' doesn't exist on the Hub or cannot be accessed.
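The traceback shows that `sample_rp1t` in `lib/utils/data_utils.py` calls `load_dataset` with a hard-coded Hub ID and has no fallback when that ID disappears. Below is a minimal sketch of one possible workaround, assuming you have an alternative source to try (for example a local copy): attempt each source in order and fall back only on "not found" errors. `load_first_available` and the local path in the usage comment are hypothetical and not part of qtip or `datasets`.

```python
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def load_first_available(
    candidates: Sequence[str],
    loader: Callable[[str], T],
    missing_errors: Tuple[Type[BaseException], ...] = (FileNotFoundError,),
) -> T:
    """Try each dataset source in order and return the first one that loads.

    Only exceptions listed in `missing_errors` trigger a fallback; any other
    error (e.g. a bug in the loader) propagates immediately.
    """
    failures = []
    for source in candidates:
        try:
            return loader(source)
        except missing_errors as exc:
            # Record why this source was skipped, then try the next one.
            failures.append(f"{source}: {exc}")
    raise RuntimeError("No dataset source could be loaded:\n" + "\n".join(failures))


# Hypothetical usage inside sample_rp1t (requires the `datasets` package).
# The original load_dataset arguments are truncated in the traceback, so keep
# whatever data_utils.py already passes; split="train" is only a placeholder.
# from datasets import load_dataset
# from datasets.exceptions import DatasetNotFoundError
# dataset = load_first_available(
#     ["togethercomputer/RedPajama-Data-1T-Sample", "/path/to/local/rp1t_sample"],
#     lambda src: load_dataset(src, split="train"),
#     (DatasetNotFoundError, FileNotFoundError),
# )
```

Restricting the fallback to specific exception types keeps real bugs, such as a wrong argument, from being silently hidden behind the next candidate.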

The following table is the list of togethercomputer datasets on the Hub; the dataset no longer appears in it.

[Screenshot: list of togethercomputer datasets on the Hugging Face Hub]

https://huggingface.co/togethercomputer/datasets
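The same check as the screenshot can be done programmatically. This is a sketch assuming the `huggingface_hub` package is installed; `find_missing` and `list_org_dataset_ids` are hypothetical helper names, not part of qtip. Note that the Hub listing only returns repos visible to the caller, so a missing entry is consistent with the dataset having been removed but does not by itself rule out a private repo the caller cannot access (the error message above allows for either).

```python
from typing import Iterable, List


def find_missing(wanted: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the IDs in `wanted` that do not appear in `available`, in order."""
    available_set = set(available)
    return [repo_id for repo_id in wanted if repo_id not in available_set]


def list_org_dataset_ids(author: str) -> List[str]:
    """List dataset repo IDs visible to the caller for one Hub user or org.

    Needs the `huggingface_hub` package and network access. Private repos
    the caller cannot access are not returned.
    """
    from huggingface_hub import HfApi

    return [info.id for info in HfApi().list_datasets(author=author)]


# Example (network required):
# print(find_missing(["togethercomputer/RedPajama-Data-1T-Sample"],
#                    list_org_dataset_ids("togethercomputer")))
```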
