Of note, the K-Scale OpenVLA adaptation by [[User:Paweł]] is at https://github.com/kscalelabs/openvla

== Prismatic REPL Script Guide ==

Here are some suggestions for running the generate.py REPL script from the repo (you can find it in the '''scripts''' folder).

== Prerequisites ==

Before running the script, ensure you have the following:

* Python 3.8 or higher installed
* NVIDIA GPU with CUDA support (optional but recommended for faster processing)
* Hugging Face account and token for accessing Meta Llama

== Setting Up the Environment ==

In addition to installing requirements-min.txt from the repo, you will probably need to install rich, tensorflow_graphics, tensorflow-datasets and dlimp.

=== Set up Hugging Face token ===

You need a Hugging Face token to access certain models, and the script reads it from a .hf_token file. Create a file named <code>.hf_token</code> in the root directory of your project and add your Hugging Face token to this file:

<syntaxhighlight lang="sh">echo "your_hugging_face_token" > .hf_token</syntaxhighlight>

== Sample Images for generate.py REPL ==

You can get these by capturing frames or screenshotting rollout videos from
<pre> https://openvla.github.io/ </pre>
Make sure the images have an end effector in them.

[[File:Coke can2.png|400px|Can pickup task]]

== Starting REPL mode ==

Then, run generate.py. The script starts by initializing the generation playground with the Prismatic model prism-dinosiglip+7b, which is downloaded from the Hugging Face Hub. Once the model configuration is found, the model is loaded with the following components:

* Vision Backbone: dinosiglip-vit-so-384px
* Language Model (LLM) Backbone: llama2-7b-pure (this is also where the HF token comes into play)
* Architecture Specifier: no-align+fused-gelu-mlp
* Checkpoint Path: the model checkpoint is loaded from a specific path in the cache
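Since the token is needed to access the Llama 2 backbone, a missing or empty token file is a likely cause of a failed load. Below is a minimal pre-flight sketch, assuming the token lives in <code>.hf_token</code> in the current directory as described in the setup step. The helper name <code>read_hf_token</code> is hypothetical and is not part of the repo; it only checks that the file exists and is non-empty, not that the token is valid.

```python
from pathlib import Path


def read_hf_token(path=".hf_token"):
    """Return the token stored in `path`; raise if the file is missing or empty.

    Hypothetical helper for illustration only, not taken from generate.py.
    """
    token_file = Path(path)
    if not token_file.is_file():
        raise FileNotFoundError(f"{path} not found; create it in the project root")
    # Strip the trailing newline that `echo ... > .hf_token` adds.
    token = token_file.read_text().strip()
    if not token:
        raise ValueError(f"{path} is empty")
    return token
```

Calling <code>read_hf_token()</code> from the project root before starting generate.py surfaces a missing file early, rather than partway through the model download.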
Once generate.py has loaded the model, you should see this in your terminal:

[[File:Openvla1.png|800px|prismatic models]]

''After loading the model, the script enters a REPL mode, allowing the user to interact with the model. The REPL mode provides a default generation setup and waits for user inputs.''

In other words, generate.py runs a REPL for interactively testing outputs from the Prismatic model prism-dinosiglip+7b. At the REPL prompt you can:

* type (i) to load a new local image by specifying its path
* type (p) to update the prompt template for generating outputs
* type (q) to quit the REPL
* enter a prompt directly to generate a response based on the loaded image and the current prompt template

[[File:Prismatic chat1.png|800px|prismatic chat]]
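The command handling described above can be pictured with a small sketch. This is not the code from generate.py; the function name <code>step</code>, the <code>state</code> dictionary and its keys, and the follow-up prompts are all assumptions made for illustration. It shows only the dispatch logic: three single-letter commands, with any other input treated as a prompt for the model.

```python
def step(state, line, read=input):
    """Apply one line of REPL input; return (new_state, action).

    Hypothetical sketch of the i/p/q dispatch, not the repo's implementation.
    `read` is the function used to ask follow-up questions (defaults to input).
    """
    cmd = line.strip()
    if cmd.lower() == "q":
        return state, "quit"
    if cmd.lower() == "i":
        # Ask for the path of a new local image.
        path = read("Image path: ").strip()
        return dict(state, image_path=path), "image_updated"
    if cmd.lower() == "p":
        # Ask for a replacement prompt template.
        template = read("Prompt template: ").strip()
        return dict(state, prompt_template=template), "template_updated"
    # Anything else is a prompt: the caller would send it to the model
    # together with the currently loaded image.
    return state, "generate"
```

A driver loop would call <code>step</code> repeatedly until the action is <code>"quit"</code>, and run model generation whenever the action is <code>"generate"</code>. Returning a new dictionary instead of mutating <code>state</code> keeps each step easy to test in isolation.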