Prismatic VLM REPL

1,739 bytes added, 00:51, 21 June 2024

[https://github.com/TRI-ML/prismatic-vlms Prismatic VLM] is the project upon which OpenVLA is based. The generate.py REPL script is available in the OpenVLA repo as well, but it essentially uses Prismatic models. Note that the Prismatic models generate natural language, whereas OpenVLA models were trained to generate robot actions (see https://github.com/openvla/openvla/issues/5).

Of note, the K-Scale OpenVLA adaptation by [[User:Paweł]] is at https://github.com/kscalelabs/openvla.

== Prismatic REPL Script Guide ==

Here are some suggestions for running the generate.py REPL script from the repo (you can find it in the '''scripts''' folder).

== Prerequisites ==

Make sure the images you load into the REPL have an end effector in them.

[[File:Coke can2.png|400px|Can pickup task]]

== Starting REPL mode ==

Then, run generate.py. The script starts by initializing the generation playground with the Prismatic model prism-dinosiglip+7b.

The model prism-dinosiglip+7b is downloaded from the Hugging Face Hub.

The model configuration is found, and then the model is loaded with the following components:

* Vision Backbone: dinosiglip-vit-so-384px
* Language Model (LLM) Backbone: llama2-7b-pure (this is also where the Hugging Face token comes into play)
* Architecture Specifier: no-align+fused-gelu-mlp
* Checkpoint Path: the model checkpoint is loaded from a specific path in the cache.
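The loading step described above can also be reproduced outside the REPL. The following is a minimal sketch, assuming the prismatic package exposes a load function that accepts a model ID and an hf_token argument (as shown in the prismatic-vlms README) and that the Hugging Face token is stored in a local file named .hf_token. The helper names read_hf_token and load_prismatic are hypothetical and are not part of the repo.

```python
from pathlib import Path


def read_hf_token(token_file: Path = Path(".hf_token")) -> str:
    """Read a Hugging Face access token from a local file.

    The default file name is an assumption; point it at wherever your token lives.
    """
    return token_file.read_text().strip()


def load_prismatic(model_id: str = "prism-dinosiglip+7b",
                   token_file: Path = Path(".hf_token")):
    """Load a Prismatic VLM (hypothetical helper, not part of the repo).

    Requires torch and the prismatic package to be installed.
    """
    # Imported lazily so the heavy dependencies are only needed when loading.
    import torch
    from prismatic import load  # assumed entry point, per the prismatic-vlms README

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # The token is needed because the Llama 2 backbone is gated on the Hugging Face Hub.
    vlm = load(model_id, hf_token=read_hf_token(token_file))
    vlm.to(device, dtype=torch.bfloat16)
    return vlm
```

Calling load_prismatic() should trigger the same download and component setup that generate.py reports in the terminal, provided the assumptions above hold for your installed version.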

You should see this in your terminal:

[[File:Openvla1.png|800px|prismatic models]]

''After loading the model, the script enters a REPL mode, allowing the user to interact with the model. The REPL mode provides a default generation setup and waits for user inputs.''

In short, generate.py runs a REPL for interactively testing output generation with the Prismatic model prism-dinosiglip+7b. After starting the script, users can enter commands at the REPL prompt:

Type (i) to load a new local image by specifying its path, (p) to update the prompt template for generating outputs, (q) to quit the REPL, or directly input a prompt to generate a response based on the loaded image and the specified prompt.

[[File:Prismatic chat1.png|800px|prismatic chat]]
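The command handling described above can be pictured with a small stand-alone sketch. It does not reproduce the actual implementation in generate.py: the function run_repl, the "{message}" placeholder, and the generate callback are all hypothetical names chosen for illustration, and the model call is replaced by a callback so the loop can be tried without loading any weights.

```python
from typing import Callable, Iterable, List, Optional


def run_repl(lines: Iterable[str],
             generate: Callable[[Optional[str], str], str]) -> List[str]:
    """Consume input lines the way an (i)/(p)/(q) REPL might and collect outputs.

    `generate` stands in for the model call: it receives the current image path
    and the fully formatted prompt, and returns the generated text.
    """
    it = iter(lines)
    image_path: Optional[str] = None
    template = "{message}"  # hypothetical default prompt template
    outputs: List[str] = []

    for line in it:
        command = line.strip()
        if command.lower() == "q":
            break  # quit the REPL
        if command.lower() == "i":
            # The next line is taken as the path of the new local image.
            image_path = next(it, "").strip() or image_path
        elif command.lower() == "p":
            # The next line is taken as the new prompt template.
            template = next(it, "").strip() or template
        elif command:
            # Anything else is treated as a prompt for the loaded image.
            prompt = template.replace("{message}", command)
            outputs.append(generate(image_path, prompt))
    return outputs
```

For interactive use, the lines could come from sys.stdin and the callback could wrap a loaded model. One consequence of this design is that a prompt consisting only of the letter i, p, or q is interpreted as a command rather than sent to the model.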

Vrtnis

Humanoid Robots Wiki β
