Streaming + Async

Streaming Responses

LiteLLM supports streaming the model response back. Pass stream=True to the completion function and iterate over the chunks it returns.

Usage

from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk['choices'][0]['delta'])
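If you need the full reply as well as the incremental deltas, you can accumulate the chunks as they stream in. A minimal sketch, assuming each chunk exposes choices[0].delta.content (which may be None for chunks that carry no text):

from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)

full_text = ""
for chunk in response:
    # the delta content can be None, e.g. on the final chunk
    delta = chunk.choices[0].delta.content
    if delta is not None:
        full_text += delta
        print(delta, end="", flush=True)

print()
print(f"assembled response: {full_text}")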

Async Completion

LiteLLM provides an asynchronous version of the completion function, called acompletion, for non-blocking completion calls.

Usage

from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="gpt-3.5-turbo", messages=messages)
    return response

response = asyncio.run(test_get_response())
print(response)
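
Because acompletion is a coroutine, you can fan out several requests concurrently with standard asyncio tooling. A minimal sketch using asyncio.gather (the prompts below are just placeholders):

from litellm import acompletion
import asyncio

async def ask(prompt):
    messages = [{"content": prompt, "role": "user"}]
    return await acompletion(model="gpt-3.5-turbo", messages=messages)

async def main():
    # both requests run concurrently instead of one after the other
    responses = await asyncio.gather(
        ask("Hello, how are you?"),
        ask("Write a one-line haiku."),
    )
    for response in responses:
        print(response.choices[0].message.content)

asyncio.run(main())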

Async Streaming

We've implemented an __anext__() method on the streaming object returned when stream=True, so it supports async iteration with async for.

Usage

Here's an example of using it with OpenAI.

from litellm import acompletion
import asyncio, traceback

async def completion_call():
    try:
        print("test acompletion + streaming")
        response = await acompletion(
            model="gpt-3.5-turbo",
            messages=[{"content": "Hello, how are you?", "role": "user"}],
            stream=True
        )
        print(f"response: {response}")
        async for chunk in response:
            print(chunk)
    except Exception:
        print(f"error occurred: {traceback.format_exc()}")

asyncio.run(completion_call())
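
The same pattern composes with concurrency: each streaming call can be consumed in its own coroutine while asyncio.gather runs them side by side. A minimal sketch, assuming each chunk exposes choices[0].delta.content; the labels and prompts are just placeholders:

from litellm import acompletion
import asyncio

async def stream_call(label, prompt):
    response = await acompletion(
        model="gpt-3.5-turbo",
        messages=[{"content": prompt, "role": "user"}],
        stream=True,
    )
    # consume this call's stream; other calls keep streaming concurrently
    async for chunk in response:
        delta = chunk.choices[0].delta.content
        if delta is not None:
            print(f"[{label}] {delta}")

async def main():
    await asyncio.gather(
        stream_call("haiku", "Write a haiku about the sea."),
        stream_call("joke", "Tell me a short joke."),
    )

asyncio.run(main())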