Ever since I started writing what turned out to be a series of articles on building real-time chat apps natively on AWS, I have kept a huge backlog of ideas for extending the basic chat: adding authentication & authorization, full-text search, threads, reactions, and many other things that make a chat system production grade and up to par with what users expect these days. Given how AI has entered every pore of the industry, extending my small chat project had to go in this direction sooner or later. It all started with a solution that (ab)uses IoT Core, which worked exceptionally well - maybe even too well, as AWS ended up designing AppSync Events in a similar fashion. Read more on what I mean in my blog post about Serverless Chat on AWS with AppSync Events. Shortly after the initial release, AWS pushed out the promised improvements for AppSync Events, which I explore in the follow-up blog post about leveraging WebSockets with it.
To build on top of the existing chat solution, this time we won’t touch the main infrastructure that powers the chat but rather extend the group chat capabilities with a dedicated AI summary endpoint. This is one of the most common AI use cases, and apps like Viber and Slack already have it. Most of us can relate to the pain of going through numerous Slack threads with dozens of messages each - so I’d say this feature is more than just nice to have at this point. Let’s see how we can build it using Amazon Bedrock’s ConverseStreamCommand and the HTTP response streaming capability of API Gateway.
Architecture
As I mentioned, we won’t be touching the core infrastructure of the chat solution - AppSync and the whole WebSockets part - but rather extend the system with a dedicated endpoint that creates an AI summary of chat messages. Essentially, what we’re talking about here is shown in the following diagram:

The new, and most interesting, addition here is Amazon Bedrock. Also, since API Gateway supports HTTP response streaming, we’ll make sure to leverage it to get the fastest possible response when invoking the AI summary endpoint.
If we zoom back out, the whole solution looks something like:

So just to be clear - this is what will get deployed if you decide to run this in your own account. Let’s break down how the two new key pieces fit into the mix.
Amazon Bedrock
I won’t waste words on this since it is quite familiar by now, but in short - Bedrock is a fully managed, serverless AWS service for building generative AI applications. It provides a single API for accessing a wide variety of foundation models from leading AI companies such as Anthropic, Meta, and Amazon itself. For this particular example I will rely on one of the smallest Amazon models - amazon.nova-micro-v1:0. It is more than enough for showcasing the idea, but feel free to play with others to your liking.
Small caveat: when specifying the model, it is not enough to put only the model name - you also need to include the region prefix for where your app will run. For example, in my case it was eu.amazon.nova-micro-v1:0. Worth knowing just so you don’t get confused when you see it in the code.
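For illustration, the region-prefixed model ID could be wired up like this - note that the MODEL_ID environment variable name is my own placeholder, not necessarily what the repo uses:

```typescript
// Region-prefixed model ID: 'eu.' here assumes an app deployed in an EU
// region - swap the prefix to match the region where your app runs.
const MODEL_ID = process.env.MODEL_ID ?? 'eu.amazon.nova-micro-v1:0';
```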
A dedicated Lambda function for the AI summary first fetches the group chat messages from the database and then creates a prompt asking Bedrock for a short summary. It uses ConverseStreamCommand from the BedrockRuntimeClient to send the prompt to Bedrock and get the response back as a stream. To stream Bedrock’s output smoothly into the Lambda function’s response, there is a small caveat compared to how you would normally define the handler in TypeScript. One part is defining the handler as:
```typescript
export const handler = awslambda.streamifyResponse(
  async (event: APIGatewayProxyEvent, responseStream: awslambda.HttpResponseStream) => {
    // ... fetch messages, call Bedrock, and write into the response stream
  },
);
```
and handling the streaming response:

```typescript
const httpStream = awslambda.HttpResponseStream.from(responseStream, {
  statusCode: 200,
  headers: {
    'Content-Type': 'text/plain; charset=utf-8',
    'Access-Control-Allow-Origin': '*',
  },
});
```
The httpStream is then used to write the streamed response from Bedrock into it, chunk by chunk.
Note: for simplicity I have set a limit of 50 messages to fetch for the summary; however, it is configurable.
HTTP Response Streaming with API Gateway
Finally, the last piece needed to get the streaming response all the way to the client is configured on API Gateway.
This capability was introduced in November 2025 and has CDK support from version 2.227.0 via the responseTransferMode property.
Setting it to ResponseTransferMode.STREAM on the Lambda integration resource will enable streaming:
```typescript
getSummaryResource.addMethod('GET', new LambdaIntegration(getSummaryLambda, {
  responseTransferMode: ResponseTransferMode.STREAM,
}));
```
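On the client side, the streamed body can then be consumed incrementally as well. Here is a hedged sketch using the Fetch API’s ReadableStream - the helper name and the endpoint URL in the usage comment are placeholders of mine, not from the repo:

```typescript
// Pipes a streamed HTTP body (e.g. the AI summary response) into a
// callback chunk by chunk, so the UI can render partial text as it arrives.
export async function streamToCallback(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    onChunk(decoder.decode(value, { stream: true }));
  }
  const tail = decoder.decode(); // flush any buffered partial character
  if (tail) onChunk(tail);
}

// Usage against the deployed endpoint (URL is a placeholder):
// const res = await fetch('https://<api-id>.execute-api.<region>.amazonaws.com/prod/summary');
// await streamToCallback(res.body!, (text) => appendToChatUi(text));
```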
Demo
To showcase how the finished feature works, I’ve created a short video. I prefilled the group chat with test messages to simulate a conversation between a team of engineers working on a production deployment. The AI summary captures their agreements and the key points they discussed about the deployment.
Conclusions
In the end, building an AI summary today isn’t groundbreaking - and to be honest, that might be the point. What used to feel cutting-edge has quickly become the standard users expect in these types of applications. With Amazon Bedrock doing the heavy lifting on the model side, the path from idea to working solution is really smooth. I encourage you to try it yourself - check out the code and follow the instructions to deploy it to your own AWS account:
https://github.com/imflamboyant/serverless-aws-chat/tree/main/chat-appsync-events-websocket
💸 Note: as always, be aware that usage might incur real costs, though relatively small for this example. If you have free tier or credits, just make sure that Bedrock is covered.
Thanks for reading, and stay tuned, because next I’ll take this further into agentic chat and explore what’s possible with Strands.