Automatically Generate Google Slides with an AI Prompt

I came across a very interesting idea from the author まじん (Majin) on note.com:

Majin used Gemini to turn a single prompt into a complete Google Slides presentation. I tried adapting the idea to run with ChatGPT (or Google AI Studio) instead, and the results were quite exciting.


1. 🔍 Structure of Majin’s Prompt

Majin’s prompt breaks down into the following main components:

  • Role assignment for AI: The AI is not just a chatbot, but acts as a Data Scientist and Presentation Designer.

  • Absolute mission: From a piece of input text, the AI must output a JavaScript object array (slideData) that defines the entire slide structure.

  • Fixed blueprint: The prompt includes a Google Apps Script (GAS) framework, where you only need to replace slideData to generate Google Slides.

  • slideData includes:

    • title: Slide title

    • content: Content (in the form of bullets, text, or tables)

    • pattern: Display type (Title, List, TwoColumn, Image, …)

    • speakerNote: Presenter’s notes

👉 The important point: The prompt does not directly create slides, but outputs JSON-like data that the GAS script uses to build the slides.
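To make this concrete, here is a minimal sketch (my own, not Majin’s actual GAS framework) of how such a script can walk slideData and build slides. It only handles a title, bullet points, and speaker notes; the real blueprint dispatches on the pattern/type field to many layouts:

function generateSlides() {
  // Build one slide per entry in slideData using the Slides service.
  const presentation = SlidesApp.getActivePresentation();
  slideData.forEach(function (data) {
    const slide = presentation.appendSlide(SlidesApp.PredefinedLayout.BLANK);
    // Title text box (position/size values are illustrative).
    slide.insertTextBox(data.title || '', 30, 20, 600, 60)
         .getText().getTextStyle().setFontSize(28).setBold(true);
    if (data.points) {
      // Render bullet content as one text box, one line per point.
      slide.insertTextBox(data.points.join('\n'), 30, 100, 600, 300);
    }
    if (data.notes) {
      // speakerNote in the schema above; the generated sample below uses "notes".
      slide.getNotesPage().getSpeakerNotesShape().getText().setText(data.notes);
    }
  });
}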


2. 🛠️ Customizing & Running with ChatGPT

First, download Majin’s system prompt.md file.

🔧 Steps inside the Project

  1. Create a new Project → give it a name (e.g., Google Slide Auto Generator).

  2. Add a file → upload system prompt.md.

  3. In the Project Settings / Overview screen, you will find the Instructions (System message) box → paste the line:
    添付した「system prompt.md」に忠実に従ってください。
    (Meaning: “Please strictly follow the attached system prompt.md.”)
    If you don’t see a dedicated “Instructions” field, ChatGPT may show “Description” or “Setup” instead → paste it there. It works as the System Prompt for the whole Project.

  4. From now on, in that Project, whenever you input raw text, ChatGPT will immediately reference system prompt.md and output both slideData + GAS code.

👉 In other words: the Project’s Instruction acts as a dedicated System Message for that workspace. The UI may look different, but it’s always under “Overview/Setup” when you open the Project.

Next, type a prompt with the content you want to turn into slides → ChatGPT will generate slideData.

I tested it with content about LLMs and got the following result:

const slideData = [
  {
    type: 'title',
    title: 'The Busy Person\'s Intro to Large Language Models',
    date: '2025.09.21',
    notes: 'Re-recorded overview of LLMs focusing on what they are, how to run them, how they are trained, and why next-word prediction works'
  },
  {
    type: 'content',
    title: 'Agenda',
    points: [
      'What is an LLM',
      'Running a model locally',
      'Training at scale',
      'Why next-word prediction is powerful',
      'Key takeaways'
    ],
    notes: 'Keep it practical and concrete. We start with the simplest mental model, then show inference vs training, then the intuition for why LLMs encode knowledge'
  },
  {
    type: 'section',
    title: '1. What is an LLM',
    notes: 'Define an LLM at the simplest level and frame the talk around artifacts you can hold on disk'
  },
  {
    type: 'content',
    title: 'LLM = Two Files',
    subhead: 'Weights + Runner',
    points: [
      'Parameters file holds learned weights',
      'Runner implements forward pass of the network',
      'Self contained package runs offline'
    ],
    notes: 'Stress that inference can be simple and local. The interesting part lives in the parameters learned during training'
  },
  {
    type: 'content',
    title: 'Example: Llama 2 • 70B',
    subhead: 'Open weights from Meta',
    points: [
      'Sizes include 7B 13B 34B 70B',
      'Open weights and paper enable local study',
      'Closed APIs trade control for convenience'
    ],
    notes: 'Contrast open weights with closed APIs like hosted assistants without exposing internals'
  },
  {
    type: 'section',
    title: '2. Running a Model',
    notes: 'Give a feel for local inference and why the demo used a smaller model for speed'
  },
  {
    type: 'process',
    title: 'Local Inference Flow',
    subhead: 'From files to generations',
    steps: [
      'Get weights file on disk',
      'Compile or install a runner',
      'Point runner to weights',
      'Send a prompt',
      'Stream generated tokens'
    ],
    notes: 'Mention that a 7B model is fast enough on a laptop while 70B is roughly an order of magnitude slower'
  },
  {
    type: 'compare',
    title: 'Open Weights vs Closed API',
    leftTitle: 'Open Weights',
    rightTitle: 'Closed API',
    leftItems: [
      'Full local control',
      'Offline capable',
      'Tunable and inspectable'
    ],
    rightItems: [
      'Best quality access',
      'No infra to manage',
      'Faster to integrate'
    ],
    notes: 'There is no universal winner. Choose based on control privacy latency and capability needs'
  },
  {
    type: 'section',
    title: '3. Training at Scale',
    notes: 'Explain where weights come from and why training is expensive compared to inference'
  },
  {
    type: 'content',
    title: 'Training Ingredients',
    subhead: 'Data Compute Cost',
    points: [
      'Large curated text corpus',
      'Massive GPU clusters for days',
      'Significant budget for runs'
    ],
    notes: 'Use order of magnitude thinking rather than exact numbers. Emphasize engineering and data work beyond raw compute'
  },
  {
    type: 'timeline',
    title: 'From Corpus to Weights',
    subhead: 'Very high level lifecycle',
    milestones: [
      { label: 'Collect and filter data', date: 'Step 1', state: 'done' },
      { label: 'Train on GPU cluster', date: 'Step 2', state: 'done' },
      { label: 'Evaluate and align', date: 'Step 3', state: 'done' },
      { label: 'Release weights and runner', date: 'Step 4', state: 'next' }
    ],
    notes: 'Keep the audience at the bird\'s eye view. The output is a weights artifact you can reuse everywhere for inference'
  },
  {
    type: 'section',
    title: '4. Why It Works',
    notes: 'Bridge from mechanics to intuition about capability'
  },
  {
    type: 'content',
    title: 'Next Word Prediction Objective',
    subhead: 'Simple objective rich consequences',
    points: [
      'Model predicts a distribution over next token',
      'Good prediction forces world knowledge',
      'Patterns styles and relations get compressed'
    ],
    notes: 'Relate prediction to lossy compression intuition. To predict the next token the model must encode structure facts and relationships'
  },
  {
    type: 'content',
    title: 'Common Misconceptions',
    points: [
      'It is not copy paste of the web',
      'Bigger is not always better',
      'Closed is not always better than open'
    ],
    notes: 'Clarify generalization vs memorization and the role of data quality and algorithms along with scale'
  },
  {
    type: 'section',
    title: '5. Takeaways',
    notes: 'Land the plane with pragmatic guidance'
  },
  {
    type: 'content',
    title: 'Practical Guidance',
    points: [
      'Choose model size for latency and cost',
      'Use open weights for control',
      'Use hosted APIs for frontier quality'
    ],
    notes: 'Map choices to product constraints such as privacy cost latency and maintainability'
  },
  {
    type: 'closing',
    notes: 'Thanks for watching. Invite questions and link resources in the description'
  }
];
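Before pasting the output into Apps Script, it is worth a quick sanity check, since long generations sometimes get cut off mid-array. A small sketch (checkSlideData is my own helper, not part of Majin’s framework):

function checkSlideData(slideData) {
  // Catch truncated or malformed output before it reaches Apps Script.
  if (!Array.isArray(slideData)) throw new Error('slideData must be an array');
  slideData.forEach(function (slide, i) {
    if (!slide.type) throw new Error('Slide ' + i + ': missing "type"');
    if (slide.type !== 'closing' && !slide.title) {
      console.warn('Slide ' + i + ' (' + slide.type + '): no title');
    }
  });
  console.log('OK: ' + slideData.length + ' slides');
}
checkSlideData(slideData); // run in a browser console or Node alongside the array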

Next steps

  1. Open the .md file on your computer.

  2. Copy the code from the .md file.

  3. Open Google Slides → create a new slide deck.

  4. Click Extensions → Apps Script.

  5. Paste the copied code into the Apps Script editor.

  6. Replace the placeholder slideData in the pasted code with your generated slideData.

  7. Press Run (authorize the script when Apps Script asks on first run).


3. ✅ Experience & Results

 

  • Works well on ChatGPT: no need for Gemini; GPT-5 is enough.

  • Advantage: The prompt standardizes the output into a JSON-like object, making it easy to control.

  • Reference implementation:


📌 Conclusion

  • Majin’s prompt is a great framework to turn AI into an automatic slide design tool.

  • It doesn’t have to be Gemini — ChatGPT (GPT-5) also works well.

  • You just need to customize the input → and you can generate Google Slides for any topic (training, pitching, learning…).

👉 This article was written with reference to blogs by まじん (Majin):

Trying the Realtime Prompting Guide for GPT-Realtime: Experiments with Vietnamese Voice Input


1. Introduction

OpenAI’s Realtime API enables the creation of interactive voice experiences with ultra-low latency. Instead of waiting for a full text input, the model can “listen” to a user while they are still speaking and respond almost instantly. This makes it a powerful foundation for building voice assistants, audio chatbots, automated customer support, or multimodal creative applications.

To get the best results, writing a clear and well-structured prompt is essential. OpenAI published the Realtime Prompting Guide as a playbook for controlling model behavior in spoken conversations.
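For orientation before diving into the guide: with the Realtime API, the system prompt is attached to the session rather than sent with each message. A rough sketch in Node.js with the ws package; the endpoint, model name, and event shapes follow OpenAI’s published Realtime API at the time of writing, so verify them against the current docs:

import WebSocket from 'ws';

// Open a Realtime session over WebSocket.
const ws = new WebSocket(
  'wss://api.openai.com/v1/realtime?model=gpt-realtime',
  { headers: { Authorization: 'Bearer ' + process.env.OPENAI_API_KEY } }
);

ws.on('open', function () {
  // The prompting-guide rules go into session.instructions as bullets.
  ws.send(JSON.stringify({
    type: 'session.update',
    session: {
      instructions: [
        '- If the audio is unclear, politely ask the user to repeat.',
        '- Only respond when confident you understood the request.',
        '- Continue the conversation in the same language as the user.'
      ].join('\n')
    }
  }));
});

ws.on('message', function (raw) {
  const event = JSON.parse(raw.toString());
  console.log(event.type); // e.g. session.created, response.done
});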

References:


2. What is GPT-Realtime

GPT-Realtime is a model/API designed to handle continuous audio input and provide rapid responses. Its key features include:

  • Real-time speech-to-text recognition.

  • Robust handling of noisy, cut-off, or unclear audio.

  • Customizable reactions to imperfect audio, such as asking for repetition, clarifying, or continuing in the user’s language.

  • Support for detailed prompting to ensure safe, natural, and reliable responses.


3. Overview of the Prompting Guide

The Realtime Prompting Guide outlines seven best practices for writing system prompts for voice agents:

1. Be precise, avoid conflicts.
Instructions must be specific and consistent. For example, if you say “ask again when unclear,” don’t also instruct the model to “guess when unsure.”

2. Use bullet points instead of paragraphs.
Models handle lists better than long prose.

3. Handle unclear audio.
Explicitly instruct what to do when input is noisy or incomplete: politely ask the user to repeat and only respond when confident.

4. Pin the language when needed.
If you want the entire conversation in one language (e.g., English only), state it clearly. Otherwise, the model may switch to mirror the user.

5. Provide sample phrases.
Include example greetings, clarifications, or closing lines to teach the model your desired style.

6. Avoid robotic repetition.
Encourage varied phrasing for greetings, confirmations, and closings to keep interactions natural.

7. Use capitalization for emphasis.
For example: “IF AUDIO IS UNCLEAR, ASK THE USER TO REPEAT.”

4. Prompt Examples

Sample Prompt A – Avoid conflicts, be clear

SYSTEM:
- Always speak clearly and respond in the same language the user is speaking.
- If the user’s audio is unclear, noisy, partial, or unintelligible, politely ask them to repeat.
- Never assume what was said if you did not hear it clearly.
- Use short, varied phrases; avoid repeating the same sentence.
- Provide helpful, concise responses.

USER: “Hello, can you help me with my internet issue?”

ASSISTANT: (responds according to the prompt, asks for clarification if needed)


Sample Prompt B – Handling unclear audio

SYSTEM:
- If the audio is not clear, ask: “I’m sorry, I didn’t catch that. Could you please repeat?”
- Only respond when you are confident you understood the user’s request.
- Continue conversation in the same language as the user.
- Use bullet points for clarity.

USER: “Um… internet…” (noisy, unclear audio)

ASSISTANT: (follows the system instructions)


Sample Prompt C – Keep a natural style, avoid repetition

SYSTEM:
- Do not respond with the same phrase twice.
- Vary greetings, confirmations, closings.
- Aim for a warm, helpful tone.
- If the user ends the conversation, say something like: “If there’s anything else you need, feel free to ask.”

USER: “Thank you, that’s all.”

ASSISTANT: “You’re welcome! Glad I could help. Take care!” (or another variation)

5. Experiments

For my testing, I deliberately used Vietnamese speech to see how the model would react in different situations.

First Test: Speaking Unclearly Without a Prompt

To begin, I tested what would happen if I spoke unclearly in Vietnamese without providing any system prompt.

For example, I said:

USER: “Ư… mệnh của mình không tỏa lại” (unclear Vietnamese speech)

The model responded in Indonesian, saying:

ASSISTANT: “Tentu, aku di sini untuk membantu. Coba ceritakan dulu apa yang lagi kamu rasakan atau alami. Kita bisa cari solusinya bareng-bareng.” (Indonesian: “Of course, I’m here to help. First tell me what you’re feeling or going through. We can look for a solution together.”)

This shows that when no system prompt is defined, the model may guess or switch languages unpredictably, instead of asking for clarification.

Second Test: Adding a System Prompt

Next, I added a system prompt to guide the model’s behavior when the audio is unclear:

SYSTEM:
- If the audio is not clear, ask: “I’m sorry, I didn’t catch that. Could you please repeat?”
- Only respond when you are confident you understood the user’s request.
- Continue conversation in the same language as the user.
- Use bullet points for clarity.

Then I spoke unclearly in Vietnamese again, for example:

USER: “Um… internet…” (spoken quietly, unclear audio)

This time, the model followed the system instructions and politely asked me to repeat. Sometimes, it even suggested that I try saying a simple test sentence so it could better check whether my voice was coming through clearly.

This shows how a well-written system prompt can prevent the model from making random guesses or switching languages, ensuring a more reliable and natural conversation flow.

Third Test: Singing to the Model

Finally, I experimented by singing in Vietnamese to see how the model would react.

The model listened carefully and was able to understand the lyrics and emotional tone of my singing. However, when I asked it to repeat the lyrics back to me, it refused due to copyright restrictions.

This shows that while GPT-Realtime can analyze and comment on songs — such as summarizing their meaning, describing the mood, or suggesting new lines in a similar style — it cannot reproduce lyrics verbatim. In practice, this means you should not expect the model to sing or echo back copyrighted content.

6. Conclusion

GPT-Realtime provides smooth and natural voice interactions with minimal latency. However, its effectiveness depends heavily on the prompt.

Key takeaways:

  • Always write a clear, bullet-pointed system prompt.
  • Define explicit behavior for unclear audio.
  • Control language use and discourage robotic repetition.
  • Respect copyright limitations: the model will not repeat lyrics verbatim but can summarize or create new content.

The Realtime Prompting Guide is a practical resource for building high-quality voice agents that are both natural and safe.