Codex CLI vs Gemini CLI vs Claude Code

1. Codex CLI – Capabilities and New Features

According to OpenAI’s official announcement (“Introducing upgrades to Codex”), Codex CLI has been rebuilt on top of GPT-5-Codex, turning it into an agentic programming assistant — a developer AI that can autonomously plan, reason, and execute tasks across coding environments.

🌟 Core Abilities

  • Handles both small and large tasks: From writing a single function to refactoring entire projects.
  • Cross-platform integration: Works seamlessly across terminal (CLI), IDE (extension), and cloud environments.
  • Task reasoning and autonomy: Can track progress, decompose goals, and manage multi-step operations independently.
  • Secure by design: Runs in a sandbox with explicit permission requests for risky operations.

📈 Performance Highlights

  • Uses 93.7% fewer reasoning tokens than GPT-5 on the simplest tasks, while investing about 2× more computation on the most complex ones.
  • Successfully ran over 7 hours autonomously on long software tasks during testing.
  • Produces more precise code reviews than older Codex versions.

🟢 In short: Codex CLI 2025 is not just a code generator — it’s an intelligent coding agent capable of reasoning, multitasking, and working securely across terminal, IDE, and cloud environments.

2. Codex CLI vs Gemini CLI vs Claude Code: The New Era of AI in the Terminal

The command line has quietly become the next frontier for artificial intelligence.
While graphical AI tools dominate headlines, the real evolution is unfolding inside the terminal — where AI coding assistants now operate directly beside you, as part of your shell workflow.

Three major players define this new space: Codex CLI, Gemini CLI, and Claude Code.
Each represents a different philosophy of how AI should collaborate with developers — from speed and connectivity to reasoning depth. Let’s break down what makes each contender unique, and where they shine.


🧩 Codex CLI — OpenAI’s Code-Focused Terminal Companion

Codex CLI acts as a conversational layer over your terminal.
It listens to natural language commands, interprets your intent, and translates it into executable code or shell operations.
Now powered by OpenAI’s Codex5-Medium, it builds on the strengths of the o4-mini generation while adding adaptive reasoning and a larger 256K-token context window.

Once installed, Codex CLI integrates seamlessly with your local filesystem.
You can type:

“Create a Python script that fetches GitHub issues and logs them daily,”
and watch it instantly scaffold the files, import the right modules, and generate functional code.
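For a prompt like that, the generated scaffold might look roughly like this (a hypothetical sketch, not Codex's actual output; the repository name and log path are placeholders):

python

import datetime
import requests

REPO = "octocat/hello-world"  # placeholder repository

def fetch_issues(repo=REPO):
    """Fetch open issues from the GitHub REST API."""
    resp = requests.get(f"https://api.github.com/repos/{repo}/issues", timeout=10)
    resp.raise_for_status()
    return resp.json()

def log_issues(issues, path="issues.log"):
    """Append today's issues to a daily log file."""
    stamp = datetime.date.today().isoformat()
    with open(path, "a") as f:
        for issue in issues:
            f.write(f"{stamp}\t#{issue['number']}\t{issue['title']}\n")

if __name__ == "__main__":
    log_issues(fetch_issues())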

Codex CLI supports multiple languages — Python, JavaScript, Go, Rust, and more — and is particularly strong at rapid prototyping and bug fixing.
Its defining trait is speed: responses feel immediate, making it perfect for fast iteration cycles.

Best for: developers who want quick, high-quality code generation and real-time debugging without leaving the terminal.


🌤️ Gemini CLI — Google’s Adaptive Terminal Intelligence

Gemini CLI embodies Google’s broader vision for connected AI development — blending reasoning, utility, and live data access.
Built on Gemini 2.5 Pro, this CLI isn’t just a coding bot — it’s a true multitool for developers and power users alike.

Beyond writing code, Gemini CLI can run shell commands, retrieve live web data, or interface with Google Cloud services.
It’s ideal for workflows that merge coding with external context — for example:

  • fetching live API responses,

  • monitoring real-time metrics,

  • or updating deployment configurations on-the-fly.

Tight integration with VS Code, Google Cloud SDK, and Workspace tools turns Gemini CLI into a full-spectrum AI companion rather than a mere code generator.

Best for: developers seeking a versatile assistant that combines coding intelligence with live, connected utility inside the terminal.


🧠 Claude Code — Anthropic’s Deep Code Reasoner

If Codex is about speed, and Gemini is about connectivity, Claude Code represents depth.
Built on Claude Sonnet 4.5, Anthropic’s upgraded reasoning model, Claude Code is designed to operate as a true engineering collaborator.

It excels at understanding, refactoring, and maintaining large-scale codebases.
Claude Code can read entire repositories, preserve logic across files, and even generate complete pull requests with human-like commit messages.
Its 200K-token context window allows it to track dependencies, explain architectural patterns, and ensure code consistency over time.

Claude’s replies are more analytical — often including explanations, design alternatives, and justifications for each change.
It trades a bit of speed for a lot more insight and reliability.

Best for: professional engineers or teams managing complex, multi-file projects that demand reasoning, consistency, and full-codebase awareness.

3. Codex CLI vs Gemini CLI vs Claude Code: Hands-on With Two Real Projects

While benchmarks and specs are useful, nothing beats actually putting AI coding agents to work.
To see how they perform on real, practical front-end tasks, I tested three leading terminal assistants — Codex CLI (Codex5-Medium), Gemini CLI (Gemini 2.5 Pro), and Claude Code (Sonnet 4.5) — by asking each to build two classic web projects using only HTML, CSS, and JavaScript.

  • 🎮 Project 1: Snake Game — canvas-based, pixel-style, smooth movement, responsive.

  • ✅ Project 2: Todo App — CRUD features, inline editing, filters, localStorage, dark theme, accessibility + keyboard support.

🎮 Task 1 — Snake Game

Goal

Create a playable 2D Snake Game using HTML, CSS, and JavaScript.
Display a grid-based canvas with a moving snake that grows when it eats food.
The snake should move continuously and respond to arrow-key inputs.
The game ends when the snake hits the wall or itself.
Include a score counter and a restart button with pixel-style graphics and responsive design.

Prompt

Create a playable 2D Snake Game using HTML, CSS, and JavaScript.

  The game should display a grid-based canvas with a moving snake that grows when it eats food.

  The snake should move continuously and respond to keyboard arrow keys for direction changes.

  The game ends when the snake hits the wall or itself.

  Show a score counter and a restart button.

  Use smooth movement, pixel-style graphics, and responsive design for different screen sizes.

Observations

Codex CLI — Generated the basic canvas scaffold in seconds. Game loop, input, and scoring worked out of the box, but it required minor tuning for smoother turning and anti-reverse logic.

Gemini CLI — Delivered well-structured, commented code and used requestAnimationFrame properly. Gameplay worked fine, though the UI looked plain — more functional than fun.

Claude Code — Produced modular, production-ready code with solid collision handling, restart logic, and a polished HUD. Slightly slower response but the most complete result overall.

✅ Task 2 — Todo App

Goal

Build a complete, user-friendly Todo List App using only HTML, CSS, and JavaScript (no frameworks).
Features: add/edit/delete tasks, mark complete/incomplete, filter All / Active / Completed, clear completed, persist via localStorage, live counter, dark responsive UI, and full keyboard accessibility (Enter/Space/Delete).
Deliverables: index.html, style.css, app.js — clean, modular, commented, semantic HTML + ARIA.

Prompt

Develop a complete and user-friendly Todo List App using only HTML, CSS, and JavaScript (no frameworks). The app should include the following functionality and design requirements:

    1. Input field and ‘Add’ button to create new tasks.
    2. Ability to mark tasks as complete/incomplete via checkboxes.
    3. Inline editing of tasks by double-clicking — pressing Enter saves changes and Esc cancels.
    4. Delete buttons to remove tasks individually.
    5. Filter controls for All, Active, and Completed tasks.
    6. A ‘Clear Completed’ button to remove all completed tasks at once.
    7. Automatic saving and loading of todos using localStorage.
    8. A live counter showing the number of active (incomplete) tasks.
    9. A modern, responsive dark theme UI using CSS variables, rounded corners, and hover effects.
    10. Keyboard accessibility — Enter to add, Space to toggle, Delete to remove tasks.
      Ensure the project is well structured with three separate files:
    • index.html
    • style.css
    • app.js
      Code should be clean, modular, and commented, with semantic HTML and appropriate ARIA attributes for accessibility.

Observations

Codex CLI — Created a functional 3-file structure with working CRUD, filters, and persistence. Fast, but accessibility and keyboard flows needed manual reminders.

Gemini CLI — Balanced logic and UI nicely. Used CSS variables for a simple dark theme and implemented localStorage properly.
Performance was impressive — Gemini was the fastest overall, but its default design felt utilitarian, almost as if it “just wanted to get the job done.”
Gemini focuses on correctness and functionality rather than visual finesse.

Claude Code — Implemented inline editing, keyboard shortcuts, ARIA live counters, and semantic roles perfectly. The result was polished, responsive, and highly maintainable.

4. Codex CLI vs Gemini CLI vs Claude Code — Real-World Comparison

When testing AI coding assistants, speed isn’t everything — clarity, structure, and the quality of generated code all matter. To see how today’s top command-line tools compare, I ran the same set of projects across Claude Code, Gemini CLI, and Codex CLI, including a 2D Snake Game and a Todo List App.
Here’s how they performed.


Claude Code: Polished and Reliable

Claude Code consistently produced the most professional and complete results.
Its generated code came with clear structure, organized logic, and well-commented sections.
In the Snake Game test, Claude built the best-looking user interface, with a balanced layout, responsive design, and smooth movement logic.
Errors were handled cleanly, and the overall experience felt refined — something you could hand over to a production team with confidence.
Although it wasn’t the fastest, Claude made up for it with code quality, structure, and ease of prompt engineering.
If your workflow values polish, maintainability, and readability, Claude Code is the most dependable choice.


Gemini CLI: Fastest but Basic

Gemini CLI clearly took the top spot for speed.
It executed quickly, generated files almost instantly, and made iteration cycles shorter.
However, the output itself felt minimal and unrefined — both the UI and the underlying logic were quite basic compared to Claude or Codex.
In the Snake Game task, Gemini produced a playable result but lacked visual polish and consistent structure.
Documentation and comments were also limited.
In short, Gemini is great for rapid prototyping or testing ideas quickly, but not for projects where you need beautiful UI, advanced logic, or long-term maintainability.


Codex CLI: Flexible but Slower

Codex CLI offered good flexibility and handled diverse prompts reasonably well.
It could generate functional UIs with decent styling, somewhere between Gemini’s simplicity and Claude’s refinement.
However, its main drawback was speed — responses were slower, and sometimes additional manual intervention was needed to correct or complete the code.
Codex is still a solid option when you need to tweak results manually or explore multiple implementation approaches, but it doesn’t match Claude’s polish or Gemini’s speed.


Overall Impression

After testing multiple projects, the overall ranking became clear:

  • Gemini CLI is the fastest but produces simple and unpolished code.

  • Claude Code delivers the most reliable, structured, and visually refined results.

  • Codex CLI sits in between — flexible but slower and less cohesive.

Each tool has its strengths. Gemini is ideal for quick builds, Codex for experimentation, and Claude Code for professional, trust-ready outputs.

In short:

Gemini wins on speed. Claude wins on quality. Codex stands in between — flexible but slower.

Ask Questions about Your PDFs with Cohere Embeddings + Gemini LLM

🔍 Experimenting with Image Embedding Using Large AI Models

Recently, I experimented with embedding images using major AI models to build a multimodal semantic search system, where users can search images with text (and vice versa).

🧐 A Surprising Discovery

I was surprised to find that, as of 2025, Cohere was the only major provider I found that supports direct image embedding via API.
Other big-name models like OpenAI's and Google's Gemini do support image input in general, but do not clearly offer a dedicated image-embedding API.


Reason for Choosing Cohere

I chose to try Cohere’s embed-v4.0 because:

  • It supports embedding text, images, and even PDF documents (converted to images) into the same vector space.

  • You can choose the embedding dimension (the default is 1536; this walkthrough uses 1024, as shown in the sketch after this list).

  • It returns normalized embeddings that are ready to use for search and classification tasks.
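As a preview of the API shape, a minimal text-only call looks something like this (a sketch; it assumes the `co` client configured in the next section, and that output_dimension accepts 256, 512, 1024, or 1536):

python

res = co.embed(
    model="embed-v4.0",
    texts=["What is multimodal search?"],
    input_type="search_document",
    embedding_types=["float"],
    output_dimension=1024,  # selectable size; 1536 is the default
)
vector = res.embeddings.float_[0]  # unit-normalized vector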


⚙️ How I Built the System

I used Python for implementation. The system has two main flows:

1️⃣ Document Preparation Flow

  • Load documents, images, or text data that I want to store.

  • Use the Cohere API to embed them into vector representations.

  • Save these vectors in a database or vector store for future search queries.

2️⃣ User Query Flow

  • When a user asks a question or types a query:

    • Use Cohere to embed the query into a vector.

    • Search for the most similar documents in the vector space.

    • Return results to the user using an LLM (Large Language Model) like Gemini by Google.


🔑 How to Get API Keys

🔧 Flow 1: Setting Up Cohere and Gemini in Python

✅ Step 1: Install and Set Up Cohere

Run the following command in your terminal to install the Cohere Python SDK:

pip install -q cohere

Then, initialize the Cohere client in your Python script:

import cohere

# Replace <<YOUR_COHERE_KEY>> with your actual Cohere API key
cohere_api_key = "<<YOUR_COHERE_KEY>>"
co = cohere.ClientV2(api_key=cohere_api_key)


✅ Step 2: Install and Set Up Gemini (Google Generative AI)

Install the Gemini client library with:

pip install -q google-genai

Then, initialize the Gemini client in your Python script:

from google import genai

# Replace <<YOUR_GEMINI_KEY>> with your actual Gemini API key
gemini_api_key = "<<YOUR_GEMINI_KEY>>"
client = genai.Client(api_key=gemini_api_key)
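Hard-coding keys is fine for a quick demo, but a safer pattern is to load them from environment variables (a sketch; the variable names are my own convention, not required by either SDK):

python

import os

cohere_api_key = os.environ["COHERE_API_KEY"]   # assumed variable name
gemini_api_key = os.environ["GEMINI_API_KEY"]   # assumed variable name
co = cohere.ClientV2(api_key=cohere_api_key)
client = genai.Client(api_key=gemini_api_key)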

📌 Flow 1: Document Preparation and Embedding

We will walk through the steps to turn a PDF into embedding data with Cohere.


📥 Step 1: Download the PDF

We start by downloading the PDF from a given URL.

python

import requests

def download_pdf_from_url(url, save_path="downloaded.pdf"):
    # Fetch the PDF and write it to disk
    response = requests.get(url)
    if response.status_code == 200:
        with open(save_path, "wb") as f:
            f.write(response.content)
        print("PDF downloaded successfully.")
        return save_path
    else:
        raise Exception(f"PDF download failed. Error code: {response.status_code}")

# Example usage
pdf_url = "https://sgp.fas.org/crs/misc/IF10244.pdf"
local_pdf_path = download_pdf_from_url(pdf_url)
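(For context: IF10244 is a short Congressional Research Service brief on U.S. wildfire statistics, which is why the sample question in Flow 2 asks about wildfire counts.)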


🖼️ Step 2: Convert PDF Pages to Text + Image

We extract both text and image for each page using PyMuPDF.

python

import fitz  # PyMuPDF
import base64
import io
from PIL import Image

def extract_page_data(pdf_path):
    doc = fitz.open(pdf_path)
    pages_data = []
    img_paths = []

    for i, page in enumerate(doc):
        text = page.get_text()

        # Render the page to a PNG image
        pix = page.get_pixmap()
        image = Image.open(io.BytesIO(pix.tobytes("png")))

        # Encode the image as a base64 data URL
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        encoded_img = base64.b64encode(buffered.getvalue()).decode("utf-8")
        data_url = f"data:image/png;base64,{encoded_img}"

        # Fuse text and image into one multimodal input
        content = [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": data_url}},
        ]

        pages_data.append({"content": content})
        img_paths.append({"data_url": data_url})

    return pages_data, img_paths

# Example usage
pages, img_paths = extract_page_data(local_pdf_path)
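One optional tweak: get_pixmap() renders pages at 72 dpi by default, which can blur small chart labels. PyMuPDF can render at higher resolution via a transform matrix (a one-line change inside the loop above):

python

# Render at 2x resolution for crisper page images
pix = page.get_pixmap(matrix=fitz.Matrix(2, 2))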


📤 Step 3: Embed Using Cohere

Now, send the fused text + image inputs to Cohere’s embed-v4.0 model.

python

res = co.embed(
    model="embed-v4.0",
    inputs=pages,  # fused text + image inputs
    input_type="search_document",
    embedding_types=["float"],
    output_dimension=1024,
)

embeddings = res.embeddings.float_
print(f"Number of embedded pages: {len(embeddings)}")


Flow 1 complete: You now have the embedded vector representations of your PDF pages.

👉 Proceed to Flow 2 (e.g., storing, indexing, or querying the embeddings).
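Before moving on, it is worth persisting the vectors so Flow 1 does not need to be re-run every session. A minimal sketch using NumPy files (file names are placeholders; a production system would use a proper vector store):

python

import numpy as np

# Save the page vectors; img_paths can be saved alongside to map hits back to images
np.save("page_embeddings.npy", np.asarray(embeddings))

# Later, or in another session:
embeddings = np.load("page_embeddings.npy")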

🔍 Flow 2: Ask a Question and Retrieve the Answer Using Image + LLM

This flow allows the user to ask a natural language question, find the most relevant image using Cohere Embed v4, and then answer the question using Gemini 2.5 Vision LLM.


💬 Step 1: Ask the Question

We define the user query in plain English.

python
question = "What was the total number of wildfires in the United States from 2007 to 2015?"

🧠 Step 2: Convert the Question to Embedding & Find Relevant Image

We use embed-v4.0 with input type search_query, then calculate cosine similarity between the question embedding and previously embedded document images.
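Why does a plain dot product work here? Because embed-v4.0 returns unit-normalized vectors, the dot product equals cosine similarity. A toy sanity check, separate from the pipeline:

python

import numpy as np

a = np.array([0.6, 0.8])   # unit-length toy vectors
b = np.array([1.0, 0.0])
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
assert np.isclose(np.dot(a, b), cosine)  # identical when vectors are normalized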

python

import numpy as np
from IPython.display import display  # for notebook image display

def search(question, max_img_size=800):
    # Get the embedding for the query
    api_response = co.embed(
        model="embed-v4.0",
        input_type="search_query",
        embedding_types=["float"],
        texts=[question],
        output_dimension=1024,
    )

    query_emb = np.asarray(api_response.embeddings.float_[0])

    # Cosine similarity via dot product (embed-v4.0 vectors are unit-normalized)
    cos_sim_scores = np.dot(embeddings, query_emb)
    top_idx = np.argmax(cos_sim_scores)  # most relevant page

    base64url = img_paths[top_idx]["data_url"]

    print("Question:", question)
    print("Most relevant image: page index", top_idx)  # avoid printing the full data URL

    # Strip the data-URL prefix before decoding
    if base64url.startswith("data:image"):
        base64_str = base64url.split(",")[1]
    else:
        base64_str = base64url

    image_data = base64.b64decode(base64_str)
    image = Image.open(io.BytesIO(image_data))

    image.thumbnail((max_img_size, max_img_size))
    display(image)

    return base64url


🤖 Step 3: Use Vision-LLM (Gemini 2.5) to Answer

We use Gemini 2.5 Flash to answer the question based on the most relevant image.

python

def answer(question, base64_img_str):
    # Strip the data-URL prefix before decoding
    if base64_img_str.startswith("data:image"):
        base64_img_str = base64_img_str.split(",")[1]

    image_bytes = base64.b64decode(base64_img_str)
    image = Image.open(io.BytesIO(image_bytes))

    prompt = [
        f"""Answer the question based on the following image.
Don't use markdown.
Please provide enough context for your answer.

Question: {question}""",
        image,
    ]

    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",
        contents=prompt,
    )

    llm_answer = response.text  # avoid shadowing the function name
    print("LLM Answer:", llm_answer)
    return llm_answer


▶️ Step 4: Run the Full Flow

python
top_image_path = search(question)
answer(question, top_image_path)

🧪 Example Usage:

question = "What was the total number of wildfires in the United States from 2007 to 2015?"

# Step 1: Find the best-matching image
top_image_path = search(question)

# Step 2: Use the image to answer the question
answer(question, top_image_path)

🧾 Output:

Question: What was the total number of wildfires in the United States from 2007 to 2015?

Most relevant image:

[The matched PDF page, showing the wildfire statistics charts, is displayed here.]

LLM Answer: Based on the provided image, to find the total number of wildfires in the United States from 2007 to 2015, we need to sum the number of wildfires for each year in this period. Figure 1 shows the annual number of fires in thousands from 1993 to 2022, which covers the requested period. Figure 2 provides the specific number of fires for 2007 and 2015 among other years. Using the specific values from Figure 2 for 2007 and 2015, and estimating the number of fires for the years from 2008 to 2014 from Figure 1, we can calculate the total.

 

The number of wildfires in 2007 was 67.8 thousand (from Figure 2).

Estimating from Figure 1:

2008 was approximately 75 thousand fires.

2009 was approximately 75 thousand fires.

2010 was approximately 67 thousand fires.

2011 was approximately 74 thousand fires.

2012 was approximately 68 thousand fires.

2013 was approximately 47 thousand fires.

2014 was approximately 64 thousand fires.

The number of wildfires in 2015 was 68.2 thousand (from Figure 2).

 

Summing these values:

Total = 67.8 + 75 + 75 + 67 + 74 + 68 + 47 + 64 + 68.2 = 606 thousand fires.

 

Therefore, the total number of wildfires in the United States from 2007 to 2015 was approximately 606,000. This number is based on the sum of the annual number of fires obtained from Figure 2 for 2007 and 2015, and estimates from Figure 1 for the years 2008 through 2014.

Try this full pipeline on Google Colab: https://colab.research.google.com/drive/1kdIO-Xi0MnB1c8JrtF26Do3T54dij8Sf

🧩 Final Thoughts

This simple yet powerful two-step pipeline demonstrates how you can combine Cohere’s Embed v4 with Gemini’s Vision-Language capabilities to build a system that understands both text and images. By embedding documents (including large images) and using semantic similarity to retrieve relevant content, we can create a more intuitive, multimodal question-answering experience.

This approach is especially useful in scenarios where information is stored in visual formats like financial reports, dashboards, or charts — allowing LLMs to not just “see” the image but reason over it in context.

Multimodal retrieval-augmented generation (RAG) is no longer just theoretical — it’s practical, fast, and deployable today.