Benefits of Using Markdown (MD) vs XLSX for a Knowledge Base on Dify

Why Use Markdown?

1. Better AI Processing

  • Semantic understanding: AI models process continuous text more effectively than fragmented cell data
  • Context preservation: Paragraph-based content maintains relationships between information
  • Effective retrieval: Vector embeddings capture meaning better from natural language text
  • Natural chunking: Content splits logically by sections, preserving context in each chunk
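
To illustrate the natural chunking point, here is a minimal Python sketch that splits a Markdown document at its headings so that each chunk keeps its own section title. It is only an illustration and not Dify's actual chunker; the function name split_by_headings and the sample text are made up for this example, and headings inside code fences are not handled.

```python
import re


def split_by_headings(markdown: str) -> list[str]:
    """Split Markdown into chunks, starting a new chunk at each ATX heading."""
    chunks, current = [], []
    for line in markdown.splitlines():
        # A line such as "# Title" or "## Section" opens a new chunk.
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that ended up empty after stripping whitespace.
    return [c for c in chunks if c]


# Hypothetical sample document with two sections.
doc = "# Refunds\nRefunds take 5 days.\n\n# Shipping\nWe ship worldwide."
chunks = split_by_headings(doc)
```

Each chunk starts with its heading, so the embedding for "Refunds take 5 days." still carries the "Refunds" context. The same rows stored as spreadsheet cells have no comparable boundary to split on.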

2. Cost Efficiency

  • Smaller storage: Plain text is typically much smaller (e.g. 5-10KB) than the same content in Excel with its formatting overhead (50-100KB+)
  • Lower token usage: Markdown structure is simpler, reducing embedding and processing tokens
  • Faster processing: Parsing plain text is significantly faster than unpacking and parsing an XLSX file, which is a ZIP archive of XML parts

3. Operational Benefits

  • Version control friendly: Git tracks line-by-line changes effectively
  • Universal editing: Any text editor works, no proprietary software needed
  • Better collaboration: Merge conflicts are easier to resolve in plain text
  • Automation ready: Easily integrated into CI/CD and documentation workflows

When to Use Excel?

XLSX may be suitable when:

  • You need structured tabular data with calculations/formulas
  • Data is primarily numerical with specific formatting requirements
  • You need direct import/export with database systems or business intelligence tools
  • Non-technical users need to edit data in a familiar spreadsheet interface

However, for knowledge bases consumed by AI, converting to Markdown usually yields better results, even for tabular data.

Demo: Converting XLSX to MD

You can create a custom plugin tool on Dify to convert Excel files to Markdown. Here’s how I built mine:

Implementation Steps

  1. Accept XLSX file input
    • Require Xlsx File parameter and wrap its blob in a BytesIO stream
  2. Configure column selection
    • Extract Selected Columns parameter (accepts a list, a JSON string, or a comma-separated string)
    • Ensure it is non-empty
  3. Set delimiter
    • Resolve Delimiter parameter for separating entries
  4. Parse Excel file
    • Read the first worksheet into a DataFrame using pandas
    • Verify all requested columns exist in the DataFrame header
    • Subset DataFrame to selected columns only
    • Normalize NaN values to None
  5. Transform to structured data
    • Convert each row into a dictionary keyed by selected column names
    • If no rows remain, emit a message indicating there is no data and stop
  6. Generate Markdown
    • Build each entry by writing one "column: value" line per selected column
    • Append delimiter between entries
    • Join all blocks into final Markdown
  7. Output file
    • Derive filename from uploaded file metadata
    • Emit blob message with Markdown bytes and metadata
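
The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the actual plugin code: the helper names (parse_columns, read_rows, rows_to_markdown), the default delimiter, and the choice to render empty cells as blank values are all hypothetical. The Dify plugin wiring (reading the tool parameters, deriving the filename, emitting the blob message) is left out on purpose.

```python
import io
import json


def parse_columns(raw):
    """Accept a list, a JSON array string, or a comma-separated string."""
    if isinstance(raw, str):
        text = raw.strip()
        raw = json.loads(text) if text.startswith("[") else text.split(",")
    columns = [str(c).strip() for c in (raw or []) if str(c).strip()]
    if not columns:
        raise ValueError("selected columns must not be empty")
    return columns


def read_rows(xlsx_bytes, columns):
    """Read the first worksheet and return one dict per row."""
    # Imported lazily so the pure-Python helpers work without pandas.
    # Reading .xlsx files with pandas also requires openpyxl.
    import pandas as pd

    df = pd.read_excel(io.BytesIO(xlsx_bytes), sheet_name=0)
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise ValueError(f"columns not found in header: {missing}")
    df = df[columns]
    # Cast to object first so NaN can become None in numeric columns too.
    df = df.astype(object).where(df.notna(), None)
    return df.to_dict(orient="records")


def rows_to_markdown(rows, columns, delimiter="\n\n---\n\n"):
    """Render each row as 'column: value' lines, joined by the delimiter."""
    blocks = []
    for row in rows:
        lines = []
        for col in columns:
            value = row.get(col)
            # Assumption: empty cells (None) are written as a blank value.
            lines.append(f"{col}: {'' if value is None else value}")
        blocks.append("\n".join(lines))
    return delimiter.join(blocks)


# Made-up rows standing in for the output of read_rows().
columns = parse_columns("Question, Answer")
rows = [
    {"Question": "How long do refunds take?", "Answer": "5 days"},
    {"Question": "Do you ship abroad?", "Answer": None},
]
markdown = rows_to_markdown(rows, columns)
```

In a real tool, the caller would check for an empty rows list before calling rows_to_markdown and emit the "no data" message instead. The resulting string would then be encoded to bytes for the output file.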

Sample

Input

 

Output