AI - Scuti

Use Cases Of Generative AI In Manufacturing: 10 Applications And Future Outlook

Posted on July 9, 2025 by Contents writer

Hello, I am Kakeya, the representative of Scuti.

Our company specializes in services such as Offshore Development And Lab-type Development in Vietnam, as well as Generative AI Consulting.

Recently, we have been fortunate to receive numerous requests for system development in collaboration with generative AI.

Even professionals who already work with generative AI in manufacturing may want a deeper understanding of it. Traditionally, manufacturing has been time-consuming and costly, which makes it difficult to respond swiftly to changes in the market.

However, with the advent of generative AI, every aspect of manufacturing — from product design to supply chain management — is undergoing a dramatic transformation. The market size for generative AI is projected to reach USD 6,398.8 million by 2032, indicating remarkable growth.

In this article, we introduce 10 specific applications of generative AI in the manufacturing industry and provide an in-depth explanation of the challenges during implementation and future outlook. Discover the groundbreaking changes that generative AI is bringing to manufacturing and gain insights to apply them to your business.

Fundamentals Of Generative AI And Its Role In The Manufacturing Industry

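If you want to learn more about Generative AI, be sure to check out this article first.

Related article: A Comprehensive Guide To Implementing Generative AI: From Fundamentals To Practical Applications And Future Outlook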

What Is Generative AI? Differences From Traditional AI

Generative AI is a type of artificial intelligence specialized not only in analyzing existing data but also in creating new content and solutions. It learns patterns from vast amounts of data and has the ability to generate text, images, videos, designs, audio, code, simulations, and more based on user instructions.

While traditional AI performs predictions and analyses based on past data, generative AI is distinguished by its ability to create something new. This creative capability enables the manufacturing industry to generate new value as well.

The Role Of Generative AI In The Manufacturing Industry

In the manufacturing industry, generative AI is utilized across a wide range of areas, including product design, process optimization, predictive maintenance, quality control, and supply chain management. For example, in product design, generative AI can automatically generate numerous design candidates based on the conditions specified by the designer.

This significantly shortens the design process and enables the development of more innovative products. Moreover, generative AI also contributes to improving efficiency across the entire supply chain, helping reduce costs and shorten lead times.

10 Use Cases Of Generative AI In The Manufacturing Industry

1. Product Design: Automatic Generation And Optimization Of Designs

In traditional product design, designers had to manually create drawings and repeatedly go through the process of producing and evaluating prototypes, which required significant time and cost.

However, by utilizing generative AI, designers can simply input product specifications and requirements, and the AI will automatically generate numerous design candidates. This greatly shortens the design process and enables the development of more innovative products.

For example, the aircraft manufacturer Airbus successfully used Autodesk’s generative AI tool to design aircraft components that are both lighter and stronger.

The AI proposed innovative shapes that were previously unthinkable in conventional design, while still meeting constraints such as strength and weight. In this way, generative AI significantly contributes to innovation in product design.

2. Predictive Maintenance: Reducing Downtime Through Failure Prediction

In manufacturing settings, downtime caused by machine failures can lead to significant losses. Traditional maintenance methods have mainly relied on “preventive maintenance,” which involves periodic inspections. However, this approach carries the risk of over-maintenance or unexpected breakdowns. Predictive maintenance using generative AI enables the analysis of sensor data and other inputs to forecast machine failures in advance.

By performing maintenance at the optimal timing, downtime can be minimized.

For instance, Siemens offers a predictive maintenance system called “Senseye Predictive Maintenance,” which incorporates generative AI functionality. This system analyzes machine operation status and sensor data in real time to detect early signs of failure. As a result, it significantly reduces downtime and contributes to improved productivity.
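To make "detecting early signs of failure from sensor data" more concrete, here is a minimal Python sketch. It does not show how Senseye or any generative model works. It is a simple rolling z-score baseline, and the window size, threshold, and vibration readings are illustrative assumptions.

```python
from statistics import mean, stdev

def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: no scale to compare against
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical vibration readings (mm/s); the spike at index 8 is injected.
vibration = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 2.0, 1.9, 4.5, 2.1]
print(find_anomalies(vibration))  # prints [8]
```

In practice a flagged index would trigger an inspection rather than an automatic shutdown, and the threshold would be tuned per machine.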

3. Demand Forecasting: Flexibly Responding To Demand Fluctuations

Accurate demand forecasting is extremely important in the manufacturing industry. Excess inventory leads to increased storage costs, while inventory shortages result in lost business opportunities. Generative AI analyzes past sales data, market trends, economic indicators, and more to predict future demand with high accuracy. This enables optimized inventory management, cost reduction, and maximized profits.

For example, in the retail sector, generative AI is used to forecast demand while taking into account factors such as seasons, weather, and events. This has led to successful optimization of inventory management, allowing businesses to respond flexibly to demand fluctuations and improve customer satisfaction.
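As a point of reference for what "predicting future demand from past sales" means, the sketch below uses simple exponential smoothing in Python. This is a classical forecasting baseline, not a generative model, and the smoothing factor and sales figures are made-up assumptions.

```python
def exponential_smoothing_forecast(sales, alpha=0.5):
    """Forecast the next period with simple exponential smoothing."""
    if not sales:
        raise ValueError("sales history must not be empty")
    level = sales[0]
    for value in sales[1:]:
        # Blend the newest observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical monthly unit sales.
monthly_units = [100, 120, 110, 130]
print(exponential_smoothing_forecast(monthly_units))  # prints 120.0
```

A higher alpha reacts faster to recent changes, while a lower alpha smooths out noise. Factors such as seasons, weather, and events would need a richer model than this.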

4. Customized Manufacturing: Delivering Products Tailored To Customer Needs

As customer needs continue to diversify, demand for customized manufacturing is increasing. However, traditional customized manufacturing has faced challenges such as increased complexity in design and production processes, leading to higher costs and longer lead times. By utilizing generative AI, it is possible to automate product design based on individual customer requests and achieve efficient customized manufacturing.

For example, in the apparel industry, services have emerged that use generative AI to automatically design clothing tailored to a customer’s body shape and preferences, and produce it on demand. This enables rapid delivery of optimal products to each customer, contributing to higher customer satisfaction.

5. Supply Chain Management: Achieving Efficient Procurement And Logistics

Streamlining the supply chain is a critical challenge for the manufacturing industry. Generative AI optimizes various processes across the entire supply chain, including demand forecasting, inventory management, and transportation route optimization. This enables cost reduction and improved efficiency from procurement to logistics.

For example, in the logistics industry, generative AI is being used to optimize delivery routes and improve truck loading efficiency. As a result, transportation costs are reduced and delivery times are shortened, enhancing the overall efficiency of the supply chain.

6. Quality Control: Detecting Defective Products And Improving Quality

Product quality control is extremely important in the manufacturing industry. Traditional quality control often relies on visual inspections by human inspectors, which carries the risk of human error and oversight.

However, by utilizing generative AI, it becomes possible to automatically detect defective products using technologies such as image recognition. This leads to improved efficiency and accuracy in quality control.

For example, in the automotive industry, generative AI is being used to automatically detect scratches on car bodies or inconsistencies in paint application. As a result, human inspection errors are reduced, enabling the delivery of high-quality products.

7. Workforce Management: Optimal Staffing And Task Allocation

In manufacturing sites, proper staffing and task allocation are essential. Generative AI analyzes factors such as employees’ skills, experience, and workload to suggest the optimal assignment of personnel and tasks.

This leads to improved operational efficiency and better utilization of the workforce.

For example, in warehouse management, generative AI is being used to streamline picking operations and optimize staff allocation. As a result, work time is reduced, costs are lowered, and overall productivity is enhanced.

8. Research And Development: Accelerating The Development Of New Materials And Products

The development of new materials and products is essential for manufacturers to maintain their competitiveness. Generative AI analyzes vast amounts of material and experimental data to suggest potential new materials and product candidates.

This enables greater efficiency and speed in research and development.

For example, in the chemical industry, generative AI is being used to search for candidate compounds for new drugs and to predict material properties. As a result, the R&D process is accelerated, and the time to market is significantly reduced.

9. Document Creation And Compliance: Improving Efficiency Through Automation

In the manufacturing industry, it is necessary to create various types of documents, such as product specifications, manuals, and reports. Generative AI can automate the creation of these documents, significantly improving efficiency.

It also features the ability to automatically check compliance with laws, regulations, and industry standards. This helps companies reduce the time and cost associated with documentation while making compliance easier to maintain.

For example, when creating product specifications or manuals, generative AI can automatically collect the necessary information and generate accurate and timely documents.

As a result, work efficiency is improved, and high-quality documentation can be delivered.

10. Energy Consumption Optimization: Reducing Costs And Environmental Impact

In the manufacturing industry, reducing energy consumption is important both for cutting costs and minimizing environmental impact. Generative AI analyzes energy usage within the factory in detail and proposes optimal energy consumption patterns.

This leads to improved energy efficiency and cost reduction.

Challenges In Implementing Generative AI

1. Quality And Quantity Of Data

The performance of generative AI heavily depends on the quality and quantity of the data used for training. In manufacturing sites, large volumes of data—such as sensor data and production data—are accumulated, but these are not always suitable for training generative AI. To improve data quality, tasks such as data cleaning and preprocessing are required.
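A minimal Python sketch of the kind of cleaning step mentioned above: missing or physically implausible sensor readings are replaced with the median of the valid ones. The valid range and the temperature values are assumptions for illustration, and real pipelines usually need more careful handling, such as interpolation or dropping whole segments.

```python
from statistics import median

def clean_sensor_series(values, lower, upper):
    """Replace missing or out-of-range readings with the median of valid ones."""
    def is_valid(v):
        return v is not None and lower <= v <= upper

    valid = [v for v in values if is_valid(v)]
    if not valid:
        raise ValueError("no valid readings to impute from")
    fill = median(valid)
    return [v if is_valid(v) else fill for v in values]

# Hypothetical temperature log with one gap and one sensor glitch.
raw_temperatures = [71.5, None, 72.0, 999.0, 70.5]
print(clean_sensor_series(raw_temperatures, lower=0.0, upper=150.0))
# prints [71.5, 71.5, 72.0, 71.5, 70.5]
```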

2. Security And Privacy

The training of generative AI may involve the use of highly confidential data. Therefore, ensuring data security and privacy is extremely important. It is necessary to implement appropriate security measures to prevent data leaks and unauthorized access.

3. Ethical Issues

Generative AI has the potential to produce content that contains misinformation or bias. Therefore, it is essential to thoroughly consider the ethical issues surrounding its use. Establishing and adhering to ethical guidelines is required.

The Future Of Manufacturing Brought By Generative AI

Generative AI is expected to contribute to the manufacturing industry by automating and streamlining various processes, leading to increased productivity, cost reduction, improved quality, and shorter lead times. Moreover, generative AI holds the potential to generate innovative ideas and solutions that may not be conceived by humans.

In the future, the use of generative AI will further expand across all areas of manufacturing, driving the evolution toward a smarter, more efficient, and more sustainable industry.

Combining tmux and Claude to Build an Automated AI Agent System (for Mac & Linux)

Posted on June 29, 2025 by hello@scuti

1. Introduction

With the rapid growth of AI, multi-agent systems are attracting more attention due to their ability to coordinate, split tasks, and handle complex automation. An “agent” can be an independent AI responsible for a specific role or task.

In this article, I’ll show you how to combine tmux (a powerful terminal multiplexer) with Claude (Anthropic’s AI model) to build a virtual organization. Here, AI agents can communicate, collaborate, and work together automatically via the terminal.

2. What is tmux?

tmux lets you split your terminal into multiple windows or sessions, each running its own process independently. Even if you disconnect, these sessions stay alive. This is super useful when you want to run several agents in parallel, each in their own terminal, without interfering with each other.

3. What is Claude?

Claude is an advanced language AI model developed by Anthropic. It can understand and respond to text requests, and it’s easy to integrate into automated systems—acting as a “virtual employee” taking on part of your workflow.

4. Why combine tmux and Claude?

Parallel & Distributed: Each agent is an independent Claude instance running in its own tmux session.

Workflow Automation: Easily simulate complex workflows between virtual departments or roles.

Easy Debug & Management: You can observe each agent’s logs in separate panes or sessions.

5. System Architecture

Let’s imagine a simple company structure:

PRESIDENT: Project Director (sets direction, gives instructions)

boss1: Team Leader (splits up tasks)

worker1, worker2, worker3: Team members (do the work)

Each agent has its own instruction file so it knows its role when starting up.

Agents communicate using a script:

./agent-send.sh [recipient] "[message]"

Workflow:

PRESIDENT → boss1 → workers → boss1 → PRESIDENT
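The repository's agent-send.sh is not reproduced here. As a rough sketch of how such a relay could work, the Python below maps a recipient name to a tmux target and builds the `tmux send-keys` commands that would type a message into that agent's pane. The pane targets (for example `multiagent:0.0`) and the function names are assumptions, not taken from the repository.

```python
import subprocess

# Assumed layout: one "president" session and one "multiagent" session
# with four panes. The real targets are defined by the repository's scripts.
AGENT_TARGETS = {
    "president": "president",
    "boss1": "multiagent:0.0",
    "worker1": "multiagent:0.1",
    "worker2": "multiagent:0.2",
    "worker3": "multiagent:0.3",
}

def build_send_commands(recipient, message):
    """Build the tmux commands that type a message into an agent's pane."""
    target = AGENT_TARGETS.get(recipient)
    if target is None:
        raise ValueError(f"unknown agent: {recipient}")
    return [
        ["tmux", "send-keys", "-t", target, "-l", message],  # -l: send text literally
        ["tmux", "send-keys", "-t", target, "Enter"],        # then press Enter
    ]

def send_message(recipient, message):
    """Run the commands; requires tmux and the target sessions to exist."""
    for command in build_send_commands(recipient, message):
        subprocess.run(command, check=True)

# Dry run: only print what would be executed.
for command in build_send_commands("boss1", "Split the todo app into three tasks"):
    print(command)
```

Separating command construction from execution keeps the routing logic easy to inspect without a running tmux server.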

6. Installation

Since the code is a bit long, I’ll just share the GitHub link to keep things short.

tmux:
Install guide: tmux Installing Guide

Claude:
Install guide: Claude Setup Guide

Git:
Install guide: Git Download

Clone the project:

bash
git clone https://github.com/mhieupham1/claudecliagent

Inside, you’ll find the main folders and files:

CLAUDE.md: Describes the agent architecture, communication, and workflows.

instructions/: Contains guidance for each role.

.claude/: JSON files to manage permissions for bash scripts.

setup.sh: Launches tmux sessions for PRESIDENT, boss1, worker1, worker2, worker3 so agents can talk to each other.

agent-send.sh: Script for sending messages between agents.

7. Deployment

Run the setup script:

bash
./setup.sh

This will create tmux sessions for PRESIDENT and the agents (boss1, worker1, worker2, worker3) in the background.
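For readers curious what a setup script of this kind might do under the hood, here is a hedged Python sketch that only lists the tmux commands involved. It is not the repository's setup.sh. The session names `president` and `multiagent` come from this article, while the pane layout (one pane for boss1 plus one per worker) is an assumption.

```python
def build_setup_commands(workers=3):
    """List tmux commands creating a 'president' session and a 'multiagent'
    session with one pane for boss1 plus one per worker (assumed layout)."""
    commands = [
        ["tmux", "new-session", "-d", "-s", "president"],
        ["tmux", "new-session", "-d", "-s", "multiagent"],
    ]
    for _ in range(workers):
        # Each split adds one pane to the multiagent session.
        commands.append(["tmux", "split-window", "-t", "multiagent"])
        # Re-tile so that repeated splits do not run out of space.
        commands.append(["tmux", "select-layout", "-t", "multiagent", "tiled"])
    return commands

# Dry run: print the commands instead of executing them.
for command in build_setup_commands():
    print(" ".join(command))
```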

To access the PRESIDENT session:

bash
tmux attach-session -t president

To access the multiagent session:

bash
tmux attach-session -t multiagent

In the PRESIDENT session, run the claude command to set up the Claude CLI.

Do the same for the other agents.

Now, in the PRESIDENT window, try entering a request like:

you are president. create a todo list website now

PRESIDENT will kick off the to-do list project: it sends instructions to boss1, and boss1 assigns tasks to worker1, worker2, and worker3.

You can watch boss1 and the workers do their jobs, approve commands to create code files, and wait for them to finish.

8. Conclusion

Combining tmux and Claude lets you create a multi-agent AI system that simulates a real company: communicating, collaborating, and automating complex workflows. Having each agent in its own session makes it easy to manage, track progress, and debug.

This system is great for AI research, testing, or even real-world workflow automation, virtual team assistants, or teamwork simulations.

If you’re interested in developing multi-agent AI systems, try deploying this model, customize roles and workflows to your needs, and feel free to contribute or suggest improvements to the original repo!

Introducing Claude 4 and Its Capabilities

Posted on June 24, 2025 by Duong Nguyen

Claude 4 refers to the latest generation of AI models developed by Anthropic, a company founded by former OpenAI researchers. The family, announced in May 2025, consists of two models: Claude Opus 4 and Claude Sonnet 4.

According to Anthropic, Claude Opus 4 is its most powerful model yet and the best coding model in the world, leading on SWE-bench (72.5%) and Terminal-bench (43.2%). It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, with the ability to work continuously for several hours. This dramatically outperforms all Sonnet models and significantly expands what AI agents can accomplish.

Claude Opus 4 excels at coding and complex problem-solving, powering frontier agent products. Cursor calls it state-of-the-art for coding and a leap forward in complex codebase understanding. Replit reports improved precision and dramatic advancements for complex changes across multiple files. Block calls it the first model to boost code quality during editing and debugging in its agent, codename goose, while maintaining full performance and reliability. Rakuten validated its capabilities with a demanding open-source refactor running independently for 7 hours with sustained performance. Cognition notes Opus 4 excels at solving complex challenges that other models can’t, successfully handling critical actions that previous models have missed.

Claude Sonnet 4 significantly improves on Sonnet 3.7’s industry-leading capabilities, excelling in coding with a state-of-the-art 72.7% on SWE-bench. The model balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.

GitHub says Claude Sonnet 4 soars in agentic scenarios and will introduce it as the model powering the new coding agent in GitHub Copilot. Manus highlights its improvements in following complex instructions, clear reasoning, and aesthetic outputs. iGent reports Sonnet 4 excels at autonomous multi-feature app development, as well as substantially improved problem-solving and codebase navigation—reducing navigation errors from 20% to near zero. Sourcegraph says the model shows promise as a substantial leap in software development—staying on track longer, understanding problems more deeply, and providing more elegant code quality. Augment Code reports higher success rates, more surgical code edits, and more careful work through complex tasks, making it the top choice for their primary model.

According to Anthropic, these models advance its customers’ AI strategies across the board: Opus 4 pushes boundaries in coding, research, writing, and scientific discovery, while Sonnet 4 brings frontier performance to everyday use cases as an instant upgrade from Sonnet 3.7.

Key Strengths of Claude 4

1. Superior Reasoning and Intelligence

Claude 4 ranks at the top in benchmark evaluations such as:

MMLU (Massive Multitask Language Understanding)
GSM8k (math problem solving)
HumanEval (coding)

It rivals or exceeds OpenAI’s GPT-4-turbo and Google Gemini 1.5 Pro in complex reasoning, long-context understanding, and task execution.

2. Massive Context Window (Up to 200K Tokens)

Claude 4 can read and reason over hundreds of pages at once, making it perfect for:

Analyzing lengthy legal or scientific documents
Comparing large codebases
Summarizing long texts or reports
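To get a feel for what a 200K-token window means in practice, the Python sketch below estimates whether a document fits. The figure of roughly 4 characters per token is a coarse assumption for English text, not an exact rule, and the reply reserve is an arbitrary illustrative value. For real work, use a proper tokenizer or the provider's token-counting facility.

```python
CONTEXT_WINDOW_TOKENS = 200_000   # figure quoted in this article
CHARS_PER_TOKEN = 4               # rough assumption for English text

def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division

def fits_in_context(text, reserved_for_reply=4_000):
    """True if the estimated prompt size leaves room for the reply."""
    return estimate_tokens(text) + reserved_for_reply <= CONTEXT_WINDOW_TOKENS

# Hypothetical report of about 500,000 characters.
report = "word " * 100_000
print(estimate_tokens(report), fits_in_context(report))  # prints 125000 True
```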

3. Advanced Coding Support

Claude 4 excels in:

Writing and explaining code in multiple languages (Python, JS, Java, etc.)
Debugging and understanding large code repositories
Pair programming and iterative development tasks

4. Natural and Helpful Communication

Responses are clear, polite, and structured
Especially strong in creative writing, professional emails, and educational explanations
Can follow complex instructions and maintain context over long conversations

Safe and Aligned by Design

Claude is built with safety and alignment in mind:

It avoids generating harmful or unethical content
It is more cautious and transparent than most models

How to Access or Use Claude 4

Claude is a cloud-based AI model, so you don’t install it like software — instead, you access it via the web or API.

1. Use Claude via Web App

Steps:

Go to: https://claude.ai
Sign up or log in (phone verification is required, and the number must be from a supported country).
Choose a free or paid plan (Claude Opus 4 is available only on paid plans such as Claude Pro at $20/month; Sonnet 4 is also available on the free plan).

Claude Pro Includes:

Claude Opus 4 (latest, most powerful)
Larger context
Priority access during high demand

Claude is only available in supported countries. If yours is not on Anthropic's list of supported regions, the web app and API may be unavailable to you; the VPN and virtual-phone-number workaround sometimes suggested is unofficial and may violate the terms of service.

2. Use Claude via API (For Developers)

API Access:

Go to: https://console.anthropic.com
Sign up and get an API key
Use the API with tools like Python, cURL, or Postman

Example (Python):

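import anthropic

# Replace with your own key; in real projects, load it from an environment variable
client = anthropic.Anthropic(api_key="your_api_key")

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # a Claude 4 model ID; use the model your account can access
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
)

# response.content is a list of content blocks; print the text of the first one
print(response.content[0].text)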

Can I Install Claude Locally?

No. Like ChatGPT or Gemini, Claude is not open-source or downloadable. It’s only available via:

Web app: claude.ai
API: console.anthropic.com

Feature	Claude 4 (Opus 4 / Sonnet 4)
Developer	Anthropic
Model Type	Large Language Model (LLM)
Reasoning & Math	Top-tier performance
Context Length	Up to 200,000 tokens
Code Assistance	Strong support for multiple languages
Language Style	Human-like, calm, professional
Best Use Cases	Analysis, writing, coding, dialogue
Access	claude.ai or API

A Step-by-Step Guide to Integrating and Using Claude Code Action on GitHub

Posted on June 11, 2025 by hello@scuti

Explore what makes Claude Code Action useful: create an issue, mention @claude in it, and Claude writes the code automatically.

Introduction

In the current era of rapidly evolving technology, artificial intelligence (AI) stands out as one of the most significant and transformative breakthroughs on a global scale. Among the various AI-driven tools, Claude, and particularly Claude Code Action, is a powerful integration that can be embedded into a user's GitHub repository to address raised issues accurately and efficiently. This article explores the capabilities and applications of Claude Code Action in modern software development workflows.

Body content

Claude Code Action is an extension categorized as an "Action" and made available on the GitHub Marketplace by Anthropic. You can search for it there and set it up by following the instructions in its README. Below is a summary of the basic setup steps for integrating Claude Code Action into your GitHub repository:

1. Create a workflow file:

On GitHub: In your GitHub repository, click "Add file" and create a new file at the path ".github/workflows/[file_name].yml".

Next, insert the appropriate workflow configuration for this extension, depending on your intended use. For example:

name: Claude PR Assistant

on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
  issues:
    types: [opened, assigned]
  pull_request_review:
    types: [submitted]

jobs:
  claude-code-action:
    if: |
      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
      (github.event_name == 'issues' && contains(github.event.issue.body, '@claude'))
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: read
      issues: read
      id-token: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Run Claude PR Action
        uses: anthropics/claude-code-action@beta
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          timeout_minutes: "60"

Then, click “Commit changes” to successfully add the configuration to your repository.

On your local machine: If a folder in VS Code is already connected to the GitHub repository, you can manually create the ".github/workflows" directory and a .yml file to hold the Claude configuration, then push the file to the GitHub repository.

2. API key:

The Claude API is not free. Visit https://console.anthropic.com/dashboard to add credit and obtain an API key for your Anthropic account.

After that, add the API key to the repository's Secrets under the Settings tab rather than hard-coding it in the workflow file, to prevent unauthorized access:

Find Actions under "Secrets and variables"

Create a new repository secret

Paste your API key into the secret's value field

Give the secret the same name that the workflow file references (ANTHROPIC_API_KEY in the example above)

✅ Correct: the workflow reads the key from the repository secret

❌ Never do this: pasting the raw API key into the workflow file

3. Using Claude Code Action:

Create a new issue in the repository where you intend to use Claude:

Describe the task to be resolved (for example, feature creation, bug fixing, or code review) in the issue's description. You can tag "@claude" directly in the description or in a comment after the issue is created to trigger Claude to process the request.

Example: Ask Claude to generate complete login and registration pages based on the initial files in the repo.

Claude is invoked via the API to address the issue described, and the response time depends on the complexity of the request. It uses the token associated with your API key to read the issue content and to create or modify code within the repository.

Claude’s response will appear in the comments section of the issue.

Here, Claude generates additional files, for example register.html and dashboard.html, as part of the requested implementation and shows the changes made to each file, including which parts were added, modified, or deleted.

At this point, Claude has created a separate branch in the repository containing the proposed changes. You can then review these updates and merge them into the main branch via a pull request.

Following a successful merge into the main branch, the issue may be closed. At this point, Claude has been used to generate complete, functional demo pages for user login and registration.

4. Result:

Registration page

Login screen

Dashboard screen

In summary, Claude Code Action proves to be a highly effective tool for streamlining development tasks, making it easier for both individuals and teams to enhance productivity.

Internal Study Session! Learning About "MCP", the Indispensable Protocol for Generative AI Applications

Posted on May 30, 2025June 19, 2025 by hello@scuti

Internal seminar about MCP

Hello, I am Kakeya, the CEO of Scuti.

Our company specializes in offshore development in Vietnam with a strength in generative AI. We provide services such as Generative AI Consulting and AI-OCR, and recently we have been very pleased to receive many requests for system development integrated with generative AI.

Recently, Scuti held an internal study session to deepen our understanding of "MCP (Model Context Protocol)".

MCP is a protocol for connecting AI, especially LLMs (Large Language Models), with external services. Although it sounds technical, it is actually very useful even for non-engineers. For example, when using tools like Claude, MCP enables effective integration with other services and significantly improves productivity.

This session was aimed at both engineers and non-technical staff. It covered the basic concepts of MCP, practical application examples, and how to apply it in daily work. One highlight was using MCP together with development tools such as Cursor to connect to external services, which speeds up development and improves product quality. This is an almost mandatory skill for engineers.

At Scuti, we always strive to create an environment that helps every member keep up with the latest technologies. In addition to regular internal seminars, we have a reward policy for researching and sharing results, and we support employees in obtaining technical certifications.

In an era where generative AI is increasingly tied to business growth, it is extremely important that all employees share the same foundation of knowledge and the ability to apply it in practice. Through study sessions like this one, Scuti continues to steadily strengthen its technical capabilities and internal collaboration.

Cursor 0.50 Just Dropped – Your AI-Powered Coding Assistant Just Got Smarter

Posted on May 27, 2025May 28, 2025 by Cuong Dinh

TL;DR: With the release of Cursor 0.50, developers get access to request-based billing, background AI agents, smarter multi-file edits, and deeper workspace integration. Cursor is fast becoming the most capable AI coding tool for serious developers.

🚀 What Is Cursor?

Cursor is an AI-native code editor built on top of VS Code, designed to let AI work with your code rather than next to it. With GPT-4 and Claude integrated deeply into its architecture, Cursor doesn’t just autocomplete — it edits, debugs, understands your full project, and runs background agents to help you move faster.

🔥 What’s New in Cursor 0.50?

💰 Request-Based Billing + Max Mode for All Models

Cursor now offers:

Transparent usage-based pricing — You only pay for requests you make.
Max Mode for all LLMs (GPT-4, Claude, etc.) — Access higher-quality reasoning per token.

This change empowers all users — from solo hackers to enterprise teams — to choose the right balance between cost and quality.

🤖 Background AI Agents (Yes, Parallel AI!)

One of the most powerful new features is background AI agents:

Agents run asynchronously and can take over tasks like bug fixing, PR writing, and large-scale refactoring.
You can now “send a task” to an agent, switch context, and return later — a huge leap in multitasking with AI.

Powered by the Model Context Protocol (MCP), these agents can reference more of your codebase than ever before.

🧠 Tab Model v2: Smarter, Cross-File Edits

Cursor’s AI can now:

Suggest changes across multiple files — critical for large refactors.
Understand relationships between files (like components, hooks, or service layers).
Provide syntax-highlighted AI completions for better visual clarity.

🛠️ Redesigned Inline Edit Flow

Inline editing (Cmd/Ctrl+K) is now:

More intuitive, with options to edit the whole file (⌘⇧⏎) or delegate to an agent (⌘L).
Faster and scalable for large files (yes, even thousands of lines).

This bridges the gap between simple fixes and deep code transformations.

🗂️ Full-Project Context + Multi-Root Workspaces

Cursor now handles large, complex projects better than ever:

You can use @folders to add whole directories into the AI’s context.
Multi-root workspace support means Cursor can understand and work across multiple codebases — essential for microservices and monorepos.

🧪 Real Use Cases (from the Community)

According to GenerativeAI.pub’s deep dive, developers are already using Cursor 0.50 to:

Let background agents auto-refactor legacy modules.
Draft PRs from diffs in seconds.
Inject whole folders into the AI context for more accurate suggestions.

It’s not just about faster code — it’s about working smarter with an AI assistant that gets the big picture.

📌 Final Thoughts

With Cursor 0.50, the future of pair programming isn’t just someone typing next to you — it’s an agent that can read, think, and refactor your code while you focus on building features. Whether you’re a solo developer or a CTO managing a team, this update is a must-try.

👉 Try it now at cursor.sh or read the full changelog here.


Ask Questions about Your PDFs with Cohere Embeddings + Gemini LLM

Posted on May 14, 2025May 23, 2025 by hello@scuti

🔍 Experimenting with Image Embedding Using Large AI Models

Recently, I experimented with embedding images using major AI models to build a multimodal semantic search system, where users can search images with text (and vice versa).

🧐 A Surprising Discovery

I was surprised to find that as of 2025, Cohere is the only provider that supports direct image embedding via API.
Other major models like OpenAI and Gemini (by Google) do support image input in general, but do not clearly provide a direct embedding API for images.

Reason for Choosing Cohere

I chose to try Cohere’s embed-v4.0 because:

It supports embedding text, images, and even PDF documents (converted to images) into the same vector space.
You can choose the embedding size (I used the default, 1536).
It returns normalized embeddings that are ready to use for search and classification tasks.

⚙️ How I Built the System

I used Python for implementation. The system has two main flows:

1️⃣ Document Preparation Flow

Load documents, images, or text data that I want to store.
Use the Cohere API to embed them into vector representations.
Save these vectors in a database or vector store for future search queries.

2️⃣ User Query Flow

When a user asks a question or types a query:
- Use Cohere to embed the query into a vector.
- Search for the most similar documents in the vector space.
- Return results to the user using an LLM (Large Language Model) like Gemini by Google.
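The retrieval step in the query flow above comes down to a nearest-neighbor lookup over stored vectors. The following is a minimal sketch in plain Python, assuming the document vectors are already available as lists of floats. The helper names `cosine_similarity` and `top_k` and the toy vectors are illustrative only and are not part of the Cohere or Gemini SDKs.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=1):
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data)
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [0.9, 0.1, 0.0]
best_index, best_score = top_k(query_vec, doc_vecs, k=1)[0]
print(best_index)
```

When the stored embeddings are already normalized to unit length, the cosine similarity reduces to a plain dot product, so the two norm computations can be skipped.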

🔑 How to Get API Keys

To use Cohere, go to: https://cohere.com, sign up, and get your API key.
(Cohere currently offers a free tier – see details here: docs.cohere.com/docs/rate-limits)
To use Gemini (Google), go to: https://aistudio.google.com, sign up, and get your API key.
(Gemini also has a free tier – see details here: ai.google.dev/gemini-api/docs/rate-limits)

🔧 Setup: Installing Cohere and Gemini in Python

✅ Step 1: Install and Set Up Cohere

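Run the following command in your terminal to install the Cohere Python SDK:

pip install cohere

Then, initialize the Cohere client in your Python script:

import cohere

# Replace <<YOUR_COHERE_KEY>> with your actual Cohere API key
cohere_api_key = "<<YOUR_COHERE_KEY>>"
co = cohere.ClientV2(api_key=cohere_api_key)

✅ Step 2: Install and Set Up Gemini (Google Generative AI)

Install the Gemini client library with:

pip install google-genai

Then, initialize the Gemini client in your Python script: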

from google import genai

# Replace <<YOUR_GEMINI_KEY>> with your actual Gemini API key
gemini_api_key = "<<YOUR_GEMINI_KEY>>"
client = genai.Client(api_key=gemini_api_key)

📌 Flow 1: Document Preparation and Embedding

We will go through the steps to convert a PDF into embedding data using Cohere.

📥 Step 1: Download the PDF

We start by downloading the PDF from a given URL.
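The download code is not shown here, so the following is one possible sketch using only the Python standard library. The function name `download_pdf`, the example URL, and the destination path are hypothetical and should be replaced with your own.

```python
import shutil
import urllib.request
from pathlib import Path

def download_pdf(url, dest_path):
    """Stream the file at `url` to `dest_path` and return the destination path."""
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return dest

# Hypothetical usage; substitute the URL of the PDF you want to index
# local_pdf_path = download_pdf("https://example.com/report.pdf", "data/report.pdf")
```

Streaming with `shutil.copyfileobj` avoids loading the whole PDF into memory, which matters for large reports.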

🖼️ Step 2: Convert PDF Pages to Text + Image

We extract both text and image for each page using PyMuPDF.

python

import fitz  # PyMuPDF
import base64
from PIL import Image
import io

def extract_page_data(pdf_path):
    doc = fitz.open(pdf_path)
    pages_data = []
    img_paths = []

    for i, page in enumerate(doc):
        # Extract the page text
        text = page.get_text()

        # Render the page to a PNG image
        pix = page.get_pixmap()
        image = Image.open(io.BytesIO(pix.tobytes("png")))

        # Encode the image as a base64 data URL
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        encoded_img = base64.b64encode(buffered.getvalue()).decode("utf-8")
        data_url = f"data:image/png;base64,{encoded_img}"

        # One fused text + image input per page
        content = [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": data_url}},
        ]

        pages_data.append({"content": content})
        img_paths.append({"data_url": data_url})

    return pages_data, img_paths

# Example usage
pages, img_paths = extract_page_data(local_pdf_path)

📤 Step 3: Embed Using Cohere

Now, send the fused text + image inputs to Cohere’s embed-v4.0 model.
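The embedding call itself is not shown, so here is a hedged sketch of how it might look. It assumes `co` is a Cohere v2 client, that the per-page `content` dictionaries built in Step 2 can be passed through an `inputs` argument, and that the response exposes `embeddings.float` as in the query code later in this post. The function name `embed_pages`, the batch size, and `output_dimension=1024` (chosen to match the query side) are assumptions, not the author's confirmed code.

```python
def embed_pages(co, pages_data, batch_size=8):
    """Embed page inputs in batches and return one float vector per page.

    `co` is assumed to expose embed(...) in the style of Cohere's v2 client.
    """
    vectors = []
    for start in range(0, len(pages_data), batch_size):
        batch = pages_data[start:start + batch_size]
        api_response = co.embed(
            model="embed-v4.0",
            input_type="search_document",   # documents, as opposed to search_query
            embedding_types=["float"],
            inputs=batch,
            output_dimension=1024,          # assumed; keep identical to the query side
        )
        vectors.extend(api_response.embeddings.float)
    return vectors
```

Converting the returned list to a NumPy array, for example with `np.asarray(...)`, gives the `embeddings` matrix that the dot-product search in Flow 2 expects.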

✅ Flow 1 complete: You now have the embedded vector representations of your PDF pages.

👉 Proceed to Flow 2 (e.g., storing, indexing, or querying the embeddings).

🔍 Flow 2: Ask a Question and Retrieve the Answer Using Image + LLM

This flow allows the user to ask a natural language question, find the most relevant image using Cohere Embed v4, and then answer the question using Gemini 2.5 Vision LLM.

💬 Step 1: Ask the Question

We define the user query in plain English.

🧠 Step 2: Convert the Question to Embedding & Find Relevant Image

We use embed-v4.0 with input type search_query, then calculate cosine similarity between the question embedding and previously embedded document images.

python

import numpy as np
from IPython.display import display  # available in Colab/Jupyter

def search(question, max_img_size=800):
    # Get embedding for the query
    api_response = co.embed(
        model="embed-v4.0",
        input_type="search_query",
        embedding_types=["float"],
        texts=[question],
        output_dimension=1024,  # must match the dimension of the document embeddings
    )

    query_emb = np.asarray(api_response.embeddings.float[0])

    # Cosine similarity with all document embeddings
    # (a dot product is enough because the embeddings are normalized)
    cos_sim_scores = np.dot(embeddings, query_emb)
    top_idx = np.argmax(cos_sim_scores)  # Most relevant image

    hit_img_path = img_paths[top_idx]
    base64url = hit_img_path["data_url"]

    print("Question:", question)
    print("Most relevant image:", hit_img_path)

    # Strip the data-URL prefix, if present, before decoding
    if base64url.startswith("data:image"):
        base64_str = base64url.split(",")[1]
    else:
        base64_str = base64url

    # Display the matched image
    image_data = base64.b64decode(base64_str)
    image = Image.open(io.BytesIO(image_data))

    image.thumbnail((max_img_size, max_img_size))
    display(image)

    return base64url

🤖 Step 3: Use Vision-LLM (Gemini 2.5) to Answer

We use Gemini 2.5 Flash to answer the question based on the most relevant image.

python

def answer(question, base64_img_str):
    # Strip the data-URL prefix, if present, before decoding
    if base64_img_str.startswith("data:image"):
        base64_img_str = base64_img_str.split(",")[1]

    image_bytes = base64.b64decode(base64_img_str)
    image = Image.open(io.BytesIO(image_bytes))

    # The prompt combines the instruction text with the retrieved page image
    prompt = [
        f"""Answer the question based on the following image.
Don't use markdown.
Please provide enough context for your answer.

Question: {question}""",
        image,
    ]

    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",
        contents=prompt,
    )

    # Use a separate name so the local variable does not shadow the function
    answer_text = response.text
    print("LLM Answer:", answer_text)
    return answer_text

▶️ Step 4: Run the Full Flow

🧪 Example Usage:

question = "What was the total number of wildfires in the United States from 2007 to 2015?"

# Step 1: Find the best-matching image
top_image_path = search(question)

# Step 2: Use the image to answer the question
answer(question, top_image_path)

🧾 Output:

Question: What was the total number of wildfires in the United States from 2007 to 2015?

Most relevant image:

LLM Answer: Based on the provided image, to find the total number of wildfires in the United States from 2007 to 2015, we need to sum the number of wildfires for each year in this period. Figure 1 shows the annual number of fires in thousands from 1993 to 2022, which covers the requested period. Figure 2 provides the specific number of fires for 2007 and 2015 among other years. Using the specific values from Figure 2 for 2007 and 2015, and estimating the number of fires for the years from 2008 to 2014 from Figure 1, we can calculate the total.

The number of wildfires in 2007 was 67.8 thousand (from Figure 2).

Estimating from Figure 1:

2008 was approximately 75 thousand fires.

2009 was approximately 75 thousand fires.

2010 was approximately 67 thousand fires.

2011 was approximately 74 thousand fires.

2012 was approximately 68 thousand fires.

2013 was approximately 47 thousand fires.

2014 was approximately 64 thousand fires.

The number of wildfires in 2015 was 68.2 thousand (from Figure 2).

Summing these values:

Total = 67.8 + 75 + 75 + 67 + 74 + 68 + 47 + 64 + 68.2 = 606 thousand fires.

Therefore, the total number of wildfires in the United States from 2007 to 2015 was approximately 606,000. This number is based on the sum of the annual number of fires obtained from Figure 2 for 2007 and 2015, and estimates from Figure 1 for the years 2008 through 2014.

Try this full pipeline on Google Colab: https://colab.research.google.com/drive/1kdIO-Xi0MnB1c8JrtF26Do3T54dij8Sf

🧩 Final Thoughts

This simple yet powerful two-step pipeline demonstrates how you can combine Cohere’s Embed v4 with Gemini’s Vision-Language capabilities to build a system that understands both text and images. By embedding documents (including large images) and using semantic similarity to retrieve relevant content, we can create a more intuitive, multimodal question-answering experience.

This approach is especially useful in scenarios where information is stored in visual formats like financial reports, dashboards, or charts — allowing LLMs to not just “see” the image but reason over it in context.

Multimodal retrieval-augmented generation (RAG) is no longer just theoretical — it’s practical, fast, and deployable today.

7 Secrets To Improving The Accuracy Of Non-Standard Form OCR

Posted on May 7, 2025 by hello@scuti

Hello, I am Kakeya, the representative of Scuti.

Many people may be considering implementing non-standard form OCR but hesitate due to concerns about accuracy. OCR (Optical Character Recognition) is a highly useful technology that converts paper documents into digital data. However, for non-standard forms with flexible layouts and inconsistent formats, accuracy issues tend to become more pronounced.

Even if OCR is introduced, low recognition accuracy increases the amount of manual correction work. As a result, the expected gains in operational efficiency and cost reduction may not be realized. Therefore, in this article, we introduce seven specific methods to significantly improve the accuracy of non-standard form OCR.

By applying these techniques, you can significantly improve OCR accuracy, streamline your workflow, and reduce costs. Please read through to the end and apply these insights to your business.

This article provides a comprehensive overview, starting with the fundamentals of non-standard form OCR, followed by techniques for improving accuracy, and finally exploring the future of OCR supported by the latest technologies.

What Is Non-Standard Form OCR?

If you want to learn more about AI OCR, be sure to check out this article first.
Related article: Innovative Applications Of Generative AI OCR And Five Key Methods

Basic Knowledge of OCR: Mechanism and Types

OCR (Optical Character Recognition) is a technology that extracts text information from image data. Specifically, it enables computers to read text from scanned or photographed paper documents. OCR is widely used for digitizing documents, which significantly improves operational efficiency and reduces costs. There are four main types of OCR:

Traditional OCR (Traditional Pattern Recognition): This method recognizes text by matching image data with predefined character templates. It delivers high accuracy for standardized fonts and layouts but lacks flexibility.
Optical Mark Recognition (OMR): This method identifies specific patterns, such as checkboxes and mark sheets. It is widely used for surveys and test scoring.
Intelligent Character Recognition (ICR): This method recognizes handwritten characters, which do not follow a fixed pattern. It is suitable for recognizing handwritten forms and signatures.
Barcode Recognition: This method reads symbols such as barcodes and QR codes. It is commonly used for product and inventory management.

Challenges and Solutions of Non-Standard Form OCR

Non-standard forms are documents that do not follow a fixed format. For example, invoices and purchase orders vary in layout from company to company. Non-standard form OCR is a technology that extracts text information from such flexible layouts, but traditional OCR often struggles with accuracy. Conventional OCR technologies are often unable to handle the wide variety of formats and layouts.

To solve this problem, advanced OCR technologies that apply AI and machine learning have emerged. AI OCR learns the characteristics of text from large amounts of data, which allows it to recognize characters with high accuracy even on non-standard forms. Machine learning algorithms can automatically identify and learn patterns, which increases flexibility in handling many different layouts and formats.

7 Secrets To Improving The Accuracy Of Non-Standard Form OCR

1. Use High-Quality Images: The Importance of Scanners and Resolution

OCR accuracy is significantly affected by the quality of the input image. Using high-quality images improves OCR accuracy. Specifically, use a high-performance scanner and scan at an appropriate resolution.

Scanner selection: Choose a scanner based on factors such as resolution, scanning speed, and supported file formats. A high-performance scanner produces sharper images, which contributes to higher OCR accuracy.
Resolution settings: A minimum resolution of 300 dpi is generally recommended. The higher the resolution, the sharper the characters, which reduces the risk of misrecognition. However, if the resolution is too high, file size increases and processing may slow down, so a reasonable balance is needed.

2. Image Preprocessing: Noise Removal and Contrast Adjustment

Scanned images may contain noise and dirt. These can reduce OCR accuracy, so removing them through preprocessing is very important.

Noise removal: Use image editing software or dedicated preprocessing tools to remove noise and dirt from the image. This helps OCR recognize characters more accurately.
Contrast adjustment: Adjusting the contrast between characters and the background makes the characters stand out more clearly, which improves recognition accuracy. In particular, if the background has stains or shadows, increasing the contrast improves the legibility of the characters.
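To make the two preprocessing steps concrete, here is a minimal sketch in plain Python that operates on a grayscale image represented as a list of rows of 0-255 integers. The function names `stretch_contrast` and `median_filter` are illustrative; a production pipeline would more likely rely on an imaging library, and the choice of a 3x3 median filter is an assumption rather than a method named in this article.

```python
from statistics import median

def stretch_contrast(pixels):
    """Linearly rescale grayscale values so the darkest becomes 0 and the brightest 255."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in pixels]  # flat image: nothing to stretch
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in pixels]

def median_filter(pixels):
    """Replace each pixel with the median of its 3x3 neighborhood.

    Pixels on the border use only the neighbors that exist.
    """
    height, width = len(pixels), len(pixels[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                pixels[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(int(median(window)))
        result.append(row)
    return result

# A single bright speck on a uniform background is removed by the median filter
noisy = [[100, 100, 100], [100, 255, 100], [100, 100, 100]]
cleaned = median_filter(noisy)
```

A median filter is used here because it removes isolated specks while keeping character edges sharper than simple averaging would.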

3. Choose the Right OCR Software: Leveraging AI OCR

There are many different types of OCR software. For non-standard forms, it is important to choose software, such as AI OCR, that suits the type and purpose of the form rather than using simple OCR software.

AI OCR: AI OCR can recognize characters with high accuracy even on non-standard forms because it learns character features from large amounts of data. By leveraging machine learning algorithms, it can flexibly handle complex layouts and diverse formats.
Cloud-based OCR: Cloud-based OCR services are also an option. Because they are accessed over the internet, they reduce deployment costs and scale well. However, it is important to check the reliability of the service from a security and data privacy perspective before using it.

4. Use Layout Recognition: Structuring the Document
Modern OCR software includes layout recognition. Using this feature helps identify the structure of the document, which improves the accuracy of data extraction.

Table data extraction:
With layout recognition, tabular data can be extracted accurately. This allows the table to be used as digital data while preserving the relationships and positional structure of the data in it.
Linking item names and values:
Automatically linking item names to their values significantly reduces data entry effort. This helps maintain data integrity and keeps downstream data processing smooth.

5. Dictionary Registration: Supporting Specialized Terms
Some OCR software offers a dictionary registration feature. Registering specialized or industry-specific terms in the dictionary can improve recognition accuracy.

Preventing misrecognition:
Dictionary registration helps OCR software recognize specialized terms correctly. This reduces misrecognition and improves data accuracy.
Improving the recognition rate:
Registering specialized terms in the dictionary improves the overall recognition rate of the OCR software. This is especially effective for forms that contain many specialized terms.
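Dictionary registration is a built-in feature of some OCR products, and its internals vary. As a rough illustration of the idea only, the sketch below applies a domain vocabulary as a post-processing step, snapping each recognized token to its closest dictionary term with the standard-library `difflib` module. The function name `correct_with_dictionary`, the sample terms, and the 0.8 cutoff are assumptions.

```python
import difflib

def correct_with_dictionary(tokens, domain_terms, cutoff=0.8):
    """Replace each token with its closest domain term when similarity >= cutoff."""
    lookup = {term.lower(): term for term in domain_terms}
    corrected = []
    for token in tokens:
        matches = difflib.get_close_matches(
            token.lower(), list(lookup), n=1, cutoff=cutoff
        )
        # Keep the original token when nothing in the dictionary is close enough
        corrected.append(lookup[matches[0]] if matches else token)
    return corrected

# Hypothetical OCR output with typical confusions (l/I and 1/l)
domain_terms = ["Invoice", "Subtotal", "Remittance"]
fixed = correct_with_dictionary(["lnvoice", "Subtota1", "Total"], domain_terms)
print(fixed)
```

The cutoff matters: set too low, ordinary words such as "Total" would be pulled toward a dictionary term like "Subtotal", which introduces new errors instead of removing them.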

6. Create Templates: Optimizing by Form Type
When the same type of form is processed repeatedly with OCR, creating a template improves accuracy and reduces processing time.

Fixing item positions:
With a template, the position of each item can be fixed. This makes it easier for OCR software to recognize the text and reduces misrecognition.
Reducing processing time:
Using a template significantly reduces OCR processing time. With a fixed layout, the software can extract data efficiently, which raises overall processing speed.
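One way to picture a template is as a mapping from field names to fixed rectangles on the page. The sketch below assumes the OCR engine returns each word with the pixel coordinates of its center; the function name `extract_fields`, the field names, and the coordinates are hypothetical.

```python
def extract_fields(words, template):
    """Assign OCR words to template fields by the position of each word's center.

    words: list of (text, x, y) tuples, where (x, y) is the word center in pixels.
    template: dict mapping field name -> (left, top, right, bottom) box in pixels.
    """
    fields = {name: [] for name in template}
    for text, x, y in words:
        for name, (left, top, right, bottom) in template.items():
            if left <= x <= right and top <= y <= bottom:
                fields[name].append(text)
                break  # a word belongs to at most one field
    return {name: " ".join(parts) for name, parts in fields.items()}

# Hypothetical template and OCR output for one invoice layout
template = {"invoice_no": (400, 40, 600, 80), "total": (400, 700, 600, 740)}
words = [("INV-001", 480, 60), ("1,250.00", 500, 720), ("Thank", 100, 780)]
extracted = extract_fields(words, template)
print(extracted)
```

Words that fall outside every box, such as "Thank" here, are ignored, which is part of why a fixed layout reduces misrecognition.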

7. Human Verification: The Importance of Final Confirmation

After OCR processing, human verification is very important. OCR software is highly accurate but not always perfect, and misrecognition can occur.

Correcting misrecognition:
Through human verification, recognition errors made by the OCR software can be corrected. This improves data accuracy.
Improving data accuracy:
Final confirmation further raises data accuracy. For important business data, this check is indispensable to ensure reliability.
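Human review does not have to cover every field. A common pattern, sketched below under the assumption that the OCR engine reports a confidence score between 0 and 1 for each field, is to accept high-confidence values automatically and queue the rest for a person. The function name `split_for_review`, the sample data, and the 0.9 threshold are illustrative.

```python
def split_for_review(fields, threshold=0.9):
    """Separate fields into auto-accepted and needs-review by OCR confidence.

    fields: dict mapping field name -> (value, confidence between 0 and 1).
    """
    accepted, needs_review = {}, {}
    for name, (value, confidence) in fields.items():
        target = accepted if confidence >= threshold else needs_review
        target[name] = value
    return accepted, needs_review

# Hypothetical OCR result for one form
ocr_result = {
    "invoice_no": ("INV-001", 0.98),
    "total": ("1,250.00", 0.72),
}
accepted, needs_review = split_for_review(ocr_result)
```

For critical business data, the threshold can be raised, or set above 1 so that every field is routed to a reviewer.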

The Future of Non-Standard Form OCR: Evolution Through AI Technology

As AI technology evolves, OCR accuracy is expected to keep improving. In particular, deep learning has contributed greatly to raising OCR accuracy.

Advances in handwriting recognition:
Deep learning has significantly improved the accuracy of handwriting recognition. As a result, handwritten forms and signatures can now be digitized more accurately and efficiently.
Enhanced multilingual support:
OCR software that uses deep learning has improved multilingual support. This makes it easier for globally expanding companies to process multilingual documents and is expected to improve the efficiency of international operations.

Conclusion: Making the Most of Non-Standard Form OCR

Non-standard form OCR is a powerful tool for improving operational efficiency and reducing costs. By applying the seven tips for improving accuracy, you can maximize the effectiveness of OCR. As AI technology evolves, OCR will continue to develop and is expected to deliver even higher accuracy and flexibility. Apply these strategies to advance digitization in your business.

7 Secrets To Improving The Accuracy Of Non-Standard Form OCR

Posted on May 7, 2025 by hello@scuti

Hello, I am Kakeya, the representative of Scuti.

Our company specializes in services such as Offshore Development And Lab-type Development in Vietnam, as well as Generative AI Consulting. Recently, we have been fortunate to receive numerous requests for system development in collaboration with generative AI.

Many people may be considering implementing non-standard form OCR but hesitate due to concerns about accuracy. OCR (Optical Character Recognition) is a highly useful technology that converts paper documents into digital data. However, when dealing with non-standard forms that have flexible layouts and inconsistent formats, accuracy issues tend to become more pronounced.

Even if OCR is introduced, low recognition accuracy may lead to an increased need for manual corrections. As a result, the expected improvements in operational efficiency and cost reduction may not be fully realized. Therefore, in this article, we introduce seven specific methods to dramatically enhance the accuracy of non-standard form OCR.

By applying these techniques, you can significantly improve OCR accuracy, streamline operations, and reduce costs. We encourage you to read through to the end and apply these insights to your business.

This article provides a comprehensive explanation, starting with the fundamentals of non-standard form OCR, followed by specific techniques for improving accuracy, and finally exploring the future of OCR through the utilization of the latest technologies.

What Is Non-Standard Form OCR?

If you want to learn more about AI OCR, be sure to check out this article first.
Related article: Innovative Applications Of Generative AI OCR And Five Key Methods

Basic Knowledge of OCR: Mechanism and Types

OCR (Optical Character Recognition) is a technology that extracts text information from image data. Specifically, it enables computers to read text from scanned or photographed paper documents. OCR is widely used for digitizing various types of documents, significantly contributing to operational efficiency and cost reduction. There are four main types of OCR:

Traditional OCR (Traditional Pattern Recognition): This method recognizes text by matching image data with predefined character templates. It delivers high accuracy for standardized fonts and layouts but lacks flexibility.
Optical Mark Recognition (OMR): This method identifies specific patterns, such as checkboxes and mark sheets. It is widely used for surveys and test scoring.
Intelligent Character Recognition (ICR): This method recognizes handwritten characters, which do not follow a fixed pattern. It is suitable for recognizing handwritten forms and signatures.
Barcode Recognition: This method reads symbols such as barcodes and QR codes. It is commonly used for product and inventory management.
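
To make the template-matching idea behind traditional OCR more concrete, here is a minimal sketch in Python. The 3x5 glyph bitmaps, the TEMPLATES table, and the classify_glyph function are illustrative assumptions, not part of any real OCR engine. Production systems work with far larger templates and normalize size and skew first.

```python
# Toy 3x5 binary glyphs (1 = ink). Hypothetical templates for illustration only.
TEMPLATES = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "010", "010", "010"],
}


def hamming(a, b):
    """Count the pixels that differ between two glyphs of the same size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify_glyph(glyph):
    """Return (best_char, distance) for the template with the fewest differing pixels."""
    best = min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))
    return best, hamming(glyph, TEMPLATES[best])


# A "7" with one noisy pixel in the fourth row.
noisy_seven = ["111", "001", "010", "011", "010"]
print(classify_glyph(noisy_seven))  # ('7', 1)
```

The sketch also hints at why this approach lacks flexibility: a font or layout that is not in the template table has nothing to match against.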

Challenges and Solutions of Non-Standard Form OCR

Non-standard forms are documents that do not follow a fixed format. Examples include invoices and purchase orders, whose layout varies from company to company. Non-standard form OCR extracts text information from such flexible layouts, but conventional OCR technology alone often cannot cope with the wide variety of formats and layouts, so its accuracy suffers.

To address this challenge, advanced OCR technologies utilizing AI and machine learning have emerged. AI OCR learns text characteristics from large datasets, enabling high-accuracy recognition even for non-standard forms. Machine learning algorithms automatically identify and learn patterns, allowing for greater adaptability to diverse layouts and formats.

7 Secrets to Improving the Accuracy of Non-Standard Form OCR

1. Use High-Quality Images: The Importance of Scanners and Resolution

The accuracy of OCR depends heavily on the quality of the input images. To obtain high-quality images, it is important to use a high-performance scanner and scan at an appropriate resolution.

Choosing a Scanner: Select a scanner by considering factors such as resolution, scanning speed, and supported file formats. A high-performance scanner provides clearer images, contributing to improved OCR accuracy.
Setting the Resolution: A resolution of at least 300 dpi is generally recommended. Higher resolution results in clearer character recognition and reduces the risk of misinterpretation. However, excessively high resolution increases file size and may slow down processing speed, so it is necessary to find a balance.
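
The balance between resolution and file size can be estimated with simple arithmetic. The following Python sketch is only an illustration: the function names are hypothetical, the page is assumed to be US Letter (8.5 x 11 inches), and the size estimate assumes uncompressed 8-bit grayscale, so real files in compressed formats will be smaller.

```python
def effective_dpi(pixels, inches):
    """Dots per inch along one axis of an existing scan."""
    return pixels / inches


def raw_size_mb(width_px, height_px, bytes_per_pixel=1):
    """Uncompressed size in megabytes (1 byte per pixel = 8-bit grayscale)."""
    return width_px * height_px * bytes_per_pixel / 1_000_000


def scan_report(width_in, height_in, dpi, min_dpi=300):
    """Pixel dimensions, minimum-resolution check, and raw size for a planned scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return {
        "pixels": (width_px, height_px),
        "meets_minimum": dpi >= min_dpi,
        "raw_size_mb": raw_size_mb(width_px, height_px),
    }


# Compare three scanner settings for one US Letter page.
for dpi in (150, 300, 600):
    print(dpi, scan_report(8.5, 11, dpi))

# Check a file that has already been scanned: 2550 px across 8.5 inches.
print(effective_dpi(2550, 8.5))  # 300.0
```

Doubling the resolution from 300 dpi to 600 dpi quadruples the number of pixels, which is why resolution should be raised only as far as the smallest text on the form requires.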

2. Image Preprocessing: Noise Removal and Contrast Adjustment

Scanned images may contain noise such as specks, dirt, and shadows. Because these artifacts can reduce the accuracy of OCR, it is important to remove them through preprocessing.

Noise Removal: Using image editing software or dedicated preprocessing tools, noise and dirt within the image are removed. This makes it easier for OCR to recognize characters accurately.
Contrast Adjustment: By adjusting the contrast between the characters and the background, the characters stand out more clearly, improving recognition accuracy. In particular, if there are spots or shadows on the background, increasing the contrast can improve the visibility of the characters.
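
As a rough illustration of these two preprocessing steps, the sketch below applies a 3x3 median filter (one common way to remove isolated specks) and a linear contrast stretch to a grayscale image stored as a list of rows. It uses only the Python standard library so that it is easy to run; the function names are hypothetical, and in practice an image-processing library would normally be used instead.

```python
from statistics import median


def median_filter(img):
    """3x3 median filter that removes isolated specks. Border pixels are kept as-is."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def stretch_contrast(img):
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


# A light background (200) with a single dark speck (40) in the middle.
speckled = [[200] * 5 for _ in range(5)]
speckled[2][2] = 40
print(median_filter(speckled)[2][2])  # 200: the speck is replaced by its neighbors' value

# A washed-out patch where ink (100) and paper (150) are close in brightness.
faded = [[100, 150], [110, 100]]
print(stretch_contrast(faded))  # [[0, 255], [51, 0]]
```

A median filter is used here rather than a simple average because it removes specks without blurring character edges as much, but it can also erase very thin strokes, so the result should be checked on real forms.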

3. Choosing the Right OCR Software: Leveraging AI OCR

There are various types of OCR software. For non-standard forms, it is important to choose software suited to the type and purpose of the form, such as AI OCR, rather than a simple OCR tool.

AI OCR: AI OCR can achieve high-accuracy character recognition even for non-standard forms, as it learns the characteristics of characters from large volumes of data. By leveraging machine learning algorithms, it can flexibly handle complex layouts and various formats.
Cloud-based OCR: Cloud-based OCR services are also an option. Since they can be accessed via the internet, they help reduce implementation costs and offer good scalability. However, it is important to check the reliability of the service from the perspective of security and data privacy before using it.

4. Utilizing Layout Recognition Features: Structuring Text

Many modern OCR tools include layout recognition features. By using them, the software can recognize the structure of a document, which enables more accurate data extraction.

Extracting Tabular Data:
By using the layout recognition feature, tabular data can be extracted accurately. This allows the data in the table to be utilized as digital data while maintaining its positional relationships and structure.
Linking Item Names and Values:
By automatically linking item names and their values, the effort required for data entry is significantly reduced. This maintains data integrity and ensures smooth processing of subsequent data.
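
One simple way to picture how item names and values can be linked is a geometric rule: for each known item name, take the nearest word to its right on the same text line. The Python sketch below assumes that the OCR engine returns words with positions. The Word class, its fields, and the link_fields function are hypothetical, and real layout recognition is considerably more sophisticated.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the bounding box
    y: float  # vertical center of the bounding box


def link_fields(words, labels, y_tol=5.0):
    """For each label, pick the nearest non-label word to its right on the same line."""
    result = {}
    for label in words:
        if label.text not in labels:
            continue
        candidates = [
            w for w in words
            if w.x > label.x and abs(w.y - label.y) <= y_tol and w.text not in labels
        ]
        if candidates:
            result[label.text] = min(candidates, key=lambda w: w.x - label.x).text
    return result


words = [
    Word("Invoice No.", 40, 100), Word("INV-0042", 180, 101),
    Word("Total", 40, 140), Word("1,250.00", 180, 139),
    Word("Date", 300, 100), Word("2025-05-07", 380, 102),
]
print(link_fields(words, {"Invoice No.", "Total", "Date"}))
# {'Invoice No.': 'INV-0042', 'Total': '1,250.00', 'Date': '2025-05-07'}
```

Because the rule depends only on relative positions, it keeps working when a company places the same items elsewhere on the page, which is the situation non-standard forms create.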

5. Dictionary Registration: Supporting Technical Terms

Some OCR software includes a dictionary registration feature. By registering technical terms or industry-specific terminology in the dictionary, the recognition accuracy can be improved.

Preventing Misrecognition:
With dictionary registration, OCR software can accurately recognize technical terms. This reduces misrecognition and improves data accuracy.
Improving Recognition Rate:
Registering technical terms in the dictionary improves the overall recognition rate of the OCR software. This is particularly effective for forms that use many industry-specific terms.
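
The effect of a registered dictionary can be approximated as a post-correction step: each recognized string is compared with the registered terms and replaced when it is close enough to one of them. The sketch below uses difflib from the Python standard library. The term list, the correct_term function, and the 0.8 similarity cutoff are assumptions for illustration, and commercial OCR products may implement dictionary registration differently.

```python
import difflib

# Hypothetical registered vocabulary for a manufacturing purchase order.
REGISTERED_TERMS = ["torque wrench", "flange gasket", "solenoid valve", "bearing housing"]


def correct_term(raw, terms=REGISTERED_TERMS, cutoff=0.8):
    """Replace an OCR string with the closest registered term if it is similar enough."""
    match = difflib.get_close_matches(raw.lower(), terms, n=1, cutoff=cutoff)
    return match[0] if match else raw


print(correct_term("so1enoid va1ve"))  # solenoid valve (digit 1 misread for letter l)
print(correct_term("packing tape"))    # packing tape (no close term, left unchanged)
```

The cutoff is a trade-off: a lower value corrects more misreadings but risks replacing a legitimately different word with a registered term.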

6. Creating Templates: Optimization for Form Types

When repeatedly processing the same type of form with OCR, creating templates leads to improved accuracy and reduced processing time.

Fixing Item Positions:
By using templates, the positions of each item can be fixed. This makes it easier for OCR software to recognize the text and reduces misrecognition.
Reducing Processing Time:
Using templates significantly reduces OCR processing time. With a fixed layout, the software can efficiently extract data, improving overall processing speed.
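
A template can be thought of as a table of fixed regions, one per item. The following Python sketch assigns each recognized word to the region that contains its center. The template coordinates, field names, and the (text, center_x, center_y) word format are all hypothetical, and the words are assumed to arrive in reading order.

```python
# Hypothetical template: field name -> (left, top, right, bottom) in pixels.
INVOICE_TEMPLATE = {
    "invoice_no": (400, 50, 600, 90),
    "issue_date": (400, 100, 600, 140),
    "total": (400, 700, 600, 740),
}


def extract_by_template(words, template):
    """Assign each recognized word to the template field whose box contains its center."""
    fields = {name: [] for name in template}
    for text, cx, cy in words:
        for name, (left, top, right, bottom) in template.items():
            if left <= cx <= right and top <= cy <= bottom:
                fields[name].append(text)
                break
    return {name: " ".join(parts) for name, parts in fields.items()}


# (text, center_x, center_y) tuples as an OCR engine might report them.
words = [("INV-0042", 480, 70), ("May", 450, 120), ("7,", 490, 120),
         ("2025", 530, 120), ("1,250.00", 520, 720), ("Thank you", 100, 780)]
print(extract_by_template(words, INVOICE_TEMPLATE))
# {'invoice_no': 'INV-0042', 'issue_date': 'May 7, 2025', 'total': '1,250.00'}
```

Text outside every region, such as the closing message in the example, is simply ignored, which is part of why templates reduce both misrecognition and processing time for forms that always share one layout.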

7. Human Review: The Importance of Final Confirmation

After OCR processing, it is crucial to perform a human check. While OCR software is highly accurate, it is not perfect, and there is always a possibility of misrecognition.

Correcting Misrecognition:
By performing a human check, any misrecognition made by the OCR software can be corrected. This improves the accuracy of the data.
Improving Data Accuracy:
Through final confirmation, the accuracy of the data is further enhanced. For important business data, this review process is essential to ensure reliability.
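
Human review does not have to cover every field. If the OCR software reports a confidence score for each value, only the uncertain ones need to be routed to a person. The sketch below shows this idea in Python. The field names, the scores, and the 0.90 threshold are assumptions, and the appropriate threshold depends on how costly an error is for the business.

```python
def split_for_review(fields, threshold=0.90):
    """Separate OCR fields into auto-accepted values and a queue for human checking."""
    accepted, review_queue = {}, []
    for name, (value, confidence) in fields.items():
        if confidence >= threshold:
            accepted[name] = value
        else:
            review_queue.append((name, value, confidence))
    # Show the least certain fields to the reviewer first.
    review_queue.sort(key=lambda item: item[2])
    return accepted, review_queue


# Hypothetical OCR output: field -> (recognized value, confidence between 0 and 1).
ocr_fields = {
    "invoice_no": ("INV-0042", 0.99),
    "total": ("1,250.00", 0.72),
    "issue_date": ("2025-05-07", 0.88),
}
accepted, queue = split_for_review(ocr_fields)
print(accepted)  # {'invoice_no': 'INV-0042'}
print(queue)     # [('total', '1,250.00', 0.72), ('issue_date', '2025-05-07', 0.88)]
```

For critical values such as totals, some teams may prefer to review every occurrence regardless of the score, since a high confidence value does not guarantee a correct reading.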

The Future of Non-Standard Form OCR: Evolution Through AI Technology

With the evolution of AI technology, the accuracy of OCR is expected to continue improving in the future. In particular, deep learning technology has made a significant contribution to improving the accuracy of OCR.

Advancements in Handwritten Character Recognition:
Deep learning has dramatically improved the accuracy of handwritten character recognition. As a result, the digitization of handwritten forms and signatures can now be done more accurately and efficiently.
Enhanced Multilingual Support:
OCR software using deep learning has strengthened multilingual support. This makes it easier for globally expanding businesses to process multilingual documents, which is expected to improve the efficiency of international operations.

Conclusion: Maximizing the Use of Non-Standard Form OCR

Non-standard form OCR is a powerful tool for achieving business efficiency and cost reduction. By implementing the 7 tips for improving accuracy, you can maximize the effectiveness of OCR. With the evolution of AI technology, OCR will continue to evolve, and it is expected to offer even higher accuracy and flexibility in the future. Adopt these strategies to accelerate the digitalization of your business.