IMAGE BASED HUMAN-COMPUTER INTERACTION METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM

(19)	European Patent Office

(11)	EP 4 495 804 A3

(12)	EUROPEAN PATENT APPLICATION

(88)	Date of publication A3:
	05.03.2025 Bulletin 2025/10

(43)	Date of publication A2:
	22.01.2025 Bulletin 2025/04

(21)	Application number: 24194902.3

(22)	Date of filing: 16.08.2024

(51)	International Patent Classification (IPC):
	G06F 16/583 (2019.01)
	G06V 30/418 (2022.01)
	G06V 30/414 (2022.01)

(52)	Cooperative Patent Classification (CPC):
	G06V 30/418; G06V 30/414; G06F 16/583

(84)	Designated Contracting States:
	AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR
	Designated Extension States:
	BA
	Designated Validation States:
	GE KH MA MD TN

(30)	Priority:
	15.03.2024 CN 202410302732

(71)	Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.
	Beijing 100085 (CN)

(72)	Inventors:
	WANG, Haiwei, Beijing, 100085 (CN)
	ZHANG, Zhongwen, Beijing, 100085 (CN)
	LI, Gang, Beijing, 100085 (CN)

(74)	Representative: J A Kemp LLP
	80 Turnmill Street
	London EC1M 5QU (GB)

(54)	IMAGE BASED HUMAN-COMPUTER INTERACTION METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM

(57) The present disclosure provides an image based human-computer interaction method and apparatus, a device, and a storage medium, which relate to the field of artificial intelligence and, in particular, to the field of image processing. A specific implementation solution is as follows: acquiring a to-be-analyzed image, and determining image layout information and image content information of the to-be-analyzed image, where the to-be-analyzed image includes data of multiple modalities, the image layout information represents the distribution of image elements at a preset granularity in the to-be-analyzed image, and the image content information represents the content expressed by the modal data in the to-be-analyzed image; and determining, in response to acquiring question information, response information corresponding to the question information according to the image layout information and the image content information, where the question information represents a question posed by a user about the to-be-analyzed image, and the response information represents the answer to that question. By extracting both layout information and content information from an image, the accuracy of question answering and the user experience of human-computer interaction are improved.
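The abstract states the idea only at claim level: a response is chosen from both layout information (where image elements sit and what type they are) and content information (what each element expresses). The Python sketch below is a toy illustration of that combination under loudly stated assumptions. The `LayoutElement` structure, the word-overlap scoring and the fixed layout bonus are all hypothetical and are not taken from the disclosure, which does not say on this page how the response is actually computed.

```python
from dataclasses import dataclass


@dataclass
class LayoutElement:
    """One image element at a preset granularity (hypothetical structure)."""
    kind: str                          # layout type, e.g. "title", "paragraph", "table"
    bbox: tuple[int, int, int, int]    # (x0, y0, x1, y1) in pixel coordinates
    content: str                       # recognized content of the region


def tokenize(text: str) -> set[str]:
    # Lower-case word set with surrounding punctuation stripped.
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {w for w in words if w}


def reading_order(elements: list[LayoutElement]) -> list[LayoutElement]:
    # Layout information: order regions top-to-bottom, then left-to-right.
    return sorted(elements, key=lambda e: (e.bbox[1], e.bbox[0]))


def score(question_words: set[str], element: LayoutElement) -> int:
    # Content information: word overlap between the question and the region.
    overlap = len(question_words & tokenize(element.content))
    # Layout information (assumed heuristic): bonus if the question names the element type.
    layout_bonus = 2 if element.kind in question_words else 0
    return overlap + layout_bonus


def answer(question: str, elements: list[LayoutElement]) -> str:
    q = tokenize(question)
    # max() returns the first best element, so ties resolve in reading order.
    best = max(reading_order(elements), key=lambda e: score(q, e))
    if score(q, best) == 0:
        return "No relevant region found."
    return f"[{best.kind}] {best.content}"


if __name__ == "__main__":
    page = [
        LayoutElement("paragraph", (40, 300, 560, 420), "Revenue grew in the third quarter."),
        LayoutElement("title", (40, 20, 560, 60), "Quarterly Report"),
        LayoutElement("table", (40, 120, 560, 280), "Q1 revenue 10; Q2 revenue 12; Q3 revenue 15"),
    ]
    # prints "[table] Q1 revenue 10; Q2 revenue 12; Q3 revenue 15"
    print(answer("What does the table say about revenue?", page))
```

In the usage example the paragraph shares two words with the question, but the table wins because the question names its layout type; this is the kind of case where layout information adds something that content matching alone would miss. A practical system would replace the keyword heuristic with learned models.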

Search report
