CN114359920B

CN114359920B - Image processing method, device, equipment and storage medium

Info

Publication number: CN114359920B
Application number: CN202011065951.9A
Authority: CN
Inventors: 王倩; 林彬彬; 邓佳康
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2025-07-29
Anticipated expiration: 2040-09-30
Also published as: CN114359920A

Abstract

The application discloses an image processing method, apparatus, device, and storage medium. The method comprises: identifying the image content of N images; when the image content of the N images is identified as containing document material, intercepting document material images from the N images to obtain M intercepted images; splicing the M intercepted images; and outputting the spliced file in an electronic document format. According to the scheme provided by the embodiments of the application, document material images are intercepted from images containing document material, the intercepted document material images are spliced, and the spliced file is output in an electronic document format, which effectively reduces the time needed to organize documents such as PPT or courseware and improves efficiency.

Description

Image processing method, device, equipment and storage medium

Technical Field

The present invention relates generally to the field of image technology, and in particular, to an image processing method, apparatus, device, and storage medium.

Background

With the development of technology, presenting document materials such as PPT or courseware has become very common in meetings, training, and teaching. Lecturing with documents such as PPT or courseware is convenient for the lecturer, because it avoids the inefficiency of writing on a whiteboard or blackboard in real time. It is, however, inconvenient for the listeners: because the lecturer no longer spends time writing, the lecture proceeds relatively quickly, and the listeners may not have time to take notes.

At present, most listeners record the contents of documents such as PPT or courseware by video recording or photographing, and organize the documents after the lecture is finished. This approach is inefficient.

Disclosure of Invention

In view of the foregoing drawbacks or shortcomings in the prior art, it is desirable to provide an image processing method, apparatus, device, and storage medium.

In a first aspect, the present application provides an image processing method, the method comprising:

Identifying image content of the N images;

When the image content of the N images is identified as containing document material, intercepting document material images from the N images to obtain M intercepted images;

splicing the M intercepted images, and outputting a spliced file in an electronic document format;

Wherein N is a positive integer, and M is a positive integer less than or equal to N.

In one embodiment, the image is a video frame image;

before identifying the image content of the N images, further comprising:

Acquiring the mark points recorded in a target video;

and determining a video frame image corresponding to the mark point in the target video according to the mark point.

In one embodiment, before acquiring the mark points recorded in the target video, the method further includes:

receiving a mark input on a target video in the recording process or the playing process of the target video;

marking a mark point in a corresponding video frame image in the target video in response to the mark input;

Wherein each marking point corresponds to a video frame image.

In one embodiment, splicing the M intercepted images includes:

Acquiring the playing time sequence, in the target video, of the document material images corresponding to the M intercepted images;

Determining a first splicing sequence of the M intercepted images according to the playing time sequence;

and splicing the M intercepted images according to the first splicing sequence.

In one embodiment, splicing the M intercepted images includes:

Determining the document page numbers of the document material images corresponding to the M intercepted images;

determining a second splicing sequence of the M intercepted images according to the document page numbers;

and splicing the M intercepted images according to the second splicing sequence.

In one embodiment, intercepting the document material images from the N images to obtain M intercepted images comprises:

When identical document material images exist among the document material images intercepted from the N images, taking only one of the identical document material images as an intercepted image.

In one embodiment, when it is identified that an included angle exists between any boundary of the document material contained in the image content and the corresponding boundary of the image in which the document material is located, and the included angle is larger than an included angle threshold,

intercepting the document material images from the N images includes:

performing perspective correction cropping on the document material.

In one embodiment, the electronic document format includes any one of a presentation file format, a PDF format, a rich text format, a word format, and a text editing system document format.

In one embodiment, the documentation includes any one of PPT documentation and courseware documentation.

In a second aspect, the present application provides an image processing apparatus comprising:

The identification module is used for identifying the image content of the N images;

the intercepting module is used for intercepting document material images from the N images to obtain M intercepted images when the image content of the N images is identified as containing document material;

and the output module is used for splicing the M intercepted images and outputting the spliced file in an electronic document format.

In a third aspect, the present application provides an apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the image processing method as in the first aspect when executing the program.

In a fourth aspect, the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method as in the first aspect.

According to the technical scheme provided by the embodiments of the application, document material images are intercepted from images containing document material, the intercepted document material images are spliced, and the spliced file is output in an electronic document format, which effectively reduces the time needed to organize documents such as PPT or courseware and improves efficiency.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:

fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the application are shown in the drawings.

In order to make the present application better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the described embodiments of the application may be implemented in other sequences than those illustrated or otherwise described herein.

Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules that are expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.

At present, presenting document materials such as PPT or courseware is very common in meetings, training, and teaching. Lecturing with documents such as PPT or courseware is convenient for the lecturer, because it avoids the inefficiency of writing on a whiteboard or blackboard in real time. Because the lecturer no longer spends time writing, however, the lecture proceeds relatively quickly, and the listeners may not have time to take notes.

At present, most listeners record the contents of documents such as PPT or courseware by video recording or photographing, and organize the PPT or courseware after the lecture is finished. This approach is inefficient.

Based on the above problems, the application aims to provide an image processing method that organizes document materials such as PPT or courseware recorded by video recording or photographing with high efficiency and high user satisfaction.

The method can be applied to terminal equipment provided with a camera, wherein the terminal equipment can be a mobile phone, a tablet computer, a notebook computer, an intelligent helmet, intelligent glasses, a telephone watch and the like.

It should be noted that, in the image processing method provided in the embodiment of the present invention, the execution body may be an image processing apparatus, and the image processing apparatus may be implemented as part or all of the terminal device by software, hardware, or a combination of software and hardware. In the following method embodiments, the execution subject is a terminal device.

Referring to fig. 1, a flowchart of an image processing method according to an embodiment of the present application is shown.

As shown in fig. 1, an image processing method may include:

S110, identifying the image content of the N images.

Specifically, the image may be a video frame image (i.e., the image corresponding to a certain frame in a video), a picture image (e.g., a photograph taken by a camera, a screenshot, etc.), or the like. The image may be obtained directly from the terminal device that records the video or takes the picture, from a storage device that stores the recorded video or picture, or by downloading. The form of the image and the manner of obtaining it are not limited.

Identifying the image content of an image may be accomplished by a trained neural network model. Identification may also be performed in other ways.

If the image is a picture image, the obtained picture image is directly input into a neural network model, and then the image content identification of the image can be completed.

If the image is a video frame image, the acquired video needs to be processed to obtain the video frame image.

In one embodiment, the image is a video frame image, and before identifying the image content of the N images, the method further comprises:

Acquiring the mark points recorded in a target video;

Specifically, the target video is a video that carries recorded mark points, and may be a video recorded by the user, a stored video, a video obtained by downloading, or the like. The mark points may be input by a user or by the terminal device; the number of mark points is N, where N is a positive integer. A video frame image may be any frame image in the video; in this embodiment, the video frame images are those corresponding to the mark points in the target video, so N mark points correspond to N video frame images.

In one embodiment, before acquiring the mark points recorded in the target video, the method further includes:

receiving a mark input on the target video in the recording process or the playing process of the target video;

marking a mark point in the corresponding video frame image in the target video in response to the mark input;

Wherein each mark point corresponds to one video frame image.

Specifically, when recording or playing a video, the user may mark mark points on the video according to actual needs. Mark points may also be marked automatically at intervals of a preset duration, which can be set according to actual needs. It can be understood that if the preset duration is too long, that is, mark points are far apart in time, images whose content contains document material may be missed; if the preset duration is too short, that is, mark points are close together in time, images containing the same document material may be marked repeatedly, so that more images have to be identified and identification takes longer. The preset duration may be set according to the learning and training of the neural network model.
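The interval-based automatic marking described above can be sketched as follows. This is only an illustrative sketch: the function name `auto_mark_points`, the choice to place the first mark at time 0, and the use of seconds as the unit are assumptions, not details given in the application.

```python
def auto_mark_points(video_duration_s, interval_s):
    """Return timestamps (in seconds) of automatically placed mark points.

    Hypothetical helper: one mark point every `interval_s` seconds,
    starting at 0 and never exceeding the video duration.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    marks = []
    k = 0
    while k * interval_s <= video_duration_s:
        marks.append(round(k * interval_s, 3))
        k += 1
    return marks


# A longer interval yields fewer frames to identify but may skip slides;
# a shorter interval yields more frames, many showing the same slide.
sparse = auto_mark_points(60, 30)   # 3 mark points
dense = auto_mark_points(60, 2)     # 31 mark points
```

The two calls illustrate the trade-off stated in the text: the number of images to identify grows as the preset duration shrinks.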

The mark points may also be marked manually by the user, while recording or playing the video, each time the lecturer turns a page of the PPT, courseware, or the like.

Mark points may also be determined in real time by an algorithm that judges whether the document material contained in the image content of adjacent frames has changed. If it has changed, a mark point may be marked automatically, or a pop-up window may ask the user whether marking is needed, and the user chooses whether to mark the point according to actual needs. It should be noted that mark points may also be marked on the video in other ways, which are not limited herein.
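The application does not name a specific change-detection algorithm. As one possible illustration, the sketch below compares adjacent frames by mean absolute pixel difference; the frame representation (small grayscale grids as nested lists), the function names, and the threshold value of 20 are all assumptions made for this example.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two equally sized grayscale frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count if count else 0.0


def detect_change_points(frames, threshold=20.0):
    """Return indices of frames that differ noticeably from the previous frame.

    Each returned index is a candidate mark point (for example, a page turn).
    """
    marks = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            marks.append(i)
    return marks


# Two tiny synthetic "slides": the content changes at frame 2 and again at frame 4.
slide_a = [[0, 0], [0, 0]]
slide_b = [[255, 255], [0, 0]]
change_points = detect_change_points([slide_a, slide_a, slide_b, slide_b, slide_a])
```

In a real pipeline the candidate indices could either be marked directly or presented to the user for confirmation, matching the two options described above.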

After the target video and the mark points recorded in it are obtained, the marked video frame images can be determined according to the mark points. When the image content is identified, all marked video frame images can be input into the neural network model to determine whether the image content of each video frame image contains document material.
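Determining the video frame image for each mark point can be illustrated with a simple timestamp-to-frame mapping. The sketch assumes that mark points are stored as timestamps in seconds and that the video has a constant frame rate; neither detail is specified in the application, and the function name is hypothetical.

```python
def mark_points_to_frame_indices(mark_times_s, fps, frame_count):
    """Map mark-point timestamps (seconds) to frame indices.

    Assumes a constant frame rate. Indices are clamped to the valid
    range [0, frame_count - 1] so that a mark at the very end of the
    video still maps to an existing frame.
    """
    indices = []
    for t in mark_times_s:
        index = int(t * fps)  # frame being displayed at time t
        index = max(0, min(frame_count - 1, index))
        indices.append(index)
    return indices


# N mark points yield N frame indices, one video frame image per mark point.
frame_indices = mark_points_to_frame_indices([0.0, 1.5, 100.0], fps=30, frame_count=300)
```

Each resulting index identifies one frame to be passed to the recognition step.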

S120, when the image content of the N images is identified as containing document material, intercepting document material images from the N images to obtain M intercepted images.

Specifically, when the image content of an image is identified as containing document material, the image may be too dark or overexposed because of the recording environment. In that case, the image is first processed to bring its brightness into the normal range; this may be done with existing techniques, which are not described here. Optionally, the document material may include any of PPT documents, courseware documents, and the like.
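The application leaves the brightness processing to existing techniques. Purely as an illustration, the sketch below applies a uniform gain when the mean brightness falls outside an assumed normal range; the range 60 to 200, the target value 128, and the function names are assumptions for this example, not values from the application.

```python
def mean_brightness(gray):
    """Average pixel value of a grayscale image given as nested lists (0-255)."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / len(pixels) if pixels else 0.0


def normalize_brightness(gray, low=60, high=200, target=128):
    """Scale pixel values so that a too-dark or overexposed image
    has a mean brightness near `target`.

    Images already inside [low, high] are returned unchanged.
    """
    mean = mean_brightness(gray)
    if mean == 0 or low <= mean <= high:
        return gray
    gain = target / mean
    return [[min(255, round(p * gain)) for p in row] for row in gray]


dark_image = [[10, 20], [30, 40]]          # mean brightness 25, too dark
corrected = normalize_brightness(dark_image)
```

A uniform gain is the simplest possible choice; histogram equalization or gamma correction are common alternatives.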

The boundary of the document material in the processed image is then detected by a boundary recognition technique, and the image is cropped according to the detected boundary. It can be understood that, to make the cropped document material image look better, each boundary may be extended outward by a preset length when cropping (i.e., the left boundary extends leftward, the right boundary rightward, the upper boundary upward, and the lower boundary downward). The preset lengths in the four directions may be equal or unequal and can be set according to actual requirements.
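The outward extension of the detected boundary can be sketched as below. The box layout `(left, top, right, bottom)` in pixel coordinates, the function name, and the clamping of the result to the image bounds are assumptions added for this example; the application only states that each side is extended by a preset length.

```python
def expand_crop_box(box, image_size, margins=(10, 10, 10, 10)):
    """Extend a detected document boundary outward before cropping.

    box        -- (left, top, right, bottom) of the detected document
    image_size -- (width, height) of the source image
    margins    -- preset lengths for (left, top, right, bottom);
                  the four values may be equal or unequal
    """
    left, top, right, bottom = box
    width, height = image_size
    m_left, m_top, m_right, m_bottom = margins
    return (
        max(0, left - m_left),        # left boundary extends leftward
        max(0, top - m_top),          # upper boundary extends upward
        min(width, right + m_right),  # right boundary extends rightward
        min(height, bottom + m_bottom),  # lower boundary extends downward
    )


crop_box = expand_crop_box((5, 20, 90, 80), image_size=(100, 100))
# crop_box == (0, 10, 100, 90)
```

Clamping keeps the crop inside the image when the document lies close to an edge.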

In one embodiment, intercepting document material images from the N images to obtain M intercepted images includes:

When identical document material images exist among the document material images intercepted from the N images, taking only one of the identical document material images as an intercepted image.

Specifically, since the document material images intercepted from the N images may include identical document material images, it may be determined whether the document material contained in the intercepted images is the same. If so, one of the intercepted images corresponding to the same document material is retained, and the others are discarded.

When judging whether intercepted images contain the same document material, a text comparison algorithm may be applied to the text in the images to compare the document material contained in all intercepted images.

It follows that, because identical document material images may exist among the intercepted document material images, the number of intercepted images may be smaller than or equal to the number of images; that is, the number M of intercepted images is a positive integer less than or equal to the number N of images.
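The duplicate removal described above might look like the following sketch. It assumes the text of each intercepted image has already been extracted (for example by OCR, which is not shown), and it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the unspecified text comparison algorithm. The dictionary keys and the 0.9 similarity threshold are hypothetical.

```python
from difflib import SequenceMatcher


def deduplicate(crops, min_ratio=0.9):
    """Keep one intercepted image per distinct document page.

    crops -- list of dicts with a "text" key holding the text already
             extracted from the intercepted image (assumed input)
    Two crops count as the same page when their text similarity
    ratio is at least `min_ratio`; the first occurrence is kept.
    """
    kept = []
    for crop in crops:
        is_duplicate = any(
            SequenceMatcher(None, crop["text"], other["text"]).ratio() >= min_ratio
            for other in kept
        )
        if not is_duplicate:
            kept.append(crop)
    return kept


crops = [
    {"id": 0, "text": "Chapter 1: Introduction"},
    {"id": 1, "text": "Chapter 1: Introduction"},          # same slide shown again
    {"id": 2, "text": "Chapter 2: Methods and results"},
]
unique = deduplicate(crops)   # N = 3 inputs, M = 2 intercepted images kept
```

A similarity threshold below 1.0 tolerates small recognition differences between two captures of the same page.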

When a video is recorded, the camera usually does not face the screen squarely, so the document material in the recorded video is tilted (tilt here means that an included angle exists between any boundary of the document material and the corresponding boundary of the image in which the document material is located, and the included angle is larger than an included angle threshold). Therefore, when document material is intercepted, it needs to be processed with a tilt detection and correction method: the document material is first checked for tilt and, if it is tilted, corrected. Commonly used tilt detection methods include text-line-based detection, projection profile analysis, and the Hough transform.

In one embodiment, when it is identified that an included angle exists between any boundary of the document material contained in the image content and the corresponding boundary of the image in which the document material is located, and the included angle is larger than the included angle threshold, intercepting the document material in the image includes performing perspective correction cropping on the document material.

Specifically, performing perspective correction cropping on the document material means correcting the included angles between all boundaries of the document material and the corresponding boundaries of the image in which it is located to within the included angle threshold. This may be done with a Photoshop technique, a distorted document image restoration technique, or other techniques, without limitation.

The included angle threshold may be set according to actual requirements; for example, it may be set to 5°.
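The tilt test against the included angle threshold can be illustrated as follows. The sketch assumes each document boundary is given by two endpoints in pixel coordinates and is compared with the horizontal or vertical image boundary it corresponds to; the function names and data layout are assumptions, and the correction step itself is not shown.

```python
import math


def included_angle_deg(p1, p2, image_edge="horizontal"):
    """Angle in degrees (0 to 90) between the document boundary p1-p2
    and the corresponding image boundary."""
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    angle = abs(math.degrees(math.atan2(dy, dx)))  # 0 to 180
    if image_edge == "vertical":
        angle = abs(angle - 90.0)
    return min(angle, 180.0 - angle)


def needs_perspective_correction(edges, threshold_deg=5.0):
    """True if any document boundary deviates from its image boundary
    by more than the included angle threshold.

    edges -- iterable of (p1, p2, image_edge) tuples
    """
    return any(
        included_angle_deg(p1, p2, image_edge) > threshold_deg
        for p1, p2, image_edge in edges
    )


# Top edge rises 10 px over 100 px (about 5.7 degrees), above the 5 degree example threshold.
tilted = needs_perspective_correction([((0, 0), (100, 10), "horizontal")])
# Top edge rises 5 px over 100 px (about 2.9 degrees), within the threshold.
straight = needs_perspective_correction([((0, 0), (100, 5), "horizontal")])
```

Only images for which the check returns true would need to go through perspective correction cropping.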

S130, splicing the intercepted images, and outputting the spliced file in an electronic document format.

Specifically, an intercepted image is a document material image cut out of an image. Splicing the intercepted images may mean splicing them into one spliced image, which is then output as a spliced file in an electronic document format. Alternatively, the intercepted images may be input into a Word document, a PPT document, a PDF document, or the like, and either spliced together within that document or each placed on its own page, after which the result is output as a spliced file in the electronic document format. After the spliced file is output, the path where the file is stored can be sent to the user, and the stored file can be found in the file manager.

The electronic document format may be set according to the actual needs of the user. Optionally, the electronic document format may include any one of a presentation file format, PDF (Portable Document Format), Rich Text Format (RTF), a Word format, and a text editing system document (Word Processing System, WPS) format. Other formats may also be used, such as an image format, an Excel workbook format, a web page format, or an MHT file format.

It will be appreciated that during a lecture, the lecturer often returns to a PPT or courseware page that was shown earlier. In that case, the video frame image corresponding to a mark point, or a photo taken by the listener, may contain the same content as the video frame image corresponding to an earlier mark point or an earlier photo. If all intercepted images were spliced directly, the page order of the spliced file might not correspond to the page order of the original PPT or courseware, and the spliced file would contain repeated content. Therefore, the intercepted images need to be sorted when splicing.

In one embodiment, splicing the M intercepted images includes:

Acquiring the playing time sequence, in the target video, of the document material images corresponding to the M intercepted images;

Determining a first splicing sequence of the M intercepted images according to the playing time sequence;

and splicing the M intercepted images according to the first splicing sequence.

Specifically, the playing time sequence of the document material images in the target video follows the times at which the mark points were marked: the image corresponding to an earlier mark point comes earlier in the playing time sequence, and the image corresponding to a later mark point comes later. That is, the playing time sequence of the document material images in the target video is the chronological order in which the mark points were marked.

The first splicing sequence is the order in which the intercepted images appear in the output spliced file; it is consistent with the playing time sequence, that is, with the chronological order in which the mark points were marked. The M intercepted images are spliced according to the first splicing sequence.
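Ordering by playing time amounts to a sort on the mark-point time. In the minimal sketch below, the record layout (a dictionary with `name` and `mark_time_s` keys) and the function name are assumptions made for illustration.

```python
def first_splicing_order(crops):
    """Order intercepted images by the time of their mark point, earliest first.

    sorted() is stable, so images sharing a timestamp keep their input order.
    """
    return sorted(crops, key=lambda crop: crop["mark_time_s"])


crops_by_arrival = [
    {"name": "slide_c.png", "mark_time_s": 310.0},
    {"name": "slide_a.png", "mark_time_s": 12.5},
    {"name": "slide_b.png", "mark_time_s": 140.0},
]
time_ordered = [c["name"] for c in first_splicing_order(crops_by_arrival)]
# time_ordered == ["slide_a.png", "slide_b.png", "slide_c.png"]
```

This order reproduces the sequence in which the pages were shown, which can differ from the page order of the original document if the lecturer jumped between pages.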

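In one embodiment, splicing the M intercepted images includes:

Determining the document page numbers of the document material images corresponding to the M intercepted images;

determining a second splicing sequence of the M intercepted images according to the document page numbers;

and splicing the M intercepted images according to the second splicing sequence.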

Specifically, the page number in document material such as PPT or courseware is generally placed at the top of the page, or at the left, middle, or right of the bottom of the page. These candidate positions are examined in each intercepted image to determine the page number of the document material image, and the document page numbers of the M intercepted images are determined accordingly.

The second splicing sequence is the order in which the intercepted images appear in the output spliced file; it is consistent with the page order of the document material such as PPT or courseware. The M intercepted images are spliced according to the second splicing sequence.
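Ordering by page number can be sketched as below, assuming the text found at the candidate page-number positions has already been recognized and stored under a hypothetical `page_text` key. Taking the first run of digits as the page number, and placing images without a readable page number at the end in their original order, are assumptions of this example rather than rules stated in the application.

```python
import re


def parse_page_number(text):
    """Return the first integer found in the text, or None if there is none."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else None


def second_splicing_order(crops):
    """Order intercepted images by detected document page number."""
    def sort_key(item):
        index, crop = item
        page = parse_page_number(crop.get("page_text"))
        # Images without a page number sort last, keeping input order.
        return (page is None, page if page is not None else 0, index)

    return [crop for _, crop in sorted(enumerate(crops), key=sort_key)]


pages = [
    {"id": "a", "page_text": "3 / 10"},
    {"id": "b", "page_text": "Page 1"},
    {"id": "c", "page_text": ""},
    {"id": "d", "page_text": "2"},
]
page_ordered = [c["id"] for c in second_splicing_order(pages)]
# page_ordered == ["b", "d", "a", "c"]
```

Unlike the time-based order, this order matches the original document even when pages were shown out of sequence.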

In the embodiment of the application, when the image content of the N images contains the document data, the document data images are intercepted from the N images to obtain M intercepted images, the M intercepted images are spliced, and the spliced file is output in an electronic document format, so that the time for a user to sort documents such as PPT or courseware can be reduced, and the efficiency is improved.

The image processing method of the embodiment of the present application is described below, taking the recording of a tag (mark) video as an example.

After recording is finished, the user opens the tag video album on the mobile phone, which displays a view-tag entry. Clicking the entry expands the tags into the time points of the video frame images corresponding to the mark points. The video frame image corresponding to each mark point is identified to determine whether it contains document material. When document material is identified in the video frame images, a document export button is displayed on the album interface. When the button is clicked, the document material in the video is intercepted and spliced: perspective correction cropping is performed on pages that need correction, the order of the document material is determined based on information such as the page number and time of the video frame images, and it is judged whether identical document material exists, in which case duplicates are removed. The document material in the video is then exported and saved as a PDF file, and the user is prompted with the storage path of the file, which can be found in the file manager.

Fig. 2 is a schematic structural diagram of an image processing apparatus 200 according to an embodiment of the present application. As shown in fig. 2, the apparatus may implement the method shown in fig. 1, and the apparatus may include:

an identification module 210 for identifying image contents of the N images;

an intercepting module 220, configured to, when the image content of the N images is identified as containing document material, intercept document material images from the N images to obtain M intercepted images;

And the output module 230 is used for splicing the M intercepted images and outputting the spliced file in an electronic document format.

Optionally, the image is a video frame image, and the apparatus further includes:

the first acquisition module is used for acquiring the target video and mark points recorded in the target video;

And the determining module is used for determining the video frame image corresponding to the mark point in the target video according to the mark point.

Optionally, the apparatus further comprises:

the input receiving module is used for receiving the mark input on the target video in the recording process or the playing process of the target video;

the response module is used for responding to the marking input and marking the marking points in the corresponding video frame images in the target video;

Wherein each marking point corresponds to a video frame image.

Optionally, the output module 230 is further configured to:

and splicing M intercepted images according to a first splicing sequence.

Optionally, the output module 230 is further configured to:

and splicing M intercepted images according to a second splicing sequence.

Optionally, the intercepting module 220 is further configured to:

when identical document material images exist among the document material images intercepted from the N images, take only one of the identical document material images as an intercepted image.

Optionally, when it is identified that an included angle exists between any boundary of the document material contained in the image content and the corresponding boundary of the image in which the document material is located, and the included angle is greater than the included angle threshold, the intercepting module 220 is further configured to:

perform perspective correction cropping on the document material.

Optionally, the electronic document format includes any one of a presentation file format, a PDF format, a rich text format, a word format, and a text editing system document format.

Optionally, the document material includes any one of PPT document and courseware document.

The image processing device provided in this embodiment may execute the embodiment of the method, and its implementation principle and technical effects are similar, and will not be described herein.

Fig. 3 is a schematic structural diagram of an electronic device 300 suitable for implementing embodiments of the present application.

As shown in fig. 3, the electronic device 300 includes a Central Processing Unit (CPU) 301 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage section 308 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the device 300. The CPU 301, ROM 302, and RAM 303 are connected to each other through a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.

Connected to the I/O interface 305 are: an input section 306 including a keyboard, a mouse, and the like; an output section 307 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 308 including a hard disk and the like; and a communication section 309 including a network interface card such as a LAN card, a modem, and the like. The communication section 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 310 as needed, so that a computer program read from it can be installed into the storage section 308 as needed.

In particular, according to embodiments of the present disclosure, the process described above with reference to fig. 1 may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the above-described image processing method. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 309, and/or installed from the removable medium 311.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules involved in the embodiments of the present application may be implemented in software or in hardware. The described units or modules may also be provided in a processor. In some cases, the names of these units or modules do not constitute a limitation on the units or modules themselves.

In another aspect, the present application also provides a storage medium, which may be a storage medium included in the foregoing apparatus in the foregoing embodiment, or may be a storage medium that exists alone and is not assembled into a device. The storage medium stores one or more programs for use by one or more processors in performing the image processing methods described in the present application.

The above description is only illustrative of the preferred embodiments of the present application and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the application referred to herein is not limited to the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of the technical features described above or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions that are disclosed in (but not limited to) the present application.

Claims

1. An image processing method, comprising:

identifying the image content of N images, wherein the images are at least one of video frame images or picture images;

when the image content of the N images is identified as containing document material, intercepting document material images from the N images to obtain M intercepted images;

splicing the M intercepted images, and outputting the spliced file in an electronic document format;

wherein N is a positive integer, and M is a positive integer less than or equal to N;

when the images are video frame images, the method further comprises, prior to identifying the image content of the N images:

determining, according to mark points, the video frame images in a target video that correspond to the mark points, wherein the target video is a video in which mark points have been recorded, such as a video recorded by a user, a stored video or a downloaded video, and each mark point corresponds to one video frame image;

before the mark points recorded in the target video are obtained, the method further comprises:

receiving a mark input on the target video during recording or playback of the target video;

marking a mark point in the corresponding video frame image of the target video in response to the mark input;

wherein the generation process of the mark input comprises:

acquiring the change in the document material contained in the image content of adjacent video frame images, and, if the document material contained in the image content of the adjacent video frame images has changed, generating the mark input either by automatic marking or by displaying a pop-up window that queries the user.
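
As an illustration only, the mark-input generation recited at the end of claim 1 can be sketched as follows. The sketch assumes that each video frame has already been reduced to a hypothetical "fingerprint" string describing the document material it contains (or None when no document material is present); the function name, the fingerprint representation and the `confirm` callback standing in for the pop-up query are assumptions made for this example and are not taken from the application.

```python
from typing import Callable, List, Optional, Sequence


def generate_mark_points(
    fingerprints: Sequence[Optional[str]],
    confirm: Optional[Callable[[int], bool]] = None,
) -> List[int]:
    """Return the indices of frames at which the document material changed.

    fingerprints[i] is a hypothetical summary of the document material in
    frame i, or None if the frame contains no document material.
    confirm is None for automatic marking; otherwise it stands in for the
    pop-up query and decides whether a detected change becomes a mark point.
    """
    marks: List[int] = []
    previous: Optional[str] = None
    for index, current in enumerate(fingerprints):
        # A change is a frame whose document material differs from the
        # material in the immediately preceding frame (assumption: the first
        # appearance of document material also counts as a change).
        changed = current is not None and current != previous
        if changed and (confirm is None or confirm(index)):
            marks.append(index)
        previous = current
    return marks


# Automatic marking: every detected change becomes a mark point.
auto_marks = generate_mark_points([None, "a", "a", "b", None, "b"])

# Query mode: the callback plays the role of the user answering the pop-up.
asked_marks = generate_mark_points(
    [None, "a", "a", "b", None, "b"], confirm=lambda i: i != 3
)
```

Each returned index then corresponds to one video frame image, in line with the requirement that each mark point corresponds to one video frame image.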

2. The method of claim 1, wherein splicing the M intercepted images comprises:

acquiring the playing time order, in the target video, of the document material images corresponding to the M intercepted images;

and splicing the M intercepted images according to the first splicing sequence.

3. The method of claim 1, wherein splicing the M intercepted images comprises:

determining a second splicing sequence of the M intercepted images according to the document page number;

and splicing the M intercepted images according to the second splicing sequence.
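
By way of a hedged example, the two splicing orders of claims 2 and 3 (playing time in the target video, and document page number) can be sketched as simple sorts. The `Capture` record, its field names and the rule that captures without a readable page number are placed last are assumptions introduced for this sketch; the application does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Capture:
    name: str                   # hypothetical identifier of an intercepted image
    timestamp: float            # playing time in the target video, in seconds
    page: Optional[int] = None  # page number read from the document, if any


def order_by_time(captures: List[Capture]) -> List[Capture]:
    """Order by playing time in the target video (cf. claim 2)."""
    return sorted(captures, key=lambda c: c.timestamp)


def order_by_page(captures: List[Capture]) -> List[Capture]:
    """Order by document page number (cf. claim 3).

    Assumption: captures without a page number go last, in playing-time order.
    """
    return sorted(
        captures,
        key=lambda c: (c.page is None, c.page if c.page is not None else 0, c.timestamp),
    )


captures = [
    Capture("c", 30.0, page=1),
    Capture("a", 10.0, page=3),
    Capture("b", 20.0, page=2),
]
time_order = [c.name for c in order_by_time(captures)]
page_order = [c.name for c in order_by_page(captures)]
```

The two orders differ whenever the lecturer jumps back to an earlier page, which is why the claims treat them as separate alternatives.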

4. The method of claim 1, wherein intercepting document material images from the N images to obtain M intercepted images comprises:

in the case where identical document material images exist among the document material images intercepted from the N images, taking one of the identical document material images as one of the intercepted images.
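
A minimal sketch of the de-duplication in claim 4, assuming each document material image is available as raw bytes and that "the same" means byte-for-byte identical. This is a simplifying assumption: the application does not state how sameness is determined, and images cut from different video frames would in practice probably need a similarity measure rather than an exact hash. The function name is hypothetical.

```python
import hashlib
from typing import Iterable, List


def drop_duplicate_captures(images: Iterable[bytes]) -> List[bytes]:
    """Keep the first occurrence of each distinct image, preserving order."""
    seen = set()
    unique: List[bytes] = []
    for data in images:
        # Exact-match digest; identical bytes give identical digests.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique


# Three intercepted images, two of which are identical, yield M = 2.
kept = drop_duplicate_captures([b"slide-1", b"slide-2", b"slide-1"])
```

This also shows why M can be smaller than N in claim 1: duplicates among the N source images collapse into a single intercepted image.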

5. The method according to claim 1, wherein, when it is identified that an included angle exists between any boundary of the document material contained in the image content and the corresponding boundary of the image in which the document material is located, and the included angle is greater than an included angle threshold,

intercepting the document material images from the N images comprises:

performing perspective correction and cropping on the document material.
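
The angle test that triggers perspective correction in claim 5 can be sketched as below. The sketch assumes the four corners of the document material have already been detected and are given in the order top-left, top-right, bottom-right, bottom-left in image coordinates; the function names and the default threshold of 2 degrees are illustrative assumptions, since the application does not give a threshold value. The correction itself (warping the quadrilateral to a rectangle) is not shown.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def edge_angle_deg(p: Point, q: Point, horizontal: bool) -> float:
    """Angle in degrees between edge p->q and the matching image border.

    horizontal=True compares with the top/bottom image borders,
    horizontal=False with the left/right image borders.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    from_horizontal = math.degrees(math.atan2(abs(dy), abs(dx)))  # 0..90
    return from_horizontal if horizontal else 90.0 - from_horizontal


def needs_perspective_correction(
    corners: Sequence[Point], threshold_deg: float = 2.0
) -> bool:
    """True if any document boundary deviates from its image border by more
    than threshold_deg (hypothetical default)."""
    tl, tr, br, bl = corners
    angles = [
        edge_angle_deg(tl, tr, horizontal=True),   # top boundary
        edge_angle_deg(bl, br, horizontal=True),   # bottom boundary
        edge_angle_deg(tl, bl, horizontal=False),  # left boundary
        edge_angle_deg(tr, br, horizontal=False),  # right boundary
    ]
    return any(angle > threshold_deg for angle in angles)


# A slide photographed head-on versus one photographed from the side.
straight = needs_perspective_correction([(0, 0), (100, 0), (100, 50), (0, 50)])
skewed = needs_perspective_correction([(0, 0), (100, 10), (100, 60), (0, 50)])
```

In the skewed example the top boundary rises 10 pixels over 100, an angle of about 5.7 degrees, so the check fires under the assumed 2 degree threshold.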

6. The method of claim 1, wherein the electronic document format comprises any one of a presentation file format, a PDF format, a rich text format, a Word format, and a text editing system document format.

7. The method of claim 1, wherein the document material includes any one of PPT document material and courseware document material.

8. An image processing apparatus, comprising:

an image recognition module, configured to identify the image content of N images, wherein the images are at least one of video frame images or picture images;

an output module, configured to splice the M intercepted images and output the spliced file in an electronic document format;

wherein, when the images are video frame images, the apparatus is further configured to, prior to identifying the image content of the N images:

the generation process of the mark input comprises the following steps:

9. An apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the image processing method of any one of claims 1-7 when executing the program.

10. A readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the image processing method according to any one of claims 1-7.