(19)
(11)EP 2 525 587 B1

(12)EUROPEAN PATENT SPECIFICATION

(45)Mention of the grant of the patent:
05.07.2017 Bulletin 2017/27

(21)Application number: 11305593.3

(22)Date of filing:  17.05.2011
(51)International Patent Classification (IPC): 
H04N 21/442(2011.01)
H04N 21/6373(2011.01)
H04L 29/06(2006.01)

(54)

Method for streaming video content, node in a network for monitoring video content streaming

Verfahren zum Streaming von Videoinhalt, Knoten in einem Netzwerk zur Überwachung des Streaming von Videoinhalt

Procédé de diffusion en continu d'un contenu vidéo, noeud dans un réseau pour surveiller la diffusion en continu d'un contenu vidéo


(84)Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

(43)Date of publication of application:
21.11.2012 Bulletin 2012/47

(73)Proprietor: ALCATEL LUCENT
92100 Boulogne-Billancourt (FR)

(72)Inventors:
  • Huysegems, Raf
    2800 Walem (BE)
  • De Vleeschauwer, Bart
    2970 Schilde (BE)

(74)Representative: ALU Antw Patent Attorneys 
Copernicuslaan 50
2018 Antwerpen (BE)


(56)References cited:
WO-A1-2011/047335
US-A1- 2010 121 974
US-A1- 2004 049 576
  
      
    Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).


    Description

    Field of the Invention



    [0001] The present invention relates to the field of networked video streaming services, in particular video streaming services offered over the Hypertext Transfer Protocol (HTTP) such as HTTP adaptive streaming (HAS).

    Background



    [0002] In one typical implementation, an Internet-based video streaming service is offered over the HTTP protocol. As the quality of service of the Internet as a transport network is substantially "best effort", protocols have been devised that take advantage to the maximal extent of the bandwidth available between a server and a client at any given time, by dynamically switching between different levels of video quality for the streamed content. HTTP adaptive streaming is an example.

    [0003] Accordingly, at a time when the available bandwidth is high, for instance due to a decreased level of overall network traffic, it is advantageous to stream video encoded at a relatively high data rate, representing graphics with a high resolution and/or a high frame rate. Similarly, at a time when the available bandwidth is low, for instance due to an increased level of overall network traffic, it is advantageous to stream video encoded at a relatively low data rate, representing graphics with a low resolution and/or a low frame rate.

    [0004] HTTP Adaptive streaming (HAS) is an emerging technique for the delivery of video. It is supported by industry leaders such as Microsoft (Smooth-streaming) and Apple (Live-streaming). One of the advantages of HAS lies in the fact that the existing infrastructure for HTTP web-content (including HTTP servers and proxies, CDNs, ...) can be reused for video distribution.

    [0005] Despite the growing popularity of HAS as a novel, improved video delivery solution, it is currently impossible for providers (content-provider, ISPs, CDN-provider) to track the delivered quality to the consumer.

    [0006] WO 2011/047335 discloses an adaptive media stream manager at the user for streaming data from a server using HAS. A parameter at the user, such as media buffer or network bandwidth, is monitored and a future value for that parameter at the user is predicted. The media stream manager can adapt the request for a segment based on the future parameter.

    [0007] US 2004/049576 discloses a method for session reconstruction.

    Summary of the Invention



    [0008] According to an aspect of the present invention, there is provided a method for streaming video content from a server to a client over a channel via a network. The method includes the server offering said video content as a set of consecutive fragments, each fragment of said set of consecutive fragments being offered in a plurality of quality levels corresponding to respective encoded data rates. The client is able to receive and display the fragments, even if consecutive fragments have different quality levels. The fragments are requested and received as part of a session during the method. The session according to the method comprises transmitting one or more requests for fragments with target quality levels of said video content to be displayed at said client. The requests are sent over the network from said client to said server. Further the method comprises receiving over the network one or more replies to the requests at said client. The replies can contain the requested fragments with target quality levels. Further the one or more received fragments are displayed at said client. An embodiment of the method is HTTP-adaptive streaming.

    [0009] According to an embodiment the method further comprises in the network capturing requests and/or replies to the requests of the session. A node connecting the server with the client captures the session data. The method further comprises reconstructing at least part of the session. This will allow obtaining parameters regarding the session as experienced at the client. The parameters relating to the reconstructed session can relate to quality of service, quality of experience. Reconstructing a session can relate to session variables/parameters only, not so much to the actual content. In this application reconstructing results in an image, possibly not an exact copy, of a session without direct feedback from the actual session at the client.

    [0010] Accordingly session parameters are reconstructed and can be used to improve services or monitor the results of a service.

    [0011] The adaptive media stream manager of WO 2011/047335 is located at the client and monitors actual values happening at the client. The quality of the session is not reconstructed.

    [0012] In an embodiment of the present invention reconstructing at least part of the session as experienced at the client can comprise reconstructing selected fragment-qualities, buffer filling at the client, and user interactions (e.g. pause/resume, video cursor repositioning etc.). These parameters relate to the actual playing and displaying of the video during the session.

    [0013] As an example reconstructing user interaction at the client is described. Even though requests sent by the client contain no indication for interaction by the user with the client, from the requests and the behavior of the client, it is possible to deduce and thereby reconstruct user interaction with the client, e.g. pausing. If after the possible pause-event, the client shows a request/reply pattern that is typical for a steady-state condition of the client where the client buffer is completely filled, it can be deduced that the user paused playback.
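
The deduction described in this paragraph can be illustrated with a minimal Python sketch. It assumes that a steady-state client with a full buffer requests one fragment per fragment duration, and that a long gap in the requests followed immediately by that steady-state spacing indicates a pause. The function name `detect_pauses` and the parameters `gap_factor`, `tolerance` and `confirm` are hypothetical and are not taken from the specification.

```python
from typing import List, Sequence, Tuple


def detect_pauses(
    request_times: Sequence[float],
    fragment_duration: float,
    gap_factor: float = 3.0,   # hypothetical: how long a gap must be to count as suspicious
    tolerance: float = 0.25,   # hypothetical: allowed deviation from steady-state spacing
    confirm: int = 3,          # hypothetical: number of steady-state intervals required
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs of request gaps that look like a user pause."""
    pauses = []
    for i in range(1, len(request_times)):
        gap = request_times[i] - request_times[i - 1]
        if gap <= gap_factor * fragment_duration:
            continue  # normal spacing, not a candidate pause event
        # Inter-request intervals observed directly after the gap.
        last = min(i + confirm, len(request_times) - 1)
        following = [request_times[j + 1] - request_times[j] for j in range(i, last)]
        steady = len(following) == confirm and all(
            abs(d - fragment_duration) <= tolerance * fragment_duration
            for d in following
        )
        # Steady-state spacing right after the gap suggests the buffer was
        # still full, i.e. playback was paused rather than starved.
        if steady:
            pauses.append((request_times[i - 1], request_times[i]))
    return pauses
```

A burst of closely spaced requests after the gap would instead suggest that the client is refilling an emptied buffer, so such a gap is not reported as a pause in this sketch.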

    [0014] The features of the invention allow, once the reconstructed session is available, calculating the possible (HAS) artifacts from the reconstructed session. A list of artifacts includes picture freeze, quality decrease, quality variation and interactivity delay, explained in more detail below. These artifacts allow quantifying the user-experience. Any of these artifacts can be part of the reconstruction according to the invention.

    [0015] The requests can be captured physically at the client or at the server or at any intermediate point between the client and the server.

    [0016] In an embodiment, the method of the present invention further comprises measuring the available channel data rate at said client, and the target quality level is further selected as a function of the available channel data rate. In an embodiment the transfer time for transferring the fragment is taken into account. In an embodiment the buffer-filling is a parameter for selecting the target quality.

    [0017] In an embodiment of the method of the present invention, the transmitting of the request comprises sending an HTTP request datagram. In a particular embodiment, the target quality level is indicated by at least one uniform resource identifier. In a particular embodiment, the target quality level is indicated by a parameter. In an embodiment the request is a HTTP-GET comprising at least a capture time and a fragment quality level.

    [0018] In an embodiment reconstructing at least part of the session as experienced at the client comprises extrapolating at least one parameter of the reconstructed session. Although it is preferred that all requests and replies are captured as part of the method, caching, in particular local caching, can result in requests and replies not reaching the capturing point/node. The missing session information is reconstructed by making an interpolation between the latest session information received before the interruption and the first session information received after it.

    [0019] In an embodiment the method comprises transmitting over the network the reconstructed session. The reconstructed session can be provided to a provider (content-provider, ISPs, CDN-provider) to track the delivered quality to the consumer.

    [0020] According to an aspect of the present invention, there is provided a node in a network for streaming video content as a set of consecutive fragments from a server to a client via the node in a session. The session will be displayed at the client. The client will see the consecutive fragments. Each fragment of said set of consecutive fragments is being offered in a plurality of quality levels. A session can comprise consecutive fragments of different quality levels. During the session requests for fragments with target quality levels are sent from the client to the server and replies upon the requests containing one or more fragments with target quality level are sent to the client.

    [0021] In an embodiment said node comprises at least a capturing device for capturing at least a part of said session and at least a reconstruction device for reconstructing part of said session as experienced (QoE) at the client. By capturing part of the session at a node details regarding the quality of the session are made available without adapting the player at the client. The insight of the invention is to use the captured session parts to reconstruct the session experience at the user. The raw requests and replies data are no indication for the quality of service/experience of the session at the client, nor for the experience at the client. Reconstruction however can be used to rebuild the session at the client using the session data captured at the node.

    [0022] Timing of the requests/replies is taken into account as an additional variable on top of the content of the requests themselves.

    [0023] In an embodiment of the present invention the reconstruction device is arranged to reconstruct user interactions at the client. By analyzing the requests and/or replies and the content, e.g. the requested quality level of the fragment, it is possible to deduce from the requested fragments the pausing of playout at the client.

    [0024] In an embodiment the reconstruction device is arranged to reconstruct buffer filling and picture freezes at the client during the session. Buffer filling and picture freeze are important session qualities and session artifacts respectively. Reconstruction of the buffer filling is obtained by taking into account the content of the requests and replies and the timing of the requests and replies. Further client properties, such as player characteristics can be taken into account.
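
As a rough illustration of how buffer filling and picture freezes might be reconstructed from the timing of the replies, the following Python sketch replays fragment arrival times against a simple playout model. The model is an assumption made for illustration only: playout starts when the first fragment arrives, drains one second of content per second, and any time during which the buffer is empty counts as a freeze. The name `reconstruct_buffer` is hypothetical.

```python
from typing import List, Sequence, Tuple


def reconstruct_buffer(
    arrival_times: Sequence[float],
    fragment_duration: float,
) -> Tuple[List[Tuple[float, float]], float]:
    """Replay fragment arrivals and return (buffer samples, total freeze time).

    Each sample is (arrival time, buffer filling in seconds just after arrival).
    """
    samples: List[Tuple[float, float]] = []
    buffer_s = 0.0   # seconds of video currently buffered
    freeze_s = 0.0   # accumulated time with an empty buffer
    last_t = None
    for t in arrival_times:
        if last_t is not None:
            elapsed = t - last_t
            drained = min(buffer_s, elapsed)   # playout cannot drain below zero
            buffer_s -= drained
            freeze_s += elapsed - drained      # remainder is a picture freeze
        buffer_s += fragment_duration          # the new fragment is now playable
        samples.append((t, buffer_s))
        last_t = t
    return samples, freeze_s
```

A real reconstruction would also use player characteristics such as the maximum buffer filling, which this sketch leaves out.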

    [0025] Other session qualities and artifacts can also be reconstructed.

    [0026] The node can further comprise a transmitting device for transmitting the reconstructed session over the network. This allows making available the collected information.

    [0027] According to yet another aspect, there is provided a network for streaming video content as a set of consecutive fragments from a server in a session upon request from a client over a channel, wherein each fragment of said set of consecutive fragments is being offered in a plurality of quality levels corresponding to respective encoded data rates. The network comprises a node according to any of the embodiments described herein.

    [0028] "session as experienced by the client" comprises at least some quality of service parameters relating to the perception of the received session data.

    [0029] "user interaction" is defined herein as any form of rendering of streamed video content other than normal-speed playback.

    Brief Description of the Figures



    [0030] Some embodiments of apparatus and/or methods in accordance with embodiments of the present invention are now described, by way of example only, and with reference to the accompanying drawings, in which:

    Figure 1 presents a flow chart of an embodiment of the method of the present invention;

    Figure 2 presents a schematic diagram of a network of the present invention;

    Figure 3 is a schematic diagram of an embodiment of the present invention; and

    Figures 4-5 present diagrams showing reconstructions of session parameters using the method according to the invention.


    Description of Embodiments



    [0031] In the following description, interaction between a client and a server is assumed, which client and server are defined in the usual way for hosts that participate in a network protocol with distinct roles.

    [0032] The skilled person will appreciate that actions ascribed to the "client" 200 may be carried out by any combination of hardware and/or software configured to interact with the server, and such actions may or may not be explicitly initiated by a human operator of the client equipment.

    [0033] Likewise, actions ascribed to the "server" 270 may be carried out by any combination of hardware and/or software configured to interact with the client, and in particular by server equipment comprised in a content distribution network (CDN) 261 or storage area network (SAN) or an implementation of an HTTP proxy.

    [0034] To allow dynamic switching between video encoding rates, video servers 270 generate separate video files (e.g., video clips or chunks encoded as files according to a file format such as MP4) for different time-fragments of the streamed contents, each of which is offered to the client in a variety of qualities, i.e. encoded at different data rates.

    [0035] In an embodiment the client 200 assesses the available bandwidth of the downstream link from the server 270,261 to the client 200 from time to time or on a continuous basis, and requests 130 the appropriate version of the next required video fragment in accordance with that assessment. According to one scheme, the client requests 130 the version encoded at the highest data rate that is lower than the presently measured downstream data rate. According to another scheme, the client takes into account statistical information about channel bandwidth fluctuations to determine the most appropriate encoding rate.
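
The first scheme can be sketched in a few lines of Python. The function name `select_encoding_rate` and the fallback to the lowest offered rate when no version fits are assumptions made for illustration; the specification does not state what the client does in that case.

```python
from typing import Sequence


def select_encoding_rate(available_rates_kbps: Sequence[int], measured_kbps: float) -> int:
    """Pick the highest encoded rate that stays below the measured downstream rate."""
    candidates = [rate for rate in available_rates_kbps if rate < measured_kbps]
    if candidates:
        return max(candidates)
    # Assumption: when every version exceeds the measured rate, use the lowest one.
    return min(available_rates_kbps)
```

For instance, with versions offered at 300, 700, 1500 and 3000 kbit/s and a measured downstream rate of 2000 kbit/s, this sketch selects the 1500 kbit/s version.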

    [0036] The different fragments to be selected by the client according to the above schemes may be addressed by means of dedicated Uniform Resource Identifiers (URIs). In this manner, the client simply has to select the appropriate URI for each new fragment. The available versions and their URIs are documented by the server along with associated metadata in a playlist or "manifest" file.

    [0037] Combined with CDN caches 261, which provide caching and distribution of the chunks and thus offload the origin HTTP-server, adaptive streaming provides for a smooth video rendering experience in a scalable way, over best-effort internet.

    [0038] One of the embodiments of the invention uses HTTP adaptive streaming (HAS). In HAS, content is encoded in several bit rates and fragmented in pieces of typically a number of seconds. Information describing the different fragments involved in the playout and the available quality levels is contained in a so-called manifest file. This manifest file is available at the HAS-server/proxy. Before the playout and download of fragments starts, the manifest file is retrieved by the client.

    [0039] To avoid picture freezes, the client software will apply a playout buffer that can range from 7 seconds for (near)-live HAS streaming to more than 30 seconds for on-demand video delivery. Based on a number of criteria such as the actual buffer filling and the measured transfer-time of the requested fragments, the client software will decide the quality level of the next fragment that must be downloaded.
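
A client-side decision of this kind might look like the following Python sketch. The concrete rules and the parameters `panic_threshold_s` and `up_ratio` are hypothetical values chosen for illustration and do not describe any particular player: the sketch drops to the lowest level when the buffer is nearly empty, steps down when a fragment took longer to transfer than it takes to play, and steps up when the transfer was comfortably fast.

```python
def next_quality_level(
    current: int,
    num_levels: int,
    buffer_s: float,
    transfer_time_s: float,
    fragment_duration_s: float,
    panic_threshold_s: float = 2.0,  # hypothetical panic threshold
    up_ratio: float = 0.5,           # hypothetical margin required before stepping up
) -> int:
    """Return the index (0 = lowest) of the quality level for the next fragment."""
    if buffer_s < panic_threshold_s:
        return 0  # buffer almost empty: jump to the lowest quality level
    if transfer_time_s > fragment_duration_s:
        return max(current - 1, 0)  # download slower than playout: step down
    if transfer_time_s < up_ratio * fragment_duration_s:
        return min(current + 1, num_levels - 1)  # ample headroom: step up
    return current
```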

    [0040] In the existing approach for QoE monitoring, a capture file or real-time captured information is used to detect generic network impairments (delay, loss, jitter, packet bursts ...). Based on these observed impairments, a prediction is made about the visual artifacts that will be perceived by the user (picture blocks, freezes, out of sync, loss of detail...). Depending on these artifacts, a quality-score can be defined for the entire session. Due to the video compression in the stream, this method turns out to be very complex and error prone as the impact of a packet loss is for example heavily depending on the frame-type (I/P/B), the amount of motion in the scene, the used motion-vectors in the compression.

    [0041] It is an object of embodiments of the invention to simplify the prior art method.

    [0042] The invention is at least partially based on the insight of using a capture file (or real-time capture) to reconstruct the entire (HAS) session, including selected fragment-qualities, buffer filling at the (HAS-)client, and user interactions (e.g. pause/resume, video cursor repositioning etc.).

    [0043] The technique of (HAS-)session reconstruction offers a number of new possibilities. ISPs, CDN-providers and content-provider can have a clear and objective view on the quality of each (HAS) delivery. The early detection of faulty (HAS) deliveries can avoid customer churn. Aggregated HAS measurements will enable providers to monitor, benchmark and troubleshoot the HAS delivery capabilities of the network. In an advanced embodiment the method and node according to the invention can be used to substantially reduce the amount of picture freezes and interactivity delay for consumers, improving the average QoE experience.

    [0044] In embodiments of the invention, the insights of the invention are combined with the known video streaming techniques.

    [0045] An embodiment of the method of the present invention will now be described in more detail with reference to Figure 1. In step 110 some local and/or network parameters can be assessed, which may be performed on a continuous basis, or intermittently, for instance once before every request for a fragment. A network parameter can be the available bandwidth. A local parameter may be the buffer filling. Different parameters can be assessed in combination. Also the desired playback mode as indicated by the user can be taken into account.

    [0046] The information from step 110 is combined in step 130, resulting in generating a request for a fragment with a quality level that corresponds with parameters taken into account in step 110.

    [0047] As a result of the request, the appropriate fragment is received at step 140. The fragment is then displayed by the client at step 150. In streaming the requests to receive further fragments can be sent e.g. before displaying (received) earlier fragments.

    [0048] An embodiment of the apparatus of the present invention will now be described in more detail with reference to Figure 2. The apparatus is a client device 200, such as a set-top box, a personal computer, a mobile telephone, or similar, connected to a video server 270 via a network 260 such as the Internet. Part of the network can be a CDN 261.

    [0049] The access link connecting the client device 200 to the network 260 may for example be an xDSL line, a WiFi link, a 3G mobile link, a WiMAX link, or any other type of connection. The functions of the video server 270 may be fulfilled by one or more elements of a content distribution network (not shown) or a proxy element. Without loss of generality, the video server 270 is shown as having access to three versions 271, 272, 273 of a particular item of video content. Each version 271, 272, 273 comprises a set of fragments of certain duration.

    [0050] Within the client device 200, a processing device 210 estimates the available parameters, such as the data rate on the downstream channel of the network 260, linking the video server 270 to the client device 200. This assessment may involve estimating the bandwidth. The assessment may additionally or alternatively involve measuring the actual amount of time required to download a known amount of data; in particular, the assessment of the data rate may involve timing the duration of the transmission of a or each incoming video fragment.

    [0051] The client device 200 also comprises a user input means (not shown), which may be a remote control, a mouse, a keyboard, or a dedicated control panel, and the necessary drivers to interpret the signals provided by such devices. The user input means allows the user to select a desired playback mode, such as slow motion, full-speed playback, fast-forward, pause/resume, video-cursor-reposition, change viewpoint, etc. The desired playback mode can be assessed by the processing device 210.

    [0052] On the basis of information obtained from the processing device 210, the requesting agent 230 selects a target quality level from among the quality levels 271, 272, 273 offered by the video server 270. Preferably, the target quality level is the highest sustainable quality level that may be downloaded without causing congestion on the downstream link and that can be consumed - by displaying the corresponding video content - at substantially the rate at which it is downloaded. This determination can be made according to the calculations described above. Once the target quality level is determined, a request is generated to obtain the next video fragment or fragments at the selected target quality level. The request is formatted according to the requirements of the protocol supported by the server 270. Preferably, the request is an HTTP "get" request, in which a URI is specified that corresponds to one or more particular quality levels. More preferably, the request further comprises a parameter that corresponds to one particular quality level among those designated by the URI, notably the selected target quality level.

    [0053] In response to the request, the video server 270 transmits the appropriate version of the requested video fragment to the client device 200 over the network 260. Using Scalable Video Coding or similar encoding techniques, a single fragment at a given quality level may be represented in multiple files, as part of the versions 271, 272, 273 of different quality levels. The skilled person will appreciate that different files pertaining to the same fragment, and files pertaining to different fragments of the same video stream, may be transmitted to the client device 200 over the network 260 from different sources, especially in network architectures that aim to optimize the usage of storage, such as content distribution networks 261. For the purpose of explaining the operation of the method of the present invention, this situation is no different than the situation with a single integrated video server 270.

    [0054] A receiving device 240 of the client device 200 receives the video fragment, and conveys it to the display 250. The display 250 performs the usual decrypting and decoding steps, and adapts the frame rate and/or resolution of the video stream in such a way that the content is displayed according to the selected playback mode. The display 250 may further comprise an actual display built into the client device 200. Alternatively, the display means 250 may comprise an interface to connect to an external display.

    [0055] As a result of the consecutive transmittal of requests, a set of consecutive fragments will be received at the client device 200.

    [0056] Displaying the received fragments will result in a certain experience by the user dependent on e.g. the quality level of the fragments. A non limiting list of examples of artifacts that can be taken into account in assessing the quality of service at the client are:
    • Picture freeze
    • Low Quality
    • Quality variation
    • Interactivity delay


    [0057] Picture freezes can be the result of an underrun of the client buffer. This can happen when there is a (temporary) mismatch between the bandwidth estimation/perception of the client and the actual available bandwidth in the network. Picture freezes can be triggered by bandwidth fluctuations or by a mixture of cached and uncached fragments.

    [0058] If the available bandwidth is low, the client will select a low(er) quality for its fragment downloads. In this way, the client will protect the viewer from picture freezes.

    [0059] For many reasons (including competing HTTP adaptive streaming clients), the perceived bandwidth per client could fluctuate. Based on the perceived bandwidth, the client adjusts the selected quality. Each time the quality is adapted, the viewing experience of the user is potentially disturbed.

    [0060] In many HAS clients, the user has the ability to reposition the video cursor or perform another type of user interaction. For most types of interaction (except pause/resume) the HAS client will have to invalidate its current fragment buffer and must fill this buffer again from scratch. Interactivity delay can be defined as the involved waiting time between the actual user-interaction and the display of the first picture of the new sequence.
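
One possible way to estimate the interactivity delay from captured data is sketched below in Python. The approach is an assumption made for illustration: a repositioning is taken to have occurred whenever the requested playtime does not follow on from the previous fragment, and the delay is approximated by the time between that request and the arrival of the last byte of the requested fragment. The names `FragmentRecord` and `estimate_interactivity_delays` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FragmentRecord:
    request_time: float    # capture time of the request (seconds)
    play_time: float       # playtime position of the requested fragment (seconds)
    last_byte_time: float  # capture time of the last byte of the reply (seconds)


def estimate_interactivity_delays(
    records: Sequence[FragmentRecord],
    fragment_duration: float,
    tolerance: float = 1e-6,
) -> List[Tuple[float, float]]:
    """Return (request time, estimated delay) for each detected repositioning."""
    delays = []
    for prev, cur in zip(records, records[1:]):
        expected_play_time = prev.play_time + fragment_duration
        if abs(cur.play_time - expected_play_time) > tolerance:
            # Non-consecutive playtime: treat as a video cursor repositioning.
            delays.append((cur.request_time, cur.last_byte_time - cur.request_time))
    return delays
```

The estimate is a lower bound under these assumptions, because the time between the user action and the first request, and any decoding time at the client, are not visible in the captured traffic.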

    [0061] Note that besides these four types of HAS artifacts other (potential) artifacts can be internally resolved by the HAS algorithms and underlying protocols (e.g. HAS for timing and synchronization, TCP for retransmission, ...)

    [0062] Figure 3 is another representation of an embodiment of the present invention using a network 300. Network 300 is shown only schematically as a single connection between a client 301 and a server 302. The skilled man will understand that different and/or multiple connections are possible.

    [0063] The network 300 allows the transmittal of requests for fragments of target quality level as well as the transmittal of replies to the requests containing the fragment of target quality level.

    [0064] Node 303 is a node in the network and can be at any position within the connection between the server 302 and client 301. In an embodiment node 303 is physically located on the client 301 or on the server 302. In an embodiment software installed on the client 301 forms the node 303.
    The HAS stream over network 300 is captured at node 303. In this application capturing is to be understood as a step of duplicating at the node (parts of) the HAS stream. Capturing can comprise probing at a node, eavesdropping or intervention.

    [0065] In an embodiment of the invention the captured data comprising at least some of requests and replies to the requests sent as part of the HAS session are used to reconstruct the quality of service (=experience) of the session at the client. Reconstructing the session at the client allows obtaining data regarding the quality of service and the experience of the user at the client regarding the HAS-session.

    [0066] To perform a complete reconstruction of the session, the following information can be retrieved 304 from the captured HAS stream:
    1) On the level of the HTTP messages
      a. Request time per fragment: capture time of the HTTP-GET
      b. Requested fragment quality: available as parameter in the requested URL of the HTTP-GET
      c. Playtime of the individual fragments: available in the requested URL or in the manifest file
      d. Fragment duration: fixed or available in the manifest file (Others)
      e. Arrival time of the first bytes of the fragment: capture time of the HTTP-OK header
      f. Caching indications: use of HTTP message 304 "Not Modified" if the client has performed a conditional GET and the data in the server was not modified meanwhile
    2) On the level of the TCP messages
      a. Arrival of the last byte of the HAS fragment
    3) On the level of the HAS-client 301
      a. From the captured data, it is also possible to detect the used client software.
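
The HTTP-level items in the list above could be extracted along the lines of the following Python sketch. The URL layout, with the quality level and the playtime carried as `QualityLevels(...)` and `Fragments(video=...)` path elements and a timescale of 10,000,000 units per second, is modelled on Smooth Streaming style requests and is an assumption for illustration; other HAS variants encode these values differently. The names `FragmentRequest` and `parse_get` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed URL layout: .../QualityLevels(<bit rate>)/Fragments(video=<start time>)
_URL_PATTERN = re.compile(r"QualityLevels\((\d+)\)/Fragments\(video=(\d+)\)")


@dataclass
class FragmentRequest:
    request_time: float  # capture time of the HTTP-GET (seconds)
    quality_bps: int     # requested fragment quality, taken from the URL
    play_time_s: float   # playtime of the fragment, taken from the URL


def parse_get(capture_time: float, url: str, timescale: int = 10_000_000) -> Optional[FragmentRequest]:
    """Build a fragment record from a captured HTTP-GET, or None if the URL does not match."""
    match = _URL_PATTERN.search(url)
    if match is None:
        return None  # e.g. a manifest request or unrelated traffic
    quality_bps = int(match.group(1))
    play_time_s = int(match.group(2)) / timescale
    return FragmentRequest(capture_time, quality_bps, play_time_s)
```

Arrival times of the first and last bytes would be added to such a record from the captured HTTP-OK header and the TCP stream respectively.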


    [0067] When information regarding the used client software is available, this information can be used for example to find the maximum buffer filling used by this client (-version) or other important behavioral characteristics of this client. An example of a behavioral characteristic of the client software is e.g. the client jumping to the lowest quality level when the buffer filling drops under a certain panic-threshold.

    [0068] In an embodiment the client-version (represented by a *.XAP file in case of Smooth Streaming) can be determined from a combination of the file-name, the file-size, and a hash of the file.
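
A minimal Python sketch of such a client-version fingerprint is given below. The choice of SHA-256 as the hash, the names `client_fingerprint` and `identify_client`, and the lookup table of known clients are assumptions for illustration; the specification does not prescribe a particular hash function or data structure.

```python
import hashlib
from typing import Dict, Optional, Tuple

# (file name, file size in bytes, hex digest of the file contents)
Fingerprint = Tuple[str, int, str]


def client_fingerprint(file_name: str, payload: bytes) -> Fingerprint:
    """Combine file name, file size and a hash of the captured client file."""
    return (file_name, len(payload), hashlib.sha256(payload).hexdigest())


def identify_client(fingerprint: Fingerprint, known_clients: Dict[Fingerprint, str]) -> Optional[str]:
    """Look up a fingerprint in a (hypothetical) table of known client versions."""
    return known_clients.get(fingerprint)
```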

    [0069] As part of reconstruction of the session parameters at the client, client-characteristics can also be deduced for each detected client-version. Deduction of characteristics can be based on observed retrieval patterns. Further client characteristics can be the max buffer filling, the panic threshold level, etc.

    [0070] In an embodiment different nodes 303 can work together and forward characteristic client information to a central server (not shown in figure 3) in the network from where the client-behavior information is validated, filtered and conclusions distributed again to the node 303.

    [0071] The captured session data 304 is inputted into a reconstruction algorithm 305 that will be described in more detail hereunder.

    [0072] The reconstruction will allow obtaining parameters 306 relevant to the experience of the user at the HAS-client. The reconstructed parameters can be used to reconstruct 307 session artefacts such as picture freezes, quality decreases, quality variations and interactivity delay. These values can be used as data representing the experience of the user using the client 301. The duplicated session information can be provided 308 over the network 300 to a more central point in the network such as a network analyzer entity, collecting information for involved parties such as content provider, network-provider or CDN provider.

    [0073] Further embodiments of specific process steps for reconstruction of the session will now be described.

    [0074] In case fragments are locally cached in the browser or in an HTTP-proxy that is located between the node 303 and the end-user 301, these fragment requests will be missing in the node 303. As a consequence, the used quality level of these cached fragments remains invisible for the node 303.

    [0075] Reconstruction of the missing quality levels can be performed using simple extrapolation of the missing quality levels. If the last available quality level before missing fragment(s) was a level 1 quality and the next quality level (after the missing fragments) is also level 1, the missing level can be set at level 1.
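
The simple rule of this paragraph can be written as a short Python sketch, in which `None` marks a fragment whose quality level was not observed. The function name `fill_missing_levels` is hypothetical, and gaps whose neighbouring levels differ are deliberately left unresolved here.

```python
from typing import List, Optional, Sequence


def fill_missing_levels(levels: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Fill runs of unknown levels when the levels before and after the run agree."""
    filled = list(levels)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1  # advance to the first known level after the gap
        before = filled[start - 1] if start > 0 else None
        after = filled[i] if i < n else None
        if before is not None and before == after:
            for k in range(start, i):
                filled[k] = before  # same level on both sides: assume no change
    return filled
```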

    [0076] In an embodiment the quality level is inferred using elements such as current buffer filling, characteristics of the client such as threshold levels to move the quality up or down, the maximum number of quality steps in up and down direction that can be taken in a certain amount of time, the buffer-filling level after the missing fragments, etc.

    [0077] An exemplary embodiment is shown in figure 4. If a particular client restricts quality changes to, e.g., a maximum of one quality level increase/decrease every 10 seconds, and if a certain transition from level n to level n+3 took 30 seconds, the intermediate quality levels can be calculated accurately based on this knowledge, without any knowledge of the fragments received during these 30 seconds.

    [0078] In this example, during a period of about 30 seconds, the probe 303 did not receive any fragments/requests. When the probe receives a new fragment, that fragment is fragment n+30. The fragments that the client should have requested in order to continue the playout were not received by the probe. A reason for this could be that these segments (seg n+1 to seg n+29) were requested before (in a previous playout) and were served either
    1) from the local cache of the web-browser, or
    2) from the cache of an intermediate node between the client and the HAS probe.
    Based on the duration per segment (e.g. 1 sec), the knowledge of the time when each segment must be played (seqnr) and the assumed progress of the playout (1 sec of buffer every 1 sec), the HAS session reconstruction algorithm (HAS-SR) could extrapolate the missing segments in the session (e.g. equally spaced between the last received segment (n) and the next received segment (n+30)) and, based on this estimation, calculate the possibility and the duration of a possible buffer-underrun.
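    The extrapolation described above can be sketched as follows, under stated assumptions: the missing segments are assumed to arrive equally spaced between the two observed segments, every segment carries the same media duration, and playout drains one second of buffer per elapsed second. The function names and parameters are hypothetical and serve only as illustration.

```python
from typing import List


def interpolate_arrivals(t_last: float, t_next: float, n_missing: int) -> List[float]:
    """Assumed arrival times of the unseen segments, equally spaced
    between the last observed segment and the next observed segment."""
    step = (t_next - t_last) / (n_missing + 1)
    return [t_last + step * k for k in range(1, n_missing + 1)]


def estimate_underrun(buffer_s: float, t_last: float, t_next: float,
                      n_missing: int, seg_dur: float = 1.0) -> float:
    """Total estimated stall time (seconds) between t_last and t_next.

    buffer_s is the estimated buffer filling right after segment n arrived.
    """
    stall = 0.0
    now = t_last
    for t in interpolate_arrivals(t_last, t_next, n_missing) + [t_next]:
        elapsed = t - now
        if elapsed > buffer_s:
            # Buffer ran dry before this segment arrived: playout froze.
            stall += elapsed - buffer_s
            buffer_s = 0.0
        else:
            buffer_s -= elapsed
        buffer_s += seg_dur  # the segment arriving at time t adds media
        now = t
    return stall


# 29 unseen segments in a 30 s gap with 5 s buffered: no underrun expected.
print(estimate_underrun(5.0, 0.0, 30.0, 29))
# Same segments spread over 60 s with 2 s buffered: playout must have stalled.
print(estimate_underrun(2.0, 0.0, 60.0, 29))
```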

    [0079] As no information was captured regarding fragments n+1 to n+29, the probe 303 did not receive any indication of the requested video quality (VQ) for these fragments. Fragment n was requested with VQ1 and fragment n+30 was requested with VQ4. When the HAS-SR has recognized the specific client/client-version, and it is known that this client will only step up the VQ after a timeframe of 10 seconds after the previous up-step, the HAS-SR algorithm deduces the quality transitions as shown in figure 4. In this way, the moment of the transition to VQ3 and later to VQ4 can be accurately estimated.
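    The deduction of the intermediate quality transitions can be illustrated with the sketch below. It assumes that the client steps up by one level at a time, that consecutive up-steps are separated by exactly the known minimum interval, and that the last up-step coincides with the next observed request. In the example of figure 4 (VQ1 to VQ4 in 30 seconds, 10 seconds between up-steps) these assumptions leave only one possible schedule. The function name and its parameters are hypothetical.

```python
from typing import List, Tuple


def deduce_step_ups(t_last: float, q_last: int, t_next: float, q_next: int,
                    min_interval: float = 10.0) -> List[Tuple[float, int]]:
    """Return (time, quality level) pairs for each assumed single-level up-step."""
    steps = q_next - q_last
    if steps <= 0:
        return []  # this sketch only covers upward transitions
    if steps * min_interval > (t_next - t_last):
        raise ValueError("gap too short for the known client step-up interval")
    # Work backwards from the observed request at t_next.
    return [(t_next - (steps - k) * min_interval, q_last + k)
            for k in range(1, steps + 1)]


print(deduce_step_ups(0.0, 1, 30.0, 4))
```

    When the gap is longer than the number of steps multiplied by the interval, the schedule is no longer unique, and the result of this sketch is only one of several consistent reconstructions.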

    [0080] In an embodiment, part of the reconstruction of the session at the client comprises tracking of the buffer-filling. Depending on the buffer filling, a fluctuation in available bandwidth may or may not result in a picture-freeze. For this reason, the buffer-filling level (in seconds) at any point in the session is to be determined as part of the session reconstruction process.

    [0081] In an embodiment, several factors/events are taken into account while tracking the buffer-filling:
    • arrival of video data: can be detected via the HTTP/TCP messages
    • the regular playout of video data: is equal to the elapsed time
    • the actual buffer filling: when it reaches zero, the playout is suspended
    • user interactivity, such as

      o repositioning of the video cursor: brings the buffer to zero until new data arrives

      o pause/resume actions: suspend and resume the playout
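    The factors listed above can be combined into a simple buffer model, sketched below. The class and method names are hypothetical, all times are in seconds, and the pause, resume and reposition events are assumed to have been inferred by the reconstruction, since the node does not observe them directly.

```python
class BufferTracker:
    """Tracks the estimated client buffer filling (seconds of media)."""

    def __init__(self) -> None:
        self.level = 0.0      # estimated buffer filling
        self.stalled = 0.0    # accumulated time with an empty buffer
        self.paused = False
        self._now = 0.0

    def advance(self, t: float) -> None:
        """Account for regular playout between the last event and time t."""
        elapsed = t - self._now
        self._now = t
        if self.paused:
            return  # no media is consumed while paused
        if elapsed > self.level:
            # Buffer emptied: playout is suspended for the remainder.
            self.stalled += elapsed - self.level
            self.level = 0.0
        else:
            self.level -= elapsed

    def on_fragment(self, t: float, duration: float) -> None:
        self.advance(t)
        self.level += duration  # arrival of video data

    def on_seek(self, t: float) -> None:
        self.advance(t)
        self.level = 0.0  # repositioning empties the buffer

    def on_pause(self, t: float) -> None:
        self.advance(t)
        self.paused = True

    def on_resume(self, t: float) -> None:
        self.advance(t)
        self.paused = False


tracker = BufferTracker()
tracker.on_fragment(0.0, 2.0)
tracker.on_fragment(1.0, 2.0)
tracker.advance(5.0)
print(tracker.level, tracker.stalled)
```

    Note that in this sketch the time before the first fragment arrives is also counted as stalled time, which corresponds to the initial loading delay.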



    [0082] In an embodiment, part of the reconstruction of the session at the client comprises taking into account the influence of user interactivity with the client 301. In case user interactivity is involved, the buffer-filling cannot be predicted deterministically. The reason is that the HAS client 301 does not provide information towards the server 300 about ongoing user actions. These user actions disturb the deterministic prediction of the buffer filling.

    [0083] One example is the case where the viewer uses the pause-button while the client algorithm is fast-filling the buffer (e.g. during initial loading or after a user repositioning). The effect of this user action is not immediately visible in the GET pattern of the client, since the client will use this pause to further fast-fill the buffer to a safe buffer-value. As a result, the pause potentially remains unnoticed by the HAS session-reconstruction, resulting in an underestimation of the buffer filling. The client heuristic gives the impression of not reaching its full buffer capacity.

    [0084] Figure 5 shows an example. Generally during the initial loading, segments are loaded back to back to fill the client buffer. After this initial loading period, the client will switch to a steady-state behavior where additional waiting time between segments is inserted to avoid buffer overflow.

    [0085] In this example, during the initial loading period at t = x1 the user pauses the player. During a pause the player will continue to request further fragments and receiving will continue. The actual buffer filling at the client is shown by graph 400. At x4 the buffer filling increases as a result of receiving a subsequent fragment. The increase ends at x5. The actual filling of the buffer remains at that level as no data is displayed from the local buffer as a result of the pause.

    [0086] The actual buffer filling 400 increases in subsequent steps up to a maximum level indicated with a dotted line. Typical maximum levels are between 20 seconds and 40 seconds for available players. Only at x2 will the buffer level decrease as a result of displaying fragments from the buffer. Thereafter, further requests for fragments and the reception of the requested target fragments result in a buffer filling level at around the buffer maximum.

    [0087] At the capture node the requests and replies are captured. In an embodiment, the reconstruction method assumes continuous play. As time progresses, reconstruction graph 401 is constructed. The 'height' difference between the two graphs 400, 401 is exactly equal to the pause length at that moment in time.

    [0088] As a result of the pause, the probe underestimates the actual buffer filling.

    [0089] In the reconstruction of the session it seems that the client's buffer has a maximum buffer level indicated by 404. If the maximum level for the buffer filling of the client's player is known in the probe, e.g. because of previous sessions handled by this probe or when the probe receives this information from a central collection and information point in the network, the reconstruction method can correct the underestimation. The time difference 406 can be estimated. Time difference 406 corresponds with the duration of the actual pause (between x1 and x2). In the reconstruction method it is therefore possible to reconstruct a pause without actually receiving such information.
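    The correction described above amounts to a subtraction, which is sketched below. The sketch assumes that the reconstructed buffer curve has reached its steady-state plateau, that the client's true maximum buffer level is known from earlier sessions, and that differences below a small tolerance are treated as noise. The function name, the parameter names and the default tolerance are assumptions made for illustration.

```python
from typing import Sequence


def estimate_pause_s(known_max_buffer_s: float,
                     reconstructed_levels_s: Sequence[float],
                     tolerance_s: float = 1.0) -> float:
    """Estimate the duration of an unobserved pause as the gap between the
    known maximum buffer level of the client and the plateau reached by
    the reconstructed buffer curve."""
    plateau = max(reconstructed_levels_s)
    gap = known_max_buffer_s - plateau
    return gap if gap > tolerance_s else 0.0


# Known client maximum of 30 s, reconstructed curve levels off at 22 s.
print(estimate_pause_s(30.0, [5.0, 12.0, 22.0, 21.5, 22.0]))
```

    Adding the estimated pause to the reconstructed levels recorded after the pause then yields a corrected estimate of the actual buffer filling.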

    [0090] Another example is the case where the user is combining a pause with a reposition to a point that is further away in the playout. In this case, the session-reconstruction could reasonably assume that the missing GETs and OKs are caused by intermediate caching. Also in this case, the estimated buffer filling level could differ from the actual buffer filling level in the client.

    [0091] In this case, information about the used client heuristic (e.g. the maximum buffer filling applied by the client-(version)) will be used to correct the buffer filling. By detecting when the buffer is in steady-state, the buffer-filling can be adjusted accurately. By monitoring a complete session, this reconstruction is possible even without receiving in the node 303 input regarding certain user actions at the client 301.

    [0092] In an embodiment, based on the reconstruction of the HAS session in terms of requested fragment quality and actual buffer filling, it is possible to calculate the QoE parameters of the session, including but not limited to:
    • Picture freezes: number of picture freezes, total time of picture freezes, initial interactivity delay
    • Playout quality: average playout quality, highest playout quality, lowest playout quality
    • Quality variations: number of quality transitions (up/down), average number of steps in a quality transition
    • Total interactivity delay, average interactivity delay per user action.
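    A subset of the QoE parameters listed above can be computed from a reconstructed session as sketched below. The input representation (one quality level per played fragment and one duration per reconstructed picture freeze) and all names are assumptions chosen for this illustration; the interactivity-delay parameters are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QoESummary:
    freeze_count: int
    freeze_total_s: float
    avg_quality: float
    min_quality: int
    max_quality: int
    transitions: int
    avg_steps_per_transition: float


def summarise(qualities: List[int], freezes_s: List[float]) -> QoESummary:
    """Summarise a reconstructed session.

    qualities: quality level of each played fragment, in playout order.
    freezes_s: duration in seconds of each reconstructed picture freeze.
    """
    # One entry per quality transition: the number of levels jumped.
    steps = [abs(b - a) for a, b in zip(qualities, qualities[1:]) if a != b]
    return QoESummary(
        freeze_count=len(freezes_s),
        freeze_total_s=sum(freezes_s),
        avg_quality=sum(qualities) / len(qualities),
        min_quality=min(qualities),
        max_quality=max(qualities),
        transitions=len(steps),
        avg_steps_per_transition=sum(steps) / len(steps) if steps else 0.0,
    )


print(summarise([1, 1, 2, 4, 4, 3], [0.5, 1.5]))
```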


    [0093] In addition, information could be provided on the average and highest buffer filling (as a measure for a smooth delivery of the video fragments) and on the observed behaviour of the user (number of pauses, total pause-time, user repositions, zapping and skipping, fragment popularity, ...)

    [0094] Using these QoE parameters, it now becomes possible for ISPs, content-providers and CDN providers to permanently measure, monitor and track the global performance of the network when delivering HAS content. Further, ISPs and CDN providers can perform HAS troubleshooting in the network. Reconstruction also allows content providers and CDN-providers to define measurable objectives for HAS deliveries that can be the basis of Service Level Agreements (SLAs). Further, minimum values for the HAS service delivery quality can be defined.

    [0095] As a result of the reconstruction of QoE at the client and providing the QoE data, caching issues relating to a HAS session can be optimized.

    [0096] Fragments at the start of the content or at certain popular jump-points in the content are always requested with a very low buffer-filling (due to the initial loading or due to cursor repositioning by user). Based on the average buffer-filling when they are requested, the network might decide to store these fragments in caches that are located closer to the end-user. Another possibility is to take the average buffer filling per fragment into account in the cache-eviction strategy of the node. Fragments that are always requested with a low buffer filling could be kept longer in the cache. In both cases, the initial waiting time and the interactivity delay can be significantly reduced.
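    The second possibility, taking the average buffer filling per fragment into account in the cache-eviction strategy, could look like the sketch below. It is a deliberately simplified policy: the class name, the capacity counted in fragments (assumed to be at least one) and the rule of evicting the fragment with the highest average buffer filling at request time are assumptions for illustration, not a description of a particular cache implementation.

```python
from typing import Dict


class BufferAwareCache:
    """Keeps fragments that are typically requested with a low buffer filling."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity          # number of fragments, at least 1
        self.store: Dict[str, bytes] = {}
        self._sum: Dict[str, float] = {}  # summed buffer filling per fragment
        self._n: Dict[str, int] = {}      # number of recorded requests

    def record_request(self, key: str, buffer_s: float) -> None:
        """Record the reconstructed buffer filling seen when key was requested."""
        self._sum[key] = self._sum.get(key, 0.0) + buffer_s
        self._n[key] = self._n.get(key, 0) + 1

    def avg_buffer(self, key: str) -> float:
        n = self._n.get(key, 0)
        # Fragments without history are treated as least urgent.
        return self._sum[key] / n if n else float("inf")

    def put(self, key: str, data: bytes) -> None:
        if key not in self.store and len(self.store) >= self.capacity:
            # Evict the fragment that is usually requested with the fullest buffer.
            victim = max(self.store, key=self.avg_buffer)
            del self.store[victim]
        self.store[key] = data


cache = BufferAwareCache(capacity=2)
cache.record_request("start", 0.5)
cache.record_request("middle", 20.0)
cache.record_request("jump-point", 1.0)
for name in ("start", "middle", "jump-point"):
    cache.put(name, b"")
print(sorted(cache.store))
```

    A practical policy would likely combine this score with recency or popularity; the sketch isolates the buffer-filling criterion only.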

    [0097] Note that reconstruction in a trusted network node can be considered safer compared to running a (modified) client that provides on a per request basis its actual buffer filling. Such a client could rather easily be manipulated or hacked to maliciously report very low buffer values to obtain an unfair amount of resources from the network.

    [0098] The HAS-SR function running in the nodes involved in the delivery of the HAS flow could add a number of additional parameters (such as the current buffer filling, user actions such as pause/resume, etc.) to the original request to inform the intermediate nodes.
    The functions of the various elements shown in the Figures, including any functional blocks labeled as "agents" or "processors", may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non volatile storage. Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the FIGS. are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.


    Claims

    1. A method for reconstructing a session of streaming video content from a server to a client over a channel via a network, said server offering said video content as a set of consecutive fragments, each fragment of said set of consecutive fragments being offered in a plurality of quality levels corresponding to respective encoded data rates, said method comprising a session of at least:

    - transmitting via the network from said client to said server one or more requests for fragments with target quality levels of said video content to be displayed at said client; and

    - receiving via the network one or more replies to the requests containing fragments with target quality levels at said client,

    wherein one or more fragments are displayed at said client, wherein the method further comprises capturing requests and/or replies to the requests of the session at a node in the network and reconstructing at least part of the session as experienced, QoE, at the client.
     
    2. Method according to claim 1, wherein reconstructing at least part of the session as experienced at the client comprises reconstructing buffer filling and/or picture freezes at the client.
     
    3. Method according to claim 1 or 2, wherein reconstructing at least part of the session as experienced at the client comprises reconstructing client-selected quality levels and quality variation.
     
    4. Method according to any of the preceding claims, wherein reconstructing at least part of the session as experienced at the client comprises reconstructing user interaction at the client.
     
    5. Method according to any of the preceding claims, wherein said transmitting of said requests comprises sending an HTTP request datagram.
     
    6. Method according to claim 5, wherein said request is a HTTP-GET comprising at least a capture time and a fragment quality level.
     
    7. Method according to claim 5 or 6, wherein said target quality level is indicated by at least one uniform resource identifier.
     
    8. Method according to any of the preceding claims, wherein reconstructing at least part of the session as experienced at the client comprises extrapolating at least one parameter of the reconstructed session.
     
    9. Method according to any of the preceding claims, further comprising transmitting over the network the reconstructed session.
     
    10. A node in a network for streaming video content as a set of consecutive fragments from a server to a client via the node in a session, wherein each fragment of said set of consecutive fragments is being offered in a plurality of quality levels, the session comprising requests for fragments with target quality levels sent from the client to the server and replies upon the requests containing one or more fragments with target quality level, said node comprising

    - at least a capturing device for capturing at least a part of said session;

    - at least a reconstruction device for reconstructing part of said session as experienced, QoE, at the client.


     
    11. Node according to claim 10, wherein the reconstruction device is arranged to reconstruct user interactions at the client.
     
    12. Node according to claim 10 or 11, wherein the reconstruction device is arranged to reconstruct buffer filling and picture freezes at the client during the session.
     
    13. Node according to any of the claims 10 - 12, wherein the node further comprises a transmitting device for transmitting the reconstructed session over the network.
     
    14. Network for streaming video content as a set of consecutive fragments from a server in a session upon request from a client, wherein each fragment of said set of consecutive fragments is being offered in a plurality of quality levels corresponding to respective encoded data rates, over a channel, comprising the node according to any of the claims 10 - 13.
     


    Ansprüche

    1. Verfahren zur Wiederherstellung einer Streaming-Sitzung von Videoinhalt von einem Server an einen Client über einen Kanal via Netzwerk, wobei der besagte Server den besagten Videoinhalt als eine Gruppe von aufeinanderfolgenden Fragmenten anbietet, wobei jedes Fragment aus der besagten Gruppe von aufeinanderfolgenden Fragmenten in einer Vielzahl von Qualitätsstufen entsprechend jeweiligen codierten Datenraten angeboten wird, wobei das besagte Verfahren eine Sitzung aus mindestens Folgendem umfasst:

    - Übertragen, via Netzwerk, von dem besagten Client an den besagten Server, von einer oder mehreren Anfragen für Fragmente mit Zielqualitätsstufen des besagten Videoinhalts, der an dem besagten Client angezeigt werden soll; und

    - Empfangen, via Netzwerk, von einer oder mehreren Antworten auf die Anfragen, die Fragmente mit Zielqualitätsstufen enthalten, an dem besagten Client,

    wobei eines oder mehrere Fragmente an dem besagten Client angezeigt werden,
    wobei das Verfahren weiterhin das Erfassen von Anfragen und/oder Antworten auf die Anfragen der Sitzung an einem Knoten in dem Netzwerk und das Wiederherstellen von mindestens einem Teil der Sitzung als erfahren, QoE, an dem Client umfasst.
     
    2. Verfahren nach Anspruch 1, wobei das Wiederherstellen von mindestens einem Teil der Sitzung als erfahren an dem Client das Wiederherstellen von Pufferfüllung und/oder Standbildern an dem Client umfasst.
     
    3. Verfahren nach Anspruch 1 oder 2, wobei das Wiederherstellen von mindestens einem Teil der Sitzung als erfahren an dem Client das Wiederherstellen von Clientgewählten Qualitätsstufen und Qualitätsschwankungen umfasst.
     
    4. Verfahren nach einem beliebigen der vorstehenden Ansprüche, wobei das Wiederherstellen von mindestens einem Teil der Sitzung als erfahren an dem Client das Wiederherstellen von Benutzerinteraktion an dem Client umfasst.
     
    5. Verfahren nach einem beliebigen der vorstehenden Ansprüche, wobei das besagte Übertragen der besagten Anfragen das Senden eines HTTP-Anfrage-Datengramms umfasst.
     
    6. Verfahren nach Anspruch 5, wobei die besagte Anfrage ein HTTP-GET ist, umfassend mindestens eine Erfassungszeit und eine Fragment-Qualitätsstufe.
     
    7. Verfahren nach Anspruch 5 oder 6, wobei die besagte Zielqualitätsstufe von mindestens einem einheitlichen Ressourcen-Identifikator angegeben wird.
     
    8. Verfahren nach einem beliebigen der vorstehenden Ansprüche, wobei das Wiederherstellen von mindestens einem Teil der Sitzung als erfahren an dem Client das Extrapolieren von mindestens einem Parameter der wiederhergestellten Sitzung umfasst.
     
    9. Verfahren nach einem beliebigen der vorstehenden Ansprüche, weiterhin umfassend das Übertragen der wiederhergestellten Sitzung über das Netzwerk.
     
    10. Knoten in einem Netzwerk zum Streamen von Videoinhalt als eine Gruppe von aufeinanderfolgenden Fragmenten von einem Server an einen Client über den Knoten in einer Sitzung, wobei jedes Fragment aus der besagten Gruppe von aufeinanderfolgenden Fragmenten in einer Vielzahl von Qualitätsstufen angeboten wird, wobei die Sitzung Anfragen für Fragmente mit Zielqualitätsstufen umfasst, die von dem Client an den Server gesendet werden, und Antworten auf die Anfragen, die eines oder mehrere Fragmente mit einer Zielqualitätsstufe enthalten, wobei der besagte Knoten Folgendes umfasst:

    - mindestens eine Erfassungsvorrichtung zum Erfassen von mindestens einem Teil der besagten Sitzung;

    - mindestens eine Wiederherstellungsvorrichtung zum Wiederherstellen eines Teils der besagten Sitzung als erfahren, QoE, an dem Client.


     
    11. Knoten nach Anspruch 10, wobei die Wiederherstellungsvorrichtung angeordnet ist, um Benutzerinteraktionen an dem Client wiederherzustellen.
     
    12. Knoten nach Anspruch 10 oder 11, wobei die Wiederherstellungsvorrichtung angeordnet ist, um Pufferfüllung und Standbilder an dem Client während der Sitzung wiederherzustellen.
     
    13. Knoten nach einem beliebigen der Ansprüche 10-12, wobei der Knoten weiterhin eine Übertragungsvorrichtung zum Übertragen der wiederhergestellten Sitzung über das Netzwerk umfasst.
     
    14. Netzwerk zum Streamen von Videoinhalt als eine Gruppe von aufeinanderfolgenden Fragmenten von einem Server in einer Sitzung nach Anfrage von einem Client, wobei jedes Fragment aus der besagten Gruppe von aufeinanderfolgenden Fragmenten in einer Vielzahl von Qualitätsstufen entsprechend jeweiligen codierten Datenraten, über einen Kanal, angeboten wird, umfassend den Knoten nach einem beliebigen der Ansprüche 10-13.
     


    Revendications

    1. Procédé de reconstruction d'une session de diffusion en continu d'un contenu vidéo entre un serveur et un client sur un canal par l'intermédiaire d'un réseau, ledit serveur proposant ledit contenu vidéo sous la forme d'un ensemble de fragments consécutifs, chaque fragment dudit ensemble de fragments consécutifs étant proposé dans une pluralité de niveaux de qualité correspondant à des débits de données codées respectifs, ledit procédé comprenant une session comprenant au moins les étapes suivantes :

    - transmettre par l'intermédiaire du réseau entre ledit client et ledit serveur une ou plusieurs demandes de fragments caractérisés par des niveaux de qualité cible dudit contenu vidéo devant être affichés chez ledit client ; et

    - recevoir par l'intermédiaire du réseau une ou plusieurs réponses aux demandes contenant des fragments caractérisés par des niveaux de qualité cible chez ledit client,

    dans lequel un ou plusieurs fragments sont affichés chez ledit client, dans lequel le procédé comprend en outre la capture de demandes et/ou de réponses aux demandes de la session au niveau d'un noeud dans le réseau et la reconstruction d'au moins une partie de la session dont le client fait l'expérience, QoE.
     
    2. Procédé selon la revendication 1, dans lequel la reconstruction d'au moins une partie de la session dont le client fait l'expérience comprend la reconstruction d'un remplissage de la mémoire tampon et/ou d'arrêts sur image chez le client.
     
    3. Procédé selon la revendication 1 ou 2, dans lequel la reconstruction d'au moins une partie de la session dont le client fait l'expérience comprend la reconstruction de niveaux de qualité choisis par le client et d'une variation de qualité.
     
    4. Procédé selon l'une quelconque des revendications précédentes, dans lequel la reconstruction d'au moins une partie de la session dont le client fait l'expérience comprend la reconstruction d'une interaction d'utilisateur chez le client.
     
    5. Procédé selon l'une quelconque des revendications précédentes, dans lequel ladite transmission desdites demandes comprend l'envoi d'un datagramme d'appel au protocole HTTP.
     
    6. Procédé selon la revendication 5, dans lequel ladite demande est un HTTP-GET comprenant au moins un moment de capture et un niveau de qualité des fragments.
     
    7. Procédé selon la revendication 5 ou 6, dans lequel ledit niveau de qualité cible est indiqué par au moins un identificateur de ressources uniformes.
     
    8. Procédé selon l'une quelconque des revendications précédentes, dans lequel la reconstruction d'au moins une partie de la session dont le client fait l'expérience comprend l'extrapolation d'au moins un paramètre de la session reconstruite.
     
    9. Procédé selon l'une quelconque des revendications précédentes, comprenant en outre la transmission sur le réseau de la session reconstruite.
     
    10. Noeud dans un réseau pour diffuser en continu un contenu vidéo sous la forme d'un ensemble de fragments consécutifs entre un serveur et un client par l'intermédiaire du noeud dans une session, dans lequel chaque fragment dudit ensemble de fragments consécutifs est proposé dans une pluralité de niveaux de qualité, la session comprenant des demandes de fragments caractérisés par des niveaux de qualité cible envoyées entre le client et le serveur et des réponses aux demandes contenant un ou plusieurs fragment(s) caractérisé(s) par un niveau de qualité cible, ledit noeud comprenant

    - au moins un dispositif de capture pour capturer au moins une partie de ladite session ;

    - au moins un dispositif de reconstruction pour reconstituer une partie de ladite session dont le client fait l'expérience, QoE.


     
    11. Noeud selon la revendication 10, dans lequel le dispositif de reconstruction est conçu pour reconstituer des interactions d'utilisateur chez le client.
     
    12. Noeud selon la revendication 10 ou 11, dans lequel le dispositif de reconstruction est conçu pour reconstituer un remplissage de la mémoire tampon et/ou des arrêts sur image chez le client durant la session.
     
    13. Noeud selon l'une quelconque des revendications 10 à 12, le noeud comprenant en outre un dispositif de transmission pour transmettre la session reconstruite sur le réseau.
     
    14. Réseau de diffusion en continu d'un contenu vidéo sous la forme d'un ensemble de fragments consécutifs à partir d'un serveur dans une session sur demande d'un client, dans lequel chaque fragment dudit ensemble de fragments consécutifs est proposé dans une pluralité de niveaux de qualité correspondant à des débits de données codées respectifs, sur un canal, comprenant le noeud selon l'une quelconque des revendications 10 à 13.
     




    Cited references

    REFERENCES CITED IN THE DESCRIPTION



    This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

    Patent documents cited in the description