(11)EP 3 340 535 B1

(12)EUROPEAN PATENT SPECIFICATION

(45)Mention of the grant of the patent:
29.07.2020 Bulletin 2020/31

(21)Application number: 16848012.7

(22)Date of filing:  07.09.2016
(51)International Patent Classification (IPC): 
H04L 12/24(2006.01)
H04L 12/26(2006.01)
(86)International application number:
PCT/CN2016/098344
(87)International publication number:
WO 2017/050130 (30.03.2017 Gazette  2017/13)

(54)

FAILURE RECOVERY METHOD AND DEVICE

VERFAHREN UND VORRICHTUNG ZUR FEHLERBEHEBUNG

PROCÉDÉ ET DISPOSITIF DE REPRISE SUR INCIDENT


(84)Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

(30)Priority: 22.09.2015 CN 201510608782

(43)Date of publication of application:
27.06.2018 Bulletin 2018/26

(73)Proprietor: Huawei Technologies Co., Ltd.
Longgang District Shenzhen, Guangdong 518129 (CN)

(72)Inventors:
  • ZHANG, Wenge
    Shenzhen Guangdong 518129 (CN)
  • XU, Ridong
    Shenzhen Guangdong 518129 (CN)
  • CHEN, Yong
    Shenzhen Guangdong 518129 (CN)
  • LIU, Qingming
    Shenzhen Guangdong 518129 (CN)
  • CHEN, Taizhou
    Shenzhen Guangdong 518129 (CN)
  • XIONG, Fuxiang
    Shenzhen Guangdong 518129 (CN)

(74)Representative: Pfenning, Meinig & Partner mbB
Patent- und Rechtsanwälte
Theresienhöhe 11a
80339 München (DE)


(56)References cited:
WO-A1-2015/061353
WO-A1-2015/126430
CN-A- 105 187 249
WO-A1-2015/109443
CN-A- 104 468 688
US-A1- 2011 122 761
  
      
    Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).


    Description

    TECHNICAL FIELD



    [0001] This application relates to the network data processing field, and in particular, to a troubleshooting method and apparatus.

    BACKGROUND



    [0002] In a communications system, when a fault occurs on a device, a method is needed to troubleshoot the fault, so that the fault does not remain unresolved for a long time and severely affect performance of the communications system.

    [0003] A troubleshooting method may be manually performed. However, manually detecting a fault and then troubleshooting the fault usually lead to relatively high time and labor costs. Therefore, the industry gradually expects a device in a communications system to automatically troubleshoot a fault in the communications system, so as to improve troubleshooting efficiency and reduce labor costs.

    [0004] In a troubleshooting method in the prior art, whether a device becomes faulty is mainly determined according to a heartbeat message of the device. Specifically, a monitoring device may periodically send a heartbeat message to a monitored device, and after receiving the heartbeat message, the monitored device may return a response message to the monitoring device. If the monitoring device has not received, within a specified time after sending the heartbeat message, the response message returned by the monitored device, it is determined that the monitored device becomes faulty, and further, the entire monitored device is reset or a function carried by the monitored device is switched to another device for troubleshooting.

    [0005] However, there may be multiple reasons why the monitoring device has not received the response message within the specified time. For example, the interface unit used by the monitored device to send the response message may have become faulty. In this case, another interface unit of the monitored device may be invoked to replace the faulty interface unit, without resetting the entire monitored device or switching its functions to another device. Resetting the entire monitored device or switching its functions carries relatively high risk and affects a relatively large quantity of services.

    [0006] In conclusion, in the troubleshooting method in the prior art, a fault is analyzed and troubleshot according to a heartbeat message of a device, which results in relatively low precision in fault locating.

    [0007] US2011/0122761A1 discloses an apparatus comprising a network node that receives telecommunications network measurements where the network node calculates key performance indicator (KPI) measurements from the network measurements, and the network node performs system recovery actions based on the calculated KPI measurements.

    [0008] WO2015/126430A1 discloses a method of managing virtual network functions for a network, the method including providing a virtual network function (VNF) including a number of virtual network function components (VNFCs) of a number of different types, each VNFC comprising a virtual machine (VM) executing application software. The method further includes creating for up to all VNFC types a number of deactivated VMs having application software, monitoring at least one performance level of the VNF, and scaling-out the VNF by activating a number of deactivated VMs of a number of VNFC types when the at least one performance level reaches a scale-out threshold.

    SUMMARY



    [0009] The objects are solved by the features of the independent claims. The objective of this application is to provide a troubleshooting method and apparatus, to locate a fault by using key performance indicator information, to resolve a problem of relatively low precision in fault locating according to a heartbeat message of a device.

    [0010] To implement the foregoing objective, this application provides the following solutions:

    [0011] According to a first possible implementation of a first aspect of this application, this application provides a troubleshooting method, including:

    obtaining key performance indicator information of each service processing unit in a monitored network element;

    determining a faulty object according to the key performance indicator information;

    determining a troubleshooting policy according to the faulty object; and

    sending the troubleshooting policy to a management unit in a network function virtualization system, so that the management unit uses the troubleshooting policy to perform troubleshooting.
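    As an illustration only, the four steps above can be sketched in Python as follows. The names `Kpi`, `determine_faulty_object`, `determine_policy`, and `troubleshoot`, the 0.9 reference value, and the dictionary-shaped placeholder policy are assumptions made for this sketch and are not defined by this application. The management unit is represented by a caller-supplied callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Kpi:
    """Hypothetical KPI record for one service processing unit."""
    requests: int  # quantity of service requests received
    failures: int  # quantity of those requests that failed


def determine_faulty_object(kpis: Dict[str, Kpi], reference: float = 0.9) -> Optional[str]:
    """Return the first unit whose success rate is below the assumed reference value."""
    for name, kpi in kpis.items():
        if kpi.requests == 0:
            continue  # no traffic, so no success rate can be computed
        success_rate = (kpi.requests - kpi.failures) / kpi.requests
        if success_rate < reference:
            return name
    return None


def determine_policy(faulty_object: str) -> Dict[str, str]:
    """Build a placeholder policy; the actual policy content is an assumption."""
    return {"target": faulty_object, "action": "recover"}


def troubleshoot(
    obtain_kpis: Callable[[], Dict[str, Kpi]],
    send_to_management_unit: Callable[[Dict[str, str]], None],
) -> Optional[Dict[str, str]]:
    kpis = obtain_kpis()                    # obtain KPI information
    faulty = determine_faulty_object(kpis)  # determine the faulty object
    if faulty is None:
        return None                         # nothing to troubleshoot
    policy = determine_policy(faulty)       # determine the troubleshooting policy
    send_to_management_unit(policy)         # send it to the management unit
    return policy
```

    In this sketch nothing is sent when no faulty object is found, which is one possible design choice rather than a requirement stated above.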



    [0012] With reference to a second possible implementation of the first aspect, the determining a faulty object specifically includes:

    determining that the faulty object is a service processing unit in the monitored network element; or

    determining that the faulty object is a communication path between the service processing units; and

    the determining a troubleshooting policy according to the faulty object specifically includes:
    when the faulty object is the service processing unit in the monitored network element or the communication path between the service processing units, determining a network-element-level troubleshooting policy, where the network-element-level troubleshooting policy is used to perform a troubleshooting operation inside the monitored network element.



    [0013] With reference to a third possible implementation of the first aspect, the determining a faulty object specifically includes:

    determining that the faulty object is the monitored network element; or

    determining that the faulty object is a communication path between the monitored network element and another network element; and

    the determining a troubleshooting policy according to the faulty object specifically includes:
    when the faulty object is the monitored network element or the communication path between the monitored network element and the other network element, determining a network-level troubleshooting policy, where the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located.



    [0014] With reference to a first specific implementation of the second possible implementation of the first aspect, the determining that the faulty object is a service processing unit in the monitored network element specifically includes:

    calculating a service success rate of a service performed by a service processing unit, according to a quantity of service requests received by the service processing unit and a corresponding quantity of service failures, both of which are in the key performance indicator information;

    comparing the service success rate with a first reference value; and

    determining that a service processing unit whose service success rate is lower than the first reference value is the faulty object.
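    A minimal sketch of the calculation and comparison above is given below. It assumes that the service success rate is (requests - failures) / requests, which this paragraph does not state explicitly. The function names and the tuple layout `(requests, failures)` are hypothetical.

```python
from typing import Dict, List, Tuple


def service_success_rate(requests: int, failures: int) -> float:
    """Assumed formula: share of received service requests that did not fail."""
    if requests <= 0:
        raise ValueError("at least one service request is needed")
    if not 0 <= failures <= requests:
        raise ValueError("failures must be between 0 and requests")
    return (requests - failures) / requests


def faulty_units(kpis: Dict[str, Tuple[int, int]], first_reference: float) -> List[str]:
    """Return units whose success rate is lower than the first reference value.

    kpis maps a unit name to (quantity of requests, quantity of failures).
    Units that received no requests are skipped because no rate can be computed.
    """
    return [
        unit
        for unit, (requests, failures) in kpis.items()
        if requests > 0 and service_success_rate(requests, failures) < first_reference
    ]
```

    For example, with a first reference value of 0.9, a unit with 200 requests and 120 failures (rate 0.4) is reported, whereas a unit with 200 requests and 10 failures (rate 0.95) is not.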



    [0015] With reference to a first more specific implementation of the first specific implementation of the second possible implementation of the first aspect, the comparing the service success rate with a first reference value specifically includes:

    comparing the service success rate with a preset reference value; or

    determining an average service success rate of a homogenized service processing unit;

    subtracting a preset value from the average service success rate to obtain a homogenized reference value; and

    comparing the service success rate with the homogenized reference value; where

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.
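    The homogenized reference value can be illustrated as follows. The sketch assumes that the average is taken over all homogenized service processing units supplied by the caller, including the unit under test; the paragraph above leaves this open. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def homogenized_reference_value(rates: Dict[str, float], preset_value: float) -> float:
    """Average success rate of the homogenized units minus a preset value."""
    return mean(list(rates.values())) - preset_value


def units_below_homogenized_reference(rates: Dict[str, float], preset_value: float) -> List[str]:
    """Return the units whose success rate is lower than the homogenized reference value."""
    reference = homogenized_reference_value(rates, preset_value)
    return [unit for unit, rate in rates.items() if rate < reference]
```

    With success rates of 0.99, 0.98, 0.97, and 0.50 and a preset value of 0.1, the average is 0.86 and the homogenized reference value is 0.76, so only the unit at 0.50 falls below it.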



    [0016] With reference to a second more specific implementation of the first specific implementation of the second possible implementation of the first aspect, before the determining that a service processing unit whose service success rate is lower than the first reference value is the faulty object, the method further includes:

    determining a first unit set, whose service success rate is greater than the first reference value, in homogenized service processing units;

    determining a second unit set, whose service success rate is less than the first reference value, in the homogenized service processing units; and

    determining that a percentage of units included in the first unit set in all the homogenized service processing units is greater than a first preset percentage; where

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.
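    The precondition above can be sketched as follows. Units in the second unit set are reported as faulty only when the share of units in the first unit set exceeds the first preset percentage. The function names, and the choice to express the preset percentage as a fraction between 0 and 1, are assumptions of this sketch.

```python
from typing import Dict, Set, Tuple


def split_unit_sets(rates: Dict[str, float], first_reference: float) -> Tuple[Set[str], Set[str]]:
    """Return (first unit set, second unit set) among the homogenized units."""
    first = {unit for unit, rate in rates.items() if rate > first_reference}
    second = {unit for unit, rate in rates.items() if rate < first_reference}
    return first, second


def confirmed_faulty_units(
    rates: Dict[str, float],
    first_reference: float,
    first_preset_percentage: float,
) -> Set[str]:
    """Return the second unit set only if enough homogenized units are above the reference."""
    if not rates:
        return set()
    first, second = split_unit_sets(rates, first_reference)
    if len(first) / len(rates) > first_preset_percentage:
        return second
    return set()  # too few units above the reference: do not report unit-level faults
```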



    [0017] With reference to a second specific implementation of the second possible implementation of the first aspect, the determining that the faulty object is a communication path between the service processing units specifically includes:

    calculating a service success rate of a communication path according to a quantity of service failures that are caused by a communication path fault, the quantity being in the key performance indicator information;

    comparing the service success rate with a third reference value; and

    determining that a communication path whose service success rate is lower than the third reference value is the faulty object.
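    A possible sketch of the path-level check follows. It assumes that the quantity of service requests carried over each path is also available, and that the path success rate is 1 - (path-fault failures / requests over the path); neither assumption is stated in the paragraph above. The dictionary keys `requests` and `path_fault_failures` are hypothetical.

```python
from typing import Dict, List


def path_success_rate(requests_over_path: int, path_fault_failures: int) -> float:
    """Assumed formula: share of requests on the path not lost to a path fault."""
    if requests_over_path <= 0:
        raise ValueError("at least one request over the path is needed")
    return 1.0 - path_fault_failures / requests_over_path


def faulty_paths(path_kpis: Dict[str, Dict[str, int]], third_reference: float) -> List[str]:
    """Return paths whose success rate is lower than the third reference value."""
    return [
        path
        for path, kpi in path_kpis.items()
        if kpi["requests"] > 0
        and path_success_rate(kpi["requests"], kpi["path_fault_failures"]) < third_reference
    ]
```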



    [0018] With reference to a first specific implementation of the third possible implementation of the first aspect, the determining that the faulty object is the monitored network element specifically includes:

    collecting statistics about a service success rate of each service processing unit according to a quantity of service requests received by that service processing unit and a corresponding quantity of service failures, both of which are in the key performance indicator information of that service processing unit;

    comparing the service success rate with a second reference value;

    determining a quantity of service processing units whose service success rates are lower than the second reference value;

    determining, according to the quantity, a percentage of the service processing units, whose service success rates are lower than the second reference value, in all service processing units in the monitored network element; and

    when the percentage is greater than a second preset percentage, determining that the monitored network element is the faulty object.
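    The network-element-level decision above can be sketched as follows. The function names are hypothetical, and the second preset percentage is expressed as a fraction between 0 and 1 for this sketch.

```python
from typing import Sequence


def low_rate_percentage(unit_rates: Sequence[float], second_reference: float) -> float:
    """Share of service processing units whose success rate is below the second reference value."""
    if not unit_rates:
        return 0.0
    low_count = sum(1 for rate in unit_rates if rate < second_reference)
    return low_count / len(unit_rates)


def network_element_is_faulty(
    unit_rates: Sequence[float],
    second_reference: float,
    second_preset_percentage: float,
) -> bool:
    """The whole monitored network element is the faulty object when too many units are degraded."""
    return low_rate_percentage(unit_rates, second_reference) > second_preset_percentage
```

    For example, if three of four units are below the second reference value and the second preset percentage is 0.5, the monitored network element as a whole is treated as the faulty object.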



    [0019] With reference to a first more specific implementation of the first specific implementation of the third possible implementation of the first aspect, the comparing the service success rate with a second reference value specifically includes:

    comparing the service success rate with a preset reference value; or

    determining an average service success rate of a homogenized network element;

    subtracting a preset value from the average service success rate to obtain a homogenized reference value; and

    comparing the service success rate with the homogenized reference value; where

    the homogenized network element is a monitored network element that carries a service with the same service logic as that of the monitored network element, and to which the service is discretely allocated.



    [0020] With reference to a third specific implementation of the second possible implementation of the first aspect, after the determining that the faulty object is a service processing unit in the monitored network element or after the determining that the faulty object is a communication path between the service processing units, the sending the troubleshooting policy to a management unit in a network function virtualization system specifically includes:
    sending the troubleshooting policy to a system management module in the monitored network element in the network function virtualization system.

    [0021] With reference to a second specific implementation of the third possible implementation of the first aspect, after the determining that the faulty object is the monitored network element or after the determining that the faulty object is a communication path between the monitored network element and another network element, the sending the troubleshooting policy to a management unit in a network function virtualization system specifically includes:
    sending the troubleshooting policy to a management and orchestration (MANO) unit in the network function virtualization system.

    [0022] With reference to a fourth specific implementation of the second possible implementation of the first aspect, after the determining that the faulty object is a service processing unit in the monitored network element, the method further includes:

    determining that a quantity of faulty service processing units reaches a preset threshold; and

    determining a network-level troubleshooting policy, where the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located.
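    The relationship between the faulty object, the policy level, and the recipient of the policy can be summarized in a short sketch. The enum and function names are hypothetical. Routing an escalated network-level policy to the MANO unit is an assumption of this sketch, since the paragraphs above name the MANO unit only for the case in which the faulty object is the monitored network element or an external communication path.

```python
from enum import Enum
from typing import Optional


class FaultyObject(Enum):
    SERVICE_PROCESSING_UNIT = "service processing unit"
    INTERNAL_PATH = "communication path between service processing units"
    NETWORK_ELEMENT = "monitored network element"
    EXTERNAL_PATH = "communication path to another network element"


NETWORK_ELEMENT_LEVEL = "network-element-level"
NETWORK_LEVEL = "network-level"


def policy_level(
    faulty_object: FaultyObject,
    faulty_unit_count: int = 0,
    preset_threshold: Optional[int] = None,
) -> str:
    """Choose the policy level, escalating when enough service processing units are faulty."""
    if faulty_object in (FaultyObject.NETWORK_ELEMENT, FaultyObject.EXTERNAL_PATH):
        return NETWORK_LEVEL
    if (
        faulty_object is FaultyObject.SERVICE_PROCESSING_UNIT
        and preset_threshold is not None
        and faulty_unit_count >= preset_threshold  # "reaches" is read as >=
    ):
        return NETWORK_LEVEL
    return NETWORK_ELEMENT_LEVEL


def recipient(level: str) -> str:
    """Assumed routing: network-level policies go to MANO, others to the system management module."""
    if level == NETWORK_LEVEL:
        return "MANO unit"
    return "system management module in the monitored network element"
```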



    [0023] With reference to a third specific implementation of the third possible implementation of the first aspect, the determining a network-level troubleshooting policy specifically includes:

    obtaining status information of a redundancy network element related to the monitored network element that is determined as the faulty object;

    determining a redundancy network element in a normal operating state according to the status information; and

    generating network-level troubleshooting indication information, where the troubleshooting indication information is used to instruct the management unit to replace, with the redundancy network element in the normal operating state, the monitored network element that is determined as the faulty object; or

    the determining a network-level troubleshooting policy specifically includes: obtaining status information of a redundancy network element of a back-end network element in the communication path that is determined as the faulty object;

    determining a redundancy network element in a normal operating state according to the status information; and

    generating network-level troubleshooting indication information, where the troubleshooting indication information is used to instruct the management unit to switch the back-end network element corresponding to a front-end network element in the communication path to the redundancy network element in the normal operating state.
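    The two alternatives above can be sketched as follows. The status label `"normal"`, the function names, and the dictionary layout of the indication information are assumptions for illustration only; this application does not define a concrete format.

```python
from typing import Dict, Optional

NORMAL = "normal"  # assumed label for the normal operating state


def pick_normal_redundancy_ne(status: Dict[str, str]) -> Optional[str]:
    """Return the first redundancy network element reported in the normal operating state."""
    for network_element, state in status.items():
        if state == NORMAL:
            return network_element
    return None


def replacement_indication(faulty_ne: str, status: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Indication to replace a faulty monitored network element with a healthy redundancy one."""
    standby = pick_normal_redundancy_ne(status)
    if standby is None:
        return None  # no redundancy network element is usable
    return {"action": "replace", "faulty_ne": faulty_ne, "replacement_ne": standby}


def switch_indication(front_end_ne: str, status: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Indication to switch a front-end element's back-end to a healthy redundancy element."""
    standby = pick_normal_redundancy_ne(status)
    if standby is None:
        return None
    return {"action": "switch_back_end", "front_end_ne": front_end_ne, "new_back_end_ne": standby}
```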



    [0024] According to a first possible implementation of a second aspect of this application, this application provides a troubleshooting apparatus, including:

    an obtaining unit, configured to obtain key performance indicator information of each service processing unit in a monitored network element;

    a determining unit, configured to: determine a faulty object according to the key performance indicator information; and

    determine a troubleshooting policy according to the faulty object; and

    a sending unit, configured to send the troubleshooting policy to a management unit in a network function virtualization system, so that the management unit uses the troubleshooting policy to perform troubleshooting.



    [0025] With reference to a second possible implementation of the second aspect, the determining unit is specifically configured to:

    determine that the faulty object is a service processing unit in the monitored network element; or

    determine that the faulty object is a communication path between the service processing units; and

    when the faulty object is the service processing unit in the monitored network element or the communication path between the service processing units, determine a network-element-level troubleshooting policy, where the network-element-level troubleshooting policy is used to perform a troubleshooting operation inside the monitored network element.



    [0026] With reference to a third possible implementation of the second aspect, the determining unit is specifically configured to:

    determine that the faulty object is the monitored network element; or

    determine that the faulty object is a communication path between the monitored network element and another network element; and

    when the faulty object is the monitored network element or the communication path between the monitored network element and the other network element, determine a network-level troubleshooting policy, where the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located.



    [0027] With reference to a first specific implementation of the second possible implementation of the second aspect, the determining unit is specifically configured to:

    calculate a service success rate of a service performed by a service processing unit, according to a quantity of service requests received by the service processing unit and a corresponding quantity of service failures, both of which are in the key performance indicator information;

    compare the service success rate with a first reference value; and

    determine that a service processing unit whose service success rate is lower than the first reference value is the faulty object.



    [0028] With reference to a first more specific implementation of the first specific implementation of the second possible implementation of the second aspect, the determining unit is specifically configured to:

    compare the service success rate with a preset reference value; or

    determine an average service success rate of a homogenized service processing unit;

    subtract a preset value from the average service success rate to obtain a homogenized reference value; and

    compare the service success rate with the homogenized reference value; where

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.



    [0029] With reference to a second more specific implementation of the first specific implementation of the second possible implementation of the second aspect, the determining unit is further configured to:

    before determining that the service processing unit whose service success rate is lower than the first reference value is the faulty object, determine a first unit set, whose service success rate is greater than the first reference value, in homogenized service processing units;

    determine a second unit set, whose service success rate is less than the first reference value, in the homogenized service processing units; and

    determine that a percentage of units included in the first unit set in all the homogenized service processing units is greater than a first preset percentage; where

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.



    [0030] With reference to a second specific implementation of the second possible implementation of the second aspect, the determining unit is specifically configured to:

    calculate a service success rate of a communication path according to a quantity of service failures that are caused by a communication path fault, the quantity being in the key performance indicator information;

    compare the service success rate with a third reference value; and

    determine that a communication path whose service success rate is lower than the third reference value is the faulty object.



    [0031] With reference to a first specific implementation of the third possible implementation of the second aspect, the determining unit is specifically configured to:

    collect statistics about a service success rate of each service processing unit according to a quantity of service requests received by that service processing unit and a corresponding quantity of service failures, both of which are in the key performance indicator information of that service processing unit;

    compare the service success rate with a second reference value;

    determine a quantity of service processing units whose service success rates are lower than the second reference value;

    determine, according to the quantity, a percentage of the service processing units, whose service success rates are lower than the second reference value, in all service processing units in the monitored network element; and

    when the percentage is greater than a second preset percentage, determine that the monitored network element is the faulty object.



    [0032] With reference to a first more specific implementation of the first specific implementation of the third possible implementation of the second aspect, the determining unit is specifically configured to:

    compare the service success rate with a preset reference value; or

    determine an average service success rate of a homogenized network element;

    subtract a preset value from the average service success rate to obtain a homogenized reference value; and

    compare the service success rate with the homogenized reference value; where

    the homogenized network element is a monitored network element that carries a service with the same service logic as that of the monitored network element, and to which the service is discretely allocated.



    [0033] With reference to a third specific implementation of the second possible implementation of the second aspect, the sending unit is specifically configured to:
    after it is determined that the faulty object is the service processing unit in the monitored network element or after it is determined that the faulty object is the communication path between the service processing units, send the troubleshooting policy to a system management module in the monitored network element in the network function virtualization system.

    [0034] With reference to a second specific implementation of the third possible implementation of the second aspect, the sending unit is specifically configured to:
    after it is determined that the faulty object is the monitored network element or after it is determined that the faulty object is the communication path between the monitored network element and the other network element, send the troubleshooting policy to a management and orchestration (MANO) unit in the network function virtualization system.

    [0035] With reference to a fourth specific implementation of the second possible implementation of the second aspect, the determining unit is further configured to:

    after determining that the faulty object is the service processing unit in the monitored network element, determine that a quantity of faulty service processing units reaches a preset threshold; and

    determine a network-level troubleshooting policy, where the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located.



    [0036] With reference to a third specific implementation of the third possible implementation of the second aspect, the obtaining unit is further configured to:

    obtain status information of a redundancy network element related to the monitored network element that is determined as the faulty object; and

    the determining unit is further configured to: determine a redundancy network element in a normal operating state according to the status information; and

    generate network-level troubleshooting indication information, where the troubleshooting indication information is used to instruct the management unit to replace, with the redundancy network element in the normal operating state, the monitored network element that is determined as the faulty object; or

    the obtaining unit is further configured to obtain status information of a redundancy network element of a back-end network element in the communication path that is determined as the faulty object; and

    the determining unit is further configured to: determine a redundancy network element in a normal operating state according to the status information; and

    generate network-level troubleshooting indication information, where the troubleshooting indication information is used to instruct the management unit to switch the back-end network element corresponding to a front-end network element in the communication path to the redundancy network element in the normal operating state.



    [0037] According to specific embodiments provided in this application, this application discloses the following technical effects:

    [0038] According to the troubleshooting method or apparatus disclosed in this application, key performance indicator information of each service processing unit in a monitored network element is obtained; a faulty object is determined according to the key performance indicator information; a troubleshooting policy is determined according to the faulty object; and the troubleshooting policy is sent to a management unit in a network function virtualization system. A fault can be located by using the key performance indicator information, thereby resolving a problem of relatively low precision in fault locating according to a heartbeat message of a network element.

    [0039] In addition, the troubleshooting policy is determined according to the faulty object, and the troubleshooting policy is sent to the management unit in the network function virtualization system. Therefore, an appropriate troubleshooting policy can be used, which reduces risks caused during a troubleshooting process and mitigates impact on a service during the troubleshooting process.

    BRIEF DESCRIPTION OF DRAWINGS



    [0040] To describe the technical solutions in the embodiments of the present application more clearly, the following briefly describes the accompanying drawings required for describing the embodiments. Clearly, the accompanying drawings in the following description show merely some embodiments of the present application, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.

    FIG. 1 is an architectural diagram of a network function virtualization (NFV) system according to this application;

    FIG. 2 is a flowchart of Embodiment 1 of a troubleshooting method according to this application;

    FIG. 3 is a flowchart of Embodiment 2 of a troubleshooting method according to this application;

    FIG. 4 is a flowchart of Embodiment 3 of a troubleshooting method according to this application;

    FIG. 5 is a structural diagram of an embodiment of a troubleshooting apparatus according to this application; and

    FIG. 6 is a structural diagram of a computing node according to this application.


    DESCRIPTION OF EMBODIMENTS



    [0041] The following clearly describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application. Clearly, the described embodiments are merely some but not all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative efforts shall fall within the protection scope of the present application.

    [0042] To make the objectives, features and advantages of this application more comprehensible, this application is further illustrated in detail in the following with reference to the accompanying drawings and specific embodiments.

    [0043] FIG. 1 is an architectural diagram of a network function virtualization (NFV) system according to this application. A troubleshooting method in this application is mainly applied to the NFV system. As shown in FIG. 1, the NFV system mainly includes the following network elements:

    an operations support system (Operations Support System, OSS)/a business support system (Business Support System, BSS), configured to initiate a service request to a network functions virtualization orchestrator (NFV Orchestrator) and provide a resource required for a service, and responsible for fault processing;

    the orchestrator (Orchestrator), responsible for implementing an NFV service according to the service request of the OSS/BSS; and responsible for lifecycle management of a network service (Network Service, NS), and for orchestrating a management resource and monitoring, in real time, resources and running status information of a virtualized network function (Virtualized Network Function, VNF) and a network functions virtualization infrastructure (Network Functions Virtualization Infrastructure, NFVI);

    a virtualized network function manager (VNF Manager, VNFM), responsible for VNF lifecycle management, for example, management of starting, time to live, and VNF running status information;

    a virtualized infrastructure manager (Virtualized Infrastructure Manager, VIM), responsible for managing and allocating an NFVI resource, and for monitoring and collecting NFVI running status information; and

    an element management system (Element Management System, EMS), responsible for fault, configuration, accounting, performance, security (Fault Management, Configuration Management, Accounting Management, Performance Management, Security Management, FCAPS) management of a network element.



    [0044] The NFVI resource includes all NFVI resources: available/reserved/allocated NFVI resources.

    [0045] The troubleshooting method in this application may be performed by a network element key performance indicator (Key Performance Indicator, KPI) monitoring and troubleshooting decision module or a network KPI monitoring and troubleshooting decision module. The network element KPI monitoring and troubleshooting decision module or the network KPI monitoring and troubleshooting decision module may be deployed in a VNF, an EMS, or a management and orchestration (Management and Orchestration, MANO) unit in the NFV system, or in an independent network node. The network element KPI monitoring and troubleshooting decision module and the network KPI monitoring and troubleshooting decision module may be deployed in an integrated manner or deployed separately.

    [0046] FIG. 2 is a flowchart of Embodiment 1 of a troubleshooting method according to this application. The method in this embodiment may be performed by a network element KPI monitoring and troubleshooting decision module or a network KPI monitoring and troubleshooting decision module. As shown in FIG. 2, the method may include the following steps.

    [0047] Step 101: Obtain key performance indicator (Key Performance Indicator, KPI) information of each service processing unit in a monitored network element.

    [0048] The monitored network element may be a network element in a network function virtualization (Network Function Virtualization, NFV) system, for example, a VNF.

    [0049] There may be one or more service processing units in the monitored network element.

    [0050] The key performance indicator information may include information such as a quantity of service requests received by a service processing unit, a quantity of service failures corresponding to the quantity of service requests, and/or a cause of each service failure. In actual application, an information type included in the key performance indicator information may be set according to a requirement. For example, the key performance indicator information may further include information such as service delay information.

    [0051] The monitored network element may periodically report the key performance indicator information.

    [0052] It should be noted that before step 101 is performed, a network element that needs to be monitored may be further determined according to information about an EMS and/or a MANO. Information about a service processing unit deployed inside a network element, and information about a network element deployed in a network that are recorded by the EMS and/or the MANO may be obtained. A network element corresponding to the recorded information about the network element deployed in the network is determined as the monitored network element. A service processing unit corresponding to the recorded information about the service processing unit deployed inside the network element is determined as a service processing unit that needs to be monitored.

    [0053] Step 102: Determine a faulty object according to the key performance indicator information.

    [0054] For example, a success rate of a service performed by a service processing unit may be calculated according to the key performance indicator information. When the success rate is lower than a particular rate, it may be determined that the faulty object is the service processing unit. When a quantity of service processing units with relatively low success rates is relatively large (for example, exceeds 80% of a total quantity of service processing units in the monitored network element), it may be determined that the faulty object is a network element other than the monitored network element. For another example, when a quantity, recorded in the key performance indicator information, of service failures caused by a timeout of communication from the monitored network element to a next-hop network element is relatively large, it may be determined that a communication path from the monitored network element to the next-hop network element is faulty or the next-hop network element is faulty.

    [0055] Step 103: Determine a troubleshooting policy according to the faulty object.

    [0056] When the faulty object is a service processing unit inside the monitored network element, a network-element-level troubleshooting policy may be determined. The network-element-level troubleshooting policy is used to perform a troubleshooting operation inside the monitored network element.

    [0057] When the faulty object is a network element other than the monitored network element, a network-level troubleshooting policy may be determined. The network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located.

    [0058] Step 104: Send the troubleshooting policy to a management unit in a network function virtualization system, so that the management unit uses the troubleshooting policy to perform troubleshooting.

    [0059] The management unit may be a system management module in the monitored network element in the network function virtualization system, or may be a management and orchestration MANO unit in the network function virtualization system.

    [0060] Manners of using the network-element-level troubleshooting policy to perform troubleshooting may include the following manners:

    determining a standby unit of the faulty service processing unit, and switching a service carried by the faulty service processing unit to the standby unit; or

    resetting the faulty service processing unit; where

    when the standby unit becomes faulty, the faulty service processing unit and the standby unit may be isolated.
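
    The manners listed in [0060] can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the claimed implementation: the name `element_level_actions`, the `standby_of` mapping, and the tuple-shaped actions are hypothetical, and falling back to a reset when no standby unit is configured is an assumption made for the example.

```python
from typing import Dict, List, Set, Tuple


def element_level_actions(faulty_unit: str,
                          standby_of: Dict[str, str],
                          faulty_units: Set[str]) -> List[Tuple[str, ...]]:
    """Choose network-element-level actions for one faulty unit.

    standby_of maps a unit to its standby unit (hypothetical structure).
    faulty_units is the set of units currently judged faulty.
    """
    standby = standby_of.get(faulty_unit)
    if standby is None:
        # Assumption: a unit without a standby is simply reset.
        return [("reset", faulty_unit)]
    if standby in faulty_units:
        # The standby is faulty as well: isolate both units.
        return [("isolate", faulty_unit), ("isolate", standby)]
    # Normal case: switch the carried service to the standby unit.
    return [("switch", faulty_unit, standby)]
```

    The function only plans actions; in the method, executing them is left to the management unit that receives the troubleshooting policy.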



    [0061] Manners of using the network-level troubleshooting policy to perform troubleshooting may include the following manners:

    determining a standby network element of the faulty network element; and

    switching a service carried by the faulty network element to the standby network element; or

    determining a standby path of the faulty path; and

    switching a service carried by the faulty path to the standby path; where

    when it is determined that the standby path becomes faulty, a standby network element of a network element on the standby path side may be further determined; and

    a service carried by the network element on the standby path side is switched to the standby network element.
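
    The path-related manner in [0061] can be illustrated as follows. The sketch is hedged: `plan_path_recovery` and all four lookup structures are hypothetical, and "the network element on the standby path side" is interpreted here as the network element reached through the standby path, which is an assumption.

```python
from typing import Dict, Set, Tuple


def plan_path_recovery(faulty_path: str,
                       standby_path_of: Dict[str, str],
                       faulty_paths: Set[str],
                       element_on_path: Dict[str, str],
                       standby_element_of: Dict[str, str]) -> Tuple[str, str, str]:
    """Return (action, source, target) for a faulty communication path."""
    standby_path = standby_path_of[faulty_path]
    if standby_path not in faulty_paths:
        # Normal case: move the service to the standby path.
        return ("switch_path", faulty_path, standby_path)
    # The standby path is faulty too: switch the network element on the
    # standby path side to its standby network element.
    element = element_on_path[standby_path]
    return ("switch_element", element, standby_element_of[element])
```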



    [0062] In conclusion, in this embodiment, key performance indicator information of each service processing unit in a monitored network element is obtained; a faulty object is determined according to the key performance indicator information; a troubleshooting policy is determined according to the faulty object; and the troubleshooting policy is sent to a management unit in a network function virtualization system. A fault can be located by using the key performance indicator information, thereby resolving a problem of relatively low precision in fault locating according to a heartbeat message of a network element. In addition, the troubleshooting policy is determined according to the faulty object, and the troubleshooting policy is sent to the management unit in the network function virtualization system. Therefore, an appropriate troubleshooting policy can be used, which reduces risks caused during a troubleshooting process and mitigates impact on a service during the troubleshooting process.

    [0063] In actual application, the determining a faulty object may specifically include:

    determining that the faulty object is a service processing unit in the monitored network element; or

    determining that the faulty object is a communication path between the service processing units.



    [0064] The determining a troubleshooting policy according to the faulty object may specifically include:
    when the faulty object is the service processing unit in the monitored network element or the communication path between the service processing units, determining a network-element-level troubleshooting policy, where the network-element-level troubleshooting policy is used to perform a troubleshooting operation inside the monitored network element.

    [0065] In actual application, the determining a faulty object may further specifically include:

    determining that the faulty object is the monitored network element; or

    determining that the faulty object is a communication path between the monitored network element and another network element.



    [0066] The determining a troubleshooting policy according to the faulty object may specifically include:
    when the faulty object is the monitored network element or the communication path between the monitored network element and the other network element, determining a network-level troubleshooting policy, where the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located.

    [0067] It should be noted that based on the method in this embodiment of the present invention, in actual application, for a network-element-level fault, a network-element-level troubleshooting policy may be used first to perform troubleshooting; and if the troubleshooting fails, a network-level troubleshooting policy may then be used to perform troubleshooting.

    [0068] In actual application, the determining that the faulty object is a service processing unit in the monitored network element may specifically include the following steps:

    calculating, according to a quantity of service requests, received by a service processing unit, in the key performance indicator information, and a quantity of service failures that is corresponding to the quantity of service requests and is in the key performance indicator information, a service success rate of a service performed by a service processing unit;

    comparing the service success rate with a reference value; and

    determining that a service processing unit whose service success rate is lower than the reference value is the faulty object.



    [0069] In the foregoing steps, the quantity of service failures may be a quantity of service failures caused by the service processing unit itself. Specifically, a cause of a service failure may be recorded in the key performance indicator information, and statistics about the quantity of service failures caused by the service processing unit itself may be collected according to the cause of the service failure.

    [0070] It should be further noted that in the foregoing steps, the reference value may be a preset value, or may be a homogenized reference value obtained according to statistics about an average service success rate of a homogenized service processing unit. Therefore, the comparing the service success rate with a reference value may specifically include:

    comparing the service success rate with a preset reference value; or

    determining an average service success rate of a homogenized service processing unit;

    subtracting a preset value from the average service success rate to obtain a homogenized reference value; and

    comparing the service success rate with the homogenized reference value; where

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.
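
    The two ways of obtaining the reference value in [0070] can be sketched as below. The function name `reference_value`, its parameters, and the default preset value of 10 percentage points are assumptions for illustration; rates are expressed in percent, and the average is taken over whatever homogenized units the caller passes in.

```python
from statistics import mean
from typing import Optional, Sequence


def reference_value(preset_reference: Optional[float] = None,
                    homogenized_rates: Sequence[float] = (),
                    preset_value: float = 10.0) -> float:
    """Return the reference value, in percent, for the comparison step."""
    if preset_reference is not None:
        # First alternative: a preset reference value.
        return preset_reference
    # Second alternative: homogenized reference value, that is, the
    # average success rate of the homogenized units minus a preset value.
    return mean(homogenized_rates) - preset_value
```

    With homogenized success rates of 98%, 96%, and 97% and a preset value of 10, the homogenized reference value is 87%, so a unit at 60% falls below it. Note that `mean` raises an error for an empty sequence, so the caller must supply at least one homogenized rate when no preset reference is given.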



    [0071] It should be noted that the homogenized service processing units may sometimes encounter the following phenomenon: service success rates of multiple homogenized service processing units are all lower than the preset reference value for a common reason. In this case, the homogenized service processing units with the service success rates lower than the preset reference value are not necessarily faulty; the service success rates of most homogenized service processing units may have decreased due to a fault of another device. In the foregoing case, to avoid mistakenly determining that the homogenized service processing unit is faulty, before the determining that a service processing unit whose service success rate is lower than the reference value is the faulty object, the following steps may be further used:

    determining a first unit set, whose service success rate is greater than the preset reference value, in homogenized service processing units;

    determining a second unit set, whose service success rate is less than the reference value, in the homogenized service processing units; and

    determining that a percentage of units included in the first unit set in all the homogenized service processing units is greater than a preset percentage.



    [0072] In the foregoing steps, the preset percentage may be set according to an actual requirement, for example, may be set to 90%. That is, when service success rates of 90% or more of the homogenized service processing units are higher than the preset reference value and service success rates of 10% or less of the homogenized service processing units are lower than the preset reference value, homogenized service processing units with service success rates lower than the reference value may be determined as faulty objects.
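
    The guard described in [0071] and [0072] can be sketched as follows. The name `faulty_homogenized_units` and the dictionary input are hypothetical. The comparison uses "greater than or equal to" to match the "90% or more" wording of the example; the step itself says "greater than", so the choice of operator is an assumption.

```python
from typing import Dict, List


def faulty_homogenized_units(rates: Dict[str, float],
                             reference: float,
                             preset_percentage: float = 90.0) -> List[str]:
    """Return low-rate units only when most homogenized units are healthy."""
    if not rates:
        return []
    # First unit set: success rate above the reference value.
    first_set = [u for u, r in rates.items() if r > reference]
    # Second unit set: success rate below the reference value.
    second_set = [u for u, r in rates.items() if r < reference]
    first_share = 100 * len(first_set) / len(rates)
    if first_share >= preset_percentage:
        return sorted(second_set)
    # Most units are low: the cause is likely outside these units.
    return []
```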

    [0073] In actual application, the determining that the faulty object is a communication path between the service processing units may specifically include:

    calculating a service success rate of a communication path according to a quantity that is of service failures caused by a communication path fault and is in the key performance indicator information;

    comparing the service success rate with a reference value; and

    determining that a communication path whose service success rate is lower than the reference value is the faulty object.



    [0074] In actual application, determining that the faulty object is a network element in a network to which the monitored network element belongs may specifically include:

    collecting statistics about a service success rate of the monitored network element according to a quantity of service requests, received by a service processing unit, in the key performance indicator information, and a quantity of service failures that is corresponding to the quantity of service requests and is in the key performance indicator information;

    comparing the service success rate with a reference value; and

    determining that a monitored network element whose service success rate is lower than the reference value is the faulty object.



    [0075] It should be noted that one network element may include multiple service processing units. Therefore, key performance indicator information of each service processing unit in one network element may be obtained; statistics about a quantity of service requests received by the network element and a quantity of service failures corresponding to the quantity of service requests are collected according to a quantity, included in the key performance indicator information of each service processing unit, of service requests received by a service processing unit, and a quantity of service failures that is corresponding to the quantity of service requests and is included in the key performance indicator information of each service processing unit. Further, a service success rate of the monitored network element is calculated.
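
    The aggregation in [0075] can be illustrated with a short sketch. The name `network_element_success_rate` and the `(requests, failures)` tuple layout are assumptions; returning `None` when no requests were received is also a choice made for the example.

```python
from typing import Iterable, Optional, Tuple


def network_element_success_rate(
        unit_kpis: Iterable[Tuple[int, int]]) -> Optional[float]:
    """Aggregate per-unit (requests, failures) into one rate in percent."""
    total_requests = 0
    total_failures = 0
    for requests, failures in unit_kpis:
        total_requests += requests
        total_failures += failures
    if total_requests == 0:
        # No traffic: the rate is undefined (assumption for this sketch).
        return None
    return 100 * (total_requests - total_failures) / total_requests
```

    Summing the counts before dividing weights each service processing unit by its traffic, which differs from averaging the per-unit rates when the units receive unequal numbers of requests.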

    [0076] In actual application, the comparing the service success rate with a reference value may specifically include:

    comparing the service success rate with a preset reference value; or

    determining an average service success rate of a homogenized network element;

    subtracting a preset value from the average service success rate to obtain a homogenized reference value; and

    comparing the service success rate with the homogenized reference value; where

    the homogenized network element is a monitored network element that carries a service with the same service logic as that of the monitored network element, and to which the service is discretely allocated.



    [0077] FIG. 3 is a flowchart of Embodiment 2 of a troubleshooting method according to this application. The method in this embodiment may be performed by a network element KPI monitoring and troubleshooting decision module. As shown in FIG. 3, the method may include the following steps.

    [0078] Step 201: Obtain key performance indicator information of each service processing unit in a monitored network element.

    [0079] In this embodiment, the service processing unit may be a thread, a process, a virtual machine (Virtual Machine, VM), or the like. The key performance indicator information may include at least the following information: a quantity of service requests received by the service processing unit and a quantity of service failures corresponding to the quantity of service requests.

    [0080] Step 202: Calculate, according to a quantity of service requests, received by a service processing unit, in the key performance indicator information, and a quantity of service failures that is corresponding to the quantity of service requests and is in the key performance indicator information, a service success rate of a service performed by a service processing unit.

    [0081] The service success rate may be obtained by subtracting the quantity of service failures from the quantity of service requests, then dividing the obtained difference by the quantity of service requests, and multiplying the obtained quotient by 100%.
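
    The calculation in [0081] amounts to (requests - failures) / requests x 100%. A minimal sketch follows; the function name is hypothetical, and rejecting a zero request count is an assumption, since the source does not say how that case is handled.

```python
def service_success_rate(requests: int, failures: int) -> float:
    """Service success rate in percent, per the formula in [0081]."""
    if requests <= 0:
        # Assumption: the rate is undefined without any service request.
        raise ValueError("requests must be positive")
    # Multiplying before dividing is mathematically equivalent to the
    # order given in the text and avoids some floating-point rounding.
    return 100 * (requests - failures) / requests
```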

    [0082] Step 203: Compare the service success rate with a reference value.

    [0083] The reference value may be set according to an actual requirement. For example, when a service success rate of a normal service processing unit is 95% or higher, the reference value may be set to 95%.

    [0084] Alternatively, the reference value may be calculated according to an average service success rate of a homogenized service processing unit. The homogenized service processing unit is a service processing unit that has the same service logic as that carried by a service processing unit corresponding to the service success rate, and has the same external service networking as that of the service processing unit corresponding to the service success rate. Service request messages received by (distributed to) multiple homogenized service processing units are randomly and discretely distributed. Therefore, service success rates of the multiple homogenized service processing units should be basically similar, and a homogenized reference value may be calculated according to the average service success rate of the homogenized service processing unit.

    [0085] Specifically, a preset value may be subtracted from the average service success rate to obtain the homogenized reference value. The preset value may be set according to an actual requirement, for example, may be 20% or 10%.

    [0086] Step 204: Determine that a service processing unit whose service success rate is lower than the reference value is a faulty object.

    [0087] Step 205: When the faulty object is a service processing unit, determine a network-element-level troubleshooting policy.

    [0088] Step 206: Send the troubleshooting policy to a system management module in the monitored network element in a network function virtualization system.

    [0089] The network-element-level troubleshooting policy in step 205 may instruct the system management module to reset the faulty service processing unit. After receiving the network-element-level troubleshooting policy, the system management module may reset the faulty service processing unit.

    [0090] It should be noted that if the reset service processing unit is still faulty, the faulty service processing unit may be further isolated. Further, when it is determined that a quantity of isolated service processing units reaches a second preset threshold, a network-level troubleshooting policy may be executed. The network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located. For example, a switchover may be performed on a next-hop faulty network element or communication path of the monitored network element. A target network element or communication path of the switchover may be selected according to health statuses of network elements or communication paths in a redundancy group.
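
    The follow-up handling in [0090] (isolate a unit that is still faulty after a reset, then escalate once enough units are isolated) can be sketched as below. The function name, the string-encoded actions, and the mutable `isolated` set are hypothetical; "reaches" is read as "greater than or equal to".

```python
from typing import List, Set


def handle_after_reset(unit: str,
                       still_faulty: bool,
                       isolated: Set[str],
                       second_preset_threshold: int) -> List[str]:
    """Plan actions for a unit that has just been reset."""
    actions: List[str] = []
    if not still_faulty:
        return actions
    # The reset did not help: isolate the unit.
    isolated.add(unit)
    actions.append(f"isolate:{unit}")
    if len(isolated) >= second_preset_threshold:
        # Too many isolated units: escalate to the network level.
        actions.append("execute_network_level_policy")
    return actions
```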

    [0091] It should be further noted that when the faulty service processing unit is an active/standby service processing unit, the troubleshooting policy may be: determining a standby unit of the faulty service processing unit; and switching a service carried by the faulty service processing unit to the standby unit. Further, when the standby unit becomes faulty, the faulty service processing unit and the standby unit may be isolated.

    [0092] FIG. 4 is a flowchart of Embodiment 3 of a troubleshooting method according to this application. The method in this embodiment may be performed by a network KPI monitoring and troubleshooting decision module. As shown in FIG. 4, the method may include the following steps.

    [0093] Step 301: Obtain key performance indicator information of each service processing unit in a monitored network element.

    [0094] Step 302: Calculate, according to a quantity of service requests, received by a service processing unit, in the key performance indicator information, and a quantity of service failures that is corresponding to the quantity of service requests and is in the key performance indicator information, a service success rate of a service performed by each service processing unit.

    [0095] Step 303: Compare the service success rate with a reference value.

    [0096] Step 304: Determine a quantity of service processing units whose service success rates are lower than the reference value.

    [0097] Step 305: Determine, according to the quantity, a percentage of the service processing units, whose service success rates are lower than the reference value, in all service processing units in the monitored network element.

    [0098] Assuming that the quantity of the service processing units whose service success rates are lower than the reference value is 8 and that a quantity of all the service processing units in the monitored network element is 10, the percentage is 80%.

    [0099] Step 306: When the percentage is greater than a preset percentage, determine that a faulty object is the monitored network element.

    [0100] The preset percentage may be set according to an actual requirement. For example, the preset percentage may be set to 50% or 80%.
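
    Steps 304 to 306 can be illustrated with the numbers from [0098]. The function names are hypothetical and rates are given in percent. Because step 306 requires the percentage to be greater than the preset percentage, the 80% result in the example triggers the decision for a preset percentage of 50% but not for a preset percentage of 80%.

```python
from typing import Sequence


def low_unit_percentage(rates: Sequence[float], reference: float) -> float:
    """Percentage of units whose success rate is below the reference."""
    low_count = sum(1 for rate in rates if rate < reference)
    return 100 * low_count / len(rates)


def network_element_is_faulty(rates: Sequence[float],
                              reference: float,
                              preset_percentage: float) -> bool:
    """Step 306: the monitored network element is the faulty object."""
    return low_unit_percentage(rates, reference) > preset_percentage
```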

    [0101] Step 307: When the faulty object is a network element in a network to which the monitored network element belongs, determine a network-level troubleshooting policy.

    [0102] When a fault location is the network element in the network to which the monitored network element belongs, the network-level troubleshooting policy needs to be used, to repair the faulty network element.

    [0103] In actual application, the determining a network-level troubleshooting policy may be specifically implemented in multiple manners. For example, the following steps may be used:

    obtaining status information of a redundancy network element related to the monitored network element that is determined as the faulty object;

    determining a redundancy network element in a normal operating state according to the status information; and

    generating network-level troubleshooting indication information, where the troubleshooting indication information is used to instruct a management unit to replace, with the redundancy network element in the normal operating state, the monitored network element that is determined as the faulty object.



    [0104] In the foregoing steps, it can be ensured that the redundancy network element used to replace the faulty monitored network element can operate normally. If all redundancy network elements of the monitored network element are abnormal, a preset redundancy network element may not be used to replace the faulty monitored network element, and another network element that can operate normally may be found, to replace the faulty monitored network element.
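
    The selection described in [0103] and [0104] can be sketched as follows. The names `pick_replacement` and `build_indication`, the status string "normal", and the dictionary-shaped indication are assumptions for illustration; the source does not define a message format.

```python
from typing import Dict, Optional


def pick_replacement(redundancy_status: Dict[str, str],
                     other_status: Dict[str, str]) -> Optional[str]:
    """Prefer a normal redundancy element, else any other normal element."""
    for candidates in (redundancy_status, other_status):
        for name, status in candidates.items():
            if status == "normal":
                return name
    return None


def build_indication(faulty_element: str, replacement: str) -> Dict[str, str]:
    """Network-level troubleshooting indication (hypothetical layout)."""
    return {"level": "network", "action": "replace",
            "faulty_element": faulty_element, "replacement": replacement}
```

    Checking the preset redundancy network elements first and falling back to other network elements only when all of them are abnormal mirrors the order given in [0104].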

    [0105] For another example, the following steps may be used:

    obtaining status information of a redundancy network element of a back-end network element in a communication path that is determined as the faulty object;

    determining a redundancy network element in a normal operating state according to the status information; and

    generating network-level troubleshooting indication information, where the troubleshooting indication information is used to instruct the management unit to switch the back-end network element corresponding to a front-end network element in the communication path to the redundancy network element in the normal operating state.



    [0106] In the foregoing steps, it can be ensured that the switched-to redundancy network element can operate normally. If all redundancy network elements of the back-end network element in the communication path are abnormal, a preset redundancy network element may not be used for the switchover, and another network element that can operate normally may be found for the switchover.

    [0107] Step 308: Send the troubleshooting policy to a management and orchestration MANO unit in a network function virtualization system.

    [0108] The network-level troubleshooting policy may instruct the MANO unit to determine a standby network element of the faulty network element and to switch a service carried by the faulty network element to the standby network element.

    [0109] When receiving the network-level troubleshooting policy, the MANO unit may determine the standby network element of the faulty network element. After determining the standby network element of the faulty network element, the MANO unit may send indication signaling to a VNFM to instruct the VNFM to switch the service carried by the faulty network element to the standby network element. After receiving the indication signaling, the VNFM may switch the service carried by the faulty network element to the standby network element.

    [0110] It should be further noted that in this embodiment of this application, the key performance indicator information may further include information about a service failure cause, and information about a quantity of service failures caused by the service failure cause. The service failure cause may include: a timeout of communication to a downstream network element, resource insufficiency, a timeout of communication between internal modules of a monitored network element, an internal error of software (such as invalidity of internal data of software and code processing entering an abnormal branch), and the like. Therefore, the determining the faulty object according to the key performance indicator information in this application may further specifically include:
    determining the faulty object according to the service failure cause information included in the key performance indicator information.

    [0111] A percentage of failed services caused by a service processing timeout may be determined according to a quantity of service failures caused by a service processing timeout and a quantity of service requests sent by the monitored network element to a downstream network element that are recorded in the key performance indicator information.

    [0112] When the percentage of failed services is greater than or equal to a preset threshold, it may be determined that the fault location is the monitored network element. The network element in the network to which the monitored network element belongs may include an external network element of the network element and the network element itself. Accordingly, in this case, the network-level troubleshooting policy may also be used.
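
    The check in [0111] and [0112] can be sketched as a single comparison. The function name and parameter names are hypothetical, and treating a zero request count as "no fault detected" is an assumption.

```python
def timeout_fault_detected(timeout_failures: int,
                           requests_to_downstream: int,
                           preset_threshold_percent: float) -> bool:
    """True when the share of timeout failures reaches the threshold."""
    if requests_to_downstream == 0:
        # Assumption: without requests there is nothing to judge.
        return False
    percentage = 100 * timeout_failures / requests_to_downstream
    # "Greater than or equal to" follows the wording of [0112].
    return percentage >= preset_threshold_percent
```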

    [0113] In addition, for the previously mentioned homogenized service processing unit, a quantity of service failures caused by resource insufficiency may be excluded and not counted into a total statistical quantity of service failures during collection of statistics about a quantity of service failures. Such failures are mainly caused by an excessively large quantity of services, and the service processing unit itself is usually not faulty.
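
    The exclusion in [0113] can be illustrated as a filter over per-cause failure counts. The function name and the cause labels such as "resource_insufficiency" are hypothetical; the source names the causes only in prose.

```python
from typing import Dict, Iterable


def counted_failures(failures_by_cause: Dict[str, int],
                     excluded_causes: Iterable[str] = ("resource_insufficiency",)
                     ) -> int:
    """Total failures, leaving out causes that do not indicate a unit fault."""
    excluded = set(excluded_causes)
    return sum(count for cause, count in failures_by_cause.items()
               if cause not in excluded)
```

    For a unit that reports 40 failures from resource insufficiency, 3 from an internal software error, and 2 from an internal timeout, only 5 failures enter the success rate calculation.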

    [0114] This application further provides a troubleshooting apparatus.

    [0115] FIG. 5 is a structural diagram of an embodiment of a troubleshooting apparatus according to this application. As shown in FIG. 5, the apparatus may include:

    an obtaining unit 501, configured to obtain key performance indicator information of each service processing unit in a monitored network element;

    a determining unit 502, configured to: determine a faulty object according to the key performance indicator information; and

    determine a troubleshooting policy according to the faulty object; and

    a sending unit 503, configured to send the troubleshooting policy to a management unit in a network function virtualization system, so that the management unit uses the troubleshooting policy to perform troubleshooting.
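
    The division of the apparatus into units 501 to 503 can be sketched as a composition of callables. The class name, the field names, and the `Kpi` type are hypothetical; the sketch only shows how the units hand data to one another, not how any unit is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical KPI layout: unit id -> {"requests": n, "failures": m}.
Kpi = Dict[str, Dict[str, int]]


@dataclass
class TroubleshootingApparatus:
    obtain: Callable[[], Kpi]               # obtaining unit 501
    determine_fault: Callable[[Kpi], str]   # determining unit 502 (faulty object)
    determine_policy: Callable[[str], str]  # determining unit 502 (policy)
    send: Callable[[str], None]             # sending unit 503

    def run_once(self) -> str:
        """Run one pass: obtain KPI, determine, then send the policy."""
        kpi = self.obtain()
        faulty_object = self.determine_fault(kpi)
        policy = self.determine_policy(faulty_object)
        self.send(policy)
        return policy
```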



    [0116] In this embodiment, key performance indicator information of each service processing unit in a monitored network element is obtained; a faulty object is determined according to the key performance indicator information; a troubleshooting policy is determined according to the faulty object; and the troubleshooting policy is sent to a management unit in a network function virtualization system. A fault can be located by using the key performance indicator information, thereby resolving a problem of relatively low precision in fault locating according to a heartbeat message of a network element. In addition, the troubleshooting policy is determined according to the faulty object, and the troubleshooting policy is sent to the management unit in the network function virtualization system. Therefore, an appropriate troubleshooting policy can be used, which reduces risks caused during a troubleshooting process and mitigates impact on a service during the troubleshooting process.

    [0117] In actual application, the determining unit 502 may be specifically configured to:

    determine that the faulty object is a service processing unit in the monitored network element; or

    determine that the faulty object is a communication path between the service processing units; and

    when the faulty object is the service processing unit in the monitored network element or the communication path between the service processing units, determine a network-element-level troubleshooting policy, where the network-element-level troubleshooting policy is used to perform a troubleshooting operation inside the monitored network element.



    [0118] In actual application, the determining unit 502 may be specifically configured to:

    determine that the faulty object is the monitored network element; or

    determine that the faulty object is a communication path between the monitored network element and another network element; and

    when the faulty object is the monitored network element or the communication path between the monitored network element and the other network element, determine a network-level troubleshooting policy, where the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located.



    [0119] In actual application, the determining unit 502 may be specifically configured to:

    calculate, according to a quantity of service requests, received by a service processing unit, in the key performance indicator information, and a quantity of service failures that is corresponding to the quantity of service requests and is in the key performance indicator information, a service success rate of a service performed by a service processing unit;

    compare the service success rate with a first reference value; and

    determine that a service processing unit whose service success rate is lower than the first reference value is the faulty object.
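The three steps above can be sketched as follows. This is a minimal illustration only: the `UnitKpi` record, the function names, and the treatment of a unit with no traffic as healthy are assumptions made for the example and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class UnitKpi:
    """KPI counters reported by one service processing unit (hypothetical shape)."""
    unit_id: str
    requests: int  # quantity of service requests received by the unit
    failures: int  # quantity of service failures among those requests


def success_rate(kpi: UnitKpi) -> float:
    """Return the fraction of received requests that did not fail."""
    if kpi.requests == 0:
        # Assumption: a unit that received no requests is treated as healthy.
        return 1.0
    return (kpi.requests - kpi.failures) / kpi.requests


def faulty_units(kpis: list[UnitKpi], first_reference: float) -> list[str]:
    """Return IDs of units whose success rate is lower than the first reference value."""
    return [k.unit_id for k in kpis if success_rate(k) < first_reference]
```

For example, with a first reference value of 0.9, a unit reporting 100 requests and 40 failures (success rate 0.6) would be returned as a faulty object, while a unit reporting 100 requests and 2 failures would not.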



    [0120] In actual application, the determining unit 502 may be specifically configured to:

    compare the service success rate with a preset reference value; or

    determine an average service success rate of a homogenized service processing unit;

    subtract a preset value from the average service success rate to obtain a homogenized reference value; and

    compare the service success rate with the homogenized reference value; where

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.
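The homogenized alternative can be sketched as below. The function names and the `preset_margin` parameter name are hypothetical, and which units count as homogenized (and whether the unit under test is included in the average) is left to the caller because this passage does not fix it.

```python
from statistics import mean


def homogenized_reference(peer_rates: list[float], preset_margin: float) -> float:
    """Average success rate of the homogenized units minus a preset value."""
    if not peer_rates:
        raise ValueError("at least one homogenized unit is required")
    return mean(peer_rates) - preset_margin


def is_below_reference(rate: float, reference: float) -> bool:
    """True when a unit's success rate is lower than the reference value."""
    return rate < reference
```

With peer success rates of 0.99, 0.97, and 0.98 and a preset value of 0.05, the homogenized reference value is approximately 0.93, so a unit at 0.60 falls below it and a unit at 0.95 does not. Deriving the reference from peers lets the threshold follow conditions that affect all homogenized units alike, whereas a preset reference value stays fixed.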



    [0121] In actual application, the determining unit 502 may further be configured to:

    before determining that the service processing unit whose service success rate is lower than the first reference value is the faulty object, determine a first unit set, whose service success rate is greater than the first reference value, in homogenized service processing units;

    determine a second unit set, whose service success rate is less than the first reference value, in the homogenized service processing units; and

    determine that a percentage of units included in the first unit set in all the homogenized service processing units is greater than a first preset percentage; where

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.
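A minimal sketch of this precondition follows. The function name and the dictionary input are assumptions for illustration; the check itself follows the steps listed above, and the percentage is expressed on a 0 to 100 scale.

```python
def confirmed_faulty_units(rates: dict[str, float],
                           first_reference: float,
                           first_preset_percentage: float) -> list[str]:
    """Return the second unit set only if the first unit set is large enough.

    rates maps a homogenized unit ID to its service success rate (hypothetical shape).
    """
    if not rates:
        return []
    # First unit set: success rate greater than the first reference value.
    first_set = [u for u, r in rates.items() if r > first_reference]
    # Second unit set: success rate less than the first reference value.
    second_set = [u for u, r in rates.items() if r < first_reference]
    healthy_share = 100.0 * len(first_set) / len(rates)
    return second_set if healthy_share > first_preset_percentage else []
```

One plausible reading is that the check avoids flagging individual units when most homogenized units are degraded at the same time, which would suggest a cause outside any single unit.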



    [0122] In actual application, the determining unit 502 may be specifically configured to:

    calculate a service success rate of a communication path according to a quantity that is of service failures caused by a communication path fault and is in the key performance indicator information;

    compare the service success rate with a third reference value; and

    determine that a communication path whose service success rate is lower than the third reference value is the faulty object.
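The path check can be sketched in the same way. This passage names only the quantity of service failures caused by a communication path fault; the denominator used below (a count of requests carried over the path) and the data layout are assumptions made so that the example is complete.

```python
def path_success_rate(requests_on_path: int, path_fault_failures: int) -> float:
    """Fraction of requests on a path that did not fail because of a path fault."""
    if requests_on_path == 0:
        # Assumption: a path that carried no requests is treated as healthy.
        return 1.0
    return (requests_on_path - path_fault_failures) / requests_on_path


def faulty_paths(path_kpis: dict[tuple[str, str], tuple[int, int]],
                 third_reference: float) -> list[tuple[str, str]]:
    """Return paths whose success rate is lower than the third reference value.

    path_kpis maps (endpoint, endpoint) to (requests, path-fault failures).
    """
    return [path for path, (requests, failures) in path_kpis.items()
            if path_success_rate(requests, failures) < third_reference]
```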



    [0123] In actual application, the determining unit 502 may be specifically configured to:

    collect statistics about a service success rate of each service processing unit according to a quantity of service requests, received by each service processing unit, in the key performance indicator information of each service processing unit, and a quantity of service failures that is corresponding to the quantity of service requests and is in the key performance indicator information of each service processing unit;

    compare the service success rate with a second reference value;

    determine a quantity of service processing units whose service success rates are lower than the second reference value;

    determine, according to the quantity, a percentage of the service processing units, whose service success rates are lower than the second reference value, in all service processing units in the monitored network element; and

    when the percentage is greater than a second preset percentage, determine that the monitored network element is the faulty object.
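The network-element-level determination above can be sketched as follows. The function name and the 0 to 100 percentage scale are assumptions for the example; the comparison is strict because the text says "greater than".

```python
def network_element_is_faulty(unit_rates: list[float],
                              second_reference: float,
                              second_preset_percentage: float) -> bool:
    """Decide whether the monitored network element itself is the faulty object."""
    if not unit_rates:
        return False
    # Quantity of units whose success rate is lower than the second reference value.
    low_count = sum(1 for rate in unit_rates if rate < second_reference)
    # Percentage of those units among all units in the monitored network element.
    percentage = 100.0 * low_count / len(unit_rates)
    return percentage > second_preset_percentage
```

For instance, if three of four units are below a second reference value of 0.9 and the second preset percentage is 50, the percentage is 75 and the monitored network element is determined as the faulty object; at exactly 50 percent it is not.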



    [0124] In actual application, the determining unit 502 may be specifically configured to:

    compare the service success rate with a preset reference value; or

    determine an average service success rate of a homogenized network element;

    subtract a preset value from the average service success rate to obtain a homogenized reference value; and

    compare the service success rate with the homogenized reference value; where

    the homogenized network element is a monitored network element that carries a service with the same service logic as that of the monitored network element, and to which the service is discretely allocated.



    [0125] In actual application, the sending unit 503 may be specifically configured to:
    after it is determined that the faulty object is the service processing unit in the monitored network element or after it is determined that the faulty object is the communication path between the service processing units, send the troubleshooting policy to a system management module in the monitored network element in the network function virtualization system.

    [0126] In actual application, the sending unit 503 may be specifically configured to:
    after it is determined that the faulty object is the monitored network element or after it is determined that the faulty object is the communication path between the monitored network element and the other network element, send the troubleshooting policy to a management and orchestration (MANO) unit in the network function virtualization system.
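Taken together, the policy level and the receiving management unit both follow from the kind of faulty object. The mapping can be sketched as below; the enumeration and the string labels are hypothetical names chosen for the example.

```python
from enum import Enum, auto


class FaultyObject(Enum):
    SERVICE_PROCESSING_UNIT = auto()
    INTRA_ELEMENT_PATH = auto()  # path between service processing units
    NETWORK_ELEMENT = auto()
    INTER_ELEMENT_PATH = auto()  # path between the monitored and another network element


def select_policy(obj: FaultyObject) -> tuple[str, str]:
    """Return (policy level, management unit that receives the policy)."""
    if obj in (FaultyObject.SERVICE_PROCESSING_UNIT, FaultyObject.INTRA_ELEMENT_PATH):
        # Fault inside the monitored network element.
        return ("network-element-level", "system management module")
    # Fault of the element itself or of a path to another element.
    return ("network-level", "MANO")
```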

    [0127] In actual application, the determining unit 502 may further be configured to:

    after determining that the faulty object is the service processing unit in the monitored network element, determine that a quantity of faulty service processing units reaches a preset threshold; and

    determine a network-level troubleshooting policy, where the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located.
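The escalation described here can be sketched as a single comparison. The function name is hypothetical, and reading "reaches a preset threshold" as "greater than or equal to" is an assumption.

```python
def policy_level_for_unit_faults(faulty_unit_count: int, preset_threshold: int) -> str:
    """Escalate to a network-level policy once enough units are faulty."""
    if faulty_unit_count >= preset_threshold:
        # Assumption: "reaches" means the count is at or above the threshold.
        return "network-level"
    return "network-element-level"
```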



    [0128] In actual application, the obtaining unit 501 may further be configured to:
    obtain status information of a redundancy network element related to the monitored network element that is determined as the faulty object.

    [0129] The determining unit 502 may further be configured to: determine a redundancy network element in a normal operating state according to the status information; and
    generate network-level troubleshooting indication information, where the troubleshooting indication information is used to instruct the management unit to replace, with the redundancy network element in the normal operating state, the monitored network element that is determined as the faulty object.

    [0130] Alternatively, the obtaining unit 501 is further configured to obtain status information of a redundancy network element of a back-end network element in the communication path that is determined as the faulty object.

    [0131] The determining unit 502 may further be configured to: determine a redundancy network element in a normal operating state according to the status information; and
    generate network-level troubleshooting indication information, where the troubleshooting indication information is used to instruct the management unit to switch the back-end network element corresponding to a front-end network element in the communication path to the redundancy network element in the normal operating state.
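The selection of a redundancy network element and the generation of the indication information can be sketched as follows. The status strings, the dictionary layout of the indication, and the function names are all assumptions for illustration; the specification does not define a message format in this passage.

```python
from typing import Optional


def pick_redundant_element(status: dict[str, str]) -> Optional[str]:
    """Return the first redundancy network element reported in the normal state."""
    for element_id, state in status.items():
        if state == "normal":  # hypothetical status label
            return element_id
    return None


def build_indication(faulty_id: str, status: dict[str, str]) -> Optional[dict]:
    """Build network-level indication information, or None if no element is usable."""
    replacement = pick_redundant_element(status)
    if replacement is None:
        return None
    return {
        "level": "network",
        "faulty": faulty_id,         # faulty element, or back-end element of a faulty path
        "replacement": replacement,  # redundancy element in the normal operating state
    }
```

The same selection applies to both alternatives: replacing a faulty monitored network element, or switching the back-end network element of a faulty communication path to a redundancy network element.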

    [0132] In addition, an embodiment of this application further provides a computing node. The computing node may be a host server having a computing capability, a personal computer (PC), a portable computer or terminal, or the like. Specific embodiments of this application impose no limitation on specific implementation of the computing node.

    [0133] FIG. 6 is a structural diagram of a computing node according to this application. As shown in FIG. 6, the computing node 600 includes:
    a processor 610, a communications interface 620, a memory 630, and a communications bus 640.

    [0134] The processor 610, the communications interface 620, and the memory 630 communicate with each other by using the communications bus 640.

    [0135] The processor 610 is configured to execute a program 632.

    [0136] Specifically, the program 632 may include program code, and the program code includes a computer operation instruction.

    [0137] The processor 610 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of this application.

    [0138] The memory 630 is configured to store the program 632. The memory 630 may include a high-speed RAM memory, and may further include a non-volatile memory, such as at least one disk memory. The program 632 may specifically include a corresponding module or unit in the embodiment shown in FIG. 5, and details are not described herein.

    [0139] Finally, it should be noted that in this specification, relational terms such as first and second are only used to distinguish one entity or operation from another, and do not necessarily require or imply that any actual relationship or sequence exists between these entities or operations. Moreover, the terms "include" and "comprise", and any other variants thereof, are intended to cover a non-exclusive inclusion, so that a process, a method, an article, or an apparatus that includes a list of elements not only includes those elements but also includes other elements which are not expressly listed, or further includes elements inherent to such process, method, article, or apparatus. An element preceded by "includes a ..." does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that includes the element.

    [0140] Based on the foregoing descriptions of the embodiments, a person skilled in the art may clearly understand that the present application may be implemented by software in addition to a necessary hardware platform or by hardware only. In most circumstances, the former is a preferred implementation. Based on such an understanding, all or the part of the technical solutions of the present application contributing to the technology in the background part may be implemented in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for instructing a computer device, which may be a personal computer, a server, or a network device, to perform the methods described in the embodiments or some parts of the embodiments of the present application.

    [0141] The embodiments in this specification are all described in a progressive manner. For the same or similar parts in the embodiments, reference may be made to these embodiments, and each embodiment focuses on a difference from other embodiments. The apparatus disclosed in the embodiments is described relatively simply because it corresponds to the method disclosed in the embodiments, and for portions related to those of the method, reference may be made to the description of the method.

    [0142] Specific examples are used in this specification to describe the principle and implementations of the present application. The foregoing embodiments are merely intended to help understand the method and idea of the present application. In addition, with respect to the implementations and the application scope, modifications may be made by a person of ordinary skill in the art according to the idea of the present application. Therefore, the content of this specification shall not be construed as a limitation to the present application.


    Claims

    1. A troubleshooting method, comprising:

    obtaining (101) key performance indicator information of each service processing unit in a monitored network element;

    determining (102) a faulty object according to the key performance indicator information;

    determining (103) a troubleshooting policy according to the faulty object; and

    sending (104) the troubleshooting policy to a management unit in a network function virtualization system, so that the management unit uses the troubleshooting policy to perform troubleshooting;

    wherein the determining the faulty object specifically comprises:

    determining that the faulty object is a service processing unit in the monitored network element; or

    determining that the faulty object is a communication path between a first service processing unit and a second service processing unit; and

    the determining a troubleshooting policy according to the faulty object specifically comprises:

    when the faulty object is the service processing unit in the monitored network element or the communication path between the first service processing unit and the second service processing unit, determining a network-element-level troubleshooting policy, wherein the network-element-level troubleshooting policy is used to perform a troubleshooting operation inside the monitored network element; or,

    wherein the determining a faulty object specifically comprises:

    determining that the faulty object is the monitored network element; or

    determining that the faulty object is a communication path between the monitored network element and another network element; and

    the determining a troubleshooting policy according to the faulty object specifically comprises:
    when the faulty object is the monitored network element or the communication path between the monitored network element and the another network element, determining a network-level troubleshooting policy, wherein the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located, wherein the faulty object is the monitored network element when a quantity of service processing units with a success rate lower than a first reference value is greater than a second preset percentage.


     
    2. The method according to claim 1, wherein the determining that the faulty object is a service processing unit in the monitored network element specifically comprises:

    calculating, according to a quantity of service requests, received by a service processing unit, in the key performance indicator information, and a quantity of service failures that is corresponding to the quantity of service requests and is in the key performance indicator information, a service success rate of a service performed by a service processing unit;

    comparing the service success rate with the first reference value; and

    determining that a service processing unit whose service success rate is lower than the first reference value is the faulty object.


     
    3. The method according to claim 2, wherein the comparing the service success rate with a first reference value specifically comprises:

    comparing the service success rate with a preset reference value; or

    determining an average service success rate of a homogenized service processing unit;

    subtracting a preset value from the average service success rate to obtain a homogenized reference value; and

    comparing the service success rate with the homogenized reference value; wherein

    the homogenized service processing unit is a service processing unit that has same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.


     
    4. The method according to claim 2, before the determining that a service processing unit whose service success rate is lower than the first reference value is the faulty object, further comprising:

    determining a first unit set, whose service success rate is greater than the first reference value, in homogenized service processing units;

    determining a second unit set, whose service success rate is less than the first reference value, in the homogenized service processing units; and

    determining that a percentage of units comprised in the first unit set in all the homogenized service processing units is greater than a first preset percentage; wherein

    the homogenized service processing unit is a service processing unit that has same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.


     
    5. The method according to claim 1, wherein the determining that the faulty object is a communication path between the service processing units specifically comprises:

    calculating a service success rate of a communication path according to a quantity that is of service failures caused by a communication path fault and is in the key performance indicator information;

    comparing the service success rate with a third reference value; and

    determining that a communication path whose service success rate is lower than the third reference value is the faulty object.


     
    6. The method according to claim 1, wherein the determining that the faulty object is the monitored network element specifically comprises:

    collecting statistics about a service success rate of each service processing unit according to a quantity of service requests, received by each service processing unit, in the key performance indicator information of each service processing unit, and a quantity of service failures that is corresponding to the quantity of service requests and is in the key performance indicator information of each service processing unit;

    comparing the service success rate with a second reference value;

    determining a quantity of service processing units whose service success rates are lower than the second reference value;

    determining, according to the quantity, a percentage of the service processing units, whose service success rates are lower than the second reference value, in all service processing units in the monitored network element; and

    when the percentage is greater than the second preset percentage, determining that the monitored network element is the faulty object.


     
    7. A troubleshooting apparatus, comprising:

    an obtaining unit (501), configured to obtain key performance indicator information of each service processing unit in a monitored network element;

    a determining unit (502), configured to: determine a faulty object according to the key performance indicator information; and

    determine a troubleshooting policy according to the faulty object; and

    a sending unit (503), configured to send the troubleshooting policy to a management unit in a network function virtualization system, so that the management unit uses the troubleshooting policy to perform troubleshooting;

    wherein the determining unit (502) is specifically configured to:

    determine that the faulty object is a service processing unit in the monitored network element; or

    determine that the faulty object is a communication path between a first service processing unit and a second service processing unit; and

    when the faulty object is the service processing unit in the monitored network element or the communication path between the first service processing unit and the second service processing unit, determine a network-element-level troubleshooting policy, wherein the network-element-level troubleshooting policy is used to perform a troubleshooting operation inside the monitored network element; or,

    wherein the determining unit (502) is specifically configured to:

    determine that the faulty object is the monitored network element; or

    determine that the faulty object is a communication path between the monitored network element and another network element; and

    when the faulty object is the monitored network element or the communication path between the monitored network element and the another network element, determine a network-level troubleshooting policy, wherein the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located, wherein the faulty object is the monitored network element when a quantity of service processing units with a success rate lower than a first reference value is greater than a second preset percentage.


     
    8. The apparatus according to claim 7, wherein the determining unit (502) is specifically configured to:

    calculate, according to a quantity of service requests, received by a service processing unit, in the key performance indicator information, and a quantity of service failures that is corresponding to the quantity of service requests and is in the key performance indicator information, a service success rate of a service performed by a service processing unit;

    compare the service success rate with the first reference value; and

    determine that a service processing unit whose service success rate is lower than the first reference value is the faulty object.


     
    9. The apparatus according to claim 8, wherein the determining unit (502) is specifically configured to:

    compare the service success rate with a preset reference value; or

    determine an average service success rate of a homogenized service processing unit;

    subtract a preset value from the average service success rate to obtain a homogenized reference value; and

    compare the service success rate with the homogenized reference value; wherein

    the homogenized service processing unit is a service processing unit that has same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.


     
    10. The apparatus according to claim 8, wherein the determining unit (502) is further configured to:

    before determining that the service processing unit whose service success rate is lower than the first reference value is the faulty object, determine a first unit set, whose service success rate is greater than the first reference value, in homogenized service processing units;

    determine a second unit set, whose service success rate is less than the first reference value, in the homogenized service processing units; and

    determine that a percentage of units comprised in the first unit set in all the homogenized service processing units is greater than a first preset percentage; wherein

    the homogenized service processing unit is a service processing unit that has same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.


     
    11. The apparatus according to claim 7, wherein the determining unit (502) is specifically configured to:

    calculate a service success rate of a communication path according to a quantity that is of service failures caused by a communication path fault and is in the key performance indicator information;

    compare the service success rate with a third reference value; and

    determine that a communication path whose service success rate is lower than the third reference value is the faulty object.


     


    Ansprüche

    1. Fehlersuchverfahren, das Folgendes umfasst:

    Erhalten (101) von Schlüsselleistungsindikatorinformationen von jeder Dienstverarbeitungseinheit in einem überwachten Netzwerkelement;

    Bestimmen (102) eines fehlerhaften Objekts gemäß den Schlüsselleistungsindikatorinformationen;

    Bestimmen (103) einer Fehlersucherichtlinie gemäß dem fehlerhaften Objekt und Senden (104) der Fehlersucherichtlinie an eine Verwaltungseinheit in einem Netzwerkfunktionsvirtualisierungssystem, derart, dass die Verwaltungseinheit die Fehlersucherichtlinie verwendet, um eine Fehlersuche durchzuführen;

    wobei das Bestimmen des fehlerhaften Objekts speziell Folgendes umfasst:

    Bestimmen, dass das fehlerhafte Objekt eine Dienstverarbeitungseinheit im überwachten Netzwerkelement ist; oder

    Bestimmen, dass das fehlerhafte Objekt ein Kommunikationspfad zwischen einer ersten Dienstverarbeitungseinheit und einer zweiten Dienstverarbeitungseinheit ist; und

    das Bestimmen einer Fehlersucherichtlinie gemäß dem fehlerhaften Objekt umfasst speziell Folgendes:

    wenn das fehlerhafte Objekt die Dienstverarbeitungseinheit im überwachten Netzwerkelement oder der Kommunikationspfad zwischen der ersten Dienstverarbeitungseinheit und der zweiten Dienstverarbeitungseinheit ist, Bestimmen einer Fehlersucherichtlinie auf Netzwerkelementebene, wobei die Fehlersucherichtlinie auf Netzwerkelementebene verwendet wird, um eine Fehlersuchoperation im überwachten Netzwerkelement durchzuführen; oder

    wobei das Bestimmen eines fehlerhaften Objekts speziell Folgendes umfasst:

    Bestimmen, dass das fehlerhafte Objekt das überwachte Netzwerkelement ist; oder

    Bestimmen, dass das fehlerhafte Objekt ein Kommunikationspfad zwischen dem überwachten Netzwerkelement und einem anderen Netzwerkelement ist; und

    das Bestimmen einer Fehlersucherichtlinie gemäß dem fehlerhaften Objekt umfasst speziell Folgendes:

    wenn das fehlerhafte Objekt das überwachte Netzwerkelement oder der Kommunikationspfad zwischen dem überwachten Netzwerkelement und dem anderen Netzwerkelement ist, Bestimmen einer Fehlersucherichtlinie auf Netzwerkebene, wobei die Fehlersucherichtlinie auf Netzwerkebene verwendet wird, um eine Fehlersuchoperation auf einem oder mehreren Netzwerkelementen in einem Netzwerk, in dem sich das überwachte Netzwerkelement befindet, durchzuführen,

    wobei das fehlerhafte Objekt das überwachte Netzwerkelement ist, wenn eine Menge von Dienstverarbeitungseinheiten mit einer Erfolgsrate von weniger als einem ersten Referenzwert größer ist als ein zweiter voreingestellter Prozentsatz.


     
    2. Verfahren nach Anspruch 1, wobei das Bestimmen, dass das fehlerhafte Objekt eine Dienstverarbeitungseinheit im überwachten Netzwerkelement ist, speziell Folgendes umfasst:

    Berechnen einer Diensterfolgsrate eines Dienstes, der von einer Dienstverarbeitungseinheit durchgeführt wird, gemäß einer Menge von Dienstanforderungen, die von einer Dienstverarbeitungseinheit empfangen werden, in den Schlüsselleistungsindikatorinformationen und einer Menge von Dienstfehlschlägen, die der Menge von Dienstanforderungen entspricht und in den Schlüsselleistungsindikatorinformationen angegeben ist;

    Vergleichen der Diensterfolgsrate mit dem ersten Referenzwert und

    Bestimmen, dass eine Dienstverarbeitungseinheit, deren Diensterfolgsrate niedriger ist als der erste Referenzwert, das fehlerhafte Objekt ist.


     
    3. Verfahren nach Anspruch 2, wobei das Vergleichen der Diensterfolgsrate mit einem ersten Referenzwert speziell Folgendes umfasst:

    Vergleichen der Diensterfolgsrate mit einem voreingestellten Referenzwert oder Bestimmen einer durchschnittlichen Diensterfolgsrate einer homogenisierten Dienstverarbeitungseinheit;

    Subtrahieren eines voreingestellten Wertes von der durchschnittlichen Diensterfolgsrate, um einen homogenisierten Referenzwert zu erhalten; und

    Vergleichen der Diensterfolgsrate mit dem homogenisierten Referenzwert; wobei die homogenisierte Dienstverarbeitungseinheit eine Dienstverarbeitungseinheit ist, die eine selbe Dienstlogik aufweist wie die eines Dienstes, der von der Dienstverarbeitungseinheit transportiert wird und der der Dienst diskret zugeordnet ist.


     
    4. Verfahren nach Anspruch 2, wobei das Bestimmen, dass eine Dienstverarbeitungseinheit, deren Diensterfolgsrate niedriger ist als der erste Referenzwert, das fehlerhafte Objekt ist, ferner Folgendes umfasst:

    Bestimmen eines ersten Einheitensatzes, dessen Diensterfolgsrate größer ist als der erste Referenzwert, in homogenisierten Dienstverarbeitungseinheiten;

    Bestimmen eines zweiten Einheitensatzes, dessen Diensterfolgsrate kleiner ist als der erste Referenzwert, in den homogenisierten Dienstverarbeitungseinheiten und

    Bestimmen, dass ein Prozentsatz von Einheiten, die im ersten Einheitensatz umfasst sind, in allen homogenisierten Dienstverarbeitungseinheiten größer ist als ein erster voreingestellter Prozentsatz; wobei

    die homogenisierte Dienstverarbeitungseinheit eine Dienstverarbeitungseinheit ist, die eine selbe Dienstlogik aufweist wie die eines Dienstes, der von der Dienstverarbeitungseinheit transportiert wird und der der Dienst diskret zugeordnet ist.


     
    5. Verfahren nach Anspruch 1, wobei das Bestimmen, dass das fehlerhafte Objekt ein Kommunikationspfad zwischen den Dienstverarbeitungseinheiten ist, speziell Folgendes umfasst:

    Berechnen einer Diensterfolgsrate eines Kommunikationspfads gemäß einer Menge von Dienstfehlschlägen, die durch einen Kommunikationspfadfehler verursacht wird und die in den Schlüsselleistungsindikatorinformationen angegeben ist;

    Vergleichen der Diensterfolgsrate mit einem dritten Referenzwert und

    Bestimmen, dass ein Kommunikationspfad, dessen Diensterfolgsrate niedriger ist als der dritte Referenzwert, das fehlerhafte Objekt ist.


     
    6. Verfahren nach Anspruch 1, wobei das Bestimmen, dass das fehlerhafte Objekt das überwachte Netzwerkelement ist, speziell Folgendes umfasst:

    Sammeln einer Statistik über eine Diensterfolgsrate jeder Dienstverarbeitungseinheit gemäß einer Menge von Dienstanforderungen, die von jeder Dienstverarbeitungseinheit empfangen werden, in den Schlüsselleistungsindikatorinformationen jeder Dienstverarbeitungseinheit und einer Menge von Dienstfehlschlägen, die der Menge von Dienstanforderungen entspricht und in den Schlüsselleistungsindikatorinformationen jeder Dienstverarbeitungseinheit angegeben ist;

    Vergleichen der Diensterfolgsrate mit einem zweiten Referenzwert;

    Bestimmen einer Menge von Dienstverarbeitungseinheiten, deren Diensterfolgsraten niedriger sind als der zweite Referenzwert;

    Bestimmen eines Prozentsatzes der Dienstverarbeitungseinheiten, deren Diensterfolgsraten niedriger sind als der zweite Referenzwert, von allen Dienstverarbeitungseinheiten im überwachten Netzwerkelement gemäß der Menge und

    wenn der Prozentsatz größer ist als der zweite voreingestellte Prozentsatz, Bestimmen, dass das überwachte Netzwerkelement das fehlerhafte Objekt ist.


     
    7. A troubleshooting apparatus, comprising:

    an obtaining unit (501), configured to obtain key performance indicator information of each service processing unit in a monitored network element;

    a determining unit (502), configured to:

    determine a faulty object according to the key performance indicator information; and

    determine a troubleshooting policy according to the faulty object; and

    a sending unit (503), configured to send the troubleshooting policy to a management unit in a network function virtualization system, so that the management unit uses the troubleshooting policy to perform troubleshooting;

    wherein the determining unit (502) is specifically configured to:

    determine that the faulty object is a service processing unit in the monitored network element; or

    determine that the faulty object is a communication path between a first service processing unit and a second service processing unit; and

    when the faulty object is the service processing unit in the monitored network element or the communication path between the first service processing unit and the second service processing unit,

    determine a network-element-level troubleshooting policy, wherein the network-element-level troubleshooting policy is used to perform a troubleshooting operation inside the monitored network element; or

    wherein the determining unit (502) is specifically configured to:

    determine that the faulty object is the monitored network element; or

    determine that the faulty object is a communication path between the monitored network element and another network element; and

    when the faulty object is the monitored network element or the communication path between the monitored network element and the other network element, determine a network-level troubleshooting policy, wherein the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located,

    wherein the faulty object is the monitored network element when a quantity of service processing units with a success rate lower than a first reference value is greater than a second preset percentage.
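
    The division of the apparatus into an obtaining unit (501), a determining unit (502), and a sending unit (503) can be pictured as a three-stage pipeline. The following is only a minimal sketch under stated assumptions: the class name `TroubleshootingApparatus`, the `Kpis` layout (unit id mapped to a pair of request and failure counts), and the use of plain callables for each unit are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical KPI layout: unit id -> (service requests, service failures).
Kpis = dict[str, tuple[int, int]]


@dataclass
class TroubleshootingApparatus:
    """Illustrative wiring of units 501, 502 and 503 (names are assumptions)."""

    obtain: Callable[[], Kpis]        # obtaining unit (501)
    determine: Callable[[Kpis], str]  # determining unit (502): KPIs -> policy
    send: Callable[[str], None]       # sending unit (503): policy -> management unit

    def run_once(self) -> str:
        """Obtain KPIs, derive a policy from them, and send the policy on."""
        kpis = self.obtain()
        policy = self.determine(kpis)
        self.send(policy)
        return policy
```

    In this sketch the management unit is represented only by whatever the `send` callable does with the policy; how the policy is actually transported is not modelled.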


     
    8. The apparatus according to claim 7, wherein the determining unit (502) is specifically configured to:

    calculate, according to a quantity of service requests, received by a service processing unit, in the key performance indicator information, and a quantity of service failures that corresponds to the quantity of service requests and that is indicated in the key performance indicator information, a service success rate of a service performed by a service processing unit;

    compare the service success rate with the first reference value; and

    determine that a service processing unit whose service success rate is lower than the first reference value is the faulty object.


     
    9. The apparatus according to claim 8, wherein the determining unit (502) is specifically configured to:

    compare the service success rate with a preset reference value; or determine an average service success rate of a homogenized service processing unit;

    subtract a preset value from the average service success rate to obtain a homogenized reference value; and

    compare the service success rate with the homogenized reference value; wherein the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.


     
    10. The apparatus according to claim 8, wherein the determining unit (502) is further configured to:

    before determining that the service processing unit whose service success rate is lower than the first reference value is the faulty object, determine a first unit set whose service success rate is greater than the first reference value in homogenized service processing units;

    determine a second unit set whose service success rate is less than the first reference value in the homogenized service processing units; and

    determine that a percentage of units comprised in the first unit set in all the homogenized service processing units is greater than a first preset percentage; wherein

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.


     
    11. The apparatus according to claim 7,
    wherein the determining unit (502) is specifically configured to:

    calculate a service success rate of a communication path according to a quantity of service failures that is caused by a communication path fault and that is indicated in the key performance indicator information;

    compare the service success rate with a third reference value; and determine that a communication path whose service success rate is lower than the third reference value is the faulty object.


     


    Claims

    1. A troubleshooting method, comprising:

    obtaining (101) key performance indicator information of each service processing unit in a monitored network element;

    determining (102) a faulty object according to the key performance indicator information;

    determining (103) a troubleshooting policy according to the faulty object; and

    sending (104) the troubleshooting policy to a management unit in a network function virtualization system, so that the management unit uses the troubleshooting policy to perform troubleshooting;

    wherein the determining a faulty object specifically comprises:

    determining that the faulty object is a service processing unit in the monitored network element; or

    determining that the faulty object is a communication path between a first service processing unit and a second service processing unit; and

    the determining a troubleshooting policy according to the faulty object specifically comprises:

    when the faulty object is the service processing unit in the monitored network element or the communication path between the first service processing unit and the second service processing unit, determining a network-element-level troubleshooting policy, wherein the network-element-level troubleshooting policy is used to perform a troubleshooting operation inside the monitored network element; or

    wherein the determining a faulty object specifically comprises:

    determining that the faulty object is the monitored network element; or

    determining that the faulty object is a communication path between the monitored network element and another network element; and

    the determining a troubleshooting policy according to the faulty object specifically comprises:

    when the faulty object is the monitored network element or the communication path between the monitored network element and the other network element,

    determining a network-level troubleshooting policy, wherein the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located, wherein the faulty object is the monitored network element when a quantity of service processing units with a success rate lower than a first reference value is greater than a second preset percentage.
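
    The mapping in claim 1 from the kind of faulty object to the level of the troubleshooting policy can be illustrated in a few lines. This is a sketch only: the enum names `FaultyObject` and `PolicyLevel` and the function `select_policy_level` are hypothetical labels introduced here, and the content of the policy itself is not modelled.

```python
from enum import Enum


class FaultyObject(Enum):
    """The four kinds of faulty object named in claim 1 (labels are assumptions)."""

    SERVICE_UNIT = "service processing unit in the monitored network element"
    INTRA_NE_PATH = "communication path between two service processing units"
    NETWORK_ELEMENT = "monitored network element"
    INTER_NE_PATH = "communication path to another network element"


class PolicyLevel(Enum):
    NETWORK_ELEMENT = "network-element-level"
    NETWORK = "network-level"


def select_policy_level(faulty: FaultyObject) -> PolicyLevel:
    """Faults inside the element get an element-level policy, all others a network-level one."""
    if faulty in (FaultyObject.SERVICE_UNIT, FaultyObject.INTRA_NE_PATH):
        return PolicyLevel.NETWORK_ELEMENT
    return PolicyLevel.NETWORK
```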


     
    2. The method according to claim 1, wherein the determining that the faulty object is a service processing unit in the monitored network element specifically comprises:

    calculating, according to a quantity of service requests, received by a service processing unit, in the key performance indicator information, and a quantity of service failures that corresponds to the quantity of service requests and that is in the key performance indicator information, a service success rate of a service provided by a service processing unit;

    comparing the service success rate with the first reference value; and

    determining that a service processing unit whose service success rate is lower than the first reference value is the faulty object.
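
    One plausible reading of the calculation in claim 2 is that the success rate is the share of received requests that did not fail. The sketch below assumes exactly that formula, assumes KPI data arrives as a mapping from unit id to a (requests, failures) pair, and treats a unit with no requests as healthy; none of these details is fixed by the claim, and the function names are hypothetical.

```python
def service_success_rate(requests: int, failures: int) -> float:
    """Assumed formula: (requests - failures) / requests."""
    if requests <= 0:
        return 1.0  # assumption: a unit with no traffic is not flagged
    return (requests - failures) / requests


def faulty_units(kpis: dict[str, tuple[int, int]], first_reference: float) -> list[str]:
    """Return ids of units whose success rate is below the first reference value."""
    return [
        unit_id
        for unit_id, (requests, failures) in kpis.items()
        if service_success_rate(requests, failures) < first_reference
    ]
```

    For example, with a first reference value of 0.9, a unit that failed 40 of 100 requests (rate 0.6) would be reported, while a unit that failed 5 of 100 (rate 0.95) would not.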


     
    3. The method according to claim 2, wherein the comparing the service success rate with a first reference value specifically comprises:

    comparing the service success rate with a preset reference value; or

    determining an average service success rate of a homogenized service processing unit;

    subtracting a preset value from the average service success rate to obtain a homogenized reference value; and

    comparing the service success rate with the homogenized reference value;

    wherein

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.
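
    The second alternative of claim 3 derives the reference value from the peers of a unit rather than from a fixed threshold. As a minimal sketch, assuming success rates are fractions between 0 and 1 and that the rates of the homogenized units are already available as a list (the function and parameter names are hypothetical):

```python
from statistics import mean


def homogenized_reference(peer_rates: list[float], preset_value: float) -> float:
    """Average success rate of the homogenized units minus a preset value.

    Raises statistics.StatisticsError if peer_rates is empty.
    """
    return mean(peer_rates) - preset_value


def below_homogenized_reference(
    rate: float, peer_rates: list[float], preset_value: float
) -> bool:
    """Compare one unit's success rate with the homogenized reference value."""
    return rate < homogenized_reference(peer_rates, preset_value)
```

    A relative threshold of this kind follows the peers: if every homogenized unit degrades together, the reference value drops with them, so no single unit is singled out.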


     
    4. The method according to claim 2, before the determining that a service processing unit whose service success rate is lower than the first reference value is the faulty object, further comprising:

    determining a first unit set whose service success rate is greater than the first reference value in the homogenized service processing units;

    determining a second unit set whose service success rate is lower than the first reference value in the homogenized service processing units; and

    determining that a percentage of units comprised in the first unit set in all the homogenized service processing units is greater than a first preset percentage; wherein

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.
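
    The check in claim 4 can be read as a guard: units in the second set are only blamed when a large enough share of their homogenized peers is healthy. The sketch below assumes that reading, assumes the first preset percentage is expressed as a fraction between 0 and 1, and uses hypothetical names throughout.

```python
def confirmed_faulty_units(
    peer_rates: dict[str, float],
    first_reference: float,
    first_preset_percentage: float,
) -> list[str]:
    """Return the second unit set only if the first unit set is a large enough share."""
    if not peer_rates:
        return []
    # First unit set: success rate above the first reference value.
    first_set = [u for u, rate in peer_rates.items() if rate > first_reference]
    # Second unit set: success rate below the first reference value.
    second_set = [u for u, rate in peer_rates.items() if rate < first_reference]
    healthy_share = len(first_set) / len(peer_rates)
    if healthy_share > first_preset_percentage:
        return second_set
    return []  # too many peers are degraded: do not attribute the fault to single units
```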


     
    5. The method according to claim 1, wherein the determining that the faulty object is a communication path between the service processing units specifically comprises:

    calculating a service success rate of a communication path according to a quantity of service failures that is caused by a communication path fault and that is in the key performance indicator information;

    comparing the service success rate with a third reference value; and

    determining that a communication path whose service success rate is lower than the third reference value is the faulty object.
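
    Claim 5 names the quantity of path-caused service failures as the input but does not spell out the formula. The sketch below assumes the rate is taken relative to the number of requests sent over the path, which is an assumption made here for illustration; the path key (a pair of unit ids) and the function names are likewise hypothetical.

```python
Path = tuple[str, str]  # hypothetical key: (first unit id, second unit id)


def path_success_rate(requests_over_path: int, path_failures: int) -> float:
    """Assumed formula: share of requests over the path not lost to a path fault."""
    if requests_over_path <= 0:
        return 1.0  # assumption: an unused path is not flagged
    return (requests_over_path - path_failures) / requests_over_path


def faulty_paths(
    path_kpis: dict[Path, tuple[int, int]], third_reference: float
) -> list[Path]:
    """Return paths whose success rate is below the third reference value."""
    return [
        path
        for path, (requests, failures) in path_kpis.items()
        if path_success_rate(requests, failures) < third_reference
    ]
```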


     
    6. The method according to claim 1, wherein the determining that the faulty object is the monitored network element specifically comprises:

    collecting statistics on a service success rate of each service processing unit according to a quantity of service requests, received by each service processing unit, in the key performance indicator information of each service processing unit, and a quantity of service failures that corresponds to the quantity of service requests and that is in the key performance indicator information of each service processing unit;

    comparing the service success rate with a second reference value;

    determining a quantity of service processing units whose service success rates are lower than the second reference value;

    determining, according to the quantity, a percentage of the service processing units whose service success rates are lower than the second reference value in all service processing units in the monitored network element; and

    when the percentage is greater than the second preset percentage, determining that the monitored network element is the faulty object.
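
    The steps of claim 6 amount to counting degraded units and comparing their share with a threshold. A minimal sketch, assuming the same (requests, failures) KPI layout and success-rate formula as an illustration, with the second preset percentage expressed as a fraction between 0 and 1 (all names are hypothetical):

```python
def network_element_is_faulty(
    kpis: dict[str, tuple[int, int]],
    second_reference: float,
    second_preset_percentage: float,
) -> bool:
    """True when the share of degraded units exceeds the second preset percentage."""
    if not kpis:
        return False
    # Count units whose success rate is below the second reference value;
    # units without requests are skipped (assumption).
    degraded = sum(
        1
        for requests, failures in kpis.values()
        if requests > 0 and (requests - failures) / requests < second_reference
    )
    return degraded / len(kpis) > second_preset_percentage
```

    With four units of which three are degraded, the share is 0.75, so the element would be treated as faulty for a threshold of 0.5 but not for a threshold of 0.8.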


     
    7. A troubleshooting apparatus, comprising:

    an obtaining unit (501), configured to obtain key performance indicator information of each service processing unit in a monitored network element;

    a determining unit (502), configured to: determine a faulty object according to the key performance indicator information; and

    determine a troubleshooting policy according to the faulty object; and

    a sending unit (503), configured to send the troubleshooting policy to a management unit in a network function virtualization system, so that the management unit uses the troubleshooting policy to perform troubleshooting;

    wherein the determining unit (502) is specifically configured to:

    determine that the faulty object is a service processing unit in the monitored network element; or

    determine that the faulty object is a communication path between a first service processing unit and a second service processing unit; and

    when the faulty object is the service processing unit in the monitored network element or the communication path between the first service processing unit and the second service processing unit, determine a network-element-level troubleshooting policy, wherein the network-element-level troubleshooting policy is used to perform a troubleshooting operation inside the monitored network element; or

    wherein the determining unit (502) is specifically configured to:

    determine that the faulty object is the monitored network element; or

    determine that the faulty object is a communication path between the monitored network element and another network element; and

    when the faulty object is the monitored network element or the communication path between the monitored network element and the other network element,

    determine a network-level troubleshooting policy, wherein the network-level troubleshooting policy is used to perform a troubleshooting operation on one or more network elements in a network in which the monitored network element is located, wherein the faulty object is the monitored network element when a quantity of service processing units with a success rate lower than a first reference value is greater than a second preset percentage.


     
    8. The apparatus according to claim 7, wherein the determining unit (502) is specifically configured to:

    calculate, according to a quantity of service requests, received by a service processing unit, in the key performance indicator information, and a quantity of service failures that corresponds to the quantity of service requests and that is in the key performance indicator information, a service success rate of a service provided by a service processing unit;

    compare the service success rate with the first reference value; and

    determine that a service processing unit whose service success rate is lower than the first reference value is the faulty object.


     
    9. The apparatus according to claim 8, wherein the determining unit (502) is specifically configured to:

    compare the service success rate with a preset reference value; or

    determine an average service success rate of a homogenized service processing unit;

    subtract a preset value from the average service success rate to obtain a homogenized reference value; and

    compare the service success rate with the homogenized reference value;

    wherein

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.


     
    10. The apparatus according to claim 8, wherein the determining unit (502) is further configured to:

    before determining that the service processing unit whose service success rate is lower than the first reference value is the faulty object, determine a first unit set whose service success rate is greater than the first reference value in the homogenized service processing units;

    determine a second unit set whose service success rate is lower than the first reference value in the homogenized service processing units; and

    determine that a percentage of units comprised in the first unit set in all the homogenized service processing units is greater than a first preset percentage; wherein

    the homogenized service processing unit is a service processing unit that has the same service logic as that of a service carried by the service processing unit, and to which the service is discretely allocated.


     
    11. The apparatus according to claim 7,
    wherein the determining unit (502) is specifically configured to:

    calculate a service success rate of a communication path according to a quantity of service failures that is caused by a communication path fault and that is in the key performance indicator information;

    compare the service success rate with a third reference value; and

    determine that a communication path whose service success rate is lower than the third reference value is the faulty object.


     




    Drawing

    [Drawings not reproduced in this text.]

    Cited references

    REFERENCES CITED IN THE DESCRIPTION



    This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

    Patent documents cited in the description