(19)
(11)EP 3 667 669 A1

(12)EUROPEAN PATENT APPLICATION
published in accordance with Art. 153(4) EPC

(43)Date of publication:
17.06.2020 Bulletin 2020/25

(21)Application number: 18849945.3

(22)Date of filing:  23.04.2018
(51)International Patent Classification (IPC): 
G11C 11/418(2006.01)
(86)International application number:
PCT/CN2018/084100
(87)International publication number:
WO 2019/041840 (07.03.2019 Gazette  2019/10)
(84)Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
Designated Extension States:
BA ME
Designated Validation States:
KH MA MD TN

(30)Priority: 04.09.2017 CN 201710785410

(71)Applicant: Huawei Technologies Co., Ltd.
Longgang District Shenzhen, Guangdong 518129 (CN)

(72)Inventors:
  • CHI, Sijie
    Shenzhen Guangdong 518129 (CN)
  • JI, Bingwu
    Shenzhen Guangdong 518129 (CN)
  • ZHAO, Tanfu
    Shenzhen Guangdong 518129 (CN)
  • ZHOU, Yunming
    Shenzhen Guangdong 518129 (CN)

(74)Representative: Isarpatent 
Patent- und Rechtsanwälte Behnisch Barth Charles Hassa Peckmann & Partner mbB
Friedrichstrasse 31
80801 München (DE)

  


(54)MEMORY UNIT AND STATIC RANDOM ACCESS MEMORY


(57) A storage unit and a static random access memory are provided. The storage unit includes a latch, and the latch provides a first storage bit. The storage unit further includes a first MOS transistor. A gate of the first MOS transistor is connected to the first storage bit, a source of the first MOS transistor is connected to a first read line, and a drain of the first MOS transistor is connected to a second read line. In a first state, the first read line is a read word line, and the second read line is a read bit line; or in a second state, the second read line is a read word line, and the first read line is a read bit line. The storage unit according to embodiments of the present invention can implement an exchange between a read word line and a read bit line.




Description


[0001] This application claims priority to Chinese Patent Application No. 201710785410.5, filed with the China National Intellectual Property Administration on September 4, 2017 and entitled "STORAGE UNIT AND STATIC RANDOM ACCESS MEMORY", which is incorporated herein by reference in its entirety.

TECHNICAL FIELD



[0002] The present invention relates to the storage field, and in particular, to a storage unit and a static random access memory (Static Random Access Memory, SRAM).

BACKGROUND



[0003] FIG. 1 is a schematic structural diagram of a storage unit of a conventional SRAM. The storage unit is a common six-transistor unit, and includes six metal-oxide-semiconductor (metal oxide semiconductor, MOS) field-effect transistors. The MOS field-effect transistor is referred to as an MOS transistor for short. Referring to FIG. 1, the storage unit includes two inverting gates and two MOS transistors. The two inverting gates include four MOS transistors. The two inverting gates in the figure form a latch, configured to store data. The two MOS transistors in the figure are configured for gating access. The latch formed by circularly connecting the two inverting gates has two stable states: 0 and 1. A and B are two storage bits in opposite states. For example, when the storage bit A is a logical 0, the storage bit B is a logical 1; or when the storage bit A is a logical 1, the storage bit B is a logical 0. A word line (word_line, WL) is used to control gating access to the storage unit, and a bit line (bit_line, BL) is used to perform reading/writing in the storage unit. Reading is pulling up the WL and reading bits from the BL, while writing is pulling up the WL and pulling up or pulling down the BL. Because a drive capability of the BL is stronger than that of the storage unit, an original state is forcibly overwritten. In FIG. 1, both bit and nbit represent bit lines. The bit line bit is used to read data of the storage bit A, and the bit line nbit is used to read data of the storage bit B. The data read from the storage unit by using the two bit lines has opposite states.

[0004] FIG. 2 is a schematic diagram of a storage array of the conventional SRAM. The storage array includes a 3 x 3 arrangement of the storage units shown in FIG. 1. When data is read from the conventional SRAM, usually only data of a specific address can be read. If data of an address that is symmetric to that address about a matrix diagonal needs to be obtained, an algorithm needs to be written into code of a peripheral digital circuit of the SRAM to process the data. This manner has a low speed and comparatively high power consumption and workload.
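For illustration only, the software workaround mentioned above might look like the following Python sketch. It assumes row-major linear addressing of a square array, which the text does not specify, and the names `mirrored_address` and `read_mirrored` are hypothetical.

```python
def mirrored_address(address: int, size: int) -> int:
    """Return the address mirrored about the main diagonal of a size x size array.

    Assumption: addresses are linear and row-major (address = row * size + col).
    """
    if not 0 <= address < size * size:
        raise ValueError("address outside the array")
    row, col = divmod(address, size)
    return col * size + row


def read_mirrored(memory: list, address: int, size: int) -> int:
    """Software workaround: compute the mirrored address, then do a normal read."""
    return memory[mirrored_address(address, size)]


# 3 x 3 array in which each cell holds its own address, purely for illustration.
memory = list(range(9))
print(mirrored_address(2, 3))       # row 0, column 2 maps to row 2, column 0 -> 6
print(read_mirrored(memory, 2, 3))  # -> 6
```

In this sketch the address arithmetic runs in code outside the array on every access, which is the kind of extra processing the paragraph above attributes to the conventional approach.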

SUMMARY



[0005] Embodiments of the present invention provide a storage unit and a static random access memory, so that a read word line and a read bit line are interchangeable, and data of an address that is symmetric to a specific address about a matrix diagonal can be obtained. This addresses the problem in the prior art that obtaining such data is slow and involves relatively high power consumption and workload.

[0006] According to a first aspect, a storage unit is provided. The storage unit includes a latch, and the latch provides a first storage bit. The storage unit further includes a first MOS transistor. A gate of the first MOS transistor is connected to the first storage bit, a source of the first MOS transistor is connected to a first read line, and a drain of the first MOS transistor is connected to a second read line. In a first state, the first read line is a read word line, and the second read line is a read bit line. In a second state, the second read line is a read word line, and the first read line is a read bit line.

[0007] In this embodiment of the present invention, based on a conventional structure of a latch that can provide a storage bit, an MOS transistor is added. A gate of the MOS transistor is connected to the storage bit, and a source and a drain of the MOS transistor are connected to a first read line and a second read line, respectively. Based on a characteristic of the MOS transistor, either of the first read line and the second read line may be used as a read bit line, and the other is used as a read word line, so that the read word line and the read bit line are interchangeable. A bottom-layer hardware storage unit is changed to quickly obtain data of an address that is symmetric to a specific address about a matrix diagonal, instead of writing an algorithm into code. An SRAM in this form is equivalent to a hardware accelerator. Compared with writing an algorithm into code of a digital circuit, changing the storage unit can greatly increase a system speed, and greatly reduce power consumption and workload. Usually, one or two cycles (clock cycle) are required to complete transpose reading, while thousands of cycles may be required when the algorithm is written into the code of the digital circuit.

[0008] In a possible implementation, the first MOS transistor is an NMOS transistor or a PMOS transistor.

[0009] In a possible implementation, the latch further provides a second storage bit. The storage unit further includes a second MOS transistor. A gate of the second MOS transistor is connected to the second storage bit, a source of the second MOS transistor is connected to the first read line, and a drain of the second MOS transistor is connected to the second read line. According to this implementation, data of two storage bits in opposite states is separately read by using a pair of MOS transistors. This can improve data reading accuracy.

[0010] In a possible implementation, the first MOS transistor is the NMOS transistor and the second MOS transistor is the PMOS transistor; or the first MOS transistor is the PMOS transistor and the second MOS transistor is the NMOS transistor.

[0011] According to a second aspect, a static random access memory is provided. The static random access memory includes: a plurality of storage units. The storage unit includes a latch, and the latch provides a first storage bit. The storage unit further includes a first MOS transistor. A gate of the first MOS transistor is connected to the first storage bit, a source of the first MOS transistor is connected to a first read line, and a drain of the first MOS transistor is connected to a second read line. In a first state, the first read line is a read word line, and the second read line is a read bit line. In a second state, the second read line is a read word line, and the first read line is a read bit line. The plurality of storage units form a rectangular storage array with a same quantity of rows and columns, where storage units of a same row in the storage array are connected to the same first read line; and storage units of a same column in the storage array are connected to the same second read line.

[0012] In this embodiment of the present invention, based on a conventional structure of a latch that can provide a storage bit, an MOS transistor is added into a storage unit of a static random access memory. A gate of the MOS transistor is connected to the storage bit, and a source and a drain of the MOS transistor are connected to a first read line and a second read line, respectively. Based on a characteristic of the MOS transistor, either of the first read line and the second read line may be used as a read bit line, and the other is used as a read word line, so that the read word line and the read bit line are interchangeable. A bottom-layer hardware storage unit is changed to quickly obtain data of an address that is symmetric to a specific address about a matrix diagonal, instead of writing an algorithm into code. An SRAM in this form is equivalent to a hardware accelerator. Compared with writing an algorithm into code of a digital circuit, changing the storage unit can greatly increase a system speed, and greatly reduce power consumption and workload. Usually, one or two cycles (clock cycle) are required to complete transpose reading, while thousands of cycles may be required when the algorithm is written into the code of the digital circuit.

[0013] In a possible implementation, the first MOS transistor is an NMOS transistor or a PMOS transistor.

[0014] In a possible implementation, the latch further provides a second storage bit. The storage unit further includes a second MOS transistor. A gate of the second MOS transistor is connected to the second storage bit, a source of the second MOS transistor is connected to the first read line, and a drain of the second MOS transistor is connected to the second read line. According to this implementation, data of two storage bits in opposite states is separately read by using a pair of MOS transistors. This can improve data reading accuracy.

[0015] In a possible implementation, the first MOS transistor is the NMOS transistor and the second MOS transistor is the PMOS transistor; or the first MOS transistor is the PMOS transistor and the second MOS transistor is the NMOS transistor.

[0016] In a possible implementation, each storage unit is connected to a control signal. When the control signal is a first sequence, the storage unit is considered to be in the first state; or when the control signal is a second sequence, the storage unit is considered to be in the second state. According to this implementation, the control signal is used to control switching of read states of the storage unit. This is simple and quick.

[0017] In a possible implementation, a length of the first sequence is 1 bit, a length of the second sequence is 1 bit, and the first sequence is 0 and the second sequence is 1, or the first sequence is 1 and the second sequence is 0. According to this implementation, the 1-bit control signal is used to control switching of read states of the storage unit. This is simple and quick.

[0018] In a possible implementation, the static random access memory further includes: a first decoder, a second decoder, a first word line driver, a second word line driver, a first read amplifier, a second read amplifier, and a phase inverter, where the control signal is connected to an enable end of the first decoder, an enable end of the first word line driver, and an enable end of the first read amplifier; an input end of the first decoder inputs an address value, a first output end of the first decoder outputs a row number, and a second output end of the first decoder outputs a column number; the first output end of the first decoder is connected to a control end of the first read amplifier, and an input end of the first read amplifier is connected to the first read line; a second output end of the first decoder is connected to a control end of the first word line driver, an output end of the first word line driver is connected to the second read line, where the control end of the first read amplifier is configured to control the first read amplifier to perform amplification processing on data of at least one input end, and the control end of the first word line driver is configured to control at least one output end of the first word line driver to output a first logical state; the control signal is connected to an input end of the phase inverter, and an output end of the phase inverter is connected to an enable end of the second decoder, an enable end of the second word line driver, and an enable end of the second read amplifier; and an input end of the second decoder inputs an address value, a first output end of the second decoder outputs a row number, and a second output end of the second decoder outputs a column number; the first output end of the second decoder is connected to a control end of the second read amplifier, and an input end of the second read amplifier is connected to the second read line; the second output end of the second decoder is connected to a control end of the second word line driver, an output end of the second word line driver is connected to the first read line, where the control end of the second read amplifier is configured to control the second read amplifier to perform amplification processing on data of at least one input end, and the control end of the second word line driver is configured to control at least one output end of the second word line driver to output the first logical state. According to this implementation, two groups of decoders, word line drivers, and read amplifiers are set. The two groups of decoders, word line drivers, and read amplifiers have different paths. A control signal is used to control one of the two groups of decoders, word line drivers, and read amplifiers to work, so that the read word line and the read bit line are interchanged.
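As a rough behavioural sketch of this two-group arrangement, the following Python fragment shows how a 1-bit control signal and its inverted copy could enable exactly one group at a time. It assumes active-high enable ends, which the text does not state, and the names `GROUPS` and `active_read_path` are hypothetical.

```python
# Which read line each group drives (word line driver output) and senses
# (read amplifier input), as described in the paragraph above.
GROUPS = {
    "first": {"drives": "second read line", "senses": "first read line"},
    "second": {"drives": "first read line", "senses": "second read line"},
}


def active_read_path(control: int) -> dict:
    """Return the enabled group and the read lines it drives and senses.

    Assumption: an enable end is active when it sees a logical 1.
    """
    if control not in (0, 1):
        raise ValueError("control must be 0 or 1")
    enables = {
        "first": control,        # control signal feeds the first group directly
        "second": 1 - control,   # phase inverter output feeds the second group
    }
    name = next(group for group, enable in enables.items() if enable == 1)
    return {"group": name, **GROUPS[name]}


print(active_read_path(1))
print(active_read_path(0))
```

Because the second group only ever sees the inverted control signal, the two groups are never enabled together in this model.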

[0019] In a possible implementation, the static random access memory further includes: a decoder, a word line driver, a read amplifier, first selectors with a same quantity as a row quantity of the storage array, and second selectors with a same quantity as a column quantity of the storage array, where an input end of the decoder inputs an address value, a first output end of the decoder outputs a row number, and a second output end of the decoder outputs a column number; the first output end of the decoder is connected to a control end of the word line driver, where the control end of the word line driver is configured to control at least one output end of the word line driver to output a first logical state, and the second output end of the decoder is connected to a control end of the read amplifier, where the control end of the read amplifier is configured to control the read amplifier to perform amplification processing on data of at least one input end; the control signal is connected to a control end of the first selector and a control end of the second selector; a first port of the first selector is connected to the first read line, a second port of the first selector is connected to the second read line, and a third port of the first selector is connected to an output end of the word line driver, where the control end of the first selector is configured to control the first port of the first selector to connect to the third port of the first selector, or control the second port of the first selector to connect to the third port of the first selector; and a first port of the second selector is connected to the second read line, a second port of the second selector is connected to the first read line, and a third port of the second selector is connected to an input end of the read amplifier, where the control end of the second selector is configured to control the first port of the second selector to connect to the third port of the second selector, or control the second port of the second selector to connect to the third port of the second selector. According to this implementation, a selector is controlled to switch between circuit connection modes, so that a read word line and a read bit line are interchanged.
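The selector-based routing can be pictured with the following minimal Python sketch, which models one first selector and one second selector. It assumes that a control value of 0 connects the first port to the third port, a polarity the text leaves open, and the names `selector` and `route` are hypothetical.

```python
def selector(control: int, first_port: str, second_port: str) -> str:
    """Return the port that is connected to the third port.

    Assumption: control 0 selects the first port, control 1 the second port.
    """
    if control not in (0, 1):
        raise ValueError("control must be 0 or 1")
    return first_port if control == 0 else second_port


def route(control: int) -> tuple:
    """Return (line driven by the word line driver, line sensed by the read amplifier)."""
    # First selector: third port is the word line driver output.
    driven = selector(control, "first read line", "second read line")
    # Second selector: third port is the read amplifier input.
    sensed = selector(control, "second read line", "first read line")
    return driven, sensed


print(route(0))  # ('first read line', 'second read line')
print(route(1))  # ('second read line', 'first read line')
```

Since both selectors share the same control signal and their port assignments mirror each other, the driven line and the sensed line are always different in this model.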

[0020] According to a third aspect, a chip is provided, and the chip includes: the static random access memory according to the second aspect.

[0021] In a possible implementation, the chip is configured to decompose a convolution kernel in a convolutional neural network.

BRIEF DESCRIPTION OF DRAWINGS



[0022] 

FIG. 1 is a schematic structural diagram of a storage unit of a conventional SRAM;

FIG. 2 is a schematic diagram of a storage array of the conventional SRAM;

FIG. 3 is a schematic structural diagram of a storage unit of an SRAM according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a storage unit of another SRAM according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a storage unit of still another SRAM according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of reading by a storage unit in conventional reading according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of reading by a storage array in conventional reading according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of reading by a storage unit in transpose reading according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of reading by a storage array in transpose reading according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a control circuit according to an embodiment of the present invention;

FIG. 11 is a schematic diagram of another control circuit according to an embodiment of the present invention;

FIG. 12 is a schematic diagram of an image matrix and a convolution kernel matrix; and

FIG. 13 is a schematic diagram of convolution calculation between a 5 x 5 image matrix and a 3 x 3 convolution kernel matrix.


DESCRIPTION OF EMBODIMENTS



[0023] The embodiments of the present invention provide a storage unit. The storage unit includes a latch, and the latch provides a first storage bit. The storage unit further includes a first MOS transistor. A gate of the first MOS transistor is connected to the first storage bit, a source of the first MOS transistor is connected to a first read line, and a drain of the first MOS transistor is connected to a second read line. In a first state, the first read line is a read word line, and the second read line is a read bit line; or in a second state, the second read line is a read word line, and the first read line is a read bit line.

[0024] The latch may be, but is not limited to, a latch formed by circularly connecting two inverting gates. Compared with a latch included in a common storage unit, an original bit line and an original word line that correspond to the latch are no longer used to read data.

[0025] It may be understood that the first MOS transistor may be an NMOS transistor or a PMOS transistor.

[0026] In an example, the latch further provides a second storage bit. The storage unit further includes a second MOS transistor. A gate of the second MOS transistor is connected to the second storage bit, a source of the second MOS transistor is connected to the first read line, and a drain of the second MOS transistor is connected to the second read line. According to this implementation, data of two storage bits in opposite states is separately read by using a pair of MOS transistors. This can improve data reading accuracy.

[0027] In an example, the first MOS transistor is the NMOS transistor and the second MOS transistor is the PMOS transistor; or the first MOS transistor is the PMOS transistor and the second MOS transistor is the NMOS transistor.

[0028] In this embodiment of the present invention, based on a conventional structure of a latch that can provide a storage bit, an MOS transistor is added. A gate of the MOS transistor is connected to the storage bit, and a source and a drain of the MOS transistor are connected to a first read line and a second read line, respectively. Based on a characteristic of the MOS transistor, either of the first read line and the second read line may be used as a read bit line, and the other is used as a read word line, so that the read word line and the read bit line are interchangeable. A bottom-layer hardware storage unit is changed to quickly obtain data of an address that is symmetric to a specific address about a matrix diagonal, instead of writing an algorithm into code. An SRAM in this form is equivalent to a hardware accelerator. Compared with writing an algorithm into code of a digital circuit, changing the storage unit can greatly increase a system speed, and greatly reduce power consumption and workload. Usually, one or two cycles (clock cycle) are required to complete transpose reading, while thousands of cycles may be required when the algorithm is written into the code of the digital circuit.

[0029] FIG. 3 is a schematic structural diagram of a storage unit of an SRAM according to an embodiment of the present invention. If two inverting gates are designed based on a structure of four MOS transistors in the prior art, the storage unit in this embodiment of the present invention may be considered as a seven-transistor unit. An MOS transistor in a circle shown in the figure is added to a conventional six-transistor unit, so that the SRAM has two modes: conventional reading and transpose reading.

[0030] The following specifically describes a structure of the storage unit shown in FIG. 3. The storage unit includes a first MOS transistor, a second MOS transistor, a third MOS transistor, a fourth MOS transistor, a fifth MOS transistor 31, a sixth MOS transistor 32 (the fifth MOS transistor and the sixth MOS transistor are also referred to as write MOS transistors), and a seventh MOS transistor 33.

[0031] The first MOS transistor, the second MOS transistor, the third MOS transistor, and the fourth MOS transistor are not shown in FIG. 3. The first MOS transistor and the second MOS transistor form a first inverting gate 34, and the third MOS transistor and the fourth MOS transistor form a second inverting gate 35. The first inverting gate 34 and the second inverting gate 35 are connected in a head-to-tail manner to form a latch. An output end of the first inverting gate 34 is a first storage bit Q, and an output end of the second inverting gate 35 is a second storage bit Q̄.

[0032] One of a source and a drain of the fifth MOS transistor 31 is connected to the second storage bit Q̄, and one of a source and a drain of the sixth MOS transistor 32 is connected to the first storage bit Q. A gate of the fifth MOS transistor 31 and a gate of the sixth MOS transistor 32 are connected to a write word line (write word line, WWL), the other one of the source and the drain of the fifth MOS transistor 31 is connected to a first write bit line (write bit line, WBL), and the other one of the source and the drain of the sixth MOS transistor 32 is connected to a second write bit line (write bit line negate, WBLN). Because a logical state of the second write bit line is opposite to a logical state of the first write bit line, in this embodiment of the present invention, the first write bit line is denoted as WBL, and the second write bit line is denoted as WBLN. When data is written into the storage unit, the fifth MOS transistor 31 and the sixth MOS transistor 32 are turned on by using the write word line. For example, when the write word line WWL is a logical 1, the logical state of the first write bit line WBL is a logical state of the data, and the data is written into the second storage bit Q̄; or the logical state of the second write bit line WBLN is a logical state of the data, and the data is written into the first storage bit Q.
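The write path described above can be sketched behaviourally in Python as follows. This is an idealized model, not a circuit simulation: it assumes the write transistors conduct when WWL is a logical 1, as in the example in the text, and the class name `LatchWritePort` and its attribute names are hypothetical.

```python
class LatchWritePort:
    """Idealized latch with two complementary storage bits and one write port."""

    def __init__(self) -> None:
        # Arbitrary initial state; the two storage bits are always opposite.
        self.first_bit = 0
        self.second_bit = 1

    def write(self, wwl: int, wbl: int) -> None:
        """Apply levels to WWL and WBL; WBLN always carries the opposite of WBL."""
        if wwl not in (0, 1) or wbl not in (0, 1):
            raise ValueError("wwl and wbl must be 0 or 1")
        wbln = 1 - wbl
        if wwl == 1:
            # Fifth transistor passes WBL to the second storage bit,
            # sixth transistor passes WBLN to the first storage bit.
            self.second_bit = wbl
            self.first_bit = wbln
        # With WWL at 0 both write transistors are off and the latch holds.


latch = LatchWritePort()
latch.write(wwl=1, wbl=0)
print(latch.first_bit, latch.second_bit)  # 1 0
latch.write(wwl=0, wbl=1)                 # ignored: write word line not asserted
print(latch.first_bit, latch.second_bit)  # 1 0
```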

[0033] It may be understood that the first state and the second state are merely used to distinguish between two different read states, and the first state and the second state are not limited in this embodiment of the present invention. For example, in the first state, the first read line is a read word line, and the second read line is a read bit line; or in the second state, the second read line is a read word line, and the first read line is a read bit line. Alternatively, in the first state, the second read line is a read word line, and the first read line is a read bit line; or in the second state, the first read line is a read word line, and the second read line is a read bit line.

[0034] A gate of the seventh MOS transistor 33 is connected to one of the first storage bit Q and the second storage bit Q̄, one of a source and a drain of the seventh MOS transistor 33 is connected to a first read line L1, and the other one of the source and the drain of the seventh MOS transistor 33 is connected to a second read line L2. When the storage unit is in a first read state, the first read line L1 is a read bit line (RBL_H, where H is used to identify the first read state, and the first read state may be referred to as a first state for short), and the second read line L2 is a read word line (RWL_H). When the storage unit is in a second read state, the first read line L1 is a read word line (RWL_V, where V is used to identify the second read state, and the second read state may be referred to as a second state for short), and the second read line L2 is a read bit line (RBL_V). When data is read from the storage unit, the read word line (RWL_H or RWL_V) is set to a logical 1, and data stored in one of the first storage bit Q and the second storage bit Q̄ is read from the read bit line (RBL_H or RBL_V).

[0035] The first read state may be referred to as conventional reading, and the second read state may be referred to as transpose reading. WBL and WBLN are positive and negative write bit lines and are used to write stored data. WWL is a write word line and is used for gating access when the stored data is written. RBL_H is a read bit line used in conventional reading, and is used to read the stored data. RWL_H is a read word line used in conventional reading, and is used for gating access when the stored data is read. RWL_V is a read word line used in transpose reading, and is used for gating access when the stored data is read. RBL_V is a read bit line used in transpose reading, and is used to read the stored data. The read bit line used in conventional reading is the read word line used in transpose reading, and the read word line used in conventional reading is the read bit line used in transpose reading. Q and Q̄ are storage bits. A 1-bit control signal, for example, T_EN, may be set for a peripheral circuit, and whether a read mode is conventional reading or transpose reading is controlled by using T_EN.
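The role swap of the two read lines can be illustrated with the following Python sketch of a single storage unit whose added transistor is an NMOS transistor. Two points are assumptions made only for this illustration: T_EN = 0 selects conventional reading and T_EN = 1 selects transpose reading, and the read bit line idles at a logical 0 when the transistor is off. The function name `read_cell` is hypothetical.

```python
def read_cell(stored_bit: int, t_en: int) -> dict:
    """Idealized read of one storage unit through the added NMOS transistor.

    stored_bit is the storage bit connected to the transistor gate.
    """
    if stored_bit not in (0, 1) or t_en not in (0, 1):
        raise ValueError("stored_bit and t_en must be 0 or 1")
    if t_en == 0:
        # Conventional reading: L2 is the read word line, L1 the read bit line.
        word_line, bit_line = "L2 (RWL_H)", "L1 (RBL_H)"
    else:
        # Transpose reading: the two lines exchange roles.
        word_line, bit_line = "L1 (RWL_V)", "L2 (RBL_V)"
    word_level = 1                       # read word line set to logical 1
    transistor_on = stored_bit == 1      # NMOS conducts when its gate is 1
    value = word_level if transistor_on else 0  # assumed idle level of 0
    return {"driven": word_line, "sensed": bit_line, "value": value}


print(read_cell(stored_bit=1, t_en=0))
print(read_cell(stored_bit=1, t_en=1))
```

The stored value that is read back is the same in both modes; only the line that is driven and the line that is sensed change.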

[0036] In FIG. 3, the seventh MOS transistor 33 is an NMOS transistor.

[0037] In another embodiment of the present invention, the seventh MOS transistor 33 in FIG. 3 may be replaced with a PMOS transistor, as shown in a storage unit in FIG. 4.

[0038] In still another embodiment of the present invention, an MOS transistor may further be added based on the storage unit shown in FIG. 3 or FIG. 4, as shown in a storage unit in FIG. 5. The storage unit further includes an eighth MOS transistor 36. A gate of the eighth MOS transistor 36 is connected to the other one of the first storage bit Q and the second storage bit Q̄ (for example, when the seventh MOS transistor 33 is connected to the first storage bit Q, the eighth MOS transistor 36 is connected to the second storage bit Q̄). One of a source and a drain of the eighth MOS transistor 36 is connected to the first read line L1, and the other one of the source and the drain of the eighth MOS transistor 36 is connected to the second read line L2 (for example, the source of the eighth MOS transistor 36 is connected to the first read line L1, and the drain of the eighth MOS transistor 36 is connected to the second read line L2). Referring to FIG. 5, the seventh MOS transistor 33 is a PMOS transistor, and the eighth MOS transistor 36 is an NMOS transistor. It may be understood that, alternatively, the seventh MOS transistor 33 may be an NMOS transistor, and the eighth MOS transistor 36 may be a PMOS transistor. According to this implementation, a pair of MOS transistors are used to separately read data of two storage bits in opposite states. This can improve data reading accuracy.
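As a simplified illustration of the transistor pair in FIG. 5, the following Python sketch treats each transistor as an ideal switch. It assumes the example wiring given above, with the seventh (PMOS) transistor gated by the first storage bit and the eighth (NMOS) transistor gated by the second storage bit; the function names are hypothetical, and analog effects are not modelled.

```python
def nmos_on(gate: int) -> bool:
    """An ideal NMOS switch conducts when its gate is a logical 1."""
    return gate == 1


def pmos_on(gate: int) -> bool:
    """An ideal PMOS switch conducts when its gate is a logical 0."""
    return gate == 0


def pair_state(first_bit: int) -> tuple:
    """Return (seventh transistor on, eighth transistor on) for a stored value."""
    if first_bit not in (0, 1):
        raise ValueError("first_bit must be 0 or 1")
    second_bit = 1 - first_bit            # storage bits are in opposite states
    seventh_on = pmos_on(first_bit)       # assumed: PMOS gated by first storage bit
    eighth_on = nmos_on(second_bit)       # assumed: NMOS gated by second storage bit
    return seventh_on, eighth_on


print(pair_state(0))  # (True, True)
print(pair_state(1))  # (False, False)
```

Under these assumptions the two transistors switch together, because the gates see opposite levels and the transistors are of opposite types.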

[0039] In a possible implementation, the storage unit is connected to the control signal. When the control signal is a first sequence, the storage unit is in the first read state, to be specific, L1 is a read bit line and L2 is a read word line; and the control signal is used to control the peripheral circuit to pull up a level of L2, so as to read data from L1; or when the control signal is a second sequence, the storage unit is in the second read state, to be specific, L1 is a read word line and L2 is a read bit line; and the control signal is used to control the peripheral circuit to pull up a level of L1, so as to read data from L2. According to this implementation, the control signal is used to control switching of read states of the storage unit. This is simple and quick.

[0040] In a possible implementation, a length of the first sequence is 1 bit, a length of the second sequence is 1 bit, and the first sequence is 0 and the second sequence is 1, or the first sequence is 1 and the second sequence is 0. According to this implementation, the 1-bit control signal is used to control switching of read states of the storage unit. This is simple and quick.

[0041] An embodiment of the present invention further provides a static random access memory. The static random access memory includes a plurality of storage units described above (for example, the storage unit shown in FIG. 3, FIG. 4, or FIG. 5). The plurality of storage units form a rectangular storage array with a same quantity of rows and columns. Storage units of a same row in the storage array have the same first read line, and storage units of a same column in the storage array have the same second read line. In the following description, only the storage unit shown in FIG. 3 is used as an example for description. A static random access memory formed by other storage units is similar, and therefore, details are not described.

[0042] In this embodiment of the present invention, based on a structure of a conventional storage unit, an MOS transistor is added to the storage unit of the static random access memory, and the newly added MOS transistor is the seventh MOS transistor. A gate of the seventh MOS transistor is connected to a storage bit, and a source and a drain of the seventh MOS transistor are connected to a first read line and a second read line, respectively. Based on a characteristic of the MOS transistor, either of the first read line and the second read line may be used as a read bit line, and the other is used as a read word line, so that the read word line and the read bit line are interchangeable. A bottom-layer hardware storage unit is changed to quickly obtain data of an address that is symmetric to a specific address about a matrix diagonal, instead of writing an algorithm into code. An SRAM in this form is equivalent to a hardware accelerator. Compared with writing an algorithm into code of a digital circuit, changing the storage unit can greatly increase a system speed, and greatly reduce power consumption and workload. Usually, one or two cycles (clock cycle) are required to complete transpose reading, while thousands of cycles may be required when the algorithm is written into the code of the digital circuit.

[0043] In an example, each storage unit is connected to a control signal. When the control signal is a first sequence, the storage unit is in the first read state; when the control signal is a second sequence, the storage unit is in the second read state. According to this implementation, the control signal is used to control switching between the read states of the storage units of the static random access memory. This is simple and quick.

[0044] In an example, a length of the first sequence is 1 bit, a length of the second sequence is 1 bit, and the first sequence is 0 and the second sequence is 1, or the first sequence is 1 and the second sequence is 0. According to this implementation, the 1-bit control signal is used to control switching of read states of the storage unit of the static random access memory. This is simple and quick. In this embodiment of the present invention, only an example in which the control signal is 1 bit is used for description.
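The mapping from the 1-bit control signal to a read state can be sketched as follows. This is only an illustrative behavioural model, not circuitry from the embodiments: the names `ReadState`, `decode_read_state`, and `first_sequence` are hypothetical, and the parameter `first_sequence` simply captures the fact that either polarity (first sequence 0 or first sequence 1) is allowed.

```python
from enum import Enum


class ReadState(Enum):
    """Hypothetical labels for the two read states named in the text."""
    FIRST = "first read state"
    SECOND = "second read state"


def decode_read_state(control_bit: int, first_sequence: int = 0) -> ReadState:
    """Map a 1-bit control signal to a read state.

    first_sequence is the bit value assigned to the first read state;
    the second sequence is its complement. Both polarities are permitted.
    """
    if control_bit not in (0, 1) or first_sequence not in (0, 1):
        raise ValueError("control signal and first sequence must be 0 or 1")
    return ReadState.FIRST if control_bit == first_sequence else ReadState.SECOND


# Default polarity: first sequence is 0, second sequence is 1.
state_for_0 = decode_read_state(0)
state_for_1 = decode_read_state(1)
# Opposite polarity: first sequence is 1, second sequence is 0.
state_for_1_inverted = decode_read_state(1, first_sequence=1)
```

Keeping the polarity as a parameter mirrors the wording of the paragraph, which leaves the assignment of 0 and 1 open.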

[0045] FIG. 6 is a schematic diagram of reading by a storage unit in conventional reading. When data is read, if a control signal T_EN = 0, the control signal drives a control circuit to pull up a level of L2 and read the data from L1. In this case, the SRAM read mode is conventional reading, where a vertical line is a read bit line RBL_H, and a horizontal line is a read word line RWL_H. The read word line RWL_H remains at 1. If the value stored in a node Q is 0, the added MOS transistor is turned off, and the data read from the read bit line RBL_H is 0. If the value stored in the node Q is 1, the added MOS transistor is turned on, and the data read from the read bit line RBL_H is 1. FIG. 7 is a schematic diagram of reading by a storage array in conventional reading. Assuming that data stored in a storage unit whose word line number is 0 and bit line number is 2 needs to be read, the data in (RBL_H2, RWL_H0) is read, that is, the data in a rightmost storage unit in a first row is read.
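The conventional read described above can be illustrated with a small behavioural sketch. It is not a circuit simulation: it assumes an NMOS read transistor and a read bit line that reads 0 whenever the transistor is off, and the function names `read_cell` and `conventional_read`, as well as the sample array contents, are hypothetical.

```python
def read_cell(q: int, word_line: int) -> int:
    """Value seen on the read bit line for a single storage unit.

    Assumption: the added transistor conducts only when Q is 1, and the
    bit line reads 0 when the transistor is off.
    """
    transistor_on = q == 1
    return word_line if transistor_on else 0


def conventional_read(array, word_line_number: int, bit_line_number: int) -> int:
    """Conventional read (T_EN = 0): horizontal word line, vertical bit line.

    array[r][c] holds the value of node Q of the unit in row r, column c.
    """
    q = array[word_line_number][bit_line_number]
    return read_cell(q, word_line=1)  # the selected RWL_H is held at 1


# Hypothetical 3 x 3 array contents (not taken from the figures).
stored = [
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
]
# Word line 0, bit line 2: the rightmost storage unit in the first row.
value = conventional_read(stored, word_line_number=0, bit_line_number=2)
```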

[0046] FIG. 8 is a schematic diagram of reading by a storage unit in transpose reading. When data is read, if a control signal T_EN = 1, the control signal drives a control circuit to pull up a level of L1 and read the data from L2. In this case, the SRAM read mode is transpose reading, where a horizontal line becomes a read bit line RBL_V, and a vertical line becomes a read word line RWL_V. In this embodiment of the present invention, the values of the word line and the bit line that are sent from a logical circuit outside the storage array are not changed; instead, the word line and the bit line themselves are interchanged. FIG. 9 is a schematic diagram of reading by a storage array in transpose reading. Assuming that data stored in a storage unit whose word line number is 0 and bit line number is 2 still needs to be read, as shown in FIG. 9, the data in (RBL_V2, RWL_V0) is read in this case, that is, the data stored in a topmost storage unit in a first column is read. This storage unit is symmetric, about a matrix diagonal, to the storage unit read in conventional reading in FIG. 7. In this way, data of a storage address that is symmetric to a storage address about a diagonal of the storage array can be read in only one cycle (namely, one clock cycle), thereby greatly improving a system speed and greatly reducing power consumption and workload.
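The effect of interchanging the word line and the bit line can be sketched in software as an index swap. The following is a minimal behavioural model under stated assumptions: the function name `read`, the row/column indexing of the Python list, and the sample contents are all hypothetical, and the model says nothing about timing or electrical behaviour.

```python
def read(array, word_line_number: int, bit_line_number: int, t_en: int) -> int:
    """Return the stored bit addressed by (word line, bit line).

    t_en = 0: conventional reading, word lines select rows.
    t_en = 1: transpose reading, word lines select columns, so the same
              address pair lands on the cell mirrored about the diagonal.
    """
    if t_en == 0:
        row, col = word_line_number, bit_line_number
    else:
        row, col = bit_line_number, word_line_number
    return array[row][col]


# Hypothetical, deliberately non-symmetric 3 x 3 contents.
stored = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
n = len(stored)

conventional = read(stored, 0, 2, t_en=0)  # cell at row 0, column 2
transposed = read(stored, 0, 2, t_en=1)    # mirrored cell at row 2, column 0

# Reading every address in transpose mode yields the transposed matrix.
transpose_view = [[read(stored, w, b, t_en=1) for b in range(n)] for w in range(n)]
```

The address pair sent by the external logic is identical in both calls; only `t_en` differs, which is the point made in the paragraph above.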

[0047] In a possible implementation, each storage unit is connected to a control signal. When the control signal is a first sequence, the storage unit is in the first read state; when the control signal is a second sequence, the storage unit is in the second read state. According to this implementation, the control signal is used to control switching between the read states of the storage units of the static random access memory. This is simple and quick.

[0048] In a possible implementation, a length of the first sequence is 1 bit, a length of the second sequence is 1 bit, and the first sequence is 0 and the second sequence is 1, or the first sequence is 1 and the second sequence is 0. According to this implementation, the 1-bit control signal is used to control switching of read states of the storage unit of the static random access memory. This is simple and quick.

[0049] In this embodiment of the present invention, the control signal can be used to control a control circuit outside the storage array, to switch the static random access memory between the first read state and the second read state.

[0050] FIG. 10 is a schematic diagram of a control circuit according to an embodiment of the present invention. In addition to a storage array, the static random access memory includes a control circuit of the following structure: a first decoder 1001, a second decoder 1002, a first word line driver 1003, a second word line driver 1004, a first read amplifier 1005, a second read amplifier 1006, and a phase inverter 1007. The control signal T_EN is connected to an enable end (a1) of the first decoder 1001, an enable end (a1) of the first word line driver 1003, and an enable end (a1) of the first read amplifier 1005. An input end (a2) of the first decoder 1001 inputs an address value, a first output end (a3) of the first decoder 1001 outputs a row number, and a second output end (a4) of the first decoder 1001 outputs a column number. The first output end (a3) of the first decoder 1001 is connected to a control end (a3) of the first read amplifier 1005, and input ends (a10, a11, and a12) of the first read amplifier 1005 are connected to first read lines (namely, the vertical lines L10, L11, and L12). The second output end (a4) of the first decoder 1001 is connected to a control end (a4) of the first word line driver 1003, and output ends (a20, a21, and a22) of the first word line driver 1003 are connected to second read lines (namely, the horizontal lines L20, L21, and L22). The control end (a3) of the first read amplifier 1005 is configured to control the first read amplifier 1005 to perform amplification processing on data of at least one input end (a10, a11, and a12), and the control end (a4) of the first word line driver 1003 is configured to control at least one output end (a20, a21, and a22) of the first word line driver 1003 to output a logical 1.
The control signal is connected to an input end of the phase inverter 1007, and an output end of the phase inverter 1007 is connected to an enable end (b1) of the second decoder 1002, an enable end (b1) of the second word line driver 1004, and an enable end (b1) of the second read amplifier 1006. An input end (b2) of the second decoder 1002 inputs an address value, a first output end (b3) of the second decoder 1002 outputs a row number, and a second output end (b4) of the second decoder 1002 outputs a column number. The first output end (b3) of the second decoder 1002 is connected to a control end (b3) of the second read amplifier 1006, and input ends (b20, b21, and b22) of the second read amplifier 1006 are connected to the second read lines (namely, the horizontal lines L20, L21, and L22). The second output end (b4) of the second decoder 1002 is connected to a control end (b4) of the second word line driver 1004, and output ends (b10, b11, and b12) of the second word line driver 1004 are connected to the first read lines (namely, the vertical lines L10, L11, and L12). The control end (b3) of the second read amplifier 1006 is configured to control the second read amplifier 1006 to perform amplification processing on data of at least one input end (b20, b21, and b22), and the control end (b4) of the second word line driver 1004 is configured to control at least one output end (b10, b11, and b12) of the second word line driver 1004 to output a logical 1. According to this implementation, two groups of decoders, word line drivers, and read amplifiers are set. The two groups have different paths. A control signal is used to enable one of the two groups, so that the read word line and the read bit line are interchanged.
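The two-group arrangement of FIG. 10 can be sketched behaviourally as follows. Several points are assumptions rather than statements from the text: the enable ends are modelled as active-low (so that T_EN = 0 enables the first group, which drives the horizontal lines and senses the vertical lines, matching conventional reading), the names `active_group` and `read_two_group` are hypothetical, and the decoder outputs are reduced to a driven-line index and a sensed-line index.

```python
def active_group(t_en: int, enable_active_level: int = 0) -> int:
    """Return 1 or 2, the peripheral group enabled for control bit t_en.

    Group 1 receives T_EN directly; group 2 receives it through the phase
    inverter. The active level of the enable ends is an assumption.
    """
    inverted = 1 - t_en  # output of the phase inverter
    if t_en == enable_active_level:
        return 1
    assert inverted == enable_active_level  # exactly one group is enabled
    return 2


def read_two_group(array, drive_index: int, sense_index: int, t_en: int) -> int:
    """Read one bit using whichever group is enabled.

    Group 1: word line driver on horizontal lines, read amplifier on
             vertical lines -> cell at (row=drive_index, col=sense_index).
    Group 2: word line driver on vertical lines, read amplifier on
             horizontal lines -> cell at (row=sense_index, col=drive_index).
    """
    if active_group(t_en) == 1:
        row, col = drive_index, sense_index
    else:
        row, col = sense_index, drive_index
    return array[row][col]


# Hypothetical, non-symmetric 3 x 3 contents.
stored = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
bit_conventional = read_two_group(stored, drive_index=0, sense_index=2, t_en=0)
bit_transposed = read_two_group(stored, drive_index=0, sense_index=2, t_en=1)
```

The design choice illustrated here is duplication: both peripheral paths exist permanently, and the control bit only decides which one is active.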

[0051] FIG. 11 is a schematic diagram of another control circuit according to an embodiment of the present invention. In addition to a storage array, the static random access memory includes a control circuit of the following structure: a decoder 1101, a word line driver 1102, a read amplifier 1103, first selectors 1104 with a same quantity as a row quantity of the storage array, and second selectors 1105 with a same quantity as a column quantity of the storage array. An input end (c1) of the decoder 1101 inputs an address value, a first output end (c2) of the decoder 1101 outputs a row number, and a second output end (c3) of the decoder 1101 outputs a column number. The first output end (c2) of the decoder 1101 is connected to a control end (c2) of the word line driver 1102, the control end (c2) of the word line driver 1102 is configured to control at least one output end (c10, c11, and c12) of the word line driver 1102 to output a logical 1, the second output end (c3) of the decoder 1101 is connected to a control end (c3) of the read amplifier 1103, and the control end (c3) of the read amplifier 1103 is configured to control the read amplifier 1103 to perform amplification processing on data of at least one input end (c20, c21, and c22). The control signal T_EN is connected to control ends (k0) of the first selectors 1104 and control ends (k0) of the second selectors 1105. 
First ports (k1) of the first selectors 1104 are connected to first read lines (namely, the vertical lines L10, L11, and L12; for example, first ports (k1) of three first selectors 1104 from top to bottom are respectively connected to L10, L11, and L12), second ports (k2) of the first selectors 1104 are connected to second read lines (namely, the horizontal lines L20, L21, and L22; for example, second ports (k2) of three first selectors 1104 from top to bottom are respectively connected to L20, L21, and L22), and third ports (k3) of the first selectors 1104 are connected to output ends of the word line driver 1102 (c10, c11, and c12; for example, third ports (k3) of three first selectors 1104 from top to bottom are respectively connected to c10, c11, and c12). Control ends (k0) of the first selectors 1104 are configured to control the first ports (k1) of the first selectors 1104 to connect to the third ports (k3) of the first selectors 1104, or control the second ports (k2) of the first selectors 1104 to connect to the third ports (k3) of the first selectors 1104. First ports (k1) of the second selectors 1105 are connected to the second read lines (namely, the horizontal lines L20, L21, and L22), second ports (k2) of the second selectors 1105 are connected to the first read lines (namely, the vertical lines L10, L11, and L12), and third ports (k3) of the second selectors 1105 are connected to input ends of the read amplifier 1103 (c20, c21, and c22; for example, third ports (k3) of three second selectors 1105 from left to right are respectively connected to c20, c21, and c22). The control ends (k0) of the second selectors 1105 are configured to control the first ports (k1) of the second selectors 1105 to connect to the third ports (k3) of the second selectors 1105, or control the second ports (k2) of the second selectors 1105 to connect to the third ports (k3) of the second selectors 1105.
According to this implementation, the control signal T_EN is used to control a selector to switch between circuit connection modes, so that a read word line and a read bit line are interchanged.
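The selector-based routing of FIG. 11 can be sketched as a pair of two-way multiplexers per line index. This is an illustrative model only: the functions `selector` and `routed_lines` are hypothetical, and the polarity (T_EN = 1 connects port k1 to port k3, T_EN = 0 connects port k2 to port k3) is an assumption chosen so that T_EN = 1 corresponds to transpose reading; the text does not fix which control value selects which port.

```python
def selector(control: int, port1: str, port2: str) -> str:
    """Two-way selector: return the line connected to the common port k3.

    Assumed polarity: control == 1 connects k1, control == 0 connects k2.
    """
    return port1 if control == 1 else port2


def routed_lines(t_en: int, driver_output: int, amp_input: int):
    """Return (driven line, sensed line) for the given control bit.

    First selectors:  k1 = vertical line L1x, k2 = horizontal line L2x,
                      k3 = word line driver output c1x.
    Second selectors: k1 = horizontal line L2x, k2 = vertical line L1x,
                      k3 = read amplifier input c2x.
    """
    driven = selector(t_en, f"L1{driver_output}", f"L2{driver_output}")
    sensed = selector(t_en, f"L2{amp_input}", f"L1{amp_input}")
    return driven, sensed


# Driver output c10 and amplifier input c22, in both modes.
conventional_route = routed_lines(0, driver_output=0, amp_input=2)
transposed_route = routed_lines(1, driver_output=0, amp_input=2)
```

Compared with the two-group circuit, only one decoder, one word line driver, and one read amplifier are needed here; the selectors swap which physical lines they attach to.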

[0052] In the embodiments of the present invention, the static random access memory may be disposed on a chip with a specific function, or may be independent of the foregoing chip. The chip may be a chip on which a large amount of fast matrix calculation is performed. For example, in the field of artificial intelligence, an SRAM customized on a convolutional neural network (Convolutional Neural Network, CNN) chip can be used for quick decomposition of a convolution kernel. The SRAM in the embodiments of the present invention may also be used as an SRAM on some encryption chips.

[0053] The following describes some effects of the SRAM customized on the CNN chip.

[0054] As chips become more intelligent, data communication on a bus may be further reduced, building on a standard single-instruction multiple-data stream (Single Instruction Multiple Data, SIMD) architecture, because of a special multiplexing mechanism of a CNN. The concept of multiplexing is very important to an ultra-large neural network. For mid-high-level convolution kernels in these models, the parameter amount of a convolution kernel may reach 3 x 3 x 512. How to properly decompose these ultra-large convolution kernels so that they can be handled effectively in hardware becomes a problem worth studying.

[0055] By using the SRAM with transpose reading described in this embodiment of the present invention, that is, an SRAM in which a bit line and a word line may be interchanged during data reading, additional data processing in the convolutional network can be omitted, processing power consumption of the CNN can be reduced, and an underlying hardware approach can be provided for a large convolution structure.

[0056] The name of the convolutional network comes from a "convolution" operation. A basic purpose of convolution is to extract features from an input image. Convolution means that an image feature is learned by using data in a small square matrix, so that a spatial relationship between pixels may be retained. As described above, each image is a matrix of pixel values. Consider a 5 x 5 image whose pixel values are 0 and 1. In FIG. 12, a left matrix is an image matrix, and the image matrix is a special case of a gray-scale image (pixel values of a conventional gray-scale image are 0 to 255); a right matrix is a convolution kernel matrix, and the convolution kernel matrix is a 3 x 3 matrix. FIG. 13 is a schematic diagram of convolution calculation between a 5 x 5 image matrix and a 3 x 3 convolution kernel matrix. Each 3 x 3 sub-matrix in the left image matrix is multiplied, position by position, by the 3 x 3 convolution kernel matrix, and the value obtained by adding the products of all positions is placed in a new matrix, namely, the right matrix in FIG. 13. The matrix is a new image matrix generated after a convolution operation is performed on an original image matrix and an original convolution kernel matrix.
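The sliding multiply-and-add described above can be sketched in a few lines. The matrices below are illustrative values chosen for this sketch and are not taken from FIG. 12 or FIG. 13; the function name `convolve_valid` is hypothetical. Following the description, the kernel is applied without flipping, with no padding and a stride of 1.

```python
def convolve_valid(image, kernel):
    """Slide the kernel over the image and sum the position-wise products.

    No padding, stride 1, kernel not flipped, as in the description.
    A 5 x 5 image and a 3 x 3 kernel give a 3 x 3 result.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Illustrative 5 x 5 binary image and 3 x 3 kernel (assumed values).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = convolve_valid(image, kernel)
# feature_map == [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```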

[0057] The 3 x 3 matrix in FIG. 12 or FIG. 13 is also referred to as a "filter", a "convolution kernel", or a "feature detector". A matrix obtained by sliding the filter on the original image and multiplying matrices is referred to as a "convolution feature", "excitation mapping", or "feature mapping". The filter is a feature detector for an original input image. An image is considered as a matrix, and new images are obtained after convolution operations are performed on the image and different convolution kernel matrices. Performing a convolution operation with different convolution kernel matrices produces different effects. The convolution kernel may be decomposed. For example, a two-dimensional convolution kernel that is symmetrical about a diagonal is decomposed into two one-dimensional convolution kernels. Convolution calculation performed on an image matrix (denoted as A) and a two-dimensional convolution kernel matrix (denoted as C) may be approximated by convolution calculation performed on the image matrix and two one-dimensional convolution kernel matrices (denoted as S1 and S2), that is:

A ⊗ C ≈ A ⊗ S1 ⊗ S2

(⊗ is a convolution operation symbol)

[0058] Substituting S1 and S2 for the two-dimensional convolution kernel matrix C is referred to as decomposition of the convolution kernel. S1 and S2 are one-dimensional matrices that are symmetric about a diagonal of a two-dimensional matrix C.
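The decomposition can be checked numerically in the special case where the two-dimensional kernel is exactly the product of a column vector and its transpose. That special case is an assumption made for this sketch (for a general kernel the text only claims an approximation), and the values of `A` and `s`, as well as the helper names `convolve_valid` and `transpose`, are hypothetical. As in the description of the convolution operation, the kernel is applied without flipping.

```python
def convolve_valid(image, kernel):
    """Sliding multiply-and-add, no padding, stride 1, kernel not flipped."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(len(image[0]) - k_cols + 1)
        ]
        for r in range(len(image) - k_rows + 1)
    ]


def transpose(matrix):
    """Mirror a matrix about its diagonal."""
    return [list(col) for col in zip(*matrix)]


# Illustrative image matrix A (assumed values).
A = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]

s = [1, 2, 1]                      # assumed one-dimensional kernel values
S1 = [[v] for v in s]              # 3 x 1 column kernel
S2 = transpose(S1)                 # 1 x 3 row kernel, mirror image of S1
C = [[a * b for b in s] for a in s]  # 3 x 3 kernel, symmetric about its diagonal

direct = convolve_valid(A, C)                         # A (x) C
two_pass = convolve_valid(convolve_valid(A, S1), S2)  # (A (x) S1) (x) S2
```

Because S2 is the transpose of S1, a memory that can return the mirrored data directly avoids computing that transpose in software, which is the use the description has in mind for transpose reading.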

[0059] If a conventional SRAM is used, and a convolution kernel needs to be decomposed, data of an address that is symmetrical to a specific address in relation to a matrix diagonal cannot be directly obtained, and the data needs to be processed by writing an algorithm into code of an SRAM peripheral digital circuit. The SRAM in the embodiments of the present invention is applied to the convolutional neural network chip, so that the convolution kernel can be quickly decomposed, power consumption of frequently decomposing the convolution kernel can be greatly reduced, and a speed of the operation can be greatly increased.

[0060] The objectives, technical solutions, and benefits of the present invention are further described in detail in the foregoing specific embodiments. It should be understood that the foregoing descriptions are merely specific implementations of the present invention, but are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made based on the technical solutions of the present invention shall fall within the protection scope of the present invention.


Claims

1. A storage unit, wherein the storage unit comprises a latch, the latch provides a first storage bit, and the storage unit further comprises a first MOS transistor; and
a gate of the first MOS transistor is connected to the first storage bit, a source of the first MOS transistor is connected to a first read line, and a drain of the first MOS transistor is connected to a second read line; in a first state, the first read line is a read word line, and the second read line is a read bit line; and in a second state, the second read line is a read word line, and the first read line is a read bit line.
 
2. The storage unit according to claim 1, wherein the first MOS transistor is an NMOS transistor or a PMOS transistor.
 
3. The storage unit according to claim 2, wherein the latch further provides a second storage bit and the storage unit further comprises a second MOS transistor; and a gate of the second MOS transistor is connected to the second storage bit, a source of the second MOS transistor is connected to the first read line, and a drain of the second MOS transistor is connected to the second read line.
 
4. The storage unit according to claim 3, wherein the first MOS transistor is an NMOS transistor and the second MOS transistor is a PMOS transistor; or the first MOS transistor is a PMOS transistor and the second MOS transistor is an NMOS transistor.
 
5. A static random access memory, wherein the static random access memory comprises a plurality of storage units;
the storage unit comprises a latch, the latch provides a first storage bit, and the storage unit further comprises a first MOS transistor;
a gate of the first MOS transistor is connected to the first storage bit, a source of the first MOS transistor is connected to a first read line, and a drain of the first MOS transistor is connected to a second read line; in a first state, the first read line is a read word line, and the second read line is a read bit line; and in a second state, the second read line is a read word line, and the first read line is a read bit line;
the plurality of storage units form a rectangular storage array with a same quantity of rows and columns;
storage units of a same row in the storage array are connected to the same first read line; and
storage units of a same column in the storage array are connected to the same second read line.
 
6. The static random access memory according to claim 5, wherein the first MOS transistor is an NMOS transistor or a PMOS transistor.
 
7. The static random access memory according to claim 6, wherein the latch further provides a second storage bit, and the storage unit further comprises a second MOS transistor; and a gate of the second MOS transistor is connected to the second storage bit, a source of the second MOS transistor is connected to the first read line, and a drain of the second MOS transistor is connected to the second read line.
 
8. The static random access memory according to claim 7, wherein the first MOS transistor is the NMOS transistor and the second MOS transistor is the PMOS transistor; or the first MOS transistor is the PMOS transistor and the second MOS transistor is the NMOS transistor.
 
9. The static random access memory according to any one of claims 5 to 8, wherein each of the storage units is connected to a control signal; and
when the control signal is a first sequence, the storage unit is considered to be in the first state; or
when the control signal is a second sequence, the storage unit is considered to be in the second state.
 
10. The static random access memory according to claim 9, wherein a length of the first sequence is 1 bit, and a length of the second sequence is 1 bit; and the first sequence is 0 and the second sequence is 1, or the first sequence is 1 and the second sequence is 0.
 
11. The static random access memory according to claim 9 or 10, wherein the static random access memory further comprises:

a first decoder, a second decoder, a first word line driver, a second word line driver, a first read amplifier, a second read amplifier, and a phase inverter, wherein

the control signal is connected to an enable end of the first decoder, an enable end of the first word line driver, and an enable end of the first read amplifier;

an input end of the first decoder inputs an address value, a first output end of the first decoder outputs a row number, and a second output end of the first decoder outputs a column number; the first output end of the first decoder is connected to a control end of the first read amplifier, and an input end of the first read amplifier is connected to the first read line; and the second output end of the first decoder is connected to a control end of the first word line driver, an output end of the first word line driver is connected to the second read line, the control end of the first read amplifier is configured to control the first read amplifier to perform amplification processing on data of at least one input end, and the control end of the first word line driver is configured to control at least one output end of the first word line driver to output a first logical state;

the control signal is connected to an input end of the phase inverter, and an output end of the phase inverter is connected to an enable end of the second decoder, an enable end of the second word line driver, and an enable end of the second read amplifier; and

an input end of the second decoder inputs an address value, a first output end of the second decoder outputs a row number, and a second output end of the second decoder outputs a column number; the first output end of the second decoder is connected to a control end of the second read amplifier, an input end of the second read amplifier is connected to the second read line, the second output end of the second decoder is connected to a control end of the second word line driver, an output end of the second word line driver is connected to the first read line, the control end of the second read amplifier is configured to control the second read amplifier to perform amplification processing on data of at least one input end, and the control end of the second word line driver is configured to control at least one output end of the second word line driver to output the first logical state.


 
12. The static random access memory according to claim 9 or 10, wherein the static random access memory further comprises:

a decoder, a word line driver, a read amplifier, first selectors with a same quantity as a row quantity of the storage array, and second selectors with a same quantity as a column quantity of the storage array, wherein

an input end of the decoder inputs an address value, a first output end of the decoder outputs a row number, and a second output end of the decoder outputs a column number; the first output end of the decoder is connected to a control end of the word line driver, the control end of the word line driver is configured to control at least one output end of the word line driver to output a first logical state, the second output end of the decoder is connected to a control end of the read amplifier, the control end of the read amplifier is configured to control the read amplifier to perform amplification processing on data of at least one input end;

the control signal is connected to a control end of the first selector and a control end of the second selector;

a first port of the first selector is connected to the first read line, a second port of the first selector is connected to the second read line, and a third port of the first selector is connected to an output end of the word line driver; and a control end of the first selector is configured to control the first port of the first selector to connect to the third port of the first selector, or control the second port of the first selector to connect to the third port of the first selector; and

a first port of the second selector is connected to the second read line, a second port of the second selector is connected to the first read line, and a third port of the second selector is connected to an input end of the read amplifier; and the control end of the second selector is configured to control the first port of the second selector to connect to the third port of the second selector, or control the second port of the second selector to connect to the third port of the second selector.


 
13. A storage unit, comprising a latch, a write MOS transistor, a write word line, a write bit line, a first read line, and a second read line, wherein the latch provides a first storage bit and a second storage bit, the write bit line is connected to the first storage bit and the second storage bit by using the write MOS transistor, and the write word line is connected to a gate of the write MOS transistor to control on and off of the write MOS transistor; and the storage unit further comprises a first MOS transistor, and a gate of the first MOS transistor is connected to the first storage bit, a source of the first MOS transistor is connected to the first read line, and a drain of the first MOS transistor is connected to the second read line.
 




Cited references

REFERENCES CITED IN THE DESCRIPTION



This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Patent documents cited in the description