Field of the Invention
[0001] The present invention relates generally to a texture mapping computer graphics system
and, more particularly, to a system and method for buffering texture data transferred
between circuit boards.
Background of the Invention
[0002] Computer graphics systems commonly are used for displaying graphical representations
of objects on a two dimensional display screen. Current computer graphics systems
can provide highly detailed representations and are used in a variety of applications.
[0003] In typical computer graphics systems, an object to be represented on the display
screen is broken down into a plurality of graphics primitives. Primitives are basic
components of a graphics picture and may include points, lines, vectors and polygons,
such as triangles. Typically, a hardware/software scheme is implemented to render,
or draw, on the two-dimensional display screen, the graphics primitives that represent
the view of one or more objects being represented on the screen.
[0004] Typically, the primitives that define the three-dimensional object to be rendered
are provided from a host computer, which defines each primitive in terms of primitive
data. For example, when the primitive is a triangle, the host computer may define
the primitive in terms of the x,y,z coordinates of its vertices, as well as the R,G,B
color values of each vertex. Rendering hardware interpolates the primitive data to
compute the display screen pixels that are turned on to represent each primitive,
and the R,G,B values for each pixel.
[0005] Early graphics systems failed to display images in a sufficiently realistic manner
to represent or model complex three-dimensional objects. The images displayed by such
systems exhibited extremely smooth surfaces absent textures, bumps, scratches, shadows
and other surface details present in the object being modeled.
As a result, methods were developed to display images with improved surface detail.
Texture mapping is one such method that involves mapping a source image, referred
to as a texture, onto a surface of a three-dimensional object, and thereafter mapping
the textured three-dimensional object to the two-dimensional graphics display screen
to display the resulting image. Surface detail attributes commonly texture mapped
include color, specular reflection, vector perturbation, specularity, transparency,
shadows, surface irregularities and grading.
[0006] Texture mapping involves applying one or more point elements (texels) of a texture
to each point element (pixel) of the displayed portion of the object to which the
texture is being mapped. Texture mapping hardware is conventionally provided with
information indicating the manner in which the texels in a texture map correspond
to the pixels on the display screen that represent the object. Each texel in a texture
map is defined by S and T coordinates which identify its location in the two-dimensional
texture map. For each pixel, the corresponding texel or texels that map to it are
accessed from the texture map, and incorporated into the final R,G,B values generated
for the pixel to represent the textured object on the display screen.
[0007] It should be understood that each pixel in an object primitive may not map in one-to-one
correspondence with a single texel in the texture map for every view of the object.
For example, the closer the object is to the view port represented on the display
screen, the larger the object will appear. As the object appears larger on the display
screen, the representation of the texture becomes more detailed. Thus, when the object
consumes a fairly large portion of the display screen, a large number of pixels is
used to represent the object on the display screen, and each pixel that represents
the object may map in one-to-one correspondence with a single texel in the texture
map, or a single texel may map to multiple pixels. However, when the object takes
up a relatively small portion of the display screen, a much smaller number of pixels
is used to represent the object, resulting in the texture being represented with less
detail, so that each pixel may map to multiple texels. Each pixel may also map to
multiple texels when a texture is mapped to a small portion of an object. Resultant
texel data is calculated for each pixel that maps to more than one texel, and typically
represents an average of the texels that map to that pixel.
[0008] Texture mapping hardware systems typically include a local memory that stores data
representing a texture associated with the object being rendered. As discussed above,
a pixel may map to multiple texels. If it were necessary for the texture mapping hardware
to read a large number of texels that map to a pixel from the local memory to generate
an average value, then a large number of memory reads and the averaging of many texel
values would be required, which would be time consuming and would degrade system performance.
[0009] To overcome this problem, a scheme has been developed that involves the creation
of a series of MIP maps for each texture, and storing the MIP maps of the texture
associated with the object being rendered in the local memory of the texture mapping
hardware. A MIP map for a texture includes a base map that corresponds directly to
the texture map as well as a series of filtered maps wherein each successive map is
reduced in size by a factor of two in each of the two texture map dimensions. An illustrative
example of a set of MIP maps is shown in Fig. 1. The MIP (multum in parvo, meaning many things
in a small place) maps include a base map 100 that is eight-by-eight texels in size,
as well as a series of maps 102, 104 and 108 that are respectively four-by-four texels,
two-by-two texels, and one texel in size.
[0010] The four-by-four map 102 is generated by box filtering (decimating) the base map
100, such that each texel in the map 102 corresponds to an average of four texels
in the base map 100. For example, the texel 110 in map 102 equals the average of the
texels 112-115 in map 100, and texels 118 and 120 in map 102 respectively equal the
averages of texels 121-124 and 125-128 in map 100. The two-by-two map 104 is similarly
generated by box filtering map 102, such that texel 130 in map 104 equals the average
of texels 110 and 118-120 in map 102. The single texel in map 108 is generated by
averaging the four texels in map 104.
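By way of illustration only, the box filtering described above can be sketched in Python as follows. The function names, the use of a list of lists for a map, and the scalar texel values are assumptions made for this sketch and are not taken from the described hardware.

```python
def box_filter(level):
    """Halve a square texel map in each dimension by averaging 2x2 blocks."""
    half = len(level) // 2
    return [
        [
            (level[2 * t][2 * s] + level[2 * t][2 * s + 1]
             + level[2 * t + 1][2 * s] + level[2 * t + 1][2 * s + 1]) / 4.0
            for s in range(half)
        ]
        for t in range(half)
    ]


def build_mip_chain(base):
    """Return [base, half size, quarter size, ..., 1x1] for a power-of-two square map."""
    chain = [base]
    while len(chain[-1]) > 1:
        chain.append(box_filter(chain[-1]))
    return chain


# An 8x8 base map (like map 100) filled with illustrative scalar values.
base_map = [[float(8 * t + s) for s in range(8)] for t in range(8)]
chain = build_mip_chain(base_map)
sizes = [len(level) for level in chain]  # 8x8, 4x4, 2x2 and 1x1 maps
```

Each entry of the second map in the chain is the average of four entries of the base map, in the same way that texel 110 is the average of texels 112-115.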
[0011] Conventional graphics systems generally download, from the main memory of the host
computer to the local memory of the texture mapping hardware, the complete series
of MIP maps for any texture that is to be used with the primitives to be rendered
on the display screen. Thus, the texture mapping hardware can access texture data
from any of the series of MIP maps. The determination of which map to access to provide
the texel data for any particular pixel is based upon the number of texels to which
the pixel maps. For example, if the pixel maps in one-to-one correspondence with a
single texel in the texture map, then the base map 100 is accessed. However, if the
pixel maps to four, sixteen or sixty-four texels, then the maps 102, 104 and 108 are
respectively accessed because those maps respectively store texel data representing
an average of four, sixteen and sixty-four texels in the texture map.
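The relationship between the number of texels covered by a pixel and the map that is accessed can be sketched as follows. The function name `mip_level` and the logarithmic formula are assumptions chosen so that one, four, sixteen and sixty-four texels select levels 0 through 3 (maps 100, 102, 104 and 108); the source does not state how the hardware performs this computation.

```python
import math


def mip_level(texels_per_pixel, num_levels=4):
    """Map a texel footprint to a (possibly fractional) MIP level.

    Each level covers four times as many base texels per entry as the
    previous one, so the level grows with log base 4 of the footprint.
    """
    if texels_per_pixel <= 1.0:
        return 0.0
    level = 0.5 * math.log2(texels_per_pixel)  # log4(n) == log2(n) / 2
    return min(level, float(num_levels - 1))   # clamp to the smallest map
```

A fractional result, such as 0.5 for a footprint of two texels, indicates a pixel that falls between two maps.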
[0012] A pixel may not map directly to any one texel in the selected map, and may fall between
two or more texels. Some graphics systems employ bi-linear interpolation to accurately
produce texel data when this occurs. If a pixel maps into a MIP map between two or
more texel entries, then the resulting texel data used is a weighted average of the
closest texel entries. Thus, the texel data corresponding to any pixel can be the
weighted average of as many as four texel entries in a single map. For example, if
a pixel maps to a location in map 102 indicated at 132, the resulting texel data mapping
to that pixel would be the weighted average of the texels 110 and 118-120.
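A minimal sketch of bi-linear interpolation within a single map is given below. It assumes scalar texel values, coordinates expressed in texel units with texel centers at integer positions, and clamping at the map edges; these conventions are illustrative assumptions rather than details of the described hardware.

```python
import math


def bilinear_sample(texmap, s, t):
    """Weighted average of the (up to) four texels nearest to (s, t)."""
    size = len(texmap)
    # Clamp so that the four neighbours always lie inside the map.
    s = min(max(s, 0.0), size - 1.0)
    t = min(max(t, 0.0), size - 1.0)
    s0, t0 = int(math.floor(s)), int(math.floor(t))
    s1, t1 = min(s0 + 1, size - 1), min(t0 + 1, size - 1)
    fs, ft = s - s0, t - t0  # fractional distances used as weights
    top = (1.0 - fs) * texmap[t0][s0] + fs * texmap[t0][s1]
    bottom = (1.0 - fs) * texmap[t1][s0] + fs * texmap[t1][s1]
    return (1.0 - ft) * top + ft * bottom
```

A point midway between four texels receives equal weights of one quarter, while a point that coincides with a texel center returns that texel unchanged.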
[0013] Pixels may also not map directly into any one of the maps in the series of MIP maps,
and may fall between two maps. For example, a pixel may map to a number of texels
in the texture map that is greater than one but less than four. Some graphics systems
address this situation by interpolating between the two closest MIP maps to achieve
the resultant texel data. For the example above wherein a pixel maps to greater than
one but less than four texels in the texture map, the texel data provided by maps
100 and 102 would be interpolated to achieve the resultant texel data for the pixel.
When combined with the above-described interpolation of multiple texel entries in
a single map, this scheme is known as tri-linear interpolation, and can lead to resultant
texel data for any one pixel being generated as a weighted average of as many as eight
texels, i.e., the four closest texels in each of the two closest maps.
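The blending between the two closest maps can be sketched as follows. The sketch assumes that a fractional level has already been determined for the pixel, that texture coordinates `u` and `v` are normalized to the range 0 to 1, and that an in-map sampler is supplied by the caller; the `nearest` sampler is a hypothetical stand-in used only to keep the example short, and a bi-linear sampler would be supplied in its place to obtain tri-linear interpolation proper.

```python
import math


def trilinear_sample(chain, u, v, level, sample_in_map):
    """Blend samples from the two MIP maps that bracket a fractional level."""
    level = min(max(level, 0.0), len(chain) - 1.0)
    lower = int(math.floor(level))
    upper = min(lower + 1, len(chain) - 1)
    frac = level - lower  # weight given to the smaller (coarser) map
    fine = sample_in_map(chain[lower], u, v)
    coarse = sample_in_map(chain[upper], u, v)
    return (1.0 - frac) * fine + frac * coarse


def nearest(texmap, u, v):
    """Hypothetical in-map sampler: pick the texel containing (u, v)."""
    size = len(texmap)
    col = min(int(u * size), size - 1)
    row = min(int(v * size), size - 1)
    return texmap[row][col]
```

When the supplied sampler reads four texels per map, the result is a weighted average of as many as eight texels, as described above.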
[0014] In pipelined systems, in which various operations are performed simultaneously on
different object primitives by different system elements, it often is necessary to
buffer data being transferred between different chips or boards of the system. It
is desirable in such systems to reduce the size, cost and complexity of the buffering
hardware.
Summary of the Invention
[0015] In one embodiment of the invention, in a texture mapping computer graphics system,
a method is provided for transferring texture data including a plurality of texels
from a texture mapping chip to multiple frame buffer controller chips. The method
includes the following steps: receiving texels from the texture mapping chip, each
texel being destined for a particular frame buffer controller chip; temporarily storing
a limited number of texels in a texel array storage unit; shifting each texel into
each of a plurality of first storage registers, wherein each frame buffer controller
chip has a corresponding first storage register; and transferring each texel from
the first storage registers into each frame buffer controller chip.
[0016] In one embodiment, the step of transferring further includes the step of transferring
a portion of each texel from the first storage registers into second storage registers
and then transferring the portions of the texels from the second storage registers
into the frame buffer controller chips.
[0017] In one embodiment, the method further includes, for each texel stored in the texel
array storage unit, the step of storing an address of the texel array storage unit
within one of a plurality of address storage units, wherein each frame buffer controller
chip has a corresponding address storage unit.
[0018] In one embodiment, the method further includes, during the step of receiving texels,
the step of determining, for each texel, for which frame buffer controller chip the
texel is destined. The method also includes, before the step of shifting, the step
of determining the highest priority frame buffer controller chip for receiving a texel.
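The steps of the method summarized above can be modeled in software roughly as follows. This is a behavioral sketch only: the class and method names, the array size, the free-list bookkeeping, and in particular the priority rule (the controller with the most waiting texels wins, with ties going to the lowest index) are assumptions made for illustration and are not specified in the foregoing summary.

```python
from collections import deque


class TexelInterface:
    """Behavioral model of a texel buffer feeding several controllers."""

    def __init__(self, num_controllers=5, array_size=8):
        self.array = [None] * array_size      # texel array storage unit
        self.free = deque(range(array_size))  # unused array addresses
        # One address storage unit and one first storage register per chip.
        self.addresses = [deque() for _ in range(num_controllers)]
        self.registers = [None] * num_controllers

    def receive(self, texel, controller):
        """Store a texel and record its array address for its destination."""
        if not self.free:
            return False  # array full; the sender would have to wait
        addr = self.free.popleft()
        self.array[addr] = texel
        self.addresses[controller].append(addr)
        return True

    def highest_priority(self):
        """Assumed rule: most waiting texels wins, lowest index on a tie."""
        ready = [c for c in range(len(self.registers))
                 if self.addresses[c] and self.registers[c] is None]
        if not ready:
            return None
        return max(ready, key=lambda c: (len(self.addresses[c]), -c))

    def shift(self):
        """Move one texel from the array into its controller's register."""
        c = self.highest_priority()
        if c is None:
            return None
        addr = self.addresses[c].popleft()
        self.registers[c] = self.array[addr]
        self.array[addr] = None
        self.free.append(addr)
        return c

    def transfer(self, controller):
        """Hand the registered texel to the controller, emptying the register."""
        texel, self.registers[controller] = self.registers[controller], None
        return texel
```

Because each address storage unit holds only addresses into the shared array, the array itself can be kept small, which is consistent with the stated goal of reducing the size of the buffering hardware.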
[0019] In another embodiment of the invention, a texture mapping computer graphics system
includes a texture mapping chip that stores a plurality of texels and multiple frame
buffer controller chips that process the texels. An interface between the texture
mapping chip and the frame buffer controller chips includes a texel array storage unit,
coupled between a portion of the texture mapping chip and the frame buffer controller
chips, that temporarily stores a limited number of texels, each texel being destined
for a particular frame buffer controller chip. A control unit, coupled to the texel
array storage unit, controls shifting texels from the texture mapping chip into locations
within the texel array storage unit and transferring texels from the texel array storage
unit into appropriate frame buffer controller chips.
[0020] In an embodiment, the interface further includes a plurality of address storage units,
coupled to the control unit, that store addresses of locations within the texel array
storage unit in which texels are stored, wherein each address storage unit corresponds
to a different frame buffer controller chip.
[0021] In an embodiment, the control unit includes a first portion that controls shifting
texels from a texel interpolator into locations within the texel array storage unit
and a second portion that controls transferring texels from the texel array storage
unit into appropriate frame buffer controller chips.
[0022] In an embodiment, the first portion of the control unit includes a decoder, coupled
to the texture mapping chip, that determines to which frame buffer controller chip
each texel is destined and, for each texel, enables writing the texel array storage
unit address to the corresponding address storage unit. The second portion of the
control unit includes a priority decoder that determines relative priorities of the
frame buffer controller chips for receiving texels from the texel array storage unit.
[0023] In an embodiment, the interface further includes a plurality of registers, one register
corresponding to each frame buffer controller chip, each register coupled between
the texel array storage unit and the corresponding frame buffer controller chip, for
temporarily storing texels destined for the frame buffer controller chip.
[0024] An even further embodiment of the invention is directed to a texture mapping computer
graphics system including a texture mapping chip that stores a plurality of texels.
Also included is a plurality of frame buffer controller chips, coupled to the texture
mapping chip, each frame buffer controller chip receiving and processing different
texels from the texture mapping chip. A texel array storage unit, coupled between
the texture mapping chip and the frame buffer controller chips, temporarily stores
texels when the texels are transferred from the texture mapping chip to the frame
buffer controller chips.
[0025] The system further includes a control unit, coupled to the texel array storage unit,
having a first portion that controls shifting texels from the texture mapping chip
into locations within the texel array storage unit, and a second portion that controls
transferring texels from the texel array storage unit into appropriate frame buffer
controller chips.
[0026] In an embodiment, the system further includes a plurality of address storage units,
coupled to the control unit, that store addresses of locations within the texel array
storage unit in which texels are stored, wherein each address storage unit corresponds
to a different frame buffer controller chip.
[0027] In an embodiment, the first portion of the control unit includes a decoder, coupled
to the texture mapping chip, that determines to which frame buffer controller chip
each texel is destined and, for each texel, enables writing the texel array storage
unit address to the corresponding address storage unit. The second portion of the
control unit includes a priority decoder that determines relative priorities of the
frame buffer controller chips for receiving texels from the texel array storage unit.
Brief Description of the Drawings
[0028] For a better understanding of the present invention, reference is made to the accompanying
drawings, which are incorporated herein by reference and in which:
Fig. 1 is a graphical illustration of a set of texture MIP maps;
Fig. 2 is a block diagram of one embodiment of an overall computer graphics system;
Fig. 2A is a block diagram of another embodiment of an overall computer graphics system;
Fig. 3 is a block diagram of texture mapping hardware;
Fig. 4 is a more detailed block diagram of the parameter interpolator element of the
texture mapping hardware of Fig. 3;
Fig. 5 is a block diagram illustrating one embodiment of the screen space interleaving
for the frame buffer controller chips according to the present invention;
Fig. 6 is a block diagram illustrating the texel buffering required within the interface
to the frame buffer controller chips on the texture mapping chip, according to the
present invention;
Fig. 7 is a block diagram of the interface to the frame buffer controller chips on
the texture mapping chip;
Fig. 8 is a block diagram of a first portion of the control unit within the frame
buffer controller chips interface; and
Fig. 9 is a block diagram of a second portion of the control unit within the frame
buffer controller chips interface.
Detailed Description
I. System Overview
[0029] Fig. 2 is a block diagram of one embodiment of a graphics system that includes texture
mapping hardware. The present invention is directed to an interface for buffering
data transferred between boards. It should be understood that the illustrative implementation
shown is merely exemplary with respect to the number of boards and chips, the manner
in which they are partitioned, the bus widths, and the data transfer rates. Numerous
other implementations can be employed.
[0030] As shown, the system includes a front end board 10, a texture mapping board 12, and
a frame buffer board 14. The front end board communicates with a host computer 15
over a 52-bit bus 16. The front end board receives primitives to be rendered from
the host computer over bus 16. The primitives are specified by x,y,z vector coordinate
data, R,G,B color data and texture S,T coordinates, all for portions of the primitives,
such as for the vertices when the primitive is a triangle. Data representing the primitives
in three dimensions then is provided by the front end board 10 to the texture mapping
board 12 and the frame buffer board 14 over 85-bit bus 18. The texture mapping board
interpolates the primitive data received to compute the screen display pixels that
will represent the primitive, and determines corresponding resultant texture data
for each primitive pixel. The resultant texture data is provided to the frame buffer
board over five 11-bit buses 28, which are shown in Fig. 2 as a single bus to clarify
the figure. As will be described in detail herein, the present invention relates to
an interface on the texture mapping board 12 for buffering resultant texture data
destined for the frame buffer board.
[0031] The frame buffer board 14 also interpolates the primitive data received from the
front end board 10 to compute the pixels on the display screen that will represent
each primitive, and to determine object color values for each pixel. The frame buffer
board then combines, on a pixel by pixel basis, the object color values with the resultant
texture data provided from the texture mapping board, to generate resulting image
R,G,B values for each pixel. R,G,B color control signals for each pixel are respectively
provided over R,G,B lines 29 to control the pixels of the display screen (not shown)
to display a resulting image on the display screen that represents the texture mapped
primitive.
[0032] The front end board 10, texture mapping board 12 and frame buffer board 14 each is
pipelined and operates on multiple primitives simultaneously. While the texture mapping
and frame buffer boards operate on primitives previously provided by the front end
board, the front end board continues to operate upon and provide new primitives until
the pipelines in the boards 12 and 14 become full.
[0033] The front end board 10 includes a distributor chip 30, three three-dimensional (3-D)
geometry accelerator chips 32A, 32B and 32C, a two-dimensional (2-D) geometry accelerator
chip 34 and a concentrator chip 36. The distributor chip 30 receives the X,Y,Z coordinate
and color primitive data over bus 16 from the host computer, and distributes 3-D primitive
data evenly among the 3-D geometry accelerator chips 32A, 32B and 32C. In this manner,
the system bandwidth is increased because three groups of primitives are operated
upon simultaneously. Data is provided over 40-bit bus 38A to the 3-D geometry accelerator
chips 32A and 32B, and over 40-bit bus 38B to chip 32C. Both buses 38A and 38B transfer
data at a rate of 60 MHz and provide sufficient bandwidth to support two 3-D geometry
accelerator chips. 2-D primitive data is provided over a 44-bit bus 40 to the 2-D
geometry accelerator chip 34 at a rate of 40 MHz.
[0034] Each 3-D geometry accelerator chip transforms the x,y,z coordinates that define the
primitives received into corresponding screen space coordinates, determines object
R,G,B values and texture S,T values for the screen space coordinates, decomposes primitive
quadrilaterals into triangles, and computes a triangle plane equation to define each
triangle. Each 3-D geometry accelerator chip also performs view clipping operations
to ensure an accurate screen display of the resulting image when multiple windows
are displayed, or when a portion of a primitive extends beyond the view volume represented
on the display screen. Output data from the 3-D geometry accelerator chips 32A, 32B
and 32C respectively is provided over 44-bit buses 42A, 42B and 42C to concentrator
chip 36 at a rate of 60 MHz. Two-dimensional geometry accelerator chip 34 also provides
output data to concentrator chip 36 over a 46-bit bus 44 at a rate of 45 MHz. Concentrator
chip 36 combines the 3-D primitive output data received from the 3-D geometry accelerator
chips 32A-C, re-orders the primitives to the original order they had prior to distribution
by the distributor chip 30, and provides the combined primitive output data over bus
18 to the texture mapping and frame buffer boards.
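One way to picture the distribution and re-ordering of primitives is sketched below. The round-robin assignment and the sequence numbers attached to each primitive are assumptions made for this illustration; the description states only that primitives are distributed evenly and later restored to their original order.

```python
import heapq


def distribute(primitives, num_accelerators=3):
    """Tag each primitive with its arrival order and deal them out in turn."""
    queues = [[] for _ in range(num_accelerators)]
    for seq, prim in enumerate(primitives):
        queues[seq % num_accelerators].append((seq, prim))
    return queues


def concentrate(queues):
    """Merge the per-accelerator outputs back into the original order."""
    # Each queue is already sorted by sequence number, so a k-way merge
    # on the (unique) sequence tags restores the original ordering.
    return [prim for _seq, prim in heapq.merge(*queues)]
```

With seven primitives and three accelerators, the queues receive three, two and two primitives respectively, and merging them returns the primitives in the order in which they arrived.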
[0035] Texture mapping board 12 includes a texture mapping chip 46 and a local memory 48
which preferably is arranged as a cache memory. The local memory may be formed from
a plurality of SDRAM (synchronous dynamic random access memory) chips. The cache memory
48 stores texture MIP map data associated with the primitives being rendered in the
frame buffer board. The texture MIP map data is downloaded from a main memory 17 of
the host computer 15, over bus 40, through the 2-D geometry accelerator chip 34, and
over 24-bit bus 24.
[0036] The texture mapping chip 46 successively receives primitive data over bus 18 representing
the primitives to be rendered on the display screen. As discussed above, the primitives
provided from the 3-D geometry accelerator chips 32A-C include points, lines and triangles.
The texture mapping board does not perform texture mapping of points or lines, and
operates only upon triangle primitives. The data representing the triangle primitives
includes the x,y,z object pixel coordinates for at least one vertex, the object color
R,G,B values of the at least one vertex, the coordinates in S,T of the portions of
the texture map that correspond to the at least one vertex, and the plane equation
of the triangle. The texture mapping chip 46 ignores the object pixel z coordinate
and the object color R,G,B values. The chip 46 interpolates the x,y pixel coordinates
and interpolates S and T coordinates that correspond to each x,y screen display pixel
that represents the primitive. For each pixel, the texture mapping chip accesses the
portion of the texture MIP map that corresponds to the pixel from the cache memory,
and computes resultant texture data for the pixel, which may include a weighted average
of multiple texels.
[0037] The cache may store sixty-four blocks of 256x256 texels. Unlike the local memory
employed in the texture mapping hardware of prior art systems, the cache memory may
not store the entire series of MIP maps of the texture that maps to the primitive
being rendered, such as for large textures. Rather, the cache memory stores at any
one time only the particular portions of the series of MIP maps actually used in currently
rendering the primitive. Therefore, for most applications, only a portion of the complete
texture data for the image being rendered will be stored in the cache memory at any
one time.
[0038] The complete series of MIP maps for each texture is arranged and stored in the main
memory 17 of the host computer 15. For each pixel of the primitive being rendered,
the texture mapping chip 46 accesses a directory of the cache memory 48 to determine
whether the corresponding texel or texels of the texture MIP maps currently are present
in the cache. If the corresponding texels are stored in the cache memory at the time
of the access, then a cache hit occurs, and the texels are read from the cache and
operated upon by the texture mapping chip 46 to compute the resultant texture data
which is passed to the frame buffer board. If, however, the corresponding texels for
the current primitive pixel are not stored in the cache memory when accessed by the
texture mapping chip 46, a cache miss occurs. When a cache miss occurs, the portion
of the texture MIP map data needed to render the primitive is downloaded from the
main memory 17 of the host computer 15 into the cache memory 48, possibly replacing
some data previously stored therein. Unlike conventional texture mapping systems that
download the entire series of MIP maps for any primitive being rendered, the present
system downloads only the portion of the series of MIP maps actually needed to currently
render the primitive or the currently rendered portion thereof. When a cache miss
occurs, an interrupt control signal is generated by the texture mapping chip 46 to
initiate a texture interrupt manager in the host computer 15. The interrupt control
signal is provided over line 94 to the distributor chip 30, which in turn provides
an interrupt signal over line 95 to the host computer.
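The hit and miss behavior of the cache can be sketched as follows. The class name, the block tags, the `fetch_block` callback standing in for the host download, and the least-recently-used replacement rule are all assumptions made for this sketch; the description above states only that a download may replace data previously stored in the cache.

```python
from collections import OrderedDict


class TextureBlockCache:
    """Toy model of a texel block cache backed by host main memory."""

    def __init__(self, capacity, fetch_block):
        self.capacity = capacity
        self.fetch_block = fetch_block  # stands in for the host download
        self.blocks = OrderedDict()     # directory: block tag -> texel data
        self.misses = 0

    def read(self, tag):
        if tag in self.blocks:          # cache hit
            self.blocks.move_to_end(tag)
            return self.blocks[tag]
        self.misses += 1                # cache miss: request data from host
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # assumed policy: evict oldest
        data = self.fetch_block(tag)
        self.blocks[tag] = data
        return data
```

Only the blocks actually requested are ever fetched, which mirrors the point made above that the complete series of MIP maps need not be resident at any one time.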
[0039] The requested texture data is retrieved by the host computer from its main memory
and is downloaded to the cache memory 48 on the texture mapping board over bus 24, bypassing the 3-D primitive
rendering pipeline through the front end board and the texture mapping chip. Thus,
when a cache miss interrupt occurs, the front end board can continue to operate upon
3-D primitives and provide output primitive data over bus 18 to the texture mapping
chip and the frame buffer board, while the texture data associated with a primitive
that caused the cache miss is being downloaded from main memory 17. In contrast to
conventional texture mapping systems, the downloading of texture data to the texture
mapping hardware does not require a flushing of the 3-D primitive pipeline, thereby
increasing the bandwidth and performance of the system.
[0040] According to the present invention, the resultant texture data is buffered within
a frame buffer board interface on the texture mapping board in the texture mapping
chip before being shifted to the frame buffer board. The resultant texture data for
each pixel is stored temporarily in a RAM array (not shown) accessible by all five
frame buffer controller chips 50A-50E on the frame buffer board. From the RAM array,
resultant texture data is shifted into registers (not shown) accessible by all five
frame buffer controller chips 50A-50E in parallel through buses 28. As described in
detail below, an interface control unit (not shown) of the present invention coordinates
the transfer of resultant texture data from the texture mapping chip 46 to the five
frame buffer controller chips 50A-50E.
[0041] The frame buffer controller chips 50A-E respectively are coupled to groups of associated
VRAM (video random access memory) chips 51A-E. The frame buffer board further includes
four video format chips, 52A, 52B, 52C and 52D, and a RAMDAC (random access memory
digital-to-analog converter) 54.
[0042] As described in more detail below, frame buffer controller chips control different,
non-overlapping segments of the display screen. Each frame buffer controller chip
receives primitive data from the front end board over bus 18, and resultant texture
mapping data from the texture mapping board over bus 28. The frame buffer controller
chips interpolate the primitive data to compute the screen display pixel coordinates
in their respective segments that represent the primitive, and the corresponding object
R,G,B color values for each pixel coordinate. For those primitives (i.e., triangles)
for which resultant texture data is provided from the texture mapping board, the frame
buffer controller chips combine, on a pixel by pixel basis, the object color values
and the resultant texture data to generate final R,G,B values for each pixel to be
displayed on the display screen.
[0043] The manner in which the object and texture color values are combined can be controlled
in a number of different ways. For example, in a replace mode, the object color values
can be simply replaced by the texture color values, so that only the texture color
values are used in rendering the pixel. Alternatively, in a modulate mode, the object
and texture color values can be multiplied together to generate the final R,G,B values
for the pixel. Furthermore, a color control word can be stored for each texel that
specifies a ratio defining the manner in which the corresponding texture color values
are to be combined with the object color values. A resultant color control word can
be determined for the resultant texel data corresponding to each pixel and provided
to the frame buffer controller chips over bus 28 so that the controller chips can
use the ratio specified by the corresponding resultant control word to determine the
final R,G,B values for each pixel.
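The combining modes described above can be sketched as follows. Color components are assumed to be normalized to the range 0 to 1, and the linear blend used for the ratio case is an assumed interpretation of the color control word; the mode names passed as strings are likewise illustrative.

```python
def combine(object_rgb, texture_rgb, mode, ratio=0.5):
    """Combine object and texture colors for one pixel."""
    if mode == "replace":
        # Only the texture color is used.
        return tuple(texture_rgb)
    if mode == "modulate":
        # Object and texture colors are multiplied component by component.
        return tuple(o * t for o, t in zip(object_rgb, texture_rgb))
    if mode == "blend":
        # Assumed form: ratio is the fraction taken from the texture color.
        return tuple((1.0 - ratio) * o + ratio * t
                     for o, t in zip(object_rgb, texture_rgb))
    raise ValueError("unknown mode: " + mode)
```

A ratio of 0 in the blend case leaves the object color unchanged, and a ratio of 1 is equivalent to the replace mode.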
[0044] The resulting image video data generated by the frame buffer controller chips 50A-E,
including R,G,B values for each pixel, is stored in the corresponding VRAM chips 51A-E.
Each group of VRAM chips 51A-E includes eight VRAM chips, such that forty VRAM
chips are located on the frame buffer board. Each of the video format chips 52A-D is connected
to, and receives data from, a different set of ten VRAM chips. The video data is serially
shifted out of the VRAM chips and is respectively provided over 64-bit buses 58A,
58B, 58C, and 58D to the four video format chips 52A, 52B, 52C and 52D at a rate of
33 MHz. The video format chips format the video data so that it can be handled by
the RAMDAC and provide the formatted data over 32-bit buses 60A, 60B, 60C and 60D
to RAMDAC 54 at a rate of 33 MHz. RAMDAC 54, in turn, converts the digital color data
to analog R,G,B color control signals and provides the R,G,B control signals for each
pixel to a screen display (not shown) along R,G,B control lines 29.
[0045] Hardware on the texture mapping board 12 and the frame buffer board 14 can be replicated
so that certain primitive rendering tasks can be performed on multiple primitives
in parallel, thereby increasing the bandwidth of the system. An example of such an
alternate embodiment is shown in Fig. 2A, which is a block diagram of a computer graphics
system having certain hardware replicated. The system of Fig. 2A includes four 3-D
geometry accelerator chips 32A, 32B, 32C and 32D, two texture mapping chips 46A and
46B respectively associated with cache memories 48A and 48B, and ten frame buffer
controller chips 50A-50J, each with an associated group of VRAM chips. The operation
of the system of Fig. 2A is similar to that of the system of Fig. 2, described above.
The replication of the hardware in the embodiment of Fig. 2A allows for increased
system bandwidth because certain primitive rendering operations can be performed in
parallel on multiple primitives.
II. Texture Mapping Chip Overview
[0046] A block diagram of the texture mapping chip 46 is shown in Fig. 3. The chip 46 includes
a front end pipeline interface 60 that receives object and texture primitive data
from the front end board over 64-bit bus 18. The triangle primitives operated upon
by the texture mapping chip are defined by up to fifty-two 32-bit digital words but may
be defined by words of different lengths. The pipeline interface includes a set of
master registers and a set of corresponding slave registers. During rendering, the
master registers are filled sequentially with the fifty-two digital words of data
that define the primitive. Then, upon receipt of an appropriate rendering command,
the data is shifted into the slave registers in the pipeline interface, allowing,
in a pipelined fashion, the master registers to be filled with data representing another
primitive. The primitive data provided over bus 18 includes the x,y,z vector coordinate
data, the S,T texture coordinates and the R,G,B object color data for at least one
triangle vertex, as well as data representing the triangle plane equation. As discussed
above, the texture mapping chip ignores the object pixel z coordinate and the object
color R,G,B values, and stores only the other data in the front end pipeline interface
60.
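By way of illustration only, the master/slave register arrangement described above can be sketched in software as a double buffer. The class and method names below are hypothetical and are not taken from the described hardware; the sketch assumes only that the master set is filled word by word and is handed to the slave set when a rendering command is received.

```python
class PipelineInterface:
    """Illustrative model of the master/slave register pair (names are hypothetical)."""

    MAX_WORDS = 52  # up to fifty-two 32-bit words define a triangle primitive

    def __init__(self):
        self.master = []  # registers being filled with the incoming primitive
        self.slave = []   # registers holding the primitive being rendered

    def write_word(self, word):
        """Fill the next master register with one 32-bit word."""
        if len(self.master) >= self.MAX_WORDS:
            raise OverflowError("master registers are full")
        self.master.append(word & 0xFFFFFFFF)

    def render_command(self):
        """Shift the master contents into the slave registers and free the master set."""
        self.slave = self.master
        self.master = []
        return self.slave
```

Because the slave set holds the primitive being rendered, the master set is immediately free to accept the words of the next primitive.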
[0047] The slave registers of the pipeline interface 60 transfer the primitive data over
bus 62 to a parameter interpolator circuit 64. Parameter interpolator circuit 64 interpolates
each primitive triangle to determine, for each display screen pixel coordinate that
represents the triangle, the S,T texture map coordinates for the texture map that
maps to the pixel, and an S and T gradient value (ΔS, ΔT). The S and T gradients respectively
equal changes in the S and T coordinates between adjacent pixels, and are computed
in a manner discussed below.
[0048] The parameter interpolator circuit 64, shown in more detail in Fig. 4, includes an
edge stepper 66, a FIFO ("first-in, first-out") buffer 68, a span stepper 70 and a
gradient and perspective correction circuit 72, all connected in series. The edge
stepper starts at the x,y pixel coordinate of one of the triangle vertices, and utilizing
the triangle plane equation, steps the edges of the triangle to determine the pixel
coordinates that define the triangle edges. For each pixel coordinate, texture map
S and T coordinates are determined, based on the S,T values of the triangle vertices,
to identify which texels in the texture map correspond to each display screen pixel
coordinate. The pixel and texel coordinates temporarily are stored in the FIFO buffer
and then are provided to the span stepper. At each x,y pixel location along an edge
of the triangle, the span stepper steps across the corresponding span of the triangle
to determine the S,T texel coordinates for each pixel location along the span.
[0049] Each S and T coordinate for a display screen pixel may have an integer portion and
a fractional portion if the pixel does not map directly (in one-to-one correspondence)
to a single texel in one of the series of MIP maps for the texture. As explained above,
when mapped to the texture map, each display screen pixel may lie between multiple
texels in one of the series of MIP maps for the texture, and furthermore, may lie
between adjacent (in size) MIP maps in the series.
[0050] The gradient and perspective correction circuit 72 determines the gradient values
of S and T (ΔS, ΔT) for each display screen pixel. In one embodiment, gradient ΔS is
selected to be the larger of gradient ΔSx and gradient ΔSy, wherein gradient ΔSx is
the change in the S coordinate in the texture map as coordinate x changes between
adjacent pixels on the display screen, and gradient ΔSy is the change in the S coordinate
as coordinate y changes between adjacent pixels. Gradient ΔT is similarly computed.
The gradients ΔS, ΔT for a display screen pixel indicate the rate of change in coordinate
position within the texture map for a change of one pixel on the display screen in
the corresponding S,T dimension, and are used to determine which MIP map or maps should
be accessed to provide the resultant texture data for the pixel. For example, a gradient
equal to two for a display screen pixel indicates that the pixel maps to four (i.e.,
2², as discussed below) texels, so that the MIP map reduced in size by two from the
base map (e.g., the map 102 in Fig. 1) should be accessed to provide the resultant
texture data for the pixel. Thus, as the gradient increases, the size of the MIP map
that is accessed to provide the resultant texture data for the pixel is reduced.
[0051] A single gradient, equal to the larger of ΔS and ΔT, may be used to select the appropriate
MIP map for each pixel, such that the gradient equals the largest of ΔSx, ΔSy, ΔTx,
and ΔTy for the pixel. It should be understood, however, that the gradient can alternatively
be selected in a different fashion, such as by selecting the smallest of those values,
an average of those values, or some other combination. Since a single gradient is
selected that indicates the rate of change in only one of the S,T coordinates, the
square of the gradient represents the number of texels that map to the corresponding
pixel.
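The gradient selection just described can be sketched as follows. This is a minimal illustration, not the circuit itself: the function names are hypothetical, and taking the absolute value of each per-axis change is an assumption, since the text does not state how negative changes are handled.

```python
def select_gradient(dsx, dsy, dtx, dty):
    """Return the single gradient: the largest of the four per-axis changes.

    Assumption: magnitudes are compared, so each change is passed through abs().
    """
    ds = max(abs(dsx), abs(dsy))  # gradient in S is the larger of dSx and dSy
    dt = max(abs(dtx), abs(dty))  # gradient in T is computed the same way
    return max(ds, dt)


def texels_per_pixel(gradient):
    """The square of the gradient gives the number of texels mapping to one pixel."""
    return gradient ** 2
```

For example, per-axis changes of 2.0, 1.0, 0.5 and 1.5 yield a gradient of 2.0, which corresponds to four texels per pixel.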
[0052] From the gradient, the parameter interpolator determines the closest map to which
the pixel maps, and a value indicating by how much the pixel varies from mapping directly
to that map. The closest map is identified by the whole number portion of a map number, and
the value indicating by how much the pixel varies from a direct mapping is identified
by a fractional component of the map number.
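One way such a map number could be derived from the gradient is sketched below. The sketch rests on several assumptions that the text does not confirm: the base map is numbered zero, each successive map corresponds to a doubling of the gradient, the whole number portion is the largest power-of-two exponent not exceeding the gradient, and the fraction varies linearly between adjacent powers of two (so that a gradient of three lies half-way between the maps for gradients of two and four). The function name is hypothetical.

```python
import math


def map_number(gradient):
    """Split a gradient into (whole map number, fractional component).

    Assumptions: base map is map 0; gradients below one clamp to the base map;
    the fraction is linear between adjacent power-of-two gradients.
    """
    if gradient <= 1.0:
        return 0, 0.0
    mantissa, exponent = math.frexp(gradient)  # gradient == mantissa * 2**exponent
    whole = exponent - 1                       # floor(log2(gradient))
    fraction = 2.0 * mantissa - 1.0            # (gradient - 2**whole) / 2**whole
    return whole, fraction
```

Under these assumptions a gradient of two selects map 1 exactly, while a gradient of three selects map 1 with a fractional component of one half.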
[0053] Referring again to the block diagram of the texture mapping chip in Fig. 3, the texel
data output from the parameter interpolator circuit 64 is provided over line 70 to
a tiler and boundary checker 72, which determines the address of the four texels that
are closest to the position in each of the texture maps specified by the texel data,
and checks to determine whether each is within the boundary of the texture. The texel
data includes the interpolated S, T coordinates (integer and fractional values) as
well as the map number and map fraction. The tiler uses the integer portion of the
S and T coordinates computed by the parameter interpolator 64, and adds one to the
integer portion of each to generate the addresses of the four closest texels. The
boundary checker then determines whether the S,T coordinates for any of these four
texels fall outside the boundary of the texture map. If a display screen pixel maps
to an S,T coordinate position that falls outside the boundary of the texture map,
then one of several texture mapping schemes is implemented to determine whether any
resultant texture data is to be generated for that pixel, and how that data is to
be generated. Examples of such schemes include wrapping (a repeat of the texture),
mirroring (a repeat of the mirror image of the texture), turning off texture mapping
outside the boundary, and displaying a solid color outside the boundary.
[0054] The capability of allowing a pixel to map to a location in a texture map that is
beyond its boundary provides flexibility in the manner in which textures can be mapped
to object primitives. It may be desirable to map a texture to an object in a repeating
fashion, such that the texture is mapped to multiple portions of the object. For example,
if a texture is defined having S,T coordinates ranging from [0, 0] inclusive through
(10, 10) non-inclusive, a user could specify certain portions of the object to map
to S,T coordinates [10, 10] inclusive through (20, 20) non-inclusive. The notation
of the bracketed inclusive coordinates indicates that those coordinates are included
in the portion of the texture mapped to the object, whereas the object maps to only
the S,T coordinates up to but not including the non-inclusive coordinates in parentheses.
If the wrapping feature is selected for S,T coordinates falling outside the boundary
of the texture, pixels having S,T coordinates [10, 10] through (20, 20) would respectively
map to the texels at S,T coordinates [0, 0] through (10, 10).
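The tiler and boundary handling described above can be illustrated with a short sketch. The function names are hypothetical. The wrapping rule follows the example in the text; the mirroring rule shown is one common formulation applied to integer texel indices and is an assumption rather than the described circuit.

```python
import math


def four_closest(s, t):
    """Use the integer portions of S and T, and each plus one, as the four texel addresses."""
    s0, t0 = math.floor(s), math.floor(t)
    return [(s0, t0), (s0 + 1, t0), (s0, t0 + 1), (s0 + 1, t0 + 1)]


def wrap(coord, size):
    """Repeat the texture: coordinates in [size, 2*size) map back onto [0, size)."""
    return coord % size


def mirror(coord, size):
    """Repeat the mirror image of the texture (assumed formulation, integer indices)."""
    period = coord % (2 * size)
    return period if period < size else 2 * size - 1 - period
```

With a texture ten texels wide, wrapping maps coordinate 10 to texel 0 and coordinate 19 to texel 9, matching the [10, 10) through (20, 20) example, whereas mirroring maps coordinate 10 to texel 9.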
[0055] As discussed above, the resultant texture data from a two-dimensional texture map
for a single pixel may be the result of a combination of as many as eight texels,
i.e., the four closest texels in the two closest MIP maps. There are a number of ways
in which the eight texels can be combined to generate the resultant texel data. For
example, the single closest texel in the closest map can be selected, so that no averaging
is required. Alternatively, the single closest texel in each of the two closest maps
can be averaged together based on the value of the gradient. Such schemes do not map
the texture as accurately as when the eight closest texels are averaged.
[0056] Trilinear interpolation may be supported wherein the resultant texture data for a
single pixel may be calculated as a weighted average of as many as eight texels. The
gradient representing rates of change of S,T is used to identify the two closest MIP
maps from which to access texture data, and the four closest texels within each map
are accessed. The average of the four texels within each map is weighted based on
which texels are closest to the S,T coordinates of the position in the MIP map that
the display screen pixel maps to. The fractional portion of the S and T coordinates
for the pixel are used to perform this weighting. The average value from each of the
two closest MIP maps is then weighted based upon the value of the gradient. A fractional
value is computed from the gradient for use in this weighting process. For example,
a gradient of three is half-way between the MIP maps that respectively correspond
to gradients of two and four.
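The weighting described for trilinear interpolation can be sketched for a single color channel as follows. The function names and argument layout are hypothetical; the sketch assumes the four texels of each map are supplied in the order (s0, t0), (s0+1, t0), (s0, t0+1), (s0+1, t0+1), and that the fractional S and T values for each map and the map fraction have already been computed.

```python
def bilinear(texels, frac_s, frac_t):
    """Weighted average of four texels using the fractional S and T coordinates."""
    t00, t10, t01, t11 = texels
    row0 = t00 * (1.0 - frac_s) + t10 * frac_s  # blend along S at t0
    row1 = t01 * (1.0 - frac_s) + t11 * frac_s  # blend along S at t0 + 1
    return row0 * (1.0 - frac_t) + row1 * frac_t


def trilinear(near, far, map_fraction):
    """Blend the bilinear results of the two closest MIP maps by the map fraction.

    near and far are (texels, frac_s, frac_t) tuples for the two maps.
    """
    near_value = bilinear(*near)
    far_value = bilinear(*far)
    return near_value * (1.0 - map_fraction) + far_value * map_fraction
```

A map fraction of zero reproduces the bilinear result of the nearer map alone, which corresponds to the case in which the pixel maps directly to that map.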
[0057] The texel interpolation process is performed by the texel interpolators 76. The fractional
portions of the S and T coordinates for each display screen pixel are provided from
the parameter interpolators, through the tiler/boundary checker, to texel interpolator
76 over lines 74. The fractional portions are used by the texel interpolator to determine
the weight afforded each texel during interpolation of the multiple texels when computing
resultant texel data.
[0058] As discussed above, texture MIP maps associated with a primitive being rendered are
stored locally in the cache memory 48 (Fig. 2). The cache may be fully associative.
The cache may include eight SDRAM chips divided into four interleaves, with two SDRAM
chips in each interleave. Four separate controllers are provided, with one corresponding
to each interleave so that the SDRAM chips within each interleave can be accessed
simultaneously. Each SDRAM chip includes two distinct banks of memory in which different
pages of memory can be accessed in consecutive read cycles without incurring repaging
penalties commonly associated with accessing data from two different pages (i.e.,
from two different row addresses) in a conventional DRAM.
[0059] The texture data (i.e., the MIP maps) may be divided into texel blocks of data that
each includes 256x256 texels. The cache memory can store as many as sixty-four blocks
of data at one time. Each block has an associated block tag that uniquely identifies
the block. The cache includes a cache directory 78 that stores the block tags that
correspond to the blocks of data currently stored in the cache. Each block tag includes
a texture identifier (texture ID) that identifies the particular texture that the
block of data represents, a map number that identifies the particular MIP map within
the texture's series of maps that the block of data represents, and high-order S and
T coordinates that identify the location of the block of data within the particular
map. The physical location of the block tag within the cache directory represents
the location of the corresponding block of data within the cache memory.
[0060] MIP maps from more than one texture may be stored in the cache memory simultaneously,
with the texture identifier distinguishing between the different textures. Some MIP
maps contain fewer than 256x256 texels, and therefore, do not consume an entire block
of data. For example, the smaller maps in a series of MIP maps or even the larger
maps for small textures may not exceed 256x256 texels. To efficiently utilize memory
space, portions of multiple maps may be stored in a single block of texture data,
with each map portion being assigned to a sub-block within the block. Each of the
multiple maps stored within a single block has an associated sub-texture identifier
(ID) that identifies the location of the map within the block.
[0061] During rendering, the tiler/boundary checker 72 generates a read cache tag for the
block of texture data that maps to the pixel to be rendered. The tags are 23-bit fields
that include eight bits representing the texture ID of the texture data, a bit used
in determining the map number of the texture data, and the seven high-order bits of each
of the S and T coordinates of the texture data. The cache directory 78 compares the read cache
tag provided from the tiler/boundary checker with the block tags stored in the directory to
determine whether the block of texture data to be used in rendering is in the cache
memory. If the block tag of the texture data that maps to the primitive to be rendered
is stored in (i.e., hits) the cache directory, then the cache directory generates
a block index that indicates the physical location of the block of texture data in
the cache that corresponds to the hit tag. A texel address is also generated by the
tiler/boundary checker 72 for each texel to be read from the cache and indicates the
location of the texel within the block. The texel address includes low-order address
bits of the interpolated S,T coordinates for larger size maps, and is computed based
on an algorithm described below for smaller size maps. The block index and texel address
together comprise the cache address which indicates the location of the texel within
the cache. The LSBs of the S and T coordinates for each texel are decoded to determine
in which of four cache interleaves the texel is stored, and the remaining bits of
the cache address are provided to the texel cache access circuit 82 along with a command
over line 84 to read the texel data stored at the addressed location in the cache.
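The tag comparison described above can be sketched as follows. The 23-bit width and the field sizes (eight, one, seven and seven bits) come from the text; the order in which the fields are packed, the encoding of the interleave from the coordinate LSBs, and all names are assumptions made only for illustration.

```python
def make_tag(texture_id, map_bit, s_high, t_high):
    """Pack a 23-bit read cache tag (field order is an assumption)."""
    assert 0 <= texture_id < 256 and map_bit in (0, 1)
    assert 0 <= s_high < 128 and 0 <= t_high < 128
    return (texture_id << 15) | (map_bit << 14) | (s_high << 7) | t_high


def interleave_of(s, t):
    """Decode the LSBs of S and T into one of four interleaves (assumed encoding)."""
    return ((t & 1) << 1) | (s & 1)


class CacheDirectory:
    """Holds one block tag per cache block; the tag's position is the block index."""

    def __init__(self, blocks=64):
        self.tags = [None] * blocks

    def lookup(self, tag):
        """Return the block index on a hit, or None on a miss."""
        for index, stored in enumerate(self.tags):
            if stored == tag:
                return index
        return None
```

On a hit, the returned index plays the role of the block index that is combined with the texel address to form the cache address.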
[0062] When the read cache tag does not match any of the block tags stored in the cache
directory 78, a miss occurs and the cache directory 78 generates an interrupt control
signal over line 94 (Fig. 2) to the distributor chip 30 on the front end board, which
generates an interrupt over line 95 to the host computer 15. In response to the interrupt,
the processor 19 of the host computer executes a service routine which reads the missed
block tag from the cache directory and downloads the corresponding block of texture
data into the cache memory in a manner that bypasses the 3-D primitive pipeline in
the front end board 10 and the texture mapping chip 46. The texture data downloaded
from the main memory is provided over bus 24, through the texel port 92 (Fig. 3) to
the texel cache access circuit 82, which writes the data to the SDRAMs that form the
cache memory.
[0063] When a cache miss occurs, the texture mapping chip waits for the new texture data
to be downloaded before proceeding with processing the primitive on which the miss
occurred. The stages of the pipeline that follow the cache read continue to process
those primitives received prior to the miss primitive. Similarly, the stages of the
pipeline that precede the cache read also continue to process primitives unless and
until the pipeline fills up behind the cache read operation while awaiting the downloading
of the new texture data.
[0064] During rendering, the later stages of the pipeline in the frame buffer board 14 do
not proceed with processing a primitive until the texture data corresponding to the
primitive is received from the texture mapping board. Therefore, when a cache miss
occurs and the texture mapping chip waits for the new texture data to be downloaded,
the frame buffer board 14 similarly waits for the resultant texture data to be provided
from the texture mapping chip. As with the texture mapping chip, the stages of the
pipeline that follow the stage that receives the texture mapping data continue to
process those primitives received prior to the miss primitive, and the stages of the
pipeline that precede the stage that receives texture mapping data also continue to
process primitives unless and until the pipeline fills up.
[0065] It should be understood that when the pipeline of either the texture mapping board
or the frame buffer board backs up when waiting for new texture data in response to
a cache miss, the pipeline in the front end board 10 will similarly back up. Because
cache misses will occur and will result in an access to the host computer main memory
and a downloading of texture data that will take several cycles to complete, it is
desirable to ensure that the pipeline in the texture mapping chip never has to wait
because the pipeline in the frame buffer board has become backed up. Therefore the
frame buffer board can be provided with a deeper primitive pipeline than the texture
mapping board, so that the texture mapping pipeline should not be delayed by waiting
for the frame buffer pipeline to become available.
[0066] In one embodiment, the capability is provided to turn off texture mapping. This is
accomplished by software operating on the processor 19 of the host computer to set
a register in both the texture mapping board 12 and the frame buffer board 14. When
set to turn off texture mapping, these registers respectively inhibit the texture
mapping chip 46 from providing texture data to the frame buffer board 14, and instruct
the frame buffer board to proceed with rendering primitives without waiting for texture
data from the texture mapping board.
[0067] As described above, for each display screen pixel that is rendered with texture data
from a two-dimensional texture map, as many as four texels from one MIP map (bilinear
interpolation) or eight texels from two adjacent MIP maps (trilinear interpolation)
may be accessed from the cache memory to determine the resultant texture data for
the pixel. The texels read from the cache are provided over bus 86 (Fig. 3) to the
texel interpolator 76, which interpolates the multiple texels to compute resultant
texel data for each pixel. The interpolation can vary depending upon a mode established
for the system. When a point sampling interpolation mode is established, the resultant
texel data equals the single texel that is closest to the location defined by the
pixel's S,T coordinates in the texture map. Alternatively, when bilinear or trilinear
interpolation is employed, the resultant texel data is respectively a weighted average
of the four or eight closest texels in the one or two closest maps. The weight given
to each of the multiple texels is determined based upon the value of the gradient
and the fractional components of the S and T coordinates provided to the texel interpolator
76 from the tiler/boundary checker.
[0068] The resultant texel data for the display screen pixels is sequentially provided over
bus 88 to a frame buffer interface array storage unit 90. The frame buffer interface
array storage unit 90 can, in one embodiment of the invention, temporarily store up
to sixty four resultant texels, as explained in greater detail below. As explained
below, in accordance with one aspect of the invention, texels are shifted out of the
array storage unit 90 into intermediate registers from which the frame buffer controller
chips 50A-50E (see Fig. 2) can simultaneously and in parallel access the texels.
[0069] Each resultant texel is a 32-bit word including eight bits to represent each of R,G,B
and α. The α byte indicates to the frame buffer board 14 (Fig. 2) the manner in which
the R,G,B values of the resultant texture data should be combined with the R,G,B values
of the object data generated by the frame buffer board in computing final display
screen R,G,B values for any pixel that maps to the texel. The frame buffer interface
array storage unit outputs T0-T4 are provided to the frame buffer board 14 (Fig. 2)
over bus 28. The frame buffer board combines the R,G,B values of the resultant texel
data with the object R,G,B values in the manner specified by α to generate final R,G,B
values for each display screen pixel.
III. Screen Space Interleaving for Frame Buffer Controller Chips
[0070] Fig. 5 is a block diagram illustrating how the screen space is divided among the
five frame buffer controller chips 50A-50E (see Fig. 2). A portion of the screen 100
is shown in Fig. 5. In one embodiment, the screen comprises 1280 pixels horizontally
and 1024 pixels vertically. An interleave of the screen space is defined to be a portion
of a contiguous screen space (including multiple pixels) rendered by one and only
one frame buffer controller chip. In this embodiment, an interleave includes two scan
lines vertically and 16 pixels wide, including a total of 32 pixels per interleave.
The interleaves shown in Fig. 5 are labeled by A, B, C, D and E respectively corresponding
to frame buffer controller chips 50A, 50B, 50C, 50D and 50E. Only a portion of the
total screen space is illustrated in Fig. 5 including ten interleaves horizontally
and four interleaves vertically.
[0071] It should be understood that the number of horizontal pixels (1280) in this exemplary
embodiment is divisible by five (the number of frame buffer controller chips in this
embodiment). Thus, the interleaves are distributed evenly among the five frame buffer
controller chips 50A-50E. As a scan line moves horizontally across the screen space,
the pattern of interleaves, A, B, C, D and E, repeats itself. The purpose behind such
an arrangement is so that pixels within adjacent interleaves will be rendered by distinct
frame buffer controller chips.
[0072] Thus, it would appear that the largest number of pixels (or corresponding texels) required
to be processed by any one frame buffer controller chip at any one time would be 32.
It should be understood, however, that in certain circumstances, interleaves which
are diagonally adjacent one another are assigned to the same frame buffer controller
chip and, thus, a worst-case scenario would require a single frame buffer controller
chip to render 64 pixels within one primitive (triangle). Such a scenario is illustrated
in Fig. 6 in which a very small portion of the screen space 100 is illustrated including
only four interleaves. As shown, the interleaves (A) assigned to frame buffer controller
chip 50A are diagonally adjacent one another.
[0073] A portion of a triangle to be rendered is shown by dotted lines 102 and covers all
64 pixels within diagonally adjacent interleaves (A) assigned to frame buffer controller
chip 50A as well as eight pixels within interleave (B) and eight pixels within interleave
(E). Thus, the total number of pixels (and corresponding texels) required to be rendered
for this particular triangle includes 80, of which 64 pixels are required to be rendered
by the same frame buffer controller chip 50A. For this particular screen space arrangement
(shown in Fig. 5), this is a worst-case scenario.
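The assignment of pixels to frame buffer controller chips can be illustrated with the sketch below. The interleave dimensions (16 pixels wide by two scan lines) and the five-chip repeating pattern come from the text. The shift of the pattern by one interleave on each successive interleave row is an assumption: it is merely one arrangement consistent with the description of Fig. 6, in which the two interleaves (A) are diagonally adjacent and the other two interleaves are (B) and (E). The function name is hypothetical.

```python
INTERLEAVE_WIDTH = 16   # pixels per interleave horizontally
INTERLEAVE_HEIGHT = 2   # scan lines per interleave
CHIPS = "ABCDE"         # frame buffer controller chips 50A-50E


def chip_for_pixel(x, y):
    """Return the chip letter assumed to render pixel (x, y).

    Assumption: each interleave row is shifted right by one interleave
    relative to the row above it.
    """
    column = x // INTERLEAVE_WIDTH
    row = y // INTERLEAVE_HEIGHT
    return CHIPS[(column - row) % len(CHIPS)]
```

Under this assumed layout, the four interleaves at the top left of the screen are A and B on the first interleave row and E and A on the second, so a triangle covering both interleaves (A) presents 64 pixels to chip 50A.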
IV. Texture Mapping Board/Frame Buffer Board Interface
[0074] The data interface between the texture mapping chip and the multiple frame buffer
controller chips was designed to minimize cost and power by reducing the number of
logic gates and reducing the silicon area. As previously discussed with reference
to Fig. 2, the last stage of the pipeline inside the texture mapping chip outputs
a resultant four-byte (32-bit) texel to one of five frame buffer controller chips,
which frame buffer controller chips reside on a different printed circuit board than
the texture mapping chip. The interface resides on the texture mapping board in the
texture mapping chip and is included in this last pipeline stage.
[0075] The interface enables the parallel transmission of texels from the texture mapping
chip to all five frame buffer controller chips. As discussed in more detail below,
the texel pipeline as it enters the interface is one texel (four bytes) wide whereas
the pipeline output (to each frame buffer controller chip) is only one byte wide.
Therefore, in order to keep all five frame buffer controller chips busy simultaneously,
the interface was designed to include a texel buffering system. The buffering system
minimizes the amount of storage required.
[0076] As discussed above with reference to Fig. 3, the texel interpolator 76 outputs a
four-byte, 32-bit texel to the frame buffer interface storage array 90 every 45 MHz
state. Each 32-bit texel has an additional 3-bit field associated with it which indicates
to which of the five frame buffer controller chips the texel is destined. After buffering
the texel within the frame buffer interface storage array 90, the texture mapping
chip interface provides the texel to the appropriate frame buffer controller chip
serially, one byte (8 bits) every 45 MHz state. Transferring one byte at a time reduces
the number of pins required for the texture mapping chip interface almost by a factor
of four, but increases the total number of states to four in which to transfer the
entire four-byte texel.
[0077] As discussed above with reference to Fig. 6, 80 pixels (and 80 corresponding texels)
is the minimum number of pixels (and texels) (from a portion of a triangle, for example)
of which 64 pixels (and 64 texels) must be rendered by the same frame buffer controller
chip. As described, the input to the interface receives one texel each 45 MHz state.
Thus, the interface receives 80 texels over 80 states. In the 80 states, the interface
can output as many as 20 texels to a single frame buffer controller chip. This is
so because it requires four states (one byte per state) for the interface to output
a single texel. Therefore, only 20 of the 64 texels required to be rendered by the
single frame buffer controller chip can be output to that frame buffer controller
chip within the 80 states. Thus, 44 texels remain in the interface to be buffered.
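The arithmetic of the preceding paragraph can be checked with a short calculation. The function and parameter names are hypothetical; the figures (one texel received per state, one byte output per state, four bytes per texel) are those given in the text.

```python
def texels_left_to_buffer(total_texels, texels_for_one_chip, bytes_per_texel=4):
    """Texels still held for one chip once all texels have arrived.

    One texel arrives per state, so arrival takes total_texels states; each
    chip port drains one byte per state, i.e. one texel every bytes_per_texel states.
    """
    states = total_texels
    drained = states // bytes_per_texel
    return max(0, texels_for_one_chip - drained)
```

For the worst case of Fig. 6, texels_left_to_buffer(80, 64) evaluates to 44.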
[0078] A possible way to organize the interface to the frame buffer controller chips on
the texture mapping chip is as follows. A texel from the texel interpolator 76 arrives
at the interface once per each 45 MHz state. A five-way multiplexer, residing within
the last stage of the interpolator, then would guide the texel into one of five FIFO
(first in, first out) storage buffers, depending upon to which of the five frame buffer
controller chips the texel is destined. Each of the FIFO storage buffers would correspond
to a different frame buffer controller chip and each would be at least 44 texels deep;
48 texels of depth is selected herein for example. Remember that 44 texels is the
maximum number required to be buffered at any one time for a single frame buffer controller
chip. A state machine then would "handshake" between each FIFO buffer and the corresponding
frame buffer controller chip to unload the texel from the FIFO buffer to the frame
buffer controller chip, one byte at a time.
[0079] This possible solution requires a significant amount of storage. Specifically, having
five FIFOs, each 48 texels deep, yields the following amount
of storage: (48 texels) ∗ (4 bytes/texel) ∗ (8 bits/byte) ∗ (5 FIFOs) = 7680 storage
cells.
[0080] Fig. 7 is a block diagram illustrating one embodiment of the interface according
to the present invention, which embodiment uses less storage space than the possible
solution discussed above. As illustrated, the interface includes a RAM array 90 that
temporarily stores texels, five address FIFO buffers 114A-114E that store addresses
of texels stored in the RAM array, and registers 120A-120E and registers 124A-124E
through which the texels are shifted en route to frame buffer controller chips 50A-50E
respectively. The interface also includes a control unit 110 that controls the shifting
of data throughout such interface elements.
[0081] The texels output by interpolator 76 (see Fig. 3) are provided along bus 88 to 64-texel
deep RAM array 90, which RAM array is shared by all five frame buffer controller chips
50A-50E. As described in more detail below, each frame buffer controller chip port
uses one of the four states to unload one of its texels from the shared RAM array
into a temporary register 120A-120E. During the remaining three states, the other
frame buffer controller chip ports may access the RAM array 90 as needed. Each of
the five address FIFO buffers 114A-114E corresponds to one of the frame buffer controller
chips 50A-50E. Each address FIFO buffer 114A-114E can store up to 48 six-bit words.
Each six-bit word stored is an address denoting one of the 64 locations within the
RAM array 90 in which a texel is stored. The address FIFO buffer 114A, 114B, 114C,
114D or 114E that stores the address of a particular texel is the address FIFO buffer
associated with the frame buffer controller chip to which the texel is destined.
[0082] As described in more detail below, control unit 110 controls storing texels received
from the texel interpolator within the RAM array 90 and corresponding addresses within
the address FIFO buffers 114A-114E, and also controls unloading texels from the RAM
array 90 through the intermediate registers and into one of the frame buffer controller
chips 50A-50E.
[0083] When a texel is provided by the texel interpolator to the interface, the 32-bit texel
is provided along bus 88 to a data input of the RAM array 90. The 3-bit frame buffer
controller number word is decoded in the interpolator to determine to which of the
frame buffer controller chips the texel is destined. Five one-bit signals are provided
along buses 106 to the control unit with only one of the five signals being asserted
at any one time. The one asserted signal corresponds to the frame buffer controller
chip to which the texel is destined. As will be described in more detail below, the
control unit 110 also determines which locations within the RAM array 90 are empty
and selects an empty location in which to load the texel. The address of this location
is written to the appropriate FIFO buffer 114A, 114B, 114C, 114D or 114E through one
of 6-bit buses 112A, 112B, 112C, 112D or 112E. That address also is provided along
bus 112 to an address input of the RAM array 90 so that the texel can be loaded into
the appropriate location within the RAM array.
[0084] As shown in Fig. 7, also included in the interface are five 32-bit registers 120A-120E,
one register corresponding to each frame buffer controller chip. Coupled between each
32-bit register 120A-120E and the corresponding frame buffer controller chip 50A-50E
is a corresponding 8-bit register 124A-124E. As is described in more detail below,
when (1) one of the registers 120A, 120B, 120C, 120D or 120E is available to receive
resultant texel data, (2) data corresponding to that frame buffer controller chip
presently is stored in the RAM array 90, and (3) that register 120A, 120B, 120C, 120D
or 120E is of highest priority among those available, then a texel is unloaded from
the RAM array 90 along bus 118 to that register 120A, 120B, 120C, 120D or 120E. This
transfer occurs during one state. The control unit 110 communicates with each of the
FIFO buffers 114A-114E to determine whether any currently store an address of a texel
stored within the RAM array 90. The addresses are transferred to the control unit
110 along buses 108A-108E and then to the address input of the RAM array 90 along
bus 112. The control unit 110 also communicates with each of the registers 120A-120E
to determine whether any is available to receive data along bus 116.
[0085] After a texel is written to one of the registers 120A, 120B, 120C, 120D or 120E,
that texel is shifted into the respective one of the intermediate registers 124A,
124B, 124C, 124D or 124E, one byte at a time over bus 122. Then, each byte of the texel
is shifted from the intermediate register 124A, 124B, 124C, 124D or 124E to the corresponding
frame buffer controller chip 50A, 50B, 50C, 50D or 50E over bus 28A, 28B, 28C, 28D
or 28E. Thus, it takes four states for the texel to be shifted from one of the registers
120A, 120B, 120C, 120D or 120E to the corresponding frame buffer controller chip 50A,
50B, 50C, 50D or 50E. It should be appreciated that once a texel is shifted from the
RAM array 90 to one of the registers 120A, 120B, 120C, 120D or 120E, another one of
the registers 120A, 120B, 120C, 120D or 120E can access texels from the RAM array.
Thus, during three of the four states required for transferring a texel from the RAM
array to one of the frame buffer controller chips, other registers can access the
RAM array.
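The byte-serial transfer described above can be illustrated with a minimal Python sketch. The function name texel_to_bytes and the least-significant-byte-first ordering are assumptions made only for illustration; the description does not state the order in which the four bytes are shifted.

```python
BYTES_PER_TEXEL = 4  # a 32-bit texel leaves its register as four 8-bit transfers

def texel_to_bytes(texel):
    """Split a 32-bit texel into the bytes shifted out on four successive states.

    Assumption: least significant byte first; the source does not give the order.
    """
    return [(texel >> (8 * state)) & 0xFF for state in range(BYTES_PER_TEXEL)]

shifted = texel_to_bytes(0x11223344)
print([hex(b) for b in shifted])
```

Because each texel occupies its register for four states while the RAM array is read in only one of them, the remaining three states are free for other registers to be loaded.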
[0086] The embodiment shown in Fig. 7 requires much less storage than the previously-mentioned
design. The total storage includes (64 texels) ∗ (4 bytes/texel) ∗ (8 bits/byte) +
(48 addresses) ∗ (6 bits/address) ∗ (5 FIFOs) = 3488 storage cells.
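The storage figure can be checked with the following short Python calculation. The constant names are illustrative only; the values are those given in the preceding paragraph.

```python
# RAM array: 64 texels, each 4 bytes of 8 bits
TEXELS = 64
BYTES_PER_TEXEL = 4
BITS_PER_BYTE = 8

# Address FIFO buffers: 5 FIFOs, each holding 48 addresses of 6 bits
FIFO_DEPTH = 48
ADDRESS_BITS = 6
NUM_FIFOS = 5

ram_cells = TEXELS * BYTES_PER_TEXEL * BITS_PER_BYTE   # 2048
fifo_cells = FIFO_DEPTH * ADDRESS_BITS * NUM_FIFOS     # 1440
total_cells = ram_cells + fifo_cells

print(ram_cells, fifo_cells, total_cells)  # 2048 1440 3488
```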
V. Control Unit
[0087] As previously described, the control unit 110 includes a first portion that controls
transferring resultant texel data from the texel interpolator to the RAM array 90,
and a second portion that controls transferring the texels from the RAM array 90 through
the registers 120A-120E to the frame buffer controller chips 50A-50E. Fig. 8 is a
diagram showing the first portion of the control unit 110. The first portion of the
control unit 110 is shown surrounded by a dotted line for ease of illustration. As
shown, part of the first portion of control unit 110 is located in the last pipelined
stage of the interpolator 76.
[0088] When the texel interpolator is ready to provide a texel to the interface, the texel
interpolator 76 provides five one-bit valid signals along lines 143A-143E, only one
of which is asserted at any one time. The texel then is provided along bus 88 to a
data input of the RAM array 90. As shown, each one-bit valid signal is provided as
an input to logical OR gate 145, the output of which is provided on line 128 as a
write-enable input to the RAM array 90 and as an input to vector register 134. Vector
register 134, in this embodiment, stores a 64-bit word, each bit corresponding to
a different location in the RAM array 90. When a texel is stored in a particular location
within the RAM array 90, the corresponding bit of the vector register is asserted.
Similarly, when a texel is unloaded from the RAM array 90, the corresponding bit is
cleared.
[0089] Vector register 134 and encoder 137 together select an empty location within the
RAM array in which the texel is to be stored. When the vector register 134 receives
the logical OR result of the five valid signals as an input, indicating that a texel
is ready to be loaded within the RAM array from the texel interpolator 76, the vector
register 134 outputs the 64-bit word along bus 135 to encoder 137. Encoder 137 then
selects one of the empty locations within the RAM array by locating the first zero
bit within the vector word. Alternatively, the encoder 137 may select a zero bit at random
from the 64-bit vector word. The 6-bit address associated with that location then is output
by the encoder 137 along bus 112 to the address input of the RAM array. This 6-bit
address also is provided to vector register 134 so that the appropriate bit, corresponding
to the addressed location, within the vector word can be set. The valid signal is
provided to the write-enable input of the RAM array 90 and the texel data, provided
along bus 88, then is stored in the appropriate location within the RAM array.
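The cooperation of the vector register and the encoder can be modeled in software as a 64-bit occupancy bitmap. The following Python sketch is a behavioral illustration only, not the hardware implementation. The function names are hypothetical, and the choice of scanning from bit 0 upward is an assumption, since the description says only that the first zero bit is located.

```python
NUM_SLOTS = 64  # one bit of the vector word per RAM array location

def first_free_slot(vector):
    """Return the 6-bit address of the lowest zero bit, or None if the array is full.

    Assumption: "first" means lowest-numbered bit; the source does not specify.
    """
    for addr in range(NUM_SLOTS):
        if not (vector >> addr) & 1:
            return addr
    return None

def store_texel(vector):
    """Pick an empty location and set its bit. Returns (new_vector, address)."""
    addr = first_free_slot(vector)
    if addr is None:
        return vector, None          # no empty location, nothing is written
    return vector | (1 << addr), addr

def unload_texel(vector, addr):
    """Clear the bit for a location whose texel has been unloaded."""
    return vector & ~(1 << addr)

vector, addr = store_texel(0b1011)   # locations 0, 1 and 3 already occupied
print(addr, bin(vector))
```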
[0090] The first portion of the control unit 110 also includes a decoder 126, five logical
AND gates 132A-132E, one AND gate corresponding to each address FIFO buffer 114A-114E,
and five corresponding registers 141A-141E. When the texel is written to a particular
location within the RAM array 90, the address of the location within the RAM array
90 also is written to one of the five address FIFO buffers 114A-114E. The address
is written to the one address FIFO buffer that corresponds to the particular frame
buffer controller chip to which the texel is destined. The decoder 126 and AND gates
132A-132E determine to which address FIFO buffer 114A-114E the address will be written.
[0091] The 3-bit frame buffer number is provided from earlier pipelined stages of the interpolator
76 along bus 106 as an input to decoder 126. Decoder 126 decodes the 3-bit frame buffer
number to determine to which of the frame buffer controller chips the texel is destined,
and outputs five one-bit words along buses 130A-130E. Only one of the one-bit words
output from decoder 126 will be asserted and that one asserted word corresponds to
the frame buffer controller chip to which the texel is destined. The one-bit words
are provided along buses 130A-130E respectively to AND gates 132A-132E. Also, a valid
signal is provided from earlier pipelined stages of the interpolator along bus 128
as an input to each of the AND gates 132A-132E indicating that the earlier pipelined
stages of the interpolator have resultant texture data to store in the RAM array. Thus,
each of the five output signals from the decoder is ANDed logically with the valid
bit provided from the earlier pipelined stages of the texel interpolator. If the earlier
pipelined stages of the texel interpolator have asserted the valid bit, indicating
that valid resultant texel data can be provided, then only one of the valid signal
outputs from the AND gates 132A-132E will be asserted. That one valid signal output
will correspond to the frame buffer controller chip to which the texel is destined.
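The decoder and AND gate arrangement amounts to a one-hot decode gated by the valid bit, as the following Python sketch illustrates. The function names are hypothetical. The mapping of frame buffer numbers 0 through 4 to chips 50A through 50E, and the treatment of the unused codes 5 through 7 as asserting no output, are assumptions not stated in the description.

```python
NUM_CHIPS = 5  # frame buffer controller chips 50A-50E

def decode_frame_buffer(fb_number):
    """Model of decoder 126: one-hot decode of the 3-bit frame buffer number.

    Assumption: numbers 0..4 select chips A..E; codes 5..7 assert nothing.
    """
    return [1 if chip == fb_number else 0 for chip in range(NUM_CHIPS)]

def gated_valid_signals(fb_number, valid):
    """Model of AND gates 132A-132E: each decoder output ANDed with the valid bit."""
    return [bit & valid for bit in decode_frame_buffer(fb_number)]

print(gated_valid_signals(2, 1))  # texel destined for the third chip, valid asserted
print(gated_valid_signals(2, 0))  # valid not asserted, so no output is asserted
```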
[0092] The valid signal outputs of the AND gates 132A-132E are provided along buses 139A-139E
respectively to registers 141A-141E. From registers 141A-141E, the valid signals are
provided to OR gate 145 and to the write enable inputs of the address FIFO buffers
114A-114E along buses 143A-143E. Thus, only one of the address FIFO buffers 114A,
114B, 114C, 114D or 114E, corresponding to the frame buffer controller chip to which
the texel is destined, will be write-enabled. The 6-bit address word representing
the location within the RAM array in which the texel will be stored is provided to
all of the address FIFO buffers 114A-114E along buses 112A-112E. Because only one
of the address FIFO buffers is write-enabled, the 6-bit address will be written only
to that address FIFO buffer.
[0093] The resultant texture data and corresponding address respectively will be written
to the RAM array and appropriate address FIFO buffer only if the following three conditions
are met: 1) the interpolator has valid resultant texture data; 2) the RAM array is
not full to its capacity; and 3) the appropriate address FIFO buffer is not full to
its capacity.
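The three write conditions can be expressed as a single predicate. The following Python sketch uses a hypothetical function name; the default capacities of 64 texels for the RAM array and 48 addresses per address FIFO buffer are taken from the storage calculation given earlier.

```python
def can_write_texel(valid, ram_count, fifo_count, ram_capacity=64, fifo_depth=48):
    """Return True only if all three write conditions hold.

    valid      -- the interpolator has valid resultant texture data
    ram_count  -- number of texels currently held in the RAM array
    fifo_count -- number of addresses held in the destination address FIFO buffer
    """
    return bool(valid) and ram_count < ram_capacity and fifo_count < fifo_depth

print(can_write_texel(1, 10, 3))    # all conditions met
print(can_write_texel(1, 64, 3))    # RAM array full
print(can_write_texel(1, 10, 48))   # destination address FIFO buffer full
```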
[0094] The portion of the control unit 110 that controls unloading texels from the RAM array
90 is shown in block diagram form in Fig. 9. That portion of the control unit 110
is shown surrounded by broken lines. The portion of the control unit shown in Fig.
9 can be considered as including a state machine associated with each of the five
frame buffer controller chip interfaces. On each clock state, one of the five state
machines will be allowed to unload a texel from the RAM array 90. Which of the five
is selected depends on three conditions. First, the address FIFO buffer associated
with that frame buffer controller chip must not be empty, indicating that that frame
buffer controller chip has a texel destined for it stored presently in the RAM array
90. Second, the particular frame buffer controller chip interface must be idle, not
busy shifting out a previous texel to its associated frame buffer controller chip
and not associated with a "halted" texel due to its unavailability to receive texel
data. Third, the frame buffer controller chip interface must have the highest priority
among the available interfaces to unload a texel, as determined by a round-robin priority
scheme described in more detail below.
[0095] If the three above conditions are met, the second portion of the control unit, shown
in Fig. 9, operates to unload the address from one of the address FIFO buffers 114A,
114B, 114C, 114D or 114E into address register 152 on the first clock state. On the
next clock state, this registered address will be input along bus 112 to the address
input of RAM array 90 to access the corresponding location within the RAM array, and
the texel will be unloaded from the RAM array 90 on bus 118 and along buses 118A-118E
to each of the registers 120A-120E. As will be explained herein below, only one of
the registers 120A, 120B, 120C, 120D or 120E, corresponding to the frame buffer controller
chip to which the texel is destined, will be load-enabled allowing the texel to be
loaded only into that register.
[0096] The round-robin priority scheme operates as follows. The control unit 110 includes
five 7-bit priority counters 154A-154E. Each priority counter is associated with a
respective frame buffer controller chip. The 7-bit priority counter consists of the
following two fields: a priority value and a priority state, wherein the priority
value consists of the three most significant bits of the priority counter and can
be any number between 0 and 4, 4 being the highest priority.
[0097] The priority state increments by one within each priority counter during each clock
cycle. The priority counter is free running. Therefore, the priority value increments
by one every four clock cycles. The priority value from each counter 154A, 154B, 154C,
154D and 154E is output along buses 156A, 156B, 156C, 156D and 156E respectively,
to priority decoder 160. The priority decoder 160 outputs five signals called priority
acknowledge along buses 162A-162E. Only one of the priority acknowledge signals will
be set at any one time. The one priority acknowledge signal that is set corresponds
to the frame buffer controller chip for which the corresponding register has the highest
priority value signal among those available to receive texel data.
[0098] The priority acknowledge signal for any particular frame buffer controller chip will
be set if the priority value for that particular chip is at its highest priority of
4. If the priority value for a particular frame buffer controller chip is equal to
3, then the corresponding priority acknowledge signal will be set only if the particular
register associated with the frame buffer controller chip having a corresponding priority
value of 4 is not then ready to unload a texel from the RAM array 90. Similarly, if
the priority value for a particular frame buffer controller chip is equal to 2, then
the priority acknowledge signal for that particular frame buffer controller chip will
be set only if the registers corresponding to the frame buffer controller chips having
corresponding priority value signals equal to 3 and 4 are not then ready to unload
texels from the RAM array 90. Similarly, if the priority value for a particular frame
buffer controller chip is equal to 1, then the corresponding priority acknowledge
signal will be set only if the three registers for the frame buffer controller chips
having corresponding priority value signals equal to 4, 3 and 2 are not then ready
to unload texels from the RAM array 90. Finally, if the priority value for a particular
frame buffer controller chip is equal to 0, then the priority acknowledge signal for
that frame buffer controller chip will be set only if the four registers associated
with the other four frame buffer controller chips are not then ready to unload texels
from the RAM array 90. Thus, when a particular frame buffer controller chip has a
highest priority value of 4, the value of the corresponding priority acknowledge signal
is not dependent on the state of any of the other frame buffer controller chip interfaces.
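The selection rule described above reduces to choosing, among the interfaces that are ready, the one with the highest priority value. The following Python sketch is a behavioral model under stated assumptions and not the decoder's gate-level logic. The function name is hypothetical, the five priority values are assumed to be distinct at any given time, and an interface is assumed to be acknowledged only when it is itself ready.

```python
def priority_acknowledge(priority_values, ready):
    """Return five one-bit acknowledge signals, at most one of which is set.

    priority_values -- priority value (0..4) of each interface, assumed distinct
    ready           -- 1 if that interface is ready to unload a texel, else 0
    """
    best = None
    for chip, (value, is_ready) in enumerate(zip(priority_values, ready)):
        if is_ready and (best is None or value > priority_values[best]):
            best = chip
    return [1 if chip == best else 0 for chip in range(len(ready))]

# The interface with value 4 is not ready, so the ready interface with value 3 wins.
print(priority_acknowledge([2, 4, 0, 3, 1], [1, 0, 1, 1, 0]))
```

When no interface is ready, the model returns all zeros, so no register is load-enabled in that state.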
[0099] The priority decoder implements an algorithm in accordance with the above-described
scheme to output five priority acknowledge signals along buses 162A-162E. As shown,
the priority decoder 160 receives the five priority value signals along buses 156A-156E
respectively from counters 154A-154E. Also received by priority decoder 160 along
lines 116A-116E are five signals indicating whether any of the registers 120A-120E,
respectively, are available to unload a texel from the RAM array 90. Priority decoder
160 additionally receives five one-bit acknowledge signals from the frame buffer controller
chips 50A-50E along lines 191A-191E, wherein an acknowledge signal is asserted when
the corresponding frame buffer controller chip is available to receive data. Using
the information received, including the priority value signals and the register ready
signals associated with each of the frame buffer controller chips, the priority decoder
implements the algorithm described above and outputs the five priority acknowledge
signals. The priority acknowledge signals output on buses 162A-162E are provided through
delay elements 184A-184E to the load enable inputs of registers 120A-120E. At any
one time, at most one of the priority acknowledge signals will be set such that
only one of the registers 120A, 120B, 120C, 120D or 120E will be load-enabled. When
one of the registers is load-enabled, the texel data received on bus 118 will be loaded
into that particular register.
[0100] Before the priority acknowledge signals reach the load-enable inputs of the registers
120A-120E, the same priority acknowledge signals are provided on buses 162A-162E to
separate inputs of respective AND gates 180A-180E. AND gates 180A-180E also receive
respective address outputs along buses 150A-150E from address FIFO buffers 114A-114E.
The priority acknowledge signal is a one-bit signal that is either high or low. Each
output from the address FIFO buffers is a 6-bit address. Thus, the logical AND operation
is performed on a bit-by-bit basis. In other words, the priority acknowledge signal
is logically ANDed separately with each bit of the 6-bit address output from the address
FIFO buffer.
[0101] The outputs of the AND gates 180A-180E are provided to an OR gate 182 which performs
a logical OR operation on the outputs. The output from the OR gate 182 is provided
to address register 152. As only one of the priority acknowledge signals will be set
at any one time, the outputs of all of the AND gates 180A-180E except for one will
be equal to zero. The output of the AND gate associated with the frame buffer controller
chip having the priority acknowledge signal set will be equal to the 6-bit address
output from the corresponding address FIFO buffer. Thus, the address register 152
stores the address output from the particular address FIFO buffer corresponding to
the frame buffer controller chip for which the priority acknowledge signal is set.
The address register 152 stores this address during the state when the priority
acknowledge signals are within the delay elements 184A-184E. During a following state,
the address stored within the address register is provided along bus 112 to the address
input of the RAM array such that the texel from the addressed location will be unloaded
and provided on bus 118 to the registers 120A-120E.
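The AND gates 180A-180E and the OR gate 182 together act as a multiplexer that passes the address from the one acknowledged address FIFO buffer. A minimal Python sketch of this bit-by-bit AND followed by an OR is given below; the function name and the sample addresses are illustrative assumptions.

```python
ADDRESS_MASK = 0x3F  # six address bits

def select_address(acks, fifo_outputs):
    """Model of AND gates 180A-180E feeding OR gate 182.

    Each one-bit acknowledge is ANDed with every bit of its FIFO's 6-bit
    address output; the five results are then ORed together.
    """
    selected = 0
    for ack, address in zip(acks, fifo_outputs):
        mask = ADDRESS_MASK if ack else 0   # replicate the one-bit ack across 6 bits
        selected |= address & mask
    return selected

# Only the third acknowledge is set, so the third FIFO's address is registered.
print(select_address([0, 0, 1, 0, 0], [5, 17, 42, 63, 0]))  # 42
```

Because at most one acknowledge signal is set, at most one AND output is nonzero and the OR operation cannot mix bits from two addresses.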
[0102] Having thus described at least one illustrative embodiment of the invention, various
alterations, modifications and improvements will readily occur to those skilled in
the art. Such alterations, modifications and improvements are intended to be within
the spirit and scope of the invention. Accordingly, the foregoing description is by
way of example only and is not intended as limiting. The invention is limited only
as defined in the following claims and the equivalents thereto.