Part III: DDR DRAM and RAMBUS
The RAMBUS channel

The diagram below brings up the issue of how RDRAM chips are
used to build a complete RDRAM system. You'll notice that in it
the RDRAM banks are connected together differently than the SDRAM
banks. Because the RAMBUS data bus is only 16 bits wide, and each
individual RDRAM chip can feed the entire 16-bit data bus
by itself, RDRAMs don't have to be interleaved the way that
SDRAMs do. Instead, each RDRAM is attached to a single, shared,
16-bit data bus. This common data bus is part of the larger, shared
RAMBUS Channel to which every RDRAM chip in the system is attached.
The RAMBUS channel, which includes control, addressing, and data
lines, consists of 30 high-quality, matched-impedance lines.
Each line originates at the chipset, threads its way past every
RDRAM in the system, and continues out past the outermost device,
where it is terminated by a resistor.

RDRAMs are attached to memory modules, called RIMMs, just as SDRAMs
are attached to DIMMs. Again, though, each RDRAM is attached to the
channel in series, so that all of the signals that are placed
on the RAMBUS channel pass by each RDRAM. Empty RIMM slots must
be filled with Continuity RIMMs (CRIMMs) to keep the channel going
so that it can reach its termination resistors.

You could theoretically have up to four independent RAMBUS channels
in a system, each delivering 1.6 GB/sec of bandwidth, for
a total of 6.4 GB/sec. This isn't possible with current Intel
chipsets, but may be possible with future designs.
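
The bandwidth figures above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the 16-bit bus width and the 400MHz clock come from this article, while the two-transfers-per-clock figure (data moving on both clock edges) is an assumption added here to make the 1.6 GB/sec number work out. The function name is made up for illustration.

```python
def channel_bandwidth_bytes_per_sec(bus_width_bits, clock_hz, transfers_per_clock=2):
    """Peak bandwidth of one channel in bytes per second.

    transfers_per_clock=2 is an assumption: one transfer on each clock edge.
    """
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_hz * transfers_per_clock


# One RAMBUS channel: 16-bit data bus at 400MHz.
single_channel = channel_bandwidth_bytes_per_sec(16, 400_000_000)

# The theoretical maximum of four independent channels.
four_channels = 4 * single_channel

print(single_channel / 1e9, "GB/sec per channel")
print(four_channels / 1e9, "GB/sec for four channels")
```

Note that these are peak numbers; they say nothing about latency, which is a separate issue.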

While the long, thin RAMBUS channel pumps a lot of bandwidth over a
small number of traces, it's nonetheless one of RAMBUS' most controversial
features. Operating at up to 400MHz, it's very fast, and since
it makes for a minimal number of signal traces that have to be
etched into the motherboard, it's simpler overall than SDRAM's
interleaving of data buses. However, it still carries with it
some drawbacks. One problem with the long, fast bus is its effect
on cost. Some of the savings in cost that RAMBUS gets from using
fewer traces are cancelled out by the fact that the RAMBUS channel
is a long series of wires that have to run at a whopping 400MHz.
To get the bus speed up that high, the board has to be manufactured
to a very high standard of quality in order to reduce noise, stray
capacitance, variations in line impedance, and other problems
associated with rising bus speeds. In some cases, you may even
have to add more layers to the motherboard just to be able to
provide a clean enough signal.

It's important to note, though, that what I've said about the RAMBUS
channel and cost doesn't always apply. There are situations where
the RAMBUS channel can reduce costs quite a bit. Take the example
of Sony's PlayStation 2. It contains two RDRAM chips that are soldered
directly to its mainboard. Because of this, the PS2's RAMBUS channel
isn't very long. Furthermore, since the PS2's memory subsystem
doesn't use any RIMMs or RIMM connectors, it operates with a minimal
number of traces and a minimal cost.

The next drawback to the RAMBUS channel will be apparent after a good
look at the previous diagram. Each SDRAM in an SDRAM system is
no more than a few inches along a straight path to the chipset,
so commands and data don't have very far to travel to reach their
destination. The RAMBUS channel, on the other hand, gets longer
as more RDRAMs are added to it, which means that the amount of
time that commands and data must travel to reach the outermost
device can get pretty high. What makes this even worse is that
the read latency of the entire system can be only as good
as that of the farthest (and, by extension, slowest) RDRAM. Here's why:

Remember how, way back at the beginning of the first edition of this RAM
Guide, we said that, to the CPU, main memory looks just like one
single-file line of 1-byte locations? When the CPU asks for data
from a series of locations, it expects that it will come to it
in the order that it asked for it. It doesn't care where that
data lives, or how long it takes to get from one place to the
other--it just cares that it sent out a series of requests for
x, y, and z, one right after the other, and it expects x, y, and
z to be fed to it in that exact order, one right after the other.
Well, if x, y, and z each live in different RDRAM chips, where,
say, y and z live close to the chipset but x lives way out there
in the last chip on the outermost RIMM, then we've got problems.
The packet that's farthest from the chipset, x, is going to take
quite a bit longer than y and z to reach the chipset, but since
x has to be there first and all three packets have to file in
one right after the other, y and z will have to wait on x before
they can go in.

Because the output of read data has to be delayed so
that reads from different RDRAM chips can arrive at the chipset
together and in the right order, a RAMBUS system has to go through
an elaborate initialization ritual on boot-up in order to determine
the amount of delay that needs to be inserted into each RDRAM.
The read delay value for each individual RDRAM chip is programmed
via the control pins into one of those control registers that
we met in the previous section. These read delays effectively
slow down the entire system so that each device has the same latency
as the outermost RDRAM. As you add more devices to a RAMBUS system,
the entire system has higher and higher read latency. So, while
individual RDRAM chips might have a read latency (access time)
of 20ns, which is about the same read latency as some SDRAMs,
once you stick them in a system with three full RIMMs the overall
system latency (which is the total amount of time between when the
CPU sends out the read command and when the data arrives back at it)
will be either slightly better or significantly worse than the
system latency for an SDRAM system, depending on a myriad of factors.
(More on these factors in a second.)
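
To make the read delay idea concrete, here is a minimal sketch of the leveling step described above. It is not the actual RAMBUS initialization procedure: the flight times are made-up numbers in nanoseconds, and a real system would program the delays in whatever units its control registers use. The function name is hypothetical.

```python
def level_read_delays(flight_times_ns):
    """Return the extra delay each device needs so that all devices
    respond with the same latency as the slowest (outermost) one."""
    slowest = max(flight_times_ns)
    return [slowest - t for t in flight_times_ns]


# Hypothetical flight times, nearest device first, outermost last.
flight_times = [1.0, 2.0, 3.5]

added_delays = level_read_delays(flight_times)

# After leveling, every device looks as slow as the outermost one.
effective = [t + d for t, d in zip(flight_times, added_delays)]

print("added delays:", added_delays)
print("effective latency per device:", effective)
```

The nearest device gets the largest added delay, which is why adding devices farther out raises the read latency of every device on the channel.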

Further aggravating the read latency situation is the fact that RAMBUS
doesn't support critical word first bursting. When the CPU asks
for 8 bytes of data from a conventional SDRAM, the memory system
sends back 16 bytes of data, on the presumption that
it'll probably need those extra 8 bytes shortly. Nevertheless,
the 8 bytes that were specifically asked for--the critical word--arrive
at the CPU first, with the other freebie bytes coming next. RDRAM
doesn't do this. It just sends you a whole 16 byte train of data,
and if the 8 bytes you asked for are at the end of that train,
then you'll just have to wait until they get there. Finally, since
the bus is so long and passes through so many devices, the capacitances
added in by the loads of all of the attached devices significantly
increase bus signal propagation time. So again, the more devices
you stick on the RAMBUS channel, the worse the latency gets. However,
RAMBUS' signaling layer, high quality packaging, and strict specifications
for producing RIMMs are aimed at reducing these types of unwanted
electrical effects.
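
The difference between the two burst styles can be sketched in a few lines. This is only an illustration of the ordering described above: the 16-byte burst is treated as two 8-byte words, and the wrap-around ordering used for the critical-word-first case is an assumption about how the remaining words follow. Both function names are made up.

```python
def delivery_order(num_words, critical_index, critical_word_first):
    """Return the order in which the words of a burst are sent."""
    if critical_word_first:
        # Assumed wrap-around order: start at the requested word.
        return [(critical_index + i) % num_words for i in range(num_words)]
    # In-order train: always start at word 0.
    return list(range(num_words))


def words_before_critical(order, critical_index):
    """How many words arrive before the one the CPU asked for."""
    return order.index(critical_index)


# A 16-byte burst as two 8-byte words; the CPU wants the second one.
cwf_order = delivery_order(2, 1, critical_word_first=True)
train_order = delivery_order(2, 1, critical_word_first=False)

print("critical word first:", cwf_order, "wait:", words_before_critical(cwf_order, 1))
print("in-order train:     ", train_order, "wait:", words_before_critical(train_order, 1))
```

When the requested word happens to be first in the train, the two schemes behave the same; the penalty only shows up when it sits later in the burst.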

A word about latency

The system latency issues that surround the RAMBUS channel and that
I've pointed out here in the RAMBUS channel discussion are by
no means the whole story when it comes to latency and/or overall
performance. In particular, system latency, especially in a RAMBUS
system, is a complex issue that's affected by numerous factors,
some of which we've covered in earlier parts of this article and
some of which we'll cover in the next section. For instance, as
we discussed at the beginning of this piece, RAMBUS' high bank
count can reduce system read latencies significantly because more
rows can remain open at a time. Also, system read latencies will
be reduced in some upcoming systems that include RAMBUS memory
controllers integrated onto the CPU die. And as we'll discuss
in the next section, system latencies are affected by the power
management policies that particular RAMBUS controllers implement.
Finally, RDRAM's system read latency vs. SDRAM's depends partially
on the amount and nature of the memory subsystem traffic. In summation,
I've tried to show how different parts of a RAMBUS system affect
read latencies, for good or for ill, as we examine each individual
part of the RAMBUS technology. I've done this in order to give
you a feel for the complexity of the issue and the number and
nature of the factors that must be taken into account when discussing
it. What I haven't done is step back and try to fit the individual
discussions of all of these separate factors together into one,
big latency picture. I don't plan on trying to build such a picture
either, because it's quite beyond the scope of this article. If
I were planning on doing such a thing, I'd also have to do the
same for SDRAM and DDR SDRAM so that we could step back and compare
and contrast all of the technologies and how different usage patterns
and configurations affect their latencies. While this would definitely
be a worthwhile project, it is, again, beyond the scope of this
article. It may seem like I'm copping out, here, because system
read latency is one of the most hotly debated issues surrounding
RAMBUS and how it stacks up, performance-wise, against competing technologies.
I hope it's apparent, though, that such an out-and-out performance
comparison would be of a fundamentally different nature than the
sort of "how it works" information that this RAM Guide aims to
provide. In fact, the information on RAMBUS, DDR, and SDRAM that
I've provided throughout this Guide is intended to equip you,
the reader, to evaluate such performance comparisons when you
see them on the web.

The RAMBUS Clock

Another factor that complicates a RAMBUS system's design is the fact that
the outbound clock signal takes so long to reach the outermost
RDRAM in the system that, by the time it gets there, it can be
out of phase many times over with the version at the first RIMM.
As a result, the RAMBUS channel supports up to five separate clock
domains just so that it can keep its bus transfers in sync. These
clock domain changeovers, again, add complexity to a RAMBUS system
implementation. Here's a simple picture of the RAMBUS channel
from one of RAMBUS' tech docs, which brings up one clock-related
issue that I've glossed over.

Remember how we talked about the Clock From Master signal, which tracks
data and commands from the chipset out to the RDRAMs, and the
Clock To Master signal, which tracks data back in to the chipset?
Well, both of these clock signals are generated by a single clock
generator. Notice how, in the above picture, the clock generator
is sitting way out there beyond the farthest RIMM. This clock
generator produces the inbound Clock To Master pulse that all of the RDRAMs
along its route use to send data to the chipset's memory controller.
Once the CTM pulse hits the memory controller, it loops around
180 degrees and heads back out to the outermost RIMM, becoming
the Clock From Master pulse that the chipset uses to communicate
with the RDRAMs. So, that one clock does double duty as both an
incoming and an outgoing clock. This clever clocking scheme is
one of RAMBUS' more elegant features.
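
To get a feel for the "out of phase many times over" problem mentioned at the start of this section, here is a rough sketch that converts a signal's flight time along the channel into whole clock cycles of lag plus a leftover phase angle. Only the 400MHz clock comes from this article; the flight times are made-up picosecond values, and the function name is hypothetical.

```python
CLOCK_HZ = 400_000_000
PERIOD_PS = 10**12 // CLOCK_HZ  # 2500 ps per clock cycle at 400MHz


def clock_lag(flight_time_ps, period_ps=PERIOD_PS):
    """Split a flight time into whole cycles of lag and a residual phase."""
    whole_cycles, remainder_ps = divmod(flight_time_ps, period_ps)
    phase_degrees = 360 * remainder_ps / period_ps
    return whole_cycles, phase_degrees


# Hypothetical one-way flight times from the start of the channel.
near_lag = clock_lag(1000)   # a device on the first RIMM
far_lag = clock_lag(6250)    # the outermost RDRAM

print("first RIMM:      cycles behind =", near_lag[0], "phase =", near_lag[1])
print("outermost RDRAM: cycles behind =", far_lag[0], "phase =", far_lag[1])
```

With numbers like these, the outermost device sees a clock that is more than two full cycles behind the version at the first RIMM, which is the kind of skew the separate clock domains exist to manage.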

Finally, RDRAM isn't an open standard. It's a proprietary technology for
whose use RAMBUS Inc. charges royalties. Even though these
royalties amount to a pretty small fraction of the total price
of a RIMM, they make some manufacturers and consumers antsy, and
understandably so. In the open climate of today's market, closed,
royalty-based technologies carry a stigma. (Consider the controversy
surrounding Apple's FireWire.) RAMBUS' closed nature puts it at
a psychological disadvantage in a post-Microsoft-trial, post-Napster,
post-Linux world. The general impression among many seems to be
that RAMBUS Inc., with its aggressive legal department and stock
price grabbing more headlines than any technical innovations coming
out of the company, is on the wrong side. And RAMBUS' highly publicized
relationship with the computing industry's second most unpopular
300 lb. gorilla hasn't helped it in this respect, either.

Industry climate and public opinion aside, however, it seems that in the
end, neither RDRAM nor DDR SDRAM will "win" in any sort of general
sense. The two technologies are different enough that they'll
be used in specific markets in order to meet specific application
usage profiles and specific system design requirements. How that
plays out in the mainstream PC market remains to be seen. With
the news of Intel's intention to produce chipsets for the P4 that
support DDR SDRAM, what was once thought to be an imminent, unstoppable
descent of RDRAM into the mainstream now looks like a very complicated
market scenario where only those comfortable with predicting the
future on scant information dare comment on the way things will
end up. With regard to today's market and performance concerns,
we've already had i820 Rambus boards in our labs, so our thoughts
on real world performance today are already old news. What I hope
this article has done is give you a free and clear overview and
analysis of the technological facets of the two technologies,
which then will hopefully serve as an aid to you in interpreting
additional benchmarking that's sure to come down the pipe.