ICE Help GENERAL

From ICE Enterprises

Summary: General (collection of concepts & features)

A brief description of the general features of the ICE family of DSP cards.

PERFORMANCE - Cost/Performance benefits

The ICE family of Digital Signal Processing boards is designed to deliver the highest performance at costs in line with Personal Computer budgets. For more information about what each card can do, see the HELP CARDS entry. For current pricing, visit www.ice-online.com.

SCALABILITY - Cards, Chassis, and Interconnects

The ICE family of DSP cards is sized to fit in a PC chassis. The ICE-PIC and ICE-MBT series are PCI devices. The ICE-SLIC series are CardBus/PC-Card devices. The ICE-NIC series are external devices connected to a host via Gigabit Ethernet.

FLEXIBILITY - Programmable Hardware Concepts

The 16-bit digital inputs are fed directly into a Field Programmable Gate Array (FPGA). This part can be re-programmed to perform application-specific front-end bit processing. The data is then fed into a SHARC or PowerPC DSP for further processing before it is DMA'd into the host computer. The DSP is programmed in C or assembly.

Standard configurations supported by the default boot code include 1, 4, 8, and 16 bit data packing, various acquisition triggers, and data gates.

Non-standard configurations might include feeding 8 pairs of clock and data into a 16 bit input module, or demultiplexing a serial bit stream for follow-on controller processing.

The module sites include a set of master/slave pins which can be used to strap two modules to begin acquisition/playback on the same clock. The Series-3 and later cards have external access to these signals to synchronize multiple cards.

TIMECODE - Handling Embedded TimeCode

Digital time code embedded in the input stream is processed by library routines that run on the host computer. Double clutch timecode is handled automatically.

When acquiring data, the timecode bit from the raw input is processed in the FPGA with a defined Barker code. Each timecode is tagged with its sample number, and the last two are stored in the FPGA's memory. Host code queries the FPGA for this information and maps the timecode to a specified index in the host acquisition buffer. This allows time tagging 8 or 16 bit packed data, as well as the on-board tuner output. The delay through the tuner chips is compensated for in the host software. See HELP PIC_TC.
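The host-side mapping described above can be pictured with a small sketch. It assumes a simple linear model: one tagged (sample number, time) pair plus a known sample rate gives a time for any buffer index. The function name and arguments are hypothetical and are not part of the ICE library; HELP PIC_TC describes the real interface.

```python
def time_at_index(index, tag_sample, tag_time, sample_rate):
    """Estimate the time in seconds of the sample at buffer position `index`.

    index       : sample index of interest in the host buffer
    tag_sample  : sample number the timecode was tagged with
    tag_time    : time in seconds decoded from that timecode
    sample_rate : samples per second at the buffer

    Assumption: samples are evenly spaced, so time is linear in index.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return tag_time + (index - tag_sample) / sample_rate
```

With a tag at sample 1000 decoded as 10.0 seconds and a 1 kHz rate, index 1500 maps to 10.5 seconds. Any fixed pipeline delay, such as the delay through a tuner, would be an additional offset and is not modeled here.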

Digital IRIG-B input to the external trigger port is processed by the IOC FPGA into a Barker code and 32 bits of data much like the other digital time code standards. The accuracy is about 100 us on most GPS receivers. The A2Dr7 modules have an optional 1PPS port that can be used to refine the measurement to the accuracy of the 1PPS +/- 10 ns.

If a computer has NTP (Network Time Protocol) enabled, only the 1PPS is needed to provide an accurate time stamp.

SDDS embeds the timecode in a packet header. This is read by the IOC and handled downstream in the same way embedded serialized timecode is handled.

The PIC5 series can also process serialized SDN timecode embedded in the SDDS payload section. To enable this, simply specify TC=SDN0 (or TC=SDN3 for some tape playback scenarios).

The PIC4 series handles this case with special setup steps. The I/O Module must use the RXSDDSDATA flag to strip the SDDS packet headers and present the PIC4 with normal 16 bit data. This is then processed by the normal IOC=II or IOC=IO FPGA load which handles the SDN timecode. Since the default download for an SDDS module is IIS or IOS, the IOC code must be specified in the card reset.

OVERSAMPLING - Upsampling Techniques for Digital Tuners

Digital Tuner chips typically have a fixed lower end to the decimation they support. This is usually limited by the number of filter taps the chip can compute per output sample. At low input clock rates, the chip's multipliers are not used efficiently, unnecessarily limiting the output bandwidth. One technique to use more of the chip is to resample the input at a higher rate so there are more clock cycles available per output sample. The simplest form is to insert a fixed number of zeros between each input sample. This has the effect of duplicating the input spectrum N times, where N is the number of zeros inserted per sample.

To make software more generic, oversampling is applied to the tuner ports by setting the oversampling rate that the tuner inputs will be seeing before the tuner port is set up. The only effect on the tuner port will be to relax the minimum decimation. The gain loss from the zero insertion is compensated for in the pic_tuner library.

The oversampling circuit can also be used to shield the tuner chips from clock irregularities. When digital inputs are switched or tape playback machines lose signal, the clock presented to the ICE-PIC may contain glitches that the tuner chips cannot recover from. With an oversampling factor of 1, the input clock is conditioned by the IOC gate array to keep glitches from affecting the tuners.

An oversampling factor of 2 inserts 1 zero between each input sample.
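The effect of zero insertion on the spectrum can be checked numerically. The sketch below is a host-side illustration only, not the IOC gate array logic: it zero-stuffs a short complex tone and uses a plain DFT to show where the energy lands. The function names are made up for this example.

```python
import cmath

def zero_stuff(samples, ovsr):
    """Insert ovsr-1 zeros after every input sample (ovsr=2 inserts 1 zero)."""
    if ovsr < 1:
        raise ValueError("ovsr must be >= 1")
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (ovsr - 1))
    return out

def dft_magnitudes(x):
    """Plain O(n^2) DFT magnitudes, adequate for a short demonstration."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

# A complex tone sitting in bin 1 of an 8-point block.
tone = [cmath.exp(2j * cmath.pi * t / 8) for t in range(8)]
stuffed = zero_stuff(tone, 2)

# Bins of the 16-point spectrum that hold significant energy.
peaks = [k for k, m in enumerate(dft_magnitudes(stuffed)) if m > 1.0]
```

Here `peaks` holds bins 1 and 9: the original tone plus one image half a spectrum away, which is the duplication described above for one inserted zero.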

The input clock must be < 20MHz to apply the OVSR=1 conditioning. The oversampled rate (inputrate*OVSR) must be < 40MHz on series 3 cards. The oversampled rate (inputrate*OVSR) must be <= 100MHz on series 4 cards.
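The rate limits above can be collected into a small pre-flight check. This is a sketch under the stated limits only; the function name and the idea of returning a list of messages are assumptions, and the ICE library may enforce these limits differently or not at all.

```python
def check_oversampling(input_rate_hz, ovsr, series):
    """Return a list of violated limits (empty when the setup is within limits).

    input_rate_hz : input clock rate in Hz
    ovsr          : oversampling factor (1 = clock conditioning only)
    series        : card series, 3 or 4
    """
    problems = []
    if ovsr == 1 and input_rate_hz >= 20e6:
        problems.append("input clock must be < 20 MHz for OVSR=1 conditioning")
    oversampled = input_rate_hz * ovsr
    if series == 3 and oversampled >= 40e6:
        problems.append("oversampled rate must be < 40 MHz on series 3 cards")
    if series == 4 and oversampled > 100e6:
        problems.append("oversampled rate must be <= 100 MHz on series 4 cards")
    return problems
```

For example, a 10 MHz input with OVSR=2 passes on a series 3 card, while a 20 MHz input with OVSR=2 does not because the limit there is strict.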

RESAMPLING - Resampling Techniques for Digital Tuners

The GrayChip 4016 tuner chips have an optional digital resampler that can be applied after the Tune-Filter-Decimate stages. This can be used to create baud synchronous sample rates for demodulators or 8000Hz for VGC extraction. The filters from the GrayChip web site are available as Midas files in the DAT directory of the ice tree. The GC4016 user guide has a detailed discussion on the resampler algorithm.

In short, the tuner output is oversampled by inserting NDELAY-1 zeros between each tuner output sample. The resampling ratio is used to determine which of the NDELAY fractional sample points to use for each resampler output, and an NTAP filter is run on that point. The filters in the ice dat directory are actually NDELAY*NTAP point symmetric filters. We call it an NTAP filter because only NTAP of the points need to be computed, since only one in NDELAY taps has a non-zero data value. The phase jitter introduced by this technique reduces the SNR to about 40dB.

The pic_loadfile(), NDELAY=n flag, RESAMP flag and pic_setKey(KEY_RATIO) function are used to set up the resampler. The resampler ratio is defined as the desired output sample rate divided by the tuner output sample rate.

NDELAY defaults to 32. If you are not using a filter built for NDELAY=32, the NDELAY=n flag must be added to the config string when loading the filter. The filter file names from GrayChip use the naming convention res_<NTAP>x<NDELAY>_<WIDTH>. For example, the file res_15x32_80 is a 15 tap 80% filter with 32x oversampling.
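The ratio definition and the file naming convention can be expressed in a few lines. The helper names below are hypothetical and exist only for this illustration. The regular expression assumes NTAP, NDELAY, and WIDTH are all plain decimal integers, as in the res_15x32_80 example.

```python
import re

def resampler_ratio(desired_out_rate, tuner_out_rate):
    """Resampler ratio = desired output rate / tuner output rate."""
    if tuner_out_rate <= 0:
        raise ValueError("tuner_out_rate must be positive")
    return desired_out_rate / tuner_out_rate

def parse_filter_name(name):
    """Split a name like 'res_15x32_80' into its NTAP, NDELAY, WIDTH parts."""
    m = re.fullmatch(r"res_(\d+)x(\d+)_(\d+)", name)
    if m is None:
        raise ValueError("not a res_<NTAP>x<NDELAY>_<WIDTH> name: %r" % name)
    ntap, ndelay, width = (int(g) for g in m.groups())
    return {
        "ntap": ntap,
        "ndelay": ndelay,
        "width_percent": width,
        # The stored filter is NDELAY*NTAP points long.
        "total_points": ntap * ndelay,
    }
```

For res_15x32_80 this yields 15 taps, NDELAY=32, an 80% width, and 480 stored points. A tuner output of 10000 Hz resampled to 8000 Hz gives a ratio of 0.8.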

The PIC5 tuner has a 10 tap resampler inserted between the CFIR and PFIR filters with an NDELAY=2048. The CFIR and PFIR filters should be chosen to achieve optimal results. The CFIR is a decimate by 2 filter at 4 times the output rate so a 25% filter should be selected (the default is dfir_25). This presents a twice oversampled complex waveform to the resampler section. The resampler increment is a 28 bit counter with an automatic M over N circuit to preserve exact timing for most ratios. The accumulator register is reset every M samples to remove binary rounding errors. The M output samples for each N input samples actually used is displayed if the VERBOSE=2 flag is present. M and N are 16 bit integers. The output is then sent through the decimate by 2 PFIR filter for final output conditioning.
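To get a feel for when a ratio can be held exactly by 16 bit M and N values, the sketch below reduces a rate pair to lowest terms and checks the field widths. This is only an arithmetic illustration with made-up names. It does not reproduce the card's M over N circuit or how the driver actually selects M and N.

```python
from fractions import Fraction

def m_over_n(out_rate, in_rate, limit=0xFFFF):
    """Reduce out_rate/in_rate to lowest terms M/N.

    Rates are integers (for example Hz). Returns (M, N, exact), where
    `exact` is True when both M and N fit in 16 bits.
    """
    frac = Fraction(out_rate, in_rate)
    m, n = frac.numerator, frac.denominator
    return m, n, (m <= limit and n <= limit)
```

For instance, 8000/10000 reduces to 4/5 and 44100/48000 to 147/160, both of which fit. A pair such as 1000003/1000000 does not reduce and would not fit in 16 bit fields.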

Note that if the real output mode is used with decimation=1, the output will be frequency shifted up by (Fso - Fsi)/4, where Fsi is the input frequency to the resampler and Fso is the output frequency. This offset can be removed by tuning off of Fsi/4 by this amount.

The maximum output bandwidth of the PIC5 tuner is 64MHz. To preserve the whole band use decimation=1 with the AOVSR (auto-oversampling) flag. The tuner allows the center frequency to be adjusted for the new output rate. To disable the M over N circuit, use the NORESMON flag.

BOT - Bank Of Tuners on Processor Modules

There are 32 individual tuner channels on the DTDM/V6M/K8M processor modules. In normal mode, each channel has independent decimation, frequency, start/stop control, and DMA buffers.

A Tuner Bank is a block of tuners that share a DMA channel for efficiently handling a number of similarly configured channels. They must have the same decimation and are required to start/stop together. TunerBank=1 uses the 16 tuners on the Module=1 side, TunerBank=2 uses the 16 on the other side. TunerBank=3 uses the 32 tuners from both sides, all being fed from Module=1 and returned in a single DMA buffer.

Tuner Banks are selected by specifying PORT=TBANKn instead of PORT=TUNERn. The DTDM/DTDMX Modules support up to 32 channels tunable anywhere in the spectrum. By default, the pic_ioport() call will implement as many channels as are available on the named port. To use fewer channels, set KEY_CHNS=n before the call to pic_ioport(), add the CHNS=n flag to the config string, or use the /NCHN=n switch on SOURCEPIC.

To control individual channels from SOURCEPIC, set the CHAN key before setting FREQ or GAIN. If CHAN is set to zero, the setting applies to all channels in the bank.

If channels are contiguous in the spectrum, using /DFREQ=dfreq with SOURCEPIC will set up the tuners equally spaced by dfreq Hertz starting at the <freq> parameter. Setting the FREQ in this mode moves the whole block. You cannot tune individual channels. In this mode, the Fast Tuner Transform algorithm can be applied to increase the number of usable channels to 256 with FTT=2, or 4096 with FTT=3. See the FTT discussion for more details.
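The equal spacing and the channel counts quoted above follow from simple arithmetic. The sketch below uses hypothetical helper names. It assumes that <freq> is the frequency of the first channel and that each FTT pass multiplies the channel count by the 16 tuners in a bank, which matches the 256 and 4096 figures.

```python
def channel_frequencies(freq, dfreq, nchan):
    """Frequencies of nchan channels spaced dfreq apart, starting at freq."""
    return [freq + i * dfreq for i in range(nchan)]

def ftt_channel_limit(passes, tuners_per_bank=16):
    """Upper bound on channels after the given number of FTT passes."""
    if passes < 1:
        raise ValueError("passes must be >= 1")
    return tuners_per_bank ** passes
```

Four channels starting at 10 MHz with 25 kHz spacing land at 10.000, 10.025, 10.050, and 10.075 MHz. Two passes allow up to 256 channels and three passes up to 4096.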

The frame or packet size for the output DMA buffer, KEY_PKTLEN, and the channel spacing, KEY_DFREQ, must be set ahead of the pic_ioport() call.

The output DMA buffer will contain KEY_PKTLEN bytes of data from channel 1, followed by channel 2, ... up to channel N, then start over at channel 1.
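A host application reading a tuner bank buffer has to undo this interleave. The following is a minimal sketch of that step on a plain byte buffer. The function is illustrative and not part of the ICE library. It assumes the buffer starts on a channel 1 packet and holds a whole number of frames.

```python
def demux_tuner_bank(buf, pktlen, nchan):
    """Split an interleaved bank buffer into one byte string per channel.

    buf    : bytes laid out as pktlen bytes of channel 1, then channel 2,
             ... up to channel nchan, then channel 1 again
    pktlen : packet length in bytes (the KEY_PKTLEN value)
    nchan  : number of channels in the bank
    """
    frame = pktlen * nchan
    if pktlen <= 0 or nchan <= 0 or len(buf) % frame:
        raise ValueError("buffer must hold a whole number of frames")
    channels = [bytearray() for _ in range(nchan)]
    for off in range(0, len(buf), pktlen):
        # Packet index modulo nchan selects the channel (0-based here).
        channels[(off // pktlen) % nchan] += buf[off:off + pktlen]
    return [bytes(c) for c in channels]
```

With pktlen=2 and three channels, the buffer b"AAbbCCaaBBcc" separates into b"AAaa", b"bbBB", and b"CCcc".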

For more details, see the help on the FTT and FTTM flags.

FTT - Fast Tuner Transform Concept of Operation

The DTDM/DTDMX Modules have 64Mby of DRAM and a fast memory crossbar that allows multiple reuse of the 8 GrayChips. The FTT is a multipass algorithm similar to a radix-16 FFT pass. A first bank of 16 tuners selects 1-16 blocks of the input spectrum and streams them to circular buffers in memory. The second bank of 16 tuners then selects 1-16 blocks from each of these streams (much faster than real-time) and streams them back to memory. This is practical for 2 to 3 passes.

The FTT algorithm is enabled by adding the FTTM=2 or FTTM=3 flag in the device configuration stream and accessing a Tuner Bank. By default, the pic_ioport() call will implement as many channels as possible given the port, decimation, channel spacing, and number of FTT passes (specified by FTTM=N). To use fewer channels, set KEY_CHNS=n before the call to pic_ioport().

For more details, see the help on the FTTM flag.

DMA - DMA Concepts and Channel Allocation

High speed data transfer is via the PCI controller's DMA engine, which is given maximum hardware-level priority since the card has minimal buffer memory. The host computer typically allocates a circular buffer in memory to hold 1-2 seconds of data (to cover host application software latencies). The SHARC/PPC then processes DMA requests from 1 to 80 of its input/output ports. All 80 DMA channels can be owned/controlled by different processes.

Acquisition/Playback can occur through the following device ports:

SERIAL1-2   : serial ports (PIC2 only)
LINK1-6     : link ports (PIC2 only)
TUNER1-32   : tuner channels
MODULE1-2   : I/O Modules
INTERNAL1-8 : internal algorithms
EXTERNAL1-8 : internal algorithms (extended memory on PIC4/MBT4)

The port is usually specified in the hardware configured device alias. See HELP PIC_OPEN for details and HWCONFIG.KEY in the DAT area for examples.

There are 8 hardware DMA channels on the SHARC that are shared between the ports. This means that up to 8 hardware acquisitions/playbacks can be occurring simultaneously on a single ICE card. There are also internal algorithms executing on the SHARC that may produce or consume DMA data buffers.

The FPGA on the 5 series cards allows each DMA channel to have its own port so there are no resource conflicts. The PPC is a controller only. Its DMA resources are not used to handle data.

A serial port is tied to its DMA channel. A link port can be associated with any DMA channel supporting a link buffer. The DMA channel will be determined automatically from the port name.

The user is responsible for managing any sharing of the serial port, tuner/serial port, link port, and module/link port DMA resources.

The DMA Channel mappings for ICE-PIC2 are:

Chan 1	Serial Port 1 Receive			SERIAL1/TUNER1
Chan 2	Serial Port 2 Receive  or Link Buf 1	SERIAL2/TUNER2/LINK1
Chan 3	Serial Port 1 Transmit			SERIAL1
Chan 4	Serial Port 2 Transmit or Link Buf 2	SERIAL2/LINK2
Chan 5	Link Buffer 3 				LINK3/MODULE1
Chan 6	Link Buffer 4 				LINK4/MODULE2
Chan 7	Link Buffer 5           		LINK5/MODULE1HS
Chan 8	Link Buffer 6            		LINK6/MODULE2HS

The DMA Channel mappings for ICE-PIC3 are:

Chan 2	Link Buffer 1				TUNER-A
Chan 4	Link Buffer 2				TUNER-B
Chan 5	Link Buffer 3 				MODULE1HS
Chan 6	Link Buffer 4 				MODULE2HS
Chan 7	Link Buffer 5 				MODULE1
Chan 8	Link Buffer 6 				MODULE2

The DMA Channel mappings for ICE-MBT2 and ICE-MBT3 are:

Chan 2	Link Buffer 1				TUNER-A
Chan 4	Link Buffer 2				TUNER-B
Chan 5	Link Buffer 3 				TUNER-C/MODULE1HS
Chan 6	Link Buffer 4 				TUNER-D/MODULE2HS
Chan 7	Link Buffer 5 				TUNER-E/MODULE1
Chan 8	Link Buffer 6 				TUNER-F/MODULE2

Each tuner chip uses one of the SHARC link ports for acquiring the tuner outputs. The four channels in each tuner chip must have the same decimation. Tuner channels are allocated such that odd and even channel numbers are fed by modules 1 and 2 respectively. See the allocation chart below:

TUNER-A	Channels 1,3,5,7	Link Port 1 
TUNER-B	Channels 2,4,6,8	Link Port 2 
TUNER-C	Channels 9,11,13,15	Link Port 3 
TUNER-D	Channels 10,12,14,16	Link Port 4 
TUNER-E	Channels 17,19,21,23	Link Port 5 
TUNER-F	Channels 18,20,22,24	Link Port 6 
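The allocation chart follows a regular pattern, so the chip, link port, and feeding module for a channel can be computed rather than looked up. The function below is a sketch written for this page, not a library routine, and it covers only the 24 channels shown in the chart.

```python
def tuner_allocation(channel):
    """Return the tuner chip, link port, and module for channels 1-24."""
    if not 1 <= channel <= 24:
        raise ValueError("channel must be in 1..24")
    group = (channel - 1) // 8          # 0: chips A/B, 1: C/D, 2: E/F
    odd = channel % 2 == 1
    index = 2 * group + (0 if odd else 1)
    return {
        "chip": "TUNER-" + "ABCDEF"[index],
        "link_port": index + 1,
        "module": 1 if odd else 2,      # odd channels from module 1
    }
```

Channel 10, for example, resolves to TUNER-D on link port 4, fed by module 2. Because the four channels of a chip share one decimation, channels that map to the same chip must be configured alike.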

The ICE-MBT3 can also collect wideband signals, bypassing the tuner chips. Since the wideband paths and the tuners share the link ports, resource contention can occur. If the wideband transfer is < 38 Mby/sec, only link ports 5 or 6 are used. If the wideband transfer is >= 38 Mby/sec, Module 1 will take link ports 5 and 3, and Module 2 will take link ports 6 and 4. This means that tuners C through F may be unusable while processing wideband simultaneously.
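The contention rule can be written down as a lookup. The sketch below uses made-up names. It assumes that below the threshold Module 1 uses link port 5 and Module 2 uses link port 6, consistent with the ICE-MBT3 channel mapping table. It reports which tuner chips lose their link port.

```python
WIDEBAND_THRESHOLD = 38e6   # bytes per second, from the text above

# Tuner chips that share each link port on the ICE-MBT3.
TUNER_ON_LINK = {3: "TUNER-C", 4: "TUNER-D", 5: "TUNER-E", 6: "TUNER-F"}

def wideband_link_ports(module, rate_bytes_per_sec):
    """Link ports taken by a wideband transfer on module 1 or 2."""
    primary = {1: 5, 2: 6}[module]
    secondary = {1: 3, 2: 4}[module]
    if rate_bytes_per_sec < WIDEBAND_THRESHOLD:
        return [primary]
    return [primary, secondary]

def blocked_tuners(link_ports):
    """Tuner chips that cannot run while these link ports are in use."""
    return [TUNER_ON_LINK[p] for p in link_ports]
```

A 50 Mby/sec transfer on Module 2 takes link ports 6 and 4, which blocks TUNER-F and TUNER-D. Tuners A and B are on link ports 1 and 2 and are never affected.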

The DMA Channel mappings for ICE-PIC4T and ICE-MBT4 are:

Chan 5	Link Buffer 1 				MODULE1
Chan 6	Link Buffer 2 				MODULE2
Chan 7	Link Buffer 3 				MODULE1HS
Chan 8	Link Buffer 4 				MODULE2HS
Chan 9	Link Buffer 5				TUNER-N odd
Chan 10	Link Buffer 6				TUNER-N even

There is no link port sharing between tuners and modules on the series 4 cards. All odd tuners are multiplexed through DMA channel 9 and all even channels through DMA channel 10. The data is demultiplexed by the SHARC into separate host buffers.

The DMA Channel mappings for ICE-PIC5+ Input/Output are:

Chan 1        MODULE1
Chan 2        MODULE2
Chan 3        CORE1 / TUNER1
Chan 4        CORE2 / TUNER2
Chan 5        CORE11
Chan 6        CORE12
Chan 7        CORE21
Chan 8        CORE22
Chan 9        MCORE11 / TBANK11 / TUNER1-31
Chan 10       MCORE12 / TBANK12 / TUNER2-32
Chan 11       MCORE21 / TBANK21 / TUNER33-63
Chan 12       MCORE22 / TBANK22 / TUNER34-64

Access to the ICEMBT ports is made transparent via software such that the PICDRIVER and SOURCEPIC primitives may access a port on an ICE-MBT just as they would a port on an ICE-PIC.

CHAINING - DMA Chaining Concepts

When a DMA completes (dma->todo goes to 0), the controller checks the dma->chain field. If non-zero, the DMA structure's chain related fields are replaced by the values in the DMACHAIN structure pointed to by dma->chain. The new DMA will then be processed without interrupting the input/output stream.

The DMACHAIN structure has the following fields:

haddr	- the HOST buffer physical address in words
hsize	- the HOST buffer physical size in words
todo	- the number of buffers to process, or DMA_ONESHOT, DMA_CONTINUOUS, DMA_SPIN
chain	- pointer to the next DMACHAIN structure

The chain field for the last element in the chain must be zero. Users should use the pic_dmachain() routine to populate the chaining registers. Note that dmafunc(p,dmac,DMA_STATUS) offset values are referenced to the initial buffer start.
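The chaining behavior can be modeled on the host to reason about the order in which buffers are filled. The following is a simplified model, not the controller code: the class and function are hypothetical, the chain pointer is represented by an object reference with None standing in for zero, and only a numeric todo count is handled (the DMA_ONESHOT, DMA_CONTINUOUS, and DMA_SPIN values are not modeled).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DmaChain:
    haddr: int                           # host buffer address in words
    hsize: int                           # host buffer size in words
    todo: int                            # number of buffer passes to process
    chain: Optional["DmaChain"] = None   # next element, None ends the chain

def walk_chain(first):
    """Return the (haddr, hsize) of each buffer pass in processing order."""
    order = []
    node = first
    while node is not None:
        for _ in range(node.todo):
            order.append((node.haddr, node.hsize))
        # When todo reaches 0 the controller loads the next chain element.
        node = node.chain
    return order

# Two passes over a first buffer, then one pass over a second buffer.
second = DmaChain(haddr=0x2000, hsize=0x100, todo=1)
first = DmaChain(haddr=0x1000, hsize=0x100, todo=2, chain=second)
```

Walking this chain visits the buffer at 0x1000 twice and then the buffer at 0x2000 once. In real use the chain should be populated with pic_dmachain() as noted above.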

SHARCMEM - SHARC/PPC Memory Allocation

The controller chip on series 2 and 3 cards has two 128kBy blocks of internal memory. The lower block is used for the sequencer logic and user programs. The upper block contains the circular buffers for DMA channels. The DMA block is divided as follows:

word addr range	usage
0x28000-29FFF		Module-1
0x2A000-2BFFF		Module-2
0x28000-2BFFF		Module-1 VHS
0x2C000-2FFFF		Module-2 VHS
0x2C000-2CFFF		Tuner-A		(MBT2/MBT3)
0x2D000-2DFFF		Tuner-B		(MBT2/MBT3)
0x2E000-2EFFF		Tuner-C		(MBT2/MBT3)
0x2F000-2FFFF		Tuner-D		(MBT2/MBT3)
0x28000-28FFF		Tuner-E		(MBT2/MBT3)
0x2A000-2AFFF		Tuner-F		(MBT2/MBT3)
0x2E000-2EFFF		Tuner-1		(PIC2/PIC3)
0x2F000-2FFFF		Tuner-2		(PIC2/PIC3)
0x28000-28FFF		Internal-1
0x29000-29FFF		Internal-2
0x2A000-2AFFF		Internal-3
0x2B000-2BFFF		Internal-4
0x2C000-2CFFF		Internal-5
0x2D000-2DFFF		Internal-6
0x2E000-2EFFF		Internal-7
0x2F000-2FFFF		Internal-8

The SHARC controller chip on series 4 cards has two 256kBy blocks of internal memory. The lower block is used for the sequencer logic and user programs. The upper block contains the circular buffers for DMA channels. The DMA block is divided as follows:

word addr range	usage
0x48000-49FFF		Module-1
0x4A000-4BFFF		Module-2
0x48000-4BFFF		Module-1 VHS
0x4C000-4FFFF		Module-2 VHS
0x4C000-4DFFF		Tuner-A		(PIC4/MBT4)
0x4E000-4FFFF		Tuner-B		(PIC4/MBT4)
0x48000-48FFF		Internal-1
0x49000-49FFF		Internal-2
0x4A000-4AFFF		Internal-3
0x4B000-4BFFF		Internal-4
0x4C000-4CFFF		Internal-5
0x4D000-4DFFF		Internal-6
0x4E000-4EFFF		Internal-7
0x4F000-4FFFF		Internal-8
0x50000-51FFF		External-1 or ITDEC Channel-1	
0x52000-53FFF		External-2 or ITDEC Channel-2	
0x54000-55FFF		External-3    ...
0x56000-57FFF		External-4
0x58000-59FFF		External-5
0x5A000-5BFFF		External-6
0x5C000-5DFFF		External-7
0x5E000-5FFFF		External-8

Note that some of the memory buffers overlap and cannot be used simultaneously. Currently no internal checks are made to notify users of overlap.
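Since no internal check is made, an application can test a planned set of ports for overlap itself. The sketch below does this for a subset of the series 2/3 ranges copied from the table above. The dictionary and function names are illustrative only.

```python
from itertools import combinations

# Word-address ranges (inclusive) copied from the series 2/3 table (subset).
REGIONS = {
    "Module-1":     (0x28000, 0x29FFF),
    "Module-2":     (0x2A000, 0x2BFFF),
    "Module-1 VHS": (0x28000, 0x2BFFF),
    "Module-2 VHS": (0x2C000, 0x2FFFF),
    "Tuner-1":      (0x2E000, 0x2EFFF),
    "Tuner-2":      (0x2F000, 0x2FFFF),
    "Internal-1":   (0x28000, 0x28FFF),
    "Internal-5":   (0x2C000, 0x2CFFF),
}

def find_conflicts(names, regions=REGIONS):
    """Return pairs of named buffers whose address ranges overlap."""
    conflicts = []
    for a, b in combinations(names, 2):
        lo_a, hi_a = regions[a]
        lo_b, hi_b = regions[b]
        if lo_a <= hi_b and lo_b <= hi_a:
            conflicts.append((a, b))
    return conflicts
```

Module-1, Module-2, and Tuner-1 can be used together, while Module-1 and Internal-1 cannot because both start at 0x28000.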

MIDASDSM - Connecting to an STL Digital Switch Matrix

The MIDAS suite of hardware usually consists of a Digital Switch Matrix from Signal Technologies Laboratories. The 16+clock bit digital signals are brought in/out of the switch matrix on a 36 strand twisted pair ribbon cable. These cables connect to a switch "transition panel" usually at the back of the equipment rack. The SMS or SDN cables attach to the opposite side of the transition panel.

A transition panel will also exist near the ICEPIC's computer. This panel has a 40 pin interface and is available through ICE or STL. High density ribbon cables that attach the ICEPIC to the panel are available through ICE.

A diagram of the connectors is posted on the www.ice-online.com website.

CLOCKING - Clock sources and selection

Most IO Modules provide their own clock, either derived from the data or from an external source.

The two IO Module sites on ICE cards can operate independently or from a global muxed clock. The global clock is necessary when:

  1. Multiplexing data from the A and B ports
  2. VeryHighSpeed mode when the resources from both ports are bridged
  3. Synchronizing sampling clocks to both modules
  4. Driving a module without its own clock source (e.g. D2E, D2T)

The IOC code selects how the two module sites are clocked:

_II        : 2 independent input modules, each with its own clock
_IIX       : 2 inputs with a global muxed clock
_IO or _OI : 1 input and 1 output; the input gets its clock from the module, the output from the global muxed clock
_OO        : 2 outputs with a global muxed clock

To set the source for the global muxed clock, add the MUXCLK=s flag to the card configuration string handed to the pic_open() call.

The possible sources for the muxed clock signal are:

s=N No MUXCLK
s=I Internal clock = 40MHz/N where (N=1,1024)
s=X External clock SMB on series 3/4 card edge
s=A Module A input clock (or s=1)
s=B Module B input clock (or s=2)
s=C Alternate Crystal CCLK on series 3/4 cards
s=D Alternate Crystal CCLK/N where (N=1,16)
s=P Programmable Clock on series 4 cards (.1 to 105 MHz)
s=PX Programmable Clock using the external reference (PREFX)

When using the global clock, the CLKI flag can be used to invert the clock. The DEGLITCH flag will run the A and B sources through a deglitching circuit.
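For the internal source (s=I), the achievable rates are 40MHz/N. The sketch below picks the divider that comes closest to a requested rate. It reads "N=1,1024" as the integer range 1 through 1024, which is an assumption, and the function name is made up. How the divider is actually passed to the card is not covered here.

```python
def internal_clock_divider(target_hz, base_hz=40e6, n_max=1024):
    """Return (N, actual_rate_hz) for the divider closest to target_hz."""
    if target_hz <= 0:
        raise ValueError("target_hz must be positive")
    ideal = base_hz / target_hz
    # Only the two integers around the ideal divider can be closest.
    low = max(1, min(n_max, int(ideal)))
    high = max(1, min(n_max, low + 1))
    n = min((low, high), key=lambda c: abs(base_hz / c - target_hz))
    return n, base_hz / n
```

A 10 MHz request is met exactly with N=4. A 3 MHz request gets N=13, roughly 3.08 MHz. Requests above 40 MHz clamp to N=1.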

PLATFORMS - Notes on specific platforms

Compaq ES40 Server PCI slot configuration

TOP	BUS0	SLOT7	5V/64b/33MHz
	BUS0	SLOT8	5V/64b/33MHz
	BUS0	SLOT9	5V/64b/33MHz
	BUS0	SLOT10	5V/64b/33MHz
	BUS1	SLOT1	5V/64b/33MHz
	BUS1	SLOT2	5V/64b/33MHz
	BUS1	SLOT3	5V/64b/33MHz
	BUS1	SLOT4	5V/64b/33MHz
	BUS1	SLOT5	5V/64b/33MHz
BOT 	BUS1	SLOT6	5V/64b/33MHz

Compaq ES45 Server PCI slot configuration

TOP	HOSE2	SLOT7	3V/64b/66MHz
	HOSE2	SLOT8	3V/64b/66MHz
	HOSE0	SLOT4	5V/64b/33MHz
	HOSE3	SLOT10	3V/64b/66MHz
	HOSE3	SLOT9	3V/64b/66MHz
	HOSE0	SLOT3	5V/64b/33MHz
	HOSE1	SLOT6	3V/64b/66MHz
	HOSE1	SLOT5	3V/64b/66MHz
	HOSE0	SLOT2	5V/64b/33MHz
BOT	HOSE0	SLOT1	5V/64b/33MHz