ICE Help JVCC

From ICE Enterprises

Summary: Java Verilog Cross Compiler for data processing cores

Motivation:

The ICE-CORE (Code-Once-Run-Everywhere) framework is intended to simplify algorithm development and deployment by using a single test and development methodology when writing code that runs on different platforms such as CPUs, GPUs, VPUs and FPGAs.

Supporting several platforms normally means maintaining a separate source file for each one. In many situations the ICE-JVCC cross compiler reduces this maintenance. We define a new language, JavaVerilog, that has the information necessary to automatically generate the Java, C, and SystemVerilog code for the various platforms.

Compiler

The JVCC cross compiler takes in a Java/Verilog coreName.jv file and generates the source code for each of the different platforms. This includes coreName.java for a JVM, coreName.c for a CPU, and coreName.sv for an FPGA supporting SystemVerilog.

The Java and C versions are self-contained and will run on any JVM or CPU.

The SystemVerilog version contains instances of Java objects converted into SystemVerilog modules that can be compiled into .bit files on Xilinx, Altera, or any other FPGA supporting SystemVerilog. In this case, the library calls in the C code initialize the objects, load the initial class variables into the FPGA, and start the data flow to execute the core's processing methods in the hardware device.

Language

The JavaVerilog language follows Java 1.6 constructs with the following extensions:

Integer data types can specify the number of bits, e.g. uint6 for a 6-bit integer.

The Verilog syntax for selecting bit ranges of an integer is adopted for ease of use. For example, myint[5:3] refers to bits 3 through 5 of the integer myint.

Fixed floating point types fptx and dptx are introduced to support FPGA platforms that do not efficiently support IEEE floating point arithmetic.
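As a rough illustration of what a bit-range select such as myint[5:3] means, the following plain Java sketch extracts the same bits with a shift and a mask. The class and method names are hypothetical, and this is not necessarily the code that JVCC generates for the Java or C targets.

```java
// Sketch only: emulates the JavaVerilog select value[hi:lo] in plain Java.
// BitRange and bits() are illustrative names, not part of JVCC.
class BitRange {
    // Returns bits lo..hi (inclusive) of value, right-aligned.
    static int bits(int value, int hi, int lo) {
        int width = hi - lo + 1;
        // A full 32-bit select needs an all-ones mask; (1 << 32) would wrap to 1.
        int mask = (width >= 32) ? -1 : (1 << width) - 1;
        return (value >>> lo) & mask;
    }

    public static void main(String[] args) {
        int myint = 0b101100;                    // decimal 44
        System.out.println(bits(myint, 5, 3));   // bits 5..3 are 101, prints 5
    }
}
```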

Flows

The current JVCC supports three different processing flows: Stream, Buffer, and Array.

The Stream flow is useful for applications that work on a stream of data and access a window of a few samples at a time, which is often the case in signal processing.

The Buffer flow is useful for packet processing where one needs random access to data within defined blocks of a data stream.

The Array flow is useful for implementing fixed vector operations.
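To make the Stream flow idea concrete, here is a minimal plain Java sketch of a computation that only needs a window of a few samples at a time: a three-sample moving sum. The class and method names are hypothetical, and the array-based interface is an assumption for illustration; it is not the ICE-CORE stream interface.

```java
// Sketch only: a windowed stream computation of the kind the Stream flow targets.
// StreamWindow and movingSum3() are illustrative names, not part of JVCC.
class StreamWindow {
    // Each output is the sum of the current sample and the two before it.
    static int[] movingSum3(int[] in) {
        int[] out = new int[in.length];
        int z1 = 0;   // previous sample
        int z2 = 0;   // sample before the previous one
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] + z1 + z2;
            z2 = z1;  // slide the window by one sample
            z1 = in[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] out = movingSum3(new int[] { 1, 2, 3, 4 });
        System.out.println(java.util.Arrays.toString(out));   // prints [1, 3, 6, 9]
    }
}
```

Only two samples of history are kept, so the state is fixed in size no matter how long the stream runs.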

The first two flows each have one data input stream and one data output stream. The ICE-CORE framework handles getting control information and data to/from the core. Alternate frameworks may use OpenCL to implement these control and data flow functions. The compiled FPGA module behaves as an OpenCL kernel.

Data Types

The Java language supports primitive data types of byte, short, int, long, float and double. JavaVerilog extends this set to include integers of any bit length and fixed floating point types. When implementing these variables on non-FPGA platforms, they are handled by the larger native primitive type. The supported data types are defined in CoreTypes.lst, which is read in by the compiler.

Floating point is currently implemented in the FPGA as fixed floating point. The fptx data type is 32 bits with 16 fractional bits to the right of the point. The dptx data type is 64 bits with 32 fractional bits to the right of the point.
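The fptx layout described above (32 bits, 16 of them fractional) can be sketched in plain Java as a scaled integer. This is a minimal illustration under assumptions: the class and method names are hypothetical, and the rounding and overflow behavior shown here is not confirmed to match the JVCC implementation.

```java
// Sketch only: a 32-bit value with 16 fractional bits, stored in a Java int.
// Fptx and its methods are illustrative names, not the JVCC runtime.
class Fptx {
    static final int FRAC_BITS = 16;
    static final int ONE = 1 << FRAC_BITS;   // the value 1.0 in this format

    static int fromDouble(double x) {
        return (int) Math.round(x * ONE);    // assumed rounding: to nearest
    }

    static double toDouble(int f) {
        return f / (double) ONE;
    }

    // Multiply in 64 bits, then drop the extra 16 fractional bits.
    static int mul(int a, int b) {
        return (int) (((long) a * b) >> FRAC_BITS);
    }

    public static void main(String[] args) {
        int a = fromDouble(1.5);
        int b = fromDouble(2.25);
        System.out.println(toDouble(a + b));       // prints 3.75
        System.out.println(toDouble(mul(a, b)));   // prints 3.375
    }
}
```

Addition works directly on the scaled integers, while multiplication needs the wider intermediate so that the product is not truncated before the shift.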

Data Structures

To define data structures that do not have class methods or constructors, the class must extend the DataTypes class. These classes map into C structs and SystemVerilog packed structures. The structure members will be in the order the variables are encountered in the class. The offset of each variable in a class, including data structures, is tracked by the compiler for initialization, run-time modification and readback.

Cores

Cores are objects that can be accessed by the external world. They are composed of code that can perform operations on local variables, instantiate other Cores or Components, and call Tasks or Functions. They are accessible through a set of C or Java library calls.

core = new Core(N,M)   : instantiates a Core with max usage parameters
core.set(Name,value)   : sets a runtime parameter
value = core.get(Name) : gets a runtime parameter
core.open()            : prepares for processing loop with current parameters
core.process(isb,osb)  : runs the processing loop for the given input and output streams
core.close()           : finishes processing and releases resources
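The call sequence above can be sketched with a stand-in class. Everything in this example is hypothetical: the Core class below is a stub written for illustration, not the ICE library class, and the parameter name GAIN, the double[] buffers, and the meaning given to N and M are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch only: mimics the set/open/process/close order of the library calls.
class CoreLifecycle {
    // Hypothetical stand-in for a generated core that applies a gain.
    static class Core {
        private final Map<String, Double> params = new HashMap<String, Double>();
        private final int maxSamples;   // assumed meaning of N
        private final int maxChannels;  // assumed meaning of M, unused here
        private double gain;
        private boolean opened;

        Core(int n, int m) { maxSamples = n; maxChannels = m; }

        void set(String name, double value) { params.put(name, value); }

        double get(String name) {
            Double v = params.get(name);
            return (v == null) ? 0.0 : v;
        }

        // Latches the current parameters before the processing loop starts.
        void open() { gain = get("GAIN"); opened = true; }

        void process(double[] isb, double[] osb) {
            if (!opened) throw new IllegalStateException("call open() first");
            int n = Math.min(Math.min(isb.length, osb.length), maxSamples);
            for (int i = 0; i < n; i++) osb[i] = gain * isb[i];
        }

        void close() { opened = false; }
    }

    static double[] run(double[] in, double gain) {
        Core core = new Core(1024, 1);
        double[] out = new double[in.length];
        core.set("GAIN", gain);
        core.open();
        core.process(in, out);
        core.close();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(new double[] { 1, 2, 3 }, 2.0)));
        // prints [2.0, 4.0, 6.0]
    }
}
```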

Cores currently have one data input stream, one data output stream, and a control interface. The public class variables are accessible from the external interface for monitoring and/or real-time control.

Cores can instantiate other cores, components, and tasks.

Components

Components are blocks of code that implement functions that may be used by this core or others. Their variables are not readable from the external interface but are initialized by their calling core or component.

Components can instantiate other components and tasks, but not cores.

Functions

Commonly used C math functions are available as methods in the CoreCommon class that both Cores and Components extend. This gives the JV code a more familiar C style for math functions. The functions are typically implemented as first-order look-up tables in the FPGA code.
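One common reading of a first-order look-up table is a table of samples with linear interpolation between adjacent entries. The sketch below shows that technique for sine in plain Java. The table size, the class name, and the interpretation itself are assumptions; the actual FPGA tables generated by JVCC may be organized differently.

```java
// Sketch only: sine from a sampled table plus linear (first-order) interpolation.
// SinLut is an illustrative name, not the CoreCommon implementation.
class SinLut {
    static final int SIZE = 256;                       // assumed table size
    static final double[] TABLE = new double[SIZE + 1];

    static {
        // One extra entry so TABLE[idx + 1] is always valid.
        for (int i = 0; i <= SIZE; i++) {
            TABLE[i] = Math.sin(2.0 * Math.PI * i / SIZE);
        }
    }

    static double sin(double x) {
        double turns = x / (2.0 * Math.PI);
        turns -= Math.floor(turns);                    // wrap into one period
        double pos = turns * SIZE;
        int idx = Math.min((int) pos, SIZE - 1);       // guard against rounding up to SIZE
        double frac = pos - idx;
        return TABLE[idx] + frac * (TABLE[idx + 1] - TABLE[idx]);
    }

    public static void main(String[] args) {
        System.out.println(Math.abs(sin(1.0) - Math.sin(1.0)) < 1e-4);   // prints true
    }
}
```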

Unless called out in CoreFunctions.lst as a task, all functions complete in a single clock cycle.

Tasks

Tasks are functions that may take multiple clock cycles in the FPGA version. Some Functions are implemented as Tasks automatically. These decisions are guided by the CoreFunctions.lst configuration file which is read in by the compiler.

Declarations

Although Java and Verilog support declarations almost anywhere in the code, to keep the C translation ANSI compliant, all declarations must be completed before the first operational line of code in each method.
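A method written in this declarations-first style might look like the following plain Java sketch. The class and method names are hypothetical. Initializers are kept out of the declarations here as a cautious assumption, since the text does not say whether JVCC accepts them.

```java
// Sketch only: every local is declared before the first operational statement.
// DeclarationsFirst is an illustrative name, not part of JVCC.
class DeclarationsFirst {
    static int sumOfSquares(int[] data) {
        // Declarations section
        int i;
        int sum;
        int sample;

        // Operational code starts here; no further declarations follow.
        sum = 0;
        for (i = 0; i < data.length; i++) {
            sample = data[i];
            sum += sample * sample;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(new int[] { 1, 2, 3 }));   // prints 14
    }
}
```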

Defines

All static declarations in the JV code are converted to defines in the C and FPGA code. The class constructors in the open() method are used to build the FPGA module resources. This requires all arguments to the constructor to be static variables that create resources for the worst case at runtime.

There are a few special static variables that are reserved for special use:

FLOW=v    : Type of data flow; must be STREAM, BUFFER, or ARRAY
PIPE=n    : Pipe mode for loops: 1=On 0=Off -1=Auto (default=Auto)
BW=n      : Bus Width in bits for FPGA data interface
IBW=n     : Input Bus Width in bits for FPGA data interface (default=BW)
OBW=n     : Output Bus Width in bits for FPGA data interface (default=BW)
MC=n      : Master Core mode: 1=Core is composed of other cores, 0=Normal Core
VERBOSE   : Turn on verbose print statements (vprint) for debugging
AUTOLOCAL : Turn class variables into locals in the C process method to help the optimizer

FPGA Implementation

The compiler assumes a synchronous design methodology in the FPGA. The system clock is used to supply all control interfaces as well as read the input stream/buffer and write the output stream/buffer. Most statements will use this clock. A 2x clock is available for special loops.

The coreName.sv file contains three sections: Declarations, Sequencer, and Execution.

The variables in the declarations section are allocated much as they are in C. All other statements are then evaluated for input and output variable sensitivity.

The sequencer section uses the sensitivity list to decide on which clock to execute each line of code. Loops are unrolled in time by default. When pipelined, many of these lines execute simultaneously. Each equals sign (or other form of assignment) implies a clock edge. To improve timing, a complex equation can be split into simpler equations of similar complexity whose results are combined on the next line.
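The equation-splitting advice can be illustrated in plain Java. Both methods below compute the same value; the second breaks the work into assignments of similar complexity and combines them on later lines. The names are hypothetical, and how JVCC actually schedules these lines onto clocks is decided by the compiler, not by this sketch.

```java
// Sketch only: one complex assignment versus several simpler ones.
// SplitEquation is an illustrative name, not part of JVCC.
class SplitEquation {
    // Single assignment: all multiplies and adds would share one clock edge.
    static int combined(int a, int b, int c, int d, int e, int f, int g, int h) {
        int y;
        y = a * b + c * d + e * f + g * h;
        return y;
    }

    // Split form: each assignment is one multiply or one add.
    static int split(int a, int b, int c, int d, int e, int f, int g, int h) {
        int p0;
        int p1;
        int p2;
        int p3;
        int s0;
        int s1;
        int y;
        p0 = a * b;      // four products of similar complexity
        p1 = c * d;
        p2 = e * f;
        p3 = g * h;
        s0 = p0 + p1;    // partial sums combined on following lines
        s1 = p2 + p3;
        y = s0 + s1;
        return y;
    }

    public static void main(String[] args) {
        System.out.println(combined(1, 2, 3, 4, 5, 6, 7, 8));   // prints 100
        System.out.println(split(1, 2, 3, 4, 5, 6, 7, 8));      // prints 100
    }
}
```

The split form takes more assignments to reach the result, but each one contains less logic.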

The execution section implements the assignment statements in a single always block except for unrolled loops that are converted to unique generate-for loops with their own 1x or 2x clock.

Directives

The compiler can be given directives to tune its behavior. They must be entered as in-line comments and will apply to the entire line.

jvc.pipe        : pipeline this for or while loop - this is the default in Stream mode
jvc.clocksPer=N : number of clocks per pass through pipelined loop
jvc.unroll=N    : unroll or parallelize a loop N indices at a time
jvc.accum=N     : calls out variables for an accumulator unrolled by N
jvc.clk2x       : use the 2x clock for this loop
jvc.ROM         : implement array as Read Only Memory; compiler handles initialization
jvc.passive     : this object is passed between two components and needs special handling

Compiler directives are case insensitive.
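As a sketch of how directives might be attached, the plain Java below places two of them in end-of-line comments on the lines they are meant to affect. The exact comment syntax expected by JVCC is an assumption, and the class, array, and method names are hypothetical. To a Java compiler these are ordinary comments, so the code runs unchanged on a JVM.

```java
// Sketch only: directives written as in-line comments on the affected line.
// DirectiveSketch, TAPS, and dot() are illustrative names, not part of JVCC.
class DirectiveSketch {
    static final int[] TAPS = { 1, 2, 3, 4 };   // jvc.ROM

    static int dot(int[] x) {
        int i;
        int acc;
        acc = 0;
        for (i = 0; i < TAPS.length; i++) {     // jvc.unroll=4
            acc += TAPS[i] * x[i];
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(dot(new int[] { 4, 3, 2, 1 }));   // prints 20
    }
}
```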