Advances in Robotics & Automation

The advancing complexity of contemporary microelectronics has motivated research in high-level and system synthesis (HLS). Formal and intelligent HLS techniques are presented in this contribution, thus the generated implementation is correct-by-construction. These intelligent techniques include RDF (Resource Description Framework) and logic relations, along with automatic implementation options and they are employed for the transformations of a hardware compiler. The proposed toolset utilizes compiler-generators, RDF rules and logic programming in combination with XML validation of the internal state of the compiler. These intelligent and formal techniques make the whole transformation from source code to implementation, formal. The HLS tool is enhanced with the Parallel, Abstract Resource – Constrained Scheduler, which aggressively optimizes the initial state schedules, into maximally parallelized ones. A number of custom options are applied by the user of this toolset, in order to automatically compile selected testcases from real-world applications which prove the usability of the embedded scheduler and the formal compilation of the intelligent HLS compiler.


Introduction
Digital microelectronics found in embedded, high-performance and portable computing systems have highly complex components, design hierarchy and interconnections. During the last couple of decades, commercial and academic organisations have invested in HLS and optimisation techniques, so as to achieve design automation, quality of implementations and short specification-to-product times [1,2]. However, existing HLS tools are not widely accepted by the engineering community because of their poor results, especially for large applications with complex module and control-flow hierarchy. Very often, the programming style of the source code has a severe impact on the quality of the synthesized implementation. For largescale applications, the complexity of the synthesis transformations (front-end compilation, algorithmic transformations, optimizing scheduling, allocation and binding), increases exponentially, with a linear increase of the design size [3].
Existing HLS tools impose proprietary extensions or restrictions (e.g. exclusion of while loops) on the programming model of the specifications that they accept as input, and various heuristics on the HLS transformations that they utilize (e.g. guards, speculation, loop shifting, trailblazing). Most of them are suitable for linear, dataflow dominated (e.g. stream-based) designs, such as pipelined DSP, image processing and video/sound streaming.
The contribution of this work is an integrated HLS toolset which utilises intelligent and formal techniques so as to apply the sourceto-implementation optimizing transformations, thus, the produced hardware implementations are correct-by-construction. Therefore, the design needs verification only at the top behavioral level, without spending days or even weeks, on lengthy RTL or annotated gate simulations. Moreover, various custom options can be applied by the user on the automatic HLS transformation, such as the type of the micro-architecture, the generated HDL code as well as the inclusion of custom (e.g. arithmetic) logic functions throughout the HLS compilation.
The author has designed and developed an intelligent HLS compiler [4] that includes a scheduler of operations into control steps, achieving the maximum functional parallelism in the synthesized implementation [5]. This HLS scheduler called PARCS, utilizes logic programming [6] and RDF subject-predicate-object relations [7], to formally achieve the maximum possible parallelism of operations. In this way, the functionality of the delivered implementations is correctby-construction [3] explores various scheduling techniques.
Formal HLS techniques are analysed in the next section. Next, the intelligent approach of the prototype optimising CCC synthesizer is described, such as formal predicate logic [6], RDF relations and XML schema validation [7]. Then, the usability and correctness of the CCC HLS toolset are evaluated with a number of benchmarks. The last section draws useful conclusions and proposes future work.

Existing Work in intelligent HLS techniques
Established and well studied HLS tasks include scheduling, allocation and binding [3]. The front-end part of HLS tools include parts of software programming language compilers [8], such as parsing, semantic analysis, intermediate variable optimization, elimination of dead code, etc. The front-ends exchange information with the back-ends using intermediate formats, such as the Electronic Design Interchange Format (EDIF) [9,10], used by most E-CAD tools. Complex control flow optimization has been evaluated in [2,11,12], but for small parts of code and by no means complete application tests [13], discusses synthesis for distributed logic and memory [14] uses communicating processes as a system specification medium. HLS methods that include memory access management are outlined in [1], where digital signal processing (DSP) and streaming applications are synthesized using performance constraints [15], analyses mutually exclusive scheduling on extended data-flow graphs (EDFG) [16] synthesizes behavioural descriptions with time constraints, where complex operations are decomposed into simpler ones, and a similar set of decomposed fragments of operators, with the same pattern, are scheduled in a clock cycle.
In [17] an actor, that is used to model every module or system process, communicates with other actors via communication channels. These actors are used by the System Co-Designer [17] to exercise electronic system level (ESL) design space exploration. In [18] the SURYA system utilises the Simplify theorem prover to prove that the RTL model generated by HLS tools is functionally-equivalent to the specification. SURYA found two bugs in the SPARK HLS tool [2], which were until then unknown. In [19] flip-flops are replaced with latches so as to improve implementation timing, since latches are inherently more tolerant to process variations than flip-flops. Nevertheless, latchbased design is more cumbersome than flip-flops.

The W3C Resource Description Framework
The Resource Description Framework (RDF) is a metadata model and is used to model the information of web resources [7]. RDF models include subject-predicate-object relations with explicit statements, called triplets. Triplets specify resources that store knowledge, information and data retrieval in large automated software tools. This is achieved due to the suitability of RDF to capture, store, exchange, and use machine-readable web information. Here, RDF is used to formally model and validate the internal data and state of the author's HLS tool.
The eXtensible Markup Language (XML) serialization format is used to define RDF. XML's syntax is formally specified [7] and is ideal to model simple triples such as subject-predicate-object (and other) relations. The following RDF relation: <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/"> <rdf:Description rdf:about="http://www.awl.com/ Formal_Synthesis"> <dc:title>High-level Synthesis</dc:title> <dc:publisher> Addison-Wesley Publishing</dc:publisher> </rdf:Description> </rdf:RDF> is the RDF knowledge model of the fact: "The title of this resource, published by the Addison-Wesley Publishing company, is Highlevel Synthesis". RDF and XML define object relations that also represent data attributes in large programs, and they can be used to validate internal data representations of E-CAD tools. XML files are easily readable by both humans and machines. An XML schema is a formal definition of an XML file, and it constraints the content and structure of such files. XML schema can be used to validate a particular XML instance. Well-known formalisms of the XML schema are the document type definition (DTD) language, the XML Schema and the Relax NG formats [7].
The graphical validation result of the above XML code is shown in Figure 1, and it was automatically produced by a DTD parser that helped to identify and correct some initial semantic issues in the tool development.
As widely documented, XML is a formal way to model internal information in design (E-CAD) tools. XML representation of C code is introduced in [20] to aid the CASE tool development. A hash table and a stream index are used in [21] to filter out invalid elements of an XML document. The IEEE std 1685-2009 IP_XACT, is based on XML schema, and is used as internal representation and exchange format between EDA tools (IEEE IP-XACT std, 2010). Such internal data include design units, interconnections, functional primitives, metadata, and IP blocks [22].
XML schema and the eXtensible Stylesheet Language Transformations (XSLT) language are used in [23] to enable the sharing, conversion, transfer and exchanging of healthcare database data. Many academic/commercial database and web tools utilise XML schemas and instantiations to represent internal information, such as primary data generated automatically and used by database, multimedia and web processing tools.
XML is combined with Web technology in [24] to structure, consult and share corporate data. Similarity algorithms for XML documents are analysed in [25]. The IBM DB2 query matching and compensation techniques are enhanced with XML functionality, to implement and evaluate query rewrite rules in [26]. An incremental approach called "T-Schema" (of XML-to-relational mapping storage), is proposed in [27], to address the strong dynamics of XML. The SMOQE tool generates the first regular XPath engine and provides answering queries technique, over recursively defined XML views [28] discusses the use of semantic web for EIS and databases. A XML metamodel captures NFRs and their relations in [29].

The Intermediate Predicate Format
The Intermediate Predicate Format (IPF) was invented by the author of this paper, to model the design and the HLS transformations in the CCC HLS tool [30] analyses the syntax and semantics of IPF, which uses the resolution of Horn clauses as formal object relations [6], to implement the HLS transformations. The front-end phase of the CCC compiler (see following paragraphs), generates the IPF database to capture all the algorithmic, structural and data typing attributes of the source program, as in the following Prolog fact: The Prolog predicate name fact_id relates the objects object1 to objectN in a formal manner. The identifier fact_id names the logical relation between the above objects. IPF facts represent program operations, data object descriptions, data types, operators, subprogram calls, etc. The back-end phase of the CCC compiler applies HLS transformations on these facts so as to produce the optimized design implementation. This is done in combination with the internal logical rules as a design knowledge-base in order to "conclude" and infer the RTL (register-transfer level) implementations. The IPF syntax facilitates declarative processing by Prolog predicates, as well as imperative processing. This is because IPF facts (e.g. data table facts) are referenced with their entry numbers in other IPF facts (e.g. program table facts).

The Intelligent CCC HLS Techniques
The synthesis design flow The HLS flow includes two major steps: the front-end phase and the back-end phase. These two tools exchange data via the IPF database. XML instances of IPF are automatically generated to validate the compilation process, using graphical or command-line based XML validators. The front-end phase performs the typical tasks of a software compiler, such as parsing, generation of abstract syntax trees, typechecking of data and program statements, optimization of auxiliary variables and constants, generation of syntactic and semantic error messages etc.
The XML schema view is also used by the front-end and backend phases to formally validate the translation, since IPF constitutes a formal link between the two phases of the hardware synthesizer.
CCC analyses the IPF database and generates the initial FSM schedule, considering custom options (such as the location of large data objects on shared memory). Then, it optimises the initial schedules using the PARCS (Parallel Abstract Resoure-Constrained Scheduling) scheduler. PARCS works with, or without resource constraints, and generates the maximally parallel hardware implementation [4], while satisfying source code dependencies.
A custom options file can be used to mark certain program subroutines as "custom blocks", so that they are used as pre-optimized and static custom modules. Hardware arithmetic blocks, or complete data-flow systems such as DSP filter blocks and cryptographic mathematical functions, can be used as custom blocks as in the benchmarks analysed in later sections.
Other custom options of the back-end compiler are the targeting of either massively-parallel (with resource redundancy), or conventional FSM+datapath micro-architectures modelled in Hardware Description Language (HDL) RTL code. More options include the targeted language which (at the moment) include VHDL and Verilog HDL.

Formal Back-end HLS Transformations
The logical relations of the back-end compiler use definite clauses [6] such as follows: where ← is the logical implication symbol (A ← B means that if B applies then A applies), ˄ is the logical conjunction symbol, and A0, …, An are atomic formulas (logic facts) of the form: predicate_symbol(Var_1, Var_2, …, Var_N) (form 3) Where the positional parameters Var_1,…,Var_N of the above predicate "predicate_symbol" are either variable names (such as in the back-end inference rules), or constants (such as in the IPF table statements) [6]. By combining these, the source code subroutines are transformed into optimized, provably-correct RTL hardware implementations.

Formal validation using XML schema
Formal logic rules (logic relations) as in form 2 construct the backend inference engine. Hence, the IPF's facts "drive" the logic rules of the back-end compiler which infers provably-correct hardware implementations, which are technology-independent, free of any standard template and custom microarchitectures in synthesizable HDL code. Here follows the XML view of the same relation: <complexType name=" transform1"> <sequence> <element name=" operation1"/> <element name=" left_operand_A "/> <element name=" right_operand_B "/> <element name=" result_data_C "/> </sequence> The graphical XML validation of the same transformation is shown in Figure 4. The XML schema instances are validated using any available validators.

Experimental Designs with Custom Options
A large number of ADA designs were synthesized with the CCC compiler and all of them were proven by simulations to be accurate and matching the behaviour of their source models. Here two of them have been analysed in order to demonstrate the usability of the CCC HLS custom options. These are a DSP FIR filter, and a RSA crypto-processor from cryptography applications.
In all tests, the intermediate form and the internal HLS transformations were validated against the respective XML schemas which were automatically extracted from critical points in the compilation flow. For the DSP filter, the two lower-level subroutines in ADA, which model the processing of one more incoming sample and the shifting of the filter history by one position, were chosen to be implemented as custom combinatorial blocks taking one clock cycle to implement. A top-level subroutine which contains calls to the above subroutines, was used to process a whole length of incoming sample arrays.
This subroutine was processed normally via the CCC compiler and produced VHDL code with an optimized FSM of 10 states. Within appropriate state descriptions of this FSM, the above custom block subroutines are called via VHDL call mechanism. The whole ADA coding and compilation of the DSP filter took less than half an hour to run. The hierarchy of the FIR ADA code and implementation is shown in Figure 5.
The experimental CCC environment is shown in Figure 6. The intended system is modelled in the ADA programming language. The initial ADA specification model is compiled with the GNU ADA and the generated binaries are verified with the ADA testbench and test vectors. This exhibits a rapid verification manner, due to extremely high compile-and-execute verification speed. Then the ADA subroutines that are intended for hardware (microelectronic) implementation are integrated in an autonomous ADA package (library module). The latter is compiled and synthesized into hardware using the prototype CCC tools. The generated hardware modules are downloaded in A Xilinx Virtex-2 FPGA was accommodating the synthesized hardware blocks, with the synthesis flow and the target architecture shown in Figure 6.
The initial state schedules are first extended with custom optionguided operations and the result is optimized with the PARCS scheduler as a new schedule. Statistics regarding the optimization rates of the PARCS scheduler are shown in Table 1. Table 1 indicates that the states reduction rate reaches up to 41% for the FIR processor case. This is a very efficient hardware implementation of large designs with a few hundred states per module, such as large ASICs and complex IP modules being part of embedded system SoCs.
The code of the ADA subroutines can be either standalone or hierarchical. This means that a number of ADA subroutines can include function/procedure calls to other subroutines of the library that is synthesized.
This hierarchy is maintained through the CCC compilation. Thus, the CCC designer dictates the modularity of the generated hardware blocks. Additionally, all the necessary co-processor interfaces and (e.g. memory) communication protocols are automatically synthesized by the CCC compiler and inserted in the initial schedules derived directly from the input models. This is implemented via a memory custom options file. Moreover, selected ADA subroutines can be "marked" as custom arithmetic modules. These are usually complex Boolean functions and they are inserted as expanded VHDL procedure calls in the generated FSM states, executed in a clock cycle. The custom blocks option strategy was followed on the lower-level functions of the FIR DSP filter and RSA crypto-processor benchmarks. Table 1 demonstrates that the number of hardware states is increasingly high, which constitutes the importance of the CCC framework contribution, in keeping the complexity of contemporary designs under control. It is a shared experience by experienced hardware designers, that development and verification of very complex FSMs with over 20-30 states is cumbersome and extremely prune to errors. Experienced programmers can use the CCC toolset, to implement very complex hardware designs in a few hours, whereas this takes usually more than 6 months of traditional development and verification time. This is due to the use of formal logic programming and RDF validation techniques embedded in the intelligence of the CCC synthesizer.
The ADA code hierarchy for the RSA public-key cryptography application is shown in Figure 7. The modular exponentiator and multiplier automatically produce optimized FSMs with 12 and 29 states respectively. The other 3 lower-level modules are transformed as custom blocks and their VHDL calls are instantiated into the higher-level FSM state descriptions, as shown in Figure 7. The multiplier subroutine is called within the exponentiator subroutine. This is a special case for the CCC translation and is being dealt as such by the compiler. In particular, this subroutine call is used to generate an "interface event" between the modular multiplier module and the exponentiator module. In general, when across both sides of a subroutine call, none of them is intended to be a custom block, then the subroutine calling is converted into a HDL module interface and data exchange mechanism. All of the VHDL modules for the FIR filter and the RSA cryptoprocessor have been simulated (although due to the formal nature of the CCC tool, not necessary) and the results coincided with the results of the ADA verification testbenches.

Conclusions and Future Work
The main contribution of this work is a formal, high-level hardware synthesis toolset and method developed by the author of this paper. The CCC HLS tool utilizes compiler-compiler and logic inference techniques to turn synthesis formal. The synthesis transformations are enhanced with XML schema validation as well as RDF logic relations to implement the execution and formal validation of the prototype hardware compiler. XML views of IPF as well as the formal CCC transformations are validated against their schema views.
Arbitrary and general input ADA code is synthesized into functionally-equivalent RTL VHDL/Verilog hardware implementation. Many applications were synthesized with the CCC toolset, two, very indicative ones being discussed in this paper. In any case, the functionality of the produced hardware accelerators (coprocessors) matched that of the input subprograms. The synthesized hardware can be used to accelerate complete hardware/software systems in their time-critical routines. The PARCS scheduler achieves high stateoptimization rates which exceeded 36% in some cases (Table 1).