Embedded Systems Software, Computer Networking and Geeky Fun

nerd1951.com

September 23, 2009

Big Website Changes Coming

Filed under: News, Projects, Geeky Fun — Harvey @ 9:18 pm

Activity on nerd1951.com has been little to none for the last several months (I have a talent for stating the obvious).  But this is about to change.  nerd1951.com will be getting a complete make-over soon.

My interests have moved on from computer networking in embedded systems to multi-media and embedded systems.  It seems that a lot of the current interesting work in embedded systems includes multi-media.  The applications range from portable entertainment gadgets to robotics.  Tele-presence and remote sensing are important multi-media embedded systems applications.

The blog portion of nerd1951.com will be moved to its own page. I’ll continue to use WordPress for that, but I’ll probably delete most of the old content.  The main page will be dedicated to projects and news from other sources and will include some multi-media content.  I’m looking at using Joomla for that.

I’m also working on a fun digital speech processing application that I will share here after the website make-over is complete.

• • •
 

January 28, 2009

Using the Linux select() function for embedded systems

Filed under: Projects, Programming — Harvey @ 12:40 am

A couple of years ago I gave a presentation at the Embedded Systems Conference about implementing communications protocols in C++. One of the main themes in my talk was avoiding context switches because of their high cost. This is especially important in embedded Linux. The computational costs for context switching are high whether you are using processes or pthreads.

The primary reason for using embedded Linux is for network centric applications. One of the Linux Kernel’s biggest advantages is its reliable and efficient TCP/IP network stack. If you’re using Linux in an embedded system and networking isn’t a big part of your application you need to make sure you have another compelling reason for incurring the costs of a Linux kernel.

Linux provides an elegant mechanism for simulating multitasking for network applications. This is the select() function. For those who are not familiar with the select() function, it allows you to wait on events on multiple file descriptors. This might not seem too interesting unless you realize that network sockets are file descriptors. You can also write drivers for your specialized hardware that emulate file I/O so that you can create file descriptors for your application specific hardware devices.

The select() function works on three sets of file descriptors: read, write, and exception. You can also pass a timeout value to select allowing execution of periodic tasks. The main loop for an application that uses select() looks something like this:

Do forever:

    Initialize the timeout
    Initialize the file descriptor lists

    Call select()

    Check for file descriptors in each list that are ready for servicing.
    Determine if the time interval for periodic tasks has elapsed.

End do

In addition to the select() function, Linux provides macros for clearing and building file descriptor lists: FD_ZERO clears a list, FD_SET adds a descriptor to it and FD_CLR removes one. There is also a macro, FD_ISSET, to check which of the file descriptors are ready after select() returns.
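As a rough illustration, here is one pass through that loop body written as a C++ function. This is only a sketch: the names PollResult and poll_once are made up for this example, it watches the read list only, and it assumes every descriptor is smaller than FD_SETSIZE.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <vector>

// Result of one pass through the loop body (hypothetical helper type).
struct PollResult {
    std::vector<int> readable;  // descriptors that are ready for reading
    bool timed_out;             // true if select() returned because the timeout expired
};

// One iteration of the main loop: rebuild the timeout and the read list,
// call select(), then collect the descriptors that are ready.
PollResult poll_once(const std::vector<int>& fds, long timeout_ms) {
    PollResult result{{}, false};

    // select() overwrites the lists (and, on Linux, the timeout),
    // so both are initialized again on every pass.
    timeval timeout;
    timeout.tv_sec = timeout_ms / 1000;
    timeout.tv_usec = (timeout_ms % 1000) * 1000;

    fd_set read_set;
    FD_ZERO(&read_set);
    int max_fd = -1;
    for (int fd : fds) {
        FD_SET(fd, &read_set);
        if (fd > max_fd) max_fd = fd;
    }

    int ready = select(max_fd + 1, &read_set, nullptr, nullptr, &timeout);
    if (ready == 0) {
        result.timed_out = true;  // nothing ready: time to run the periodic tasks
    } else if (ready > 0) {
        for (int fd : fds) {
            if (FD_ISSET(fd, &read_set)) result.readable.push_back(fd);
        }
    }
    // ready < 0 means an error; this sketch just reports nothing ready.
    return result;
}
```

A real server would call something like this inside its "do forever" loop, service each descriptor in readable, and run its periodic work whenever timed_out is set.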

The select() function is commonly used in implementing all kinds of network servers from Web servers to SNMP. The Linux man pages select(2) and select_tut(2) are very complete if a little overwhelming. On the Web, the world of select is a good starting point. If you want to look at a real world example that’s not too complicated, the boa web server is a good example.

• • •
 

October 2, 2008

uCLinux and the Analog Devices Blackfin Processor

Filed under: News, Projects, Tools, Programming — harvey.sugar @ 3:35 pm

I’ve been doing a bit of work lately on a project that uses the Analog Devices Blackfin processor. The application is very ‘net-centric so we decided to use Linux, specifically uCLinux for the operating system. Of course this means that our development tools are the GNU compiler collection and all of its related applications.

Often, one of the hardest parts of a project like this is getting Linux up and running on your target hardware. The Blackfin uCLinux web page is very well organized and has a lot of good information. I downloaded the tools and the Linux distribution from their site and had Linux up and running on a development board in a matter of hours.

One thing that is an absolute necessity is to have a JTAG interface for programming FLASH through the processor. Once you have the boot program running you can program the FLASH using commands that the bootstrap provides, but you need to get the bootstrap in FLASH first. An Austrian company, Blue Technix, sells a USB JTAG ICE for the Blackfin called the Ice bear. It’s about $320 (US) and since they are in Austria, it could take a couple of weeks to get one, so you need to plan ahead. The ICE worked just as advertised and was critical to the success of my project. Blue Technix also has some low cost eval boards, but given the Euro/Dollar exchange rate you might do better getting something from Analog Devices or one of their distributors.

As I was saying, the hard part is often getting Linux up on your own hardware. My target had some fundamental differences from the eval board that I started working with. Like most microcontrollers, the Blackfin has a bunch of multi-function pins. It also has two UARTs. Well, our design used UART0’s data out pin as a software controlled hard reset. The other fundamental difference was that we were using SPI serial FLASH instead of the parallel NAND FLASH used on the eval board.

Porting the bootstrap went pretty smoothly. After about a day’s work, I had the bootstrap working. But it was different with the Linux kernel. I disabled UART0 and enabled UART1 using make menuconfig. But every time the board started to boot – bam! It reset almost immediately. I knew that somewhere the function for that pin was being set up to be UART0’s output. It took about three days of searching the code but I finally found it. The I/O pin setup for both UART0 and UART1 was hard coded deep in the init code. After I fixed that, I could boot Linux using TFTP.

The next step was getting Linux to work out of the SPI FLASH. There are two approaches to this. You can create a complete compressed file system image for a RAM disk system. Then you have the boot program copy the image from FLASH to RAM, decompress it and go. This approach is very simple to configure because it’s the stock build that comes with the distribution. The downside is that you don’t have any non-volatile storage without jumping through hoops.

The second approach took me a couple more days to get running. You build a file system for FLASH and a kernel image. There’s more to configure to get this to work. One thing you need to keep in mind is that you can’t use any loadable kernel modules to access the file system since you need the file system to load them. The nice part about this is that now you have a file system that’s almost like a disk. If you make changes to your application you can use FTP to load them on the target. You can also use the system logger to log to syslog (they actually call the log /var/log/messages).

Throughout this whole process I was able to get very good support from their web site and forums. If an answer wasn’t in the documentation wiki, I would most likely find it by searching the forums. When I posted questions to the forum, I often had an answer within an hour or two. The documentation and support were better than many commercial RTOSs and cross-compilers that I’ve used.

What really amazed me was how well the tools all work. I wrote a pretty complex application in C++ using many of the C++ library classes, Pthreads, sockets, and all kinds of system facilities. I got it all working on my desktop Linux machine. Then I just recompiled my application using the Blackfin version of the tools, loaded it on my Blackfin uCLinux system and it just worked.

The one thing though that you have to remember is that if you’re messing with the kernel then according to the GPL, anyone who buys your product is entitled to the source code. Then they’re also allowed to re-distribute that code. If you’re working with custom hardware then you’re going to be messing with the kernel. I don’t think that this requirement ever hurt sales of the Linksys 54G routers though. If the project is for a government agency, they may require full source code anyway.

• • •
 

October 2, 2007

Processor and C++ compiler opinions anyone?

Filed under: News, Projects, Tools, Rants — harvey.sugar @ 8:46 pm

I know my readership has fallen off quite a bit since I haven’t been blogging regularly. I’m hoping that I still have a few loyal readers who might share their opinions with me. I am writing a book about using C++ for embedded systems. Much of the content is dissecting how C++ compilers implement the code they compile. This can vary from compiler to compiler and from processor to processor so I want to test a variety of C++ compilers on at least two processors that are commonly used in embedded systems.

I’m planning to use the ARM-7 processor, which seems to be the ubiquitous low cost 32 bit processor, and the Power Architecture (formerly PowerPC), which is commonly used in more high-end embedded applications. MIPS is also a possibility. I want to stay away from single-vendor processors like Freescale’s ColdFire. I also don’t believe that C++ has much to offer for eight bit processors. Applications for eight bit processors should be simple enough that they can easily be written in C.

Compilers are more of a problem. GNU’s GCC covers all of the processors that are of interest. I like IAR’s compiler for the ARM though it is not a complete C++ implementation and does not support the Power Architecture. Freescale’s Code Warrior supports the Power Architecture but it’s not clear at this point how well it supports the ARM processors. Green Hills has a complete C++ implementation for all of the processor architectures but their tools are expensive and I have yet to explore obtaining some kind of evaluation copy.

So, my questions to you are: “Did I leave out any processor architectures of interest?” and “Do you know of any other C++ compilers that I should consider?”

• • •
 

May 9, 2007

Data structures for Ethernet MAC bridge

Filed under: Projects, Tools — harvey.sugar @ 8:13 pm

Ok, hardly anyone uses Ethernet bridges anymore. Since almost every application has switched to the Internet protocols and routers have gotten so cheap, MAC layer bridges are an obsolete concept. But it turns out that the data structures and algorithms that I used to manage Ethernet MAC addresses in a remote bridge are useful for a number of cache applications, as I found out in an interview today.

This figure shows the application for a remote bridge. The problem is that you don’t want to send traffic over the WAN that doesn’t need to be forwarded to the remote LAN. The solution is a learning bridge. A learning bridge listens to the traffic to determine which MAC addresses are on the local network and which MAC addresses are on the remote network. In addition, you want to throw out any addresses that haven’t been used in a while in case someone moves a computer or maybe just to keep your table of addresses from getting full.

So the first part of the problem is that I need to look up Ethernet MAC addresses quickly so I used a hash table. Hash functions are never perfect and I had to use a relatively small hash table so collisions were likely. I used chaining to resolve collisions. In chaining the hash table is an array of linked lists. I took this approach because I knew that I would have to add and remove entries quickly and using chaining is the simplest way to do this.

Now to handle aging out old MAC addresses: each entry had a second set of next and previous pointers, so each MAC address was actually on two linked lists. One list was the chain from its position in the hash table; the second linked list was sorted by age. In both cases I used doubly-linked lists so that deletion from either end of a list or from the middle of a list is easy. Here is the data stored in each MAC address entry:

  • MAC Address
  • Local/Remote Flag
  • Time Last Accessed
  • Hash List Next Pointer
  • Hash List Previous Pointer
  • Age List Next Pointer
  • Age List Previous Pointer

A final note about the hash algorithm itself: I found a paper that compared several hash algorithms for MAC addresses, including common types such as CRCs. The study found that simply using the low order N bits of the MAC address was nearly as good as the most complex hash algorithm.
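To make the two-list idea concrete, here is a minimal C++ sketch of such a table. It is not the original bridge code: the names (MacEntry, MacTable, learn, expire_older_than), the 256-bucket size and the use of new/delete are all assumptions for illustration. The hash simply takes low order bits of the address, and the age list stays sorted because every access moves the entry to the "newest" end (assuming the clock never runs backwards).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <ctime>

using MacAddress = std::array<std::uint8_t, 6>;

struct MacEntry {
    MacAddress mac{};
    bool remote = false;            // local/remote flag
    std::time_t last_accessed = 0;
    MacEntry* hash_next = nullptr;  // chain within one hash bucket
    MacEntry* hash_prev = nullptr;
    MacEntry* age_next = nullptr;   // list ordered from oldest to newest
    MacEntry* age_prev = nullptr;
};

class MacTable {
public:
    static constexpr std::size_t kBuckets = 256;  // assumed size; must be a power of two

    MacTable() = default;
    MacTable(const MacTable&) = delete;
    MacTable& operator=(const MacTable&) = delete;
    ~MacTable() { while (oldest_) remove(oldest_); }

    // Hash on the low order bits of the address.
    static std::size_t hash(const MacAddress& mac) {
        return ((static_cast<std::size_t>(mac[4]) << 8) | mac[5]) & (kBuckets - 1);
    }

    MacEntry* find(const MacAddress& mac) const {
        for (MacEntry* e = buckets_[hash(mac)]; e != nullptr; e = e->hash_next)
            if (e->mac == mac) return e;
        return nullptr;
    }

    // Record traffic from mac: create the entry if needed and mark it newest.
    MacEntry* learn(const MacAddress& mac, bool remote, std::time_t now) {
        MacEntry* e = find(mac);
        if (e == nullptr) {
            e = new MacEntry;
            e->mac = mac;
            MacEntry*& head = buckets_[hash(mac)];
            e->hash_next = head;
            if (head) head->hash_prev = e;
            head = e;
            ++size_;
        } else {
            unlink_age(e);  // it will be re-appended at the newest end
        }
        e->remote = remote;
        e->last_accessed = now;
        e->age_prev = newest_;
        e->age_next = nullptr;
        if (newest_) newest_->age_next = e; else oldest_ = e;
        newest_ = e;
        return e;
    }

    // Throw out entries last used before cutoff; returns how many were removed.
    std::size_t expire_older_than(std::time_t cutoff) {
        std::size_t removed = 0;
        while (oldest_ && oldest_->last_accessed < cutoff) {
            remove(oldest_);
            ++removed;
        }
        return removed;
    }

    std::size_t size() const { return size_; }

private:
    void unlink_age(MacEntry* e) {
        if (e->age_prev) e->age_prev->age_next = e->age_next; else oldest_ = e->age_next;
        if (e->age_next) e->age_next->age_prev = e->age_prev; else newest_ = e->age_prev;
    }

    void remove(MacEntry* e) {
        unlink_age(e);
        if (e->hash_prev) e->hash_prev->hash_next = e->hash_next;
        else buckets_[hash(e->mac)] = e->hash_next;
        if (e->hash_next) e->hash_next->hash_prev = e->hash_prev;
        delete e;
        --size_;
    }

    std::array<MacEntry*, kBuckets> buckets_{};
    MacEntry* oldest_ = nullptr;
    MacEntry* newest_ = nullptr;
    std::size_t size_ = 0;
};
```

Because the oldest entry is always at the head of the age list, aging only ever looks at entries that are actually going to be removed, and the doubly-linked hash chain lets each of them be unlinked without searching its bucket.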

• • •
 

April 12, 2007

Packet demultiplexing in hardware

Filed under: News, Projects, Tools — harvey.sugar @ 3:37 pm

I have to admit that ever since the fad in network processors back in the late ‘90s, I’ve been interested in developing hardware for packet processing. This has been at least part of my motivation to learn a Hardware Description Language such as Verilog or VHDL. I remember a time when logic could only be programmed with a soldering iron and wire but I did design some hardware back in the ‘80s using programmable logic and a language called ABEL. But ABEL is like the HDL version of FORTRAN.

The problem with most of the network processors was that they were trying to solve too many problems at the same time. This was an era when Frame Relay was still in common use and ATM was nearing its peak. Designing hardware that could deal with Frame Relay, ATM, MPLS and the Internet Protocols made network processors so complex that they were impracticably difficult to program. They were also physically large, expensive and power hungry.

Two things have changed since then. The capabilities of programmable logic have grown immensely due to their increased density and speed and the quality of development tools. We have a much smaller set of protocols to deal with since the Internet Protocols now dominate networking. This makes it possible to develop more specialized hardware for specific functions such as packet demultiplexing and header check-summing.

Two things came together that got me thinking along these lines. On my way to the Embedded Systems Conference I read the chapter on packet demultiplexing in Network Algorithmics and a paper about a specific packet demultiplexing scheme called Pathfinder. Pathfinder included a hardware implementation. The other influence was Altera’s booth at the Embedded Systems Conference.

(Disclaimer: my employer is an Altera Certified Design Center Partner). Altera was showing a low cost development kit that they market to schools for logic design courses. The student project was pretty cool. It took two video streams, one from a DVD player and one from a live camera feed and displayed them on one screen with the live feed in the upper left quarter of the screen.

I struck up a conversation with the guy at the booth and he told me about two low cost evaluation kits. One is based on Altera’s Cyclone II FPGA and lists for $150. This kit has the interfaces required if you want to fool around with audio or video signal processing. The second design kit was just announced to support the Cyclone III FPGA. The board includes a high speed interface connector and gobs of RAM. This kit usually lists for $200 but is available at $150 for a short time. Altera has design tools free for download from their website.

What really makes this interesting is that they have a processor core that fits on the Cyclone III and still leaves room for special purpose logic. There are also cores available for Ethernet MAC interfaces. I’m thinking that it could substantially reduce the overhead for processing packets if the MAC interface, packet filtering and check-summing logic and the processor all communicated over on-chip buses. The packet headers could be processed in parallel with writing received packets into RAM.

But I’m getting way ahead of myself. I still want to finish a software version before I start designing hardware implementations.

• • •
 

April 9, 2007

Cannonball Update: Packet Demultiplexing

Filed under: Projects, Cannonball — harvey.sugar @ 9:37 pm

I was reading about packet demultiplexing in Network Algorithmics during my flight to San Jose last week. The chapter discussed the advantages of “early demultiplexing” vs the traditional protocol layer-by-layer demultiplexing. In early demultiplexing the packet’s entire path through the protocol stack is determined in one operation as I discussed in Cannonball Update: A Radical TCP/IP Architecture.

Two high-performance packet demultiplexers were discussed in the chapter: Pathfinder and the Dynamic Packet Filter. Of the two implementations, the Dynamic Packet Filter is faster but requires run-time code generation and I feel that this is beyond the scope of my project.

Both of these implementations are based on a data structure called a trie (from retrieval). As shown in the diagram, the data is laid out in a tree-like structure but the nodes do not contain the keys. Rather, a complete key is represented by the position of the node in the trie. In a demultiplexing scheme, each node would contain a header field, a mask, and a bit pattern to match. For example, the root of the tree would decode the Ethernet protocol type field.

[Figure: trie example]
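Here is a small C++ sketch of the kind of node described above: a header field (given as a byte offset), a mask, and a bit pattern to match. This is not Pathfinder itself, just an illustration of the idea. The names DemuxNode, add_node and classify are hypothetical, every field is treated as a 16-bit big-endian value, and the offsets used with it would assume plain Ethernet II framing and an IPv4 header without options.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// One node of the demultiplexing trie (hypothetical layout).
struct DemuxNode {
    std::size_t offset;     // byte offset of a 16-bit big-endian header field
    std::uint16_t mask;     // which bits of the field take part in the match
    std::uint16_t pattern;  // (field & mask) must equal this value
    int flow_id;            // meaningful on leaves only; -1 on interior nodes
    std::vector<std::unique_ptr<DemuxNode>> children;
};

using DemuxLevel = std::vector<std::unique_ptr<DemuxNode>>;

// Append a node to one level of the trie and return it so children can be added.
DemuxNode& add_node(DemuxLevel& level, std::size_t offset, std::uint16_t mask,
                    std::uint16_t pattern, int flow_id = -1) {
    level.push_back(std::unique_ptr<DemuxNode>(
        new DemuxNode{offset, mask, pattern, flow_id, {}}));
    return *level.back();
}

// Walk the trie once and return the flow the packet belongs to, or -1.
// The packet's whole path through the layers is decided in this single walk.
int classify(const DemuxLevel& level, const std::uint8_t* pkt, std::size_t len) {
    for (const auto& node : level) {
        if (node->offset + 2 > len) continue;  // field lies outside the packet
        std::uint16_t field = static_cast<std::uint16_t>(
            (pkt[node->offset] << 8) | pkt[node->offset + 1]);
        if ((field & node->mask) != node->pattern) continue;
        if (node->children.empty()) return node->flow_id;  // leaf: done
        // Siblings test the same field with different patterns,
        // so once one matches, only its subtree needs to be searched.
        return classify(node->children, pkt, len);
    }
    return -1;  // no match: hand the packet to a default path
}
```

With those assumptions, a root node with offset 12, mask 0xFFFF and pattern 0x0800 decodes the Ethernet type field for IPv4, a child with offset 22 and mask 0x00FF picks out the IP protocol number, and a grandchild with offset 36 matches the TCP destination port.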

Evgeniy Polyakov is working on tries for the Linux TCP/IP stack as he discusses in his blog: Zbr’s days.

Right now I’m designing classes to implement a demultiplexer similar to Pathfinder. Then I plan to generalize the classes into template classes. These will be the first component in my Networking Protocol Library.

• • •
 

March 26, 2007

Cannonball Update: A Radical TCP/IP Architecture

Filed under: Projects, Cannonball — harvey.sugar @ 5:36 am

I’ve been looking at several papers about TCP/IP stack architectures similar to the one proposed by Van Jacobson (and others, see the x-kernel for example). The basic idea is that packets are de-multiplexed as they enter the system and passed directly to the target application. Then the application thread is responsible for performing all the protocol processing. The application accomplishes the protocol processing by calling library functions for each layer. When packets are transmitted, the application calls library functions to push the protocol headers on to the packet and then the application queues the packet for transmission.

[Figure: Radical software architecture for TCP/IP]

I really like this approach because it illustrates how C++ classes can be used to implement the concepts of a layered protocol while the execution model is completely separate from the layer concept. C++ classes would be used to implement the protocol processing libraries. The de-multiplexer could also use layer-based methods for parsing the packets. So, while I am writing the code I can think of the protocols in a natural way – as protocol layers. When packets are processed by the system, the layers are virtually eliminated from the execution path.
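As a sketch of what "pushing protocol headers on to the packet" might look like in a class, here is a hypothetical Packet buffer that reserves headroom in front of the payload so that each layer's library call can prepend its header without copying the payload. The class name, its methods and the 64-byte default headroom are assumptions for illustration, not part of any design described here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A packet buffer with spare room in front of the payload (hypothetical class).
class Packet {
public:
    Packet(const void* payload, std::size_t len, std::size_t headroom = 64)
        : buf_(headroom + len), start_(headroom) {
        if (len > 0) std::memcpy(buf_.data() + start_, payload, len);
    }

    // Transmit path: prepend a header. Fails if the headroom is used up.
    bool push_header(const void* hdr, std::size_t len) {
        if (len > start_) return false;
        start_ -= len;
        std::memcpy(buf_.data() + start_, hdr, len);
        return true;
    }

    // Receive path: strip a header that a layer has finished processing.
    bool pull_header(std::size_t len) {
        if (len > size()) return false;
        start_ += len;
        return true;
    }

    const std::uint8_t* data() const { return buf_.data() + start_; }
    std::size_t size() const { return buf_.size() - start_; }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t start_;  // index of the first valid byte
};
```

In the application thread, a UDP-style layer would call push_header() with its header, then the IP layer would do the same, and the finished buffer would be queued for transmission. The code still reads layer by layer even though it all runs in one thread.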

All of the queues in this model have a single source thread and a single destination thread so lock free queues can be used. This would result in a considerable improvement in performance which was Van Jacobson’s goal in proposing this architecture. I believe that C++ makes this architecture easier to implement since each application can instantiate its own objects for the lower level protocols. This allows a natural sharing of protocol code among application threads.
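For readers who have not seen one, here is a minimal single-producer, single-consumer ring buffer using C++11 atomics. It is a generic textbook-style sketch, not code from this project; the name SpscQueue and the fixed compile-time capacity are choices made for the example. It is only safe when exactly one thread calls push() and exactly one thread calls pop().

```cpp
#include <atomic>
#include <cstddef>

// Lock-free queue for exactly one producer thread and one consumer thread.
// One slot is always left empty, so it holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "need at least one usable slot");

public:
    // Producer side only. Returns false if the queue is full.
    bool push(const T& item) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[tail] = item;
        // Release: the consumer sees the item before it sees the new tail.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side only. Returns false if the queue is empty.
    bool pop(T& out) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[head];
        // Release: the producer sees the slot as free only after the copy.
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    T slots_[Capacity];
    std::atomic<std::size_t> head_{0};  // next slot to read, written by the consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write, written by the producer
};
```

Each index is written by only one thread, which is why no lock or compare-and-swap is needed. With more than one source or destination thread this no longer holds.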

It is a little more complicated than the diagram shows. Several of the protocol layers will need to run their own timer threads and some packets are destined for lower protocol layers. I’ll work out these issues as I move into a detailed design.

• • •
 

March 21, 2007

Cannonball Update: Test Environment

Filed under: News, Projects, Cannonball — harvey.sugar @ 8:49 pm

In other news I have a target system set up.  It’s an old Dell Pentium III running at 800 MHz with 256 megabytes of RAM.  I’m running Ubuntu Linux since it is easy to configure a minimal system with Ubuntu.  It’s basically just a shell (no GUI), GCC and friends, FTP, SSH and the Kernel.  The shell and FTP are pretty snappy even on this old machine.  I think it’s a tribute to the performance of the Linux Kernel.

My next step is to set up a separate network for testing my TCP/IP stack without bringing down the rest of my computers and my Internet link.  I plan to put a second NIC in both my Fedora Linux desktop and the Ubuntu test machine and put them on their own 100Base-T switch.  I’ve done enough protocol software testing to know that you don’t want to let an experimental protocol stack talk directly on your production network or the Internet.

• • •
 

Cannonball Update: Initializing the Protocol Stack

Filed under: News, Projects, Cannonball — harvey.sugar @ 8:43 pm

I’m still working on the architecture for my TCP/IP stack.  Specifically I’m trying to define a common interface for a protocol layer.

One problem that’s been bugging me with the TCP/IP protocol stack is initialization.  I’ve often worked on embedded systems where initialization was not well planned and was sort of done ad-hoc.  This leads to all kinds of problems down the road with race conditions or when you add a new feature and find that something you need to be initialized isn’t.

What makes this even more complex for a protocol stack is that each layer needs to know what services are available from the lower layers.  The lower layers don’t know what layers above them are using their services.  You don’t want to hard code this stuff because it makes it harder to add new protocols to the system.

In Linux, the TCP/IP stack is initialized within the Kernel.  The order of initialization is pretty much hard coded but network interfaces can be stopped or started while the system is running.  The problem that I have with Linux is that much of the configuration information that is required to initialize networking is scattered in multiple files in different directories under /etc.  Network applications are started after the TCP/IP stack is running.

What I want to accomplish is to have all the configuration information in some common database and a single overall controller that can start, stop and reconfigure components of the TCP/IP stack.  Once I figure this part out I can define the interface between the controller and a protocol.  Then I can define the interface between protocol layers.
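To make that goal a bit more tangible, here is one possible shape for such a controller in C++. It is purely a sketch: the interface is not worked out yet, so Config, ProtocolComponent, StackController and RecordingComponent are all hypothetical names, and the "database" is just a map of strings. Components are started in the order they were added (lower layers first) and stopped in reverse.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the common configuration database (assumed: simple key/value pairs).
using Config = std::map<std::string, std::string>;

// What the controller needs from every protocol component (hypothetical interface).
class ProtocolComponent {
public:
    virtual ~ProtocolComponent() = default;
    virtual bool start(const Config& config) = 0;
    virtual void stop() = 0;
};

class StackController {
public:
    explicit StackController(Config config) : config_(std::move(config)) {}

    // Add components bottom-up: lower layers first.
    void add(ProtocolComponent& component) { components_.push_back(&component); }

    // Start in order; if one fails, stop the ones already started.
    bool start_all() {
        for (std::size_t i = 0; i < components_.size(); ++i) {
            if (!components_[i]->start(config_)) {
                while (i > 0) components_[--i]->stop();
                return false;
            }
        }
        return true;
    }

    // Stop in reverse order so upper layers go down before the layers they use.
    void stop_all() {
        for (std::size_t i = components_.size(); i > 0; --i) components_[i - 1]->stop();
    }

    // Simplest possible reconfiguration: change one setting and restart the stack.
    bool reconfigure(const std::string& key, const std::string& value) {
        stop_all();
        config_[key] = value;
        return start_all();
    }

private:
    Config config_;
    std::vector<ProtocolComponent*> components_;
};

// Toy component that only records what the controller asked it to do.
class RecordingComponent : public ProtocolComponent {
public:
    RecordingComponent(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {}
    bool start(const Config&) override { log_.push_back("start:" + name_); return true; }
    void stop() override { log_.push_back("stop:" + name_); }

private:
    std::string name_;
    std::vector<std::string>& log_;
};
```

The point of the sketch is that the start-up order lives in one place, the controller, instead of being hard coded in each layer. A real version would also need a way for a layer to discover the services offered by the layers below it.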

• • •
 