Embedded Systems Software, Computer Networking and Geeky Fun

nerd1951.com

March 30, 2007

Creative Commons License

Filed under: News, Rants — harvey.sugar @ 8:31 pm

If you are very observant you may have noticed that there is something new in the second column of this page: The Creative Commons License. It’s right above the copyright notice.

Basically this license states that it’s ok to copy my stuff, mix it up with your own stuff and display it or publish it in any way you like as long as you attribute my stuff to me. The easiest way to do this is to use a link to my web page.

I decided to use and display this license because I want to share my ideas and opinions freely. If anyone finds anything on my website interesting enough to re-publish it, that’s great. But my writing still belongs to me. That’s why there is a copyright notice on the page. Without the creative commons license your rights to republish my stuff is pretty restricted and I don’t want that.

As long as I’m talking about attribution, I have to note that I got the idea and the link to the creative commons license from the XKCD web site.

• • •
 

How to get 500 hits in one day

Filed under: News, Geeky Fun — harvey.sugar @ 8:13 pm

I have to admit that I’m vain enough to check my web stats every day on Google Analytics. I usually get no more than ten or twelve hits on a good day. On the weekends I usually get just one or two. Most of you people must be reading this stuff at work when you should be grinding out code.

Imagine my surprise when I found out that I had almost 500 hits today. My initial reaction was that this had to be related to some kind of spam attack or other web nonsense. Then I looked at the source of all this traffic. Almost all of my web visitors came here from the Dilbert Blog and I knew the reason why.

Today, Scott Adams was talking about April Fools Jokes and asked his readers to comment on their own favorite pranks. A while back I talked about a harmless yet evil stunt that I pulled off in middle school. I thought Scott’s audience would appreciate my evil genius stunt so I posted a link to my blog on the Dilbert Blog. I guess I did an April Fools Joke on myself. Well at least four or five hundred of his readers were curious enough to come and have a look. Most of them, like over 95%, didn’t find my blog interesting enough to look around at any other pages. I guess that proves that there are really only about a dozen people in the world who think the stuff I write about is interesting…. Sigh…..

• • •
 

March 28, 2007

Learning by Teaching

Filed under: News, Rants — harvey.sugar @ 9:50 pm

I was a Boy Scout leader for a number of years and the part I enjoyed most was teaching outdoor skills to the scouts and new leaders. We had a motto about learning: “Watch it, do it, teach it.” Meaning to really learn a new skill you should get instruction, put what you learned to use, then teach the skill to someone else.

I’m always trying to learn new skills and technology and this is one reason I love to teach. When I wanted to learn Java a few years ago, I knew that I would not invest the time unless someone gave me a project to do and no one was about to give me a Java project until I learned Java. I looked at the typical professional seminars on Java programming but I feel like most of these are a waste of time. Unless you have some projects to do you won’t learn a programming language.

My solution was to take a couple of intro computer science classes at Marayland University’s night school that were taught using Java: Introduction to Programming and Data Structures. Classes like this force you to do lots of little simple projects and the data structures class introduced the Java container classes. Then the instructor for the data structures class invited me to T.A. for him which I did for two semesters. Why would I spend three or four hourrs a week for two semesters to earn $600? Because, by the end of the two semesters I had explained Java concepts so many times that I could code it in my sleep.

I learned C++ back before the standards were written, when templates were new, before the STL and before name spaces. I program in C++ all the time but I don’t use all the modern language feature effectively. So what am I doing about it? I’m teaching an in-house C++ course at work that includes all the modern C++ features. You can bet that I’ll know this stuff much better by the time the class is over.

• • •
 

March 26, 2007

Cannonball Update: A Radical TCP/IP Architecture

Filed under: Projects, Cannonball — harvey.sugar @ 5:36 am

I’ve been looking at several papers about TCP/IP stack architectures similar to the one proposed by Van Jacobson (and others, see the x-kernel for example). The basic idea is that packets are de-multiplexed as they enter the system and passed directly to the target application. Then the application thread is responsible for performing all the protocol processing. The application accomplishes the protocol processing by calling library functions for each layer. When packets are transmitted, the application calls library functions to push the protocol headers on to the packet and then the application queues the packet for transmission.

Radical software architecture for TCP/IP

I really like this approach because it illustrates how C++ classes can be used to implement the concepts of a layered protocol while the execution model is completely separate for the layer concept. C++ classes would be used to implement the protocol processing libraries. The de-multiplexer could also use layer-based methods for parsing the packets. So, while I am writing the code I can think of the protocols in a natural way – as protocol layers. When packets are processed by the system, the layers are virtually eliminated from the execution path.

All of the queues in this model have a single source thread and a single destination thread so lock free queues can be used. This would result in a considerable improvement in performance which was Van Jacobson’s goal in proposing this architecture. I believe that C++ makes this architecture easier to implement since each application can instantiate its own objects for the lower level protocols. This allows a natural sharing of protocol code among application threads.

It is a little more complicated than the diagram shows. Several of the protocol layers will need to run their own timer threads and some packets are destine for lower protocol layers. I’ll work out these issues as I move into a detailed design.

• • •
 

March 21, 2007

Cannonball Update: Test Environment

Filed under: News, Projects, Cannonball — harvey.sugar @ 8:49 pm

In other news I have a target system set up.  It’s an old Dell Pentium III running at 800 MHz. with 256 Megabytes of RAM.  I’m running Ubuntu Linux since it is easy to configure a minimal system with Ubuntu.  It’s basically just a shell – no GUI, GCC and friends, FTP, SSH and the Kernel.  The shell and FTP are pretty snappy even on this old machine.  I think it’s a tribute to the performance of the Linux Kernel.

My next step is to set up a separate network for testing my TCP/IP stack without bringing down the rest of my computers and my Internet link.  I plan to put a second NIC card in my Fedora Linux desk-top and into the Ubuntu test machine and put them on their own 100Base-T switch.  I’ve done enough protocol software testing to know that you don’t want to let an experimental protocol stack talk directly on your production network or the Internet.

• • •
 

CannonBall Update: Initializing the Protocol Stack

Filed under: News, Projects, Cannonball — harvey.sugar @ 8:43 pm

I’m still working on the architecture for my TCP/IP stack.  Specifically I’m trying to define a common interface for a protocol layer.

One problem that’s been bugging me with the TCP/IP protocol stack is initialization.  I’ve often worked on embedded systems where initialization was nor well planned and was sort of done ad-hoc.  This leads to all kinds of problems down the road with race conditions or when you add a new feature and find that something you need to be initialized isn’t.

What makes this even more complex for a protocol stack is that each layer needs to know what services are available from the lower layers.  The lower layers don’t know what layers above them are using their services.  You don’t want to hard code this stuff because it makes it harder to add new protocols to the system.

In Linux, the TCP/IP stack is initialized within the Kernel.  The order of initialization is pretty much hard coded but network interfaces can be stopped or started while the system is running.  The problem that I have with Linux is that much of the configuration information that is required to initialize networking is scattered in multiple files in different directories under /etc.  Network applications are started after the TCP/IP stack is running.

What I want to accomplish is to have all the configuration information in some common database and a single overall controller that can start, stop and reconfigure components of the TCP/IP stack.  Once I figure this part out I can define the interface between the controller and a protocol.  Then I can define the interface between protocol layers.

• • •
 

March 17, 2007

Cannonball Update: Notes on Network Stack Architectures

Filed under: News, Projects, Cannonball — harvey.sugar @ 11:25 pm

I’ve started working on an architecture for my TCP/IP stack and I am deep into designing the abstract base class for the protocol layers.  Many network researchers advise against implementing protocols in the same layers as the protocols are specified.  The conventional wisdom is that this approach hurts performance.

I believe this stems from the original TCP/IP implementation done at Berkley in the early 1980s.  The Berkeley TCP/IP stack was widely distributed with BSD Unix and was the reference implementation for TCP/IP.  One major performance issue in the original BSD Unix TCP/IP stack was that each protocol layer was implemented as a separate process.  This approach burned up a lot of CPU cycles and as a result the BSD TCP/IP implementation was very inefficient.

The problem with the Berkley approach was that they used an operating system construct to do conceptual partitioning.  Conceptual partitioning is the method you use to divide a complex problem into manageable parts with each part representing a concept.  We do this all the time in software engineering, starting with dividing a system into subsystems, then modules or classes and finally individual functions.  When networking protocols were being designed, protocol layers provided a conceptual framework for partitioning networking functions.  This is the origin of the ISO seven layer model for networking protocols or the five layers used in TCP/IP.

The performance problem is not from partitioning the software into same protocol layers.  The issue is the mechanism you use for dividing the software into layers.  Processes or threads are not designed to be used for conceptual partitioning.

C++ classes on the other hand provide an ideal mechanism for conceptual partitioning.  This is what classes and objects were designed to do.  I can design an abstract protocol layer base class and assign networking functions to each concrete layer class just as the protocol layers are specified.  At the same time I can design the threading model for the networking protocol stack to obtain maximum performance and efficiency.

• • •
 

March 13, 2007

Cannonball Update: Moving Beyond Benchmarks

Filed under: News, Cannonball — harvey.sugar @ 7:29 pm

Now I remember why I don’t like benchmarks much. Benchmark results are often hard to interpret as I showed with the state machine measurements. Measuring one small feature or execution path out of the context of an application does not provide meaningful results. The proper way to optimize a system is to start with a solid design, build the system, instrument the system to find the bottle necks and optimize those areas. Then you measure again to determine if the optimization was effective. So I’m changing direction yet again and starting to design my TCP/IP stack using modern C++ concepts.

Designing and implementing a TCP/IP stack is a big job for one person to take on part time as I’ve noted in the past.  Here are the tactics I’ll use to make this feasible:

  • Use modern C++ design techniques including Generic Programming and Design patterns.  One of the major goals of this exercise is to show how using modern C++ programming makes the programmer or software engineer more productive without sacrificing performance.
  • Narrow the functionality.  I’ll only implement the minimum that I feel is essential for providing realistic performance results: IP, ICMP, UDP and a TFTP application that stores data in RAM.
  • Use Linux as both the development environment and the execution environment.  Linux provides a very productive development environment that includes the C++ Standard Library, the GNU C++ compiler GCC, and the Eclipse IDE.  Linux also provides tools for profiling and measuring code performance and tools for measuring network performance.  Finally, I can measure my TCP/IP stack against the Linux TCP/IP stack so I can do an almost apples-to-apples performance comparison.

The socket interface in Linux provides access to raw layer 2 (Ethernet) packets.  This is done by creating a packet socket:

packet_socket = socket(PF_PACKET, int socket_type, int protocol);

by specifying the protocol family to be PF_PACKET and the socket_type to be SOCK_RAW (see packet(7) - Linux man page for incomplete details). This socket option is what is used by sniffer programs to gain access to raw packets.  This also allows a TCP/IP stack to be implemented as a user space application.  This approach is the mainstream implementation method for research TCP/IP stacks.

I don’t plan to stop there with yet another research TCP/IP stack.  I already have some plans for the release beyond Cannonball.  My plan includes porting the TCP/IP stack to two or three embedded processors, ARM, PowerPC, and ColdFire.  I also plan to fill out the stack in the next release by implementing TCP and some additional protocols.

• • •
 

March 9, 2007

Who is Steve Healy and why is he in my Blogroll?

Filed under: News — harvey.sugar @ 7:10 pm

I don’t know what Steve Healy does for his day job but when he is not at work he develops game software and blogs about it. I’ve actually read quite a few game developer blogs lately because, as a group, they seem to be among the few who like myself are excited enough about computers to work on software in their free time. I especially like Steve’s blog because he talks a lot about motivation, maintaining focus and overcoming mental obstacles in programming. I have found his advice and insights valuable not only in my personal programming projects but at work too.

• • •
 

March 7, 2007

Cannonball Update: Some State Pattern Results (Abstraction Costs)

Filed under: News, Cannonball — harvey.sugar @ 10:22 pm

I’ve finished a very simple comparison of two state machine implementations; the State Design Pattern as described in the “Gang of Four” Design Patterns book and a C style state machine using switch statements.  I’ve taken a minimalist approach because I’m busy with studying C++ Templates and getting ready for the Embedded Systems Conference.  If you want to see the code you can download it from here:

StatePattern.tar.gz
StatePattern.zip
StateSwitch.tar.gz
StateSwitch.zip

To be brief, I’ve stripped away almost all the code except the state transitions themselves.  My goal is to measure the cost of using C++ virtual functions.  I had each implementation process one billion random events each of which cause one state transition.  The results were that the State Pattern implementation averaged about 45 nanoseconds per event and the C style implementation averaged about 23 nanoseconds on a 2 GHz. Pentium 4 running Fedora Core Linux.

There are two ways of interpreting these results.  I want to be careful here because there is so much mythology about the cost of virtual functions in the embedded systems community.  Some embedded systems programmers will totally avoid using virtual functions or any kind of class hierarchies because of the overhead incurred.

So you could look at this and say “Look, using virtual functions took TWICE as long as using the C style implementation!” But on the other hand, we are talking about 22 nanoseconds.  Let’s keep that in perspective.  Even if we scale that for the clock rate of a typical embedded processor running at say 60 MHz. the difference is still under half a microsecond and this ignores the fact that most embedded processors have more registers than an x86 processor so they could gain some additional performance in handling virtual functions. 

There are plenty of better places in a program to pick up performance gains without resorting to ignoring major language features.  As Knuth said “Early optimization is the root of all (programming) evil.”  Note also that I’m not talking about all embedded systems here.  I wouldn’t suggest using C++ and all of it’s features in a small embedded system based on a PIC or an Atmel AVR processor for example.  C++ is designed for larger more complex tasks and there are plenty of embedded systems that fall into that catagory.

I’ve ordered my NetBurner MOD5270LC Development Kit which is based on a 100 MHz. Coldfire Processor.  I’ll be able to get a more concrete result for a typical embedded system when I run the same tests on that.

• • •
 
Next Page »