Wednesday, January 14, 2009

Protocol Buffers

Protocol buffers is another way to serialize/deserialize data.

It was written by Google, which ships support for Java, C++, and Python; Perl and other languages are covered by third-party add-ons.

Their website is at: http://code.google.com/p/protobuf/

I'm planning on trying this out for a C++ project I maintain where we need blindingly fast (as fast as we can get) serialization and deserialization performance.

Once I've played with it, I'll post more information.

Tuesday, January 13, 2009

Flyweight Pattern

The Flyweight design pattern is a software design pattern used to minimize memory usage by sharing data. It enables the use of a large number of objects that would otherwise require a lot of memory.

A common example of the Flyweight pattern is string pooling. Consider the Java programming language. The String data type is immutable. Because it is guaranteed that a string can never be changed, string literals are pooled (interned) to ensure that only one instance of each exists in memory at any given time.

So if you assign the literal "foo" to two strings s1 and s2, you really have two references to the same object in memory.

Java also employs the Flyweight pattern for Integer objects. Integer.valueOf(0) (and autoboxing) returns a reference to a pre-constructed object for small values (at least -128 to 127), whereas new Integer(0) always allocates a fresh one. So if you create one thousand objects that each hold an Integer zero obtained this way, you will have only one Integer instance in memory, which is an excellent way to save memory.

While you're writing your programs, consider whether you could use the Flyweight pattern to save memory. One case where I used it was an object that contained three Strings. This tuple uniquely identified a configuration of runs that were stored in the database. The Strings were loaded via JDBC, so they didn't get the String pooling provided by Java. Instead I made the constructor of the class private and exposed a public static method called 'get'. This method took the three strings, created the object if it did not already exist, and returned a reference to it. This cut my memory usage drastically.

Here is an example of my use of the Flyweight pattern.


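import java.util.HashMap;
import java.util.Map;

class Product {
    private final String arch, chip, config;
    private static final Map<String, Product> products = new HashMap<String, Product>();
    private Product(String arch, String chip, String config) { // don't let anyone explicitly create this object.
        this.arch = arch; this.chip = chip; this.config = config;
    }
    public static Product get(String arch, String chip, String config) {
        // Key on the strings themselves, not hashCode(), which can collide; assumes '|' never appears in them.
        String key = arch + "|" + chip + "|" + config;
        Product p = products.get(key);
        if (p == null) {
            p = new Product(arch, chip, config);
            products.put(key, p);
        }
        return p;
    }
}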


If you're using C++, you should look at the Boost flyweight class. It hides the implementation and makes adding a flyweight to your C++ code trivial. http://svn.boost.org/svn/boost/sandbox/flyweight/libs/flyweight/doc/index.html
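If you can't pull in Boost, the same idea can be rolled by hand. Here is a minimal sketch of the 'get' approach in C++11, using only the standard library. The names (Product::Ptr, flyweightSharesInstances) are mine for illustration, it is not how boost flyweight is implemented, and the pool is not thread safe as written.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical C++ counterpart of the Java Product class above.
class Product {
public:
    typedef std::shared_ptr<const Product> Ptr;

    // Returns the one shared Product for this (arch, chip, config) tuple,
    // creating it on first use. Not thread safe: guard the pool with a
    // mutex if several threads call get().
    static Ptr get(const std::string& arch,
                   const std::string& chip,
                   const std::string& config) {
        typedef std::tuple<std::string, std::string, std::string> Key;
        static std::map<Key, Ptr> pool;
        Ptr& slot = pool[Key(arch, chip, config)];  // null on first lookup
        if (!slot) {
            slot.reset(new Product(arch, chip, config));
        }
        return slot;
    }

    std::string arch() const { return m_arch; }
    std::string chip() const { return m_chip; }
    std::string config() const { return m_config; }

private:
    // Private: instances can only come from get().
    Product(const std::string& arch, const std::string& chip,
            const std::string& config)
        : m_arch(arch), m_chip(chip), m_config(config) {}

    std::string m_arch, m_chip, m_config;
};

// Equal tuples share one object; different tuples get different objects.
bool flyweightSharesInstances() {
    Product::Ptr a = Product::get("x86", "p4", "debug");
    Product::Ptr b = Product::get(std::string("x") + "86", "p4", "debug");
    Product::Ptr c = Product::get("arm", "p4", "debug");
    assert(a == b);
    assert(a != c);
    return a == b && a != c;
}
```

The pool keeps every Product alive for the life of the program, which is fine for a small set of configurations but worth revisiting if the set of keys is unbounded.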

Counting code lines.

I was curious how many lines of code my Java application had. I found this script very useful in finding out.

Here it is, in case it is helpful:


find . -name "*.java" -exec wc -l {} \; | grep -v "/test/" | awk '{print sum+=$1} END {print "Sum: "sum}'


First it finds all files with the extension java (change to cpp, or pl, or py if you need to). It runs wc -l on each file to count the lines.

Then it filters out my test directory. You can use -vE to give a regular expression of items to exclude, or drop that command altogether.

I then use awk to sum up the per-file line counts, printing the running total after each file as well as the final sum at the end.

Hope that helps.

Monday, January 12, 2009

Google Chrome planned Mac release middle of 2009

The popular Google Chrome browser is planned to be released for Mac in the middle of 2009. This should bring competition for the current default browser, Safari.

Google Chrome is also planning to add pluggable extensions, which Mozilla's Firefox has had for some time.

These two changes should make Google Chrome an even more likely choice for consumers. In the end more competition is always good and Google's entry into the market will help keep the innovations coming.

Boost scoped lock

I have been very impressed with the Boost libraries. Each day I'm finding more and more uses of Boost that make C++ programming easier.

My latest enjoyment comes from boost::mutex::scoped_lock.

scoped_lock follows the RAII (Resource Acquisition Is Initialization) pattern. The lock acquires the mutex when it is constructed and holds onto it until the lock is destructed.

Here is a simple example:


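#include <boost/thread/mutex.hpp>

class Foo
{
public:
    void bar()
    {
        // Blocks until the mutex is free, then takes it.
        boost::mutex::scoped_lock lock(m_mutex);
        // Critical section

        // No need to unlock: the lock's destructor releases the mutex
        // when it goes out of scope, even if an exception is thrown.
    }
private:
    boost::mutex m_mutex;
};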

One word of caution: up to Boost 1.36 the behavior of the mutex was different on Windows and Linux. On Windows the lock was recursive; that is, you could call another method that takes the same lock and enter it if your thread already held that lock, whereas on Linux that would cause a deadlock. Version 1.37 fixed this issue, and both now behave like the Linux version.
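If you actually want the recursive behavior, it is safer to ask for it explicitly than to rely on what a platform happens to do. Below is a minimal sketch using the C++11 standard library types std::recursive_mutex and std::lock_guard rather than the Boost classes above (Boost offers boost::recursive_mutex as well). The Counter class and its method names are made up for illustration.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical class, for illustration only.
class Counter {
public:
    Counter() : m_value(0) {}

    void increment() {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);
        ++m_value;
    }

    // Takes the lock, then calls increment(), which takes it again on the
    // same thread. That is fine with a recursive mutex; with a plain
    // non-recursive mutex this is the self-deadlock case described above.
    void incrementTwice() {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);
        increment();
        increment();
    }

    int value() {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);
        return m_value;
    }

private:
    std::recursive_mutex m_mutex;
    int m_value;
};

// Runs the nested-locking path once and reports the resulting count.
int demoReentrantLocking() {
    Counter c;
    c.incrementTwice();
    assert(c.value() == 2);
    return c.value();
}
```

The other way out is to keep the plain mutex and split each method into a public part that locks and a private part that assumes the lock is already held.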

Wednesday, January 7, 2009

Pros and Cons of returning references

I recently began working on a moderate-size C++ application. The previous developer loved the benefits of returning const&.

For those who don't know, returning const& can avoid an extraneous copy. Example:


#include <string>

class Foo {
    std::string m_name;
public:
    Foo(std::string const& s) : m_name(s) { }
    // Hands back a reference to the member instead of a copy.
    std::string const& name() const { return m_name; }
};


The above code avoids copying m_name into the function's return value, a copy that would otherwise have been made by the following client code:


Foo f("bar");
std::string name = f.name();



After working in this code base for a while now, I believe that returning references is evil and should be treated just like returning a pointer, which is to say: avoid it.

For example, here is the problem that arose and took a week to debug:


class Foo {
    std::vector< Bar > m_vec;
public:
    void insert(Bar const& b) { m_vec.push_back(b); }
    // Dangerous: the returned reference dangles once m_vec reallocates.
    Bar const& getById(int id) { return m_vec[id]; }
};


The problem in this example is that clients are calling getById and holding on to references to elements stored in the vector. Now what happens after clients insert a bunch of new elements? The vector needs to resize internally, and guess what happens to all those references? That's right, they're invalid. This caused a very hard to find bug that was simply fixed by removing the &.
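To see the reallocation happen without actually touching a dangling reference, you can compare the address of the first element before and after the vector grows. This is only a sketch with made-up function names, not the code from the application in question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns true if growing the vector moved its first element to a new
// address, which is exactly what invalidates outstanding references.
bool growthMovesElements() {
    std::vector<std::string> v;
    v.push_back("first");
    // Save the address as an integer so we never use a stale pointer.
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(&v[0]);
    const std::size_t capacity = v.capacity();
    // Keep inserting until the vector has to allocate a bigger buffer.
    while (v.capacity() == capacity) {
        v.push_back("filler");
    }
    const std::uintptr_t after = reinterpret_cast<std::uintptr_t>(&v[0]);
    return before != after;
}

// The fix from the post: take a copy, and later growth cannot hurt it.
std::string copySurvivesGrowth() {
    std::vector<std::string> v;
    v.push_back("first");
    std::string copy = v[0];  // what you get once the & is removed
    for (int i = 0; i < 1000; ++i) {
        v.push_back("filler");
    }
    return copy;
}
```

Had the first function kept a std::string const& to v[0] and read it after the loop, it would be reading freed memory, which is the bug described above.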

The conclusion I've arrived at is that returning a & is a premature optimization. If your profiling shows that your code is very slow at this point, then maybe consider it. But you should also be aware that modern compilers can often optimize away extra copies of return values anyway.

Here is a contrived example that shows the problems you can get into by blindly returning references. My advice: avoid them until your profiler tells you that you have a problem.

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

class foo {
std::string m_string;
public:
// Returns a reference to the member, so callers see every later change to m_string.
std::string const& getName(){
static int count = 0;
count++;
std::stringstream ss;
ss << count ;
m_string = ss.str();
return m_string;
}
};
foo f;
void fun(std::string const&s) {
f.getName();
std::cout << s;
}
int main() {
std::cout << " print version " << std::endl;
fun(f.getName());
}


Hope that Helps