Freewind @ Thoughtworks scala java javascript dart 工具 编程实践 月结 math python english [comments admin] [feed]

(2014-01-25) Java内置序列化与Gson性能比较

广告: 云梯:翻墙vpn (省10元) 土行孙:科研用户翻墙http proxy (有优惠)

昨天与同事讨论问题,我犯了一个想当然的错误:我认为Java内置的Serialization机制(即java.io.Serializable)的性能要比Gson快很多,因为前者只是向流中写入或读出一些数据,而后者还需要对字符串进行词法分析等。

今天写了一个测试之后,才发现事情并不是这样的,或者说,事情并不是这么简单。上代码。

首先定义了一个java bean,名为User,它实现了java.io.Serializable接口:

class User implements Serializable {
    private String aaa;
    private String bbb;
    private String ccc;
    private String ddd;
    private String eee;
    private String fff;
    private String ggg;

    // setters and getters for the fields
}


相应的一个`createUser`方法:

private static User createUser() {
    User user = new User();
    user.setAaa(newValue());
    user.setBbb(newValue());
    user.setCcc(newValue());
    user.setDdd(newValue());
    user.setEee(newValue());
    user.setFff(newValue());
    user.setGgg(newValue());
    return user;
}

private static String newValue() {
    return "" + System.currentTimeMillis();
}


下面是使用java内置序列化机制循环1000000万次:

long start = System.currentTimeMillis();
for (int i = 0; i < MAX_LOOP; i++) {
    ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
    ObjectOutputStream stream = new ObjectOutputStream(byteStream);
    stream.writeObject(createUser());
    byte[] binary = byteStream.toByteArray();

    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(binary);
    ObjectInputStream input = new ObjectInputStream(byteArrayInputStream);
    input.readObject();
}
long end = System.currentTimeMillis();
System.out.println("Serialization cost: " + (end - start) + "ms");


结果是:

Serialization cost: 12339ms


然后是Gson版:

long start = System.currentTimeMillis();
Gson gson = new Gson();
for (int i = 0; i < MAX_LOOP; i++) {
    String json = gson.toJson(createUser());
    gson.fromJson(json, User.class);
}
long end = System.currentTimeMillis();
System.out.println("Gson cost: " + (end - start) + "ms");


结果是:

Gson cost: 3971ms


Gson的性能居然是Java内置序列化的3倍!

这个结果让我非常意外,猜想是不是默认序列化需要用到如反射之类的方式去取值才导致变慢呢?于是修改代码,在User类中增加了定制的内容:

private void readObject(ObjectInputStream stream) throws IOException, ClassNotFoundException {
    aaa = stream.readUTF();
    bbb = stream.readUTF();
    ccc = stream.readUTF();
    ddd = stream.readUTF();
    eee = stream.readUTF();
    fff = stream.readUTF();
    ggg = stream.readUTF();
}

private void writeObject(ObjectOutputStream stream) throws IOException {
    stream.writeUTF(aaa);
    stream.writeUTF(bbb);
    stream.writeUTF(ccc);
    stream.writeUTF(ddd);
    stream.writeUTF(eee);
    stream.writeUTF(fff);
    stream.writeUTF(ggg);
}


再重新测试,结果并没有好多少:

Serialization cost: 10793ms


又搜了一下资料,得知内置的序列化还需要处理与对象自身的一些信息,而这些信息对我们来说是没有用的。我们可以用Externalizable代替Serializable接口,如下:

class User implements java.io.Externalizable {
    // add new code to implement the methods

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(aaa);
        out.writeUTF(bbb);
        out.writeUTF(ccc);
        out.writeUTF(ddd);
        out.writeUTF(eee);
        out.writeUTF(fff);
        out.writeUTF(ggg);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        aaa = in.readUTF();
        bbb = in.readUTF();
        ccc = in.readUTF();
        ddd = in.readUTF();
        eee = in.readUTF();
        fff = in.readUTF();
        ggg = in.readUTF();
    }
}


测试代码如下:

private static void tryExternal() throws Exception {
    long start = System.currentTimeMillis();
    for (int i = 0; i < MAX_LOOP; i++) {
        User user = createUser();
        ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
        ObjectOutputStream stream = new ObjectOutputStream(byteStream);
        user.writeExternal(stream);
        stream.flush();
        byte[] binary = byteStream.toByteArray();

        ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(binary);
        ObjectInputStream input = new ObjectInputStream(byteArrayInputStream);
        user.readExternal(input);

    }
    long end = System.currentTimeMillis();
    System.out.println("External cost: " + (end - start) + "ms");
}


结果:

External cost: 2740ms


这次很快,总算打败了Gson,代价就是需要自己手动维护写和读,这样的代码维护起来很麻烦,而Gson那边是不需要的。而且,性能差的也不远。

继续google,又发现了一个序列化相关的对比,而一个叫Kryo的开源项目提供的,参看网页:https://github.com/eishay/jvm-serializers/wiki

从里面可以看到,性能好坏与是否使用json无关,还是要看具体实现。比如阿里提供的fastjson的性能就远远好于其它,再比如排名第一的keyo(王婆卖瓜:)使用了二进制。

好奇之下,又测了一下它们两者的性能:

private static void tryKryo() {
    long start = System.currentTimeMillis();
    Kryo kryo = new Kryo();
    for (int i = 0; i < MAX_LOOP; i++) {
        // ...
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        Output output = new Output(byteOut);
        User user = createUser();
        kryo.writeObject(output, user);
        output.close();
        byte[] bytes = byteOut.toByteArray();
        // ...
        ByteArrayInputStream byteInput = new ByteArrayInputStream(bytes);
        Input input = new Input(byteInput);
        kryo.readObject(input, User.class);
        input.close();
    }
    long end = System.currentTimeMillis();
    System.out.println("Kryo cost: " + (end - start) + "ms");
}


结果为:

Kryo cost: 3551ms


比Gson略好,但比Externalizable略差。不过它不需要手动维护写入读出的实现,所以还是比较方便的。

再看fastjson:

private static void tryFastjson() {
    long start = System.currentTimeMillis();
    for (int i = 0; i < MAX_LOOP; i++) {
        String json = JSON.toJSONString(createUser());
        JSON.parseObject(json, User.class);
    }
    long end = System.currentTimeMillis();
    System.out.println("Fastjson cost: " + (end - start) + "ms");
}


结果让人非常振奋:

Fastjson cost: 1691ms

把其它对手都远远的抛在了后面,温少好样的~~

上面的测试代码虽然比较粗糙,但还是可以感受到之间性能的差别。

作为结论,我觉得如果使用合适的库,我们的确可以使用json作为序列化的中间格式,既高效,又易读。

参考资料:

comments powered by Disqus