July 2014 – 求善之旅

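Merging multiple PDF files with Ghostscript

I recently printed a series of web articles to PDF from Chrome and ended up with a pile of separate files. I wanted to merge them into a single PDF and then tidy up the bookmarks and watermarks, so that the result would look a bit like a book.

After some searching, Ghostscript turned out to be the best fit. On Gentoo it is a plain emerge away, and installing it on other distributions should be even simpler.

The introduction on Ghostscript's official website reads: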

Welcome to the Home Page for Ghostscript, an interpreter for the PostScript language and for PDF, and related software and documentation.

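As a PDF interpreter, Ghostscript understands the internal structure of PDF files, so it is reasonable to expect it to handle merging well.

Enough preamble. Once Ghostscript is installed, the following command line merges a series of PDFs: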

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file1.pdf file2.pdf

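A brief explanation of the options above:

-dBATCH batch mode: exit once all input files are processed instead of dropping into the interactive prompt
-dNOPAUSE do not pause and wait for a keypress after each page
-q quiet mode: suppress the routine startup messages
-sDEVICE=pdfwrite use the built-in pdfwrite device to produce PDF output
-sOutputFile=finished.pdf the name of the merged output file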
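
With more than a handful of files, typing every name by hand gets tedious. The following is a minimal Python sketch that builds the same gs command from a list of inputs. The function names build_merge_command and merge_pdfs are hypothetical helpers introduced here for illustration; only the gs flags come from the command above. Ghostscript processes its input files in the order they appear on the command line, so the order of the list decides the page order.

```python
import subprocess


def build_merge_command(output, inputs):
    """Return the gs argument list that merges `inputs` into `output`."""
    if not inputs:
        raise ValueError("need at least one input PDF")
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-q",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={output}",
        *[str(path) for path in inputs],  # merged in exactly this order
    ]


def merge_pdfs(output, inputs):
    """Run gs; raises CalledProcessError if gs exits with an error."""
    subprocess.run(build_merge_command(output, inputs), check=True)
```

Calling merge_pdfs("finished.pdf", sorted(paths)) would merge the files in filename order, assuming gs is on the PATH.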

Pretty simple, isn't it?

Appendix:

[1] Ghostscript official home page

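Thoughts on products (1)

A 30-year-old vice president at Baidu has been getting attention lately; on c114 I even saw a thread debating whether he or Li Yinan is the more impressive. I have no interest in forum flame wars. What interests me is that both men made their names by making a product succeed. The former made Baidu Tieba and Baidu Baike popular and helped Baidu win a decent share in the mobile internet era. The latter used the CC08 program-controlled switch to help Ren Zhengfei lay the solid foundation Huawei stands on today. People need good products.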

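A while ago I bought a 3.5 mm headphone extension cable on JD.com. Its connector appears to be made of copper, about one centimeter in diameter and three centimeters long. Only recently did I notice the benefit of this design: with that much copper, the connector is heavy enough to stay put on the desk far better than an ordinary extension cable, so it does not slide off onto the floor while I am listening to music. For me that makes it a very good product, and if I have a similar need again I will look for this kind of product first.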

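When I worked at Huawei I was one small screw in a big company. If I had to call something my product, it would be the modules or projects I was responsible for: delivering a stable feature with few bugs could perhaps count as delivering a product. But as the well-known saying goes, 80% of the features on a switch are never used, and I do not know whether what I delivered ever gave a user the feeling I described above. On routers, which are sensitive and have strict reliability requirements, software stability matters above all. So once such a feature ships, it becomes a permanent part of the product even if no user ever touches it, nobody dares to remove it, and the software keeps bloating.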

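After leaving Huawei I joined a smaller company, where I am much closer to customers than before. How do we judge whether a customer's requirement is reasonable? Will what we build make the customer's eyes light up and say "this is exactly what I wanted"? How do we avoid the software bloat described above? These are all challenging questions for me, and they have made me increasingly aware of how much it matters to know how to evaluate and build a good product, one that meets the customer's needs.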

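There is obviously no ready-made answer to such questions. I would even say that different fields and different customers define a good product differently. Someone who does not like listening to music, for instance, is in no position to say what makes a good portable player.

To get to the bottom of this, the first step is a deep understanding of the target market, and even of the products of its major players: how customers actually use them today and what problems they run into. Where possible, talk to customers more, and go on site to observe and analyze.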

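Second, create more opportunities for ideas to collide. Truth becomes clearer through debate, so deliberately build a culture of critical discussion among the people involved. Until something has been discussed, avoid playing the agreeable nice guy and do not simply nod along.

Third, when the customer makes it possible, get their feedback as early as you can. This follows the same principle as agile development: iterate fast, optimize fast, bring the product close to the essence of what the customer needs as soon as possible, and avoid detours.

More than ever, I long to build a product that is truly widely used. Making people's lives better would be a deeply satisfying thing to achieve, wouldn't it?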

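Reading notes on the ZeroMQ chapter of AOSA

AOSA, The Architecture of Open Source Applications, is a good book, and it was itself written using the collaborative approach of the open source community. Two volumes have been published so far, and the latest book in the series is POSA, The Performance of Open Source Applications, which focuses on the performance of open source software.

I recently found time to read the AOSA chapter on ZeroMQ. I had some brief exposure to ZeroMQ through work, and reading the chapter written by ZeroMQ's creator himself taught me quite a lot. I am recording the lessons here for reference in future projects.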

Library design
The lesson here is pretty obvious: Don’t use global state in libraries. If you do, the library is likely to break when it happens to be instantiated twice in the same process.

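After analysis and comparison, ZeroMQ was designed as a library rather than as a standalone message server, and the conclusion above came out of designing that library.

In other words, a library should avoid global state and keep its state in a context object instead. This matters most when the library is pulled in by several other libraries and therefore ends up instantiated more than once in the same program: per-context state keeps those instances from interfering with each other.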
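
To make the idea concrete, here is a minimal Python sketch of the context pattern. The Context and Socket classes and their methods are hypothetical names chosen for illustration, not ZeroMQ's actual API. The point is only that every piece of state hangs off a context object, so two users of the library in one process never touch each other's state.

```python
class Socket:
    def __init__(self, context, name):
        self.context = context
        self.name = name


class Context:
    """Holds all state that would otherwise live in module-level globals."""

    def __init__(self, io_threads=1):
        self.io_threads = io_threads
        self._sockets = []

    def socket(self, name):
        sock = Socket(self, name)
        self._sockets.append(sock)
        return sock

    def open_sockets(self):
        return [s.name for s in self._sockets]


# Two independent users of the library in one process, for example an
# application and a plugin that both depend on it.
app_ctx = Context(io_threads=2)
plugin_ctx = Context()
app_ctx.socket("orders")
plugin_ctx.socket("metrics")
```

Had the socket list and the thread count been module-level globals, the plugin's settings would have silently overwritten the application's.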

Understanding the real problem
There are many more pitfalls in benchmarking the messaging systems that we won’t go further into. The stress should rather be placed on the lesson learned: Make sure you understand the problem you are solving. Even a problem as simple as “make it fast” can take lot of work to understand properly. What’s more, if you don’t understand the problem, you are likely to build implicit assumptions and popular myths into your code, making the solution either flawed or at least much more complex or much less useful than it could possibly be.

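The chapter includes a classic diagram of messages exchanged between two endpoints to explain a common misconception in evaluating throughput and latency: what users care about may be the throughput and latency observed at a single point rather than global figures. It is a reminder to pin down exactly what problem we are facing before looking for a solution.

Writing these notes keeps reminding me of lessons from my own years at work. A line I heard recently sums it up: network equipment vendors sometimes invent a solution first and then go looking for a problem. How can that produce products that truly meet customers' needs?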

Memory allocation
Lesson learned: optimize where it makes difference. Optimizing pieces of code that are not on the critical path is wasted effort.

Optimize on the critical path; optimizing blindly off the critical path is a waste of time.

When thinking about performance, don’t assume there’s a single best solution. It may happen that there are several subclasses of the problem (e.g., small messages vs. large messages), each having its own optimal algorithm.

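When thinking about performance, do not assume there is a single best solution. A problem may well contain several subclasses, each with its own optimal algorithm. Small versus large messages is one example: ZeroMQ stores a small message directly inside the message handle, while a large message is referenced through a pointer to avoid copying memory.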
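
The small-versus-large split can be sketched in Python as follows. This is an illustration of the idea only, not ZeroMQ's implementation: the Message class is hypothetical, and INLINE_LIMIT is an arbitrary threshold picked for the example.

```python
INLINE_LIMIT = 32  # hypothetical threshold, not ZeroMQ's actual value


class Message:
    """Small payloads are copied into the message; large ones are referenced."""

    def __init__(self, payload):
        view = memoryview(payload)
        if view.nbytes <= INLINE_LIMIT:
            self.inline = True
            self.data = bytes(view)  # cheap copy: the message owns its bytes
        else:
            self.inline = False
            self.data = view         # no copy: refers to the caller's buffer


small = Message(b"ping")
big_buffer = bytearray(1024)
big = Message(big_buffer)
big_buffer[0] = 0x7F  # visible through big.data because nothing was copied
```

The trade-off is ownership: a referenced buffer must stay alive and unchanged for as long as the message is in flight, which is why copying remains the better choice for tiny payloads.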

Batching
Lesson learned: To get optimal throughput combined with optimal response time in an asynchronous system, turn off all the batching algorithms on the low layers of the stack and batch on the topmost level. Batch only when new data are arriving faster than they can be processed.

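In an asynchronous system it is best to turn off batching in the lower layers and let the topmost layer do it. Batching should also be on demand: it is only needed when data arrives faster than it can be processed, and when there is spare processing capacity, messages can go out unbatched to keep latency low.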
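
One simple way to get this on-demand behavior is to never wait for a batch to fill: take one message, then take whatever else has already piled up. The sketch below assumes a standard queue.Queue as the inbox; next_batch and max_batch are hypothetical names for this example.

```python
import queue


def next_batch(inbox, max_batch=64):
    """Block for one message, then take whatever else is already waiting."""
    batch = [inbox.get()]  # blocks: on an idle queue the batch is one message
    while len(batch) < max_batch:
        try:
            batch.append(inbox.get_nowait())
        except queue.Empty:
            break  # nothing more queued: do not wait to fill the batch
    return batch
```

When the consumer keeps up, every batch has size one and no latency is added. When producers get ahead, the backlog is drained in larger batches automatically.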

Concurrency
Lesson learned: When striving for extreme performance and scalability, consider the actor model; it’s almost the only game in town in such cases. However, if you are not using a specialised system like Erlang or ØMQ itself, you’ll have to write and debug a lot of infrastructure by hand. Additionally, think, from the very beginning, about the procedure to shut down the system. It’s going to be the most complex part of the codebase and if you have no clear idea how to implement it, you should probably reconsider using the actor model in the first place.

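When pursuing high performance and scalability, consider the actor model. ZeroMQ runs multiple threads that communicate with each other through events (I vaguely remembered this being based on Libevent, but as far as I can tell ØMQ uses its own poller abstraction), so the threads can scale horizontally across CPU cores and achieve very high concurrency.

The creator of ZeroMQ also stresses that shutdown should be considered from the very beginning of the design, because it is usually the most complex part of the system. That matches our experience: many of our service processes cannot shut down cleanly, or simply do not support shutdown at all...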
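
A toy actor in Python shows both halves of the lesson: state owned by one thread and reached only through messages, plus a shutdown path designed in from the start. CounterActor and the _STOP sentinel are hypothetical names for this sketch, and a real system would also need to handle errors, timeouts and messages sent after shutdown.

```python
import queue
import threading

_STOP = object()  # sentinel that asks the actor to shut down


class CounterActor:
    """Owns its state; other threads interact only by sending messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self.total = 0  # touched only by the actor's own thread while running
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, amount):
        self._mailbox.put(amount)

    def _run(self):
        while True:
            msg = self._mailbox.get()
            if msg is _STOP:
                break
            self.total += msg

    def shutdown(self):
        """Process everything already queued, then stop and join the thread."""
        self._mailbox.put(_STOP)
        self._thread.join()


actor = CounterActor()
for i in range(1, 101):
    actor.send(i)
actor.shutdown()
```

Because the stop request travels through the same mailbox as ordinary messages, everything sent before shutdown() is processed first, and the caller only reads total after the thread has been joined.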

Lock-free algorithms
Lesson learned: Lock-free algorithms are hard to invent, troublesome to implement and almost impossible to debug. If at all possible, use an existing proven algorithm rather than inventing your own. When extreme performance is required, don’t rely solely on lock-free algorithms. While they are fast, the performance can be significantly improved by doing smart batching on top of them.

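Use a known, proven lock-free algorithm whenever possible instead of reinventing the wheel. (When I developed a software forwarding path at my previous company, I based it on a lock-free ring queue algorithm I found online; in practice it worked far better than anything I would have come up with myself, and it performed well.) In addition, layering some smart batching on top of a lock-free algorithm yields further performance gains.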
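
For readers who have not seen one, the sketch below shows the index discipline of a classic single-producer, single-consumer ring queue, with a batch pop layered on top. It is not the algorithm I used at work, and SpscRing is a hypothetical name. Python cannot demonstrate the hard part: a real implementation in C or C++ needs atomic loads and stores with the right memory ordering, which is exactly why a proven implementation is the safer choice.

```python
class SpscRing:
    """Single-producer, single-consumer ring; items must not be None."""

    def __init__(self, capacity):
        # One slot is always left empty so that full and empty differ.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def try_push(self, item):
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False  # full
        self._slots[self._tail] = item
        self._tail = nxt  # publish only after the slot has been written
        return True

    def try_pop(self):
        if self._head == self._tail:
            return None  # empty
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        return item

    def pop_batch(self, limit):
        """Drain up to `limit` items in one call (batching on top of the ring)."""
        batch = []
        while len(batch) < limit:
            item = self.try_pop()
            if item is None:
                break
            batch.append(item)
        return batch
```

The key property is that each index has exactly one writer, so producer and consumer never need to lock each other out.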

API design
Lesson learned: While code reuse has been promoted from time immemorial and pattern reuse joined in later on, it’s important to think of reuse in an even more generic way. When designing a product, have a look at similar products. Check which have failed and which have succeeded; learn from the successful projects. Don’t succumb to Not Invented Here syndrome. Reuse the ideas, the APIs, the conceptual frameworks, whatever you find appropriate. By doing so you are allowing users to reuse their existing knowledge. At the same time you may be avoiding technical pitfalls you are not even aware of at the moment.

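This lesson is similar to the previous one: at heart it is about not reinventing the wheel. ZeroMQ's decision to model its API on BSD sockets was very successful and makes it easy for users to learn. It is the same in networking products: a Cisco-style CLI is obviously easier for users to pick up.

Appendix:

[1] The Architecture of Open Source Applications

[2] The Actor model article on Wikipedia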