Dependence of CNN computation time on the filter size and the number of fully connected layer units
The computation time as well as the accuracy of a Convolutional Neural Network depend heavily on the architectural parameters you choose, and even seemingly minor changes can produce vastly different results. This is what I experienced while doing a project on Facial Keypoints Detection (a Kaggle.com challenge) using cascaded Convolutional Neural Networks.
Our architecture consisted of a three-level cascaded CNN (three sets of a convolution layer followed by a max-pooling layer), followed by two fully connected hidden layers and the final output layer, as shown in the figure below.
The above is similar to a very common architecture for image processing with neural networks and has been used by many researchers before.
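To make the setup concrete, here is a minimal Keras sketch of this kind of three-level cascaded architecture: three (convolution → max-pooling) stages, two fully connected hidden layers, and a regression output. The 96×96 grayscale input and the 30 output values correspond to the Kaggle Facial Keypoints Detection data, but the filter counts, filter sizes and hidden-layer widths below are illustrative assumptions, not the exact values used in the project.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cascaded_cnn(filter_sizes=(4, 3, 2),
                       num_filters=(32, 64, 128),
                       hidden_units=(500, 500)):
    """Three (conv -> max-pool) stages, two FC hidden layers, 30 outputs."""
    model = keras.Sequential()
    # 96x96 grayscale faces, as in the Kaggle Facial Keypoints Detection data.
    model.add(keras.Input(shape=(96, 96, 1)))
    # Three-level cascade: each level is a convolution followed by max-pooling.
    for n, k in zip(num_filters, filter_sizes):
        model.add(layers.Conv2D(n, (k, k), activation='relu'))
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    # Two fully connected hidden layers.
    for units in hidden_units:
        model.add(layers.Dense(units, activation='relu'))
    # Output layer: 30 values, i.e. (x, y) coordinates of 15 facial keypoints.
    model.add(layers.Dense(30))
    return model

model = build_cascaded_cnn()
model.summary()
```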
If we increase the filter sizes in the three convolution layers to 4×4, 3×3 and 2×2 respectively, the computation time drops to about one third of what the previous architecture required. Increasing the number of units in the fully connected hidden layers, by contrast, costs little in terms of computational resources and additionally helps improve accuracy.
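One rough way to see how these choices affect cost is to compare parameter counts and time a few forward passes for different settings, reusing the hypothetical build_cascaded_cnn() sketch above. The filter-size and hidden-unit combinations below are placeholders (the baseline filter sizes are not restated in this post), and absolute timings depend entirely on your hardware; only the relative differences are of interest.

```python
import time
import numpy as np

def time_forward(model, batch=128, repeats=10):
    """Average time for a forward pass over a random batch (seconds)."""
    x = np.random.rand(batch, 96, 96, 1).astype('float32')
    model.predict(x, verbose=0)                    # warm-up
    start = time.perf_counter()
    for _ in range(repeats):
        model.predict(x, verbose=0)
    return (time.perf_counter() - start) / repeats

# Illustrative settings only; not the exact configurations from the project.
settings = [
    ('filters 3-3-2, 500 hidden units',  (3, 3, 2), (500, 500)),
    ('filters 4-3-2, 500 hidden units',  (4, 3, 2), (500, 500)),
    ('filters 4-3-2, 1000 hidden units', (4, 3, 2), (1000, 1000)),
]
for name, sizes, hidden in settings:
    m = build_cascaded_cnn(filter_sizes=sizes, hidden_units=hidden)
    print(f"{name}: {m.count_params():,} parameters, "
          f"{time_forward(m) * 1000:.1f} ms per batch")
```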
A detailed report on the project is available here: http://cse.iitk.ac.in/users/cs365/2015/_submissions/abheet/report.pdf