The ability to grasp objects is one of the basic functions of modern industrial robots. This article focuses on a system that processes the image provided by a robot's visual perception system in order to detect the grasping points of objects. The proposed system is based on a multi-step method using convolutional neural networks (CNNs). In the first step, a CNN transforms the input image into a schematic image in which the objects' centers of gravity are labeled; this image then serves as a supporting input to a second CNN. The second CNN uses both the original input image and the supporting image to produce a schematic image containing the objects' grasping points. This solution is compared with a network that predicts grasping points directly from the input image. The proposed method improved the average intersection over union (IoU) by 0.7% across all of the models.
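The data flow of the two-stage method and the IoU metric used for evaluation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: schematic images are modelled as 2-D grids of 0/1 values, the two CNNs are stand-in callables, and the names `two_stage_predict`, `iou` and `mean_iou` are hypothetical. Treating two empty masks as a perfect match is also an assumed convention.

```python
from typing import Callable, List, Sequence

# Assumption: a schematic image is a 2-D grid of 0/1 values (1 = labeled pixel).
Mask = List[List[int]]


def two_stage_predict(
    image: object,
    cog_net: Callable[[object], Mask],
    grasp_net: Callable[[object, Mask], Mask],
) -> Mask:
    """Hypothetical chaining of the two CNNs (stand-in callables)."""
    support = cog_net(image)          # stage 1: schematic image of centers of gravity
    return grasp_net(image, support)  # stage 2: original + supporting input -> grasping points


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average IoU over prediction/target pairs."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy stand-ins for the trained networks (hypothetical).
    cog = lambda img: [[0, 1], [0, 0]]
    grasp = lambda img, support: [[1, 1], [0, 0]]
    pred = two_stage_predict("image", cog, grasp)
    print(mean_iou([pred], [[[1, 0], [0, 0]]]))  # 0.5
```

Passing the original image to the second network alongside the supporting image mirrors the abstract's description; in a real CNN this would typically mean stacking the supporting image as an extra input channel, which is an assumption here.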