Vehicular ad hoc networks (VANETs), made possible by the range of wireless technologies now built into vehicles, have attracted interest for their potential roles in fields such as emergency response, road safety, and intelligent transportation systems. However, developing a reliable routing protocol to carry data packets between vehicles remains challenging because of high node mobility, the lack of fixed infrastructure, and obstacles. One way to tackle this challenge is machine learning. In this paper, we propose a routing protocol based on multi-agent reinforcement learning (MARL), a technique that enables groups of reinforcement learning agents to solve system optimization problems online in dynamic, decentralized networks. Our protocol uses a model-based reinforcement learning method, which converges faster than a model-free one. To build the model required by MARL, we develop a fuzzy logic (FL) system that evaluates the quality of links between neighboring nodes based on parameters such as velocity and connection quality. We study the performance of the proposed protocol through extensive simulation with respect to metrics such as delivery ratio, delay, and overhead. The results show a significant improvement in VANET performance on these metrics.
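To make the two ingredients named above concrete, the following is a minimal sketch, not the protocol itself. It assumes a two-input fuzzy system (relative speed and a normalized signal quality) with four hypothetical rules, and it uses plain value iteration over the resulting link-quality model as a stand-in for the model-based learning step. The function names, membership shapes, rule outputs, the 30 m/s speed cap, and the discount factor are all illustrative assumptions.

```python
def clamp(x):
    """Restrict x to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def link_quality(rel_speed, signal, max_speed=30.0):
    """Fuzzy estimate in [0, 1] of how reliable a link is.

    rel_speed: relative speed of the two vehicles in m/s (assumed input).
    signal:    normalized connection quality in [0, 1] (assumed input).
    """
    # Fuzzification with linear membership functions (assumed shapes).
    low = clamp(1.0 - rel_speed / max_speed)
    high = 1.0 - low
    good = clamp(signal)
    poor = 1.0 - good
    # Hypothetical rule base: AND is min, each rule has a crisp output.
    rules = [
        (min(low, good), 1.0),   # slow and good signal -> good link
        (min(low, poor), 0.5),   # slow and poor signal -> medium link
        (min(high, good), 0.5),  # fast and good signal -> medium link
        (min(high, poor), 0.0),  # fast and poor signal -> bad link
    ]
    # Defuzzification by weighted average of the rule outputs.
    total = sum(w for w, _ in rules)
    return sum(w * out for w, out in rules) / total


def plan_routes(links, dest, gamma=0.9, sweeps=50):
    """Value iteration over a link-quality model.

    links: {node: {neighbor: quality}}, where quality is treated as the
           probability that a forwarded packet reaches the neighbor.
    Returns (value, next_hop). Each node's update only reads the values
    of its own neighbors, so every node could run it as a separate agent.
    """
    nodes = set(links) | {m for nbrs in links.values() for m in nbrs} | {dest}
    value = {n: 0.0 for n in nodes}
    value[dest] = 1.0  # delivering the packet is worth 1
    for _ in range(sweeps):
        for n in nodes:
            if n == dest:
                continue
            candidates = (q * gamma * value[m]
                          for m, q in links.get(n, {}).items())
            value[n] = max(candidates, default=0.0)
    next_hop = {}
    for n in nodes:
        nbrs = links.get(n)
        if n != dest and nbrs:
            next_hop[n] = max(nbrs, key=lambda m: nbrs[m] * value[m])
    return value, next_hop


if __name__ == "__main__":
    # Hypothetical four-vehicle topology with destination D.
    links = {
        "A": {"B": link_quality(5.0, 0.9), "C": link_quality(25.0, 0.4)},
        "B": {"D": link_quality(10.0, 0.8)},
        "C": {"D": link_quality(2.0, 0.95)},
    }
    value, next_hop = plan_routes(links, dest="D")
    print(next_hop)
```

In this sketch the fuzzy output plays the role of the transition model: because each node already has an estimate of how likely a hop is to succeed, it can compute values by sweeping over that model instead of learning them only from trial-and-error packet deliveries, which is the intuition behind the faster convergence claimed for the model-based approach.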