• Alizadeh, Milad, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, and Max Welling. “Gradient \(\ell_1\) Regularization for Quantization Robustness.” In International Conference on Learning Representations, 2020.
      title = {Gradient \(\ell_1\) Regularization for Quantization Robustness},
      author = {Alizadeh, Milad and Behboodi, Arash and van Baalen, Mart and Louizos, Christos and Blankevoort, Tijmen and Welling, Max},
      booktitle = {International Conference on Learning Representations},
      year = {2020}
    We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths as energy and memory requirements of the application change. Unlike quantization-aware training using the straight-through estimator that only targets a specific bit-width and requires access to training data and pipeline, our regularization-based method paves the way for “on the fly” post-training quantization to various bit-widths. We show that by modeling quantization as an \(\ell_\infty\)-bounded perturbation, the first-order term in the loss expansion can be regularized using the \(\ell_1\)-norm of the gradients. We experimentally validate our method on different architectures on the CIFAR-10 and ImageNet datasets and show that regularizing a neural network using our method improves robustness against quantization noise.
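The abstract's key identity, that the worst-case first-order loss change under an \(\ell_\infty\)-bounded perturbation equals the bound times the \(\ell_1\)-norm of the gradient, can be checked in a minimal numpy sketch. The least-squares loss and the `eps`/`lam` values below are illustrative stand-ins, not the paper's training setup.

```python
import numpy as np

# For a perturbation delta with ||delta||_inf <= eps, the first-order term in
# L(w + delta) ~ L(w) + delta @ grad is maximized by delta = eps * sign(grad),
# giving eps * ||grad||_1 -- hence a regularizer on the l1-norm of gradients.

def loss(w, X, y):
    # simple least-squares loss as a stand-in for a network loss
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def grad(w, X, y):
    # analytic gradient of the least-squares loss above
    r = X @ w - y
    return X.T @ r / len(y)

def regularized_loss(w, X, y, lam):
    # L(w) + lam * ||grad L(w)||_1   (lam absorbs the eps bound)
    return loss(w, X, y) + lam * np.abs(grad(w, X, y)).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0])
w = rng.normal(size=4)

g = grad(w, X, y)
eps = 0.01                          # assumed l_inf bound on quantization noise
worst_delta = eps * np.sign(g)      # maximizes delta @ g subject to the bound
first_order = worst_delta @ g       # equals eps * ||g||_1
```

In a real training loop the gradient term itself requires a second backward pass (a double-backward through the network), which is the main cost of the scheme.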
  • Alizadeh, Milad, Javier Fernández-Marqués, Nicholas D Lane, and Yarin Gal. “An Empirical Study of Binary Neural Networks’ Optimisation.” International Conference on Learning Representations, 2019.
      title = {An Empirical study of Binary Neural Networks' Optimisation},
      author = {Alizadeh, Milad and Fern{\'a}ndez-Marqu{\'e}s, Javier and Lane, Nicholas D and Gal, Yarin},
      booktitle = {International Conference on Learning Representations},
      year = {2019}
    Binary neural networks using the Straight-Through-Estimator (STE) have been shown to achieve state-of-the-art results, but their training process is not well-founded. This is due to the discrepancy between the evaluated function in the forward path, and the weight updates in the back-propagation, updates which do not correspond to gradients of the forward path. Efficient convergence and accuracy of binary models often rely on careful fine-tuning and various ad-hoc techniques. In this work, we empirically identify and study the effectiveness of the various ad-hoc techniques commonly used in the literature, providing best-practices for efficient training of binary models. We show that adapting learning rates using second moment methods is crucial for the successful use of the STE, and that other optimisers can easily get stuck in local minima. We also find that many of the commonly employed tricks are only effective towards the end of the training, with these methods making early stages of the training considerably slower. Our analysis disambiguates necessary from unnecessary ad-hoc techniques for training of binary neural networks, paving the way for future development of solid theoretical foundations for these. Our newly-found insights further lead to new procedures which make training of existing binary neural networks notably faster.
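The Straight-Through Estimator discussed above can be sketched in a few lines of numpy: the forward pass uses the binarized weights, while the backward pass routes the gradient to the underlying real-valued weights as if binarization were the identity. The clipping to \|w\| ≤ 1 is one common variant; the exact values here are illustrative.

```python
import numpy as np

def forward(w, x):
    # forward pass evaluates the binarized weights sign(w)
    wb = np.sign(w)
    return wb @ x

def ste_grad(w, x, upstream):
    # The true gradient of sign() is zero almost everywhere; the STE instead
    # passes the upstream gradient straight through to the real weights,
    # commonly zeroed where |w| > 1 (the "hard tanh" variant).
    g = upstream * x
    return np.where(np.abs(w) <= 1.0, g, 0.0)

w = np.array([0.3, -1.7, 0.9])   # real-valued latent weights
x = np.array([1.0, 2.0, -1.0])
y = forward(w, x)                # computed with sign(w) = [1, -1, 1]
g = ste_grad(w, x, upstream=1.0)
```

This mismatch between the evaluated function and the update direction is exactly the discrepancy the paper studies, and why adaptive second-moment optimisers matter for making it converge.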
  • Fernández-Marqués, Javier, Milad Alizadeh, Vincent W-S Tseng, Sourav Bhattacharya, and Nicholas D Lane. “On-the-Fly Deterministic Binary Filters for Memory Efficient Keyword Spotting Applications on Embedded Devices.” In Proceedings of the 2nd International Workshop on Embedded and Mobile Deep Learning, 13–18. ACM, 2018.
      title = {On-the-fly deterministic binary filters for memory efficient keyword spotting applications on embedded devices},
      author = {Fern{\'a}ndez-Marqu{\'e}s, Javier and Alizadeh, Milad and Tseng, Vincent W-S and Bhattacharya, Sourav and Lane, Nicholas D},
      booktitle = {Proceedings of the 2nd International Workshop on Embedded and Mobile Deep Learning},
      pages = {13--18},
      year = {2018},
      organization = {ACM}
    Lightweight keyword spotting (KWS) applications are often used to trigger the execution of more complex speech recognition algorithms that are computationally demanding and therefore cannot be constantly running on the device. Often KWS applications are executed on small microcontrollers with very constrained memory (e.g. 128kB) and compute capabilities (e.g. CPU at 80MHz), limiting the complexity of deployable KWS systems. We present a compact binary architecture with 60% fewer parameters and 50% fewer operations (OPs) during inference compared to the current state of the art for KWS applications, at the cost of a 3.4% accuracy drop. It makes use of binary orthogonal codes to analyse speech features from a voice command, resulting in a model that has a minimal memory footprint and is computationally cheap, making its deployment possible on very resource-constrained microcontrollers with less than 30kB of on-chip memory. Our technique offers a different perspective on how filters in neural networks could be constructed at inference time instead of loaded directly from disk.
  • Tseng, Vincent WS, Sourav Bhattacharya, Javier Fernández-Marqués, Milad Alizadeh, Catherine Tong, and Nicholas D Lane. “Deterministic Binary Filters for Convolutional Neural Networks.” International Joint Conferences on Artificial Intelligence Organization, 2018.
      title = {Deterministic binary filters for convolutional neural networks},
      author = {Tseng, Vincent WS and Bhattacharya, Sourav and Fern{\'a}ndez-Marqu{\'e}s, Javier and Alizadeh, Milad and Tong, Catherine and Lane, Nicholas D},
      year = {2018},
      organization = {International Joint Conferences on Artificial Intelligence Organization}
    We propose Deterministic Binary Filters, an approach to Convolutional Neural Networks that learns weighting coefficients of a predefined orthogonal binary basis instead of learning the convolutional filters directly. This approach results in model architectures with significantly fewer parameters (4x to 16x) and smaller model sizes (32x, due to the use of binary rather than floating-point precision). We show our deterministic filter design can be integrated into well-known network architectures (such as ResNet and SqueezeNet) with as little as a 2% loss of accuracy (on datasets such as CIFAR-10). On ImageNet, they result in a 3x model-size reduction compared to sub-megabyte binary networks while reaching comparable accuracy levels.
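The idea shared by the two papers above, materializing filters at inference time from a fixed orthogonal binary basis plus a small learned coefficient vector, can be sketched in numpy. Using Hadamard rows as the orthogonal ±1 codes is an assumption for illustration, as are the sizes and coefficient values.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an n x n Hadamard matrix; n must be a power
    # of two. Its rows are mutually orthogonal +/-1 vectors.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

n, k = 8, 4                    # filter length 8, only k = 4 stored coefficients
B = hadamard(n)[:k]            # fixed binary basis: regenerated, never stored
c = np.array([0.5, -0.2, 0.1, 0.05])   # learned coefficients (illustrative)

# The filter is reconstructed "on the fly" as a weighted sum of basis rows,
# so only the k-vector c (not the n-vector filter) lives in model storage.
filt = c @ B
```

Because the basis is deterministic, it can be regenerated on-device, which is what allows the sub-30kB memory footprints quoted in the KWS paper.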
  • Dziyauddin, Rudzidatul Akmam, Dritan Kaleshi, Angela Doufexi, and Milad Alizadeh. “Performance Evaluation of Quality of Service for Joint Packet Dropping and Scheduling.” Wireless Personal Communications 83, no. 2 (2015): 1549–66.
      title = {Performance evaluation of quality of service for joint packet dropping and scheduling},
      author = {Dziyauddin, Rudzidatul Akmam and Kaleshi, Dritan and Doufexi, Angela and Alizadeh, Milad},
      journal = {Wireless Personal Communications},
      volume = {83},
      number = {2},
      pages = {1549--1566},
      year = {2015},
      publisher = {Springer}
    Quality of Service is particularly necessary for serving delay-sensitive applications in heavily loaded wireless networks. In this paper we evaluate a strategy of combining packet-dropping and scheduling policies at the Medium Access Control layer to guarantee maximum packet latency for real-time applications. The purpose of this work is to evaluate how well the combined schemes can meet the required latency, and what system throughput is achievable. As a case study, the real-time Polling Service class of the Worldwide Interoperability for Microwave Access (WiMAX) system for downlink transmission is assumed. The main analysis is undertaken for User Datagram Protocol (UDP) traffic in stationary and mobile user scenarios under heavy load conditions, and the impact of mixed Transmission Control Protocol (TCP) and UDP traffic is also investigated. Results show that the introduction of a packet-dropping policy ensures that the latency is kept well within the required maximum latency, regardless of the type of scheduler used. However, the packet-drop percentage (or packet loss) depends strongly on the type of scheduler. All schedulers show similar goodput performance under low load conditions, and the results can only be distinguished under heavy-load/overloaded conditions.
  • Alizadeh, Milad, Rudzidatul Akmam Dziyauddin, Dritan Kaleshi, and Angela Doufexi. “A Comparative Study of Mixed Traffic Scenarios for Different Scheduling Algorithms in WiMAX.” In 2012 IEEE 75th Vehicular Technology Conference (VTC Spring), 1–6. IEEE, 2012.
      title = {A comparative study of mixed traffic scenarios for different scheduling algorithms in WiMAX},
      author = {Alizadeh, Milad and Dziyauddin, Rudzidatul Akmam and Kaleshi, Dritan and Doufexi, Angela},
      booktitle = {2012 IEEE 75th Vehicular Technology Conference (VTC Spring)},
      pages = {1--6},
      year = {2012},
      organization = {IEEE}
    WiMAX promises an advanced framework to support the Quality-of-Service (QoS) requirements of different types of applications, and scheduling is a key part of its QoS provisioning. The scheduling algorithms used in this paper are based on our proposed Greedy-Latency scheduler, a modified form of the Greedy algorithm that can guarantee the delay requirements of real-time applications while optimising system throughput. Our study of TCP performance in WiMAX shows that, unlike UDP traffic, there are fluctuations in TCP throughput even for low traffic loads. It is seen that employing Automatic Repeat reQuest (ARQ) and setting the right TCP window size are crucial for stable, optimal TCP performance. The WiMAX QoS mechanism can successfully maintain the inter-class priority between TCP traffic in the Best Effort (BE) class and UDP in the higher-priority Real-Time Polling Service (rtPS) class. For intra-class scenarios, it is observed that TCP flows in general need a protection mechanism, as the UDP traffic tends to seize the channel. The proposed Greedy-Latency scheduler can provide better intra-class protection for TCP flows due to its packet-dropping policy.