  • Matton, Alexandre, Tom Sherborne, Dennis Aumiller, Elena Tommasone, Milad Alizadeh, Jingyi He, Raymond Ma, Maxime Voisin, Ellen Gilsenan-McMahon, and Matthias Gallé. “On Leakage of Code Generation Evaluation Datasets.” EMNLP Findings, 2024.
    @misc{matton2024leakage,
      title = {On Leakage of Code Generation Evaluation Datasets},
      author = {Matton, Alexandre and Sherborne, Tom and Aumiller, Dennis and Tommasone, Elena and Alizadeh, Milad and He, Jingyi and Ma, Raymond and Voisin, Maxime and Gilsenan-McMahon, Ellen and Gall\'{e}, Matthias},
      year = {2024},
      booktitle = {EMNLP Findings}
    }
    
    In this paper we consider contamination by code generation test sets, in particular in their use in modern large language models. We discuss three possible sources of such contamination and show findings supporting each of them: (i) direct data leakage, (ii) indirect data leakage through the use of synthetic data, and (iii) overfitting to evaluation sets during model selection. Key to our findings is a new dataset of 161 prompts with their associated Python solutions, which is released at https://huggingface.co/datasets/CohereForAI/lbpp.
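
    A minimal usage sketch for the released dataset, based only on the URL given above: it loads the dataset with the Hugging Face datasets library and prints its structure. The split and column names are not taken from the paper and should be checked against the dataset card before use.

    from datasets import load_dataset

    # Load the LBPP dataset released with the paper; inspect the returned object
    # to see which splits and fields it actually provides before relying on them.
    ds = load_dataset("CohereForAI/lbpp")
    print(ds)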
  • Dupont, Emilien, Hrushikesh Loya, Milad Alizadeh, Adam Golinski, Yee Whye Teh, and Arnaud Doucet. “COIN++: Neural Compression Across Modalities.” Transactions on Machine Learning Research, 2022.
    @article{dupont2022coin,
      title = {{COIN}++: Neural Compression Across Modalities},
      author = {Dupont, Emilien and Loya, Hrushikesh and Alizadeh, Milad and Golinski, Adam and Teh, Yee Whye and Doucet, Arnaud},
      journal = {Transactions on Machine Learning Research},
      year = {2022}
    }
    
    Neural compression algorithms are typically based on autoencoders that require specialized encoder and decoder architectures for different data modalities. In this paper, we propose COIN++, a neural compression framework that seamlessly handles a wide range of data modalities. Our approach is based on converting data to implicit neural representations, i.e. neural functions that map coordinates (such as pixel locations) to features (such as RGB values). Then, instead of storing the weights of the implicit neural representation directly, we store modulations applied to a meta-learned base network as a compressed code for the data. We further quantize and entropy code these modulations, leading to large compression gains while reducing encoding time by two orders of magnitude compared to baselines. We empirically demonstrate the feasibility of our method by compressing various data modalities, from images and audio to medical and climate data.
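
    A minimal sketch (not the authors' code) of the mechanism described above: a shared base MLP maps coordinates to features, and each datum is represented only by per-layer shift modulations, fitted by gradient descent while the base weights stay frozen. The layer sizes, sine activation, and optimizer settings are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ModulatedMLP(nn.Module):
        def __init__(self, in_dim=2, hidden=64, out_dim=3, n_layers=3):
            super().__init__()
            self.hidden_layers = nn.ModuleList(
                [nn.Linear(in_dim if i == 0 else hidden, hidden) for i in range(n_layers)]
            )
            self.out = nn.Linear(hidden, out_dim)

        def forward(self, coords, modulations):
            h = coords
            for layer, m in zip(self.hidden_layers, modulations):
                h = torch.sin(layer(h) + m)   # shift modulation applied before the activation
            return self.out(h)

    base = ModulatedMLP()
    for p in base.parameters():
        p.requires_grad_(False)               # shared (meta-learned) base network stays frozen

    # Per-datum code: one shift vector per hidden layer; this is the part that
    # gets quantized and entropy coded as the compressed representation.
    mods = [torch.zeros(64, requires_grad=True) for _ in range(3)]
    opt = torch.optim.Adam(mods, lr=1e-2)

    coords = torch.rand(1024, 2)              # e.g. normalised pixel locations
    targets = torch.rand(1024, 3)             # e.g. RGB values of one image

    for _ in range(100):                      # "encode": fit the modulations to this datum
        opt.zero_grad()
        loss = ((base(coords, mods) - targets) ** 2).mean()
        loss.backward()
        opt.step()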
  • Alizadeh, Milad, Shyam A. Tailor, Luisa M Zintgraf, Joost van Amersfoort, Sebastian Farquhar, Nicholas Donald Lane, and Yarin Gal. “Prospect Pruning: Finding Trainable Weights at Initialization Using Meta-Gradients.” In International Conference on Learning Representations, 2022.
    @inproceedings{alizadeh2022prospect,
      title = {Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients},
      author = {Alizadeh, Milad and Tailor, Shyam A. and Zintgraf, Luisa M and van Amersfoort, Joost and Farquhar, Sebastian and Lane, Nicholas Donald and Gal, Yarin},
      booktitle = {International Conference on Learning Representations},
      year = {2022}
    }
    
    Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network while consuming fewer computational resources for training and inference. However, current methods are insufficient to enable this optimization and lead to a large degradation in model performance. In this paper, we identify a fundamental limitation in the formulation of current methods, namely that their saliency criteria look at a single step at the start of training without taking into account the trainability of the network. While pruning iteratively and gradually has been shown to improve pruning performance, explicit consideration of the training stage that will immediately follow pruning has so far been absent from the computation of the saliency criterion. To overcome the short-sightedness of existing methods, we propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune. ProsPr combines an estimate of the higher-order effects of pruning on the loss and the optimization trajectory to identify the trainable sub-network. Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
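
    A minimal sketch of the meta-gradient saliency idea described above (not the released ProsPr code): a mask multiplies the initial weights, a few inner SGD steps are unrolled on the masked weights, and the loss after those steps is differentiated with respect to the mask. The tiny linear model, step count and learning rate are illustrative assumptions.

    import torch

    torch.manual_seed(0)
    x = torch.randn(256, 20)
    y = torch.randn(256, 1)

    w = torch.randn(20, 1) * 0.1             # weights at initialization
    mask = torch.ones_like(w, requires_grad=True)

    inner_lr, inner_steps = 0.1, 3
    w_t = w * mask                            # masked weights, differentiable in the mask
    for _ in range(inner_steps):              # unroll a few training steps
        loss = ((x @ w_t - y) ** 2).mean()
        grad_w, = torch.autograd.grad(loss, w_t, create_graph=True)
        w_t = w_t - inner_lr * grad_w         # keep the graph so the meta-gradient can flow

    meta_loss = ((x @ w_t - y) ** 2).mean()
    saliency = torch.autograd.grad(meta_loss, mask)[0].abs()

    keep = saliency.flatten().topk(k=10).indices   # keep the 10 most "trainable" weights
    final_mask = torch.zeros_like(mask).flatten()
    final_mask[keep] = 1.0
    final_mask = final_mask.view_as(mask)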
  • Dupont, Emilien, Adam Goliński, Milad Alizadeh, Yee Whye Teh, and Arnaud Doucet. “COIN: COmpression with Implicit Neural Representations.” Neural Compression Workshop at ICLR, 2021. (Spotlight)
    @misc{dupont2021coin,
      title = {COIN: COmpression with Implicit Neural representations},
      author = {Dupont, Emilien and Goliński, Adam and Alizadeh, Milad and Teh, Yee Whye and Doucet, Arnaud},
      year = {2021},
      eprint = {2103.03123},
      archiveprefix = {arXiv},
      primaryclass = {eess.IV},
      booktitle = {Neural Compression Workshop at ICLR 2021}
    }
    
    We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state of the art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.
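
    A minimal sketch of the recipe described above: overfit a small MLP mapping pixel coordinates to RGB values, then treat its (quantized) weights as the code for the image. The network width, optimizer settings and the random stand-in image are illustrative assumptions; the paper uses sine-activated networks, ReLU is used here only for brevity.

    import torch
    import torch.nn as nn

    H, W = 32, 32
    image = torch.rand(H, W, 3)                                 # stand-in image

    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)       # (H*W, 2) pixel locations
    rgb = image.reshape(-1, 3)

    mlp = nn.Sequential(nn.Linear(2, 32), nn.ReLU(),
                        nn.Linear(32, 32), nn.ReLU(),
                        nn.Linear(32, 3))
    opt = torch.optim.Adam(mlp.parameters(), lr=2e-3)

    for _ in range(500):                                        # "encode": overfit to this image
        opt.zero_grad()
        loss = ((mlp(coords) - rgb) ** 2).mean()
        loss.backward()
        opt.step()

    # "Decode": simply evaluate the MLP at every pixel location.
    with torch.no_grad():
        reconstruction = mlp(coords).reshape(H, W, 3)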
  • Alizadeh, Milad, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, and Max Welling. “Gradient \(\ell_1\) Regularization for Quantization Robustness.” In International Conference on Learning Representations, 2020.
    @inproceedings{alizadeh2020gradient,
      title = {Gradient \(\ell_1\) Regularization for Quantization Robustness},
      author = {Alizadeh, Milad and Behboodi, Arash and van Baalen, Mart and Louizos, Christos and Blankevoort, Tijmen and Welling, Max},
      booktitle = {International Conference on Learning Representations},
      year = {2020}
    }
    
    We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths as energy and memory requirements of the application change. Unlike quantization-aware training using the straight-through estimator that only targets a specific bit-width and requires access to training data and pipeline, our regularization-based method paves the way for “on the fly” post-training quantization to various bit-widths. We show that by modeling quantization as an \(\ell_\infty\)-bounded perturbation, the first-order term in the loss expansion can be regularized using the \(\ell_1\)-norm of gradients. We experimentally validate our method on different architectures on the CIFAR-10 and ImageNet datasets and show that regularizing a neural network using our method improves robustness against quantization noise.
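
    A minimal sketch (not the authors' implementation) of the regularizer described above: the \(\ell_1\)-norm of the gradient of the loss with respect to the weights is added to the training objective, which requires a double-backward pass. The model, data and regularization strength lam are illustrative assumptions.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    lam = 0.01

    x = torch.randn(128, 20)
    y = torch.randint(0, 10, (128,))

    opt.zero_grad()
    task_loss = nn.functional.cross_entropy(model(x), y)

    # First backward pass with create_graph=True so the gradient norm itself
    # can be differentiated in the second backward pass.
    grads = torch.autograd.grad(task_loss, list(model.parameters()), create_graph=True)
    grad_l1 = sum(g.abs().sum() for g in grads)

    total_loss = task_loss + lam * grad_l1
    total_loss.backward()
    opt.step()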
  • Alizadeh, Milad, Javier Fernández-Marqués, Nicholas D Lane, and Yarin Gal. “An Empirical Study of Binary Neural Networks’ Optimisation.” In International Conference on Learning Representations, 2019.
    @inproceedings{alizadeh2018empirical,
      title = {An Empirical Study of Binary Neural Networks' Optimisation},
      author = {Alizadeh, Milad and Fern{\'a}ndez-Marqu{\'e}s, Javier and Lane, Nicholas D and Gal, Yarin},
      booktitle = {International Conference on Learning Representations},
      year = {2019}
    }
    
    Binary neural networks using the Straight-Through-Estimator (STE) have been shown to achieve state-of-the-art results, but their training process is not well-founded. This is due to the discrepancy between the function evaluated in the forward path and the weight updates in the back-propagation, which do not correspond to gradients of the forward path. Efficient convergence and accuracy of binary models often rely on careful fine-tuning and various ad-hoc techniques. In this work, we empirically identify and study the effectiveness of the various ad-hoc techniques commonly used in the literature, providing best practices for efficient training of binary models. We show that adapting learning rates using second-moment methods is crucial for the successful use of the STE, and that other optimisers can easily get stuck in local minima. We also find that many of the commonly employed tricks are only effective towards the end of training, with these methods making the early stages of training considerably slower. Our analysis disambiguates necessary from unnecessary ad-hoc techniques for the training of binary neural networks, paving the way for the future development of solid theoretical foundations for them. Our newly-found insights further lead to new procedures which make the training of existing binary neural networks notably faster.
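
    A minimal sketch of the straight-through estimator discussed above: the forward pass uses sign(w), while the backward pass passes the gradient straight through, zeroed where |w| > 1. This is the generic STE recipe, not the exact setup of any particular experiment in the paper.

    import torch

    class BinarizeSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, w):
            ctx.save_for_backward(w)
            return torch.sign(w)              # binary weights in the forward path

        @staticmethod
        def backward(ctx, grad_output):
            (w,) = ctx.saved_tensors
            # Pass the gradient through unchanged, but clip it to |w| <= 1,
            # i.e. the update does not correspond to the true (zero) gradient of sign().
            return grad_output * (w.abs() <= 1).to(grad_output.dtype)

    w = torch.randn(5, requires_grad=True)
    loss = BinarizeSTE.apply(w).sum()
    loss.backward()
    print(w.grad)        # non-zero despite sign() having zero gradient almost everywhere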
  • Tseng, Vincent WS, Sourav Bhattacharya, Javier Fernández-Marqués, Milad Alizadeh, Catherine Tong, and Nicholas D Lane. “Deterministic Binary Filters for Convolutional Neural Networks.” International Joint Conference on Artificial Intelligence (IJCAI), 2018.
    @inproceedings{tseng2018deterministic,
      title = {Deterministic binary filters for convolutional neural networks},
      author = {Tseng, Vincent WS and Bhattacharya, Sourav and Fern{\'a}ndez-Marqu{\'e}s, Javier and Alizadeh, Milad and Tong, Catherine and Lane, Nicholas D},
      year = {2018},
      organization = {International Joint Conferences on Artificial Intelligence Organization}
    }
    
    We propose Deterministic Binary Filters, an approach to Convolutional Neural Networks that learns weighting coefficients of a predefined orthogonal binary basis instead of learning the convolutional filters directly. This approach results in model architectures with significantly fewer parameters (4x to 16x) and smaller model sizes (32x due to the use of binary rather than floating point precision). We show our deterministic filter design can be integrated into well-known network architectures (such as ResNet and SqueezeNet) with as little as 2% loss of accuracy on datasets like CIFAR-10. On ImageNet, it results in a 3x model size reduction compared to sub-megabyte binary networks while reaching comparable accuracy levels.
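
    A minimal sketch (not the paper's construction) of learning filters as weighted combinations of a fixed binary basis: a 16x16 Hadamard matrix (Sylvester construction) provides orthogonal +/-1 basis vectors that are reshaped into 4x4 filters, and only the mixing coefficients are trained. The filter size, basis size and layer shapes are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def hadamard(n):
        """Sylvester construction; n must be a power of two."""
        H = torch.tensor([[1.0]])
        while H.shape[0] < n:
            H = torch.cat([torch.cat([H, H], dim=1),
                           torch.cat([H, -H], dim=1)], dim=0)
        return H

    class BinaryBasisConv2d(nn.Module):
        def __init__(self, in_ch, out_ch, k=4, n_basis=16):
            super().__init__()
            basis = hadamard(n_basis)[:, : k * k].reshape(n_basis, k, k)
            self.register_buffer("basis", basis)          # fixed +/-1 basis, never trained
            self.coeff = nn.Parameter(torch.randn(out_ch, in_ch, n_basis) * 0.1)
            self.k = k

        def forward(self, x):
            # filters[o, i] = sum_b coeff[o, i, b] * basis[b]
            filters = torch.einsum("oib,bkl->oikl", self.coeff, self.basis)
            return F.conv2d(x, filters, padding=self.k // 2)

    layer = BinaryBasisConv2d(3, 8)
    out = layer(torch.randn(1, 3, 32, 32))
    print(out.shape)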
  • Dziyauddin, Rudzidatul Akmam, Dritan Kaleshi, Angela Doufexi, and Milad Alizadeh. “Performance Evaluation of Quality of Service for Joint Packet Dropping and Scheduling.” Wireless Personal Communications 83, no. 2 (2015): 1549–66.
    @article{dziyauddin2015performance,
      title = {Performance evaluation of quality of service for joint packet dropping and scheduling},
      author = {Dziyauddin, Rudzidatul Akmam and Kaleshi, Dritan and Doufexi, Angela and Alizadeh, Milad},
      journal = {Wireless Personal Communications},
      volume = {83},
      number = {2},
      pages = {1549--1566},
      year = {2015},
      publisher = {Springer}
    }
    
    Quality of Service is particularly necessary to serve delay-sensitive applications in heavily loaded wireless networks. In this paper we evaluate a strategy of combining packet dropping and scheduling policies at the Medium Access Control layer to guarantee maximum packet latency for real-time applications. The purpose of this work is to evaluate to what extent the combined schemes can meet the required latency and what system throughput they can achieve. For the case study, the real-time Polling Service class of the Worldwide Interoperability for Microwave Access (WiMAX) system is assumed for downlink transmission. The main analysis is undertaken for User Datagram Protocol (UDP) traffic in stationary and mobile user scenarios under heavy load conditions, and the impact of mixed Transmission Control Protocol (TCP) and UDP traffic is also investigated. Results show that the introduction of a packet dropping policy ensures that the latency is kept well within the required maximum latency, regardless of the type of scheduler used. However, the packet drop percentage (or packet loss) depends strongly on the type of scheduler. All schedulers show similar goodput performance under low load conditions, and the results can only be distinguished under heavy-load/overloaded conditions.
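
    An illustrative sketch (not the schedulers evaluated in the paper) of a MAC-layer packet dropping policy of the kind described above: before a queued packet is scheduled, it is dropped if its head-of-line delay already exceeds the maximum latency budget of its service class. The queue representation and the 20 ms budget are illustrative assumptions.

    from collections import deque

    MAX_LATENCY_MS = 20.0   # assumed latency budget for the real-time service class

    def drop_expired(queue, now_ms):
        """Remove packets whose waiting time already exceeds the latency budget."""
        dropped = 0
        while queue and now_ms - queue[0]["arrival_ms"] > MAX_LATENCY_MS:
            queue.popleft()
            dropped += 1
        return dropped

    queue = deque([{"arrival_ms": 0.0}, {"arrival_ms": 15.0}, {"arrival_ms": 30.0}])
    print(drop_expired(queue, now_ms=32.0))    # drops the first packet (32 ms old)
    print(len(queue))                          # the two remaining packets are within budget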
  • Alizadeh, Milad, Rudzidatul Akmam Dziyauddin, Dritan Kaleshi, and Angela Doufexi. “A Comparative Study of Mixed Traffic Scenarios for Different Scheduling Algorithms in WiMAX.” In 2012 IEEE 75th Vehicular Technology Conference (VTC Spring), 1–6. IEEE, 2012.
    @inproceedings{alizadeh2012comparative,
      title = {A comparative study of mixed traffic scenarios for different scheduling algorithms in WiMAX},
      author = {Alizadeh, Milad and Dziyauddin, Rudzidatul Akmam and Kaleshi, Dritan and Doufexi, Angela},
      booktitle = {2012 IEEE 75th Vehicular Technology Conference (VTC Spring)},
      pages = {1--6},
      year = {2012},
      organization = {IEEE}
    }
    
    WiMAX promises an advanced framework to support the Quality-of-Service (QoS) requirements of different types of applications, and scheduling is a key part of its QoS provisioning. The scheduling algorithms used in this paper are based on our proposed Greedy-Latency scheduler, a modified form of the Greedy algorithm which can guarantee the delay requirements of real-time applications while optimising the system throughput. Our study of TCP performance in WiMAX shows that, unlike UDP traffic, there are fluctuations in TCP throughput even at low traffic loads. It is seen that employing Automatic Repeat reQuest (ARQ) and setting the right TCP window size are crucial for stable, optimal TCP performance. The WiMAX QoS mechanism can successfully maintain the inter-class priority between TCP traffic in the Best Effort (BE) class and UDP traffic in the higher-priority Real-Time Polling Service (rtPS) class. For intra-class scenarios, it is observed that TCP flows in general need a protection mechanism, as UDP traffic tends to seize the channel. The proposed Greedy-Latency scheduler can provide better intra-class protection for TCP flows due to its packet dropping policy.