Machine Learning for Programming (Seminar)

Quick Facts

Lecturer: Prof. Dr. Michael Pradel
Course type: Seminar
Kick-off meeting: Monday, Oct 15, 9:50-11:30
TUCaN ID: 20-00-1046-se
Piazza: Course page

Content

This seminar is about recent research on improving software and increasing developer productivity by using machine learning, including deep learning. We will discuss research papers that present novel techniques for improving software reliability and security, such as program analyses to detect bugs, to complete partial code, or to de-obfuscate code, based on machine learning models of code.

After an initial kick-off meeting, where each student is assigned a research paper, students work individually on a term paper that summarizes the research paper. Each student will have a mentor who is available for individual meetings. In the final weeks of the semester, we will have a half- or full-day meeting where each student gives a talk on their research paper.

Schedule

Oct 15, 9:50-11:30: Kick-off meeting (mandatory for all participants), S4|14, Room 3.1.01. Slides: Organization and topics, DeepBugs.
Oct 21 to Jan 13: Individual work and meetings with mentor.
Nov 23, 2018: Term papers due for peer review.
Dec 7, 2018: Reviews due.
Dec 28, 2018: Final term papers due.
Jan 14 to 18 (exact time TBD): Presentations and deadline for term paper. Location: TBD.

Topics

The following research papers are available for discussion. After the kick-off meeting, each student is assigned one paper.

[1] Wojciech Zaremba and Ilya Sutskever. Learning to execute. CoRR, abs/1410.4615, 2014.
[2] Veselin Raychev, Martin T. Vechev, and Andreas Krause. Predicting program properties from "big code". In POPL, pages 111--124, 2015.
[3] Martin White, Michele Tufano, Christopher Vendome, and Denys Poshyvanyk. Deep learning code fragments for code clone detection. In ASE, pages 87--98, 2016.
[4] Song Wang, Taiyue Liu, and Lin Tan. Automatically learning semantic features for defect prediction. In ICSE, pages 297--308, 2016.
[5] Pavol Bielik, Veselin Raychev, and Martin T. Vechev. PHOG: probabilistic model for code. In ICML, pages 2933--2942, 2016.
[6] Sahil Bhatia and Rishabh Singh. Automated correction for syntax errors in programming assignments using recurrent neural networks. CoRR, abs/1603.06129, 2016.
[7] Miltiadis Allamanis, Hao Peng, and Charles A. Sutton. A convolutional attention network for extreme summarization of source code. In ICML, pages 2091--2100, 2016.
[8] Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, and Sunghun Kim. Deep API learning. In FSE, pages 631--642, 2016.
[9] Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song, and Dawn Song. Neural network-based graph embedding for cross-platform binary code similarity detection. In CCS, pages 363--376, 2017.
[10] Ke Wang, Rishabh Singh, and Zhendong Su. Dynamic neural program embedding for program repair. CoRR, abs/1711.07163, 2017.
[11] Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. DeepFix: Fixing common C language errors by deep learning. In AAAI, 2017.
[12] Chris Cummins, Pavlos Petoumenos, Zheng Wang, and Hugh Leather. Synthesizing benchmarks for predictive modeling. In CGO, pages 86--99, 2017.
[13] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. CoRR, abs/1711.00740, 2017.
[14] Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep generative models of graphs. CoRR, abs/1803.03324, 2018.
[15] Xiaodong Gu, Hongyu Zhang, and Sunghun Kim. Deep code search. In ICSE, 2018.
[16] Daniel DeFreez, Aditya V. Thakur, and Cindy Rubio-González. Path-based function embedding and its application to specification mining. CoRR, abs/1802.07779, 2018.
[17] M. Brockschmidt, M. Allamanis, A. L. Gaunt, and O. Polozov. Generative Code Modeling with Graphs. ArXiv e-prints, 2018.
[18] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. code2vec: Learning distributed representations of code. CoRR, abs/1803.09473, 2018.
[19] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. A general path-based representation for predicting program properties. In PLDI, 2018.

Template for Term Paper

Please use this LaTeX template for writing your term paper. The page limit is six pages (strict).

If you are not yet familiar with LaTeX, you may want to try the Overleaf online editor (click on "SIG Proceedings Paper" to start with the required template).

Grading

Grading is based on the term paper, the talk, and active participation during the final meeting.