References - AI Simulator

Machine Learning & Statistics Foundations

[F1] T. M. Mitchell, Machine Learning. New York, NY, USA: McGraw-Hill, 1997. ML Definition
ML definition: page 2

[F2] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006. ML & Pattern Recognition
https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/

[F3] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York, NY, USA: Springer, 2009. Statistical Learning
https://hastie.su.domains/ElemStatLearn/ (free online)

[F4] D. R. Cox, "The regression analysis of binary sequences," Journal of the Royal Statistical Society: Series B, vol. 20, no. 2, pp. 215-242, 1958. Logistic Regression
https://doi.org/10.1111/j.2517-6161.1958.tb00292.x

[F5] S. Ruder, "An overview of gradient descent optimization algorithms," arXiv preprint arXiv:1609.04747, 2016. Gradient Descent
https://arxiv.org/abs/1609.04747

[F6] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proc. COMPSTAT, 2010, pp. 177-186. SGD
https://doi.org/10.1007/978-3-7908-2604-3_16

[F7] S. García, J. Luengo, and F. Herrera, Data Preprocessing in Data Mining. Cham, Switzerland: Springer, 2015. Preprocessing
https://doi.org/10.1007/978-3-319-10247-4

[F8] A. Zheng and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. Sebastopol, CA, USA: O'Reilly Media, 2018. Feature Engineering

Neural Network & Deep Learning

[1] F. Rosenblatt, "The perceptron: A probabilistic model for information storage and organization in the brain," Psychological Review, vol. 65, no. 6, pp. 386-408, 1958. Perceptron
https://doi.org/10.1037/h0042519

[2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, 1986. Backpropagation
https://doi.org/10.1038/323533a0

[3] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015. Deep Learning Review
https://doi.org/10.1038/nature14539

[4] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016. Textbook
https://www.deeplearningbook.org/

Recurrent Neural Networks

[5] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179-211, 1990. Elman RNN
https://doi.org/10.1207/s15516709cog1402_1

[6] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997. LSTM
https://doi.org/10.1162/neco.1997.9.8.1735

[7] K. Cho et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," in Proc. 2014 Conf. Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1724-1734. GRU / Encoder-Decoder
https://doi.org/10.3115/v1/D14-1179

Autoencoders & Embeddings

[8] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006. Autoencoder
https://doi.org/10.1126/science.1127647

[9] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," in Proc. 1st Int. Conf. Learning Representations (ICLR) Workshop, 2013. Word2Vec
https://arxiv.org/abs/1301.3781

[10] J. Pennington, R. Socher, and C. D. Manning, "GloVe: Global vectors for word representation," in Proc. 2014 Conf. Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532-1543. GloVe
https://doi.org/10.3115/v1/D14-1162

Transformer Architecture

[11] A. Vaswani et al., "Attention is all you need," in Advances in Neural Information Processing Systems (NeurIPS), vol. 30, 2017, pp. 5998-6008. Transformer (seminal)
https://arxiv.org/abs/1706.03762

[12] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proc. 3rd Int. Conf. Learning Representations (ICLR), 2015. Attention Mechanism
https://arxiv.org/abs/1409.0473

[13] J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer normalization," arXiv preprint arXiv:1607.06450, 2016. Layer Norm
https://arxiv.org/abs/1607.06450

[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778. Residual Connection
https://doi.org/10.1109/CVPR.2016.90

BERT & Encoder Models

[15] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. 2019 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2019, pp. 4171-4186. BERT (seminal)
https://doi.org/10.18653/v1/N19-1423

[16] Y. Liu et al., "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019. RoBERTa
https://arxiv.org/abs/1907.11692

GPT & Decoder Models

[17] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Improving language understanding by generative pre-training," OpenAI, 2018. GPT-1
https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf

[18] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, "Language models are unsupervised multitask learners," OpenAI Blog, vol. 1, no. 8, p. 9, 2019. GPT-2
https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

[19] T. Brown et al., "Language models are few-shot learners," in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 1877-1901. GPT-3
https://arxiv.org/abs/2005.14165

LLM Training, Alignment & Efficiency

[20] L. Ouyang et al., "Training language models to follow instructions with human feedback," in Advances in Neural Information Processing Systems (NeurIPS), vol. 35, 2022, pp. 27730-27744. InstructGPT / RLHF
https://arxiv.org/abs/2203.02155

[21] J. Kaplan et al., "Scaling laws for neural language models," arXiv preprint arXiv:2001.08361, 2020. Scaling Laws
https://arxiv.org/abs/2001.08361

[22] R. Sennrich, B. Haddow, and A. Birch, "Neural machine translation of rare words with subword units," in Proc. 54th Annual Meeting of the Association for Computational Linguistics (ACL), 2016, pp. 1715-1725. BPE Tokenization
https://doi.org/10.18653/v1/P16-1162

[23] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015. Knowledge Distillation
https://arxiv.org/abs/1503.02531

[24] H. Touvron et al., "LLaMA: Open and efficient foundation language models," arXiv preprint arXiv:2302.13971, 2023. LLaMA
https://arxiv.org/abs/2302.13971

[25] E. J. Hu et al., "LoRA: Low-rank adaptation of large language models," in Proc. 10th Int. Conf. Learning Representations (ICLR), 2022. LoRA
https://arxiv.org/abs/2106.09685

[26] A. Q. Jiang et al., "Mistral 7B," arXiv preprint arXiv:2310.06825, 2023. Mistral
https://arxiv.org/abs/2310.06825

Reference Books

[B1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016. Textbook
https://www.deeplearningbook.org/ (free online)

[B2] D. Jurafsky and J. H. Martin, Speech and Language Processing, 3rd ed. draft. 2024. NLP Textbook
https://web.stanford.edu/~jurafsky/slp3/ (free online draft)

[B3] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006. ML Textbook
https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/

[B4] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning. Cambridge, UK: Cambridge University Press, 2023. Interactive Textbook
https://d2l.ai/ (free online with code)

Reference List