
What is your ideal development environment for deep learning/NLP?


I'm curious about the ideal development setups for NLP developers who train deep learning models. In my own journey to work more with deep learning I use Jupyter Notebooks a lot, but I'm wondering if anyone is also using regular Python scripts and developing in an IDE?

What is your development process? Do you start experimenting in Jupyter and then turn your refined code into an actual Python application with properly structured modules, or do you take your notebook and run it as an application directly, using something like jupyter execute?
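For illustration, here is a minimal sketch of that second route, running a notebook end-to-end as a parameterised job with papermill (one common option, not something mentioned above); the notebook name and parameters are made up:

```python
# Hypothetical sketch: execute an existing notebook as a "job" instead of
# rewriting it as modules. Requires: pip install papermill
import papermill as pm

pm.execute_notebook(
    "train_model.ipynb",       # made-up input notebook
    "train_model_out.ipynb",   # executed copy, with all cell outputs saved
    parameters={"learning_rate": 3e-4, "epochs": 5},  # injected into a cell tagged "parameters"
)
```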


6 replies

78224460
3

I'm using a Chromebook, so I settled on running Cog remotely and connecting to it to experiment with training and running models.

I have a monorepo set up for different models and tasks, and some code that I share around. I'll use notebooks to explore and prepare the training data, and re-use these during fine-tuning. I'll also initially write notebooks for interacting with deployed models, to try things out. Once I'm happy and I want to try things out on another model, I'll usually extract those bits into the shared code.

Replicate offers model hosting, with training available for a select few models, which was quite handy for getting into fine-tuning and running predictions, since my machine can't really handle such workloads.
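For context, a minimal sketch of calling a hosted model from a thin client like this with the replicate Python package; the model identifier and input are placeholders, and it assumes REPLICATE_API_TOKEN is set in the environment:

```python
# Sketch: run a prediction against a model hosted on Replicate from a machine
# that can't handle the workload itself. Model name and input are placeholders.
# Requires: pip install replicate, plus REPLICATE_API_TOKEN in the environment.
import replicate

output = replicate.run(
    "owner/some-model:versionhash",                          # placeholder model reference
    input={"prompt": "Summarise: the quick brown fox..."},   # placeholder input
)
print(output)
```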

78235220
3

I definitely do a lot of pre-processing, analysing, and designing in notebooks (run locally through VSCode, which supports .ipynb files) on a smaller subset of the data. I do this as a sanity check that what I am doing makes sense and works.

Then, for the larger data, I have access to a Linux VM that can handle more throughput, so I package or pipeline my processes and run them there. Once that's complete, I do a few more sanity checks on the newly processed data, again in a notebook back locally, and once I'm happy I use the scripts, rather than the notebook, to run the model. I'll come back to a notebook when I need to analyse the outputs again.
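As a rough illustration of that notebook-to-script hand-off (the file names, columns, and paths below are made up), a cleaning step prototyped in a notebook might be repackaged like this so it can run unattended on the VM:

```python
# preprocess.py -- hypothetical sketch of notebook logic repackaged as a script
# so the same step can run unattended on the Linux VM over the full dataset.
import argparse
import pandas as pd

def preprocess(in_path: str, out_path: str) -> None:
    df = pd.read_csv(in_path)
    # the same cleaning that was prototyped in the notebook on a small sample
    df["text"] = df["text"].str.lower().str.strip()
    df = df.dropna(subset=["text"])
    df.to_csv(out_path, index=False)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Clean raw text data")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    args = parser.parse_args()
    preprocess(args.input, args.output)
```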

For what it's worth, there is not necessarily an "ideal" development setup; it will differ from project to project, company to company, and from one developer's preference to another's.

My local machines are all macOS, and the VMs I use are all Linux (usually Fedora).

78286743
1

JupyterLab or Kaggle notebooks.

78399506
1

PyCharm is my favorite IDE for deep learning and Python in general. Sometimes I use VSCode as well. I also use both conda and pip as package managers. And life is beautiful. :)

78742696
0

I've been using C# in VS 2022. It lacks the convenience that Jupyter has with regard to retaining in-memory state (for example, keeping trained sets around is nice), but in my opinion it is worth it to use C# (a personal preference; Python is great too).

PyTorch is very well managed memory-wise, and the C# libraries tend to have issues with memory management as a result of pointer marshalling... so that's not the best.

Needing to define the training and generation, and seeing what it all looks like at once, does provide strong insight into the efficiency of the model's states and layers.
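For comparison (in Python, since the rest of the thread is Python-centric), a minimal sketch of the usual workaround for the missing notebook-style in-memory state: checkpointing the model to disk between script runs. The checkpoint path and model below are placeholders.

```python
# Sketch: persist model state between script runs instead of relying on a
# long-lived notebook kernel. The checkpoint path and model are placeholders.
import os
import torch
import torch.nn as nn

CKPT = "model_checkpoint.pt"
model = nn.Linear(128, 2)  # stand-in for a real model

if os.path.exists(CKPT):
    model.load_state_dict(torch.load(CKPT))  # resume where the last run left off

# ... train or fine-tune here ...

torch.save(model.state_dict(), CKPT)  # keep the result for the next run
```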