Combine Art and Sciences

High performance optimization and acceleration for randomWalk, deepwalk, node2vec (Python)

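High-performance optimization of deepwalk and node2vec. Reposted from Zhihu: https://zhuanlan.zhihu.com/p/348778146 Original author: 马东什么

First comes the question of how to store the graph. Most of the libraries I have looked at that implement node2vec or deepwalk are built on networkx, or on similar Python class objects, as the storage format for the graph. The main performance bottleneck of single-machine deepwalk and node2vec is the random walk stage, because the underlying word2vec is almost always a direct call to gensim's word2vec. (gensim's word2vec is backed by compiled code, and the fast version in recent gensim releases uses Cython for its core routines. Personally I feel that optimizing at the word2vec level is too complicated, and it would be hard to do better than gensim. Another idea is to implement word2vec with tf.keras or torch and borrow the compute power of a GPU to try to beat gensim, but after reading the following I lost heart: "Word2vec on GPU slower than CPU · Issue #13048 · tensorflow/tensorflow" (github.com); "Does Gensim library support GPU acceleration?" (stackoverflow.com); https://www.reddit.com/r/learnmachinelearning/comments/88eeua/why_is_gensim_word2vec_so_much_faster_than_keras/ (reddit.com).)

A graph is usually stored in memory as objects that point to one another, which means that every node sits at a different location in memory. With this linked-list-like design, memory access speed becomes the main bottleneck, because every move from one node to another requires an address lookup in memory. A better approach is to pack the nodes into arrays: the memory behind an array is generally contiguous, so the cost of memory access is much lower. (This does not even touch on the overhead that comes from networkx being written in pure Python; see 马东什么: "Why is Python so slow? What problem does numba solve?") A good solution is to use scipy.sparse.csr_matrix to store the graph and to access it afterwards. For csr_matrix, see 马东什么: "Common storage formats for sparse matrices in scipy sparse" (zhuanlan.zhihu.com). Converting the graph to a csr_matrix and exploiting its clever storage layout greatly improves the efficiency of moving from node to node.

Next, let us look at a very nice niche graph library, csr_graph, together with nodevectors. nodevectors is basically a wrapper around csr_graph, and its underlying logic calls the csr_graph API directly, so it is enough to study csr_graph: VHRanger/CSRGraph (github.com). The core of the whole csr graph is how it rebuilds the graph:

class csrgraph():
    """
    This top level python class either calls external JIT'ed methods
    or methods from the JIT'ed internal graph
    """
    def __init__(self, data, nodenames=None, copy=True, threads=0):
        """
        A class for larger graphs.
        NOTE: this class tends to "steal data" by default.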

…
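The array-based storage idea from the excerpt above can be illustrated with a small, self-contained sketch. It is not the CSRGraph implementation and does not use its API. The helper names `build_csr` and `random_walk` are hypothetical, and the standard-library `array` module stands in for the `indptr` and `indices` arrays that a `scipy.sparse.csr_matrix` exposes. The walk is a plain uniform (DeepWalk-style) walk, assuming an unweighted graph.

```python
import random
from array import array


def build_csr(num_nodes, edges):
    """Pack a directed edge list into CSR-style (indptr, indices) arrays."""
    degree = [0] * num_nodes
    for src, _ in edges:
        degree[src] += 1
    # indices[indptr[v]:indptr[v + 1]] holds the neighbours of node v.
    indptr = array("q", [0] * (num_nodes + 1))
    for v in range(num_nodes):
        indptr[v + 1] = indptr[v] + degree[v]
    indices = array("q", [0] * len(edges))
    cursor = list(indptr[:num_nodes])  # next free slot for each node
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return indptr, indices


def random_walk(indptr, indices, start, length, rng):
    """Uniform random walk of at most `length` nodes; stops at a dead end."""
    walk = [start]
    node = start
    for _ in range(length - 1):
        lo, hi = indptr[node], indptr[node + 1]
        if lo == hi:  # node has no outgoing edges
            break
        # One contiguous array lookup instead of chasing object pointers.
        node = indices[rng.randrange(lo, hi)]
        walk.append(node)
    return walk


# Small undirected graph, stored as two directed edges per link.
links = [(0, 1), (0, 2), (1, 2), (2, 3)]
edges = links + [(b, a) for a, b in links]
indptr, indices = build_csr(4, edges)
walk = random_walk(indptr, indices, start=0, length=6, rng=random.Random(0))
print(walk)
```

Each step of the walk only reads two entries of `indptr` and one entry of `indices`, which is the access pattern that makes the CSR layout attractive for random walks. A node2vec walk would additionally bias the neighbour choice by the previous node, which this sketch leaves out.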

June 16, 2021

How to add conda env into jupyter notebook installed by pip

ref: https://medium.com/@nrk25693/how-to-add-your-conda-environment-to-your-jupyter-notebook-in-just-4-steps-abeab8b8d084 Suppose your Linux server has conda installed but not the full Anaconda distribution, and you do not want to install Anaconda just to get Jupyter Notebook, so you installed Jupyter Notebook with pip3 instead (following this link). 1. Check the environment in which Jupyter Notebook is running: which pip3 (on Windows cmd: where pip3) 2. Add your conda environment to Jupyter Notebook: Step 1: Create a conda environment. conda create --name firstEnv

…

June 15, 2021

The Power of WordNet and How to Use It in Python

Posted on July 20, 2017 by Prachi Kumar (in this link). In this post, I am going to talk about the relations in WordNet (https://wordnet.princeton.edu) and how you can use these in a Python project. WordNet is a database of English words with different relations between the words. Take a look at the next four sentences. “She went home and had pasta.” “Then she cleaned the kitchen and sat on the sofa.

…

March 08, 2021

Vocabulary app review, 2018

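TFT Review | A roundup of all the good vocabulary apps. By the TFT team (WeChat ID: cmu_doers), published June 27, 2018.

Whether you are preparing for CET-4/6, IELTS, TOEFL, or the GRE, experienced people will always tell you: "Buy a vocabulary book and memorize the words first." A common saying goes: memorizing vocabulary does not guarantee a good score on an English exam, but not memorizing it guarantees a bad one (this is especially true of the GRE). Some people rely on a paper book, but a good number of students prefer to memorize words with an app. So here is the question: the vocabulary apps in the App Store are simply dazzling, so which one suits me best? TFT has found 10 quality vocabulary apps for you, and one of them is bound to suit you!

Lingvist. Rating: ★★★★
Pros: Very high App Store rating and a beautiful interface. It is an excellent attempt at applying AI to language learning, based on "spaced learning theory" and "intelligent algorithms", and is said to "truly break through the forgetting curve". Its biggest advantage is that users hardly feel the frustration of memorizing words. Other vocabulary apps use check-ins to keep users going, but for some users the check-in itself becomes a burden over time. In Lingvist, the AI picks words and schedules reviews according to your own situation, so you barely notice the pain of memorizing. You can also use Lingvist for Spanish, French, and other languages, not only English.
Cons: For users who like to challenge themselves, the words the AI picks may be too easy, and it is not suitable for GRE vocabulary. The free version limits how many words you can study per day, and upgrading costs money.

扇贝单词 (Shanbay). Rating: ★★★★
Pros: I started using Shanbay seven years ago and found it stunning at the time. The interface is clean and free of ads. It offers English definitions, example sentences, word roots, and other users' notes. Combined with Shanbay News, it improved my vocabulary considerably over a few years.
Cons: Definitions often contain errors, and some words list only obscure senses without the common ones.

Supermemo (desktop). Rating: ★★★★★
(Buy the 《GRE还你命1000》 course for one cent in the TFT backend and get the Supermemo installer and word pack for free!)
Pros: The most frequently recommended vocabulary software on the forums, with an algorithm that is said to be very scientific. Word packs can be self-made or downloaded from other users. With a word pack that someone made for 《再要你命三千》, you can go through the core GRE vocabulary a dozen or more times efficiently in a short period. I spent an hour on it every noon and went through the 3000 list and the Magoosh core 1000 words more than forty times, and afterwards there were almost no words I did not recognize on the GRE.
Cons: The desktop version is recommended because the mobile version is paid, and the desktop version currently runs only on Windows.

Quizlet. Rating: ★★★★☆
Pros: You can study word lists you make yourself (in any language) or subject terminology, which makes it especially suitable for building a vocabulary notebook that is entirely your own (for example, adding words you did not know while doing practice questions). After I started using it, I dropped Shanbay. Every word or term in it, and its explanation, was added by me. You know yourself best and know which senses you are weak on, so you can write explanations that target your weak points in the way that is easiest for you to memorize. This makes memorizing efficient and gives a more complete grasp of each word. I like to add word roots, and I sometimes use it for math formulas and political and economic terms.
Cons: Adding words yourself is a bit tedious, so lazier students can only study lists that others have made.

人人词典 and 不背单词. Rating: ★★★★
Pros: The biggest advantage of these two apps is their "massive audio example sentences", each taken from a film or TV series. For example, open the example sentences for "recite" and you will hear Harry Potter speaking. They suit users who love films and TV series and want to practice listening and speaking while memorizing words.
Cons: Apart from this advantage, I feel they are no different from other vocabulary apps.

百词斩 (Baicizhan). Rating: ★★★
Pros: The picture for each word and the imaginative pronunciations in the example sentences are fun and make memorizing less dull. You can review words in 7 ways, such as choosing the meaning from the English word, choosing the word from the Chinese meaning, and identifying the meaning from audio. The copy and sound effects are distinctive and full of martial-arts swordsman flavor, which fits the "斩" (slash) in the name well.
Cons: It relies too heavily on picture association. A single picture can hardly explain all the usages of a word clearly, so you often remember the picture but forget the word.

沪江开心词场. Rating: ★★★
Pros: A gamified mode with levels and PK matches, with different rewards depending on how well you do. It is fun and does not feel dull. It covers many languages, including English, Japanese, French, Korean, German, and Spanish.
Cons: You cannot view the definitions and example sentences of words in one place. The interface is rather flashy and does not suit old folks like this editor.

知米背单词. Rating: ★★★★☆
Pros: Good for learning word roots and usage, with sentence-making practice. It takes more time but works well. Every word comes with matching phrases and uses situational memory. The auxiliary features are powerful: lock-screen words, on-screen word lookup, mnemonics, roots and affixes, related words, and so on, and you can edit the definitions yourself. You can review through familiar-word consolidation, audio review, and full spelling of new words. Like Shanbay, it also has check-in groups to push you to study every day.
Cons: Only American pronunciation is available (if that counts as a drawback).

微软必应词典 (Microsoft Bing Dictionary). Rating: ★★★★
Pros: Suited to looking up and memorizing new words you meet while reading. It combines lookup with memorization, has rich definitions, phrases, and example sentences, and its related-word feature is powerful.
Cons: Few word books are included, the learning and review formats are limited, and you have to plan your own review schedule.

单词日记. Rating: ★★★★
Pros: Suited to studying words and their usage in depth, slowly but thoroughly. The interface is clean. Definitions are detailed: Chinese and English definitions, bilingual example sentences, and video examples from films and TV series, followed by reflection questions, so you can pick up some practical expressions. There are several review modes, and you can choose the difficulty.
Cons: No offline study, and few word books are included.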

March 08, 2021

For quick reference: Configure VSCode for Python

Prerequisite: in the command line/terminal, an environment manager such as conda is correctly installed and configured. Create a project folder, then open it in VS Code in one of two ways. From the terminal/cmd: cd to the project folder and run code . From the VS Code UI: File » Open Folder. Note: By starting VS Code in a folder, that folder becomes your “workspace”. VS Code stores settings that are specific to that workspace in .vscode/settings.json, which are separate from user settings that are stored globally. to add settings.

…

March 01, 2021

IoT platforms that deserve to be noticed

When you are developing an IoT project, you often need to connect to sensors, cameras, and actuators, communicate with them, and relay their real-time data to other devices. The bridge that carries this data then becomes especially important. It needs to be easy and efficient to deploy, and it needs to be low cost, secure, and accessible from anywhere in the world. Here are a few platforms that may allow you to quickly deploy such a bridge to support your IoT projects. 1. Node-RED https://nodered.

…

December 25, 2020

[Not shown on homepage] Access IoT (Arduino) from anywhere remotely

How to Access Arduino Video Stream Over Internet. I’m working on an Arduino project that captures video and streams it via a web service. The ESP32-CAM module is ready to use, and it provides a web service on the LAN (e.g. you can access it at 192.168.1.11). I hope to access this video streaming web service from the Internet. I’m familiar with exposing local network services to the Internet through tools like FRP or Ngrok. I hope to find some frp-like open source code that is compatible with Arduino.

…

December 18, 2020

Draw graph(network) with nx_pydot in networkx

With nx_pydot in nx.drawing, we get more plotting styles for networks, trees, and graphs (this requires the pydot package and Graphviz). Here is a simple sample:

    import networkx as nx

    g = nx.DiGraph()
    # Each pair is a directed edge (source, target).
    edgelist = [(0, 1), (1, 12), (2, 12), (3, 17), (4, 11), (5, 4), (6, 10), (7, 12), (8, 14), (9, 14), (10, 11), (11, 14), (12, 11), (13, 17), (14, -1), (15, -1), (16, 10), (17, 11), (18, -1)]
    g.add_edges_from(edgelist)
    # Convert to a pydot graph and render it to a PNG file.
    p = nx.drawing.nx_pydot.to_pydot(g)
    p.write_png('pydot_Tree.png')

The plot result:

July 23, 2020

How to Share Wired Internet Via Wi-Fi and Vice Versa on Linux

By Aaron Kili, April 15, 2020. Categories: Networking Commands. In this article, you will learn how to share a wired (Ethernet) internet connection via a wireless hotspot and also how to share a wireless internet connection via a wired connection on a Linux desktop. This article requires you to have at least two computers: a Linux desktop/laptop with a wireless card and an Ethernet port, then another computer (which may not necessarily be running Linux) with a wireless card and/or an Ethernet port.

…

July 09, 2020

Deluge: Enables BT download on your Raspberry Pi

Foreword: a BT downloader on an Arm board is only really convenient if it can be operated through the web, which is why Deluge was recommended. References: "Installing Deluge on the Raspberry Pi"; "7 Best BitTorrent Clients for Linux in 2020". Installing Deluge on the Raspberry Pi, by Emmet, Jul 30, 2019, updated May 03, 2020. Level: Beginner. In this Raspberry Pi Deluge project, we will show you how to set up the popular Deluge torrent client on the Pi. Throughout this tutorial, we will be showing you how to install and configure the Deluge torrent client alongside the Deluge web interface.

…

July 06, 2020